
...

Integration with main codebase.

Process Flow

The user targets the ‘createInstruction’ endpoint in the web services, the same as is done now.
The serialized instruction is redirected to a digester service.
(Optional) A record of the instruction is placed in the database.
The digester service resides on the compute node and spins up digesters in a thread pool.
Each digester has callbacks built into it that update the calling machine whenever its status changes.

→ Using the createInstruction endpoint means no changes to the loaderUI are needed.

→ The instruction in the database is for retrospection purposes.

→ This means that if an instruction fails, it can be inspected afterwards.

→ The database schema can be a very simple file store; that is, the whole table will be:

‘key → instructionJson’
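
As a rough illustration of how small such a table could be, here is a minimal sketch using SQLite from the Python standard library. The class name `InstructionStore`, the table name, and the column names are assumptions made for this example; the actual database and naming are not specified above.

```python
import json
import sqlite3
from typing import Optional


class InstructionStore:
    """Hypothetical 'key -> instructionJson' store backed by one table."""

    def __init__(self, path: str = ":memory:") -> None:
        self._conn = sqlite3.connect(path)
        # The whole schema: one key column and one JSON text column.
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS instruction ("
            "instruction_key TEXT PRIMARY KEY, "
            "instruction_json TEXT NOT NULL)"
        )

    def put(self, key: str, instruction: dict) -> None:
        # Store the instruction as serialized JSON, replacing any earlier copy.
        self._conn.execute(
            "INSERT OR REPLACE INTO instruction "
            "(instruction_key, instruction_json) VALUES (?, ?)",
            (key, json.dumps(instruction)),
        )
        self._conn.commit()

    def get(self, key: str) -> Optional[dict]:
        # Return the stored instruction, or None if the key is unknown.
        row = self._conn.execute(
            "SELECT instruction_json FROM instruction WHERE instruction_key = ?",
            (key,),
        ).fetchone()
        return json.loads(row[0]) if row else None
```

Inspecting a failed instruction would then be a single `get(key)` call on the stored record.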

Callback API (Prototype)

The digester expects a target API on the web service, which it calls whenever its status changes.

Here’s the API:

Code Block
POST .../job
Body: {Job JSON}
Creates a job described by the given body JSON

PATCH .../job/{job_id}
Body: {Job Prototype JSON}
Updates the job with the given prototype

This way, whenever the pertinent information for a job changes, the relevant data can be updated in the host service.
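
To make the create and update semantics concrete, here is a minimal in-memory sketch of what the host service might do behind those two calls. It assumes that a "job prototype" is a partial job whose fields overwrite the stored ones (a shallow merge); the class name `JobRegistry` and the field names in the usage note are hypothetical.

```python
import itertools
from typing import Any, Dict


class JobRegistry:
    """Hypothetical in-memory stand-in for the host service's job storage."""

    def __init__(self) -> None:
        self._jobs: Dict[str, Dict[str, Any]] = {}
        self._ids = itertools.count(1)

    def create_job(self, job: Dict[str, Any]) -> str:
        # Corresponds to POST .../job: store the job and return its new id.
        job_id = str(next(self._ids))
        self._jobs[job_id] = dict(job)
        return job_id

    def patch_job(self, job_id: str, prototype: Dict[str, Any]) -> Dict[str, Any]:
        # Corresponds to PATCH .../job/{job_id}: overwrite only the fields
        # present in the prototype (assumed shallow merge).
        if job_id not in self._jobs:
            raise KeyError(job_id)
        self._jobs[job_id].update(prototype)
        return dict(self._jobs[job_id])
```

For example, `patch_job(job_id, {"status": "running"})` would change only the status and leave every other field of the job untouched.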

Digester Service

The digester service acts as a mediator between the submitted jobs and the digesting instances. It lives on the compute node and accepts digest instructions. It regulates its own scheduling through an abstracted regulator, so that the scheduling strategy can be configured per system.
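
The following is a minimal sketch of that arrangement, assuming the regulator's only job is to cap concurrency; a real regulator could weigh many other factors. The names `Regulator`, `FixedRegulator`, and `DigesterService`, and the status values other than "submitted", are assumptions made for illustration.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Dict, Protocol


class Regulator(Protocol):
    """Abstract scheduling strategy (hypothetical interface)."""

    def max_concurrent_digests(self) -> int: ...


class FixedRegulator:
    """Simplest possible strategy: a fixed concurrency limit."""

    def __init__(self, limit: int) -> None:
        self._limit = limit

    def max_concurrent_digests(self) -> int:
        return self._limit


StatusCallback = Callable[[str, str], None]  # (job_id, status)


class DigesterService:
    def __init__(
        self,
        regulator: Regulator,
        digest: Callable[[Dict], None],
        on_status: StatusCallback,
    ) -> None:
        # The regulator decides how many digesters may run at once.
        self._pool = ThreadPoolExecutor(
            max_workers=regulator.max_concurrent_digests()
        )
        self._digest = digest
        self._on_status = on_status

    def submit(self, job_id: str, instruction: Dict) -> Future:
        self._on_status(job_id, "submitted")
        return self._pool.submit(self._run, job_id, instruction)

    def _run(self, job_id: str, instruction: Dict) -> None:
        # Report every status change back to the caller.
        self._on_status(job_id, "running")
        try:
            self._digest(instruction)
        except Exception:
            self._on_status(job_id, "failed")
            raise
        self._on_status(job_id, "completed")

    def shutdown(self) -> None:
        self._pool.shutdown(wait=True)
```

Swapping `FixedRegulator` for another implementation changes the strategy without touching the service, which is the point of keeping the regulator abstract.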

When the digester service starts up, it requests all uncompleted jobs from the web service and executes those deemed ‘unsettled’. ‘Unsettled’ is left intentionally vague, because there are many situations in which a digest job is unfinished but may or may not need to be executed again. The baseline requirement is that all jobs with the ‘submitted’ tag are executed. This allows the digester to hold onto instructions without executing them immediately, and without the worry that a critical failure will leave those jobs unexecuted forever.
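
The startup selection can be sketched as a filter with a replaceable predicate, so that the definition of ‘unsettled’ can grow later. This is only an illustration: it assumes each job is a dictionary carrying a `status` field, and the function names are hypothetical.

```python
from typing import Callable, Dict, Iterable, List

Job = Dict[str, object]


def is_unsettled(job: Job) -> bool:
    # Baseline rule from the design: 'submitted' jobs must be executed.
    return job.get("status") == "submitted"


def jobs_to_resume(
    uncompleted: Iterable[Job],
    predicate: Callable[[Job], bool] = is_unsettled,
) -> List[Job]:
    # Keep only the uncompleted jobs that the predicate deems unsettled.
    return [job for job in uncompleted if predicate(job)]
```

A stricter or looser policy can be passed as `predicate` without changing the startup code that calls `jobs_to_resume`.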

It should be noted that, although this design is specific to the digester, little stands in the way of implementing the same system for the extractor. In fact, doing so will start to make a lot of sense once we want to balance the entire compute node’s activities.

API

The core of the digester API:

Code Block
POST /job
Body: {job: Job to be executed, config: {Digester Configuration}}

Further additions to the API can be discussed, but this is all that is needed to make it functional.
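
As a small illustration of handling that body on the digester side, the sketch below parses the request and separates the job from its configuration. It assumes the body is JSON and that `config` may be omitted and defaults to an empty object; neither assumption is stated in the design, and `parse_job_request` is a hypothetical name.

```python
import json
from typing import Any, Dict, Tuple


def parse_job_request(body: str) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Split a POST /job body into (job, config)."""
    payload = json.loads(body)
    if not isinstance(payload, dict) or "job" not in payload:
        raise ValueError("request body must be a JSON object with a 'job' field")
    job = payload["job"]
    # Assumption: a missing or null config means "use defaults".
    config = payload.get("config") or {}
    return job, config
```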

...
