Instruction File Persistence Refactor
Document status | DRAFT |
---|---|
Document owner | @Joshua Lamos-Sweeney + @Luke Cook |
Designer | @Luke Cook |
Tech lead | @Joshua Lamos-Sweeney |
Technical writers | TBD |
QA | TBD |
Objective
Develop a process layer for the Digest and Extract layers of GDM that passes instructions without direct file-system access, removing the instruction file as the hand-off mechanism. This reduces our reliance on the file system and lets us build better process-handling systems (e.g. scheduling) without requiring a file-system-based cron solution.
This will also allow a more direct calling structure, which reduces wait times on short jobs and gives more precise knowledge of the state of each job (see sub-enhancements for future tasks in that space).
Concept
The Process layer (for now we’ll look at the Digest, as it and the Extract are functionally similar) currently passes the ‘job’ to be completed as a file, which is placed once all the ancillary files and folders have been created. This ‘job file’ then moves through the system as a crude status mechanism. Now that we have more complex job status tracking and rely less on manually kicking off jobs internally, this mechanism gives very little benefit to the user, complicates the file system structure, and is itself a cause of confusion.
It also hinders dynamic instances, which would have to share the same file system, and makes monitoring and inter-process communication difficult.
The proposed system will allow a ‘user’ program, such as the LoaderUI, to make a call to the digest instance on a specific receiving port, which will accept a serialized instruction (currently our instruction file format, serialized), and directly kick off a job this way.
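As a rough illustration of the call described above, the sketch below sends one serialized instruction to a listening port and reads back an acknowledgement. Everything specific here is an assumption made for the example, not the GDM design: JSON as the serialization, a 4-byte length prefix as the framing, the `job_id` and `action` fields, and the `serve_once` stand-in for the digest listener.

```python
import json
import socket
import struct
import threading


def _read_exact(conn: socket.socket, size: int) -> bytes:
    """Read exactly `size` bytes, since recv() may return fewer."""
    buf = bytearray()
    while len(buf) < size:
        chunk = conn.recv(size - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection mid-message")
        buf.extend(chunk)
    return bytes(buf)


def _read_message(conn: socket.socket) -> dict:
    """Read one message: 4-byte big-endian length, then a JSON body."""
    (size,) = struct.unpack(">I", _read_exact(conn, 4))
    return json.loads(_read_exact(conn, size).decode("utf-8"))


def _write_message(conn: socket.socket, message: dict) -> None:
    payload = json.dumps(message).encode("utf-8")
    conn.sendall(struct.pack(">I", len(payload)) + payload)


def send_instruction(host: str, port: int, instruction: dict) -> dict:
    """What a 'user' program might do: send an instruction, return the reply."""
    with socket.create_connection((host, port), timeout=5) as conn:
        _write_message(conn, instruction)
        return _read_message(conn)


def serve_once(server: socket.socket, received: list) -> None:
    """Hypothetical stand-in for the digest listener: accept one instruction."""
    conn, _ = server.accept()
    with conn:
        instruction = _read_message(conn)
        received.append(instruction)  # a real listener would queue the job here
        _write_message(conn, {"accepted": True, "job": instruction["job_id"]})


def demo() -> tuple:
    received = []
    server = socket.create_server(("127.0.0.1", 0))  # port 0: OS picks a free port
    port = server.getsockname()[1]
    listener = threading.Thread(target=serve_once, args=(server, received))
    listener.start()
    reply = send_instruction(
        "127.0.0.1", port, {"job_id": "demo-1", "action": "digest"}
    )
    listener.join()
    server.close()
    return reply, received
```

The immediate reply is what removes the need to watch a file move through directories: the caller knows at once whether the job was accepted.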
Implementation Decisions
Currently, the DigestListener receives a job object through the reception port and places the job into a four-thread queue, which limits execution to four concurrent jobs, processed FIFO. This reduces the chance of system overutilization while remaining simple, and the logic is isolated in DigestListener so that a more comprehensive scheduler can easily be ‘slotted in’ later. Even as is, this is better than the ‘slow ramping’ cron job ‘pseudo-scheduler’, which can let many jobs run at once if all of them are ‘long’.
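The queueing behaviour described above can be sketched as a FIFO queue drained by a fixed pool of four worker threads. This is a minimal illustration, not the DigestListener code: the `JobQueue` class, the `run_job` callback, and the `demo` helper are hypothetical names introduced for the example.

```python
import queue
import threading
import time

MAX_CONCURRENT_JOBS = 4  # the four-job limit described above


class JobQueue:
    """FIFO queue drained by a fixed number of worker threads (hypothetical)."""

    def __init__(self, run_job, workers=MAX_CONCURRENT_JOBS):
        self._jobs = queue.Queue()  # queue.Queue hands items out in FIFO order
        self._run_job = run_job
        for _ in range(workers):
            threading.Thread(target=self._work, daemon=True).start()

    def submit(self, job):
        """Enqueue a job; it starts as soon as a worker is free."""
        self._jobs.put(job)

    def wait(self):
        """Block until every submitted job has finished."""
        self._jobs.join()

    def _work(self):
        while True:
            job = self._jobs.get()
            try:
                self._run_job(job)
            except Exception as exc:  # keep the worker alive if one job fails
                print(f"job {job!r} failed: {exc}")
            finally:
                self._jobs.task_done()


def demo(job_count=10):
    """Run short fake jobs and record the peak number running at once."""
    lock = threading.Lock()
    state = {"running": 0, "peak": 0, "done": []}

    def run_job(job):
        with lock:
            state["running"] += 1
            state["peak"] = max(state["peak"], state["running"])
        time.sleep(0.02)  # stand-in for real digest work
        with lock:
            state["running"] -= 1
            state["done"].append(job)

    jobs = JobQueue(run_job)
    for n in range(job_count):
        jobs.submit(n)
    jobs.wait()
    return state["peak"], sorted(state["done"])
```

Because the limit lives in one small class, replacing it with a fuller scheduler would only mean changing what `submit` and the workers do, which mirrors the isolation the design aims for.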
Success metrics
Goal | Metric |
---|---|
Integrates into existing GDM UIs | Does everything still work? |
Does not fail under high-load scenarios | Does it actually limit system load? |
Assumptions