# Schedule Execution Lifecycle

Last updated: 2026-04-22, JIM v0.10.0

This page shows how schedules are triggered, how step groups are queued and advanced, and how the scheduler and worker collaborate to drive a multi-step execution to completion.
## Three-Service Collaboration
JIM uses three services that collaborate on scheduled execution:
| Service | Role | Polling Interval |
|---|---|---|
| JIM.Scheduler | Detects due schedules, creates executions, queues tasks, recovery | 30 seconds |
| JIM.Worker | Executes tasks, drives step advancement on completion | 2 seconds |
| JIM.Web | Manual run requests (creates worker tasks directly) | On-demand |
## Scheduler Polling Cycle
```mermaid
flowchart TD
    Start([Scheduler Polling Cycle]) --> WaitDb[Wait for database to be ready<br/>Retry every 2 seconds]
    WaitDb --> PollLoop{Shutdown<br/>requested?}
    PollLoop -->|Yes| End([Scheduler Stopped])
    PollLoop -->|No| Step1[Step 1: Update cron next-run-times<br/>Parse cron expressions<br/>Set NextRunTime on schedules]
    Step1 --> Step2[Step 2: Process due schedules<br/>See Due Schedule Processing below]
    Step2 --> Step3[Step 3: Recover stuck executions<br/>Safety net for worker crashes<br/>See Recovery section below]
    Step3 --> Step4[Step 4: Recover stale worker tasks<br/>Heartbeat-based crash detection]
    Step4 --> Sleep[Sleep 30 seconds]
    Sleep --> PollLoop
```
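The cycle above can be sketched as a simple loop. This is illustrative Python, not the actual .NET service: `db_ready`, `shutdown_requested`, and the step callables are hypothetical stand-ins for the boxes in the diagram, and the `sleep` parameter is injected so the loop can be exercised without real delays.

```python
import time

DB_RETRY_SECONDS = 2
POLL_INTERVAL_SECONDS = 30

def run_scheduler(db_ready, shutdown_requested, steps, sleep=time.sleep):
    """Drive the four-step polling cycle until shutdown is requested.

    db_ready and shutdown_requested are zero-arg callables; steps is an
    ordered list of zero-arg callables mirroring Steps 1-4 in the diagram.
    Returns the number of completed cycles.
    """
    while not db_ready():
        sleep(DB_RETRY_SECONDS)          # wait for the database, retry every 2 s
    cycles = 0
    while not shutdown_requested():
        for step in steps:               # Steps 1-4, in order
            step()
        cycles += 1
        sleep(POLL_INTERVAL_SECONDS)     # sleep 30 seconds between cycles
    return cycles
```

Injecting the sleep function keeps the sketch deterministic under test; the real service presumably uses a cancellation token rather than a boolean callback.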
## Due Schedule Processing
```mermaid
flowchart TD
    GetDue[Get schedules where<br/>NextRunTime <= UtcNow] --> Loop{More due<br/>schedules?}
    Loop -->|No| Done([Done])
    Loop -->|Yes| CheckOverlap{Active execution<br/>already exists?}
    CheckOverlap -->|Yes| SkipLog[Log warning: schedule<br/>already running, skip]
    SkipLog --> Loop
    CheckOverlap -->|No| StartExec[StartScheduleExecutionAsync]
    StartExec --> CreateExec[Create ScheduleExecution<br/>Status = InProgress<br/>CurrentStepIndex = 0]
    CreateExec --> UpdateLastRun[Update Schedule.LastRunTime]
    UpdateLastRun --> QueueAll[Queue ALL step groups upfront]
    QueueAll --> StepLoop{More step<br/>indices?}
    StepLoop -->|Yes| IsFirst{First step<br/>index?}
    IsFirst -->|Yes| QueueQueued[Queue tasks with<br/>Status = Queued<br/>Ready to run immediately]
    IsFirst -->|No| QueueWaiting[Queue tasks with<br/>Status = WaitingForPreviousStep<br/>Visible on queue but blocked]
    QueueQueued --> StepLoop
    QueueWaiting --> StepLoop
    StepLoop -->|No| CalcNext[Calculate and set<br/>next cron run time]
    CalcNext --> Loop
```
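The upfront-queuing rule (first step group `Queued`, every later group `WaitingForPreviousStep`) can be sketched as follows. This is an illustrative Python sketch; only the two status names come from the diagram, the function name is hypothetical.

```python
def queue_all_step_groups(step_indices):
    """Return the initial task status for each distinct step index.

    The lowest index is immediately runnable; every later index waits
    for its predecessor, matching the 'queue ALL step groups upfront'
    design: the whole execution plan is visible from the start.
    """
    statuses = {}
    for position, index in enumerate(sorted(set(step_indices))):
        statuses[index] = "Queued" if position == 0 else "WaitingForPreviousStep"
    return statuses
```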
## Step Group Queuing Detail
Steps that share the same `StepIndex` form a parallel group and execute concurrently.
```mermaid
flowchart TD
    QueueGroup([Queue Step Group<br/>at StepIndex N]) --> GetSteps[Get all steps at this index<br/>May be 1 sequential or many parallel]
    GetSteps --> IsParallel{Multiple steps<br/>at same index?}
    IsParallel -->|Yes| LogParallel[Log parallel group<br/>with step count]
    IsParallel -->|No| QueueStep
    LogParallel --> ForEach{More steps<br/>at index?}
    QueueStep --> ForEach
    ForEach -->|No| Done([Done])
    ForEach -->|Yes| CheckType{Step<br/>type?}
    CheckType -->|RunProfile| CreateSyncTask[Create SynchronisationWorkerTask<br/>Set ConnectedSystemId + RunProfileId<br/>Set ExecutionMode: Parallel/Sequential<br/>Set ContinueOnFailure from step<br/>Link to ScheduleExecution]
    CheckType -->|PowerShell<br/>Executable<br/>SqlScript| NotImpl[Log warning:<br/>not yet implemented<br/>Skip step]
    CreateSyncTask --> CreateActivity[TaskingServer.CreateWorkerTaskAsync<br/>Creates Activity with initiator triad<br/>Associates Activity with WorkerTask]
    CreateActivity --> ForEach
    NotImpl --> ForEach
```
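The per-step branching can be sketched like this. It is an illustrative Python sketch under stated assumptions: the dictionary field names are hypothetical, and only the step types, the Parallel/Sequential execution mode, and the skip-unimplemented behaviour come from the diagram.

```python
IMPLEMENTED_STEP_TYPES = {"RunProfile"}

def create_tasks_for_group(steps):
    """Build a worker task per RunProfile step at one step index.

    Other step types (PowerShell, Executable, SqlScript) are not yet
    implemented and are collected for a warning log instead. A group with
    more than one step runs in Parallel mode, otherwise Sequential.
    """
    tasks, skipped = [], []
    mode = "Parallel" if len(steps) > 1 else "Sequential"
    for step in steps:
        if step["type"] in IMPLEMENTED_STEP_TYPES:
            tasks.append({
                "connected_system_id": step["connected_system_id"],
                "run_profile_id": step["run_profile_id"],
                "execution_mode": mode,
                "continue_on_failure": step["continue_on_failure"],
            })
        else:
            skipped.append(step["type"])  # log warning: not yet implemented
    return tasks, skipped
```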
## Worker-Driven Step Advancement
After the worker completes a task, it drives schedule advancement via `TryAdvanceScheduleExecutionAsync`. This is the primary advancement mechanism (the scheduler has a safety net for the case where the worker crashes between task completion and advancement).
```mermaid
flowchart TD
    TaskDone([Worker task completes]) --> DeleteTask[Delete WorkerTask from database<br/>Activity persists as audit record]
    DeleteTask --> IsScheduled{Task linked to<br/>ScheduleExecution?}
    IsScheduled -->|No| Done([Done])
    IsScheduled -->|Yes| CheckRemaining[Count remaining tasks<br/>at this step index]
    CheckRemaining --> StillActive{Remaining<br/>tasks > 0?}
    StillActive -->|Yes| Wait([Wait for other<br/>parallel tasks to finish])
    StillActive -->|No| LastTask[This was the last task<br/>in the step group]
    LastTask --> CheckFailures[Query Activities for this step<br/>Check for FailedWithError<br/>CompleteWithError or Cancelled]
    CheckFailures --> AnyFailed{Any activities<br/>failed?}

    %% --- Happy path ---
    AnyFailed -->|No| FindNext[Find next WaitingForPreviousStep<br/>step index]
    FindNext --> HasNext{Next step<br/>exists?}
    HasNext -->|No| ExecComplete[Execution complete<br/>Status = Completed<br/>CompletedAt = UtcNow]
    ExecComplete --> Done
    HasNext -->|Yes| Advance[Transition next step group:<br/>WaitingForPreviousStep --> Queued<br/>Update CurrentStepIndex]
    Advance --> WorkerPicksUp([Worker picks up<br/>newly queued tasks<br/>on next poll cycle])

    %% --- Failure path ---
    AnyFailed -->|Yes| LoadSteps[Load Schedule Steps<br/>at this index]
    LoadSteps --> CheckContinue{Any step has<br/>ContinueOnFailure<br/>= false?}
    CheckContinue -->|No| FindNext
    CheckContinue -->|Yes| FailExec[Execution failed<br/>Status = Failed<br/>ErrorMessage = step name + reason]
    FailExec --> Cleanup[Delete all remaining<br/>WaitingForPreviousStep tasks]
    Cleanup --> Done
```
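The decision the worker makes after each completion can be condensed into one function. This is a hedged Python sketch of the flowchart's logic, not the real `TryAdvanceScheduleExecutionAsync`; the argument shapes and return values are invented for illustration.

```python
def try_advance(current_index, tasks, activity_failed, continue_on_failure):
    """Decide the next action after a task at current_index completes.

    tasks: list of (step_index, status) pairs for tasks still queued.
    activity_failed: True if any Activity at this step ended in error.
    continue_on_failure: False if any step at this index has
    ContinueOnFailure = false.
    Returns ("wait",), ("failed",), ("completed",) or ("advance", next_index).
    """
    remaining_here = [t for t in tasks if t[0] == current_index]
    if remaining_here:
        return ("wait",)                   # other parallel tasks still running
    if activity_failed and not continue_on_failure:
        return ("failed",)                 # halt; waiting tasks get cleaned up
    waiting = sorted(i for i, status in tasks if status == "WaitingForPreviousStep")
    if not waiting:
        return ("completed",)              # no more step groups
    return ("advance", waiting[0])         # transition next group to Queued
```

Note the asymmetry from the diagram: a failure only halts the execution when `ContinueOnFailure = false`; otherwise the failure path falls through to the same find-next logic as the happy path.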
## Recovery Mechanisms
Three safety nets ensure schedules complete even when services crash.
```mermaid
flowchart TD
    subgraph "1. Worker Startup Recovery"
        WS([Worker starts]) --> RecoverAll[RecoverStaleWorkerTasksAsync<br/>TimeSpan.Zero<br/>ALL Processing tasks are<br/>orphaned at startup]
        RecoverAll --> ReQueue1[Re-queue as Queued<br/>Fail associated Activities]
    end
    subgraph "2. Scheduler: Stuck Execution Recovery"
        SE([Every 30 seconds]) --> GetActive[Get InProgress executions]
        GetActive --> ForEach{For each<br/>execution}
        ForEach --> CheckTasks{Has Queued or<br/>Processing tasks?}
        CheckTasks -->|Yes| Normal([Normal operation<br/>Worker is handling it])
        CheckTasks -->|No| HasWaiting{Has Waiting<br/>tasks?}
        HasWaiting -->|Yes| SafetyNet[Worker likely crashed after<br/>completing a step<br/>Run CheckAndAdvanceExecutionAsync<br/>to advance to next step]
        HasWaiting -->|No, zero tasks| Complete[No tasks at all<br/>Mark execution complete]
    end
    subgraph "3. Scheduler: Stale Task Recovery"
        ST([Every 30 seconds]) --> FindStale[Find Processing tasks where<br/>Heartbeat older than<br/>stale threshold]
        FindStale --> HasStale{Stale tasks<br/>found?}
        HasStale -->|No| Skip([Skip])
        HasStale -->|Yes| ReQueue2[Re-queue stale tasks<br/>Fail associated Activities<br/>Worker will pick up<br/>on next poll]
    end
```
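Staleness detection in mechanisms 1 and 3 is the same predicate with different thresholds. The sketch below is illustrative Python; the `STALE_THRESHOLD` value and the task dictionary shape are assumptions, only the Processing status and heartbeat comparison come from the diagram.

```python
from datetime import datetime, timedelta, timezone

# Assumed value for illustration; the real threshold is a service setting.
STALE_THRESHOLD = timedelta(minutes=5)

def find_stale_tasks(tasks, now, threshold=STALE_THRESHOLD):
    """Return Processing tasks whose last heartbeat is older than the
    threshold, meaning the worker holding them has likely crashed."""
    return [
        t for t in tasks
        if t["status"] == "Processing" and now - t["heartbeat"] > threshold
    ]
```

Passing `threshold=timedelta(0)` reproduces the worker-startup case (`TimeSpan.Zero`), where every Processing task is treated as orphaned and re-queued.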
## Execution State Diagram
```mermaid
stateDiagram-v2
    [*] --> InProgress: Scheduler creates execution<br/>Queues all step groups
    InProgress --> InProgress: Worker completes step<br/>Advances to next step group
    InProgress --> Completed: Last step group completes<br/>No more waiting tasks
    InProgress --> Failed: Step group has failures<br/>ContinueOnFailure = false
    InProgress --> Cancelled: User cancels execution<br/>All tasks deleted
    Completed --> [*]
    Failed --> [*]
    Cancelled --> [*]
```
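The diagram's transitions reduce to a small allow-list: only `InProgress` has outgoing edges, so the three end states are terminal. A hypothetical validator (illustrative Python, not part of JIM):

```python
# Every legal (current, next) pair from the state diagram above.
ALLOWED_TRANSITIONS = {
    ("InProgress", "InProgress"),  # worker advances to the next step group
    ("InProgress", "Completed"),   # last step group finished cleanly
    ("InProgress", "Failed"),      # failure with ContinueOnFailure = false
    ("InProgress", "Cancelled"),   # user cancelled; tasks deleted
}

def can_transition(current, new):
    """True if the state diagram permits moving from current to new.
    Completed, Failed and Cancelled are terminal states."""
    return (current, new) in ALLOWED_TRANSITIONS
```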
## Example: Multi-Step Schedule
A typical schedule with sequential and parallel steps:
Schedule: "Nightly HR Sync"
| Index | Steps | Execution |
|---|---|---|
| 0 | HR System - Full Import | Sequential |
| 1 | HR System - Full Sync | Sequential |
| 2 | AD - Export, LDAP - Export | Parallel (2 tasks) |
| 3 | AD - Confirming Import, LDAP - Confirming Import | Parallel (2 tasks) |
Timeline:

1. Scheduler creates the execution and queues all 6 tasks upfront:
    - Index 0: 1 task as `Queued`
    - Index 1: 1 task as `WaitingForPreviousStep`
    - Index 2: 2 tasks as `WaitingForPreviousStep`
    - Index 3: 2 tasks as `WaitingForPreviousStep`
2. Worker picks up the index 0 task and executes the Full Import
3. Worker completes → TryAdvance → transitions index 1 to `Queued`
4. Worker picks up the index 1 task and executes the Full Sync
5. Worker completes → TryAdvance → transitions index 2 (2 tasks) to `Queued`
6. Worker dispatches both index 2 tasks in parallel (AD Export + LDAP Export)
7. First export completes → TryAdvance → remaining count > 0, wait
8. Second export completes → TryAdvance → transitions index 3 to `Queued`
9. Worker dispatches both index 3 tasks in parallel
10. Both confirming imports complete → TryAdvance → no more steps
11. Execution is marked `Completed`
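The timeline's happy path can be replayed mechanically from the group sizes. An illustrative Python sketch (the function and its log format are invented; only the wait/advance/complete decisions mirror the source):

```python
def simulate_timeline(groups):
    """Replay the happy-path advancement decisions for a schedule.

    groups: {step_index: task_count}. Returns one log line per completed
    task, recording whether TryAdvance waits, advances the next group,
    or completes the execution. Assumes no failures.
    """
    log = []
    indices = sorted(groups)
    for position, index in enumerate(indices):
        remaining = groups[index]
        while remaining > 1:               # parallel siblings still running
            remaining -= 1
            log.append(f"index {index}: task done, {remaining} remaining, wait")
        if position + 1 < len(indices):    # a later group is still waiting
            log.append(f"index {index}: last task done, "
                       f"advance index {indices[position + 1]} to Queued")
        else:                              # nothing left to run
            log.append(f"index {index}: last task done, execution Completed")
    return log
```

Running it with the Nightly HR Sync shape `{0: 1, 1: 1, 2: 2, 3: 2}` yields six log lines, one per task, matching the timeline above.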
## Key Design Decisions
- **All steps queued upfront**

  The scheduler creates all worker tasks at execution start, with subsequent steps as `WaitingForPreviousStep`. This makes the full execution plan visible in the task queue from the beginning.

- **Worker drives advancement**

  Step transitions are driven by the worker (via `TryAdvanceScheduleExecutionAsync`) for minimal latency. The scheduler provides a safety net for crash recovery only.

- **Activity-based outcome detection**

  Since worker tasks are deleted upon completion, the system uses Activities (immutable audit records) to determine whether a step succeeded or failed.

- **Overlap prevention**

  The scheduler checks for active executions before starting a new one for the same schedule. This prevents concurrent execution of the same schedule.

- **ContinueOnFailure**

  Each step can be configured to continue or halt on failure. When any step at an index has `ContinueOnFailure = false` and its activity failed, the entire execution stops and remaining waiting tasks are cleaned up.