Skip to content

Schedule Execution Lifecycle

Last updated: 2026-04-22, JIM v0.10.0

This diagram shows how schedules are triggered, how step groups are queued and advanced, and how the scheduler and worker collaborate to drive multi-step execution to completion.

Three-Service Collaboration

JIM uses three services that collaborate on scheduled execution:

Service Role Polling Interval
JIM.Scheduler Detects due schedules, creates executions, queues tasks, recovery 30 seconds
JIM.Worker Executes tasks, drives step advancement on completion 2 seconds
JIM.Web Manual run requests (creates worker tasks directly) On-demand

Scheduler Polling Cycle

flowchart TD
    Start([Scheduler Polling Cycle]) --> WaitDb[Wait for database to be ready<br/>Retry every 2 seconds]
    WaitDb --> PollLoop{Shutdown<br/>requested?}

    PollLoop -->|Yes| End([Scheduler Stopped])
    PollLoop -->|No| Step1[Step 1: Update cron next-run-times<br/>Parse cron expressions<br/>Set NextRunTime on schedules]

    Step1 --> Step2[Step 2: Process due schedules<br/>See Due Schedule Processing below]
    Step2 --> Step3[Step 3: Recover stuck executions<br/>Safety net for worker crashes<br/>See Recovery section below]
    Step3 --> Step4[Step 4: Recover stale worker tasks<br/>Heartbeat-based crash detection]
    Step4 --> Sleep[Sleep 30 seconds]
    Sleep --> PollLoop

Due Schedule Processing

flowchart TD
    GetDue[Get schedules where<br/>NextRunTime <= UtcNow] --> Loop{More due<br/>schedules?}
    Loop -->|No| Done([Done])
    Loop -->|Yes| CheckOverlap{Active execution<br/>already exists?}

    CheckOverlap -->|Yes| SkipLog[Log warning: schedule<br/>already running, skip]
    SkipLog --> Loop

    CheckOverlap -->|No| StartExec[StartScheduleExecutionAsync]
    StartExec --> CreateExec[Create ScheduleExecution<br/>Status = InProgress<br/>CurrentStepIndex = 0]
    CreateExec --> UpdateLastRun[Update Schedule.LastRunTime]

    UpdateLastRun --> QueueAll[Queue ALL step groups upfront]
    QueueAll --> StepLoop{More step<br/>indices?}

    StepLoop -->|Yes| IsFirst{First step<br/>index?}
    IsFirst -->|Yes| QueueQueued[Queue tasks with<br/>Status = Queued<br/>Ready to run immediately]
    IsFirst -->|No| QueueWaiting[Queue tasks with<br/>Status = WaitingForPreviousStep<br/>Visible on queue but blocked]
    QueueQueued --> StepLoop
    QueueWaiting --> StepLoop

    StepLoop -->|No| CalcNext[Calculate and set<br/>next cron run time]
    CalcNext --> Loop

Step Group Queuing Detail

Steps with the same StepIndex form a parallel group and execute concurrently.

flowchart TD
    QueueGroup([Queue Step Group<br/>at StepIndex N]) --> GetSteps[Get all steps at this index<br/>May be 1 sequential or many parallel]
    GetSteps --> IsParallel{Multiple steps<br/>at same index?}
    IsParallel -->|Yes| LogParallel[Log parallel group<br/>with step count]
    IsParallel -->|No| QueueStep

    LogParallel --> ForEach{More steps<br/>at index?}
    QueueStep --> ForEach

    ForEach -->|No| Done([Done])
    ForEach -->|Yes| CheckType{Step<br/>type?}

    CheckType -->|RunProfile| CreateSyncTask[Create SynchronisationWorkerTask<br/>Set ConnectedSystemId + RunProfileId<br/>Set ExecutionMode: Parallel/Sequential<br/>Set ContinueOnFailure from step<br/>Link to ScheduleExecution]
    CheckType -->|PowerShell<br/>Executable<br/>SqlScript| NotImpl[Log warning:<br/>not yet implemented<br/>Skip step]

    CreateSyncTask --> CreateActivity[TaskingServer.CreateWorkerTaskAsync<br/>Creates Activity with initiator triad<br/>Associates Activity with WorkerTask]
    CreateActivity --> ForEach
    NotImpl --> ForEach

Worker-Driven Step Advancement

After the worker completes a task, it drives schedule advancement via TryAdvanceScheduleExecutionAsync. This is the primary advancement mechanism (the scheduler has a safety net for the case where the worker crashes between task completion and advancement).

flowchart TD
    TaskDone([Worker task completes]) --> DeleteTask[Delete WorkerTask from database<br/>Activity persists as audit record]
    DeleteTask --> IsScheduled{Task linked to<br/>ScheduleExecution?}
    IsScheduled -->|No| Done([Done])
    IsScheduled -->|Yes| CheckRemaining[Count remaining tasks<br/>at this step index]

    CheckRemaining --> StillActive{Remaining<br/>tasks > 0?}
    StillActive -->|Yes| Wait([Wait for other<br/>parallel tasks to finish])

    StillActive -->|No| LastTask[This was the last task<br/>in the step group]
    LastTask --> CheckFailures[Query Activities for this step<br/>Check for FailedWithError<br/>CompleteWithError or Cancelled]

    CheckFailures --> AnyFailed{Any activities<br/>failed?}

    %% --- Happy path ---
    AnyFailed -->|No| FindNext[Find next WaitingForPreviousStep<br/>step index]
    FindNext --> HasNext{Next step<br/>exists?}
    HasNext -->|No| ExecComplete[Execution complete<br/>Status = Completed<br/>CompletedAt = UtcNow]
    ExecComplete --> Done

    HasNext -->|Yes| Advance[Transition next step group:<br/>WaitingForPreviousStep --> Queued<br/>Update CurrentStepIndex]
    Advance --> WorkerPicksUp([Worker picks up<br/>newly queued tasks<br/>on next poll cycle])

    %% --- Failure path ---
    AnyFailed -->|Yes| LoadSteps[Load Schedule Steps<br/>at this index]
    LoadSteps --> CheckContinue{Any step has<br/>ContinueOnFailure<br/>= false?}
    CheckContinue -->|No| FindNext
    CheckContinue -->|Yes| FailExec[Execution failed<br/>Status = Failed<br/>ErrorMessage = step name + reason]
    FailExec --> Cleanup[Delete all remaining<br/>WaitingForPreviousStep tasks]
    Cleanup --> Done

Recovery Mechanisms

Three safety nets ensure schedules complete even when services crash.

flowchart TD
    subgraph "1. Worker Startup Recovery"
        WS([Worker starts]) --> RecoverAll[RecoverStaleWorkerTasksAsync<br/>TimeSpan.Zero<br/>ALL Processing tasks are<br/>orthaned at startup]
        RecoverAll --> ReQueue1[Re-queue as Queued<br/>Fail associated Activities]
    end

    subgraph "2. Scheduler: Stuck Execution Recovery"
        SE([Every 30 seconds]) --> GetActive[Get InProgress executions]
        GetActive --> ForEach{For each<br/>execution}
        ForEach --> CheckTasks{Has Queued or<br/>Processing tasks?}
        CheckTasks -->|Yes| Normal([Normal operation<br/>Worker is handling it])
        CheckTasks -->|No| HasWaiting{Has Waiting<br/>tasks?}
        HasWaiting -->|Yes| SafetyNet[Worker likely crashed after<br/>completing a step<br/>Run CheckAndAdvanceExecutionAsync<br/>to advance to next step]
        HasWaiting -->|No, zero tasks| Complete[No tasks at all<br/>Mark execution complete]
    end

    subgraph "3. Scheduler: Stale Task Recovery"
        ST([Every 30 seconds]) --> FindStale[Find Processing tasks where<br/>Heartbeat older than<br/>stale threshold]
        FindStale --> HasStale{Stale tasks<br/>found?}
        HasStale -->|No| Skip([Skip])
        HasStale -->|Yes| ReQueue2[Re-queue stale tasks<br/>Fail associated Activities<br/>Worker will pick up<br/>on next poll]
    end

Execution State Diagram

stateDiagram-v2
    [*] --> InProgress: Scheduler creates execution<br/>Queues all step groups

    InProgress --> InProgress: Worker completes step<br/>Advances to next step group

    InProgress --> Completed: Last step group completes<br/>No more waiting tasks

    InProgress --> Failed: Step group has failures<br/>ContinueOnFailure = false

    InProgress --> Cancelled: User cancels execution<br/>All tasks deleted

    Completed --> [*]
    Failed --> [*]
    Cancelled --> [*]

Example: Multi-Step Schedule

A typical schedule with sequential and parallel steps:

Schedule: "Nightly HR Sync"

Index Steps Execution
0 HR System - Full Import Sequential
1 HR System - Full Sync Sequential
2 AD - Export, LDAP - Export Parallel (2 tasks)
3 AD - Confirming Import, LDAP - Confirming Import Parallel (2 tasks)

Timeline:

  1. Scheduler creates execution, queues ALL 6 tasks
  2. Index 0: 1 task as Queued
  3. Index 1: 1 task as WaitingForPreviousStep
  4. Index 2: 2 tasks as WaitingForPreviousStep
  5. Index 3: 2 tasks as WaitingForPreviousStep
  6. Worker picks up index 0 task, executes Full Import
  7. Worker completes → TryAdvance → transitions index 1 to Queued
  8. Worker picks up index 1 task, executes Full Sync
  9. Worker completes → TryAdvance → transitions index 2 (2 tasks) to Queued
  10. Worker dispatches BOTH index 2 tasks in parallel (AD Export + LDAP Export)
  11. First export completes → TryAdvance → remaining count > 0, wait
  12. Second export completes → TryAdvance → transitions index 3 to Queued
  13. Worker dispatches BOTH index 3 tasks in parallel
  14. Both confirming imports complete → TryAdvance → no more steps
  15. Execution marked Completed

Key Design Decisions

  • All steps queued upfront
    The scheduler creates all worker tasks at execution start, with subsequent steps as WaitingForPreviousStep. This makes the full execution plan visible in the task queue from the beginning.

  • Worker drives advancement
    Step transitions are driven by the worker (via TryAdvanceScheduleExecutionAsync) for minimal latency. The scheduler provides a safety net for crash recovery only.

  • Activity-based outcome detection
    Since worker tasks are deleted upon completion, the system uses Activities (immutable audit records) to determine whether a step succeeded or failed.

  • Overlap prevention
    The scheduler checks for active executions before starting a new one for the same schedule. This prevents concurrent execution of the same schedule.

  • ContinueOnFailure
    Each step can be configured to continue or halt on failure. When any step at an index has ContinueOnFailure = false and its activity failed, the entire execution stops and remaining waiting tasks are cleaned up.