Environments

ScriptHut resolves the environment for each task by walking an ordered chain of env rules from five layers — backend → server → workflow (config) → workflow (document) → task — against a seed of SCRIPTHUT_* runtime variables. Every layer contributes rules to the same list; later rules see earlier rules' effects, so conditionals can branch on any value set upstream (including the seed).

The `EnvRule` shape

Every entry in any env: list is an EnvRule:

env:
  - set:                              # always-applied
      PROJECT: training
      LOG_LEVEL: info

  - if:                               # optional guard
      SCRIPTHUT_BACKEND: mercury      # single value = equality
    set:
      SCRATCH: /scratch/${USER}       # ${name} expanded against env-so-far
    init: "module load gcc/12 cuda/11"

  - if:
      SCRIPTHUT_BACKEND: [anvil, delta]   # list value = OR
    set:
      SCRATCH: /tmp/work/${USER}
    init: "module load gcc cuda-toolkit"

  - if:
      SCRIPTHUT_BACKEND: mercury
      GPU: "1"                        # multiple keys = AND
    append:
      PATH: /opt/cuda/bin             # joined with ":" to existing value

Field	Type	Description
`if`	object	Optional guard. Each key must match the env-so-far. A list value matches if the actual value is in the list (OR). When all keys match, the rule applies.
`set`	object	Variables to write. Overwrites any prior value. `${name}` is expanded against env-so-far.
`append`	object	Variables to extend. Joined to the existing value with `:` (creates the value if absent). `${name}` is expanded.
`init`	string	Bash text appended (newline-joined) to the `extra_init` block. In the generated script this block runs first, then the `set:`/`append:` exports, then `cd <working_dir>` and the task command, so user-set variables override anything `module load` / `source` placed into the env. `${name}` is expanded.
`include`	list of strings	Names of `env_groups` to inline at this position. See Reusable groups.
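The guard and write semantics in the table can be sketched in a few lines of Python. This is an illustrative model, not ScriptHut's implementation: the function names `guard_matches` and `apply_rule` are hypothetical, `${name}` expansion and `init`/`include` are left out, and applying `set` before `append` within one rule is an assumption.

```python
def guard_matches(guard, env):
    """Return True when every key in the guard matches the env-so-far."""
    for key, expected in guard.items():
        actual = env.get(key)
        if isinstance(expected, list):
            if actual not in expected:   # list value = OR
                return False
        elif actual != expected:         # single value = equality
            return False
    return True                          # all keys matched (AND); no guard always applies


def apply_rule(rule, env):
    """Apply one rule's set/append to a copy of env and return the copy."""
    env = dict(env)
    if not guard_matches(rule.get("if", {}), env):
        return env
    for key, value in rule.get("set", {}).items():
        env[key] = value                 # set overwrites any prior value
    for key, value in rule.get("append", {}).items():
        # append joins with ":" and creates the value if absent
        env[key] = f"{env[key]}:{value}" if key in env else value
    return env


env = {"SCRIPTHUT_BACKEND": "mercury", "GPU": "1", "PATH": "/usr/bin"}
rule = {
    "if": {"SCRIPTHUT_BACKEND": ["mercury", "anvil"], "GPU": "1"},
    "append": {"PATH": "/opt/cuda/bin"},
}
print(apply_rule(rule, env)["PATH"])  # /usr/bin:/opt/cuda/bin
```

A rule whose guard fails leaves the env untouched, which is why later rules can safely branch on values that earlier rules may or may not have written.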

Where rules live

env: is a list of rules accepted at five locations, evaluated in this order:

Layer	Defined in	Typical use
Backend	`env:` on each backend entry in `scripthut.yaml`	Cluster-specific facts — scratch paths, module-system bootstrap
Server	top-level `env:` in `scripthut.yaml`	Org-wide defaults across all clusters
Workflow (config)	`env:` on each workflow entry in `scripthut.yaml`	Workflow-specific overrides for ops-controlled config
Workflow (document)	top-level `env:` inside the JSON the generator emits	Project-controlled config — lives in your repo, ships with your code. See Task JSON → Environment Variables
Task	`env:` on each task in the JSON generator output	Per-task adjustments

Rules from each layer are concatenated in that order, then evaluated top-to-bottom. There is no separate "merge" step — set overwrites and append extends, so layer X overrides layer X-1 simply by being written later.
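The "concatenate, then evaluate" model can be illustrated with a short Python sketch. The layer keys in `LAYER_ORDER` and the function name `resolve` are hypothetical labels chosen for this example, and only `if`/`set` are modelled; it is meant to show why a later layer wins without any merge step, not to mirror ScriptHut's code.

```python
# Hypothetical labels for the five layers, in evaluation order.
LAYER_ORDER = ["backend", "server", "workflow_config", "workflow_document", "task"]


def resolve(layers, seed):
    """Concatenate rules in layer order, then apply them top to bottom."""
    rules = [rule for name in LAYER_ORDER for rule in layers.get(name, [])]
    env = dict(seed)
    for rule in rules:
        guard = rule.get("if", {})
        # Each guard is checked against the env as written by every earlier rule.
        matches = all(
            env.get(key) in (want if isinstance(want, list) else [want])
            for key, want in guard.items()
        )
        if matches:
            env.update(rule.get("set", {}))  # a later set simply overwrites
    return env


layers = {
    "backend": [{"set": {"LOG_LEVEL": "info", "TIER": "gpu"}}],
    "task": [{"if": {"TIER": "gpu"}, "set": {"LOG_LEVEL": "debug"}}],
}
env = resolve(layers, {"SCRIPTHUT_BACKEND": "mercury"})
print(env["LOG_LEVEL"])  # debug
```

Here the task-layer rule branches on `TIER`, a value written by the backend layer, and its `set` replaces the backend's `LOG_LEVEL` purely because it comes later in the list.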

`SCRIPTHUT_*` runtime seed

Before any user rule runs, ScriptHut seeds the env with run-context variables. Any rule's if: can branch on these:

Variable	Always present	Description
`SCRIPTHUT_BACKEND`	yes	Name of the backend this task runs on.
`SCRIPTHUT_WORKFLOW`	yes	Name of the workflow.
`SCRIPTHUT_RUN_ID`	yes	Unique 8-char run identifier.
`SCRIPTHUT_CREATED_AT`	yes	ISO 8601 timestamp of run creation.
`SCRIPTHUT_GIT_REPO`	git workflows only	Repository URL.
`SCRIPTHUT_GIT_BRANCH`	git workflows only	Branch name.
`SCRIPTHUT_GIT_SHA`	git workflows only	Resolved commit hash.

These keys are protected: any rule attempting to set: or append: to a key starting with SCRIPTHUT_ is rejected with a warning and ignored.
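A minimal Python sketch of this protection follows. The helper name `drop_protected` and the logger name are hypothetical, and the sketch assumes that only the offending key is dropped while the rest of the rule still applies; the text above does not say whether ScriptHut discards the single write or the whole rule.

```python
import logging

log = logging.getLogger("env_sketch")  # hypothetical logger name
PROTECTED_PREFIX = "SCRIPTHUT_"


def drop_protected(writes, op):
    """Return the writes without SCRIPTHUT_* keys, warning for each rejected key."""
    allowed = {}
    for key, value in writes.items():
        if key.startswith(PROTECTED_PREFIX):
            log.warning("ignoring %s on protected key %s", op, key)
            continue
        allowed[key] = value
    return allowed


print(drop_protected({"SCRIPTHUT_RUN_ID": "override", "PROJECT": "training"}, "set"))
# {'PROJECT': 'training'}
```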

`${name}` expansion

String values in set:, append:, and init: may reference other variables with ${name}. Expansion happens against the env as resolved so far, so it sees the seed plus everything earlier rules have written. Unknown names expand to an empty string and log a warning. Only the ${name} form is interpolated — bare $name is left alone, so shell expressions written into init: are passed through unchanged.
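The expansion rule can be approximated with a regular expression that matches only the braced form. This is a sketch under assumptions: the function name `expand` is hypothetical, and the identifier pattern (letters, digits, underscore, not starting with a digit) is a guess at what counts as a name.

```python
import logging
import re

log = logging.getLogger("env_sketch")  # hypothetical logger name

# Matches ${name} only; bare $name has no braces and is never matched.
_REF = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand(text, env):
    """Replace each ${name} with its value from the env resolved so far."""
    def replace(match):
        name = match.group(1)
        if name not in env:
            log.warning("unknown variable %s expands to an empty string", name)
            return ""
        return env[name]

    return _REF.sub(replace, text)


print(expand("/scratch/${USER}/$HOME/${MISSING}", {"USER": "alice"}))
# /scratch/alice/$HOME/
```

In the example, `${USER}` is replaced, `$HOME` passes through for the shell to handle, and the unknown `${MISSING}` becomes an empty string after a warning.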

Reusable groups

Define a rule list once under env_groups: and inline it from any env: rule with an include: entry:

env_groups:
  monitoring:
    - set:
        OTEL_EXPORTER: otlp
        OTEL_ENDPOINT: "http://collector:4318"

backends:
  - name: mercury
    env_groups:
      gpu-stack:                      # mercury's flavor of "gpu-stack"
        - init: "module load gcc/12 cuda/11"
        - append: { PATH: /opt/cuda/bin }
    env:
      - include: [gpu-stack, monitoring]

env_groups: is accepted on each backend, at the top level of scripthut.yaml (the server layer), on each workflow in scripthut.yaml, and at the top level of the workflow JSON document alongside tasks: (see Task JSON → Environment Variables). A group defined at layer X is visible to that layer's own env: and to all later layers, and a name defined at a later layer shadows the same name from an earlier one. Each backend can therefore declare its own gpu-stack group, and any workflow's include: [gpu-stack] resolves to the group declared by the backend the task runs on.

Includes can be guarded — an if: clause on the include-rule is inherited by every inlined rule (ANDed with any guard the inlined rule already carries):

env:
  - if:
      SCRIPTHUT_BACKEND: mercury
    include: [gpu-stack]              # only fires on mercury

Cycles between groups are detected and raise a clear error. Unknown group names log a warning and are skipped.
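Include resolution, guard inheritance, and cycle detection can be sketched as a recursive flattening pass. Everything here is illustrative: the function name `flatten`, the `guards` list used to carry inherited guards, and the choice to emit a rule's own `set`/`append`/`init` before its includes are assumptions, not documented ScriptHut behavior.

```python
import logging

log = logging.getLogger("env_sketch")  # hypothetical logger name


def flatten(rules, groups, guards=(), stack=()):
    """Inline include: references and return a flat list of rules.

    Each output rule carries a "guards" list; all of them must match (AND).
    """
    out = []
    for rule in rules:
        own = guards + ((rule["if"],) if "if" in rule else ())
        body = {k: v for k, v in rule.items() if k not in ("if", "include")}
        if body:
            out.append({"guards": list(own), **body})
        for name in rule.get("include", []):
            if name in stack:
                raise ValueError("env group cycle: " + " -> ".join(stack + (name,)))
            if name not in groups:
                log.warning("unknown env group %r skipped", name)
                continue
            # Inlined rules inherit the guard of the rule that included them.
            out.extend(flatten(groups[name], groups, own, stack + (name,)))
    return out


backend_groups = {"gpu-stack": [{"init": "module load gcc/12 cuda/11"}]}
workflow_groups = {"gpu-stack": [{"init": "module load cuda/12"}]}
visible = {**backend_groups, **workflow_groups}  # later layer shadows the earlier name
rules = [{"if": {"SCRIPTHUT_BACKEND": "mercury"}, "include": ["gpu-stack"]}]
flat = flatten(rules, visible)
print(flat[0]["init"])    # module load cuda/12
print(flat[0]["guards"])  # [{'SCRIPTHUT_BACKEND': 'mercury'}]
```

Keeping inherited guards as a list, instead of merging them into one mapping, means an outer and an inner guard on the same key are both enforced.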

Inspecting the resolved env

The task detail page in the UI has an Env tab that shows the resolved env for the selected task, with per-key provenance — every set / append operation lists its source layer (and which group, if any) and contribution. The same information is available as JSON at:

GET /runs/{run_id}/tasks/{task_id}/env

Use this when a value isn't what you expect: the provenance pinpoints which layer wrote it.
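The endpoint can be queried from a script using only the Python standard library. This sketch makes several assumptions: the base URL `http://localhost:8000`, the run and task identifiers, and the absence of authentication are placeholders, and the structure of the returned JSON is not documented on this page, so the result is returned without interpretation.

```python
import json
from urllib.request import urlopen


def env_url(base_url, run_id, task_id):
    """Build the URL of the resolved-env endpoint for one task."""
    return f"{base_url.rstrip('/')}/runs/{run_id}/tasks/{task_id}/env"


def fetch_env(base_url, run_id, task_id):
    """GET the resolved env with provenance and decode the JSON body."""
    with urlopen(env_url(base_url, run_id, task_id)) as resp:
        return json.load(resp)


# Placeholder values; substitute your server address and real identifiers.
print(env_url("http://localhost:8000", "ab12cd34", "train-0"))
# http://localhost:8000/runs/ab12cd34/tasks/train-0/env
```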

Worked example: cluster-specific module loads

The canonical case — "Mercury needs module load gcc/12 cuda/11, Anvil needs module load gcc cuda-toolkit, the laptop needs nothing":

backends:
  - name: mercury
    type: slurm
    ssh: { host: mercury.example.edu, user: alice }
    env:
      - set: { SCRATCH: "/scratch/${USER}" }
      - init: "source /etc/profile.d/modules.sh"

  - name: anvil
    type: pbs
    ssh: { host: anvil.example.edu, user: alice }
    env:
      - set: { SCRATCH: "/tmp/work/${USER}" }

workflows:
  - name: train
    backend: mercury                  # this workflow targets mercury
    command: "python generate_tasks.py"
    env:
      - if: { SCRIPTHUT_BACKEND: mercury }
        init: "module load gcc/12 cuda/11"
      - if: { SCRIPTHUT_BACKEND: anvil }
        init: "module load gcc cuda-toolkit"

The same env: rules work unchanged on either backend: a second workflow such as train-anvil that points to anvil can carry the identical rule list and gets the right module loads automatically, with no per-backend variants of the rules.
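For the train workflow on mercury, the rules above contribute two init lines (the backend's `source` line, then the workflow's `module load` line) and one `SCRATCH` assignment. The sketch below shows the ordering described for `init` in the field table: init text first, then exports, then `cd` and the command. The function name `render_script`, the working directory, the task command, and the assumption that `${USER}` resolves to `alice` are illustrative; the exact script text ScriptHut generates is not shown on this page.

```python
import shlex


def render_script(init_lines, env, working_dir, command):
    """Assemble a task script: init block, exports, cd, then the command."""
    lines = list(init_lines)  # extra_init block runs first
    lines += [f"export {key}={shlex.quote(value)}" for key, value in env.items()]
    lines.append(f"cd {shlex.quote(working_dir)}")
    lines.append(command)
    return "\n".join(lines)


script = render_script(
    init_lines=[
        "source /etc/profile.d/modules.sh",  # backend layer
        "module load gcc/12 cuda/11",        # workflow layer, guard matched mercury
    ],
    env={"SCRATCH": "/scratch/alice"},       # assumes ${USER} resolved to alice
    working_dir="/home/alice/train",         # hypothetical working directory
    command="python train.py",               # hypothetical task command
)
print(script)
```

Because the exports come after the init block, a `SCRATCH` value written by a rule would win over any value a module or sourced file happened to set.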