
Workflows and Sources

These two sections of scripthut.yaml configure task generators: mechanisms that tell ScriptHut how to obtain a list of tasks to run.

  • Workflows: a fixed command run over SSH (optionally inside a cloned git repo) that prints task JSON
  • Sources: a git repo or backend filesystem path containing one or more workflow JSON files, discovered via a glob pattern

The legacy projects: section was removed in scripthut 0.6.0. Convert each project entry to an equivalent sources: entry, using type: path for a directory on a backend or type: git for a remote repository.

See Task JSON Format for the JSON shape every generator must emit.


Workflows

Workflows are the primary mechanism for submitting batch jobs. A workflow defines a command that ScriptHut runs over SSH on a backend; the command must print a JSON list of tasks to stdout.
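
The contract can be sketched from the consumer side, assuming only what is stated here: whatever the command prints to stdout must parse as a JSON list. The helper name run_generator is hypothetical, and it runs the command locally with subprocess rather than over SSH so the sketch can be tried anywhere. The task objects in the demo are placeholders; the real fields are defined in Task JSON Format.

```python
import json
import subprocess
import sys


def run_generator(argv: list[str]) -> list:
    """Run a task-generator command and parse its stdout as a JSON list.

    Hypothetical helper: ScriptHut runs the command over SSH on a backend,
    whereas this sketch runs it locally with subprocess for illustration.
    """
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    tasks = json.loads(result.stdout)  # any extra text on stdout breaks parsing
    if not isinstance(tasks, list):
        raise ValueError(f"expected a JSON list, got {type(tasks).__name__}")
    return tasks


if __name__ == "__main__":
    # Stand-in generator printing placeholder task objects as one JSON list.
    generator = 'import json; print(json.dumps([{"id": 1}, {"id": 2}]))'
    print(run_generator([sys.executable, "-c", generator]))
```

In this sketch any log line printed to stdout makes json.loads fail, so a generator script is safest writing diagnostics to stderr and reserving stdout for the JSON list.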

Basic Workflow

```yaml
workflows:
  - name: ml-training
    backend: hpc-cluster
    command: "python /shared/scripts/get_training_tasks.py"
    max_concurrent: 5
    description: "ML model training pipeline"
```

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| name | string | required | Unique identifier for this workflow. Shown in the UI. |
| backend | string | required | Name of a backend defined in the backends section. |
| command | string | required | Shell command executed via SSH that must print JSON to stdout. |
| max_concurrent | integer | null | Maximum number of concurrent tasks per run. If null, only the backend-level limit applies. |
| description | string | "" | Human-readable description shown in the UI. |
| git | object | null | Optional git repository to clone on the backend before running the command. |
| env | list | [] | Workflow-level env rules applied to every task in the workflow. See Environments. |
| env_groups | object | {} | Named, reusable env-rule lists local to this workflow (also visible to its tasks). |
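
To make the required/default split in the table concrete, here is a hypothetical validator for a single workflows: entry. It is not ScriptHut's own config loader; the names normalize_workflow and WORKFLOW_DEFAULTS are illustrative, and the only rules it enforces are the ones the table states (three required fields, a backend that must exist, an integer or null max_concurrent, and the listed defaults).

```python
import copy

# Defaults copied from the workflow field table; illustrative only.
WORKFLOW_DEFAULTS = {
    "max_concurrent": None,  # None means only the backend-level limit applies
    "description": "",
    "git": None,
    "env": [],
    "env_groups": {},
}
WORKFLOW_REQUIRED = ("name", "backend", "command")


def normalize_workflow(entry: dict, known_backends: set[str]) -> dict:
    """Validate one workflows: entry and fill in the documented defaults."""
    missing = [key for key in WORKFLOW_REQUIRED if key not in entry]
    if missing:
        raise ValueError(f"workflow missing required field(s): {', '.join(missing)}")
    if entry["backend"] not in known_backends:
        raise ValueError(f"unknown backend {entry['backend']!r} in workflow {entry['name']!r}")
    limit = entry.get("max_concurrent")
    if limit is not None and (not isinstance(limit, int) or isinstance(limit, bool)):
        raise ValueError("max_concurrent must be an integer or null")
    normalized = copy.deepcopy(WORKFLOW_DEFAULTS)  # fresh copies of the mutable defaults
    normalized.update(entry)
    return normalized
```

In practice the entry dicts would come from parsing scripthut.yaml, for example with PyYAML's yaml.safe_load. The deepcopy keeps one workflow's env list from being shared with another's.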

Git Workflows

Git workflows clone a repository on the remote backend before executing the command. The command runs inside the cloned directory. This is useful when your task generator script lives in a repository.

```yaml
workflows:
  - name: ml-training-git
    backend: hpc-cluster
    git:
      repo: git@github.com:your-org/ml-pipelines.git
      branch: main
      deploy_key: ~/.ssh/ml-deploy-key
      clone_dir: ~/scripthut-repos
      postclone: "rm -rf large_files"
    command: "python get_tasks.py"
    max_concurrent: 5
    description: "ML training from git repo"
```

Git Config Fields:

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| repo | string | required | Git repository URL. SSH format recommended. |
| branch | string | "main" | Branch to clone. |
| deploy_key | path | null | Path to deploy key on the local machine. It is uploaded to the backend temporarily during the clone operation. |
| clone_dir | string | "~/scripthut-repos" | Parent directory on the backend. The repo is cloned into <clone_dir>/<commit_hash>/. |
| postclone | string | null | Shell command to run in the clone directory after cloning (e.g., to remove large files or install dependencies). |
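
The clone location rule from the table, <clone_dir>/<commit_hash>/, can be sketched as below. The helper names with_git_defaults and clone_path are hypothetical; the sketch fills in the documented defaults and builds the path with posixpath because the path lives on the (presumably POSIX) backend. It leaves "~" unexpanded on purpose, since it refers to the home directory on the backend, not on the local machine.

```python
import posixpath

# Defaults from the git config table above.
GIT_DEFAULTS = {
    "branch": "main",
    "deploy_key": None,
    "clone_dir": "~/scripthut-repos",
    "postclone": None,
}


def with_git_defaults(git_cfg: dict) -> dict:
    """Fill in documented defaults for a workflow's git: object."""
    if "repo" not in git_cfg:
        raise ValueError("git.repo is required")
    return {**GIT_DEFAULTS, **git_cfg}


def clone_path(git_cfg: dict, commit_hash: str) -> str:
    """Return <clone_dir>/<commit_hash>/ for the given commit.

    "~" is deliberately left unexpanded: it names the home directory on the
    backend, which only the remote shell can resolve.
    """
    cfg = with_git_defaults(git_cfg)
    return posixpath.join(cfg["clone_dir"], commit_hash) + "/"
```

One consequence of keying the directory on the commit hash is that runs at different commits get separate checkouts, while runs at the same commit map to the same path.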

When using a git workflow:

  • The command runs with the clone directory as its working directory.
  • Task working_dir values using ~ or relative paths are resolved relative to the clone directory.
  • Git metadata is injected as environment variables into every task (see Environments → SCRIPTHUT_* runtime seed).
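
The working_dir rule above can be read as the hypothetical helper below. The base directory is the clone directory for git workflows; the same function would apply to path sources with the source path as the base. Two behaviors are assumptions not stated in this page: a missing working_dir falls back to the base directory, and absolute paths are left unchanged.

```python
import posixpath
from typing import Optional


def resolve_working_dir(working_dir: Optional[str], base_dir: str) -> str:
    """Resolve a task's working_dir against a base directory.

    Assumed rules, not confirmed ScriptHut behavior:
      - absolute paths are returned unchanged
      - missing, "~", and "~/sub" are taken relative to base_dir
      - other relative paths are joined onto base_dir
    """
    if working_dir and posixpath.isabs(working_dir):
        return working_dir
    if not working_dir or working_dir == "~":
        relative = ""
    elif working_dir.startswith("~/"):
        relative = working_dir[2:]  # "~" stands for base_dir here
    else:
        relative = working_dir
    return posixpath.normpath(posixpath.join(base_dir, relative))
```

For a clone at ~/scripthut-repos/abc123/, a task with working_dir "~/jobs/train" would run in ~/scripthut-repos/abc123/jobs/train under these rules.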

Sources

Sources are git repositories or backend filesystem paths containing workflow definitions. ScriptHut discovers workflow JSON files using the workflows_glob pattern (default: .hut/workflows/*.json). You can use glob wildcards like **/*.hut.json to match files recursively across any subdirectory. Each matched JSON file appears as a triggerable workflow on the Sources page.

For git sources, the repository is cloned locally for workflow discovery, and also cloned on the backend when a workflow is triggered (tasks run inside the cloned directory, just like git-based workflows).

For path sources, workflows are discovered via SSH on the backend, and tasks run with working_dir resolved relative to the source path.
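
Workflow discovery with workflows_glob can be approximated using Python's pathlib, where "**" matches zero or more directories. This is a sketch against a local directory tree (as for the local clone of a git source); discover_workflows is a hypothetical name, and ScriptHut's own matcher may differ in details such as hidden directories or symlinks.

```python
from pathlib import Path

DEFAULT_WORKFLOWS_GLOB = ".hut/workflows/*.json"


def discover_workflows(root: Path, pattern: str = DEFAULT_WORKFLOWS_GLOB) -> list[str]:
    """List workflow JSON files under root that match a workflows_glob pattern.

    With pathlib, "**/*.hut.json" also matches files directly under root,
    because "**" can match zero directories.
    """
    return sorted(
        p.relative_to(root).as_posix()
        for p in root.glob(pattern)
        if p.is_file()
    )
```

With a tree containing .hut/workflows/train.json, pipelines/a/b.hut.json and top.hut.json, the default pattern finds only the first file, while "**/*.hut.json" finds the other two.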

Git Source

```yaml
sources:
  - name: ml-jobs
    type: git
    url: git@github.com:your-org/ml-pipelines.git
    branch: main
    deploy_key: ~/.ssh/ml-jobs-deploy-key
    backend: hpc-cluster
    # workflows_glob: "**/*.hut.json"  # default: .hut/workflows/*.json
    # clone_dir: ~/scripthut-repos     # default
    # postclone: "rm -rf large_files"  # optional
```

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| name | string | required | Unique identifier for this source. |
| type | string | required | Must be "git". |
| url | string | required | Git repository URL. SSH format recommended. |
| branch | string | "main" | Branch to track. |
| deploy_key | path | null | Path to deploy key for this repository. |
| backend | string | required | Backend to submit discovered workflow tasks to. |
| workflows_glob | string | ".hut/workflows/*.json" | Glob pattern to find workflow JSON files (supports ** for recursive matching). |
| clone_dir | string | "~/scripthut-repos" | Parent directory on the backend. The repo is cloned into <clone_dir>/<commit_hash>/. |
| postclone | string | null | Shell command to run in the clone directory after cloning. |

Path Source

```yaml
sources:
  - name: shared-workflows
    type: path
    path: /shared/project-workflows
    backend: hpc-cluster
    # workflows_glob: "**/*.hut.json"  # default: .hut/workflows/*.json
```

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| name | string | required | Unique identifier for this source. |
| type | string | required | Must be "path". |
| path | string | required | Directory on the backend filesystem. |
| backend | string | required | Backend where this path exists and where tasks are submitted. |
| workflows_glob | string | ".hut/workflows/*.json" | Glob pattern to find workflow JSON files (supports ** for recursive matching). |
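
The two source tables, plus the removal of projects: in 0.6.0, can be folded into one hypothetical check. normalize_source and check_top_level are illustrative names, not part of ScriptHut; they enforce only the per-type required fields and defaults listed above and reject a leftover projects: section.

```python
# Required fields and defaults per source type, taken from the two tables above.
SOURCE_REQUIRED = {
    "git": ("name", "type", "url", "backend"),
    "path": ("name", "type", "path", "backend"),
}
SOURCE_DEFAULTS = {
    "git": {
        "branch": "main",
        "deploy_key": None,
        "workflows_glob": ".hut/workflows/*.json",
        "clone_dir": "~/scripthut-repos",
        "postclone": None,
    },
    "path": {"workflows_glob": ".hut/workflows/*.json"},
}


def check_top_level(config: dict) -> None:
    """Reject the legacy projects: section, removed in scripthut 0.6.0."""
    if "projects" in config:
        raise ValueError("projects: was removed in scripthut 0.6.0; convert entries to sources:")


def normalize_source(entry: dict) -> dict:
    """Validate one sources: entry and fill in documented defaults (illustrative only)."""
    source_type = entry.get("type")
    if source_type not in SOURCE_REQUIRED:
        raise ValueError(f"source type must be 'git' or 'path', got {source_type!r}")
    missing = [key for key in SOURCE_REQUIRED[source_type] if key not in entry]
    if missing:
        raise ValueError(f"{source_type} source missing field(s): {', '.join(missing)}")
    return {**SOURCE_DEFAULTS[source_type], **entry}
```

Keeping defaults per type means a path source never picks up git-only keys such as clone_dir or branch.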