Slurm
SlurmSchema
Bases: BaseModel, YamlIO, Render
Context manager for Slurm job submission scripts.
This class provides a structured way to define and manage the configuration for submitting jobs to a Slurm workload manager. Each attribute corresponds to a specific Slurm configuration parameter or job setup step.
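For orientation, here is a minimal sketch of describing a job with the fields documented below. The import path follows the schemas.workflow.slurm reference used elsewhere on this page and may differ in your installation; the values themselves are illustrative.

from schemas.workflow.slurm import SlurmSchema  # path assumed from this page's cross-references

# A small single-node job; every keyword maps to a field documented below.
job = SlurmSchema(
    job_name="data_analysis_job",
    account="research_project_123",
    cluster="smp",
    partition="smp",
    nodes=1,
    ntasks_per_node=16,
    time="0-12:00:00",
    modules=["python/3.8"],
    env_vars={"OMP_NUM_THREADS": "16"},
    commands_run=["python my_script.py"],
)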
account = None
Charge resources used by this job to the specified account. The account is an arbitrary string.
Set this to your project's account name to properly attribute resource usage, or leave it as None if not needed.
Example
"research_project_123"
cluster = 'smp'
Cluster name where the job will run.
Ensure this matches the available cluster names in your Slurm environment. This helps direct the job to the appropriate set of resources.
Example
"hpc_cluster"
commands_post = []
List of commands to run after the main job command.
Use this for cleanup tasks or additional processing. These commands will be executed after the main job tasks are completed.
Example
["cp /scratch/my_job/output.dat .", "rm -rf /scratch/my_job"]
commands_pre = []
List of commands to run before the main job command.
Useful for setup tasks like copying files, creating directories, or loading additional software. These commands will be executed before the main job starts.
Example
["mkdir -p /scratch/my_job", "cp input.dat /scratch/my_job/"]
commands_run = []
List of main commands to run for the job.
This should include the primary executable or script for the job. These are the main tasks that the job will perform.
Example
["python my_script.py", "./run_simulation.sh"]
constraint = None
Nodes can have features assigned to them by the Slurm administrator. Users can specify which of these features are required by their job using the constraint option.
cores_per_socket = None
Restrict node selection to nodes with at least the specified number of cores per socket.
cpus_per_gpu = None
Request that ncpus processors be allocated per allocated GPU. Steps inheriting this value will imply --exact. Not compatible with the cpus_per_task option.
cpus_per_task = None
Advise the Slurm controller that ensuing job steps will require ncpus processors per task. Without this option, the controller will just try to allocate one processor per task.
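For a threaded or hybrid job, cpus_per_task is usually set alongside ntasks and a matching thread count in env_vars; a sketch using only fields documented on this page (sizes are illustrative):

from schemas.workflow.slurm import SlurmSchema  # path assumed

job = SlurmSchema(
    ntasks=4,                           # job steps launch at most four tasks
    cpus_per_task=8,                    # reserve eight processors for each task
    env_vars={"OMP_NUM_THREADS": "8"},  # keep the thread count in step with cpus_per_task
)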
env_vars = {}
Dictionary of environment variables to set before running the job.
Use this to configure the job's environment, setting any necessary environment variables.
Example
{"OMP_NUM_THREADS": "16", "MY_VARIABLE": "value"}
error = 'slurm-%j.err'
Path for the job's error output file.
As with output, using %j in the filename ensures that errors for each job are logged separately.
Example
"logs/job_errors_%j.err"
gpus = None
Specify the total number of GPUs required for the job.
gpus_per_node = None
Specify the number of GPUs required for the job on each node included in the job's resource allocation.
gpus_per_socket = None
Specify the number of GPUs required for the job on each socket included in the job's resource allocation.
gpus_per_task = None
Specify the number of GPUs required for the job on each task to be spawned in the job's resource allocation.
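The GPU options compose with the node count; for example, a two-node allocation with four GPUs per node and a CPU budget tied to each GPU might be described as below (a sketch with arbitrary sizes):

from schemas.workflow.slurm import SlurmSchema  # path assumed

job = SlurmSchema(
    nodes=2,
    gpus_per_node=4,  # eight GPUs across the whole allocation
    cpus_per_gpu=8,   # CPUs allocated per GPU; do not combine with cpus_per_task
)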
gres = None
Specifies a comma-delimited list of generic consumable resources. The format for each entry in the list is "name[[:type]:count]". The name is the type of consumable resource (e.g. gpu). The type is an optional classification for the resource (e.g. a100). The count is the number of those resources with a default value of 1.
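A gres entry therefore names the resource, an optional type, and a count; a sketch assuming the field takes the raw sbatch-style string (the a100 type is illustrative):

from schemas.workflow.slurm import SlurmSchema  # path assumed

# "gpu" is the resource name, "a100" the optional type, 2 the count.
job = SlurmSchema(gres="gpu:a100:2")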
job_name = 'job'
Specify a name for the job allocation. The specified name will appear along with the job id number when querying running jobs on the system. The default is the name of the batch script, or just "sbatch" if the script is read on sbatch's standard input.
Example
"data_analysis_job"
mem = None
Specify the real memory required per node. Default units are megabytes. Different units can be specified using the suffix [K|M|G|T].
mem_per_cpu = None
Minimum memory required per usable allocated CPU. Default units are megabytes. The default value is DefMemPerCPU and the maximum value is MaxMemPerCPU.
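Memory can be budgeted per node or per CPU, with an explicit unit suffix; a sketch assuming the fields accept the same string forms sbatch does (values are illustrative):

from schemas.workflow.slurm import SlurmSchema  # path assumed

per_node = SlurmSchema(mem="64G")                   # 64 gigabytes on each node
per_cpu = SlurmSchema(ntasks=16, mem_per_cpu="4G")  # 16 CPUs x 4 GB = 64 GB in total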
modules = []
List of modules to load before running the job.
Include all necessary software modules that your job requires. This ensures the environment is correctly set up before execution.
Example
["python/3.8", "gcc/9.2"]
nodes = 1
The minimum number of nodes to use for the Slurm job.
Adjust this based on the job's resource requirements. For instance, a large parallel job might need several nodes, while a smaller job might only need one.
Example
4
ntasks = None
sbatch does not launch tasks; it requests an allocation of resources and submits a batch script. This option advises the Slurm controller that job steps run within the allocation will launch at most ntasks tasks, and to provide sufficient resources for them.
Example
16
ntasks_per_node = None
Request that ntasks be invoked on each node.
This typically corresponds to the number of CPU cores to use on each node. Adjust this based on the node's capabilities and the parallelism of your job.
Example
16
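Note that nodes and ntasks_per_node multiply out to the total task count; for example:

from schemas.workflow.slurm import SlurmSchema  # path assumed

job = SlurmSchema(nodes=4, ntasks_per_node=16)  # 4 nodes x 16 tasks each = 64 tasks in total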
output = 'slurm-%j.out'
Path for the job's standard output file.
Use %j to include the job ID in the filename, ensuring that each job's output is saved to a unique file.
Example
"logs/job_output_%j.out"
partition = 'smp'
Partition name to submit the job to.
Choose an appropriate partition based on resource needs and availability. Partitions can have different resource limits and policies.
Example
"short"
time = '1-00:00:00'
Maximum time for the job.
Specified in the format D-HH:MM:SS. Adjust this based on the expected runtime of your job.
Example
"0-12:00:00" for a 12-hour job.
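If you would rather not hand-write the D-HH:MM:SS string, it can be produced from a timedelta; a small helper sketch (to_slurm_time is hypothetical, not part of SlurmSchema):

from datetime import timedelta

def to_slurm_time(limit: timedelta) -> str:
    # Format a timedelta as the D-HH:MM:SS string Slurm expects.
    total = int(limit.total_seconds())
    days, rem = divmod(total, 86400)
    hours, rem = divmod(rem, 3600)
    minutes, seconds = divmod(rem, 60)
    return f"{days}-{hours:02d}:{minutes:02d}:{seconds:02d}"

# to_slurm_time(timedelta(hours=12)) -> "0-12:00:00"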