Slurm
SlurmSchema
Bases: BaseModel, YamlIO, Render
Context manager for Slurm job submission scripts.
This class provides a structured way to define and manage the configuration for submitting jobs to a Slurm workload manager. Each attribute corresponds to a specific Slurm configuration parameter or job setup step.
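For orientation, here is a minimal sketch of describing a job with the fields documented below. The import path follows the schemas.workflow.slurm reference used elsewhere on this page and may differ in your installation; the values themselves are illustrative.

from schemas.workflow.slurm import SlurmSchema  # path assumed from this page's cross-references

# A small single-node job; every keyword maps to a field documented below.
job = SlurmSchema(
    job_name="data_analysis_job",
    account="research_project_123",
    cluster="smp",
    partition="smp",
    nodes=1,
    ntasks_per_node=16,
    time="0-12:00:00",
    modules=["python/3.8"],
    env_vars={"OMP_NUM_THREADS": "16"},
    commands_run=["python my_script.py"],
)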
account = None
Charge resources used by this job to the specified account. The account is an arbitrary string.
Set this to your project's account name to properly attribute resource usage, or leave it as None if not needed.
Example
"research_project_123"
cluster = 'smp'
Cluster name where the job will run.
Ensure this matches the available cluster names in your Slurm environment. This helps direct the job to the appropriate set of resources.
Example
"hpc_cluster"
commands_post = []
List of commands to run after the main job command.
Use this for cleanup tasks or additional processing. These commands will be executed after the main job tasks are completed.
Example
["cp /scratch/my_job/output.dat .", "rm -rf /scratch/my_job"]
commands_pre = []
List of commands to run before the main job command.
Useful for setup tasks like copying files, creating directories, or loading additional software. These commands will be executed before the main job starts.
Example
["mkdir -p /scratch/my_job", "cp input.dat /scratch/my_job/"]
commands_run = []
List of main commands to run for the job.
This should include the primary executable or script for the job. These are the main tasks that the job will perform.
Example
["python my_script.py", "./run_simulation.sh"]
constraint = None
Nodes can have features assigned to them by the Slurm administrator. Users can specify which of these features are required by their job using the constraint option.
cores_per_socket = None
Restrict node selection to nodes with at least the specified number of cores per socket.
cpus_per_gpu = None
Request that ncpus processors be allocated per allocated GPU. Steps inheriting this value will imply --exact. Not compatible with the cpus_per_task option.
cpus_per_task = None
Advise the Slurm controller that ensuing job steps will require ncpus processors per task. Without this option, the controller will just try to allocate one processor per task.
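For a threaded or hybrid job, cpus_per_task is usually set alongside ntasks and a matching thread count in env_vars; a sketch using only fields documented on this page (sizes are illustrative):

from schemas.workflow.slurm import SlurmSchema  # path assumed

job = SlurmSchema(
    ntasks=4,                           # job steps launch at most four tasks
    cpus_per_task=8,                    # reserve eight processors for each task
    env_vars={"OMP_NUM_THREADS": "8"},  # keep the thread count in step with cpus_per_task
)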
env_vars = {}
Dictionary of environment variables to set before running the job.
Use this to configure the job's environment, setting any necessary environment variables.
Example
{"OMP_NUM_THREADS": "16", "MY_VARIABLE": "value"}
error = 'slurm-%j.err'
Path for the job's error output file.
As with output, using %j in the filename ensures that errors for each job are logged separately.
Example
"logs/job_errors_%j.err"
gpus = None
Specify the total number of GPUs required for the job.
gpus_per_node = None
Specify the number of GPUs required for the job on each node included in the job's resource allocation.
gpus_per_socket = None
Specify the number of GPUs required for the job on each socket included in the job's resource allocation.
gpus_per_task = None
Specify the number of GPUs required for the job on each task to be spawned in the job's resource allocation.
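The GPU options compose with the node count; for example, a two-node allocation with four GPUs per node and a CPU budget tied to each GPU might be described as below (a sketch with arbitrary sizes):

from schemas.workflow.slurm import SlurmSchema  # path assumed

job = SlurmSchema(
    nodes=2,
    gpus_per_node=4,  # eight GPUs across the whole allocation
    cpus_per_gpu=8,   # CPUs allocated per GPU; do not combine with cpus_per_task
)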
gres = None
Specifies a comma-delimited list of generic consumable resources. The format for each entry in the list is "name[[:type]:count]". The name is the type of consumable resource (e.g. gpu). The type is an optional classification for the resource (e.g. a100). The count is the number of those resources with a default value of 1.
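A gres entry therefore names the resource, an optional type, and a count; a sketch assuming the field takes the raw sbatch-style string (the a100 type is illustrative):

from schemas.workflow.slurm import SlurmSchema  # path assumed

# "gpu" is the resource name, "a100" the optional type, 2 the count.
job = SlurmSchema(gres="gpu:a100:2")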
job_name = 'job'
Specify a name for the job allocation. The specified name will appear along with the job id number when querying running jobs on the system. The default is the name of the batch script, or just "sbatch" if the script is read on sbatch's standard input.
Example
"data_analysis_job"
mem = None
Specify the real memory required per node. Default units are megabytes. Different units can be specified using the suffix [K|M|G|T].
mem_per_cpu = None
Minimum memory required per usable allocated CPU. Default units are megabytes. The default value is DefMemPerCPU and the maximum value is MaxMemPerCPU.
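Memory can be budgeted per node or per CPU, with an explicit unit suffix; a sketch assuming the fields accept the same string forms sbatch does (values are illustrative):

from schemas.workflow.slurm import SlurmSchema  # path assumed

per_node = SlurmSchema(mem="64G")                   # 64 gigabytes on each node
per_cpu = SlurmSchema(ntasks=16, mem_per_cpu="4G")  # 16 CPUs x 4 GB = 64 GB in total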
modules = []
List of modules to load before running the job.
Include all necessary software modules that your job requires. This ensures the environment is correctly set up before execution.
Example
["python/3.8", "gcc/9.2"]
nodes = 1
The minimum number of nodes to use for the Slurm job.
Adjust this based on the job's resource requirements. For instance, a large parallel job might need several nodes, while a smaller job might only need one.
Example
4
ntasks = None
sbatch does not launch tasks; it requests an allocation of resources and submits a batch script. This option advises the Slurm controller that job steps run within the allocation will launch at most ntasks tasks, and to provide sufficient resources for them.
Example
16
ntasks_per_node = None
Request that ntasks be invoked on each node.
This typically corresponds to the number of CPU cores to use on each node. Adjust this based on the node's capabilities and the parallelism of your job.
Example
16
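Note that nodes and ntasks_per_node multiply out to the total task count; for example:

from schemas.workflow.slurm import SlurmSchema  # path assumed

job = SlurmSchema(nodes=4, ntasks_per_node=16)  # 4 nodes x 16 tasks each = 64 tasks in total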
output = 'slurm-%j.out'
Path for the job's standard output file.
Use %j to include the job ID in the filename, ensuring that each job's output is saved to a unique file.
Example
"logs/job_output_%j.out"
partition = 'smp'
Partition name to submit the job to.
Choose an appropriate partition based on resource needs and availability. Partitions can have different resource limits and policies.
Example
"short"
time = '1-00:00:00'
Maximum time for the job.
Specified in the format D-HH:MM:SS. Adjust this based on the expected runtime of your job.
Example
"0-12:00:00" for a 12-hour job.
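If you would rather not hand-write the D-HH:MM:SS string, it can be produced from a timedelta; a small helper sketch (to_slurm_time is hypothetical, not part of SlurmSchema):

from datetime import timedelta

def to_slurm_time(limit: timedelta) -> str:
    # Format a timedelta as the D-HH:MM:SS string Slurm expects.
    total = int(limit.total_seconds())
    days, rem = divmod(total, 86400)
    hours, rem = divmod(rem, 3600)
    minutes, seconds = divmod(rem, 60)
    return f"{days}-{hours:02d}:{minutes:02d}:{seconds:02d}"

# to_slurm_time(timedelta(hours=12)) -> "0-12:00:00"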