cluster
An element in [[cluster]] is a table that defines the configuration of a single
cluster.
For example:
[[cluster]]
name = "cluster1"
identify.by_environment = ["CLUSTER_NAME", "cluster1"]
scheduler = "slurm"
[[cluster.partition]]
name = "shared"
maximum_cpus_per_job = 127
maximum_gpus_per_job = 0
[[cluster.partition]]
name = "gpu-shared"
minimum_gpus_per_job = 1
[[cluster.partition]]
name = "compute"
require_cpus_multiple_of = 128
maximum_gpus_per_job = 0
[[cluster.partition]]
name = "debug"
maximum_gpus_per_job = 0
prevent_auto_select = true
name
cluster.name: string - The name of the cluster.
identify
cluster.identify: table - Set a condition to identify when row is executing
on this cluster. The table must have one of the following keys:
by_environment: array of two strings - Identify the cluster when the environment variable by_environment[0] is set and equal to by_environment[1].
always: bool - Set to true to always identify this cluster. When false, this cluster may only be chosen by an explicit --cluster option.
caution
The first cluster in the list that sets identify.always = true will prevent
any later cluster from being identified (except by explicit --cluster=name).
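As a sketch of how this ordering plays out, the configuration below lists an environment-identified cluster before a catch-all cluster. The cluster name `fallback` is a placeholder chosen for illustration; `cluster1` and `CLUSTER_NAME` are taken from the example at the top of this page.

```toml
# Listed first: identified only when $CLUSTER_NAME is set and equal to "cluster1".
[[cluster]]
name = "cluster1"
identify.by_environment = ["CLUSTER_NAME", "cluster1"]
scheduler = "slurm"

# Listed second: always identified, so it acts as a catch-all.
# Any cluster listed after this one can only be chosen with --cluster=name.
[[cluster]]
name = "fallback"
identify.always = true
scheduler = "bash"
```

Placing the `identify.always = true` entry last keeps it from shadowing the more specific entries.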
scheduler
cluster.scheduler: string - Set the job scheduler to use on this cluster. Must
be one of:
"slurm"
"bash"
slurm_gpus_per_task
cluster.slurm_gpus_per_task: string - Set the sbatch command line option that
selects the number of gpus per task (used only by the slurm scheduler). When omitted,
slurm_gpus_per_task defaults to --gpus-per-task=.
submit_options
cluster.submit_options: array of strings - Scheduler submission options that
are passed to every job on this cluster.
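For illustration, a cluster-wide `submit_options` entry might look like the sketch below. The two sbatch flags shown are examples only; which options are appropriate depends on how the site configures its scheduler.

```toml
[[cluster]]
name = "cluster1"
identify.by_environment = ["CLUSTER_NAME", "cluster1"]
scheduler = "slurm"
# Passed to the scheduler for every job submitted on this cluster.
# The flags below are illustrative, not recommendations.
submit_options = ["--mail-type=FAIL", "--export=NONE"]
```

Note that `submit_options` is a key of the cluster table, so in TOML it must appear before the first `[[cluster.partition]]` header.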
partition
cluster.partition: array of tables - Define the scheduler partitions that
row may select from when submitting jobs. Row will check the partitions in the
order provided and choose the first partition where the job matches all the
provided conditions. All conditions are optional.
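To make the selection order concrete, the sketch below repeats three of the partitions from the example at the top of this page and annotates how a few hypothetical job requests would be matched. The annotations apply only the conditions documented on this page.

```toml
[[cluster]]
name = "cluster1"
identify.by_environment = ["CLUSTER_NAME", "cluster1"]
scheduler = "slurm"

[[cluster.partition]]
name = "shared"
maximum_cpus_per_job = 127
maximum_gpus_per_job = 0
# 4 CPUs, 0 GPUs: matches here (4 <= 127 and 0 <= 0).

[[cluster.partition]]
name = "gpu-shared"
minimum_gpus_per_job = 1
# 8 CPUs, 2 GPUs: "shared" does not match (2 > 0); matches here (2 >= 1).

[[cluster.partition]]
name = "compute"
require_cpus_multiple_of = 128
maximum_gpus_per_job = 0
# 256 CPUs, 0 GPUs: "shared" (256 > 127) and "gpu-shared" (0 < 1) do not
# match; matches here (256 % 128 == 0 and 0 <= 0).
```

Because the first matching partition is chosen, the order of the `[[cluster.partition]]` entries expresses a preference.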
name
cluster.partition.name: string - The name of the partition as it should be passed
to the cluster batch submission command.
maximum_cpus_per_job
cluster.partition.maximum_cpus_per_job: integer - The maximum number of CPUs that
can be used by a single job on this partition:
total_cpus <= maximum_cpus_per_job
require_cpus_multiple_of
cluster.partition.require_cpus_multiple_of: integer - All jobs submitted to this
partition must use an integer multiple of the given number of cpus:
total_cpus % require_cpus_multiple_of == 0
warn_cpus_not_multiple_of
cluster.partition.warn_cpus_not_multiple_of: integer - All jobs submitted to this
partition should use an integer multiple of the given number of cpus:
if total_cpus % warn_cpus_not_multiple_of != 0:
    warn! ...
This is a nonblocking variant of require_cpus_multiple_of that allows for submission
of jobs that underutilize resources.
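The difference between the two keys can be sketched with a pair of partitions. The partition name `compute-flexible` and the 192-CPU job are hypothetical, chosen only to show the contrast.

```toml
[[cluster]]
name = "cluster1"
identify.by_environment = ["CLUSTER_NAME", "cluster1"]
scheduler = "slurm"

# Hard requirement: a 192-CPU job does not match (192 % 128 != 0).
[[cluster.partition]]
name = "compute"
require_cpus_multiple_of = 128

# Soft requirement: a 192-CPU job still matches, and row issues a
# warning because 192 % 128 != 0.
[[cluster.partition]]
name = "compute-flexible"
warn_cpus_not_multiple_of = 128
```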
memory_per_cpu_mb
cluster.partition.memory_per_cpu_mb: integer - CPU jobs submitted to this partition
will pass this option to the scheduler. For example, Slurm schedulers will set
--mem-per-cpu=<memory_per_cpu_mb>M.
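A minimal sketch of the `memory_per_cpu_mb` key in use follows. The value 2000 is an arbitrary illustration, not a recommended setting.

```toml
[[cluster]]
name = "cluster1"
identify.by_environment = ["CLUSTER_NAME", "cluster1"]
scheduler = "slurm"

[[cluster.partition]]
name = "shared"
maximum_cpus_per_job = 127
maximum_gpus_per_job = 0
# With the Slurm scheduler, CPU jobs on this partition are submitted
# with --mem-per-cpu=2000M.
memory_per_cpu_mb = 2000
```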
cpus_per_node
cluster.partition.cpus_per_node: integer - Number of CPUs per node.
When cpus_per_node is not set, row will request n_processes tasks. In this case,
some schedulers are free to spread tasks among any number of nodes (for example, shared
partitions on Slurm schedulers).
When cpus_per_node is set, row will also request the minimal number of nodes
needed to satisfy n_nodes * cpus_per_node >= total_cpus. This may result in longer
queue times, but will lead to more stable performance for users.
tip
Set cpus_per_node only when all nodes in the partition have the same number
of CPUs.
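The node count rule can be illustrated with a short sketch. It assumes `cpus_per_node` takes an integer count, as the inequality above implies; the partition name `standard` and the job sizes are hypothetical.

```toml
[[cluster]]
name = "cluster1"
identify.by_environment = ["CLUSTER_NAME", "cluster1"]
scheduler = "slurm"

[[cluster.partition]]
name = "standard"
maximum_gpus_per_job = 0
cpus_per_node = 128
# 128 CPUs: 1 node  (1 * 128 >= 128)
# 256 CPUs: 2 nodes (2 * 128 >= 256)
# 300 CPUs: 3 nodes (2 * 128 = 256 < 300, 3 * 128 = 384 >= 300)
```

In the last case, 84 of the requested cores go unused, which is one reason to pair `cpus_per_node` with `require_cpus_multiple_of` or `warn_cpus_not_multiple_of`.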
minimum_gpus_per_job
cluster.partition.minimum_gpus_per_job: integer - The minimum number of gpus that
must be used by a single job on this partition:
total_gpus >= minimum_gpus_per_job
maximum_gpus_per_job
cluster.partition.maximum_gpus_per_job: integer - The maximum number of gpus that
can be used by a single job on this partition:
total_gpus <= maximum_gpus_per_job
require_gpus_multiple_of
cluster.partition.require_gpus_multiple_of: integer - All jobs submitted to this
partition must use an integer multiple of the given number of gpus:
total_gpus % require_gpus_multiple_of == 0
warn_gpus_not_multiple_of
cluster.partition.warn_gpus_not_multiple_of: integer - All jobs submitted to this
partition should use an integer multiple of the given number of gpus:
if total_gpus % warn_gpus_not_multiple_of != 0:
    warn! ...
This is a nonblocking variant of require_gpus_multiple_of that allows for submission
of jobs that underutilize resources.
memory_per_gpu_mb
cluster.partition.memory_per_gpu_mb: integer - GPU jobs submitted to this partition
will pass this option to the scheduler. For example, Slurm schedulers will set
--mem-per-gpu=<memory_per_gpu_mb>M.
gpus_per_node
cluster.partition.gpus_per_node: integer - Number of GPUs per node. Like
cpus_per_node but used when jobs request GPUs.
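A GPU partition combining several of the keys above might be sketched as follows. The partition name `gpu` and all numeric values are hypothetical, and the node counts in the comments assume that `gpus_per_node` follows the same rule as `cpus_per_node` (n_nodes * gpus_per_node >= total_gpus).

```toml
[[cluster]]
name = "cluster1"
identify.by_environment = ["CLUSTER_NAME", "cluster1"]
scheduler = "slurm"

[[cluster.partition]]
name = "gpu"
minimum_gpus_per_job = 1
# Warn, but still submit, when a job would leave part of a node idle.
warn_gpus_not_multiple_of = 4
# With the Slurm scheduler, GPU jobs are submitted with --mem-per-gpu=40000M.
memory_per_gpu_mb = 40000
gpus_per_node = 4
# 4 GPUs: 1 node  (1 * 4 >= 4)
# 6 GPUs: 2 nodes (1 * 4 = 4 < 6, 2 * 4 = 8 >= 6), with a warning (6 % 4 != 0)
```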
prevent_auto_select
cluster.partition.prevent_auto_select: boolean - Set to true to prevent row from
automatically selecting this partition.
account_suffix
cluster.partition.account_suffix: string - A suffix to add to the account name when
submitting jobs to this partition. Useful when clusters define separate account-cpu
and account-gpu accounts.
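For a cluster with separate CPU and GPU accounts, the configuration might be sketched as below. It assumes the suffix is appended verbatim to the account name, so the leading hyphen is part of the value.

```toml
[[cluster]]
name = "cluster1"
identify.by_environment = ["CLUSTER_NAME", "cluster1"]
scheduler = "slurm"

# Jobs matched here are charged to "<account>-cpu".
[[cluster.partition]]
name = "shared"
maximum_gpus_per_job = 0
account_suffix = "-cpu"

# Jobs matched here are charged to "<account>-gpu".
[[cluster.partition]]
name = "gpu-shared"
minimum_gpus_per_job = 1
account_suffix = "-gpu"
```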
Development of row is led by the Glotzer Group at the University of Michigan.
Copyright © 2024-2025 The Regents of the University of Michigan.