17. Running Horace in Parallel
17.1. Controlling MPI
Certain operations in Horace have parallelised variants which can be used to speed up large calculations.
Warning
Due to the nature of parallelism, some parallel operations require extra memory which may be up to twice the size of the sqw objects you're operating on.
Warning
Be aware that for small jobs or some combinations of parameters, parallel calculation may in fact be slower than serial execution, due to startup times and message-passing overheads. In future we hope to bring these overheads down and efficiencies up.
17.1.1. Enabling Locally
To run a particular function in parallel, Horace provides the function parallel_call, which runs a single task in parallel. Currently, parallel_call is available for:
fit
cut
func_eval
sqw_eval
The syntax for using parallel_call is as follows:
result = parallel_call(@func, {arg1, arg2, ...})
% Equivalent to
result = func(arg1, arg2, ...)
% E.g.
proj = line_proj([1, 0, 0], [0, 1, 0]);
w_out = parallel_call(@cut, {w_in, proj, [], [1 0.2 4], [0 1], [10 20]});
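The same calling pattern applies to the other supported functions. As a hedged sketch, a parallel sqw_eval might look like the following, where my_dispersion and pars are placeholders for your own model function and its parameters:
% Placeholder model: my_dispersion(qh, qk, ql, en, pars) is assumed to exist on
% the Matlab path; the equivalent serial call is sqw_eval(w_in, @my_dispersion, pars)
w_out = parallel_call(@sqw_eval, {w_in, @my_dispersion, pars});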
In order to use parallel_call you must ensure MPI has been configured correctly (see MPI Schemes below).
17.1.2. Enabling Globally
Parallel operations can be enabled globally using:
hpc('on')
Alternatively, different parallel components can be enabled/disabled separately through hpc_config (see: changing Horace settings):
hpc_config with properties:

    build_sqw_in_parallel: 0
    parallel_workers_number: 2
    combine_sqw_using: 'matlab'
    combine_sqw_options: {'matlab' 'mex_code' 'mpi_code'}
    mex_combine_thread_mode: 0
    mex_combine_buffer_size: 131072
    parallel_multifit: 0
    parallel_cluster: 'herbert'
    parallel_configuration: [1x1 parallel_config]
    hpc_options: {1x6 cell}
    class_name: 'hpc_config'
    saveable: 1
    returns_defaults: 0
    config_folder: '/home/jacob/.matlab/mprogs_config'
In particular, the parallel enabling options are (see the example below):
parallel_multifit : Enable parallel fitting for multifit and tobyfit
build_sqw_in_parallel : Enable building sqw objects in parallel, i.e. gen_sqw and combine_sqw
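As a minimal sketch, assuming the usual Horace pattern of assigning directly to configuration properties, these flags could be enabled programmatically as follows (the chosen values are illustrative only):
hc = hpc_config();
hc.parallel_multifit = true;        % parallel multifit / tobyfit
hc.build_sqw_in_parallel = true;    % parallel gen_sqw / combine_sqw
hc.parallel_workers_number = 4;     % number of workers to launch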
The parallel_config object contains most of the information needed to manage parallelism, though some is stored in hpc_config (described above):
parallel_config with properties:

    worker: 'worker_v4'
    is_compiled: 0
    parallel_cluster: 'herbert'
    cluster_config: 'local'
    known_clusters: {'herbert' 'parpool' 'mpiexec_mpi' 'slurm_mpi' 'dummy'}
    known_clust_configs: {'local'}
    shared_folder_on_local: ''
    shared_folder_on_remote: ''
    working_directory: '/tmp/'
    wkdir_is_default: 1
    external_mpiexec: ''
    class_name: 'parallel_config'
    saveable: 1
    returns_defaults: 0
    config_folder: '/home/jacob/.matlab/mprogs_config'
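As a brief sketch, the available schemes and cluster configurations can be inspected from a parallel_config instance before choosing one:
pc = parallel_config();
disp(pc.known_clusters)       % the parallel schemes Horace knows about
disp(pc.known_clust_configs)  % cluster configurations valid for the current scheme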
17.2. MPI Schemes
Horace can be run in parallel with a number of different schemes, all controlled through the parallel_config.
The five currently implemented parallel schemes are:
1. herbert (Poor-man's MPI) - Data messages are sent through files written to the hard drive and read by each process. This is the slowest MPI scheme, but also the one with the fewest requirements.
2. parpool (Matlab Parallel Toolbox MPI) - Messages are sent using Matlab's Parallel Computing Toolbox, which is therefore required.
3. mpiexec_mpi (C++ MPI) - Data messages are sent using C++ code wrapping OpenMPI. This requires the MEX files to be built.
4. slurm_mpi (Slurm MPI) - Data messages are sent using C++ code wrapping OpenMPI, but the job itself is submitted to a running Slurm instance by Horace when the job starts. This requires the MEX files to be built.
5. dummy (Dummy MPI) - Not real MPI, but a dummy system for debugging and testing MPI algorithms on one process in serial.
17.3. Managing parallel jobs
Running jobs in parallel is then simply a matter of selecting the appropriate MPI scheme, setting a suitable parallel_workers_number, and enabling the relevant flags through hpc_config and parallel_config.
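Putting this together, a minimal sketch of setting up and running a parallel job might look like the following; the scheme, worker count and cut arguments are illustrative only:
% Select an MPI scheme and the number of workers
pc = parallel_config();
pc.parallel_cluster = 'herbert';    % file-based scheme, fewest requirements

hc = hpc_config();
hc.parallel_workers_number = 4;
hc.parallel_multifit = true;        % enable the components you need

% Run an operation in parallel, e.g. a cut via parallel_call
proj = line_proj([1, 0, 0], [0, 1, 0]);
w_out = parallel_call(@cut, {w_in, proj, [], [1 0.2 4], [0 1], [10 20]});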
17.4. Slurm Jobs
When running on Slurm-managed clusters, it is possible to automatically submit jobs to the Slurm queue to be run in parallel across the cluster. Horace will attempt to request the number of nodes required to run the selected number of parallel workers and associated threads. However, if you are using a cluster which requires non-standard options, such as specifying a billing account or a non-default queue, extra options can be passed through the slurm_commands variable accessible via the parallel_config object. This is a containers.Map object, and it only stores the most recently set commands.
new_commands = containers.Map({'-A' '-p'}, {'account' 'partition'});
pc = parallel_config();
pc.slurm_commands = []; % Delete existing Slurm commands
pc.slurm_commands = new_commands; % Set new map
pc.slurm_commands = '-A account -p=partition' % Set as char
pc.slurm_commands = {'-A' 'account' '-p' 'partition'} % Set as cellstr of commands (must be in pairs)
pc.slurm_commands = {{'-A' 'account'} {'-p' 'partition'}} % Set as cell array of pairs of commands
pc.update_slurm_commands('-A account -p=partition', false) % Using update_slurm_commands setting append to false
pc.update_slurm_commands(new_commands) % Using update_slurm_commands omitting append
Note
Setting slurm_commands by any of the above methods will remove all existing slurm_commands and set the new ones.
pc.slurm_commands('-A') = 'account'; pc.slurm_commands('-p') = 'partition' % Set through Map interface
pc.update_slurm_commands('-A account -p=partition', true); % Set through update_slurm_commands
pc.update_slurm_commands(containers.Map({'-A' '-p'}, {'account', 'partition'}), true)
Note
Setting slurm_commands by any of the above methods will update the existing slurm_commands, simply overwriting any entries with the same key.
It is possible to set the slurm_commands variable by loading the appropriate commands from a file, if that is what your cluster team provides. This is done using the following command:
pc = parallel_config();
pc = pc.load_slurm_commands_from_file(<filename>, <append>);
Where filename is the path of the file to load the commands from, and append specifies whether the commands are meant to be added to the existing commands or replace them entirely.
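For instance, a hedged example of replacing the current commands with those stored in a file supplied by your cluster administrators (the filename here is purely illustrative):
pc = parallel_config();
% append = false: replace any existing commands with those read from the file
pc = pc.load_slurm_commands_from_file('my_cluster_slurm_options.txt', false);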