
Factored out machine-dependent settings from global settings

Machine settings are now implemented in machine_settings_*.
One of these files is chosen via the keyword "machine" in the global settings.
These settings can be overwritten by explicitly putting the variables
in the global settings.
experiments/parallelize-flux-calculator
Sven Karsten 1 year ago
parent commit ce398c23c4
  1. Readme.md (38)
  2. documentation/jupyterbook/development/new_destinations.md (6)
  3. documentation/jupyterbook/todos.rst (2)
  4. documentation/jupyterbook/usage/parallelize_flux_calculator.md (10)
  5. documentation/jupyterbook/usage/setting_up_global_settings.md (33)
  6. scripts/run/machine_settings_haumea.py (55)
  7. scripts/run/machine_settings_hlrn.py (55)
  8. scripts/run/parse_global_settings.py (13)
  9. scripts/run/run.py (48)
  10. scripts/run/run_helpers.py (31)

Readme.md (38)

@@ -17,6 +17,44 @@ Further information is available at https://sven-karsten.github.io/iow_esm/intro
# Versions
## 1.02.00 (in preparation)
| date | author(s) | link |
|--- |--- |--- |
| 2022-04-27 | SK | XXX |
<details>
### changes
* flux calculator can now run in parallel
* see documentation/jupyterbook/usage/parallelize_flux_calculator.md for details
* factored out machine-dependent settings from global settings
* machine settings are now implemented in machine_settings_*
* one of these files is chosen via the keyword "machine" in the global settings
* these settings can be overwritten by explicitly putting the variables in the global settings
### dependencies
* bash, git, (python for GUI)
### known issues
* none
### tested with
* intensively tested on both HLRN machines
* using example setups available under:
  (coupled) /scratch/usr/mviowmod/IOW_ESM/setups/MOM5_Baltic-CCLM_Eurocordex/example_8nm_0.22deg/1.00.00
  (uncoupled) /scratch/usr/mviowmod/IOW_ESM/setups/CCLM_Eurocordex/example_0.22deg/1.00.00
  (uncoupled) /scratch/usr/mviowmod/IOW_ESM/setups/MOM5_Baltic/example_8nm/1.00.00
  (uncoupled) /scratch/usr/mviowmod/IOW_ESM/setups/I2LM_Eurocordex/example_0.22deg/1.00.00
* can be built and run on Haumea but output is not intensively tested
</details>
## 1.01.00 (latest release)
| date | author(s) | link |

documentation/jupyterbook/development/new_destinations.md (6)

@@ -16,4 +16,8 @@ For the example this must be called `start_build_new-target.sh`.
In general the name has to be `start_build_` followed by the keyword and `.sh`.
On some targets the build is performed using the queuing system; on others it can be performed directly on the login node.
Find out which is true for your new target.
The existing `start_build_haumea.sh` is an example for using the queue, whereas `start_build_hlrng.sh` is an example for direct compilation on the login node.
4. Add a machine settings python module `machine_settings_new_target.py` to the directory `scripts/run`.
Here you have to specify how MPI and the queueing system are used on the new target.
As a template you can use the examples `machine_settings_hlrn.py` (Intel-MPI + SLURM) and `machine_settings_haumea.py` (OpenMPI+SLURM).

documentation/jupyterbook/todos.rst (2)

@@ -35,6 +35,6 @@ TODOs
.. todo::
./usage/setting_up_global_settings.md:69 TODO: Add further description here...
./usage/setting_up_global_settings.md:92 TODO: Add further description here...

documentation/jupyterbook/usage/parallelize_flux_calculator.md (10)

@@ -27,15 +27,6 @@ flux_calculator_mode = "on_bottom_model_cores"
```
in your `global_settings.py` in the `input` folder.
Moreover, you have to define a Python function that returns a list of the node names (strings) you are using. For example, if you are working on one of the HLRN machines, you can use the environment variable `SLURM_NODELIST`, and this function might look like
``` python
def get_node_list():
    import os
    nodes = os.environ["SLURM_NODELIST"]
    return [nodes[0:3] + node for node in nodes[4:-1].split(",")]
```
It will return for example `["bcn1001", "bcn1003", "bcn1005"]`.
This option does not yield the shortest computation time but saves computational resources since the flux calculator and the ocean model share the same cores.
**Importantly**, if you use Intel MPI for parallelization on the HLRN machines you have to put
``` bash
export PSM2_MULTI_EP=0
```
@@ -44,6 +35,7 @@
into your jobscript template after the MPI module has been loaded.
This enables putting more tasks on a node than there are available cores; see also https://www.hlrn.de/doc/display/PUB/MPI+Jobs+with+more+than+40+%2896%29+tasks+per+node+failing.
### On extra cores
In order to run the flux calculator processes _on extra cores_ you have to specify

documentation/jupyterbook/usage/setting_up_global_settings.md (33)

@@ -8,19 +8,42 @@ on your target machine.
It consists of the following sections.
## Modeller's information
The modeller's section contains your personal information.
Some of this information can be found later in output files, so that your work can be attributed to you.
## Specify the machine you are working on
Here you have to specify on which machine you are working.
This section might look like:
``` python
####################################################
# Global settings for the IOW-ESM model run #
####################################################
##################################
# STEP 0: specify the machine #
##################################
machine = "hlrn" # this will ensure that the correct MPI variant is used and the correct queueing system if present
```
This ensures that MPI (Intel-MPI or OpenMPI) and the queueing system (if present) are correctly used.
Currently available machine keywords are
* `hlrn` for the two HLRN clusters in Göttingen and Berlin
* `haumea` for the University of Rostock's cluster
According to this setting, one of the `scripts/run/machine_settings_*.py` modules is loaded.
If you want to overwrite some of the predefined settings, you can set the variable explicitly in the global settings.
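For example, a minimal sketch of such an override might look as follows (the variable `mpi_n_flag` is one of the settings defined in the machine settings modules; the value shown here is purely illustrative):
``` python
machine = "hlrn"     # loads scripts/run/machine_settings_hlrn.py
mpi_n_flag = "-np"   # illustrative override of a value predefined in the machine settings
```
The explicit assignment wins because the global settings are applied after the machine settings module has been read.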
## Modeller's information
The modeller's section contains your personal information.
Some of this information can be found later in output files, so that your work can be attributed to you.
This section might look like:
``` python
###################################
# STEP 1: Info about the modeller #
###################################

scripts/run/machine_settings_haumea.py (55)

@@ -0,0 +1,55 @@
# Haumea uses OpenMPI and SLURM
mpi_run_command = 'mpirun --app mpmd_file' # the shell command used to start the MPI execution described in a configuration file "mpmd_file"
# it may contain the following wildcards that will be replaced later:
# _NODES_ total number of nodes
# _CORES_ total number of cores to place threads on
# _THREADS_ total number of mpi threads
# _CORESPERNODE_ number of cores per node to use
# Examples: Intel MPI: 'mpirun -configfile mpmd_file'
# OpenMPI : 'mpirun --app mpmd_file'
mpi_n_flag = '-np' # the mpirun flag for specifying the number of tasks.
# Examples: Intel MPI: '-n'
# OpenMPI : '-np'
bash_get_rank = 'my_id=${OMPI_COMM_WORLD_RANK}' # a bash expression that saves the MPI rank of this thread in the variable "my_id"
# Examples: Intel MPI : 'my_id=${PMI_RANK}'
# OpenMPI+Slurm: 'my_id=${OMPI_COMM_WORLD_RANK}'
python_get_rank = 'my_id = int(os.environ["OMPI_COMM_WORLD_RANK"])' # a python expression that saves the MPI rank of this thread in the variable "my_id"
# Examples: Intel MPI : 'my_id = int(os.environ["PMI_RANK"])'
# OpenMPI+Slurm: 'my_id = int(os.environ["OMPI_COMM_WORLD_RANK"])'
use_mpi_machinefile = "--oversubscribe -hostfile machine_file"
def machinefile_line(node, ntasks):
    return str(node) + ' slots=' + str(ntasks)

def get_node_list():
    # get SLURM's node list, can be in format bcn1001 or bcn[1009,1011,1013] or bcn[1009-1011,1013]
    import os
    nodes = os.environ["SLURM_NODELIST"]
    # just a single node -> there is no "[" in the string
    if "[" not in nodes:
        return [nodes]
    # get machine name -> just "node"
    machine = nodes[0:4]
    # get list of comma-separated values (cut out machine name and last "]")
    nodes = [node for node in nodes[len(machine)+1:-1].split(",")]
    # list of individual nodes
    node_list = []
    # go through list of comma-separated values
    for node in nodes:
        # if there is no minus, this element is an individual node
        if "-" not in node:
            node_list.append(machine + node)
            continue
        # otherwise this element is a range of nodes
        min_max = node.split("-")
        for node in range(int(min_max[0]), int(min_max[1]) + 1):
            node_list.append(machine + str(node))
    return node_list

scripts/run/machine_settings_hlrn.py (55)

@@ -0,0 +1,55 @@
# HLRN uses Intel MPI and SLURM
mpi_run_command = 'mpirun -configfile mpmd_file' # the shell command used to start the MPI execution described in a configuration file "mpmd_file"
# it may contain the following wildcards that will be replaced later:
# _NODES_ total number of nodes
# _CORES_ total number of cores to place threads on
# _THREADS_ total number of mpi threads
# _CORESPERNODE_ number of cores per node to use
# Examples: Intel MPI: 'mpirun -configfile mpmd_file'
# OpenMPI : 'mpirun --app mpmd_file'
mpi_n_flag = '-n' # the mpirun flag for specifying the number of tasks.
# Examples: Intel MPI: '-n'
# OpenMPI : '-np'
bash_get_rank = 'my_id=${PMI_RANK}' # a bash expression that saves the MPI rank of this thread in the variable "my_id"
# Examples: Intel MPI : 'my_id=${PMI_RANK}'
# OpenMPI+Slurm: 'my_id=${OMPI_COMM_WORLD_RANK}'
python_get_rank = 'my_id = int(os.environ["PMI_RANK"])' # a python expression that saves the MPI rank of this thread in the variable "my_id"
# Examples: Intel MPI : 'my_id = int(os.environ["PMI_RANK"])'
# OpenMPI+Slurm: 'my_id = int(os.environ["OMPI_COMM_WORLD_RANK"])'
use_mpi_machinefile = "-machine machine_file"
def machinefile_line(node, ntasks):
    return str(node) + ':' + str(ntasks)

def get_node_list():
    # get SLURM's node list, can be in format bcn1001 or bcn[1009,1011,1013] or bcn[1009-1011,1013]
    import os
    nodes = os.environ["SLURM_NODELIST"]
    # just a single node -> there is no "[" in the string
    if "[" not in nodes:
        return [nodes]
    # get machine name -> can be "bcn" or "gcn"
    machine = nodes[0:3]
    # get list of comma-separated values (cut out machine name and last "]")
    nodes = [node for node in nodes[len(machine)+1:-1].split(",")]
    # list of individual nodes
    node_list = []
    # go through list of comma-separated values
    for node in nodes:
        # if there is no minus, this element is an individual node
        if "-" not in node:
            node_list.append(machine + node)
            continue
        # otherwise this element is a range of nodes
        min_max = node.split("-")
        for node in range(int(min_max[0]), int(min_max[1]) + 1):
            node_list.append(machine + str(node))
    return node_list
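# Hedged usage sketch (illustration only, not part of the committed file):
# with a hypothetical SLURM_NODELIST, the helpers above expand the node list
# and format machine file lines as follows.
if __name__ == "__main__":
    import os
    os.environ["SLURM_NODELIST"] = "bcn[1009-1011,1013]"  # hypothetical value
    print(get_node_list())                   # ['bcn1009', 'bcn1010', 'bcn1011', 'bcn1013']
    print(machinefile_line("bcn1009", 40))   # 'bcn1009:40' (40 tasks is a hypothetical count)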

scripts/run/parse_global_settings.py (13)

@@ -22,6 +22,19 @@ class GlobalSettings:
# create a local dictionary with content of the global_settings file
ldict = {}
exec(open(root_dir + "/" + global_settings).read(), globals(), ldict)
# check if machine is specified
try:
    self.machine = ldict["machine"]
except:
    self.machine = None
# if machine is specified, get its variables and store them as members with the same name
if self.machine is not None:
    machine_ldict = {}
    exec(open(root_dir + "/scripts/run/machine_settings_" + self.machine + ".py").read(), globals(), machine_ldict)
    for variable in machine_ldict.keys():
        setattr(self, variable, machine_ldict[variable])
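# note: variables set explicitly in global_settings (mapped below from ldict)
# are applied after these machine defaults and therefore overwrite them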
# map dictionary entries to class members with the same name
for variable in ldict.keys():

scripts/run/run.py (48)

@@ -22,6 +22,8 @@ from parse_global_settings import GlobalSettings
from model_handling_flux import FluxCalculatorModes
from model_handling import ModelTypes
import run_helpers
##################################
# STEP 0: Get the root directory #
##################################
@@ -202,6 +204,7 @@ for run in range(global_settings.runs_per_job):
shellscript.writelines('fi\n')
shellscript.writelines('cd '+global_settings.local_workdir_base+'/'+model+'\n')
shellscript.writelines(global_settings.bash_get_rank+'\n') # e.g. "my_id=${PMI_RANK}"
#shellscript.writelines('module load vtune; exec vtune -collect hotspots -result-dir='+work_directory_root+'/'+model+' ./' + model_executable[i] + ' > logfile_${my_id}.txt 2>&1')
shellscript.writelines('exec ./' + model_executable[i] + ' > logfile_${my_id}.txt 2>&1')
shellscript.close()
st = os.stat(file_name) # get current permissions
@@ -212,44 +215,14 @@ for run in range(global_settings.runs_per_job):
# STEP 2f: DO THE WORK #
########################################################################
if global_settings.flux_calculator_mode == FluxCalculatorModes.on_bottom_cores:
# get a list of node names that are currently used
node_list = global_settings.get_node_list()
# find out which threads belong to which models
threads_of_model = {}
for i, model in enumerate(parallelization_layout["this_model"]):
try:
threads_of_model[model].append(i)
except:
threads_of_model[model] = [i]
# write a machine file
with open("machine_file", "w") as file:
# find out which model has how many threads on which node
for model in threads_of_model.keys():
threads_on_node = {}
for thread in threads_of_model[model]:
# get the node of this model thread from the parallelization layout
node = node_list[parallelization_layout["this_node"][thread]]
# add this thread to that node
try:
threads_on_node[node] += 1
except:
threads_on_node[node] = 1
# write how many threads are used for this model on the corresponding nodes
for node in threads_on_node.keys():
# TODO specify how to write a line of a machine(host) file in the global_settings.py
# TODO this is only working for Intel MPI at the moment
file.write(str(node)+':'+str(threads_on_node[node])+'\n')
run_helpers.write_machinefile(global_settings, parallelization_layout)
# WRITE mpirun APPLICATION FILE FOR THE MPMD JOB (specify how many tasks of which model are started)
file_name = 'mpmd_file'
if os.path.islink(file_name):
os.system("cp --remove-destination `realpath " + file_name + "` " + file_name)
mpmd_file = open(file_name, 'w')
if global_settings.flux_calculator_mode == FluxCalculatorModes.on_bottom_cores:
# TODO this is only working for Intel MPI at the moment
mpmd_file.writelines("-machine machine_file\n")
mpmd_file = open(file_name, 'w')
for i,model in enumerate(models):
mpmd_file.writelines(global_settings.mpi_n_flag+' '+str(model_threads[i])+' ./run_'+model+'.sh\n')
mpmd_file.close()
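# a line written to mpmd_file could look like this (hypothetical model name and
# thread count, using the HLRN '-n' flag):
# -n 96 ./run_MOM5.sh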
@@ -258,6 +231,8 @@ for run in range(global_settings.runs_per_job):
full_mpi_run_command = global_settings.mpi_run_command.replace('_CORES_',str(parallelization_layout['total_cores']))
full_mpi_run_command = full_mpi_run_command.replace('_NODES_',str(parallelization_layout['total_nodes']))
full_mpi_run_command = full_mpi_run_command.replace('_CORESPERNODE_',str(global_settings.cores_per_node))
if global_settings.flux_calculator_mode == FluxCalculatorModes.on_bottom_cores:
full_mpi_run_command += ' '+global_settings.use_mpi_machinefile
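# e.g. with the HLRN (Intel MPI) machine settings the full command would then
# resolve to something like 'mpirun -configfile mpmd_file -machine machine_file'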
print(' starting model task with command: '+full_mpi_run_command, flush=True)
os.system(full_mpi_run_command)
print(' ... model task finished.', flush=True)
@@ -289,16 +264,15 @@ for run in range(global_settings.runs_per_job):
st = os.stat(file_name) # get current permissions
os.chmod(file_name, st.st_mode | 0o777) # add a+rwx permission
mpmd_file = open('mpmd_file', 'w')
if global_settings.flux_calculator_mode == FluxCalculatorModes.on_bottom_cores:
# TODO this is only working for Intel MPI at the moment
mpmd_file.writelines("-machine machine_file\n")
mpmd_file = open('mpmd_file', 'w')
mpmd_file.writelines(global_settings.mpi_n_flag+' '+str(parallelization_layout['total_threads'])+' ./run_after1.sh\n')
mpmd_file.close()
full_mpi_run_command = global_settings.mpi_run_command.replace('_CORES_',str(parallelization_layout['total_cores']))
full_mpi_run_command = full_mpi_run_command.replace('_NODES_',str(parallelization_layout['total_nodes']))
full_mpi_run_command = full_mpi_run_command.replace('_CORESPERNODE_',str(global_settings.cores_per_node))
if global_settings.flux_calculator_mode == FluxCalculatorModes.on_bottom_cores:
full_mpi_run_command += ' '+global_settings.use_mpi_machinefile
print(' starting after1 task ...', flush=True)
os.system(full_mpi_run_command)
print(' ... after1 task finished.', flush=True)

scripts/run/run_helpers.py (31)

@@ -0,0 +1,31 @@
def write_machinefile(global_settings, parallelization_layout):
    # get a list of node names that are currently used
    node_list = global_settings.get_node_list()
    # find out which threads belong to which models
    threads_of_model = {}
    for i, model in enumerate(parallelization_layout["this_model"]):
        try:
            threads_of_model[model].append(i)
        except:
            threads_of_model[model] = [i]
    # write a machine file
    file_name = "machine_file"
    with open(file_name, "w") as file:
        # find out which model has how many threads on which node
        for model in threads_of_model.keys():
            threads_on_node = {}
            for thread in threads_of_model[model]:
                # get the node of this model thread from the parallelization layout
                node = node_list[parallelization_layout["this_node"][thread]]
                # add this thread to that node
                try:
                    threads_on_node[node] += 1
                except:
                    threads_on_node[node] = 1
            # write how many threads are used for this model on the corresponding nodes
            for node in threads_on_node.keys():
                file.write(global_settings.machinefile_line(node, threads_on_node[node]) + '\n')
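# Hedged illustration with hypothetical values: suppose get_node_list() returns
# ["bcn1001", "bcn1002"], one model runs two threads on node 0 and one thread on
# node 1, and the flux calculator runs one thread on node 0. With the HLRN
# machinefile_line format (node + ':' + ntasks) the resulting machine_file reads:
#   bcn1001:2
#   bcn1002:1
#   bcn1001:1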