Supercomputer architecture
Configuration of Irene
The compute nodes are gathered in partitions according to their hardware characteristics (CPU architecture, amount of RAM, presence of GPUs, etc.). A partition is a set of identical nodes that can be targeted to host one or several jobs. Choosing the right partition for a job depends on the code prerequisites in terms of hardware resources. For example, executing a code designed to be GPU-accelerated requires a partition with GPU nodes.
The Irene supercomputer offers three different kinds of nodes: regular compute nodes, large memory nodes, and GPU nodes.
- Skylake nodes for regular computation
 Partition name: skylake
CPUs: 2x24-cores Intel Skylake@2.7GHz (AVX512)
Cores/Node: 48
Nodes: 1 653
Total cores: 79 344
RAM/Node: 180 GB
RAM/Core: 3.75 GB
- AMD Rome nodes for regular computation
 Partition name: rome
CPUs: 2x64-cores AMD Rome@2.6GHz (AVX2)
Cores/Node: 128
Nodes: 2 286
Total cores: 292 608
RAM/Node: 228 GB
RAM/Core: 1.8 GB
- Hybrid nodes for GPU computing and graphical usage
 Partition name: hybrid
CPUs: 2x24-cores Intel Skylake@2.7GHz (AVX2)
GPUs: 1x Nvidia Pascal P100
Cores/Node: 48
Nodes: 20
Total cores: 960
RAM/Node: 180 GB
RAM/Core: 3.75 GB
I/O: 1 HDD 250 GB + 1 SSD 800 GB/NVMe
- Fat nodes with a large amount of shared memory, for computations of reasonable duration that fit within a single node
 Partition name: xlarge
CPUs: 4x28-cores Intel Skylake@2.1GHz
GPUs: 1x Nvidia Pascal P100
Cores/Node: 112
Nodes: 5
Total cores: 560
RAM/Node: 3 TB
RAM/Core: 27 GB
I/O: 2 HDD 1 TB + 1 SSD 1600 GB/NVMe
- V100 nodes for GPU computing and AI
 Partition name: v100
CPUs: 2x20-cores Intel Cascadelake@2.1GHz (AVX512)
GPUs: 4x Nvidia Tesla V100
Cores/Node: 40
Nodes: 32
Total cores: 1 280 (+ 128 GPUs)
RAM/Node: 175 GB
RAM/Core: 4.4 GB
- V100l nodes for GPU computing and AI
 Partition name: v100l
CPUs: 2x18-cores Intel Cascadelake@2.6GHz (AVX512)
GPUs: 1x Nvidia Tesla V100
Cores/Node: 36
Nodes: 30
Total cores: 1 080 (+ 30 GPUs)
RAM/Node: 355 GB
RAM/Core: 9.9 GB
- V100xl nodes for GPU computing and AI
 Partition name: v100xl
CPUs: 4x18-cores Intel Cascadelake@2.6GHz (AVX512)
GPUs: 1x Nvidia Tesla V100
Cores/Node: 72
Nodes: 2
Total cores: 144 (+ 2 GPUs)
RAM/Node: 2.9 TB
RAM/Core: 40 GB
Note that depending on the computing share owned by the partner you are attached to, you may not have access to all the partitions. You can check on which partition(s) your project has been allocated hours with the ccc_myproject command.
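To target a specific partition, the job submission script must name it explicitly. Here is a minimal sketch, assuming the usual ccc_msub/#MSUB batch syntax; the job name, task count, project name and executable are placeholders:

#!/bin/bash
#MSUB -r my_job                 # job name (placeholder)
#MSUB -q skylake                # requested partition, as listed above
#MSUB -n 96                     # number of MPI tasks (2 skylake nodes x 48 cores)
#MSUB -T 1800                   # time limit in seconds
#MSUB -A projXXXX               # project to charge (placeholder)
ccc_mprun ./my_code             # launch the parallel executable

The script would then be submitted with ccc_msub, e.g. $ ccc_msub job.sh.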
ccc_mpinfo displays the available partitions/queues to which jobs can be submitted.
$ ccc_mpinfo
                      --------------CPUS------------  -------------NODES------------
PARTITION    STATUS   TOTAL   DOWN    USED    FREE    TOTAL   DOWN    USED    FREE     MpC   CpN SpN CpS TpC
---------    ------   ------  ------  ------  ------  ------  ------  ------  ------   ----- --- --- --- ---
skylake      up         9960       0    9773     187     249       0     248       1    4500  40   2  20   1
xlarge       up          192       0     192       0       3       0       3       0   48000  64   4  16   1
hybrid       up          140       0      56      84       5       0       2       3    8892  28   2  14   1
v100         up          120       0       0     120       3       0       0       3    9100  40   2  20   1
MpC : amount of memory per core
CpN : number of cores per node
SpN : number of sockets per node
CpS : number of cores per socket
TpC : number of threads per core. This allows SMT (Simultaneous Multithreading, known as Hyper-Threading on Intel architectures).
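Since ccc_mpinfo prints one row per partition, its output can be post-processed directly. As a minimal sketch (assuming the column layout shown above), the memory available per node of a partition can be derived from its MpC and CpN columns:

$ ccc_mpinfo | awk '$1 == "skylake" {print $11 * $12 " MB per node"}'
180000 MB per node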
Interconnect
The compute nodes are connected through an EDR InfiniBand network in a pruned fat-tree topology. This high-throughput, low-latency network is used for I/O and for communications among the nodes of the supercomputer.
Lustre
Lustre is a parallel distributed file system, commonly used for large-scale cluster computing. It relies on a set of multiple I/O servers which the Lustre software presents as a single unified file system.
The major Lustre components are the metadata server (MDS) and the object storage servers (OSSs). The MDS stores metadata such as file names, directories, access permissions, and file layout; it is not involved in any actual data I/O. The data itself is stored on the OSSs. Note that a single file can be striped over several OSSs, which is one of the main benefits of Lustre when working with large files.
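As a minimal sketch of this striping capability (file and directory names are placeholders, and the default layout depends on the file system configuration), the lfs tool can be used to control and inspect how files are distributed over the object storage targets served by the OSSs:

$ lfs setstripe -c 4 output_dir/     # new files created in output_dir will be striped over 4 OSTs
$ lfs getstripe output_dir/big_file  # show the stripe count, stripe size and OST layout of a file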
More information on how Lustre works, as well as best practices, can be found in Lustre best practice.