HPC Clusters

The Scientific Computing group, working with JLab's Theory group, has deployed a sequence of high-performance clusters in support of Lattice Quantum Chromodynamics (LQCD). LQCD is the numerical approach to solving QCD, the fundamental theory of quarks and gluons and their interactions. This computationally demanding suite of applications is key to understanding the experimental program of the laboratory.

Deployed Clusters

Cluster | Specifications | Nodes | Cores | Accelerators | GPUs
21g | Dual AMD "Rome" 7502 2.5 GHz 32-core/64-thread CPUs, 1 TB DDR4-3200 memory, Mellanox (100 Gbps), PCIe 4, 2 TB SSD scratch | 8 | 256 | 8x AMD MI100 GPUs with inter-GPU Infinity Fabric | 64
19g | Intel Xeon "Skylake" Gold 5118, 196 GB memory, Omni-Path fabric (100 Gbps), 1 TB disk | 32 | 768 | 8x NVIDIA GeForce RTX 2080 GPUs (no ECC) | 256
18p | 16 GB HBM, 92 GB main memory, Omni-Path fabric (100 Gbps), 200 TB SSD | 180 | 12,240 | Knights Landing | 0
16p | 16 GB HBM, 192 GB main memory, Omni-Path fabric (100 Gbps), 1 TB disk | 264 | 16,896 | Xeon Phi 7230 | 0
Totals | | | 30,160 | | 320
 
All clusters have multiple high-speed, low-latency uplinks into the main disk server fabric, and access all of the filesystems over Omni-Path or InfiniBand.

 

Decommissioned Clusters

12k Cluster

The 2012 cluster consists of 42 nodes of dual octo-core Sandy Bridge E5-2650 CPUs running at 2.00 GHz. Each node contains 128 GB of host memory and four passively cooled NVIDIA Tesla K20m GPUs.

12s Cluster

The 2012 cluster consists of 212 nodes of dual octo-core Sandy Bridge E5-2650 CPUs running at 2.00 GHz. Each node is equipped with 32 GB of DDR3 ECC memory clocked at 1600 MHz and a 500 GB local hard drive. In addition, these nodes are connected via a QDR InfiniBand network providing 40 Gb/s of bandwidth.

10g Cluster

The 2010 cluster consists of 50 nodes of dual quad-core Westmere (E5630) CPUs running at 2.53 GHz. Each node is equipped with 48 GB of DDR3 ECC memory clocked at 1333 MHz. Among these 50 nodes, 32 are each equipped with four NVIDIA Tesla C2050 GPUs. Each C2050 GPU consists of 480 processing cores and 3 GB of GDDR5 memory, and delivers 144 GB/s of memory bandwidth and 1.03 Tflops of single-precision performance. These 32 nodes are connected via a QDR InfiniBand network providing 40 Gb/s of bandwidth. Each of the remaining nodes contains four NVIDIA GTX 480 GPUs, each providing 480 processing cores and 1.6 GB of GDDR5 memory. Every node provides the CUDA 3.0 development toolkit, the 195.36.24 CUDA driver, the OFED 1.4.2 InfiniBand development library, and message passing libraries such as MVAPICH and OpenMPI. Currently, each node runs CentOS 5.3 with gcc 4.1.2 and Linux kernel 2.6.18.
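
As an aside on the per-node GPU environment just described, the following is a minimal sketch of how a user code might enumerate a node's four GPUs through the CUDA runtime API; the file name and build line are assumptions, not the cluster's actual tooling.

/* gpu_probe.c: enumerate the GPUs visible on one node via the CUDA runtime API.
 * Hypothetical build line: nvcc -o gpu_probe gpu_probe.c
 */
#include <stdio.h>
#include <cuda_runtime.h>

int main(void)
{
    int ndev = 0, i;
    cudaError_t err = cudaGetDeviceCount(&ndev);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    printf("found %d CUDA device(s)\n", ndev);
    for (i = 0; i < ndev; i++) {
        struct cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        /* Report the properties most relevant here: device name, total
         * global memory, and number of multiprocessors. */
        printf("  device %d: %s, %.1f GB global memory, %d multiprocessors\n",
               i, prop.name,
               prop.totalGlobalMem / (1024.0 * 1024.0 * 1024.0),
               prop.multiProcessorCount);
    }
    return 0;
}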

10q Cluster

The 2010 cluster consists of 224 nodes of dual quad-core Westmere (E5630) CPUs running at 2.53 GHz. Two nodes are hosted inside one 2U box. Each node is equipped with 48 GB of DDR3 ECC memory clocked at 1333 MHz and a 500 GB local hard drive. In addition, these nodes are connected via a QDR InfiniBand network providing 40 Gb/s of bandwidth. Every node provides the OFED 1.4.2 InfiniBand development library and message passing libraries such as MVAPICH and OpenMPI. Currently, each node runs CentOS 5.3 with gcc 4.1.2 and Linux kernel 2.6.18.
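
As a quick illustration of the MPI environment described above, the sketch below is a minimal rank/hostname check that could be built against either MVAPICH or OpenMPI; the file name and the build/run line in the comment are assumptions rather than site-provided scripts.

/* mpi_check.c: print one line per MPI rank with its host name.
 * Hypothetical build/run: mpicc mpi_check.c -o mpi_check && mpirun -np 8 ./mpi_check
 */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, size, namelen;
    char host[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Get_processor_name(host, &namelen);

    /* Each rank reports where it landed; useful for confirming that
     * ranks are spread across the InfiniBand-connected nodes. */
    printf("rank %d of %d on %s\n", rank, size, host);

    MPI_Finalize();
    return 0;
}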

9g Cluster

The 2009 cluster consists of 65 nodes of dual quad-core Nehalem (E5530) CPUs running at 2.4 GHz. Each node is equipped with 48 GB of DDR3 ECC memory clocked at 1333 MHz. Four GPUs are installed in every node. Among these GPUs, 96 are NVIDIA GTX 480 GPUs with the Fermi architecture; the remaining 156 GTX 285 GPUs use the older GT200 architecture. Every node provides the CUDA 3.0 development toolkit and the 195.36.24 CUDA driver. Currently, each node runs CentOS 5.3 with gcc 4.1.2 and Linux kernel 2.6.18.

9q Cluster

The 2009 cluster consists of 320 nodes of dual quad-core Nehalem CPUs connected via a QDR InfiniBand switched network. Each node has two processors running at 2.4 GHz, 24 GB of DDR3-1333 memory, and a 500 GB SATA disk. In addition, each node has an InfiniBand HCA adapter that provides 40 Gb/s of bandwidth.

7n Cluster

The 2007 cluster consists of 396 nodes of quad-core AMD Opteron CPUs connected via DDR InfiniBand switched networks. Each node has two processors running at 1.9 GHz, 8 GB of DDR2 memory, and an 80 GB SATA disk. In addition, each node has a PCI Express (x16) slot for an InfiniBand HCA adapter that provides 20 Gb/s of bandwidth.

6n Cluster

This cluster, deployed in 2006, consists of 280 nodes of dual-core Intel Pentium D CPUs connected via InfiniBand switched networks. Each node has a single processor running at 3.0 GHz with an 800 MHz front-side bus, 1 GB of memory, and an 80 GB SATA disk. In addition, each node has a PCI Express (x4) slot for an InfiniBand HCA adapter that provides 10 Gb/s of bandwidth.

4g Cluster

The 2004 cluster consisted of 384 nodes arranged as a 6x8x2^3 mesh (torus). These nodes were interconnected using three dual-GigE cards plus one half of the dual-GigE NIC on the motherboard (the other half was used for file services). This 5D wiring could be configured into various 3D torus layouts (4D and 5D operation was possible in principle, but was less efficient and so was not used). Nodes were single-processor 2.8 GHz Xeons with an 800 MHz front-side bus, 512 MB of memory, and a 36 GB disk. This cluster achieved approximately 0.7 teraflops sustained.

Message passing on this novel architecture was done using an application-optimized library, QMP, whose implementations also target other custom LQCD machines. The lowest levels of the communications stack were implemented using a VIA driver, and for multiple-link transfers VIA data rates approaching 500 MB/s per node were achieved.
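
QMP presents a lattice-oriented interface for these nearest-neighbor transfers. As a rough illustration of the communication pattern it abstracts (using plain MPI rather than QMP's actual API), the sketch below builds a periodic 3D process grid and exchanges a face buffer with each neighbor; the grid dimensions, payload size, and build line are illustrative assumptions.

/* torus_halo.c: nearest-neighbor exchange on a periodic 3D process grid,
 * the communication pattern LQCD codes use on mesh-wired clusters.
 * Hypothetical build/run: mpicc torus_halo.c -o torus_halo && mpirun -np 384 ./torus_halo
 */
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

#define NDIM 3
#define HALO_DOUBLES 1024   /* arbitrary per-face payload size for this sketch */

int main(int argc, char **argv)
{
    int dims[NDIM] = {0, 0, 0};     /* let MPI factor the process count */
    int periods[NDIM] = {1, 1, 1};  /* periodic in every dimension: a torus */
    int nprocs, rank, d, i, back, fwd;
    double *sendbuf, *recvbuf;
    MPI_Comm torus;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
    MPI_Dims_create(nprocs, NDIM, dims);
    MPI_Cart_create(MPI_COMM_WORLD, NDIM, dims, periods, 1, &torus);

    sendbuf = malloc(HALO_DOUBLES * sizeof(double));
    recvbuf = malloc(HALO_DOUBLES * sizeof(double));
    for (i = 0; i < HALO_DOUBLES; i++)
        sendbuf[i] = (double) i;

    /* For each dimension, trade a face buffer with the forward and backward
     * neighbors; MPI_Cart_shift supplies the neighbor ranks on the torus. */
    for (d = 0; d < NDIM; d++) {
        MPI_Cart_shift(torus, d, 1, &back, &fwd);
        MPI_Sendrecv(sendbuf, HALO_DOUBLES, MPI_DOUBLE, fwd, 0,
                     recvbuf, HALO_DOUBLES, MPI_DOUBLE, back, 0,
                     torus, MPI_STATUS_IGNORE);
        MPI_Sendrecv(sendbuf, HALO_DOUBLES, MPI_DOUBLE, back, 1,
                     recvbuf, HALO_DOUBLES, MPI_DOUBLE, fwd, 1,
                     torus, MPI_STATUS_IGNORE);
    }

    MPI_Comm_rank(torus, &rank);
    if (rank == 0)
        printf("halo exchange completed on a %d x %d x %d torus\n",
               dims[0], dims[1], dims[2]);

    free(sendbuf);
    free(recvbuf);
    MPI_Comm_free(&torus);
    MPI_Finalize();
    return 0;
}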

3g Cluster

The cluster deployed in 2003 consisted of 256 nodes, each interconnected using three dual-GigE cards (one per dimension) and one on-board link, an approach which delivered high aggregate bandwidth while avoiding the expense of a high-performance switch. Nodes were single-processor 2.67 GHz Pentium 4 Xeons with 256 MB of memory. The cluster achieved 0.4 teraflops on LQCD applications.

2m Cluster

The oldest cluster, now decommissioned, was installed in 2002. The cluster contained 128 nodes of single-processor 2.0 GHz Xeons connected by a Myrinet network. It achieved 1/8 teraflops of performance (Linpack), contained 65 GB of total physical memory, and delivered 270 MB/s of simultaneous node-to-node aggregate network bandwidth. All nodes ran Red Hat Linux 7.3 with kernel 2.4.18.

 
