NCAR machines (bluefire and bluevista)
Note: blueice has been decommissioned and bluefire was delivered instead (2008/06-).
This page describes G-RSM on bluevista and bluefire. We do not currently use lightning and blueice.
There is a report that pserver now works on NCAR machines. Please try
setenv CVSROOT :pserver:anoncvs@rokka.ucsd.edu:/rokka1/kana/cvs-server-root/cpscvs
before trying the steps below.
If you have an account on rokka, you can download the model directly from rokka through CVS.
In order to access the CVS server at ECPC, add these lines to your .cshrc
setenv CVSROOT :ext:[Your NCAR User ID]@rokka.ucsd.edu:/rokka1/kana/cvs-server-root/cpscvs
setenv CVS_RSH ssh
Please note that you need to contact the model master (e.g. kana@ucsd.edu) to set up your account on rokka. Also note that this can be arranged only in special circumstances. If this is not possible, please follow the instructions below.
All others have to download the model on a machine elsewhere and transfer the package to blueice/bluevista. It may be useful to use the following "install" script option that creates a tar package of the entire G-RSM code.
install --enable-tar
The current installer recognizes blueice as MACHINE=ibmspbl and bluevista as MACHINE=ibmspbv.
Please look at this page before setting a regional domain. There are some restrictions on the choice of igrd for use with FFT on IBM machines.
Bluefire
Bluefire inherits most of blueice, so G-RSM runs with the same configuration as on blueice (and bluevista). Choose "ibmspbv" for your machine. It is roughly twice as fast as blueice.
Blueice (decommissioned)
G-RSM runs on blueice just like on bluevista.
BENCHMARK TEST (RSM one-month run with rsim script)
| Machine | Real # of pes | Configuration | Japan (288x309 grids) | California (288x349 grids) |
|---|---|---|---|---|
| blueice | 128 | 128pes, 8x16-way nodes, no-SMT | 13.4 hours | |
| blueice | 128 | 256pes, 8x16-way nodes, SMT | 8.3 hours | 9.4 hours |
| blueice | 256 | 512pes, 16x16-way nodes, SMT | 4.4 hours | 5.4 hours |
| bluevista | 64 | 128pes, 8x8-way nodes, SMT | 14.2 hours | 15.7 hours |
| bluevista | 128 | 256pes, 16x8-way nodes, SMT | 7.5 hours | 8.3 hours |
| lonestar (TACC) | 128 | 128pes, 32x4-way nodes | 9.0 hours | |
On both machines, SMT is recommended for this domain size. Bluevista is slightly faster than blueice. I have not done any optimization specifically for blueice, so there may be some room for improvement there. Blueice has many more processors than bluevista, so I expect faster turnaround time on blueice. (Hideki)
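The benchmark timings above can be turned into speedup and scaling-efficiency figures. The following Python sketch is only an illustration and is not part of G-RSM: the function names are hypothetical, and the wall-clock hours are copied from the Japan-domain results reported above.

```python
def speedup(hours_before, hours_after):
    """Ratio of wall-clock times; a value above 1 means the second run is faster."""
    return hours_before / hours_after


def scaling_efficiency(hours_small, hours_large, pe_ratio):
    """Speedup divided by the factor by which the real PE count grew."""
    return speedup(hours_small, hours_large) / pe_ratio


# Wall-clock hours for the Japan domain, taken from the benchmark results.
blueice_128_no_smt = 13.4   # 128 real PEs, 128pes, no SMT
blueice_128_smt = 8.3       # 128 real PEs, 256pes, SMT
blueice_256_smt = 4.4       # 256 real PEs, 512pes, SMT
bluevista_64_smt = 14.2     # 64 real PEs, 128pes, SMT
bluevista_128_smt = 7.5     # 128 real PEs, 256pes, SMT

# Gain from switching SMT on while keeping the same 128 real PEs on blueice.
smt_gain = speedup(blueice_128_no_smt, blueice_128_smt)

# Efficiency when the number of real PEs is doubled with SMT on.
blueice_eff = scaling_efficiency(blueice_128_smt, blueice_256_smt, 2)
bluevista_eff = scaling_efficiency(bluevista_64_smt, bluevista_128_smt, 2)

print(f"blueice SMT gain on 128 real PEs: {smt_gain:.2f}x")
print(f"blueice efficiency, 128 -> 256 real PEs: {blueice_eff:.2f}")
print(f"bluevista efficiency, 64 -> 128 real PEs: {bluevista_eff:.2f}")
```

An efficiency close to 1 means that doubling the real processors nearly halves the run time for this domain size.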
Bluevista
- NCAR Bluevista page
- 72 x 8-way nodes
- XL Fortran 10.1 page
- Set CVS in .cshrc (CVSROOT and CVS_RSH, as described at the top of this page).
- It is a 64-bit machine. I had to change libs/lib/w3lib_xxx/nainit.f because fstat returns a different array than on other machines: status(8) (correct on other IBM machines) was changed to status(11). This change is already committed to the latest version of the model. If you encounter a similar problem on another machine, first print the "status" array, find the element that holds the correct file size, and change the line "isize=status(?)" accordingly.
- Type $ bsub < [script_name] to submit a job. Do not forget the "<" sign!
- Use bjobs, bqueues, and lsfq to check job status.
- G-RSM recognizes the machine as ibmspbv.
- Optimization. http://www.cisl.ucar.edu/hss/csg/webcasts/aixupgd/
- MP_STDINMODE=0 has to be set in order to work around a system error with namelist. The latest G-RSM script handles this.
- Simultaneous multi-threading (SMT) works well for some combinations of node count and domain size. Because scalability differs between 1TTP (conventional use of an 8-way node) and 2TTP (SMT use), you should test your experiment thoroughly before going to production runs. A 10-20% speedup may be achieved in some cases. To use SMT, compile the code for twice as many processors and set ptile=16 in your run script.
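The advice in the nainit.f item above (print the status array and look for the element that holds the file size) can be illustrated outside Fortran. The Python sketch below is an analogy only: it uses Python's os.stat rather than the Fortran fstat call, and the helper name find_size_index is hypothetical. It writes a scratch file of known size and then searches the stat fields for that value.

```python
import os
import tempfile


def find_size_index(stat_fields, known_size):
    """Return every 0-based index whose value equals the known file size."""
    return [i for i, value in enumerate(stat_fields) if value == known_size]


# Create a scratch file with an unusual, known size in bytes.
known_size = 12345
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"x" * known_size)
    path = f.name

try:
    # The stat result converts to a tuple of its first ten integer fields.
    fields = tuple(os.stat(path))
    matches = find_size_index(fields, known_size)
    print("stat fields:", fields)
    print("index(es) holding the file size:", matches)
finally:
    os.remove(path)
```

In the Fortran code the same search would be done on the 1-based status array returned by fstat, and the index found would replace the "?" in "isize=status(?)".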
Bluesky (DECOMMISSIONED)
Hi.
Scalability is strongly affected by the number of x-y grid points. I'm not sure, but this might be part of the reason.
Also, for RSM you should compare NPES=128/ptile=64 (with SMT) against NPES=64/ptile=32 (without SMT), because SMT is a "virtual double-core" function (i.e., both cases use two real nodes). Your experiments showed that we should use SMT (ptile=64), because it cost far fewer GAUs for a similar wall time. Isn't that right?
By the way, according to NCAR's information (see http://www.cisl.ucar.edu/cpg/dailyb/todays.html ; article of 09/24/08), there are cases where running without SMT is faster than with SMT. But this is not the case for RSM.
Kei
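The bookkeeping in the comment above (NPES, ptile, and the number of real nodes and processors) can be checked with a few lines of arithmetic. The Python sketch below is illustrative only: the function name is hypothetical, and it assumes, as described on this page, that SMT (2TTP) places two tasks on each physical processor while the conventional mode (1TTP) places one.

```python
def real_resources(npes, ptile, smt):
    """Return (nodes, real_pes) for a job with npes tasks.

    ptile is the number of tasks placed on each node. With SMT (2TTP)
    every physical processor hosts two tasks, otherwise one.
    """
    if npes % ptile != 0:
        raise ValueError("npes must be a multiple of ptile")
    nodes = npes // ptile
    tasks_per_processor = 2 if smt else 1
    real_pes = npes // tasks_per_processor
    return nodes, real_pes


# The two cases compared in the comment: both occupy two real nodes.
print(real_resources(128, 64, smt=True))
print(real_resources(64, 32, smt=False))

# Bluevista SMT run from the benchmark: 256pes with ptile=16 on 8-way nodes.
print(real_resources(256, 16, smt=True))
```

Comparing runs that occupy the same number of real nodes, rather than the same NPES, keeps the charged resources equal, so any difference in wall time reflects SMT itself.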