[Trilinos-Users] [EXTERNAL] Re: Results from a scaling study of ML

Elliott, James John jjellio at sandia.gov
Mon Mar 29 14:50:18 MST 2021


John, that's odd.

Performance variations on Cori usually show up as you scale out to multiple nodes (you end up with an allocation that, combined with other users' traffic, causes bad routing performance).

It may be easier to post this on GitHub.

If you can give me your Slurm sbatch or salloc command/script, a list of the modules used, and your srun invocation (plus the app name and the flags you give it), I can try to reproduce this on our miniature Cori (the Trinity testbed at SNL). I no longer have access to NERSC (I was part of the KNL early-access program on Cori).
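
For reference, something like the following would capture everything I'm asking for (the file names here are just placeholders):

module list 2> modules.txt     # most module systems print the list to stderr
cat my_job.sbatch              # your sbatch script, or the exact salloc command you use
# plus the exact srun line, the app name, and any flags you pass to the app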

If you are somehow running the Haswell binary on KNL, this could explain a marked slowdown.
On Cori, you usually have to salloc/sbatch with -C haswell.

A Haswell binary will run on KNL, but a KNL binary will not run on Haswell.
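
Roughly, a Haswell job script would look something like this (the QOS, time limit, and node count are just placeholders; adjust to your setup):

#!/bin/bash
#SBATCH -C haswell      # request Haswell nodes (as opposed to -C knl)
#SBATCH -N 1            # placeholder node count
#SBATCH -q debug        # placeholder QOS
#SBATCH -t 00:30:00     # placeholder time limit

# sanity check on what you actually got: Cori Haswell nodes should report a
# Xeon E5 v3 model name, while KNL nodes report a Xeon Phi
srun -n 1 grep -m1 'model name' /proc/cpuinfo

srun -n 32 -c 2 --cpu-bind=cores --distribution=block,block ./ML_preconditioner.exe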

Your loaded modules can also have some impact on performance (even though the binary may be static).
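
A quick way to see what the binary itself is and what it would pull in at run time (using the executable name from your srun line):

file ML_preconditioner.exe     # reports "statically linked" or "dynamically linked"
ldd  ML_preconditioner.exe     # if dynamic, shows the MPI/math libraries resolved at run time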

Jonathan, Chris, and I did run MueLu a reasonable amount on Cori during the early-access period. The main culprits then were large-scale performance variations and tracking down issues in MueLu's repartitioning routines (avoiding many-to-one communications).

James

On 3/29/21, 6:11 AM, "Trilinos-Users on behalf of John Cary" <trilinos-users-bounces at trilinos.org on behalf of cary at colorado.edu> wrote:

    Thanks, James.  So I did
    
    srun -n 32 --distribution=block,block -c 2 
    /global/cscratch1/sd/cary/builds-cori-gcc/vsimall-cori-gcc/trilinos-13.0.0/parcomm/packages/ml/examples/BasicExamples/ML_preconditioner.exe
    
    but I am still seeing the same single-node scaling, with parallel
    efficiency dropping to 25%.
    
    I can see that it is not the fault of ML, because on my own local
    cluster, which has two AMD EPYC 7302 16-core processors per node, the
    single-node parallel efficiency at 32 processes is 82%.
    
    So I guess I still do not know how best to launch on Cori.
    
    Thx.....John
    
    
    On 3/28/21 6:18 PM, James Elliott wrote:
    > # cores per proc is usually between 1 and 16 (fill up one socket)
    >
    > I may be off... it has been a while since I ran there. FYI, Cori was really
    > noisy.
    >
    > cores_per_proc=1
    > John, I believe the usual Cori/Haswell slurm launch should look like:
    >
    > srun_opts=(
    > # use cores,v if you want verbosity
    > --cpu_bind=cores
    > -c $(($cores_per_proc*2))
    > # distribution puts ranks on nodes, then sockets
    > # block,block - is like aprun default, which fills
    > # a socket on a node, then the next socket on the same node
    > # then the next node...
    > # block,cyclic is/was the default on Cori
    > # that will put rank0 on socket0, rank1 on socket1 (same node)
    > # and repeat until the node is full. (it will stride your procs
    > # between the sockets on the node)
    > # This detail caused a few apps pain when Trinity swapped from
    > # aprun.
    > # Pick block,block or block,cyclic
    > --distribution=block,block
    > # the usual -n -N stuff
    > )
    >
    > srun "${srun_opts[@]}" ./app ....
    >
    > On 3/28/2021 5:23 PM, John Cary wrote:
    >> Hi All,
    >>
    >> As promised, we have done scaling studies on the haswell nodes on 
    >> Cori at NERSC using ML_preconditioner.exe
    >> as compiled, so this is a weak scaling study with 65536 cells/nodes 
    >> per processor.  We find a parallel efficiency
    >> (speedup/expected speedup) that drops to 25% on 32 processes.
    >>
    >> Is this expected?
    >>
    >> Are there command-line args to srun that might improve this?  (I 
    >> tried various args to --cpu-bind.)
    >>
    >> I can provide plenty more info (configuration line, how run, ...).
    >>
    >> Thx.....John
    >>
    >> On 3/24/21 9:05 AM, John Cary wrote:
    >>>
    >>>
    >>> Thanks, Chris, thanks Jonathan,
    >>>
    >>> I have found these executables, and we are doing scaling studies now.
    >>>
    >>> Will report....John
    >>>
    >>>
    >>>
    >>> On 3/23/21 9:42 PM, Siefert, Christopher wrote:
    >>>> John,
    >>>>
    >>>> There are some scaling examples in 
    >>>> trilinoscouplings/examples/scaling (example_Poisson.cpp and 
    >>>> example_Poisson2D.cpp) that use the old stack and might do what you 
    >>>> need.
    >>>>
    >>>> -Chris
    >>>
    >>>
    >>> On 3/23/21 7:48 PM, Hu, Jonathan wrote:
    >>>> Hi John,
    >>>>
    >>>>     ML has a 2D Poisson driver in 
    >>>> ml/examples/BasicExamples/ml_preconditioner.cpp.  The cmake target 
    >>>> should be either "ML_preconditioner" or "ML_preconditioner.exe". 
    >>>> There's a really similar one in ml/examples/XML/ml_XML.cpp that you 
    >>>> can drive with an XML deck. Is this what you're after?
    >>>>
    >>>> Jonathan
    >>>>
    >>>> On 3/23/21, 5:47 PM, "Trilinos-Users on behalf of John Cary" 
    >>>> <trilinos-users-bounces at trilinos.org on behalf of 
    >>>> cary at colorado.edu> wrote:
    >>>>
    >>>>      We are still using the old stack: ML, Epetra, ...
    >>>>
    >>>>      When we run a simple Poisson solve on our cluster (32 
    >>>> cores/node), we
    >>>>      see parallel efficiency drop to 4% on one node with 32 cores.  
    >>>> So we
    >>>>      naturally believe we are doing something wrong.
    >>>>
    >>>>      Does trilinos come with a simple Poisson-solve executable that 
    >>>> we could
    >>>>      use to test scaling (to get around the uncertainties of our 
    >>>> use of
    >>>>      trilinos)?
    >>>>
    >>>>      Thx.......John Cary
    >>>>
    >>>>
    >>>>
    >>>>
    >>>
    >>
    >>
    >
    
    
    _______________________________________________
    Trilinos-Users mailing list
    Trilinos-Users at trilinos.org
    http://trilinos.org/mailman/listinfo/trilinos-users_trilinos.org
    


