[Trilinos-Users] Results from a scaling study of ML

James Elliott jjellio3 at ncsu.edu
Sun Mar 28 17:18:31 MST 2021


John, I believe the usual Cori/Haswell slurm launch should look like the
following.  (I may be off... it's been a while since I ran there.  FYI, Cori
was really noisy.)

# cores per proc is usually between 1 and 16 (fill up one socket)
cores_per_proc=1

srun_opts=(
  # use cores,v if you want verbosity
  --cpu_bind=cores
  -c $(($cores_per_proc*2))
  # --distribution places ranks on nodes, then sockets.
  # block,block is like the aprun default: it fills one socket on a node,
  # then the next socket on the same node, then the next node...
  # block,cyclic is/was the default on Cori: it puts rank 0 on socket 0 and
  # rank 1 on socket 1 (same node) and repeats until the node is full, i.e.
  # it strides your ranks across the sockets of the node.
  # This detail caused a few apps pain when Trinity swapped from aprun.
  # Pick block,block or block,cyclic.
  --distribution=block,block
  # plus the usual -n / -N options
)

srun "${srun_opts[@]}" ./app ....

On 3/28/2021 5:23 PM, John Cary wrote:
> Hi All,
>
> As promised, we have done scaling studies on the haswell nodes on Cori 
> at NERSC using ML_preconditioner.exe
> as compiled, so this is a weak scaling study with 65536 cells/nodes 
> per processor.  We find a parallel efficiency
> (speedup/expected speedup) that drops to 25% on 32 processes.
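
For reference, with a fixed per-rank problem size that efficiency is
essentially T(1 rank) / T(32 ranks), so 25% at 32 processes means the
32-rank solve takes roughly four times as long as the single-rank solve.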
>
> Is this expected?
>
> Are there command-line args to srun that might improve this?  (I tried 
> various args to --cpu-bind.)
>
> I can provide plenty more info (configuration line, how run, ...).
>
> Thx.....John
>
> On 3/24/21 9:05 AM, John Cary wrote:
>>
>>
>> Thanks, Chris, thanks Jonathan,
>>
>> I have found these executables, and we are doing scaling studies now.
>>
>> Will report....John
>>
>>
>>
>> On 3/23/21 9:42 PM, Siefert, Christopher wrote:
>>> John,
>>>
>>> There are some scaling examples in 
>>> trilinoscouplings/examples/scaling (example_Poisson.cpp and 
>>> example_Poisson2D.cpp) that use the old stack and might do what you 
>>> need.
>>>
>>> -Chris
>>
>>
>> On 3/23/21 7:48 PM, Hu, Jonathan wrote:
>>> Hi John,
>>>
>>>     ML has a 2D Poisson driver in 
>>> ml/examples/BasicExamples/ml_preconditioner.cpp.  The cmake target 
>>> should be either "ML_preconditioner" or "ML_preconditioner.exe". 
>>> There's a really similar one in ml/examples/XML/ml_XML.cpp that you 
>>> can drive with an XML deck. Is this what you're after?
>>>
>>> Jonathan
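
As a rough sketch (not something Jonathan spelled out) of configuring,
building, and running that driver: the cmake options below are the usual
TriBITS ones, and the paths and binary location are approximate, so
double-check them and add your normal compiler/TPL settings.

cmake \
  -D CMAKE_BUILD_TYPE=Release \
  -D TPL_ENABLE_MPI=ON \
  -D Trilinos_ENABLE_Epetra=ON \
  -D Trilinos_ENABLE_ML=ON \
  -D ML_ENABLE_EXAMPLES=ON \
  /path/to/Trilinos

# target name per Jonathan's note above
make -j 8 ML_preconditioner.exe

# the executable lands in the build tree, mirroring
# ml/examples/BasicExamples/; launch it with the srun options from above
srun -n 32 ./ML_preconditioner.exe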
>>>
>>> On 3/23/21, 5:47 PM, "Trilinos-Users on behalf of John Cary" 
>>> <trilinos-users-bounces at trilinos.org on behalf of cary at colorado.edu> 
>>> wrote:
>>>
>>>      We are still using the old stack: ML, Epetra, ...
>>>
>>>      When we run a simple Poisson solve on our cluster (32 
>>> cores/node), we
>>>      see parallel efficiency drop to 4% on one node with 32 cores.  
>>> So we
>>>      naturally believe we are doing something wrong.
>>>
>>>      Does trilinos come with a simple Poisson-solve executable that 
>>> we could
>>>      use to test scaling (to get around the uncertainties of our use of
>>>      trilinos)?
>>>
>>>      Thx.......John Cary
>>>
>>>
>>>
>>>
>>
>
>
> _______________________________________________
> Trilinos-Users mailing list
> Trilinos-Users at trilinos.org
> http://trilinos.org/mailman/listinfo/trilinos-users_trilinos.org


