[Trilinos-Users] Results from a scaling study of ML
James Elliott
jjellio3 at ncsu.edu
Sun Mar 28 17:18:31 MST 2021
John, I believe the usual Cori/Haswell slurm launch should look like
(I may be off... it's been a while since I ran there; FYI, Cori was really noisy):

# cores per proc is usually between 1 and 16 (fill up one socket)
cores_per_proc=1
srun_opts=(
# use --cpu_bind=cores,v if you want verbose binding reports
--cpu_bind=cores
# Haswell has 2 hardware threads per core, so -c counts
# hyperthreads, not physical cores
-c $(($cores_per_proc*2))
# --distribution places ranks on nodes, then sockets.
# block,block is like the aprun default: it fills
# a socket on a node, then the next socket on the same node,
# then the next node...
# block,cyclic is/was the default on Cori:
# it puts rank0 on socket0, rank1 on socket1 (same node)
# and repeats until the node is full (i.e., it strides your ranks
# across the sockets on the node).
# This detail caused a few apps pain when Trinity swapped from
# aprun to srun.
# Pick block,block or block,cyclic:
--distribution=block,block
# the usual -n -N stuff
)
srun "${srun_opts[@]}" ./app ....
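Putting the pieces together, here is a hedged sketch of a weak-scaling sweep built from the options above. The node counts, the 32 cores/node and 2 threads/core figures, and the executable name `ML_preconditioner.exe` are assumptions drawn from this thread; `echo` prints the commands rather than launching them, so you can inspect the sweep before committing an allocation:

```shell
# Sketch only: iterate over node counts for a weak-scaling study on
# Cori/Haswell (assumed: 32 cores/node, 2 hardware threads/core).
cores_per_proc=1
cores_per_node=32
for nodes in 1 2 4; do
  # one rank per cores_per_proc physical cores, filling every node
  ranks=$(( nodes * cores_per_node / cores_per_proc ))
  echo srun -N "$nodes" -n "$ranks" \
       -c $(( cores_per_proc * 2 )) \
       --cpu_bind=cores --distribution=block,block \
       ./ML_preconditioner.exe
done
```

Dropping the `echo` turns the printed commands into actual launches; adding `,v` to `--cpu_bind` makes srun report the binding it actually applied, which is the quickest way to confirm ranks landed where you expect.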
On 3/28/2021 5:23 PM, John Cary wrote:
> Hi All,
>
> As promised, we have done scaling studies on the haswell nodes on Cori
> at NERSC using ML_preconditioner.exe
> as compiled, so this is a weak scaling study with 65536 cells/nodes
> per processor. We find a parallel efficiency
> (speedup/expected speedup) that drops to 25% on 32 processes.
>
> Is this expected?
>
> Are there command-line args to srun that might improve this? (I tried
> various args to --cpu-bind.)
>
> I can provide plenty more info (configuration line, how run, ...).
>
> Thx.....John
>
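For reference, the parallel efficiency figure quoted above is measured speedup over expected speedup; for weak scaling (fixed work per process) the expected time is flat, so efficiency reduces to t(1 process) / t(P processes). A minimal sketch of the arithmetic, with made-up timings chosen to land on the 25% reported above:

```shell
# Weak-scaling parallel efficiency = speedup / expected speedup.
# With fixed work per process, expected time is constant, so
# efficiency(P) = t(1) / t(P).
# The timings below are illustrative only, not measured data.
t1=10.0    # wall time on 1 process (made up)
tp=40.0    # wall time on 32 processes (made up)
eff=$(awk -v t1="$t1" -v tp="$tp" 'BEGIN { printf "%.0f", 100 * t1 / tp }')
echo "parallel efficiency = ${eff}%"   # prints: parallel efficiency = 25%
```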
> On 3/24/21 9:05 AM, John Cary wrote:
>>
>>
>> Thanks, Chris, thanks Jonathan,
>>
>> I have found these executables, and we are doing scaling studies now.
>>
>> Will report....John
>>
>>
>>
>> On 3/23/21 9:42 PM, Siefert, Christopher wrote:
>>> John,
>>>
>>> There are some scaling examples in
>>> trilinoscouplings/examples/scaling (example_Poisson.cpp and
>>> example_Poisson2D.cpp) that use the old stack and might do what you
>>> need.
>>>
>>> -Chris
>>
>>
>> On 3/23/21 7:48 PM, Hu, Jonathan wrote:
>>> Hi John,
>>>
>>> ML has a 2D Poisson driver in
>>> ml/examples/BasicExamples/ml_preconditioner.cpp. The cmake target
>>> should be either "ML_preconditioner" or "ML_preconditioner.exe".
>>> There's a really similar one in ml/examples/XML/ml_XML.cpp that you
>>> can drive with an XML deck. Is this what you're after?
>>>
>>> Jonathan
>>>
>>> On 3/23/21, 5:47 PM, "Trilinos-Users on behalf of John Cary"
>>> <trilinos-users-bounces at trilinos.org on behalf of cary at colorado.edu>
>>> wrote:
>>>
>>> We are still using the old stack: ML, Epetra, ...
>>>
>>> When we run a simple Poisson solve on our cluster (32
>>> cores/node), we
>>> see parallel efficiency drop to 4% on one node with 32 cores.
>>> So we
>>> naturally believe we are doing something wrong.
>>>
>>> Does trilinos come with a simple Poisson-solve executable that
>>> we could
>>> use to test scaling (to get around the uncertainties of our use of
>>> trilinos)?
>>>
>>> Thx.......John Cary
>>>
>>> _______________________________________________
>>> Trilinos-Users mailing list
>>> Trilinos-Users at trilinos.org
>>> http://trilinos.org/mailman/listinfo/trilinos-users_trilinos.org
>>>
>>>
>>>
>>
>
>