[Trilinos-Users] Finite Element Assembly

Danielle Catherine Maddix dcmaddix at stanford.edu
Sun Jan 31 15:08:34 EST 2016


As a further note, the documentation says that calling Trilinos::K->OptimizeStorage() should also optimize the storage of the underlying graph. However, when I check the two flags below, the first returns true and the second returns false. I suspect that the unoptimized graph is responsible for part of the slow performance.

std::cout << Trilinos::K->StorageOptimized() << " "
          << Trilinos::K->Graph().StorageOptimized() << std::endl;

// prints: 1 0

When I explicitly call Trilinos::K_graph->OptimizeStorage(), it does set the graph's storage-optimized flag to true, but I then get a segmentation fault.
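
Roughly what I am attempting is below (just a sketch; the Filled() check is only there to make explicit my assumption that the graph has to be filled before its storage can be optimized):

// Sketch of the call that seg faults; I am assuming the graph must be
// filled (FillComplete/GlobalAssemble) before OptimizeStorage is legal.
if (Trilinos::K_graph->Filled())
{
    int err = Trilinos::K_graph->OptimizeStorage();
    if (err != 0)
        std::cout << "Graph OptimizeStorage returned " << err << std::endl;
}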

Are we supposed to be able to call this function on the graph?


Thanks,

Danielle


________________________________
From: Danielle Catherine Maddix
Sent: Sunday, January 31, 2016 9:50 AM
To: trilinos-users at trilinos.org
Subject: Re: Finite Element Assembly


Hello,


As an update, I added the call to Trilinos::K->OptimizeStorage() and now both methods take the same amount of time in the linear solver.

I still have a remaining question about optimizing my assembly routine, since it remains about 1.6-1.7x slower than our in-house one.

Below is the main part of my Trilinos assembly code, which loops over the elements and assembles the block element stiffnesses.

Please let me know if you see any optimization I am missing.

My global stiffness matrix is an Epetra_FEVbrMatrix constructed from an Epetra_FECrsGraph, i.e.

Epetra_FECrsGraph K_graph(Copy, *Trilinos::blockMap, &nnzPerRow[0]);
Trilinos::K = new Epetra_FEVbrMatrix(Copy, K_graph);

I saw that setting StaticProfile to true for Epetra_CrsGraph resulted in better performance, but I don't see this available for the Epetra_FECrsGraph.
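
For comparison, this is the plain Epetra_CrsGraph constructor I was looking at, where the trailing bool is the StaticProfile flag (same map and nnz array as in my code above, just as a sketch):

// Epetra_CrsGraph exposes a StaticProfile flag in its constructor;
// I do not see an equivalent argument on Epetra_FECrsGraph.
Epetra_CrsGraph staticGraph(Copy, *Trilinos::blockMap, &nnzPerRow[0],
                            true); // StaticProfile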

I call OptimizeStorage on K after the assembly is complete.
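
For reference, the post-assembly sequence looks roughly like this (a sketch; the GlobalAssemble-before-OptimizeStorage ordering is my assumption, and error checking is omitted):

// Sketch of the calls after the element loop; I am assuming
// GlobalAssemble() must precede OptimizeStorage().
Trilinos::K->GlobalAssemble();   // gather shared-node contributions / fill complete
Trilinos::K->OptimizeStorage();
Trilinos::F->GlobalAssemble();   // same for the FE right-hand-side vector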

Any help or thoughts would be appreciated.


Thank you,

Danielle Maddix


for (int a = 0; a < numNodesPerElement; ++a)
{
    // Sum in contributions from this element node; the FE global assemble will
    // pick up shared-node contributions from other processors.
    int error = Trilinos::K->BeginSumIntoGlobalValues(
                    localToGlobal[a],    // global block row
                    numNodesPerElement,  // number of block entries in this row
                    &localToGlobal[0]);  // global block column indices (implicit inner b loop)
    if (error != 0)
    {
        std::cout << "ERROR: Setting block row and block column for summing into global values!" << std::endl;
        exit(1);
    }

    // Multiply by the Dirichlet term to zero out the appropriate rows.
    for (int i = 0; i < dof; ++i)
    {
        lR[a*dof + i] *= W[(eqN[a] - 1)*dof + i];
    }

    // Submit global F values.
    error = Trilinos::F->SumIntoGlobalValues(1,                 // number of global block rows, put in 1 at a time
                                             &localToGlobal[a], // global block row index
                                             &numValuesPerID,   // dof values per id
                                             &lR[a*dof]);       // values of size dof
    if (error != 0) // 1 or more indices not associated with calling processor!
    {
        std::cout << error << std::endl;
        std::cout << "ERROR: Summing into global vector values!" << std::endl;
        exit(1);
    }

    // Loop over local nodes for the block columns.
    for (int b = 0; b < numNodesPerElement; ++b)
    {
        // Transpose the block, since Trilinos expects column-major (serial dense matrix) storage.
        std::vector<double> values(dof*dof);
        for (int i = 0; i < dof; ++i)
        {
            for (int j = 0; j < dof; ++j) // premultiplied by diagonal W, so W(a) multiplies row a
            {
                values[i*dof + j] = lK[b*dof*dof*numNodesPerElement + a*dof*dof + j*dof + i];
            }
        }

        error = Trilinos::K->SubmitBlockEntry(&values[0], // square dof*dof block
                                              dof,        // LDA
                                              dof,        // number of rows
                                              dof);       // number of columns
        if (error != 0)
        {
            std::cout << "ERROR: Inputting values when summing into matrix global values!" << std::endl;
            exit(1);
        }
    }

    // Finish submitting the block entries for the current block row.
    error = Trilinos::K->EndSubmitEntries();
    if (error != 0)
    {
        std::cout << "ERROR: End submitting block entries!" << std::endl;
        exit(1);
    }
}
} // closes the outer loop over elements (opening not shown)
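
One possible optimization I am considering is hoisting the temporary values buffer out of the loops so it is not reallocated for every block entry (just a sketch, assuming dof is fixed for the whole assembly):

// Sketch: allocate the transpose buffer once, assuming dof is constant.
std::vector<double> values(dof*dof);
for (int a = 0; a < numNodesPerElement; ++a)
{
    // ... BeginSumIntoGlobalValues and F->SumIntoGlobalValues as above ...
    for (int b = 0; b < numNodesPerElement; ++b)
    {
        for (int i = 0; i < dof; ++i)
            for (int j = 0; j < dof; ++j)
                values[i*dof + j] =
                    lK[b*dof*dof*numNodesPerElement + a*dof*dof + j*dof + i];
        Trilinos::K->SubmitBlockEntry(&values[0], dof, dof, dof);
    }
    // ... EndSubmitEntries as above ...
}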


________________________________
From: Danielle Catherine Maddix
Sent: Friday, January 29, 2016 2:17 PM
To: trilinos-users at trilinos.org
Subject: Finite Element Assembly


Hello,


I am getting confusing timing results from two versions of our finite element code that use Trilinos. I started with an in-house finite element solver, and we wanted to see how the Trilinos linear solvers compare to ours. I am using Epetra and AztecOO, and I have two methods. In the first, I copy our already-assembled global stiffness matrix and global force vector into Trilinos using BeginReplaceGlobalValues; my matrix is an Epetra_FEVbrMatrix. In this case, the linear solver times per Newton iteration are given below:


time_global=[0.1058    0.7401    0.8486    0.8326    0.8607]

iter_global =  [50 196 214 212 218]


In the second, I replace our current assembly routine and assemble directly in Epetra using BeginSumIntoGlobalValues. I verified that the resulting global stiffness matrix and force vector are exactly the same before calling GMRES. However, the times are significantly slower for the same number of iterations and the same solution.


time_assemble = [0.320155 1.6002 1.75062 1.71539 1.84786]

iter_assemble = [50 196 214 212 218]


So I see a 2-3x slowdown between the two methods for the same number of iterations and the same linear problem. Does any potential reason for this come to mind?


Thanks,

Danielle Maddix

