[Trilinos-Users] Problem with Epetra FECrsMatrix during the assembly phase

MLX82 mlx82 at fastwebnet.it
Fri Jul 25 12:56:29 MDT 2008


Alan, thanks for your help!

Here is a portion of my code containing only the essential Trilinos parts.
All of this is in a C++ class that assembles and solves a FEM linear
system for a Navier-Stokes problem. The class is created on all processes
involved. Note that this is the code I am using for the test now. In the
original code (where the MPI matrix assembly was really slow) none of the
operations involving AgraphS, AblockS, mapVS and the Export were present!
If you want a version of the code without the serial matrix part, I will
send it to you.

Thanks again!

----------------------------------------------------------------------------------------
comm = new Epetra_MpiComm(MPI_COMM_WORLD);
mapV = new Epetra_Map(NVdof,0,*comm); // Velocity map (NVdof is the number of
                                      // degrees of freedom of the velocity field)

...

// Construction of the SERIAL MATRIX GRAPH ================
//=========================================================

int NumMyElementsV = 0;
if(comm->MyPID() == 0)
    NumMyElementsV = NVdof;
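// mapVS puts all NVdof rows on process 0; every other process owns no rows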
Epetra_Map mapVS(NVdof,NumMyElementsV,0,*comm);
Epetra_FECrsGraph AgraphS(Copy,mapVS,Arows);
for(int i=mapVS.MinMyGID() ; i<=mapVS.MaxMyGID() ; ++i)
{   
    ...
    ...
   
    AgraphS.InsertGlobalIndices(i, 3*nLocal, intVector);
}   
AgraphS.GlobalAssemble(mapVS,mapVS);
AgraphS.OptimizeStorage();

// Construction of the SERIAL MATRIX ======================
//=========================================================

Epetra_FECrsMatrix AblockS(Copy,AgraphS); // <-- THIS IS THE SERIAL MATRIX
if(comm->MyPID() == 0)
{
    for(int r=0;r<miniGrid->getElementsNumber();++r)
        {
            ...
           
            for(int i=0;i<3;++i) // 3 velocity component
            {                       
                ...
                //Create Matloc and Iblock
                ...
               
               
                AblockS.SumIntoGlobalValues(4,Iblock,4,Iblock,MatLoc,
                                            Epetra_FECrsMatrix::ROW_MAJOR);
                // In the original code (where Ablock takes the place of
                // AblockS), the above operation was the slow part of the code!!!
            }
        }
}

AblockS.GlobalAssemble(mapVS,mapVS);

// Construction of the MPI MATRIX GRAPH ===================
//=========================================================
// Construction of the graph
Agraph = new Epetra_FECrsGraph(Copy,*mapV,Arows);
for(int i=mapV->MinMyGID() ; i<=mapV->MaxMyGID() ; ++i)
{   
    ...
    ...
   
    Agraph->InsertGlobalIndices(i, 3*nLocal, intVector);
}

Agraph->GlobalAssemble(*mapV,*mapV);
Agraph->OptimizeStorage();

// COPY THE SERIAL MATRIX INTO THE MPI MATRIX =============
//=========================================================

Ablock = new Epetra_FECrsMatrix(Copy,*Agraph); // <-- THIS IS THE MPI MATRIX

Epetra_Export ExpV(mapVS,*mapV);
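// Export with Add: the rows owned by process 0 in mapVS are shipped to the
// processes that own them in mapV and summed into Ablock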
Ablock->Export(AblockS, ExpV, Add);

Ablock->GlobalAssemble(*mapV,*mapV);
Ablock->OptimizeStorage();

// Boundary conditions ====================================
//=========================================================

// Finally I have a for loop with calls like these for the boundary conditions
Ablock->ExtractGlobalRowCopy(row,lungNV,numEntries,values,indices);
Ablock->ReplaceGlobalValues(row,numEntries,zero,indices);
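
Roughly, the full boundary-condition loop looks like this (dirichletRows is only
a placeholder name for my list of Dirichlet rows; zero is an array of lungNV
zeros, and the diagonal handling is simplified):

std::vector<double> values(lungNV), zero(lungNV,0.0);
std::vector<int>    indices(lungNV);
int numEntries;
double one = 1.0;
for(unsigned int k=0 ; k<dirichletRows.size() ; ++k)
{
    int row = dirichletRows[k];
    if(!mapV->MyGID(row)) continue; // only the owning process touches the row

    Ablock->ExtractGlobalRowCopy(row,lungNV,numEntries,&values[0],&indices[0]);
    Ablock->ReplaceGlobalValues(row,numEntries,&zero[0],&indices[0]); // zero the row
    Ablock->ReplaceGlobalValues(row,1,&one,&row);                     // put 1 on the diagonal
}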




Williams, Alan B wrote:
> Hi Cristiano,
>
> Assembly of a matrix usually performs very well in parallel. Time decreases as more processors are used for the same matrix.
>
> When assembling a matrix with a distributed finite element problem, FECrsMatrix places local contributions directly into the underlying CrsMatrix, and holds overlapping contributions (contributions related to shared nodes, i.e. nodes on the boundary between processors). The overlapping contributions get communicated to the "owning processor" when the GlobalAssemble function is called.
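>
> As a rough sketch of that pattern (the names below are placeholders, not taken from your code):
>
>   Epetra_FECrsMatrix A(Copy, graph);   // graph: a pre-built Epetra_FECrsGraph
>   int    elemRows[4];                  // global ids of one element's dofs
>   double elemMat[16];                  // 4x4 element matrix, row-major
>   for(int e=0; e<numMyElements; ++e)
>   {
>       // ... fill elemRows and elemMat for element e ...
>       A.SumIntoGlobalValues(4, elemRows, 4, elemRows, elemMat,
>                             Epetra_FECrsMatrix::ROW_MAJOR);
>   }
>   // contributions to rows owned by other processors are cached locally and
>   // communicated to their owners here:
>   A.GlobalAssemble();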
>
> Can you send a portion of your code? I would like to see the code that creates your matrix and defines the row-map, and also the code that inserts or sums global values into the matrix.
>
> Alan
>
>
>   
>> -----Original Message-----
>> From: trilinos-users-bounces at software.sandia.gov
>> [mailto:trilinos-users-bounces at software.sandia.gov] On Behalf Of MLX82
>> Sent: Friday, July 25, 2008 9:48 AM
>> To: trilinos-users at software.sandia.gov
>> Subject: [Trilinos-Users] Problem with Epetra FECrsMatrix
>> during the assembly phase
>>
>> Hi,
>>
>> I am Cristiano Malossi from MOX (Department of Mathematics,
>> Politecnico di Milano, Italy). I am having some problems during the
>> assembly phase in a FEM code.
>>
>> If I run the code on a single CPU it takes about 50 seconds to
>> assemble a 500k DOF matrix, which is a reasonable time. The same case
>> takes about 2200 seconds to create the matrix if I use two or more
>> CPUs: that is strange, as I always use only one core to assemble the
>> matrix (which in this case has a map with rows distributed on two or
>> more processes), so I think the assembly time should be of about the
>> same magnitude. Note that, to assemble the matrix, I loop over the
>> elements, inserting many small blocks (4x4 or 16x16) with
>> InsertGlobalValues(...).
>>
>> I have tried to solve the problem using graphs, but unfortunately
>> nothing seems to have changed in the assembly phase (now I am using
>> SumIntoGlobalValues()), while (fortunately) I have seen a noticeable
>> performance improvement when generating the ILU preconditioner and
>> solving the linear system.
>>
>> Now I am trying to find out whether the problem is related to the
>> redistribution of the matrix rows among the processes: to do this I am
>> using the same communicator with a different "serial map" (with all
>> rows on process 0) to generate a "serial matrix". After the assembly
>> of this new matrix, I use these commands to create a new distributed
>> matrix:
>>
>> Epetra_Export Exp(map_Serial, map_MPI);
>> MATRIX_MPI.Export(MATRIX_Serial, Exp, Add);
>>
>> The result is that on some small cases (both serial and MPI) it works
>> fine, while on others (always MPI) it crashes. In any case, the
>> assembly time of the serial matrix seems fine (50 seconds) and the
>> time taken by the export process is very low (1 or 2 seconds).
>>
>> Can anyone give me any advice on how to speed up my assembly process?
>>
>> Thanks for any help.
>>
>>    Cristiano Malossi
>>
>>
>>
>>
>> _______________________________________________
>> Trilinos-Users mailing list
>> Trilinos-Users at software.sandia.gov
>> http://software.sandia.gov/mailman/listinfo/trilinos-users
>>
>>     
>
>   
