[Trilinos-Users] Problem with Epetra FECrsMatrix during the assembly phase

MLX82 mlx82 at fastwebnet.it
Tue Jul 29 07:12:03 MDT 2008


Alan,

regarding GlobalAssemble, I have correctly set two different maps for 
the rectangular matrices Bblock and Cblock.
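
For Bblock the call has this form (a minimal sketch with made-up names: 
mapRow, mapDomain, mapRange and maxEntriesPerRow are placeholders; the 
domain-map describes the vectors the matrix is applied to, the range-map 
the result vectors):

    Epetra_FECrsMatrix Bblock(Copy, mapRow, maxEntriesPerRow);
    // ... InsertGlobalValues / SumIntoGlobalValues loop ...
    // For a rectangular matrix the domain-map and range-map must be
    // passed explicitly, since they differ from the row-map in general.
    Bblock.GlobalAssemble(mapDomain, mapRange);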

About the construction of the graph of Ablock, I have found a warning 
message on the instruction Agraph->InsertGlobalIndices(...). The 
warning code is 3, and in the Doxygen manual I have read that: "If the 
graph is a 'View' mode graph, then a positive warning code of 3 will be 
returned if the specified row already exists." This is strange, because 
I am constructing the graph in Copy mode!

Maybe I have misunderstood how to assemble a graph correctly: in my code 
Arows is an int vector, where each position corresponds to a row of the 
matrix and contains the total number of nonzero entries of that row. 
With the second instruction (Agraph->InsertGlobalIndices(i, 3*nLocal, 
intVector)) I give, for each row i, the column indices (intVector) of 
the nonzero entries, where 3*nLocal is again the total number of 
nonzero entries of that row (the length of intVector). I hope that all 
of this is correct.
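
Reduced to a minimal sketch, the pattern I use is the following (Arows, 
nLocal and intVector stand for the quantities described above; the 
return-code check is only there to catch the warning):

    // Arows[k] = number of nonzero entries of local row k
    Agraph = new Epetra_FECrsGraph(Copy, *mapV, Arows);
    for (int i = mapV->MinMyGID(); i <= mapV->MaxMyGID(); ++i)
    {
        // ... fill intVector with the 3*nLocal global column indices of row i ...
        int ierr = Agraph->InsertGlobalIndices(i, 3*nLocal, intVector);
        if (ierr != 0) // positive codes are warnings, negative codes are errors
            std::cout << "InsertGlobalIndices(" << i << ") returned "
                      << ierr << std::endl;
    }
    Agraph->GlobalAssemble(*mapV, *mapV);
    Agraph->OptimizeStorage();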

There is also another error message during the execution of the test 
code with the serial part. This error does not stop the execution of 
the code, but perhaps you can give me some advice on how to solve the 
problem:

Epetra ERROR -1, 
../../../../trilinos-8.0.7/packages/epetra/src/Epetra_CrsGraph.cpp, line 
1245
Epetra ERROR -1, 
../../../../trilinos-8.0.7/packages/epetra/src/Epetra_CrsMatrix.cpp, 
line 1004
Epetra ERROR -1, 
../../../../trilinos-8.0.7/packages/epetraext/src/EpetraExt_MatrixMatrix.cpp, 
line 1320

This error appears after the operation 
EpetraExt::MatrixMatrix::Add(*Ablock,false,1.0,AblockS,-1.0); which I 
use to see whether there is any difference between Ablock and AblockS.
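
For completeness, the check I am trying to perform looks like this (a 
minimal sketch; as far as I understand, Add modifies its fourth argument 
in place, i.e. AblockS = -1.0*AblockS + 1.0*Ablock):

    int err = EpetraExt::MatrixMatrix::Add(*Ablock, false, 1.0, AblockS, -1.0);
    if (err != 0)
        std::cout << "MatrixMatrix::Add returned " << err << std::endl;
    else // AblockS now holds the difference, so its norm should be ~0
        std::cout << "||Ablock - AblockS||_inf = " << AblockS.NormInf() << std::endl;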

Thanks for your help.

   Cristiano Malossi

Williams, Alan B wrote:
> Cristiano,
>  
> Regarding passing the maps to GlobalAssemble, make sure you understand 
> which maps are which. Using mapV for both domain-map and range-map 
> will not be correct in parallel or if the matrix is rectangular.
>  
> When you construct the matrix graph you loop from mapV->MinMyGID() to 
> mapV->MaxMyGID(), which gives you the locally-owned portion of the 
> row-space. But when you fill the matrix you loop from 0 to 
> miniGrid->getElementsNumber(). Is that the locally-owned portion of 
> your elements? To gain an advantage from pre-constructing the graph, 
> you need to make sure the graph contains the same set of rows and 
> column-indices that you later put into the matrix.
>  
> When you fill the matrix, you should check the return-value of the 
> SumIntoGlobalValues function. If it is non-zero, it means that either 
> extra allocation is being done, or that some coefficients are being 
> excluded from the fill (in other words not being put into the matrix).
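>  
> For example, something along these lines (a sketch that reuses the 
> names from your code):
>  
>     int err = Ablock->SumIntoGlobalValues(4, Iblock, 4, Iblock, MatLoc,
>                                           Epetra_FECrsMatrix::ROW_MAJOR);
>     if (err != 0)
>     {
>         // non-zero: either extra allocation happened, or some coefficients
>         // were not put into the matrix because they do not fit the graph
>         std::cout << "SumIntoGlobalValues returned " << err << std::endl;
>     }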
>  
> Alan
>  
>
>     ------------------------------------------------------------------------
>     *From:* trilinos-users-bounces at software.sandia.gov
>     [mailto:trilinos-users-bounces at software.sandia.gov] *On Behalf Of *MLX
>     *Sent:* Sunday, July 27, 2008 5:45 AM
>     *To:* Williams, Alan B
>     *Cc:* trilinos-users at software.sandia.gov
>     *Subject:* Re: [Trilinos-Users] Problem with Epetra FECrsMatrix
>     during the assembly phase
>
>     Alan,
>      
>     Thanks for the advice on GlobalAssemble. I decided to pass
>     both maps because in the complete code I have two other matrices
>     (Bblock and Cblock) which are rectangular and need that kind of
>     call; so, to be conservative, I also pass the maps to
>     the GlobalAssemble of the Ablock matrix (note that these two
>     matrices also assemble really slowly when I use more than one process).
>      
>     Here is a portion of the same code with the original structure. I
>     assemble Ablock in two steps (two different parts of the NS problem).
>     The slow instruction is Ablock->SumIntoGlobalValues(...). The rest
>     of the code is fast.
>      
>     There is also a really strange thing that I want to tell you: as
>     you can see, I have two different SumIntoGlobalValues calls. The
>     first one inserts 4x4 blocks, while the second one inserts 16x16
>     blocks. In the code below, 99% of the assembly time is taken by
>     the SumIntoGlobalValues call on the 4x4 blocks, while only 1% is
>     spent on the 16x16 instruction. If I invert the order of the
>     steps, doing the 16x16 blocks first and then the 4x4 blocks, the
>     situation completely changes and 99% of the time is spent on the
>     16x16 part of the code.
>      
>     I hope you can help me solve this!
>      
>     Thanks, again,
>      
>        Cristiano Malossi
>      
>     -----------------------------------------------------------------------------------
>     comm = new Epetra_MpiComm(MPI_COMM_WORLD);
>     mapV = new Epetra_Map(NVdof,0,*comm);
>
>     ...
>      
>     // Construction of the MATRIX GRAPH ================
>     //==================================================
>     // Construction of the graph
>     Agraph = new Epetra_FECrsGraph(Copy,*mapV,Arows);
>     for(int i=mapV->MinMyGID() ; i<=mapV->MaxMyGID() ; ++i)
>     {   
>         ...
>         ...
>        
>         Agraph->InsertGlobalIndices(i, 3*nLocal, intVector);
>     }
>
>     Agraph->GlobalAssemble(*mapV,*mapV);
>     Agraph->OptimizeStorage();
>
>
>     // Construction of the MATRIX ======================
>     //==================================================
>     Ablock = new Epetra_FECrsMatrix(Copy,*Agraph);
>
>     if(comm->MyPID() == 0)
>     {
>         
>         // --- FIRST STEP ---
>         for(int r=0;r<miniGrid->getElementsNumber();++r)
>         {
>                 ...
>                
>                 for(int i=0;i<3;++i) // 3 velocity components
>                 {                       
>                     ...
>                     //Create Matloc and Iblock
>                     ...
>                    
>                    
>     Ablock->SumIntoGlobalValues(4,Iblock,4,Iblock,MatLoc,Epetra_FECrsMatrix::ROW_MAJOR);
>                 }
>          }
>        
>         // --- SECOND STEP ---
>         for(int r=0;r<miniGrid->getElementsNumber();++r)
>         {
>                 ...
>                
>                 for(int i=0;i<3;++i) // 3 velocity components
>                 {                        
>                        for(int j=0;j<3;++j) // another loop
>                        {       
>                              //Create Matloc, Iblock and Jblock
>                              ...
>                              ... 
>      
>                             
>     Ablock->SumIntoGlobalValues(16,Iblock,16,Jblock,MatLoc,Epetra_FECrsMatrix::ROW_MAJOR);
>                        }
>                 }
>          }
>     }
>
>     Ablock->GlobalAssemble(*mapV,*mapV);
>     Ablock->OptimizeStorage();
>
>     // Boundary conditions ==============================
>     //===================================================
>
>     // Finally I have a for loop with this kind of call for the boundary conditions
>     Ablock->ExtractGlobalRowCopy(row,lungNV,numEntries,values,indices);
>     Ablock->ReplaceGlobalValues(row,numEntries,zero,indices);
>      
>      
>      
>      
>      
>
>         ----- Original Message -----
>         *From:* Williams, Alan B <mailto:william at sandia.gov>
>         *To:* MLX82 <mailto:mlx82 at fastwebnet.it>
>         *Cc:* trilinos-users at software.sandia.gov
>         <mailto:trilinos-users at software.sandia.gov>
>         *Sent:* Saturday, July 26, 2008 1:02 AM
>         *Subject:* RE: [Trilinos-Users] Problem with Epetra
>         FECrsMatrix during the assembly phase
>
>         Cristiano,
>         In the MPI assembly case where it is really slow, do you know
>         which matrix functions are using most of the time?
>         In the code you sent, the only potential error I notice is
>         that you are passing mapV as both arguments to the
>         GlobalAssemble function. The GlobalAssemble function expects a
>         domain-map and a range-map. Passing your row-map for both of
>         these will generally not be correct. If your matrix is square
>         (it usually is for finite-element problems) then it is best to
>         pass nothing for these arguments, and let Epetra determine the
>         domain-map and range-map internally.
>          
>         I would like to see the original code where the MPI matrix
>         assembly was really slow.
>          
>         Alan
>          
>
>             ------------------------------------------------------------------------
>             *From:* trilinos-users-bounces at software.sandia.gov
>             [mailto:trilinos-users-bounces at software.sandia.gov] *On
>             Behalf Of *MLX82
>             *Sent:* Friday, July 25, 2008 12:56 PM
>             *To:* Williams, Alan B
>             *Cc:* trilinos-users at software.sandia.gov
>             *Subject:* Re: [Trilinos-Users] Problem with Epetra
>             FECrsMatrix during the assembly phase
>
>             Alan, thanks for your help!
>
>             Here is a portion of my code containing only the
>             essential Trilinos parts. All of this is in a C++ class
>             that assembles and solves a FEM linear system from an NS
>             problem. The class is created on all processes involved.
>             Note that this is the code that I am using for the test
>             now. In the original code (where the MPI matrix assembly
>             was really slow) all operations involving AgraphS,
>             AblockS, mapVS and Export were not present! If you want a
>             version of the code without the serial matrix part, I
>             will send it to you.
>
>             Thanks again!
>
>             ----------------------------------------------------------------------------------------
>             comm = new Epetra_MpiComm(MPI_COMM_WORLD);
>             mapV = new Epetra_Map(NVdof,0,*comm); // Velocity map
>             // (NVdof is the number of degrees of freedom of the velocity field)
>
>             ...
>
>             // Construction of the SERIAL MATRIX GRAPH ================
>             //=========================================================
>
>             int NumMyElementsV = 0;
>             if(comm->MyPID() == 0)
>                 NumMyElementsV = NVdof;
>             Epetra_Map mapVS(NVdof,NumMyElementsV,0,*comm);
>             Epetra_FECrsGraph AgraphS(Copy,mapVS,Arows);
>             for(int i=mapVS.MinMyGID() ; i<=mapVS.MaxMyGID() ; ++i)
>             {   
>                 ...
>                 ...
>                
>                 AgraphS.InsertGlobalIndices(i, 3*nLocal, intVector);
>             }   
>             AgraphS.GlobalAssemble(mapVS,mapVS);
>             AgraphS.OptimizeStorage();
>
>             // Construction of the SERIAL MATRIX ======================
>             //=========================================================
>
>             Epetra_FECrsMatrix AblockS(Copy,AgraphS); // <-- THIS IS THE SERIAL MATRIX
>             if(comm->MyPID() == 0)
>             {
>                 for(int r=0;r<miniGrid->getElementsNumber();++r)
>                     {
>                         ...
>                        
>                         for(int i=0;i<3;++i) // 3 velocity components
>                         {                       
>                             ...
>                             //Create Matloc and Iblock
>                             ...
>                            
>                            
>             AblockS.SumIntoGlobalValues(4,Iblock,4,Iblock,MatLoc,Epetra_FECrsMatrix::ROW_MAJOR);
>                             // In the original code (where Ablock is
>                             // used in place of AblockS), the above
>                             // operation was the slow part of the code!!!
>                         }
>                     }
>             }
>
>             AblockS.GlobalAssemble(mapVS,mapVS);
>
>             // Construction of the MPI MATRIX GRAPH ===================
>             //=========================================================
>             // Construction of the graph
>             Agraph = new Epetra_FECrsGraph(Copy,*mapV,Arows);
>             for(int i=mapV->MinMyGID() ; i<=mapV->MaxMyGID() ; ++i)
>             {   
>                 ...
>                 ...
>                
>                 Agraph->InsertGlobalIndices(i, 3*nLocal, intVector);
>             }
>
>             Agraph->GlobalAssemble(*mapV,*mapV);
>             Agraph->OptimizeStorage();
>
>             // COPY THE SERIAL MATRIX TO THE MPI MATRIX ===============
>             //=========================================================
>
>             Ablock = new Epetra_FECrsMatrix(Copy,*Agraph); // <-- THIS IS THE MPI MATRIX
>
>             Epetra_Export ExpV(mapVS,*mapV);
>             Ablock->Export(AblockS, ExpV, Add);
>
>             Ablock->GlobalAssemble(*mapV,*mapV);
>             Ablock->OptimizeStorage();
>
>             // Boundary conditions ====================================
>             //=========================================================
>
>             // Finally I have a for loop with this kind of call for the boundary conditions
>             Ablock->ExtractGlobalRowCopy(row,lungNV,numEntries,values,indices);
>             Ablock->ReplaceGlobalValues(row,numEntries,zero,indices);
>
>
>
>
>             Williams, Alan B wrote:
>>             Hi Cristiano,
>>
>>             Assembly of a matrix usually performs very well in parallel. Time decreases as more processors are used for the same matrix.
>>
>>             When assembling a matrix with a distributed finite element problem, FECrsMatrix places local contributions directly into the underlying CrsMatrix, and holds overlapping contributions (contributions related to shared nodes, i.e. nodes on the boundary between processors). The overlapping contributions get communicated to the "owning processor" when the GlobalAssemble function is called.
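>>
>>             For example (a minimal sketch; rowMap, myGID and ghostGID are made-up names), a process may insert a contribution into a row it does not own, and GlobalAssemble communicates it to the owning process:
>>
>>             Epetra_FECrsMatrix A(Copy, rowMap, 3);
>>             double val = 1.0;
>>             int    col = 0;
>>             A.InsertGlobalValues(myGID,    1, &val, &col); // locally-owned row
>>             A.InsertGlobalValues(ghostGID, 1, &val, &col); // row owned by another process
>>             A.GlobalAssemble(); // ships and sums the off-process contribution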
>>
>>             Can you send a portion of your code? I would like to see the code that creates your matrix and defines the row-map, and also the code that inserts or sums global values into the matrix.
>>
>>             Alan
>>
>>
>>               
>>>             -----Original Message-----
>>>             From: trilinos-users-bounces at software.sandia.gov
>>>             [mailto:trilinos-users-bounces at software.sandia.gov] On Behalf Of MLX82
>>>             Sent: Friday, July 25, 2008 9:48 AM
>>>             To: trilinos-users at software.sandia.gov
>>>             Subject: [Trilinos-Users] Problem with Epetra FECrsMatrix
>>>             during the assembly phase
>>>
>>>             Hi,
>>>
>>>             I am Cristiano Malossi from MOX, (Department of Mathematics,
>>>             Politecnico di Milano, Italy). I am having some problems during the
>>>             assembly phase in a FEM code.
>>>
>>>             If I run the code on a single CPU it takes about 50 seconds to
>>>             assemble a 500k DOF matrix, which is a reasonable time. The same
>>>             case takes about 2200 seconds to create the matrix if I use two
>>>             or more CPUs: that is strange, as I always use only one core to
>>>             assemble the matrix (which in this case has a map with rows
>>>             distributed over two or more processes), and so I think the
>>>             assembly time should be of about the same magnitude. Note that,
>>>             to assemble the matrix, I loop over the elements, inserting many
>>>             small blocks (4x4 or 16x16) with the instruction
>>>             InsertGlobalValues(...).
>>>
>>>             I have tried to solve the problem using graphs, but unfortunately
>>>             it seems that nothing has changed in the assembly phase (now I am
>>>             using SumIntoGlobalValues()), while (fortunately) I have seen a
>>>             noticeable performance improvement when generating the ILU
>>>             preconditioner and solving the linear system.
>>>
>>>             Now I am trying to find out whether the problem is related to the
>>>             redistribution of the rows of the matrix among the processes: to
>>>             do this I am using the same communicator, with a different
>>>             "serial map" (with all rows on process 0) to generate a "serial
>>>             matrix". After the assembly of this new matrix, I use these
>>>             commands to create a new distributed matrix:
>>>
>>>             Epetra_Export Exp(map_Serial, map_MPI);
>>>             MATRIX_MPI.Export(MATRIX_Serial, Exp, Add);
>>>
>>>             the result is that on some small cases (both serial and MPI) it
>>>             works fine, while on others (always MPI) it crashes. In any case,
>>>             the assembly time of the serial matrix seems fine (50 seconds)
>>>             and the time taken by the export process is very low (1 or 2
>>>             seconds).
>>>
>>>             Can anyone give me advice on how to speed up my assembly process?
>>>
>>>             Thanks for any help.
>>>
>>>                Cristiano Malossi
>>>
>>>
>>>
>>>
>>
>>               
>
