[Trilinos-Users] Epetra on TACC's Stampede

Heroux, Mike MHeroux at CSBSJU.EDU
Sun May 11 19:57:38 MDT 2014


Gyorgy,

This is not an error I have seen before.  The NormInf method has two implementations for on-node computation (below the MPI level): one for OpenMP and one for sequential execution.
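To illustrate the two on-node paths, here is a minimal sketch. It is not Epetra's source: the function name local_norm_inf_sketch is hypothetical, the standard _OPENMP macro stands in for whatever build flag Epetra actually uses, and the sequential branch is a plain loop rather than a BLAS call. The reduction across MPI processes is omitted.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Hypothetical helper for illustration; this is not Epetra's source code.
// Returns max_i |x[i]| over the entries stored on this process.
double local_norm_inf_sketch(const std::vector<double>& x) {
    double result = 0.0;
    const long n = static_cast<long>(x.size());
#ifdef _OPENMP
    // Threaded path: each thread tracks its own maximum, then OpenMP combines them.
    #pragma omp parallel for reduction(max : result)
#endif
    // Without OpenMP the pragma is compiled out and this is an ordinary sequential loop.
    for (long i = 0; i < n; ++i) {
        result = std::max(result, std::fabs(x[i]));
    }
    return result;
}
```

Compiled without OpenMP support, only the sequential loop remains, which is the situation described for your build.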

You appear to be running without OpenMP enabled, which means that the BLAS function IDAMAX is called to find the index of the entry with the largest magnitude.

It appears that IDAMAX is then calling the AVX version of the same function (mkl_blas_avx_idamax in your trace).
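For reference, what IDAMAX is expected to compute can be sketched as follows. This is a plain reference loop, not MKL's code, and both function names (idamax_reference, norm_inf_from_idamax) are hypothetical. It follows the Fortran BLAS convention of a 1-based result, with 0 returned when n < 1 or incx <= 0.

```cpp
#include <cmath>
#include <cstddef>

// Reference semantics of BLAS IDAMAX (hypothetical helper, not MKL's implementation).
// Returns the 1-based position of the first element with the largest |x|,
// or 0 if n < 1 or incx <= 0.
int idamax_reference(int n, const double* x, int incx) {
    if (n < 1 || incx <= 0) return 0;
    int best = 1;
    double best_abs = std::fabs(x[0]);
    for (int i = 1; i < n; ++i) {
        // Reads element i of the strided vector; x must be valid for all n elements.
        const double a = std::fabs(x[static_cast<std::ptrdiff_t>(i) * incx]);
        if (a > best_abs) {
            best_abs = a;
            best = i + 1;
        }
    }
    return best;
}

// The sup-norm is then the magnitude of the entry at that position.
double norm_inf_from_idamax(int n, const double* x) {
    const int j = idamax_reference(n, x, 1);
    return j == 0 ? 0.0 : std::fabs(x[j - 1]);
}
```

The routine reads n entries starting at x, so a memory error inside it suggests that the pointer or the length it received was not what the caller intended.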

I would be very surprised if this is an Epetra error, since this is very stable and widely used code.  However, it could be an interface issue between Epetra and the AVX BLAS.

Mike

> On May 11, 2014, at 6:43 PM, "Gyorgy Matyasfalvi" <matyasfalvi at gmail.com> wrote:
> 
> Dear User Community:
> 
> I have already created a TACC ticket regarding this issue but I'm curious if anyone has successfully used Epetra on Stampede? 
> 
> The code below segfaults when computing the sup-norm:
> 
> **************************************************************************************
> #include <iostream>
> #include "mpi.h"
> #include "Epetra_MpiComm.h"
> #include "Epetra_Map.h"
> #include "Epetra_Vector.h"
> #include "Epetra_Version.h"
> #include "mkl_cblas.h"
> 
> #define N 1000
> 
> int main(int argc, char* argv[]) {
> 
> MPI_Init(&argc, &argv);
> 
> Epetra_MpiComm comm(MPI_COMM_WORLD);
> 
> std::cout<<Epetra_Version()<<std::endl<<std::endl<<"### TEST ###"<<std::endl<<std::endl;
> 
> Epetra_Map map(N,0,comm);
> 
> std::cout<<"Map created"<<std::endl;
> 
> Epetra_Vector x(map);
> 
> std::cout<<"Vector created"<<std::endl;
> 
> double norm;
> x.NormInf(&norm);
> 
> std::cout<<"sup-norm of x = "<<norm<<std::endl;
> 
> MPI_Finalize();
> 
> return 0;
> }
> 
> **************************************************************************************
> 
> 
> I get the following error message when debugging with ddt:
> 
> **************************************************************************************
> Process 0: 
> 
> Memory error detected in mkl_blas_avx_idamax from 
> /opt/apps/intel/13/composer_xe_2013.2.146/mkl/lib/intel64/libmkl_avx.so: 
> 
> null pointer dereference or unaligned memory access 
> 
> Note: the latter may sometimes occur spuriously if guard pages are enabled 
> 
> Tip: Use the stack list and the local variables to explore your program's 
> current state and identify the source of the error. 
> 
> **************************************************************************************
> 
> Thanks a lot! 
> Best,
> Gyorgy