For context: in my CFD simulation, the computational domain is divided into blocks. Each block has its own number of cells, and each cell holds various pieces of information. Blocks are distributed among processes by a domain decomposition algorithm to achieve a balanced load.
Meshing the computational domain is performed by process 0 only (I do not want to store the entire mesh on every process, as that would be disastrous in terms of memory use). The mesh is coded as a 1D array of blocks, each block being a complex derived type with allocatable components of other derived types:
    TYPE something
      integer :: i,j,k
    END TYPE something
    !------
    TYPE cell
      integer :: var1
      real, dimension(3) :: var2
      type(something), dimension(:), allocatable :: var3
    END TYPE cell
    !------
    TYPE block
      integer :: var4
      real :: var5
      type(cell), dimension(:), allocatable :: var6
    END TYPE block
and the mesh is defined as:
    TYPE(block), dimension(n) :: mesh
My idea is to use MPI_SCATTERV (each process handles a different number of blocks) from process 0 to distribute chunks of my array mesh to the other processes. At the end, I use an MPI_GATHERV to recover the entire domain if necessary. The problem is that I have to transfer complex derived types.
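To make the intended communication pattern concrete, here is a minimal sketch of the scatter/gather I have in mind, reduced to one integer per block so that a built-in MPI datatype is enough. The names block_ids, counts, displs and my_ids, the value n = 10, and the near-even split are assumptions for illustration only; in the real code the counts come from the domain decomposition and the payload is the derived type.

```fortran
program scatter_sketch
  use mpi
  implicit none
  integer, parameter :: n = 10           ! hypothetical total number of blocks
  integer :: ierr, rank, nprocs, p
  integer :: block_ids(n)                ! stand-in for mesh: one integer per block
  integer, allocatable :: counts(:), displs(:), my_ids(:)

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
  call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)

  ! Assumed split: as even as possible (the real one comes from the decomposition).
  allocate(counts(nprocs), displs(nprocs))
  do p = 1, nprocs
     counts(p) = n / nprocs
     if (p <= mod(n, nprocs)) counts(p) = counts(p) + 1
  end do
  ! Displacements are the running sum of the counts, starting at 0.
  displs(1) = 0
  do p = 2, nprocs
     displs(p) = displs(p-1) + counts(p-1)
  end do

  ! Only process 0 owns the full array (the send buffer is ignored elsewhere).
  if (rank == 0) block_ids = [(p, p = 1, n)]

  ! Each process receives its own chunk.
  allocate(my_ids(counts(rank+1)))
  call MPI_Scatterv(block_ids, counts, displs, MPI_INTEGER, &
                    my_ids, counts(rank+1), MPI_INTEGER,     &
                    0, MPI_COMM_WORLD, ierr)

  ! ... local work on my_ids ...

  ! Recover the whole array on process 0.
  call MPI_Gatherv(my_ids, counts(rank+1), MPI_INTEGER,       &
                   block_ids, counts, displs, MPI_INTEGER,    &
                   0, MPI_COMM_WORLD, ierr)

  call MPI_Finalize(ierr)
end program scatter_sketch
```

What I cannot figure out is what to put in place of MPI_INTEGER when the elements are of TYPE(block).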
I think I have to define an MPI derived type with MPI_TYPE_CREATE_STRUCT since I have non-homogeneous data. I also read about MPI_PACK, but it seems to be subject to memory overhead. Either way, I am stuck because I have to handle the allocatable arrays inside each derived type.
How can I define an MPI derived type that contains allocatable arrays? Of course, those arrays are allocated before the data is sent, but at compile time they are not. I need to use MPI_GET_ADDRESS to compute the offsets between the components, but I don't know how to do that with allocatable arrays. Do I need to switch to fixed-length arrays? Do I need to define an MPI type for each of the three types above and then construct a 'super' MPI derived type for block? I was thinking about looping over each cell of each block and sending the data one by one, but that does not seem the right way to do it, as it would require a huge number of MPI calls. I am looking for true 'block-block' communication.
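For reference, this is a minimal sketch of how I understand the offset computation for the fixed-size components of cell only. The name cell_fixed_type and the choice of var1 as the base address are my own assumptions, and the sketch deliberately leaves var3 out, since that is exactly the part I do not know how to describe.

```fortran
program cell_type_sketch
  use mpi
  implicit none

  type something
     integer :: i, j, k
  end type something

  type cell
     integer :: var1
     real, dimension(3) :: var2
     type(something), dimension(:), allocatable :: var3
  end type cell

  type(cell) :: c
  integer :: ierr
  integer :: cell_fixed_type             ! hypothetical handle name
  integer :: blocklens(2), types(2)
  integer(kind=MPI_ADDRESS_KIND) :: base, addr, displs(2)

  call MPI_Init(ierr)

  ! Offsets of the fixed-size components, measured from var1.
  call MPI_Get_address(c%var1, base, ierr)
  displs(1) = 0
  call MPI_Get_address(c%var2, addr, ierr)
  displs(2) = addr - base

  blocklens = [1, 3]                     ! one integer, three reals
  types     = [MPI_INTEGER, MPI_REAL]

  call MPI_Type_create_struct(2, blocklens, displs, types, &
                              cell_fixed_type, ierr)
  call MPI_Type_commit(cell_fixed_type, ierr)

  ! c%var3 is not described by this type: it is unallocated here, and once
  ! allocated its size and location are only known at run time, per cell.

  call MPI_Type_free(cell_fixed_type, ierr)
  call MPI_Finalize(ierr)
end program cell_type_sketch
```

Even if this is the right approach for var1 and var2, I do not see how to extend it to var3, and then to var6 one level up in block.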