Comments (6)
I was able to duplicate the problem on smog by increasing the number of hostdirs to 379 (same as cielo). By turning on the VERBOSE DEBUG compiler directive, I was able to see that nodes other than the rank 0 node were not seeing the full bitmap from MPI_Bcast.
What I discovered is that the number of bytes being transmitted was not 128 (the full bitmap) but 32. This statement in the code was the culprit:
int bitmap_bcast_sz = (BIT_ARRAY_LENGTH/8)/sizeof(MPI_UNSIGNED_CHAR);
This is the size used for the broadcast:
MPIBCAST(bitmap,bitmap_bcast_sz,MPI_UNSIGNED_CHAR,0,comm);
The problem is that sizeof(MPI_UNSIGNED_CHAR) is 4 (it is the size of the datatype handle, not of the data element), so the count was computed as (1024/8)/4 = 32, and I was only seeing 32 bytes being broadcast. So any hostdir > 256 (in the map) was not broadcast, and because there is no bounds checking, the code kept running until it found a bit set, which was of course an illegal hostdir. Example MPI_Bcast code I have seen does not take the size of an MPI type into account, so I am not sure why this was here in the first place; MPI must account for it internally.
Also, I am not sure why this worked previously on cielo. I browsed earlier versions of Cray MPICH with regard to the MPI_UNSIGNED_CHAR definition, and it does not appear to have changed.
from plfs-core.
Wow. Nice debugging.
from plfs-core.
The interesting thing here is that the whole point of MPI datatypes is that we don't have to handle this. MPI_Bcast takes into account the size of the type you're sending, since the second argument is the number of items of some MPI datatype.
Did the broadcast code change? It should broadcast 128 bytes if you ask it to broadcast 128 MPI_UNSIGNED_CHARs, just as it will transmit ten elements' worth of bytes if you ask it to send 10 MPI_LONG_DOUBLEs...
from plfs-core.
Yes, nice job indeed, Alfred!
I see this code in ad_plfs_open.c:
#define BIT_ARRAY_LENGTH MAX_HOSTDIRS
typedef char Bitmap;
Bitmap bitmap[BIT_ARRAY_LENGTH/8];
And, here is MAX_HOSTDIRS:
mallorca:src brettk$ grep MAX_HOSTDIRS *.h
plfs_internal.h:#define MAX_HOSTDIRS 1024
So, it seems to me that since Bitmap is a char, we are assuming a char is 8 bits. Sure, it is on the platforms we care about, but that is what CHAR_BIT (from limits.h) exists for; note that sizeof is no help here, since sizeof(char) is 1 by definition. So, we need to do this:
typedef char Bitmap;
Bitmap bitmap[BIT_ARRAY_LENGTH/CHAR_BIT];
Then, we need to do bounds checking so that we don't run off either end of bitmap when using it. And, we need to do the calculation properly to tell MPI how big bitmap is.
Alfred: Did you plan to make the code fixes?
Thanks,
Brett
from plfs-core.
Brett,
Thanks, yes, I planned on making the changes this morning. I just wanted to get some feedback before I made them. I also wanted to ask Nathan Hjelm about this – not necessarily before I make the changes.
Alfred
from plfs-core.
Fixed the broadcast size by calling MPI_Type_size() instead of using sizeof() on the MPI datatype handle. Added bounds checking when determining whether a hostdir bitmap bit is set. Fix merged into master.
from plfs-core.