0

referring to http://www.linuxatemyram.com/ where it describes how disk i/o is buffered in memory. I'd like to account for this, and undo it actually, to measure how long it takes to write various amounts of data to file. What i am looking to do is understand run times based on i/o of two scenarios:

  1. to memory mounted as a ramdisk via mount -t tmpfs -o size=500g tmpfs /ramdisk
  2. writing data to /scratch/ where that is a 900GB seagate sas hard disk formatted as EXT3 or XFS

my program when writing files to hard disk is quite quick, and i'm sure it's because the linux operating system is buffering that data in RAM making the disk i/o transparent... because after my program completes if at the prompt i do a rm temp_speed* it takes a minute or more for that to happen.

For example if the program is writing 10gb, it takes 7 seconds to complete writing to ramdisk and 20 seconds to complete writing to /scratch. But I can do a rm temp* after writing to ramdisk and that completes within 1-3 seconds, whereas if it's on scratch the remove command takes over 90 seconds to complete.

anyone know how i can write the program below to know when disk i/o has fully completed, within the program? thanks.

Be forewarned, if you try running this code you risk quickly filling up your hard drive and/or memory and crashing your system.

# include <stdio.h>
# include <stdlib.h>
# include <time.h>

# define MAX_F  256

/* WARNING: running this code you risk quickly filling up */
/*          your hard drive and/or memory and crashing your system.  */

/* compile with -O0  no optimization */

long int get_time ( void )
{
   long int t;

   t = (long int) time( NULL );

   return( t );
}

void diff_time ( long int *start, long int *finish )
{
   time_t s, e;
   long int diff;
   long int h = 0, m = 0;

   s = (time_t) *start;
   e = (time_t) *finish;

   diff = (long int) difftime( e, s );

   h = diff / 3600;

   diff -= ( h * 3600 );

   m = diff / 60;

   diff -= ( m * 60 );

   printf(" problem runtime (h:m:s) = %02ld:%02ld:%02ld\n", h, m, diff );
   printf("\n");
}

int main ( int argc, char *argv[] )
{
    FILE *fp[MAX_F];
    char fname[MAX_F][64];
    long int a, i, amount, num_times, num_files;
    long int maxfilesize;
    long int start_time, end_time;
    double ff[512];     /* 512 doubles = 4096 = 4k worth of data */


    if ( argc != 3 )
    {
       printf("\n   usage: readwrite_speed  <total amount in gb> <max file size in gb>\n\n");
       exit( 0 );
    }

    system( "date" );

    amount = atol( argv[1] );
    maxfilesize = atol( argv[2] );

    if ( maxfilesize > amount )
       maxfilesize = amount;

    num_files = amount / maxfilesize;

    if ( num_files > MAX_F )
    {
       printf("\n   increase max # files abouve %d\n\n", MAX_F );
       exit( 0 );
    }

    num_times = ( amount * 1024 * 1024 * 1024 ) / 4096;

    num_times /= num_files;

    printf("\n");
    printf("   amount = %ldgb, num_times = %ld, num_files = %ld\n", amount, num_times, num_files );

    printf("\n");

    for ( i = 0; i < num_files; i++ )
       sprintf( fname[i], "temp_speed%03d", i );

    start_time = get_time();

    for ( i = 0; i < num_files; i++ )
    {
       fp[i] = fopen( fname[i], "wb" );
       if ( fp[i] == NULL )
           printf("   can't write binary %s\n", fname[i] );
    }

    for ( i = 0; i < 512; i++ )
       ff[i] = rand() / RAND_MAX;

    /* 1 gb = 262,144 times writing 4096 bytes */

    for ( a = 0; a < num_times; a++ )
    {
       for ( i = 0; i < num_files; i++ )
       {
          fwrite( ff, sizeof( double ), 512, fp[i] );
          fflush( fp[i] );
       }
    }

    for ( i = 0; i < num_files; i++ )
       fclose( fp[i] );

    end_time = get_time();

    diff_time( &start_time, &end_time );

    system( "date" );
}
ron
  • 889
  • 5
  • 18
  • 1
    First, don't use `fopen()` and `fwrite()`. Use `open()` and `write()`. That way there's no need for the overhead of `fflush()`, and you can use `O_DIRECT` to bypass the page cache. See the [`open.2` man page](http://man7.org/linux/man-pages/man2/open.2.html) for details on using `O_DIRECT` - and pay attention to them. Direct IO on Linux is a finicky beast. Finally, what are you trying to measure? How *fast* your disk can move data, or how many separate IO operations/second it can support? Those are not the same thing, especially for a physical, spinning disk. – Andrew Henle Apr 19 '16 at 18:16
  • looking a simple and reasonable way to to quantify and show anyone (non-computer minded people) the affect of disk i/o when running a program and whether that program accesses data from local ram or ramdisk, to local hard disk, to network share storage, and so on. And to show the difference between having to worry about 1 gb of data or less, to 500gb and more and the time scale involved. – ron Apr 20 '16 at 13:30
  • i am dealing with a few servers having 256gb and 512gb of ram, with only a single 300gb and/or 600gb 2.5" hard disk. the linux os (SLES11.4) is amazingly good at using ram to make up for slow disk write, and that's what is preventing me from getting a true elapsed time of how long it takes to write N gigabytes to my spinning hard drive. – ron Apr 20 '16 at 13:35

1 Answers1

0

First, look here.
You cannot know data is on disk with 100% guarantee.
You can use fsync to tell the kernel to flush data to disk. But disks, and disk controllers, have their own caches, so even the kernel cannot always know when it's 100% done.

Community
  • 1
  • 1
BitWhistler
  • 1,417
  • 6
  • 12