
In the following code I'm trying to find the frequencies of the rows in fileA that have the same value in the second column (each row has two columns, both integers). Sample of fileA:

1   22
8   3
9   3

I have to write the output in fileB like this:

22   1
3    2

This is because element 22 appears once in the second column (and 3 appears twice).

fileA is very large (30 GB), and there are 41,000,000 elements in it (in other words, fileB has 41,000,000 rows). This is the code that I wrote:

void function(){

    unsigned long int size = 41000000;
    int* inDeg = new int[size];

    for(int i=0 ; i<size; i++)
    {
        inDeg[i] = 0;
    }

    ifstream input;
    input.open("/home/fileA");

    ofstream output;
    output.open("/home/fileB");

    int a,b;

    while(!input.eof())
    {
        input>>a>>b;
        inDeg[b]++; //<------getting error here.
    }
    input.close();

    for(int i=0 ; i<size; i++)
    {
        output<<i<<"\t"<<inDeg[i]<<endl;
    }

    output.close();
    delete[] inDeg;
}

I'm getting a segmentation fault on the second line of the while loop, on the 547,387th iteration. I have already assigned 600 MB to the stack based on this. I'm using gcc 4.8.2 (on Mint 17 x86_64).


Solved

I analysed fileA thoroughly. As hyde mentioned, the problem wasn't with the memory allocation; the segfault was caused by wrong indexing. Changing the size to 61,500,000 solved my problem.
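A minimal sketch of the kind of scan that finds the largest second-column value (and therefore the array size that is actually needed); the path is the one from the code above:

    #include <fstream>
    #include <iostream>

    // One pass over fileA, tracking the largest value seen in the second
    // column; the counting array needs at least maxB + 1 elements.
    int main()
    {
        std::ifstream input("/home/fileA");

        long long a, b;
        long long maxB = -1;

        while (input >> a >> b)        // stops cleanly at end of file
        {
            if (b > maxB)
                maxB = b;
        }

        std::cout << "largest second-column value: " << maxB << "\n";
        std::cout << "required array size: " << (maxB + 1) << "\n";
        return 0;
    }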

wastepaper
  • Array indexing starts at 0, not 1. `inDeg[size]` does not point into the memory you've allocated. And `eof()` is set *after* you read, not before. You're checking it in the wrong place. – Cameron Oct 16 '14 at 14:21
  • There are no 0 elements for the b value in the file and they're all less than 41000000, right? – Marco A. Oct 16 '14 at 14:22
  • What is `b` when the crash happens? – crashmstr Oct 16 '14 at 14:23
  • @Cameron: indexing of the array is not the issue here; I rewrote the code with correct indexing and still got the same problem. @crashmstr: actually the real value of size in my code is 40171637, and at the time of the error b = 40172544. – wastepaper Oct 16 '14 at 14:27
  • 1
    `while(!input.eof())` this is wrong http://stackoverflow.com/questions/5605125/why-is-iostreameof-inside-a-loop-condition-considered-wrong – Neil Kirk Oct 16 '14 at 14:41
  • Are you looking for consecutive occurrences or total occurrences? – molbdnilo Oct 16 '14 at 14:50
  • Following @crashmstr's comment I wrote some code to find the maximum value of the second column. It seems that the size of the array should be 61,500,000 instead of 41,000,000. I'm new here; what should I do, delete the question or edit it? – wastepaper Oct 16 '14 at 14:54
  • @wastepaper: Nothing wrong with editing in the problem and solution you discovered (if the solution you find matches the question you originally asked, you can even answer your own question). Note that your `while (!input.eof())` loop is still broken, even if it happens not to crash most of the time. – Cameron Oct 16 '14 at 15:19

2 Answers


In the statement:

while(!input.eof())
{
   input>>a>>b; 
   inDeg[b]++;
}

Is b the index of your array?

When you read in the values `1 22`, you are discarding the 1 and incrementing the value at slot 22 in your array.

You should check the range of b before incrementing the value at inDeg[b]:

  while (input >> a >> b)
  {
    if ((b >= 0) && (b < size))
    {
      int c = inDeg[b];
      ++c;
      inDeg[b] = c;
    }
    else
    {
      std::cerr << "Index out of range: " << b << "\n";
    }
  }
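
For illustration, here is a rough sketch of how that range check could sit inside the full function from the question, with the raw new[]/delete[] swapped for a std::vector; the paths and the enlarged size of 61,500,000 come from the question and its update, and the snippet is untested against the real fileA:

    #include <fstream>
    #include <iostream>
    #include <vector>

    // Sketch only: the question's function with the range check added and the
    // manual new[]/delete[] replaced by std::vector.
    void function()
    {
        const long long size = 61500000;        // must exceed the largest b in fileA
        std::vector<int> inDeg(size, 0);        // counts, zero-initialised

        std::ifstream input("/home/fileA");
        std::ofstream output("/home/fileB");

        long long a, b;
        while (input >> a >> b)                 // also fixes the eof() loop
        {
            if (b >= 0 && b < size)
                ++inDeg[b];
            else
                std::cerr << "Index out of range: " << b << "\n";
        }

        for (long long i = 0; i < size; i++)
            output << i << "\t" << inDeg[i] << "\n";
    }

Reading with `while (input >> a >> b)` also replaces the broken `eof()` condition, so the last pair in the file is not counted twice when the final read fails.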
Thomas Matthews

You are allocating too huge an array on the heap. It's a memory thing; your heap can't take that much space.

You should split your input and output into smaller parts, for example with a loop that processes 100k at a time, deletes them, and then does the next 100k.
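
A rough sketch of that idea, assuming the file paths from the question, a placeholder chunk size, and a known maximum value for the second column, might look like this:

    #include <fstream>
    #include <vector>

    // Sketch of multi-pass counting: each pass keeps counters for only one
    // slice of the value range and re-reads fileA, so memory use is bounded
    // by 'chunk'. maxValue is the largest value in the second column.
    void countInChunks(long long maxValue)
    {
        const long long chunk = 100000;              // counters held per pass
        std::ofstream output("/home/fileB");

        for (long long lo = 0; lo <= maxValue; lo += chunk)
        {
            std::vector<int> counts(chunk, 0);
            std::ifstream input("/home/fileA");

            long long a, b;
            while (input >> a >> b)
                if (b >= lo && b < lo + chunk)
                    ++counts[b - lo];

            for (long long i = 0; i < chunk && lo + i <= maxValue; i++)
                output << (lo + i) << "\t" << counts[i] << "\n";
        }
    }

Note that this re-reads the 30 GB file once per chunk, so the chunk should be made as large as memory allows.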

In such cases try exception handling; this is an example snippet of how to catch the exception thrown when an allocation is too large:

   int ii;
   double *ptr[5000000];

   try
   {
      for( ii=0; ii < 5000000; ii++)
      {
         ptr[ii] = new double[5000000];
      }
   }
   catch ( bad_alloc &memoryAllocationException )
   {
      cout << "Error on loop number: " << ii << endl;
      cout << "Memory allocation exception occurred: "
           << memoryAllocationException.what()
           << endl;
   }
   catch(...)
   {
      cout << "Unrecognized exception" << endl;
   }
Etixpp
  • If it doesn't fit in the heap, what makes you think it will fit easier in the data section? In any case, it's only ~156MB and there shouldn't be any problem allocating on the heap. And if there were, `new` would throw `std::bad_alloc`. – Cameron Oct 16 '14 at 14:35
  • When I use `int inDeg[size];` I get the segmentation fault at the beginning of the first for loop. – wastepaper Oct 16 '14 at 14:36
  • @Cameron I said try; I wasn't sure if it was bigger or smaller than the heap, but thanks for the information, it's nice to know. Do you know how big the heap basically is, and what it depends on? I will update my answer. – Etixpp Oct 16 '14 at 14:38
  • Are you suggesting making it a global variable (bad) or allocating it on the *stack* (which is several orders of magnitude *smaller* than the heap)? – hyde Oct 16 '14 at 14:39
  • @hyde I suggested trying to create it on the stack; it was bad, I already updated my answer. – Etixpp Oct 16 '14 at 14:42
  • I don't think this is the problem. new would throw an exception if there was not enough memory. – fhsilva Oct 16 '14 at 14:42
  • @Etixpp: It's not fixed. It depends on the OS and how much free memory is available. But at the end of the day, it's all [mapped into the same address space](http://duartes.org/gustavo/blog/post/anatomy-of-a-program-in-memory/) anyway :-) – Cameron Oct 16 '14 at 14:43
  • @fhsilva it works up to the 547387th iteration; this is a clear sign of too large an allocation. I see no other possible reason in this small code snippet. – Etixpp Oct 16 '14 at 14:43
  • @Etixpp: The allocation either succeeds, or doesn't. It doesn't allocate half the memory and then cause access violations when you access it later on. – Cameron Oct 16 '14 at 14:45
  • @fhsilva At least Linux by default overcommits memory, so allocation itself doesn't fail unless it is ridiculously big. It's just virtual memory after all. Failure will happen when a process actually tries to use a new memory page, and kernel doesn't have any real memory to map to it (at that point OOM killer kicks in, too). – hyde Oct 16 '14 at 14:48
  • Following @crashmstr's comment I wrote some code to find the maximum value of the second column. It seems that the size of the array should be 61,500,000 instead of 41,000,000. After correcting the size, I'm not facing that error anymore. I'm new here; what should I do, delete the question or edit it? – wastepaper Oct 16 '14 at 14:55
  • @Cameron look at hyde – Etixpp Oct 16 '14 at 14:57
  • 1
    ~160MB allocation should be no big deal, btw. Also, setting it all to 0 does not crash, so clearly allocation was fine. – hyde Oct 16 '14 at 15:03
  • @hyde: Even in the case of overcommit, though, wouldn't the OOM killer just kill the process outright? Or does it result in a segfault? – Cameron Oct 16 '14 at 15:15
  • @Cameron Read more about OOM for example [here](http://linux-mm.org/OOM_Killer). Anyway, OOM Killer doesn't kill "the" process, it kills "a" process. It's basically random which process happens to make the actual memory page access which can't be served, so killing that random process would not be a very useful way to deal with it. – hyde Oct 16 '14 at 18:09