
The scenario: a file is read into an unsigned char buffer, the buffer is put into an istringstream, and its lines are iterated through.

istringstream data((char*)buffer);

char line[1024];
while (data.good()) {
    data.getline(line, 1024);

     [...]
}

if (
    data.rdstate() & (ios_base::badbit | ios_base::failbit)
) throw foobarException (

Initially, it was this foobarException that was being caught, which didn't say much because it was a very unlikely case: the file is /proc/stat, the buffer is fine, and only the first few lines are actually iterated this way before the loop is broken out of and the rest of the data is discarded. In fact, that clause had never fired previously. [1]

I want to stress the point about only the first few lines being used, and that while debugging, the buffer obviously still has plenty of data left in it before the failure, so nothing is hitting EOF.

I stepped through with a debugger to check the buffer was being filled from the file appropriately and each iteration of getline() was getting what it should, right up until the mysterious point of failure -- although since this was a fatal error, there was not much more information to get at that point. I then changed the above code to trap and report the error in more detail:

istringstream data((char*)buffer);
data.exceptions(istringstream::failbit | istringstream::badbit);

char line[1024];
while (data.good()) {
    try { data.getline(line, 1024); }
    catch (istringstream::failure& ex) {       

And suddenly things changed -- rather than catching and reporting an error, the process was dying through SIGABRT inside the try. A backtrace looks like this:

#0  0x00007ffff6b2fa28 in __GI_raise (sig=sig@entry=6)
    at ../sysdeps/unix/sysv/linux/raise.c:55
#1  0x00007ffff6b3162a in __GI_abort () at abort.c:89
#2  0x00007ffff7464add in __gnu_cxx::__verbose_terminate_handler ()
    at ../../../../libstdc++-v3/libsupc++/vterminate.cc:95
#3  0x00007ffff7462936 in __cxxabiv1::__terminate (handler=<optimized out>)
    at ../../../../libstdc++-v3/libsupc++/eh_terminate.cc:47
#4  0x00007ffff7462981 in std::terminate ()
    at ../../../../libstdc++-v3/libsupc++/eh_terminate.cc:57
#5  0x00007ffff7462b99 in __cxxabiv1::__cxa_throw (obj=obj@entry=0x6801a0, 
    tinfo=0x7ffff7749740 <typeinfo for std::ios_base::failure>, 
    dest=0x7ffff7472890 <std::ios_base::failure::~failure()>)
    at ../../../../libstdc++-v3/libsupc++/eh_throw.cc:87
#6  0x00007ffff748b9a6 in std::__throw_ios_failure (
    __s=__s@entry=0x7ffff7512427 "basic_ios::clear")
    at ../../../../../libstdc++-v3/src/c++11/functexcept.cc:126
#7  0x00007ffff74c938a in std::basic_ios<char, std::char_traits<char> >::clear
    (this=<optimized out>, __state=<optimized out>)
    at /usr/src/debug/gcc-5.3.1-20160406/obj-x86_64-redhat-linux/x86_64-redhat-linux/libstdc++-v3/include/bits/basic_ios.tcc:48
#8  0x00007ffff747a74f in std::basic_ios<char, std::char_traits<char> >::setstate (__state=<optimized out>, this=<optimized out>)
    at /usr/src/debug/gcc-5.3.1-20160406/obj-x86_64-redhat-linux/x86_64-redhat-linux/libstdc++-v3/include/bits/basic_ios.h:158
#9  std::istream::getline (this=0x7fffffffcd80, __s=0x7fffffffcd7f "", 
    __n=1024, __delim=<optimized out>)
    at ../../../../../libstdc++-v3/src/c++98/istream.cc:106
#10 0x000000000041225a in SSMlinuxMetrics::cpuModule::getLevels (this=0x67e040)
    at cpuModule.cpp:179

cpuModule.cpp:179 is the try { data.getline(line, 1024) }.

According to these couple of questions:

It sounds like there are really only two possibilities here:

  1. I've gone out of bounds somewhere and corrupted the istringstream instance.

  2. There's a bug in the library.

#2 seems unlikely, and I can't find a case for #1. For example, run under valgrind, there are no errors before the abort:

==8886== Memcheck, a memory error detector
==8886== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==8886== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==8886== Command: ./monitor_demo
==8886== 
terminate called after throwing an instance of 'std::ios_base::failure'
  what():  basic_ios::clear
==8886== 
==8886== Process terminating with default action of signal 6 (SIGABRT)
==8886==    at 0x5D89A28: raise (raise.c:55)
==8886==    by 0x5D8B629: abort (abort.c:89)

And since (of course) "it has been working fine up until now", I'm stumped.

Beyond squinting at code and trying to isolate paths until I find the problem or have an SSCCE demonstrating a bug, is there anything I'm ignorant of that might provide a quick solution?


[1] The project is an incomplete one I've come back to after a few months, during which time I know glibc was upgraded on the system.

CodeClown42
  • I bet it's a gcc 5 ABI change problem – strangeqargo May 23 '16 at 15:17
  • Maybe some line is longer than 1024? (getline sets failbit then.) It could happen in some /proc files. The second crash inside the try could be caused by catching via reference rather than const reference. – Hcorg May 23 '16 at 15:44
  • @strangeqargo I think you are correct, see my answer (the system had been up for 8 days and not rebooted subsequent to the libc update). – CodeClown42 May 23 '16 at 16:09

1 Answer


I believe strangeqargo's guess, that it was caused by an ABI incompatibility introduced by the libc upgrade, is correct. The system had been up for 8 days and the update had occurred during that time.

After a reboot, and with no changes whatsoever to the code, it compiles and runs without error. I tested on another system as well, with the same result.

Probably the moral is if you notice glibc has been updated, reboot the system...

CodeClown42