I'm writing a C++ parser for a custom option file for an application. I have a loop that reads lines of the form option=value from a text file, where value must be converted to double. In pseudocode it does the following:

while(not EOF)
    statement <- read_from_file
    useful_statement <- remove whitespaces, comments, etc from statement
    equal_position <- find '=' in useful_statement
    option_str <- useful_statement[0:equal_position)
    value_str <- useful_statement[equal_position + 1:end)
    find_option(option_str) <- double(value_str)

To handle the string splitting and passing around to functions, I use std::string_view because it avoids excessive copying and clearly states the intent of viewing segments of a pre-existing std::string. I've done everything to the point where std::string_view value_str points to the exact part of useful_statement that contains the value I want to extract, but I can't figure out the way to read a double from an std::string_view.
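The splitting step described above can be sketched as follows. This is only an illustration, assuming whitespace and comments have already been stripped; `split_option` is a hypothetical helper name, not something from the original code.

```cpp
#include <optional>
#include <string_view>
#include <utility>

// Hypothetical helper: splits "option=value" at the first '='.
// Both returned views point into the caller's buffer, so that buffer
// must outlive them. Returns std::nullopt if there is no '='.
std::optional<std::pair<std::string_view, std::string_view>>
split_option(std::string_view statement) {
    const auto equal_position = statement.find('=');
    if (equal_position == std::string_view::npos) {
        return std::nullopt;
    }
    // substr(equal_position + 1) skips the '=' itself; it yields an
    // empty view when '=' is the last character.
    return std::pair{statement.substr(0, equal_position),
                     statement.substr(equal_position + 1)};
}
```

No characters are copied here; the remaining problem is turning the second view into a double.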

I know of std::stod which doesn't work with std::string_view. It allows me to write

double value = std::stod(std::string(value_str));

However, this is ugly because it converts to a string which is not actually needed, and even though it will presumably not make a noticeable difference in my case, it could be too slow if one had to read a huge amount of numbers from a text file.

On the other hand, atof won't work because I can't guarantee a null terminator. I could hack it by adding \0 to useful_statement when constructing it, but that will make the code confusing to a reader and make it too easy to break if the code is altered/refactored.

So, what would be a clean, intuitive and reasonably efficient way to do this?

Pavel_K
patatahooligan
  • 1
    Are you ok with using boost? I think you can do this with `boost::convert(stringview);`. I got it off of here... last comment on the page https://github.com/boostorg/convert/issues/29 – Millie Smith Aug 11 '17 at 14:38
  • 1
    Nice find. It's probably `boost::convert(stringview, stringview.length())`, though. It's certainly cleaner than converting to a string and hopefully faster. The only downside is an additional dependency to boost. – patatahooligan Aug 11 '17 at 14:45
  • 1
    Off topic: Obviously this is pseudocode, but take care in how you implement "while(not EOF)". The trivial `while (!stream.eof())` has a few nasty gotchas. – user4581301 Aug 11 '17 at 16:09
  • @user4581301 Usually, one should use something like `while ( stream >> statement )` instead… – Arne Vogel Aug 14 '17 at 10:18
  • Please don't comment on the reading from stream part. I specifically wrote it in pseudocode to keep the discussions on point. – patatahooligan Aug 14 '17 at 14:01
  • @MillieSmith @patatahooligan The fix in `boost::convert` to support `std::string_view` is to copy the range to an array and NUL-terminate: https://github.com/boostorg/convert/commit/ab1a43676e04a7c73602e6d1cb2337ea5402c4df – Andreas Magnusson Jan 08 '21 at 09:30

2 Answers

Since you tagged your question C++1z, you (theoretically) have access to std::from_chars, declared in <charconv>. It can handle your string-to-number conversion without needing anything more than a pair of const char*s:

#include <charconv>

double dbl;
auto result = std::from_chars(value_str.data(), value_str.data() + value_str.size(), dbl);

Of course, this requires that your standard library provide an implementation of from_chars.
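A slightly fuller sketch of how the result might be checked, assuming a standard library that implements the floating-point overloads of `std::from_chars`. The helper name `parse_double` and the choice to reject trailing characters are assumptions for illustration, not part of the question.

```cpp
#include <charconv>
#include <optional>
#include <string_view>
#include <system_error>

// Hypothetical helper: parses the whole view as a double and returns
// std::nullopt if parsing fails or if any characters are left over.
std::optional<double> parse_double(std::string_view text) {
    double value = 0.0;
    const char* first = text.data();
    const char* last = text.data() + text.size();  // one past the end
    const auto result = std::from_chars(first, last, value);
    // On success ec equals std::errc{} and ptr points just past the
    // last character that was consumed.
    if (result.ec != std::errc{} || result.ptr != last) {
        return std::nullopt;
    }
    return value;
}
```

Note that `std::from_chars` does not skip leading whitespace and does not accept a leading `+`, so the input should already be trimmed, as value_str is in the question.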

Nicol Bolas
  • 3
    `from_chars()` takes a `double&` not a `double*`. Also that is such an awkward interface... given that nobody provides it, maybe it's a good opportunity to have it take a `string_view`... – Barry Aug 11 '17 at 14:54
  • 1
    This would be exactly what I'm looking for if it were implemented. I recommend changing the call to `from_chars(&value_str.front(), &value_str.back(), dbl)` for readability, btw. @Barry: there's not much reason to make it take `string_view`. Working with `char*` makes it universal and it's trivial to get those from a `string_view`. – patatahooligan Aug 11 '17 at 15:00
  • 1
    @Barry: It would have been less awkward if `basic_string_view` guaranteed that its iterators were *pointers*. – Nicol Bolas Aug 11 '17 at 15:00
  • 4
    @patatahooligan: "*I recommend changing the call to*" That still doesn't work. `back` is the *last* character. It would need to be a pointer *past* the last character. – Nicol Bolas Aug 11 '17 at 15:02
  • Fixed multiple typos. Note that it returns a reference, not a value. – patatahooligan Aug 11 '17 at 15:03
  • @patatahooligan: `&back()` is not a pointer past the last character; it is a pointer *to* the last character. Your code would chop off the last character. – Nicol Bolas Aug 11 '17 at 15:04
  • 3
    @NicolBolas You don't even need the iterators, could just use `.data()` and `.data() + .size()` just the same as you're basically doing. – Barry Aug 11 '17 at 15:04
  • 3
    @patatahooligan Taking two arguments to refer to one thing is questionable, calling it "universal" is worse. – Barry Aug 11 '17 at 15:05
  • @patatahooligan That isn’t guaranteed to work; `string_view::iterator` might not be a pointer. If it isn’t, it doesn’t matter if you use `back` or `end` as others point out because it won’t compile. – Daniel H Aug 11 '17 at 15:05
  • Ah, sorry, I didn't notice it uses the sequence `[first,last)` rather than `[first,last]`. That's kind of annoying... – patatahooligan Aug 11 '17 at 15:05
  • @DanielH It's not an iterator according to cppreference.com. The problem is what Nicol Bolas noted. @Barry, it's universal because it works on anything eg `char[]`, `string` and `string_view` and even raw memory if you really want it to. – patatahooligan Aug 11 '17 at 15:08
  • I do believe Barry's usage of `.data()` is the cleanest and the most intuitive expression to use. – patatahooligan Aug 11 '17 at 15:11
  • @patatahooligan Oops, sorry, I missed that. As for Barry’s comment, all of those can be converted to `string_view` at least as easily as they can be converted to `char*`. For symmetry with `to_chars`, though, I think they made the right decision in having this be the signature (although I wouldn’t have been opposed to a `string_view` overload). – Daniel H Aug 11 '17 at 15:19
  • I just noticed that according to the standard, using `operator[](size_type pos)` with `pos >= size()` is undefined behavior meaning that an implementation is not required to evaluate `&value_str[0]` to something meaningful for empty `string_view`s. On the other hand, `data()` is always required to return a pointer such that the range `[data(); data() + size())` is valid and `from_chars` requires that the range `[first, last)` is valid. Please edit your answer to use `data()` and `data()+size()` to account for the empty `string_view` case. – patatahooligan Aug 23 '17 at 13:33
  • I can't find it on the status page this answer links to, but I can confirm that the libstdc++ provided with g++ 8.1 provides from_chars. – Jeffrey Bosboom May 12 '18 at 06:42
  • 5
    @JeffreyBosboom `double` version isn't implemented even in gcc-10. Another `` like fiasco. – Maxim Egorushkin Mar 11 '20 at 11:15

Headers:

#include <iostream>
#include <string_view>
#include <boost/convert.hpp>
#include <boost/convert/strtol.hpp>

Then:

std::string x { "aa123.4"};
const std::string_view y(x.c_str()+2, 5); // Window that views the characters "123.4".

auto value = boost::convert<double>(y, boost::cnv::strtol());
if (value.has_value())
{
    std::cout << value.get() << "\n"; // Prints: 123.4
}

Tested Compilers:

  • MSVC 2017

P.S. Boost can be installed easily with vcpkg (the default is 32-bit; the second command installs the 64-bit build):

vcpkg install boost-convert
vcpkg install boost-convert:x64-windows

Update: Apparently, many Boost functions use string streams internally, which take a lock on the global OS locale, so they have terrible multi-threaded performance**.

I would now recommend something like stoi() with substr instead. See: Safely convert std::string_view to int (like stoi or atoi)
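For a double, that suggestion might look like the sketch below. It assumes that copying the viewed characters into a temporary std::string is acceptable; `parse_double_via_stod` is a hypothetical name used only for illustration.

```cpp
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <string>
#include <string_view>

// Hypothetical helper: copies the view into a std::string so that
// std::stod sees a null-terminated buffer, then checks that the
// whole input was consumed.
std::optional<double> parse_double_via_stod(std::string_view text) {
    try {
        std::size_t consumed = 0;
        const double value = std::stod(std::string(text), &consumed);
        if (consumed != text.size()) {
            return std::nullopt;  // trailing characters after the number
        }
        return value;
    } catch (const std::invalid_argument&) {
        return std::nullopt;  // no conversion could be performed
    } catch (const std::out_of_range&) {
        return std::nullopt;  // value does not fit in a double
    }
}
```

Unlike `std::from_chars`, `std::stod` skips leading whitespace and follows the current C locale.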

** This strange quirk of Boost renders most of Boost string processing absolutely useless in a multi-threaded environment, which is a strange paradox indeed. This is the voice of hard-won experience talking; measure it for yourself if you have any doubts. A 48-core machine runs no faster with many Boost calls than a 2-core machine. So now I avoid certain parts of Boost like the proverbial plague, as anything can have a dependency on that damn global OS locale lock.

Contango