
I would like to read the data that I have into C++ and then start manipulating it. I am quite new but have a tiny bit of basic knowledge. The most obvious way of doing this that strikes me (and maybe this comes from using Excel previously) would be to read the data into a 2D array. This is the code that I have so far:

#include <iostream>
#include <fstream>
#include <algorithm>
#include <string>
#include <sstream>

using namespace std;

string C_J;

int main()
{
    float data[1000000][10];

    ifstream C_J_input;
    C_J_input.open("/Users/RT/B/CJ.csv");

    if (!C_J_input) return -1;

    for(int row = 0; row <1000000; row++)
    {

        string line;
        getline(C_J_input, C_J, '?');
        if ( !C_J_input.good() )
            break;

        stringstream iss(line);

        for(int col = 0; col < 10; col++)
            {
            string val;
            getline(iss, val, ',');
            if (!iss.good() )
                break;

            stringstream converter(val);
            converter >> data[row][col];
        }
    }


    cout << data;

    return 0;
}

Once I have the data read in, I would like to read through it line by line and analyse it, looking for certain things. However, I think that could probably be the topic of another thread once I have the data read in.

Just let me know if this is a bad question in any way and I will try to add anything more that might make it better.

Thanks!

Taylrl
  • Well, I must admit that your question is mostly OK, but apparently you forgot to state the question itself. You have precisely stated what you have and what you want to have, but there's no word about what's currently wrong with it and what you are **currently** trying to fix. First glance at your code - you have an array, you open the filestream, read lines in a loop and parse them with stringstream, seems valid. So, what's actually wrong? What does not work in that current code? – quetzalcoatl Aug 18 '14 at 13:16
  • I guess I just wrote it that way because I was naturally expecting it to be wrong! I created it by merging two snippets that I have. I am a little rusty and haven't been able to do anything yet other than create a successful build. How would I take the data in "data" and now work with it? (I might create a new thread for that, however.) – Taylrl Aug 18 '14 at 13:20
  • `float data[1000000][10];` is about 38 MB of data and will almost certainly overflow the stack. Why don't you use `std::vector` and the `push_back` function to only allocate the amount of memory required to represent the file? – Neil Kirk Aug 18 '14 at 13:20
  • Instead of hardcoding the size of the file, use `while (getline(..)) {..}` instead, assuming rows are separated by newlines. – Neil Kirk Aug 18 '14 at 13:22
  • 1
    it would make more sense to dump the whole file into a string, and then use a split method to first split by newlines, then split each line by the comma separate. – Kevin Aug 18 '14 at 14:22
  • In order to better help me to understand your answer below @Kevin, could you explain where these pieces of information would go once they have been split up? – Taylrl Aug 21 '14 at 12:26
  • 1
    In this case, they are in a vector of vectors of strings, so for each line there is a vector of strings each holding an cell of the csv. – Kevin Aug 21 '14 at 12:30
  • You are brilliant @Kevin. I am slowly starting to understand this now – Taylrl Aug 21 '14 at 12:59

2 Answers


At the asker's request, this is how you would load the file into a string, then split it into lines, and then further split each line into elements:

#include <iostream>
#include <string>
#include <fstream>
#include <vector>
#include <sstream>
#include <iterator>
#include <cstddef>


//This splits a string at every delimiter, appends the pieces to elems, and returns a reference to elems
std::vector<std::string> &SplitString(const std::string &s, char delim, std::vector<std::string> &elems)
{
    std::stringstream ss(s);
    std::string item;
    while (std::getline(ss, item, delim))
    {
        elems.push_back(item);
    }
    return elems;
}


int main()
{

    //load the file with ifstream
    std::ifstream t("test.csv");
    if (!t)
    {
        std::cout << "Unknown File" << std::endl;
        return 1;
    }

    //this is just a block of code designed to load the whole file into one string
    std::string str;

    //this sets the read position to the end
    t.seekg(0, std::ios::end);
    str.reserve(t.tellg());//this reserves enough memory in the string for everything up to the read position of the file (which is the end)
    t.seekg(0, std::ios::beg);//this sets the read position back to the beginning to start reading it

    //this takes everything in the stream (the file data) and loads it into the string.
    //istreambuf_iterator is used to loop through the contents of the stream (t), and in this case go up to the end.
    str.assign((std::istreambuf_iterator<char>(t)),
        std::istreambuf_iterator<char>());

    //if the file has size (is not empty) then analyze
    if (str.length() > 0)
    {
        //the file is loaded

        //split by delimiter (which is the newline character)
        std::vector<std::string> lines;//this holds a string for each line in the file
        SplitString(str, '\n', lines);

        //each element in the vector holds a vector of elements (strings between commas)
        std::vector<std::vector<std::string> > LineElements;

        //for each line
        for (const auto &it : lines)
        {
            //this is a vector of elements in this line
            std::vector<std::string> elementsInLine;

            //split with the comma, this would separate "one,two,three" into {"one","two","three"}
            SplitString(it, ',', elementsInLine);

            //take the elements in this line, and add them to the line-element vector
            LineElements.push_back(elementsInLine);
        }

        //this displays each element in an organized fashion

        //for each line
        for (const auto &it : LineElements)
        {
            //for each element IN that line
            for (std::size_t i = 0; i < it.size(); ++i)
            {
                //put a comma before every element except the first, so the last one gets no trailing comma
                if (i > 0)
                    std::cout << ',';
                std::cout << it[i];
            }
            //the end of the line
            std::cout << '\n';
        }
    }
    else
    {
        std::cout << "File Is empty" << std::endl;
        return 1;
    }

    return 0;
}
Kevin
  • Sorry @Kevin, I don't really understand. As I understand it, you initially set up a vector that consists of strings .......(and then I lose the plot). Apologies but I think I would literally need a line by line description of what this is doing. I think it might be back to the textbooks for me for now :-( – Taylrl Aug 19 '14 at 12:21
  • 1
    Sorry to be late, but I added some comments to clarify. – Kevin Aug 19 '14 at 23:03
  • No problem @Kevin. Unfortunately I think this is a bit advanced for me. I have never encountered `seekg` before and know nothing about `stream buffers` – Taylrl Aug 21 '14 at 12:20
  • 1
    That I took directly from here: http://stackoverflow.com/questions/2602013/read-whole-ascii-file-into-c-stdstring as a solution to reading a whole file into a string in one shot, but I can clarify what each line does. – Kevin Aug 21 '14 at 12:24
  • Which line? Can you copy/paste it to give some context? – Kevin Aug 21 '14 at 13:28
  • The problem I am having is when I want to add the elements in the line into the line-element vector on this line `elements.push_back(elementsInLine);` I get an error saying `No viable conversion from vector to value_type (aka 'char')` I thought it was a problem with the way that I was reading in a .csv so I saved the input file as a .txt so the information is `string` but the error is still there. Any ideas? – Taylrl Aug 21 '14 at 16:36
  • 1
    I had a syntax error, because I changed `elements` to `LineElements` but not in every instance, it should work fine now. – Kevin Aug 21 '14 at 17:17
  • Thanks @Kevin you have been amazingly helpful and I appreciate it hugely. As I understand it I now have `lines` for the lines and `LineElements` for the "cells". I am just wondering however if you can help me to understand the first line. I am aware that `&` is a pointer to a place in memory but I can't follow the logic on how you set up that vector. – Taylrl Aug 22 '14 at 08:33
  • Also, this information originally comes from Excel, where the first line has the titles for each column. Is there some way that I can name each column with this information while still preserving the rows? – Taylrl Aug 22 '14 at 10:03
  • 1
    @Taylr, they way I designed this, you pass the empty vector into SplitString, and it fills it for you. the `&` is used to pass the vector by reference, so it can be modified within the split method. Technically, I did not need to to return `std::vector&`, but I did that out of personal preference. – Kevin Aug 22 '14 at 13:03

On second glance, I've noticed a few obvious issues which will slow your progress greatly, so I'll drop them here:

1) you are using two disconnected variables for reading the lines:

  • C_J - which receives data from getline function
  • line - which is used as the source of stringstream

I'm pretty sure that the C_J is completely unnecessary. I think you wanted to simply do

getline(C_J_input, line, ...)  // so that the textline read will fly to the LINE var
// ...and later
stringstream iss(line); // no change

or, alternatively:

getline(C_J_input, C_J, ...)  // no change
// ...and later
stringstream iss(C_J); // so that ISS will read the textline we've just read

Otherwise, the stringstream will never see what getline has read from the file: getline writes the data to a different place (C_J) than the one the stringstream looks at (line).

2) Another tiny bit is that you are feeding '?' into getline() as the line separator. CSVs usually use a newline character to separate the data lines. Of course, your input file may use '?'; I don't know. But if you want a newline instead, omit the parameter altogether: getline will then use '\n' by default, and since ifstream opens files in text mode (where the platform's line endings are translated to '\n'), this will probably be just OK.

3) Your array of floats is, um, huge. Consider using a list instead. It will grow nicely as you read rows. You can even nest them, so list<list<float>> is also very usable. I'd actually probably use list<vector<float>>, since the number of columns is constant. Using a preallocated huge array is not a good idea, as there will always be a file with one line too many, and ka-boom.

4) Your code contains a just-as-huge loop that iterates a constant number of times. A loop itself is OK, but the line count will vary. You actually don't need to count the lines, especially if you use list<> to store the values. Just like you've checked whether the file is properly open with if(!C_J_input), you may also check whether you have reached end-of-file:

if(C_J_input.eof())
    ; // true only after a read has tried to go past the end of the file

see here for an example

Uh... well, that's it for a start. Good luck!

quetzalcoatl