1

This is a relatively simple question but I can't seem to find an answer. I need to read every character from a text file excluding spaces.

I currently have:

fstream inFile(fileName, ios::in);
char ch;
while (!inFile.eof()){
ch = inFile.get();

This works for all letters and numbers, but not for special characters. What's an alternative I can use to read everything but spaces?

  • Have you tried reading the entire line and then process it (remove the spaces)? – ihavenoidea Oct 25 '18 at 02:16
  • What kind of special characters? Also [while eof is wrong](https://stackoverflow.com/questions/5605125/why-is-iostreameof-inside-a-loop-condition-considered-wrong). – n. 'pronouns' m. Oct 25 '18 at 02:18
  • Please clarify - are you expecting non-ASCII characters in the file? Also, what are you going to do with the input after reading it in? – Samantha Oct 25 '18 at 02:19
  • If your special characters are *Unicode*, then they don't fit in a single `char`... Are you on windows? – David C. Rankin Oct 25 '18 at 02:19
  • Anything that's not a number or letter such as & or *. – Mikey Thomas Oct 25 '18 at 02:19
  • The program is fully functioning in opening and reading the file until the end. I just need to replace the inFile.get() part of it with something that reads everything. Maybe a while loop that incorporates it not being a space? – Mikey Thomas Oct 25 '18 at 02:20
  • As long as it's just ASCII, then `while (inFile.get(ch)) if (!isspace(ch)) /* do something with ch */` Also make sure you validate `if (!inFile.is_open()) /* handle error */` – David C. Rankin Oct 25 '18 at 02:21
  • Please show a [mcve], including input that produces bad results. – n. 'pronouns' m. Oct 25 '18 at 02:25
  • @Mikey Thomas Can you add more details about how you use this data? Do you have to store it for further use or just use it at this point and then it can be discarded? And what is the data you are reading? – JiaHao Xu Oct 25 '18 at 08:52
  • It is hard to give a solution that fits your needs without knowing the use cases. – JiaHao Xu Oct 25 '18 at 09:20
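A minimal sketch of the character-by-character approach suggested in the comments above, assuming the input is plain ASCII. It loops on the result of `get(ch)` instead of testing `eof()`, and filters with `std::isspace`. The helper name `readNonSpace` is hypothetical, and it takes any `std::istream`, so an `std::ifstream` can be passed as well.

```cpp
#include <cctype>
#include <istream>
#include <sstream>  // std::istringstream, handy for trying the function without a file
#include <string>

// Hypothetical helper: collects every character from `in` except whitespace.
std::string readNonSpace(std::istream& in)
{
    std::string result;
    char ch;
    // get(ch) returns the stream, which converts to false once a read fails,
    // so the loop body only ever sees characters that were actually read.
    while (in.get(ch)) {
        // Cast to unsigned char: passing a negative char value to isspace is undefined.
        if (!std::isspace(static_cast<unsigned char>(ch)))
            result.push_back(ch);
    }
    return result;
}
```

Note that `std::isspace` also drops tabs and newlines; if only the space character should be skipped, the test could be replaced with `ch != ' '`.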

4 Answers

1

Assuming the file is ASCII and contains no null characters, the following method could be used.

#include <stdio.h>   // FILE, fopen, fread, fclose
#include <stdlib.h>  // malloc, free

size_t ReadAllChars(char const* fileName, char **ppDestination)
{
    //Check inputs
    if(!fileName || !ppDestination)
    {
        //Handle errors;
        return 0;
    }

    //open file for reading
    FILE *pFile = fopen(fileName, "rb");

    //check file successfully opened
    if(!pFile)
    {
        //Handle error
        return 0;
    }

    //Seek to end of file (to get file length); _fseeki64 is the 64-bit seek from Microsoft's C runtime
    if(_fseeki64(pFile, 0, SEEK_END))
    {
        //Handle error
        fclose(pFile);
        return 0;
    }

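    //Get file length (_ftelli64 returns -1 on failure, so test the signed value before converting)
    long long rawLength = _ftelli64(pFile);
    if(rawLength < 0)
    {
        //Handle error
        fclose(pFile);
        return 0;
    }
    size_t fileLength = (size_t)rawLength;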

    //Seek back to start of file
    if(_fseeki64(pFile, 0, SEEK_SET))
    {
        //Handle error
        fclose(pFile);
        return 0;
    }

    //Allocate memory to store entire contents of file
    char *pRawSource = (char*)malloc(fileLength);

    //Check that allocation succeeded
    if(!pRawSource)
    {
        //Handle error
        fclose(pFile);
        return 0;
    }

    //Read entire file
    if(fread(pRawSource, 1, fileLength, pFile) != fileLength)
    {
        //Handle error
        fclose(pFile);
        free(pRawSource);
        return 0;
    }

    //Close file
    fclose(pFile);

    //count spaces
    size_t spaceCount = 0;
    for(size_t i = 0; i < fileLength; i++)
    {
        if(pRawSource[i] == ' ')
            ++spaceCount;
    }

    //allocate space for file contents not including spaces (plus a null terminator)
    size_t resultLength = fileLength - spaceCount;
    char *pResult = (char*)malloc(resultLength + 1);

    //Check allocation succeeded
    if(!pResult)
    {
        //Handle error
        free(pRawSource);
        return 0;
    }

    //Null terminate result
    pResult[resultLength] = '\0';

    //copy all characters except space into pResult
    char *pNextTarget = pResult;
    for(size_t i = 0; i < fileLength; i++)
    {
        if(pRawSource[i] != ' ')
        {
            *pNextTarget = pRawSource[i];
            ++pNextTarget;
        }
    }

    //Free temporary buffer
    free(pRawSource);

    *ppDestination = pResult;
    return resultLength;
}
Khan Maxfield
  • This code literally reads everything from a text file except the spaces, and performs input validation and error checks. Nothing more. – Khan Maxfield Oct 25 '18 at 03:23
  • Granted this solution is implemented in C, but as far as I can see it does solve the question being asked and is a valid solution. – AdaRaider Oct 25 '18 at 03:25
  • This code might be literally the best code in the world but it is *not useful for a student of C++ that has a problem stated in the question*. – n. 'pronouns' m. Oct 25 '18 at 03:25
  • Perhaps it would be more prudent to detail how this answer could be improved – AdaRaider Oct 25 '18 at 03:26
  • C++ is a superset of C which means that C code is by definition C++ code. What i didn't do is use the Standard Template Library. STL, while an excellent library, is by no means a simple solution to most problems. Just because the code may be shorter doesn't mean it is easier to understand or easier to debug. – Khan Maxfield Oct 25 '18 at 04:08
  • This program only has 3 problems: 1. It copies the file content twice (in the worst case); 2. It does one contiguous memory allocation, which is more likely to fail for a long file than several smaller allocations would be; 3. It is way too long... – JiaHao Xu Oct 25 '18 at 04:19
  • Besides, `fseek` could disrupt the filesystem cache, so it could be more expensive than you think. – JiaHao Xu Oct 25 '18 at 04:34
  • On one hand the code "does a whole bunch of things not being asked about" and on the other hand I'm not considering my impact on the file system cache. I think maybe the top priorities for a solution should be that it doesn't crash and that it returns either the correct answer or no answer at all. As far as I can tell this is the only solution so far that meets those criteria. – Khan Maxfield Oct 25 '18 at 04:47
  • @KhanMaxfield C++ is not a superset of C, it's a *superset of a subset of C* (which can be said about any two sets btw). Even if it were a superset of C, which it isn't, good C code is not necessarily good C++ code. A student of C++ will not benefit from learning the C-like subset first. – n. 'pronouns' m. Oct 25 '18 at 05:03
  • The deepest problem with this solution is not performance, but the fact that, given large enough input, the program could fail to allocate memory even when there is enough memory in total but not enough contiguous memory. – JiaHao Xu Oct 25 '18 at 08:51
  • And the program will fail if the file is not a regular file, either because `fseek` is undefined (e.g. character device files like stdin, or sockets) or because it will wipe the previous data (e.g. pipes). – JiaHao Xu Oct 25 '18 at 09:06
  • There is no way to solve these 2 problems short of switching to a different approach. – JiaHao Xu Oct 25 '18 at 09:08
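One way to sidestep the seeking concern raised in the comments above is to stream the input in a single pass, so that no `fseek` is needed and non-seekable inputs such as pipes can be read. The following is only a sketch under that assumption, using `std::remove_copy` over `std::istreambuf_iterator`; the function name `copyWithoutSpaces` is hypothetical. The result still lives in one `std::string`, so it does not address the contiguous-allocation concern.

```cpp
#include <algorithm>
#include <istream>
#include <iterator>
#include <sstream>  // std::istringstream, handy for trying the function without a file
#include <string>

// Hypothetical helper: copies the whole stream into a string, leaving out ' '.
std::string copyWithoutSpaces(std::istream& in)
{
    std::string result;
    // istreambuf_iterator reads raw characters (no whitespace skipping),
    // and remove_copy copies everything that does not compare equal to ' '.
    std::remove_copy(std::istreambuf_iterator<char>(in),
                     std::istreambuf_iterator<char>(),
                     std::back_inserter(result),
                     ' ');
    return result;
}
```

Unlike a filter based on `std::isspace`, this removes only the space character itself; tabs and newlines are kept.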
0

You should open the file in binary mode

Tony Thomas
-1

Assuming you are using the default C++ locale, you could read the input into a std::string and let std::istream& operator>>(std::istream&, std::string&) together with std::skipws do the work of skipping all whitespace for you.

#include <iostream>
#include <fstream>
#include <vector>
#include <string>
#include <utility>

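int main(int argc, char* argv[])
{
    // Assumption: the file name is passed as the first command-line argument
    if (argc < 2) {
        std::cerr << "Usage: " << argv[0] << " <file>" << std::endl;
        return 1;
    }
    const char *filename = argv[1];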
    std::ifstream in{filename};
    if (in.fail()) {
        std::cerr << "Failed to open " << filename << std::endl;
        return 1;
    }
    /*
     * Actually, you can skip this line, because the default behavior of
     * std::ifstream and other input streams is to skip leading whitespace
     * before formatted input.
     */
    in >> std::skipws;

    std::vector<std::string> stringv;
    // reserve to speed up, you can replace the new_cap with your guess
    stringv.reserve(10);

    std::string str;
    /*
     * While std::skipws tells the stream to skip leading whitespace before each input,
     * std::istream& operator>>(std::istream&, std::string&) stops at the next whitespace character.
     */
    while(in >> str)   
        stringv.push_back(std::move(str));
}

Edit:

I haven't tested this program yet, so there might be some compilation errors, but I am confident that this approach works.

Using !in.eof() tests whether the end of file has been reached, but not whether the extraction succeeded, which means you can end up processing invalid data. Looping on in >> str fixes this, because the stream converts to false (that is, in.fail() returns true) as soon as an extraction fails.
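As a rough illustration of that loop condition, the sketch below joins the whitespace-separated tokens into a single string. The helper name `joinTokens` is hypothetical, and it accepts any `std::istream`, so the `std::ifstream` from the program above could be passed in.

```cpp
#include <istream>
#include <sstream>  // std::istringstream, handy for trying the function without a file
#include <string>

// Hypothetical helper: concatenates all whitespace-separated tokens from `in`.
std::string joinTokens(std::istream& in)
{
    std::string result;
    std::string token;
    // The extraction itself is the loop condition: the body runs only when a
    // token was actually read, so trailing whitespace never yields an extra pass.
    while (in >> token)
        result += token;
    return result;
}
```

Because `operator>>` skips leading whitespace before each token, spaces, tabs, and newlines are all dropped from the result.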

JiaHao Xu
  • Non-working "read while not file.eof" pattern. No explanation of OP's mistake. – n. 'pronouns' m. Oct 25 '18 at 03:53
  • @nm What do you mean by saying "read while not file eof" is non-working? I am confused. – JiaHao Xu Oct 25 '18 at 04:04
  • The op wants to read everything from file except space, isn't this exactly what he wants? – JiaHao Xu Oct 25 '18 at 04:05
  • @nm The op asked this question because he doesn't know how to skip spaces, I didn't see that he/she made any mistake. – JiaHao Xu Oct 25 '18 at 04:06
  • No error handling. I can see at least 3 places where this program could crash. – Khan Maxfield Oct 25 '18 at 04:38
  • See [this](https://stackoverflow.com/questions/5605125/why-is-iostreameof-inside-a-loop-condition-considered-wrong) for more detailed explanation about while(eof) – n. 'pronouns' m. Oct 25 '18 at 05:11
  • Additionally since you appear to be concerned about how many times the data is copied you should know that this implementation requires at minimum Nlog(N) (where N is the length of the file) copies of the data if your "guess" is insufficient. – Khan Maxfield Oct 25 '18 at 05:15
  • @KhanMaxfield vector::push_back is amortised O(1). – n. 'pronouns' m. Oct 25 '18 at 05:24
  • @Khan Maxfield I used `std::vector`; a good implementation would use the move constructor of `std::string`. That means as long as the move constructor of `std::string` is used, the time complexity is constant * (number_of_strings_created - 1). Please consider reviewing geometric sequences, which are the theory behind why `std::vector` is O(1) on average. – JiaHao Xu Oct 25 '18 at 05:25
  • @JiaHao Xu I was about to retract my statement when I noticed that it's a vector of strings and not a vector of chars. A quick google of the average length of an English word yielded ~5 characters. Therefore, assuming a file containing average English text, the cost of move constructing a list of strings containing the individual words within the text is actually even more expensive than moving the characters themselves, as the pointer within a std::string is actually larger than the string it points to. – Khan Maxfield Oct 25 '18 at 06:07
  • @KhanMaxfield *the vector WILL resize Log(N) times* yes, but the cost of each resize is *not quite* N, so their multiple is not a good estimate of the overall cost. "A quick google of the average length of an english word yielded ~5 characters" Cost of string move construction is normally O(1) so length is irrelevant. – n. 'pronouns' m. Oct 25 '18 at 06:15
  • I suspect you don't actually know why OP asked the question. I certainly don't. There is no MCVE, and no indication of what is actually "not working". If they didn't make a mistake, then your answer is useless. If they did, and the mistake is somewhere in the code that they didn't post, then your answer is still useless. (Removed a bit about C vs C++, was meant for another answer). – n. 'pronouns' m. Oct 25 '18 at 06:16
  • @KhanMaxfield std::string also usually employs small-string optimisation, so no pointers are used in strings shorter than about 24 characters. – n. 'pronouns' m. Oct 25 '18 at 06:22
  • @Khan Maxfield The size of the content is not fixed. Given the first is a, the second is 2a, the third is 4a...the last is 2^(log2(N)), where N is the ultimate size of the container. You could not claim this is N*log(N), it is a geometric sequence and should be calculated using geometric sequence's formula. Please google that and do some calculations yourself. – JiaHao Xu Oct 25 '18 at 06:26
  • Please **do actually read** [how to handle eof](https://stackoverflow.com/questions/5605125/why-is-iostreameof-inside-a-loop-condition-considered-wrong). It's as simple as `while (in >> str) { stringv.push_back(std::move(str)); }`. No need for ifs or clears or whatever. – n. 'pronouns' m. Oct 25 '18 at 06:45
  • @nm `clear()` is for the `std::string`. Moving from it leaves it in an unspecified state, so it has to be cleared before it can be used again. [Reusing a moved container](https://stackoverflow.com/questions/9168823/reusing-a-moved-container). – JiaHao Xu Oct 25 '18 at 06:50
  • The state is unspecified but valid. There is no need to `clear` it. "the object’s invariants are met and operations on the object behave as specified for its type". `>>` is specified to unconditionally throw away the state and replace it with the new string. – n. 'pronouns' m. Oct 25 '18 at 07:00
  • @JiaHao Xu "Cost of string move construction is normally O(1) so length is irrelevant." The cost of move construction is O(1) yes but the cost of moving 100 strings containing 5 chars each is still greater than the cost of moving 500 chars because the size of a string is greater than the size of 5 chars. Assuming a small buffer optimization of 24 characters a std::string is (sizeof(dataPointer) + sizeof(smallBuffer) + sizeof(length)) or something like 40 bytes minimum. Therefore moving 100 strings is about 4000 bytes while 500 characters is just 500 bytes. – Khan Maxfield Oct 25 '18 at 08:08
  • @Khan Maxfield You misunderstood how small string optimization works in `std::string`. Printing `sizeof(std::string)` gives 24, not 40. It just uses the memory allocated for the pointer to do the optimization. – JiaHao Xu Oct 25 '18 at 08:26
  • I feel like we are arguing for nothing, since we don't even know the use case clearly. It can be an interesting question, but without specific context I feel like it is just pointless. – JiaHao Xu Oct 25 '18 at 08:53
-1

One of the simpler approaches is to check the ASCII value of each character as you iterate. If the value is 0x20 (decimal 32, the ASCII code for a space), skip it with "continue"; otherwise just print it.

Ashish_KS
  • "This is working for all letters and number but not special characters." Your answer doesn't attempt to address this. No one should be interested in what is ASCII for space. The space character is spelled `' '` in C++. Not `20` or any other magic number. – n. 'pronouns' m. Oct 25 '18 at 05:30
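A small sketch of the skip-with-`continue` idea from this answer, written with the character literal `' '` as the comment recommends instead of a numeric code. The function name `printWithoutSpaces` and its stream parameters are hypothetical; passing `std::cout` as `out` would print to the console.

```cpp
#include <istream>
#include <ostream>
#include <sstream>  // std::istringstream / std::ostringstream, handy for trying it without a file

// Hypothetical helper: writes every character of `in` to `out`, skipping spaces.
void printWithoutSpaces(std::istream& in, std::ostream& out)
{
    char ch;
    // get(ch) does not skip whitespace, so spaces are seen and can be filtered here.
    while (in.get(ch)) {
        if (ch == ' ')
            continue;  // skip the space and move on to the next character
        out << ch;
    }
}
```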