1

I am writing a code to read in large .xyz files. These type of files are useful for Molecular Dynamics visualizers like VMD. So the file format looks something like this

#Number of particles
#frame number
#Coordinates

As an example:

5
0
C    1.23    2.33    4.56
C    1.23    2.33    5.56
C    1.23    2.33    6.56
C    1.23    2.33    7.56
C    1.23    2.33    8.56
5
1
C    2.23    2.33    4.56
C    2.23    3.33    5.56
C    2.23    4.33    6.56
C    2.23    5.33    7.56
C    2.23    6.33    8.56

and so on. I was trying to understand this post here https://codereview.stackexchange.com/questions/201743/processing-xyz-data-from-a-large-file which talks about efficiently reading from large datasets using operator overloading method. I am trying to write a class which can read such large trajectory files and give me the following outputs : 1) number of particles 2) Total number of frames 3)set of Coordinates at each time step. So i have tried to write down the following based on this post to read in the the file format as mentioned above. So far the code below is able to read a single frame and exits after that.

#include <iostream>
#include <vector>
#include <fstream>

struct Particle{

    long double x,y,z;
    char tab ='\t';
    char newline = '\n';
    char atom ;
    friend std::istream& operator>>(std::istream& in, Particle &xyz) {
        in >> xyz.atom >> xyz.x >> xyz.y >> xyz.z ;
        return in;
    }
    friend std::ostream& operator<<(std::ostream& out, Particle &xyz){
        out << xyz.x << xyz.tab << xyz.y << xyz.tab << xyz.z << xyz.newline;
        return out;
    }
};
class XYZ_frame_read
{

    int curr_frame;
    int num_particles;
    std::vector<Particle> coordinates_t;

    public:

    friend std::istream& operator>>(std::istream& in, XYZ_frame_read &traj ){

                in >> traj.num_particles;
                in >> traj.curr_frame;
                Particle p;
                while(in >> p){
                    traj.coordinates_t.push_back(p);
                }
            return in;
        }
    friend std::ostream& operator<<(std::ostream& out, XYZ_frame_read &traj){

            for(int i = 0; i< traj.num_particles ;i ++){
                out << traj.coordinates_t.at(i) ;
            }
            return out;
        }
};

int main(int argc, char *argv[]){

    std::ifstream in(argv[1]);
    XYZ_frame_read* frames = new XYZ_frame_read[3];
    in >> frames[0];
    std::cout << frames[0];

    return 0;
}

The problem is I don't understand how will I will implement this method to read the next frames and keep appending them to the coordinates_t vector for each instance of the object XYZ_frame_read. I think I understand how this works so obviously a while(!in.eof()) is out of question, since it'll just read the first frame over and over again. I am a newbie to c++ and am working on a Molecular dyanamics related project, any changes/suggestions are welcome!! Thank you for the help!

EDIT

I have tried using

size_t i = 0;
while(in >> frames[i]){
    std::cout << frames[i];
    if(i == 3){
        break;
    }
    i++;
}

It returns blank. It doesn't work. The loop doesn't even get executed.

  • It looks like you're missing one "level"in your model – a trajectory is a sequence of frames; a frame is a collection of particles. – molbdnilo May 18 '19 at 10:13
  • Yes. The class ```XYZ_frame_read``` reads in a particular frame and stores the coordinates. For now in the main function I can initialize a ```XYZ_frame_read``` object as I have shown in my code. ```XYZ_frame_read* frames = new XYZ_frame_read[3];```. I have just considered 3 frames for now. Hope this clears it up – Rohan Mittal May 18 '19 at 10:43

2 Answers2

0

while(!in.eof()) is out of the question because eof doesn't work like that.

Why is iostream::eof inside a loop condition (i.e. `while (!stream.eof())`) considered wrong?

I'm not sure I see the problem, what's wrong with

size_t i = 0;
while (in >> frames[i])
    ++i;

(apart from the possibility of array bounds errors).

EDIT

This code is incorrect

 friend std::istream& operator>>(std::istream& in, XYZ_frame_read &traj) {
     in >> traj.num_particles;
     in >> traj.curr_frame;
     Particle p;
     while(in >> p){
          traj.coordinates_t.push_back(p);
     }
     return in;
 }

This says keep reading particles until a read fails. That's incorrect, you know how many particles there are. It should say keep reading particles until you have read num_particles of them (or a read fails). I.e. it should say

 friend std::istream& operator>>(std::istream& in, XYZ_frame_read &traj) {
     in >> traj.num_particles;
     in >> traj.curr_frame;
     Particle p;
     for (int i = 0; i < traj.num_particles && in >> p; ++i) 
          traj.coordinates_t.push_back(p);
     }
     return in;
 }
john
  • 71,156
  • 4
  • 49
  • 68
  • I tried this too. It didn't work. Not sure why. The while loop doesn't even get executed. – Rohan Mittal May 18 '19 at 08:36
  • @RohanMittal If the while loop fails to execute it's because the read has failed. There's obviously various reasons for that, but given your code the first thing I'd look at is that your logic for reading this format is not correct. – john May 18 '19 at 14:01
  • @RohanMittal OK I see the flaw, I'll edit my answer. – john May 18 '19 at 14:02
  • But shouldn't the while loop take care of this? Since I am only iterating as long as the input ```in``` reads the object ```p```. The while loop will stop executing once it reaches the end of frame and exit(since at the end of frame, it will not encounter input of the type ```p```). So does this mean once the read ends(from while loop) the stream gets closed or the file pointer gets reset? – Rohan Mittal May 18 '19 at 14:33
  • Given the edit you suggest, the code perfectly works fine with the while loop. But i am not able to understand why it works and how is it different in from the while loop – Rohan Mittal May 18 '19 at 14:38
  • @RohanMittal You think C++ I/O is smarter than it is. What will actually happens is that after the valid particles have been read, `5` will be read as the next atom, `1` will be read as the next x coordinate, only then will it fail because `C` cannot be read as a y coordinate (since it isn't numeric). So your code gets part way through reading the next frame before it fails. At that point you cannot recover because what's been mistakenly read does not get unread. – john May 18 '19 at 19:58
0

You are very close, you simply need to validate your input within your overloaded operator functions, and don't use new!, instead just use a std::vector<XYZ_frame_read> frames;

For example with your overload of istream for class XYZ_frame_read all you need is:

    friend std::istream& operator>>(std::istream& in, XYZ_frame_read &traj)
    {
        /* validate that num_particles and curr_frame read */
        if (in >> traj.num_particles >> traj.curr_frame) {
            int n = traj.num_particles; /* set number of particles to read */
            Particle p;

            while (n-- && (in >> p))    /* read that number of particles */
                traj.coordinates_t.push_back(p);
        }
        return in;
    }

Then in main() instead of allocating with new for frames as you do here:

    XYZ_frame_read* frames = new XYZ_frame_read[3];
    in >> frames[0];
    std::cout << frames[0];

just use a std::vector<XYZ_frame_read> frames; and then use a temporary class XYZ_frame_read to validate the read before adding it to the vector of frames, e.g.

    std::vector<XYZ_frame_read> frames; /* vector of frames (NO new!) */

    for (;;) {                      /* continual loop while good input */
        XYZ_frame_read tmp;         /* temp XYZ_frame_read for read */

        if ((in >> tmp))            /* if read is good */
            frames.push_back(tmp);  /* add it to vector of frames */
        else
            break;                  /* otherwise -- bail */
    }

For output, just use an auto-ranged for loop, e.g.

    for (auto & f : frames)         /* auto-ranged for loop to output frames */
        std::cout << "\nframe: " << f.get_frame() << 
                    "  particles: " << f.get_nparticles() << "\n\n" << 
                    f << '\n';

Putting it altogether, you would have:

#include <iostream>
#include <vector>
#include <fstream>

struct Particle {

    long double x,y,z;
    char tab ='\t';
    char newline = '\n';
    char atom ;

    friend std::istream& operator>>(std::istream& in, Particle &xyz) {
        in >> xyz.atom >> xyz.x >> xyz.y >> xyz.z;
        return in;
    }

    friend std::ostream& operator<<(std::ostream& out, Particle &xyz) {
        out << xyz.x << xyz.tab << xyz.y << xyz.tab << xyz.z << xyz.newline;
        return out;
    }
};

class XYZ_frame_read
{
    int curr_frame;
    int num_particles;
    std::vector<Particle> coordinates_t;

  public:

    friend std::istream& operator>>(std::istream& in, XYZ_frame_read &traj)
    {
        /* validate that num_particles and curr_frame read */
        if (in >> traj.num_particles >> traj.curr_frame) {
            int n = traj.num_particles; /* set number of particles to read */
            Particle p;

            while (n-- && (in >> p))    /* read that number of particles */
                traj.coordinates_t.push_back(p);
        }
        return in;
    }
    friend std::ostream& operator<<(std::ostream& out, XYZ_frame_read &traj) {

        for(int i = 0; i< traj.num_particles ;i ++)
            out << traj.coordinates_t.at(i) ;

        return out;
    }
    int get_frame(void) { return curr_frame; }
    int get_nparticles (void) { return num_particles; }
    int getpsize(void) { return coordinates_t.size(); }
};

int main(int argc, char *argv[]) {

    std::ifstream in(argv[1]);
    std::vector<XYZ_frame_read> frames; /* vector of frames (NO new!) */

    for (;;) {                      /* continual loop while good input */
        XYZ_frame_read tmp;         /* temp XYZ_frame_read for read */

        if ((in >> tmp))            /* if read is good */
            frames.push_back(tmp);  /* add it to vector of frames */
        else
            break;                  /* otherwise -- bail */
    }

    for (auto & f : frames)         /* auto-ranged for loop to output frames */
        std::cout << "\nframe: " << f.get_frame() << 
                    "  particles: " << f.get_nparticles() << "\n\n" << 
                    f << '\n';

    return 0;
    (void)argc;     /* suppress -Wunused warning */
}

There are huge number of advantages to using vector for class XYZ_frame_read instead of allocating with new. Auto memory-management is just the tip of the iceberg.

Example Input File

Using your sample input:

$ cat particles.txt
5
0
C    1.23    2.33    4.56
C    1.23    2.33    5.56
C    1.23    2.33    6.56
C    1.23    2.33    7.56
C    1.23    2.33    8.56
5
1
C    2.23    2.33    4.56
C    2.23    3.33    5.56
C    2.23    4.33    6.56
C    2.23    5.33    7.56
C    2.23    6.33    8.56

Example Use/Output

Just provide the filename and your std::vector<XYZ_frame_read> frames is automatically filled, regardless how many frames are present in your data file (up to the limits of your virtual-memory)

$ ./bin/paticle particles.txt

frame: 0  particles: 5

1.23    2.33    4.56
1.23    2.33    5.56
1.23    2.33    6.56
1.23    2.33    7.56
1.23    2.33    8.56


frame: 1  particles: 5

2.23    2.33    4.56
2.23    3.33    5.56
2.23    4.33    6.56
2.23    5.33    7.56
2.23    6.33    8.56
David C. Rankin
  • 69,681
  • 6
  • 44
  • 72
  • I think this is what I gather from @john's comment and your answer. The reason why the while loop fails in my example is that and the reason it doesn't read the next frame is, associated with the basic functioning of while loop, it reads until the condition is violated. So as @john mentions, once there is a misread there is no going back and the overloaded ```istream``` will not be able to identify the same pattern, since the file pointer has already been moved to the wrong position due to the while loop during the read of the first frame. – Rohan Mittal May 19 '19 at 08:24
  • You avoid that by problem by adding a validation step to the input and adding another condition to the while loop i.e ```n--``` which makes sure on the read of the each frame the file pointer doesn't move beyond the last coordinate of that particular frame. – Rohan Mittal May 19 '19 at 08:25
  • I am, new to c++. Could you please explain a little more on the auto ranged for loop to output frames – Rohan Mittal May 19 '19 at 08:26
  • Sure, the auto ranged-based for loop is a nice addition in C++11 that rather than having to loop `for (auto i = container.begin(); i != container.end(); i++)` allows you to simply do `for (auto& refname : container)` where `refname` will be a reference to each element in `container`. Your traditional [for loop](https://en.cppreference.com/w/cpp/language/for) compared to your [Range-based for loop](https://en.cppreference.com/w/cpp/language/range-for) is really slick for containers `:)` – David C. Rankin May 19 '19 at 08:57
  • I will need to read more to understand it better, but i get the essence. Thank you. – Rohan Mittal May 19 '19 at 19:17
  • You will get it. Start with a simple example, then you will understand, e.g. `std::string s ("hello world"); for (auto& c : s) std::cout << c << '\n';` Since `std::string` is just a container of characters, when you use the range-based `for` on the string, you loop over each char. It works the same way no matter what the container is. – David C. Rankin May 20 '19 at 00:12