
I have a text file that contains nested objects and I need to preserve the relationship between them. How would I read them? I think I need to use a data structure like a tree whose nodes can have an arbitrary number of children (sort of like an n-ary tree without the 'n' limitation). Parsing the data and building the tree in memory is tripping me up.

The data in the text file is structured as follows:

{
    Element_A (3)
    Element_B (3,4)

    {
        Element_B (6,24)
        Element_A (1)
    }

    {
        Element_A (3)

        {
            Element_A (4)
            Element_B (12,6)
        }

        Element_B (1,4)
    }
}

EDIT: Just to clarify, the opening/closing braces enclose a single object and all its children. Element_A and Element_B above are parts of the same object.

So far, I read the entire file into a vector of strings like so:

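#include <fstream>
#include <string>
#include <vector>

std::vector<std::string> lines;
std::ifstream file("input.txt");

// Read the file line by line until getline fails (end of file or error).
std::string s;
while (std::getline(file, s))
    lines.push_back(s);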

and then extract the data from each line with something like the following:

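#include <regex>
#include <string>

std::regex re(R"(Element_A \(\s*(\d+)\))");
std::smatch m;

if (std::regex_search(line, m, re)) // 'line' is one of the strings read above
{
    // extract data from 'm', e.g. std::stoi(m[1]) for the number
}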

EDIT 2: Scheff's solution adapted to my program.

// Node is defined somewhere at the top of the file
struct Node
{
    int a = 0;
    int b[2] = {0};
    std::vector<Node> children;
};

// this code is inside some function that does the parsing
// (needs <iostream>, <regex>, <stack>, <string>, <vector>)
Node root;
std::stack<Node*> nodeStack; // nodeStack.top() is the node currently being filled
nodeStack.push(&root);

// Build the regexes once instead of on every line.
const std::regex reEl_A(R"(Element_A \(\s*(\d+)\s*\))");
const std::regex reEl_B(R"(Element_B \(\s*(\d+)\s*,\s*(\d+)\s*\))");
std::smatch m;

for (std::string line; std::getline(fin, line);)
{
    line = trim(line); // custom function to remove leading/trailing spaces/tabs (not included in this post for brevity)

    if (line.empty()) // empty line (data file might have empty lines for readability)
        continue;
    else if (line.size() == 1) // only one character
    {
        if (line[0] == '{')
        {
            // Add a child to the current node and make it the current node.
            nodeStack.top()->children.push_back(Node());
            nodeStack.push(&nodeStack.top()->children.back());
        }
        else if (line[0] == '}')
        {
            // Never pop the root: popping an empty stack is undefined behavior.
            if (nodeStack.size() > 1)
                nodeStack.pop();
            else
                std::cerr << "Error: Unmatched '}'.\n";
        }
        else
            std::cerr << "Error: Invalid character detected.\n";
    }
    else // at least two characters
    {
        if (std::regex_search(line, m, reEl_A))
        {
            nodeStack.top()->a = std::stoi(m[1]);
            continue;
        }

        if (std::regex_search(line, m, reEl_B))
        {
            nodeStack.top()->b[0] = std::stoi(m[1]);
            nodeStack.top()->b[1] = std::stoi(m[2]);
            continue;
        }

        std::cerr << "Error: Unrecognized line: " << line << '\n';
    }
}

if (nodeStack.empty() || nodeStack.top() != &root)
{
    std::cerr << "ERROR! Data not well balanced.\n";
}
melanie93
    I see. 1. Read a line. 2. If the first token is `{`, push a new node into the current node and make it the current node; else if the first token is `}`, pop the current node and make its parent the current node; else read `Element_`? and store it in the current node. 3. Go to 1. If you don't end up at the root node, the data was not well balanced. (I would consider this an error.) Concerning _make its parent the current node_: nodes may store their parent node. Alternatively, the file reader could internally use a std::stack to remember parents. – Scheff's Cat Oct 12 '18 at 15:22

1 Answer


This is how it could work:

  1. read a line; stop as soon as reading fails
  2. depending on what the line starts with:
    • "{": push a new node into the current node and make it the current node
    • "}": pop the current node and make its parent the current node
    • "Element_A": parse the values of a
    • "Element_B": parse the value of b
  3. go to 1.

The nodes may store their parent. Alternatively, the file reader can internally use a std::stack to remember the parents (which is what I did in the sample code below).

A sample program to sketch this:

#include <cstring>
#include <iomanip>
#include <iostream>
#include <sstream>  // std::istringstream
#include <stack>
#include <string>
#include <utility>  // std::pair
#include <vector>

struct Node {
  std::pair<int, int> a;
  int b;
  std::vector<Node> children;
  Node(): a(0, 0), b(0) { }
};

std::ostream& operator<<(std::ostream &out, const Node &node)
{
  static unsigned indent = 0;
  out << std::setw(indent) << ""
    << "Node:"
    << " a(" << node.a.first << ", " << node.a.second << "),"
    << " b(" << node.b << ") {\n";
  indent += 2;
  for (const Node &child : node.children) out << child;
  indent -= 2;
  out << std::setw(indent) << ""
    << "}\n";
  return out;
}

void read(std::istream &in, Node &node)
{
  std::stack<Node*> nodeStack;
  nodeStack.push(&node);
  // nodeStack.top() is the (pointer to) current node
  for (std::string line; std::getline(in, line);) {
    if (line.compare(0, strlen("{"), "{") == 0) {
      nodeStack.top()->children.push_back(Node());
      nodeStack.push(&nodeStack.top()->children.back());
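    } else if (line.compare(0, strlen("}"), "}") == 0) {
      // never pop the root node: a surplus "}" would empty the stack and the
      // next nodeStack.top() would be undefined behavior
      if (nodeStack.size() > 1) nodeStack.pop();
      else std::cerr << "ERROR! Unmatched '}'.\n";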
    } else if (line.compare(0, strlen("Element_A"), "Element_A") == 0) {
      std::istringstream parser(line.substr(strlen("Element_A")));
      parser >> nodeStack.top()->a.first >> nodeStack.top()->a.second;
    } else if (line.compare(0, strlen("Element_B"), "Element_B") == 0) {
      std::istringstream parser(line.substr(strlen("Element_B")));
      parser >> nodeStack.top()->b;
    } // else ERROR!
  }
  if (nodeStack.empty() || nodeStack.top() != &node) {
    std::cerr << "ERROR! Data not well balanced.\n";
  }
}

const char *const sample =
"{\n"
"Element_A 3\n"
"Element_B 3 4\n"
"{\n"
"Element_B 6 24\n"
"Element_A 1\n"
"}\n"
"{\n"
"Element_A 3\n"
"{\n"
"Element_A 4\n"
"Element_B 12 6\n"
"}\n"
"Element_B 1 4\n"
"}\n"
"}\n";

int main()
{
  std::istringstream in(sample);
  Node root;
  read(in, root);
  std::cout << root;
  return 0;
}

Output:

Node: a(0, 0), b(0) {
  Node: a(3, 0), b(3) {
    Node: a(1, 0), b(6) {
    }
    Node: a(3, 0), b(1) {
      Node: a(4, 0), b(12) {
      }
    }
  }
}


Note:

The parsing is done in a very simple (ugly) way. I found it sufficient because I only wanted to sketch the node management.

Another approach to parsing can be found e.g. in Small Parser from Syntax Diagram, or, maybe, the OP's std::regex approach could be used.

Scheff's Cat
    Thank you for the code. You basically handed me the solution. It took me a while to reply as I had to get my head around it first. – melanie93 Oct 12 '18 at 17:44
    @melanie93 You're welcome. (I was in a bit of a hurry.) Pointers to `std::vector` elements are a bit dangerous because `std::vector::push_back()` may cause a reallocation. In this case, the stored pointers only point to parent nodes, and `push_back()` is only called on the children of the current (top) node, which no stored pointer points into. So it should be safe. (I was a bit proud to do it completely without `new`.) ;-) – Scheff's Cat Oct 12 '18 at 17:58