
We want to get the line/column of an XPath query result in pugixml:

pugi::xpath_query query_child(query_str);
std::string value = Convert::toString(query_child.evaluate_string(root_node));

We can retrieve the offset, but not the line/column:

ptrdiff_t offset = query_child.result().offset;

If we re-parse the file we can convert offset => (line, column), but that is not efficient.

Is there an efficient way to achieve this?

zeuxcg
Ghassen Hamrouni

1 Answer

  1. result().offset is the last parsed offset in the query string, not an offset in the XML file; it is 0 if the query was parsed successfully.

  2. For XPath queries that return strings, the concept of 'offset in XML file' is not defined: what would you expect for the query concat("a", "b")?

  3. For XPath queries that return nodes, you can get the offset of the node data in the file. Unfortunately, for parsing performance and memory consumption reasons, line/column information can't be obtained without reparsing. There is a task in the TODO list to make it easier (i.e. with a couple of lines of code), but it's going to take a while.

So, assuming you want to find the offset of a node that is the result of an XPath query, the only way is to get the query result as a node set (query.evaluate_node_set or node.select_single_node/select_nodes), get the node's offset (node.offset_debug()) and convert it to line/column manually.

You can prepare a data structure for offset -> line/column conversion once, and then use it multiple times; for example, the following code should work:

#include <vector>
#include <algorithm>
#include <utility>
#include <cassert>
#include <cstddef>
#include <cstdio>

typedef std::vector<ptrdiff_t> offset_data_t;

bool build_offset_data(offset_data_t& result, const char* file)
{
    FILE* f = fopen(file, "rb");
    if (!f) return false;

    ptrdiff_t offset = 0;

    char buffer[1024];
    size_t size;

    while ((size = fread(buffer, 1, sizeof(buffer), f)) > 0)
    {
        for (size_t i = 0; i < size; ++i)
            if (buffer[i] == '\n')
                result.push_back(offset + i);

        offset += size;
    }

    fclose(f);

    return true;
}

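// Returns a 1-based (line, column) pair for a byte offset into the file.
std::pair<int, int> get_location(const offset_data_t& data, ptrdiff_t offset)
{
    // index = number of newlines located strictly before offset
    offset_data_t::const_iterator it = std::lower_bound(data.begin(), data.end(), offset);
    size_t index = it - data.begin();

    // On the first line the column is offset + 1; on later lines it is the
    // distance from the preceding newline, which is already 1-based.
    return std::make_pair(1 + index, index == 0 ? offset + 1 : offset - data[index - 1]);
}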

This does not handle old Mac-style (\r-only) line breaks and does not expand tabs; both can be added easily, of course.
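As a rough illustration of those two additions, here is a minimal sketch that treats "\n", "\r\n" and a lone "\r" as one line break each and expands tabs when computing the column. It works on an in-memory copy of the file rather than a FILE*, and the names build_line_starts, locate and tab_width are made up for this example; they are not part of pugixml.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: offsets at which each line starts, treating
// "\n", "\r\n" and a lone "\r" as a single line break each.
std::vector<std::size_t> build_line_starts(const std::string& text)
{
    std::vector<std::size_t> starts(1, 0); // line 1 starts at offset 0

    for (std::size_t i = 0; i < text.size(); ++i)
    {
        if (text[i] == '\r')
        {
            // swallow the "\n" of a "\r\n" pair so it counts once
            if (i + 1 < text.size() && text[i + 1] == '\n') ++i;
            starts.push_back(i + 1);
        }
        else if (text[i] == '\n')
            starts.push_back(i + 1);
    }

    return starts;
}

// Hypothetical helper: 1-based (line, column) for a byte offset.
// A tab advances the column to the next multiple of tab_width (an assumed convention).
std::pair<int, int> locate(const std::string& text, const std::vector<std::size_t>& starts,
                           std::size_t offset, int tab_width = 8)
{
    assert(offset <= text.size());

    // First line start greater than offset; the line containing offset precedes it.
    std::vector<std::size_t>::const_iterator it =
        std::upper_bound(starts.begin(), starts.end(), offset);
    std::size_t line = static_cast<std::size_t>(it - starts.begin()) - 1;

    int column = 0; // 0-based visual column while scanning the line
    for (std::size_t i = starts[line]; i < offset; ++i)
        column = text[i] == '\t' ? (column / tab_width + 1) * tab_width : column + 1;

    return std::make_pair(static_cast<int>(line) + 1, column + 1);
}
```

The line lookup stays a binary search; only the column computation scans the current line, so the cost per query is proportional to the length of that line.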

zeuxcg
  • Thanks, yes, I want to find the offset of a node. But the question is how to convert it to line/column without re-parsing? Maybe I must edit the pugixml code? – Ghassen Hamrouni Jan 27 '11 at 16:45
  • And the answer is - you can't. You can edit pugixml code, but this won't be very easy - for performance reasons there is no lexer, so there is no single place where you can count newlines. Your best bet is reparsing; you can do a single pass reparsing once, building a std::map with key = newline offset and value = row index (increasing number); then you can use equal_range to convert offset to row+column. No need to modify pugixml code. – zeuxcg Jan 27 '11 at 17:38
  • But how to get the offset of an attribute? – Ghassen Hamrouni Feb 24 '11 at 14:08
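
The std::map idea from zeuxcg's comment above (key = newline offset, value = row index) might look roughly like the sketch below. It uses lower_bound instead of the equal_range mentioned in the comment, builds the map from an in-memory string, and only counts "\n" as a line break; the names newline_map_t, build_newline_map and row_column are invented for this example.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>

// Hypothetical type: newline offset -> 1-based row that the newline terminates.
typedef std::map<std::ptrdiff_t, int> newline_map_t;

// Single pass over the text, recording every "\n" with an increasing row number.
newline_map_t build_newline_map(const std::string& text)
{
    newline_map_t result;
    int row = 1;

    for (std::size_t i = 0; i < text.size(); ++i)
        if (text[i] == '\n')
            result[static_cast<std::ptrdiff_t>(i)] = row++;

    return result;
}

// Converts a byte offset to a 1-based (row, column) pair.
std::pair<int, int> row_column(const newline_map_t& newlines, std::ptrdiff_t offset)
{
    // First newline at or after offset.
    newline_map_t::const_iterator it = newlines.lower_bound(offset);

    // No newline before offset: we are still on the first row.
    if (it == newlines.begin())
        return std::make_pair(1, static_cast<int>(offset) + 1);

    --it; // last newline strictly before offset
    return std::make_pair(it->second + 1, static_cast<int>(offset - it->first));
}
```

A sorted std::vector, as in the answer, gives the same lookups with less memory; the map version is shown only because the comment suggests it.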