
My XML File has:

< Package > xmlMetadata < /Package >

I am searching for a tag in this file, and the text between its opening and closing tags has to be printed on the console. In this case I want xmlMetadata to be printed. The search should then continue through the file and print again whenever it encounters another < Package > tag.

Here is my code but it is printing the contents of the whole file:

{
    string line="< Package >";
    ifstream myfile (xmlFileName); // xmlFileName is the XML file to search
    if (myfile.is_open())
    {
    while ( myfile.good() )
    {
      getline (myfile,line);
      std::cout<<line<< endl;
    }
    myfile.close();
    }
    else cout << "Unable to open file"; 
}

Displaying below my whole xml:

< ? xml version="1.0" ? >
< fileStructure >
< Main_Package >
   File_Navigate
< /Main_Package >
< Dependency_Details >

< Dependency >
   < Package >
      xmlMetadata
   < /Package >
   < Header >
      xmlMetadata.h
   < /Header >
   < Header_path >
      C:\Dependency\xmlMetadata\xmlMetadata.h
   < /Header_path >
   < Implementation >
      xmlMetadata.cpp
   < /Implementation >
   < Implementation_path >
      C:\Dependency\xmlMetadata\xmlMetadata.cpp
   < /Implementation_path >
< /Dependency >

< Dependency >
   < Package >
      xmlMetadata1
   < /Package >
   < Header >
      xmlMetadata1.h
   < /Header >
   < Header_path >
      C:\Dependency\xmlMetadata\xmlMetadata1.h
   < /Header_path >
   < Implementation >
      xmlMetadata1.cpp
   < /Implementation >
   < Implementation_path >
      C:\Dependency\xmlMetadata\xmlMetadata1.cpp
   < /Implementation_path >
< /Dependency >

< /Dependency_Details >
< /fileStructure >
karlphillip
tech_learner

3 Answers


getline doesn't search for a line; it simply reads each line into the variable "line". You then have to search within "line" for the text you want.

   size_t found = line.find("Package");
   if (found != std::string::npos) {
       cout << line;
   }

BUT this is a bad way to handle XML: nothing stops the XML writer from breaking a tag across multiple lines. Unless this is a one-off and you create the file yourself, you really should use a general XML parser to read the file and give you a list of tags.

There are a bunch of very easy-to-use XML parsers, such as TinyXML.

EDIT (different XML now posted): that's the problem with using regex to parse XML, you don't know how the XML will break lines. You can keep adding more and more layers of complexity until you have written your own XML parser. Just use one of the parsers listed in "What is the best open XML parser for C++?"
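
For the simple case where the opening tag, the text, and the closing tag all sit on one line, the find-then-substr idea can be sketched as follows. This is only a minimal illustration, not a parser: `extract_between` is a hypothetical helper name, and it assumes the tags are written without spaces or attributes.

```cpp
#include <string>

// Hypothetical helper (not part of the original answer): returns the text
// between the first `open` and the next `close` found in `line`, or an
// empty string if either tag is missing.
std::string extract_between(const std::string& line,
                            const std::string& open,
                            const std::string& close)
{
    // locate the opening tag
    const std::string::size_type start = line.find(open);
    if (start == std::string::npos)
        return "";

    // the content begins right after the opening tag
    const std::string::size_type content = start + open.size();

    // locate the closing tag, searching only after the opening tag
    const std::string::size_type end = line.find(close, content);
    if (end == std::string::npos)
        return "";

    return line.substr(content, end - content);
}
```

Calling `extract_between(line, "<Package>", "</Package>")` on a line holding `<Package>xmlMetadata</Package>` gives back `xmlMetadata`. It returns nothing useful once the tag pair spans several lines, which is exactly the limitation described above.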

Martin Beckett
  • Thanks Martin, I tried the above method but it's returning "Package" on the command line. I want the text after < Package >. I am now trying to play with this using substr(). – tech_learner Mar 26 '11 at 16:19

This is not the way you should parse an XML file, but since you don't want to use a parser library this code might get you started.

File: demo.xml

<?xml version="1.0"?>
<fileStructure>
<Main_Package>
   File_Navigate
</Main_Package>
<Dependency_Details>

<Dependency>
   <Package>
      xmlMetadata
   </Package>
   <Header>
      xmlMetadata.h
   </Header>
   <Header_path>
      C:\Dependency\xmlMetadata\xmlMetadata.h
   </Header_path>
   <Implementation>
      xmlMetadata.cpp
   </Implementation>
   <Implementation_path>
      C:\Dependency\xmlMetadata\xmlMetadata.cpp
   </Implementation_path>
</Dependency>

<Dependency>
   <Package>
      xmlMetadata1
   </Package>
   <Header>
      xmlMetadata1.h
   </Header>
   <Header_path>
      C:\Dependency\xmlMetadata\xmlMetadata1.h
   </Header_path>
   <Implementation>
      xmlMetadata1.cpp
   </Implementation>
   <Implementation_path>
      C:\Dependency\xmlMetadata\xmlMetadata1.cpp
   </Implementation_path>
</Dependency>

</Dependency_Details>
</fileStructure>

The basic idea is to read the file line by line, strip the leading whitespace from each line, store the stripped string in tmp, and compare it against the tags you are looking for. Once the opening tag is found, every following line is printed until the closing tag is found.

File: parse.cpp

#include <iostream>
#include <string>
#include <fstream>

using namespace std;

int main()
{
    string line;
    ifstream in("demo.xml");

    bool begin_tag = false;
    while (getline(in,line))
    {
        // strip leading whitespace (spaces and tabs) from the line
        const string::size_type first = line.find_first_not_of(" \t");
        const string tmp = (first == string::npos) ? "" : line.substr(first);

        //cout << "-->" << tmp << "<--" << endl;

        if (tmp == "<Package>")
        {
            //cout << "Found <Package>" << endl;
            begin_tag = true;
            continue;
        }
        else if (tmp == "</Package>")
        {
            begin_tag = false;
            //cout << "Found </Package>" << endl;
        }

        if (begin_tag)
        {
            cout << tmp << endl;
        }
    }
}

Outputs:

xmlMetadata
xmlMetadata1
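
The line-by-line approach above depends on each tag sitting alone on its own line. As a hedged variation, still not a real XML parser, the whole file can be held in one string and scanned for every opening/closing pair, which copes with tags and text sharing a line. The helper names `trim` and `collect_tag_text` are hypothetical, and the sketch assumes the tag has no attributes and is never nested inside itself.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: removes spaces, tabs, and line breaks from both ends.
std::string trim(const std::string& s)
{
    const char* ws = " \t\r\n";
    const std::string::size_type first = s.find_first_not_of(ws);
    if (first == std::string::npos)
        return "";
    const std::string::size_type last = s.find_last_not_of(ws);
    return s.substr(first, last - first + 1);
}

// Hypothetical helper: collects the trimmed text of every <tag>...</tag>
// pair found in `xml`, in document order.
std::vector<std::string> collect_tag_text(const std::string& xml,
                                          const std::string& tag)
{
    const std::string open = "<" + tag + ">";
    const std::string close = "</" + tag + ">";
    std::vector<std::string> result;

    std::string::size_type pos = 0;
    while ((pos = xml.find(open, pos)) != std::string::npos)
    {
        const std::string::size_type content = pos + open.size();
        const std::string::size_type end = xml.find(close, content);
        if (end == std::string::npos)
            break; // unmatched opening tag: stop scanning

        result.push_back(trim(xml.substr(content, end - content)));
        pos = end + close.size(); // continue after the closing tag
    }
    return result;
}
```

To use it, read demo.xml into a single string (appending a newline after each `getline` call keeps the line breaks) and call `collect_tag_text(contents, "Package")`. Because the search string includes the leading `<`, `<Main_Package>` is not mistaken for `<Package>`.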
karlphillip

A single line of tags in a file can hardly be described as XML. Anyway, if you really want to parse an XML file, this can be done much more easily with a parser library like RapidXML. This page is an excellent resource.

The code below is my attempt to read the following XML (note the XML declaration on the first line, which a well-formed file should have):

File: demo.xml

<?xml version="1.0" encoding="utf-8"?>
<rootnode version="1.0" type="example">
    <Package> xmlMetadata </Package>
</rootnode>

A quick note: RapidXML is a header-only library. On my system I unzipped the library to /usr/include/rapidxml-1.13, so the code below can be compiled with:

g++ read_tag.cpp -o read_tag -I/usr/include/rapidxml-1.13/

File: read_tag.cpp

#include <iostream>
#include <string>
#include <vector>
#include <fstream>
#include <rapidxml.hpp>

using namespace std;
using namespace rapidxml;


int main()
{
    string input_xml;
    string line;
    ifstream in("demo.xml");

    // read file into input_xml
    while(getline(in,line))
        input_xml += line;

    // make a safe-to-modify copy of input_xml
    // (you should never modify the contents of an std::string directly)
    vector<char> xml_copy(input_xml.begin(), input_xml.end());
    xml_copy.push_back('\0');

    // only use xml_copy from here on!
    xml_document<> doc;
    // we are choosing to parse the XML declaration
    // parse_no_data_nodes prevents RapidXML from using the somewhat surprising
    // behavior of having both values and data nodes, and having data nodes take
    // precedence over values when printing
    // >>> note that this will skip parsing of CDATA nodes <<<
    doc.parse<parse_declaration_node | parse_no_data_nodes>(&xml_copy[0]);

    // alternatively, use one of the two commented lines below to parse CDATA nodes,
    // but please note the above caveat about surprising interactions between
    // values and data nodes (also read http://www.ffuts.org/blog/a-rapidxml-gotcha/)
    // if you use one of these two declarations try to use data nodes exclusively and
    // avoid using value()
    //doc.parse<parse_declaration_node>(&xml_copy[0]); // just get the XML declaration
    //doc.parse<parse_full>(&xml_copy[0]); // parses everything (slowest)

    // since we have parsed the XML declaration, it is the first node
    // (otherwise the first node would be our root node)
    string encoding = doc.first_node()->first_attribute("encoding")->value();
    // encoding == "utf-8"

    // we didn't keep track of our previous traversal, so let's start again
    // we can match nodes by name, skipping the xml declaration entirely
    xml_node<>* cur_node = doc.first_node("rootnode");
    string rootnode_type = cur_node->first_attribute("type")->value();
    // rootnode_type == "example"

    // go straight to the first Package node
    cur_node = cur_node->first_node("Package");
    string content = cur_node->value(); // if the node doesn't exist, this line will crash

    cout << content << endl;
}

Outputs:

xmlMetadata

karlphillip
  • Hey Karl, thanks for helping. But I actually cannot use any prebuilt modules/parsers like RapidXML or TinyXML in this program. I need to write my own. Also, the XML is not a one-tag XML, it has many more tags... I didn't display them here just to keep my question simpler. :) – tech_learner Mar 26 '11 at 16:23
  • Ok, so let me point out something obvious that will help you parse that string: tags don't have spaces in them! `< Package >` should be `<Package>`. Got it? – karlphillip Mar 26 '11 at 16:29
  • Yeah, I am using it without spaces, but if I post it that way here it's not displayed as tags... so I added spaces inside the tags... – tech_learner Mar 26 '11 at 16:33
  • That's just because you weren't formatting the code correctly. I edited your post and did it for you. – karlphillip Mar 26 '11 at 16:35
  • I decided to leave this code here since it works. Others might find this approach useful. – karlphillip Mar 26 '11 at 17:12