We get order e-mails whenever a buyer makes a purchase; these e-mails are sent in a text format with some relevant and some irrelevant information. I am trying to write a Python program which will read the text and then build an XML file (using ElementTree) which we can import into other software.

Unfortunately I do not quite know the proper terms for some of this, so please bear with the overlong explanations.

The problem is that I cannot figure out how to make it work with more than one product on the order. The program currently goes through each order and puts the data in a dictionary.

while file_length_dic != 0:
    # Go line by line and add each value (and its name) to a dictionary.
    # Keys are the first half of the line followed by a distinguishing number.
    for line in raw_email:
        colon_loc = line.index(':')
        data_type = line[:colon_loc] + "_" + str(file_length)
        data_variable = line[colon_loc + 2:].lstrip(' ')
        xml_dic[data_type] = data_variable
        # str.find() returns -1 (truthy) when nothing is found, so test membership instead.
        if "URL" in line:
            break
    file_length_dic -= 1

How can I get these dictionary values into XML? For example, under the main "JOB" element there will be a sub-element ITEMNUMBER and then SALESMANN and QUANTITY. How can I fill out multiple sets?

<JOB>
    <ITEM>
        <ITEMNUMBER>36322</ITEMNUMBER>
        <SALESMANN>17</SALESMANN>
        <QUANTITY>2</QUANTITY>
    </ITEM>
    <ITEM>
        <ITEMNUMBER>22388</ITEMNUMBER>
        <SALESMANN>5</SALESMANN>
        <QUANTITY>8</QUANTITY>
    </ITEM>
</JOB>

As far as I can tell, ElementTree will only let me put the data into the first set of children, but I can't imagine this must be so. I also do not know in advance how many items come with each order; it can be anywhere from 1 to 150 and the program needs to scale easily.

Should I be using a different library? lxml looks powerful, but again, I do not know exactly what I am looking for.

Justin Burgard

2 Answers

Here's a simple example. Note that the basic ElementTree doesn't pretty print, so I included a pretty print function from the ElementTree author.

If you provide an actual example of the input file and dictionary, it would be easier to target your specific case. I just put some data in a dictionary to show how to iterate over it and generate some XML.

from xml.etree import ElementTree as et

def indent(elem, level=0):
    i = "\n" + level*"  "
    if len(elem):
        if not elem.text or not elem.text.strip():
            elem.text = i + "  "
        if not elem.tail or not elem.tail.strip():
            elem.tail = i
        for elem in elem:
            indent(elem, level+1)
        if not elem.tail or not elem.tail.strip():
            elem.tail = i
    else:
        if level and (not elem.tail or not elem.tail.strip()):
            elem.tail = i

D = {36322:(17,2),22388:(5,8)}

job = et.Element('JOB')
for itemnumber,(salesman,quantity) in D.items():
    et.SubElement(job,'ITEMNUMBER').text = str(itemnumber)
    et.SubElement(job,'SALESMAN').text = str(salesman)
    et.SubElement(job,'QUANTITY').text = str(quantity)
indent(job)
et.dump(job)

Output:

<JOB>
  <ITEMNUMBER>36322</ITEMNUMBER>
  <SALESMAN>17</SALESMAN>
  <QUANTITY>2</QUANTITY>
  <ITEMNUMBER>22388</ITEMNUMBER>
  <SALESMAN>5</SALESMAN>
  <QUANTITY>8</QUANTITY>
</JOB>

Although as @alko mentioned, a more structured XML might be:

job = et.Element('JOB')
for itemnumber,(salesman,quantity) in D.items():
    item = et.SubElement(job,'ITEM')
    et.SubElement(item,'NUMBER').text = str(itemnumber)
    et.SubElement(item,'SALESMAN').text = str(salesman)
    et.SubElement(item,'QUANTITY').text = str(quantity)

Output:

<JOB>
  <ITEM>
    <NUMBER>36322</NUMBER>
    <SALESMAN>17</SALESMAN>
    <QUANTITY>2</QUANTITY>
  </ITEM>
  <ITEM>
    <NUMBER>22388</NUMBER>
    <SALESMAN>5</SALESMAN>
    <QUANTITY>8</QUANTITY>
  </ITEM>
</JOB>
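
Since the question's input format was not posted, here is a minimal sketch of going from e-mail text to the nested structure above. The sample e-mail body, the field names in TAGS, and the rule that a "URL" line ends each product are all assumptions for illustration; adjust them to the real e-mails. The idea is to collect one dictionary per product in a list, rather than numbering keys in a single dictionary, and then loop over that list.

```python
from xml.etree import ElementTree as et

# Hypothetical e-mail body: field names and the "URL ends a product" rule are assumptions.
RAW_EMAIL = """\
Item Number: 36322
Salesman: 17
Quantity: 2
URL: http://example.com/item/36322
Item Number: 22388
Salesman: 5
Quantity: 8
URL: http://example.com/item/22388
"""

# Hypothetical mapping from e-mail field names to XML tag names.
TAGS = {"Item Number": "NUMBER", "Salesman": "SALESMAN", "Quantity": "QUANTITY"}


def parse_items(text):
    """Return one dict per product; a 'URL' line closes the current product."""
    items, current = [], {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # skip lines that contain no colon
        key, value = key.strip(), value.strip()
        if key == "URL":
            if current:
                items.append(current)
            current = {}
        elif key in TAGS:
            current[TAGS[key]] = value  # other fields are ignored
    if current:
        items.append(current)  # keep a trailing product with no URL line
    return items


def build_job(items):
    """Create one ITEM element per product under a single JOB root."""
    job = et.Element("JOB")
    for fields in items:
        item = et.SubElement(job, "ITEM")
        for tag, value in fields.items():
            et.SubElement(item, tag).text = value
    return job


items = parse_items(RAW_EMAIL)
job = build_job(items)
xml_text = et.tostring(job, encoding="unicode")
print(xml_text)
```

Because every pass through the loop creates a fresh ITEM element, the same code handles 1 product or 150 without changes.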
Mark Tolonen

Your XML structure does not seem valid to me. How can one tell which salesman refers to which item number?

Probably, you need something like

<JOB>
    <ITEM>
        <NUMBER>36322</NUMBER>
        <SALESMANN>17</SALESMANN>
        <QUANTITY>2</QUANTITY>
    </ITEM>
    <ITEM>
        <NUMBER>22388</NUMBER>
        <SALESMANN>5</SALESMANN>
        <QUANTITY>8</QUANTITY>
    </ITEM>
</JOB>

For a list of serialization techniques, refer to "Serialize Python dictionary to XML".

Sample with dicttoxml:

import dicttoxml
from xml.dom.minidom import parseString

xml = dicttoxml.dicttoxml({'JOB':[{'NUMBER':36322,
                                    'QUANTITY': 2, 
                                    'SALESMANN': 17}
                                  ]}, root=False)
dom = parseString(xml)

and output

>>> print(dom.toprettyxml())
<?xml version="1.0" ?>
<JOB type="list">
        <item type="dict">
                <SALESMANN type="int">
                        17
                </SALESMANN>
                <NUMBER type="int">
                        36322
                </NUMBER>
                <QUANTITY type="int">
                        2
                </QUANTITY>
        </item>
</JOB>
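
Note that the pretty-printed output above puts each value on its own line, surrounded by whitespace, and adds type attributes. If the importing software is strict about that, a standard-library-only variant may be simpler. The following is a sketch, not a drop-in replacement for dicttoxml: the `orders` list is illustrative data in the same shape as the sample above, and `ET.indent()` is only available on Python 3.9 or later, hence the `hasattr` guard.

```python
from xml.etree import ElementTree as ET

# Illustrative data in the same shape as the dicttoxml sample above.
orders = [
    {"NUMBER": 36322, "SALESMANN": 17, "QUANTITY": 2},
    {"NUMBER": 22388, "SALESMANN": 5, "QUANTITY": 8},
]

job = ET.Element("JOB")
for order in orders:
    # One ITEM per dictionary keeps each salesman and quantity with its number.
    item = ET.SubElement(job, "ITEM")
    for tag, value in order.items():
        ET.SubElement(item, tag).text = str(value)

# ET.indent() (Python 3.9+) only adds whitespace between elements,
# so the values themselves are left untouched.
if hasattr(ET, "indent"):
    ET.indent(job, space="    ")

pretty = ET.tostring(job, encoding="unicode")
print(pretty)
```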
alko