9

I am not sure if I've been missing anything obvious, but I have not found anything documented about how one would go to insert Word elements (tables, for example) at some specific place in a document?

I am loading an existing MS Word .docx document by using:

my_document = Document('some/path/to/my/document.docx')

My use case would be to get the 'position' of a bookmark or section in the document and then proceed to insert tables below that point.

I'm thinking about an API that would allow me to do something along those lines:

insertion_point = my_document.bookmarks['bookmark_name'].position
my_document.add_table(rows=10, cols=3, position=insertion_point+1)

I saw that there are plans to implement something akin to the 'range' object of the MS Word API, this would effectively solve that problem. In the meantime, is there a way to instruct the document object methods where to insert the new elements?

Maybe I can glue some lxml code to find a node and pass that to these python-docx methods? Any help on this subject would be much appreciated! Thanks.

Apteryx
  • 4,279
  • 3
  • 12
  • 15

5 Answers5

14

I remembered an old adage, "use the source, Luke!", and could figure it out. A post from python-docx owner on its git project page also gave me a hint: https://github.com/python-openxml/python-docx/issues/7.

The full XML document model can be accessed by using the its _document_part._element property. It behaves exactly like an lxml etree element. From there, everything is possible.

To solve my specific insertion point problem, I created a temp docx.Document object which I used to store my generated content.

import docx
from docx.oxml.shared import qn
tmp_doc = docx.Document()

# Generate content in tmp_doc document
tmp_doc.add_heading('New heading', 1)
# more content generation using docx API.
# ...

# Reference the tmp_doc XML content
tmp_doc_body = tmp_doc._document_part._element.body
# You could pretty print it by using:
#print(docx.oxml.xmlchemy.serialize_for_reading(tmp_doc_body))

I then loaded my docx template (containing a bookmark named 'insertion_point') into a second docx.Document object.

doc = docx.Document('/some/path/example.docx')
doc_body = doc._document_part._element.body
#print(docx.oxml.xmlchemy.serialize_for_reading(doc_body))

The next step is parsing the doc XML to find the index of the insertion point. I defined a small function for the task at hand, which returns a named bookmark parent paragraph element:

def get_bookmark_par_element(document, bookmark_name):
"""
Return the named bookmark parent paragraph element. If no matching
bookmark is found, the result is '1'. If an error is encountered, '2'
is returned.
"""
doc_element = document._document_part._element
bookmarks_list = doc_element.findall('.//' + qn('w:bookmarkStart'))
for bookmark in bookmarks_list:
    name = bookmark.get(qn('w:name'))
    if name == bookmark_name:
        par = bookmark.getparent()
        if not isinstance(par, docx.oxml.CT_P): 
            return 2
        else:
            return par
return 1

The newly defined function was used toget the bookmark 'insertion_point' parent paragraph. Error control is left to the reader.

bookmark_par = get_bookmark_par_element(doc, 'insertion_point')

We can now use bookmark_par's etree index to insert our tmp_doc generated content at the right place:

bookmark_par_parent = bookmark_par.getparent()
index = bookmark_par_parent.index(bookmark_par) + 1
for child in tmp_doc_body:
    bookmark_par_parent.insert(index, child)
    index = index + 1
bookmark_par_parent.remove(bookmark_par)

The document is now finalized, the generated content having been inserted at the bookmark location of an existing Word document.

# Save result
# print(docx.oxml.xmlchemy.serialize_for_reading(doc_body))
doc.save('/some/path/generated_doc.docx')

I hope this can help someone, as the documentation regarding this is still yet to be written.

Apteryx
  • 4,279
  • 3
  • 12
  • 15
  • 1
    As for version 0.8.7, write `doc_element = doc.part.element` instead of `doc_element = document._document_part._element`. – Max Dec 06 '18 at 20:44
  • 1
    Thank you this answer helped me so much. Was able to figure it out. Any idea how I can just replace the text within the paragraph were the bookmark exist. E.g. just want to change one word in a paragraph were the Bookmark is. – Tooblippe Feb 24 '20 at 08:40
3

You put [image] as a token in your template document:

for paragraph in document.paragraphs:
    if "[image]" in paragraph.text:
        paragraph.text = paragraph.text.strip().replace("[image]", "")

        run = paragraph.add_run()
        run.add_picture(image_path, width=Inches(3))

you have have a paragraph in a table cell as well. just find the cell and do as above.

David Dehghan
  • 14,410
  • 2
  • 84
  • 85
2

Python-docx owner suggests how to insert a table into the middle of an existing document: https://github.com/python-openxml/python-docx/issues/156

Here it is with some improvements:

import re
from docx import Document

def move_table_after(document, table, search_phrase):
    regexp = re.compile(search_phrase)
    for paragraph in document.paragraphs:
        if paragraph.text and regexp.search(paragraph.text):
            tbl, p = table._tbl, paragraph._p
            p.addnext(tbl)
            return paragraph

if __name__ == '__main__':
    document = Document('Existing_Document.docx')    
    table = document.add_table(rows=..., cols=...)
    ...
    move_table_after(document, table, "your search phrase")                    
    document.save('Modified_Document.docx')
Noam Manos
  • 10,586
  • 2
  • 68
  • 65
2

Have a look at python-docx-template which allows jinja2 style templates insertion points in a docx file rather than Word bookmarks:

https://pypi.org/project/docxtpl/

https://docxtpl.readthedocs.io/en/latest/

adejones
  • 512
  • 5
  • 10
0

Thanks a lot for taking time to explain all of this.

I was going through more or less the same issue. My specific point was how to merge two or more docx documents, at the end.

It's not exactly a solution to your problem, but here is the function I came with:

def combinate_word(main_file, files, output):   
    main_doc = Document(main_file)
    for file in files:
        sub_doc = Document(file)

        for element in sub_doc._document_part.body._element:
            main_doc._document_part.body._element.append(element)

    main_doc.save(output)

Unfortunately, it's not yet possible nor easy to copy images with python-docx. I fall back to win32com ...

Stéphane
  • 1,837
  • 19
  • 27
  • Thanks for sharing! I didn't need to experiment with images yet, so I'm not sure about the challenges on that side. – Apteryx Aug 08 '14 at 15:54