
I know this is a repeated question, but the other answers did not work for me. I have a Word file that consists of one table, and I want that table as the output of my Python program. I'm using Python 3.6 and I have installed python-docx as well. Here is my code for the data extraction:

from docx.api import Document

document = Document('test_word.docx')
table = document.tables[0]

data = []

keys = None
for i, row in enumerate(table.rows):
    text = (cell.text for cell in row.cells)

    if i == 0:
        keys = tuple(text)
        continue
    row_data = dict(zip(keys, text))
    data.append(row_data)
    print (data)
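If the goal is only to print the table so it looks like the grid in Word, the header/dict step is not strictly needed. Below is a minimal sketch that assumes the cell texts have already been collected into a list of rows, for example with `[[cell.text for cell in row.cells] for row in table.rows]`. `format_grid` is a hypothetical helper name, and the sample rows are made up for illustration:

```python
def format_grid(rows):
    """Return the rows as a left-aligned, space-separated text grid.

    rows: list of lists of strings, one inner list per table row.
    Rows may have different lengths; missing cells are treated as empty.
    """
    if not rows:
        return ""
    n_cols = max(len(row) for row in rows)
    # Pad short rows so every row has the same number of cells.
    padded = [list(row) + [""] * (n_cols - len(row)) for row in rows]
    # Width of each column = length of its longest cell text.
    widths = [max(len(row[c]) for row in padded) for c in range(n_cols)]
    lines = []
    for row in padded:
        cells = (text.ljust(width) for text, width in zip(row, widths))
        # Two spaces between columns; drop trailing padding.
        lines.append("  ".join(cells).rstrip())
    return "\n".join(lines)


# Sample data standing in for the cell texts read from the .docx table.
rows = [["1", "2", "3", "4"], ["5", "6", "7", "8"], ["9", "10", "11", "12"]]
print(format_grid(rows))
```

Because the first row is printed like any other row, this also works for a table that has only one row.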

I want the output to look exactly like the table in the Word .docx file. Thanks in advance.

Rachel Gallen
Aroon

1 Answer

Your code works fine for me. How about loading it into a pandas DataFrame?

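import pandas as pd
from docx.api import Document

document = Document('test_word.docx')
table = document.tables[0]  # first table in the document

data = []

keys = None
for i, row in enumerate(table.rows):
    text = (cell.text for cell in row.cells)

    # Use the first row as the column headers
    if i == 0:
        keys = tuple(text)
        continue
    # Map each header to the matching cell text of this row
    row_data = dict(zip(keys, text))
    data.append(row_data)

print(data)  # print once, after all rows have been collected

df = pd.DataFrame(data)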
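As an alternative sketch (not the code above), the DataFrame can also be built from a plain list of rows, with the first row used as the column labels. The `rows` list here is sample data standing in for `[[cell.text for cell in row.cells] for row in table.rows]`; with python-docx the real values come from the document:

```python
import pandas as pd

# Sample cell texts; with python-docx these would come from
# [[cell.text for cell in row.cells] for row in table.rows]
rows = [
    ["1", "2", "3", "4"],
    ["5", "6", "7", "8"],
    ["9", "10", "11", "12"],
]

# First row becomes the column labels, the remaining rows become the data.
df = pd.DataFrame(rows[1:], columns=rows[0])
print(df)

# A table with only a header row gives an empty frame that keeps its columns.
header_only_rows = [["A", "B", "C", "D"]]
header_only = pd.DataFrame(header_only_rows[1:], columns=header_only_rows[0])
print(header_only.shape)  # (0, 4)

# If that single row is data rather than headers, keep it as data and
# let pandas number the columns 0, 1, 2, ...
no_header = pd.DataFrame(header_only_rows)
print(no_header.shape)  # (1, 4)
```

The dict-per-row loop drops the header row from the data, so a one-row table ends up as an empty DataFrame; passing the rows directly makes it explicit whether the first row is treated as headers or as data.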

How can I display a particular row and column in that table?

You can extract rows and columns by position with iloc:

# iloc[rows, columns]
df.iloc[0, :].tolist()   # ['5', '6', '7', '8']  - row index 0
df.iloc[:, 0].tolist()   # ['5', '9', '13', '17']  - column index 0
df.iloc[0, 0]            # '5'  - cell (0, 0)
df.iloc[1:, 2].tolist()  # ['11', '15', '19']  - column index 2, skipping the first row

and so on...
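A few more selections in the same spirit, sketched on a stand-in DataFrame that matches the sample table shown further down (string cells and string column labels '1' to '4', because `cell.text` always returns a string):

```python
import pandas as pd

# Stand-in for the DataFrame built from the Word table: every value is a str.
df = pd.DataFrame(
    [["5", "6", "7", "8"],
     ["9", "10", "11", "12"],
     ["13", "14", "15", "16"],
     ["17", "18", "19", "20"]],
    columns=["1", "2", "3", "4"],
)

# Several rows and columns at once by position: rows 0 and 2, columns 1 and 3.
sub = df.iloc[[0, 2], [1, 3]]
print(sub)

# Single cell by position; .iat is the scalar-only counterpart of .iloc.
print(df.iat[1, 2])  # prints 11

# By label instead of position: row label 3, column label '2'.
print(df.loc[3, "2"])  # prints 18
```

iloc always counts positions from 0, while loc uses the labels, which here are the default row numbers and the header texts from the first table row.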

However, if your columns have names (here the header cells contain numbers, but they are read as strings), you can select a column by its name:

# df["name"].tolist()
df['1'].tolist()  # ['5', '9', '13', '17'] - column named '1'
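Because `cell.text` returns strings, the DataFrame holds text such as '5', not the integer 5. If you need arithmetic on the cells, one option is `pd.to_numeric`. This is a sketch that assumes every cell is numeric; by default `pd.to_numeric` raises on text it cannot parse:

```python
import pandas as pd

# Stand-in for the DataFrame read from the document (string cells and labels).
df = pd.DataFrame([["5", "6"], ["9", "10"]], columns=["1", "2"])

# Convert every column from text to numbers; labels stay strings.
nums = df.apply(pd.to_numeric)

print(nums["1"].tolist())  # [5, 9]
print(nums["2"].sum())     # 16
```

If some cells may be empty or non-numeric, `df.apply(pd.to_numeric, errors="coerce")` turns them into NaN instead of raising.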

print(df)

prints the following, which is how the table looks in my sample document:

    1   2   3   4
0   5   6   7   8
1   9   10  11  12
2   13  14  15  16
3   17  18  19  20
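The leftmost column (0 to 3) is the row index that pandas adds; it is not part of the Word table. To print only the table contents, a sketch using `DataFrame.to_string(index=False)` on stand-in data:

```python
import pandas as pd

# Stand-in for the DataFrame built from the Word table.
df = pd.DataFrame(
    [["5", "6", "7", "8"], ["9", "10", "11", "12"]],
    columns=["1", "2", "3", "4"],
)

# index=False drops the 0, 1, 2, ... row labels from the output.
text = df.to_string(index=False)
print(text)
```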
Anton vBR
  • Thanks man. Works great. I have another question: how can I display a particular row and column in that table? – Aroon Oct 08 '17 at 08:20
  • Can you please post the full code like you did earlier? I'm confused by this code. For example, how can I edit this code: import pandas as pd from docx.api import Document document = Document('test_word.docx') table = document.tables[0] data = [] keys = None for i, row in enumerate(table.rows): text = (cell.text for cell in row.cells) if i == 0: keys = tuple(text) continue row_data = dict(zip(keys, text)) data.append(row_data) print (data) df = pd.DataFrame(data) print(df) – Aroon Oct 09 '17 at 17:30
  • @ArunBaskar I moved things around a bit, hope this is what you want :) – Anton vBR Oct 09 '17 at 19:24
  • Thanks man. @Anton, how about this one: https://stackoverflow.com/questions/46659311/how-to-count-specific-column-name-among-different-tables-in-word-using-python – Aroon Oct 10 '17 at 12:52
  • @AntonvBR, [this solution](https://stackoverflow.com/a/47978006/5741205) might be interesting for you... – MaxU Dec 26 '17 at 12:18
  • @ArunBaskar, you may try to use [something similar to this](https://stackoverflow.com/a/47978006/5741205) for your deleted question. P.S. Please ping me if you decide to undelete your question... – MaxU Dec 26 '17 at 12:31
  • @MaxU Nice1. This answer was really just a modification of the question where I tweaked the code. I like your approach. – Anton vBR Dec 26 '17 at 13:06
  • If there is only one row, possibly with several columns (for instance four columns and one row), it doesn't give any output. What can we do about that? Any ideas? – RAVI KUMAR SHARMA CSE16 Feb 04 '20 at 13:15