6

I have some PowerPoint documents that I keep version-controlled with git. I want to know what the differences are between versions of a file. Text is most important; images and formatting not so much (at least not at this point).

nmz787

4 Answers

6

I wrote this for use with git on the command-line (requires Python and the python-pptx library):

"""
Setup -- Add these lines to the following files:
--- .gitattributes
*.pptx diff=pptx

--- .gitconfig (or repo\.git\config    or your_user_home\.gitconfig) (change the path to point to your local copy of the script)
[diff "pptx"]
    binary = true
    textconv = python C:/Python27/Scripts/git-pptx-textconv.py

usage:
git diff your_powerpoint.pptx


Thanks to the  python-pptx docs and this snippet:
http://python-pptx.readthedocs.org/en/latest/user/quickstart.html#extract-all-text-from-slides-in-presentation
"""

import sys
from pptx import Presentation


if __name__ == '__main__':
    if len(sys.argv) != 2:
        print "Usage: git-pptx-textconv file.pptx"
        sys.exit(1)

    path_to_presentation = sys.argv[1]

    prs = Presentation(path_to_presentation)

    for slide in prs.slides:
        for shape in slide.shapes:
            if not shape.has_text_frame:
                continue
            for paragraph in shape.text_frame.paragraphs:
                par_text = ''
                for run in paragraph.runs:
                    s = run.text
                    s = s.replace(r"\\", "\\\\")
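                    # Replace real newline, carriage-return and tab characters
                    # (raw strings here would only match a literal backslash sequence)
                    s = s.replace("\n", " ")
                    s = s.replace("\r", " ")
                    s = s.replace("\t", " ")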
                    s = s.rstrip('\r\n')

                    # Convert left and right-hand quotes from Unicode to ASCII
                    # found http://stackoverflow.com/questions/816285/where-is-pythons-best-ascii-for-this-unicode-database
                    # go here if more power is needed  http://code.activestate.com/recipes/251871/
                    # or here                          https://pypi.python.org/pypi/Unidecode/0.04.1
                    punctuation = { 0x2018:0x27, 0x2019:0x27, 0x201C:0x22, 0x201D:0x22 }
                    # Keep the translated result; the original call discarded it
                    s = s.translate(punctuation)
                    s = s.encode('utf-8')
                    if s:
                        par_text += s
                print par_text
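
The text clean-up step can be tried on its own, without python-pptx. Below is a minimal Python 3 sketch of it, assuming the intent is to flatten real newline and tab characters and to straighten curly quotes. `normalize_run_text` is a hypothetical helper name, not something from the script or from python-pptx.

```python
# Map curly quotes to their ASCII equivalents (same table as the script above).
PUNCTUATION = {0x2018: 0x27, 0x2019: 0x27, 0x201C: 0x22, 0x201D: 0x22}


def normalize_run_text(text):
    """Replace newline/CR/tab characters with spaces and straighten curly quotes."""
    for ch in ("\n", "\r", "\t"):
        text = text.replace(ch, " ")
    # str.translate accepts a dict that maps code points to code points.
    return text.translate(PUNCTUATION)


if __name__ == "__main__":
    print(normalize_run_text("\u201cHello\u201d\tit\u2019s"))
```

Keeping the output to one line per paragraph, with plain ASCII quotes, makes the resulting git diff easier to read in a terminal.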
nmz787
  • Works great. Here's a version of your script that works with Python 3: https://gitlab.com/wolframroesler/snippets/-/blob/master/git-pptx-textconv.py – Wolfram Rösler Jan 15 '21 at 10:44
4

I was unable to install python-pptx, as suggested by the accepted answer, so I looked for a node.js solution (that may also work for several other file formats that it can handle).

Install https://github.com/dbashford/textract (npm install --global textract).

Define the "textract" diff driver in your git config. On my Windows machine:

[diff "textract"]
    binary = true
    textconv=textract.cmd

Define in your .gitattributes that *.pptx files should use the "textract" diff driver:

*.pptx diff=textract

git diff happily.

xverges
  • For what it's worth this did not work for me with the "binary = true" flag. If I remove it, it works perfectly. – DTI-Matt Aug 28 '18 at 20:22
1

Not really. A PowerPoint file is essentially a zip archive of a folder full of files. Git will treat it as a binary file (because it is one).

Maybe there's a third-party extension that does it, but I've never heard of one.
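
Because a .pptx is a zip archive of XML parts, the slide text can be pulled out with the Python standard library alone. The sketch below is a rough illustration, assuming the usual layout (`ppt/slides/slideN.xml`) and the DrawingML namespace for `<a:p>` paragraphs and `<a:t>` text runs. `slide_texts` is a hypothetical helper name, and the slides are sorted by file number, which may not match the order shown in the presentation.

```python
import re
import sys
import zipfile
import xml.etree.ElementTree as ET

# DrawingML namespace used by paragraphs (<a:p>) and text runs (<a:t>).
A_NS = "{http://schemas.openxmlformats.org/drawingml/2006/main}"
SLIDE_RE = re.compile(r"ppt/slides/slide(\d+)\.xml$")


def slide_texts(path_or_file):
    """Return [(slide_number, [paragraph_text, ...]), ...] sorted by slide number."""
    result = []
    with zipfile.ZipFile(path_or_file) as archive:
        for name in archive.namelist():
            match = SLIDE_RE.match(name)
            if not match:
                continue  # skip layouts, media, relationships, etc.
            root = ET.fromstring(archive.read(name))
            paragraphs = []
            for para in root.iter(A_NS + "p"):
                # Join all text runs of one paragraph into a single line.
                text = "".join(t.text or "" for t in para.iter(A_NS + "t"))
                if text:
                    paragraphs.append(text)
            result.append((int(match.group(1)), paragraphs))
    return sorted(result)


if __name__ == "__main__" and len(sys.argv) == 2:
    for number, paragraphs in slide_texts(sys.argv[1]):
        print("--- slide %d" % number)
        for line in paragraphs:
            print(line)
```

A script like this could be used as a git `textconv` command in the same way as the other answers describe, since it takes a file path and prints plain text.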

Zepplock
0

I can't speak directly to git as we use Visual Studio + TFS at work. However, a bit of research reveals this should work. What I do on VS is to integrate WinMerge and its plugin which supports a text comparison of MS Office and PDF files. This allows me to do diffs of pptx, docx, pdf, etc. files published to version control.

For git, the way it should work is:

1) Get WinMerge with the xdocdiff plugin: http://freemind.s57.xrea.com/xdocdiffPlugin/en/index.html
2) Integrate WinMerge with git: https://coderwall.com/p/76wmzq/winmerge-as-git-difftool-on-windows

Hopefully this will allow you to see the text-based diffs for your PowerPoint.

Wes Wong