
Assume that I have some text (for example, given as a string). Later I am going to "edit" this text, meaning that I will add something somewhere or remove something, and so get another version of the text. However, I do not want to keep a separate string for each version, since two subsequent versions are mostly identical. The differences between them are small, so it makes more sense to save only the differences. For example, the first version:

This is my first version of the texts.

The second version:

This is the first version of the text, that I want to use as an example.

I would like to save these two versions as one object (it does not have to be XML; I use it only as an example):

This is <removed>my</removed><added>the</added> first version of the text<removed>s</removed><added>, that I want to use as an example</added>.

Now I want to go further. I want to save all subsequent edits as one object. In other words, I am going to have more than two versions of the text, but I would like to save them as one object such that it is easy to get a given version of the text and easy to find out what the differences are between two subsequent (or any two given) versions.

So, to summarize, my question is: what is the standard way to represent changes in a text and to work with this representation using Python?

Roman
  • Take a look at http://stackoverflow.com/questions/2307472/generating-and-applying-diffs-in-python, which is a similar question. – Bernhard Mar 30 '15 at 08:05

1 Answer


I would probably go with difflib: https://docs.python.org/2/library/difflib.html

You can use it to represent the changes between versions of a string, and create your own class to store the consecutive diffs.
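As a rough sketch of what such a class could look like: the `VersionedText` name and its methods are made up for illustration and are not part of difflib. It keeps the first version plus, for each edit, only the replaced ranges and their new text, computed with `difflib.SequenceMatcher`.

```python
import difflib


class VersionedText:
    """Hypothetical helper: the first version plus one compact delta per edit."""

    def __init__(self, text):
        self._base = text
        # One delta per later version; each delta is a list of
        # (start, end, replacement) tuples relative to the previous version.
        self._deltas = []

    def commit(self, new_text):
        """Record new_text as the next version, storing only what changed."""
        old_text = self.get(len(self._deltas))
        matcher = difflib.SequenceMatcher(None, old_text, new_text, autojunk=False)
        delta = [
            (i1, i2, new_text[j1:j2])
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()
            if tag != "equal"
        ]
        self._deltas.append(delta)

    def get(self, version):
        """Rebuild the given version (0 is the original text)."""
        text = self._base
        for delta in self._deltas[:version]:
            # Apply edits from the end so that earlier offsets stay valid.
            for start, end, replacement in reversed(delta):
                text = text[:start] + replacement + text[end:]
        return text

    def changes(self, a, b):
        """List (tag, old_fragment, new_fragment) between any two versions."""
        old, new = self.get(a), self.get(b)
        matcher = difflib.SequenceMatcher(None, old, new, autojunk=False)
        return [
            (tag, old[i1:i2], new[j1:j2])
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()
            if tag != "equal"
        ]


history = VersionedText("This is my first version of the texts.")
history.commit("This is the first version of the text, that I want to use as an example.")
print(history.get(1))
print(history.changes(0, 1))
```

Rebuilding version n replays n deltas, so for long histories you would probably want to store a full snapshot every so often.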

EDIT: I just realised this doesn't really fit your use case: the deltas difflib produces (for example from ndiff) essentially contain both strings, so you would be better off just storing every version. However, I believe difflib is the standard (library-wise) way of working with changes in text, so I won't delete this answer.
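To illustrate the point, here is a small sketch using the two versions from the question. `difflib.ndiff` produces a delta, and `difflib.restore` can rebuild either input from that delta alone, which only works because the delta carries the full text of both.

```python
import difflib

# ndiff works on sequences of lines, so each version is a one-line list here.
v1 = ["This is my first version of the texts.\n"]
v2 = ["This is the first version of the text, that I want to use as an example.\n"]

delta = list(difflib.ndiff(v1, v2))

# restore() recovers either original from the delta alone.
restored_v1 = "".join(difflib.restore(delta, 1))
restored_v2 = "".join(difflib.restore(delta, 2))

# The delta is at least as large as both inputs put together.
delta_size = sum(len(line) for line in delta)
print(delta_size, len(v1[0]) + len(v2[0]))
```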

EDIT2: That said, if you find a way to apply the output of unified_diff to a string, that may be your answer. difflib itself does not seem to offer a way to do this yet: https://bugs.python.org/issue2057
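For comparison, a minimal sketch of what `difflib.unified_diff` produces. The three-line text is made up for illustration, and `n=0` asks for no context lines. The result contains only the changed lines plus headers, which is why it would be a compact format to store, if there were something to apply it with.

```python
import difflib

# Made-up multi-line text in which only the middle line changes.
v1 = "line one\nline two\nline three\n".splitlines(keepends=True)
v2 = "line one\nline 2\nline three\n".splitlines(keepends=True)

# n=0 means no surrounding context lines are included in the hunks.
patch = list(difflib.unified_diff(v1, v2, fromfile="v1", tofile="v2", n=0))

print("".join(patch), end="")
```

Note that unified_diff is line-based, so a long single-line string would still be stored in full for every change; splitting the text into lines (or words) first keeps the patches small.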

Nebril