Suppose I have a string template, e.g.,

string="This is a {object}"

Now I create two (or more) strings by formatting this string, e.g.,

string.format(object="car")
=>"This is a car"

string.format(object="2020-06-05 16:06:30")
=>"This is a 2020-06-05 16:06:30"

Now I have somehow lost the original template string. Is there a way to recover it using the two new strings that I have?

Note: I have a data set of strings that were created from a template, but the original template was lost when it was edited. Strings created from the new template were then added to the same data set. I have tried an ML-based approach, but it doesn't work in the general case. I am looking for an algorithm that gives me back the original template; the result could be one template or a group of templates if the template was changed multiple times.

Anshika Singh
Akhil Garg
  • It's not generally possible to go back to the template from the output and inputs. What should the result be given `"This car is a car"` and `"car"` from `"This car is a {thing}".format(thing="car")`, for example? – jonrsharpe Jun 05 '20 at 10:47
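A small sketch makes the ambiguity in this comment concrete. The three templates below are hypothetical examples chosen for illustration; all of them render to the same string when formatted with thing="car", so the output plus the input value alone cannot tell you which template was used.

```python
# Three different (hypothetical) templates that all render identically for thing="car"
templates = [
    "This car is a {thing}",
    "This {thing} is a car",
    "This {thing} is a {thing}",
]

outputs = {t: t.format(thing="car") for t in templates}
for template, output in outputs.items():
    print(f"{template!r:30} -> {output!r}")

# Every template yields "This car is a car", so output -> template is not invertible
assert len(set(outputs.values())) == 1
```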

3 Answers

One possibility is to extract the words and the formatted values (here, a date-time value) from each input string, then compare them position by position:

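import re

def get_vals(s):
    # Match a "date time" value as a single token; otherwise match single words
    return re.findall(r'[\d\-]+\s[\d:]+|\w+', s)

vals = ["This is a car", "This is a 2020-06-05 16:06:30"]
# Keep tokens that are the same in every string; replace differing ones with a placeholder
r = ' '.join('{object}' if len(set(i)) > 1 else i[0] for i in zip(*map(get_vals, vals)))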

Output:

'This is a {object}'
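The same column-by-column comparison can be stretched to a data set that mixes strings from several templates, as the question describes. The sketch below is a rough heuristic of my own, not part of the answer above: it assumes strings with the same number of tokens come from the same template (the function name `guess_templates` is hypothetical), and it uses a bare `{}` placeholder because the original field names cannot be recovered.

```python
import re
from collections import defaultdict

# Same tokenizer as in the answer: a "date time" value, or else a single word
TOKEN_RE = re.compile(r'[\d\-]+\s[\d:]+|\w+')


def get_vals(s):
    return TOKEN_RE.findall(s)


def guess_templates(strings, placeholder='{}'):
    """Hypothetical helper: group strings by token count (a rough proxy for
    "same template"), then compare each group column by column."""
    groups = defaultdict(list)
    for s in strings:
        tokens = get_vals(s)
        groups[len(tokens)].append(tokens)

    templates = []
    for token_lists in groups.values():
        # A column where all tokens agree is fixed text; otherwise it becomes a placeholder
        templates.append(' '.join(
            placeholder if len(set(col)) > 1 else col[0]
            for col in zip(*token_lists)
        ))
    return templates


data = [
    "This is a car",
    "This is a 2020-06-05 16:06:30",
    "Price 10 USD",
    "Price 25 USD",
]
print(guess_templates(data))  # ['This is a {}', 'Price {} USD']
```

Strings from different templates that happen to have the same token count will be merged into one over-general template, a group holding a single string comes back unchanged, and, as in the answer, punctuation is dropped and whitespace is normalized to single spaces.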
Ajax1234

You could use one of the many sequence alignment algorithms, which are mostly used to align DNA sequences. An alignment identifies the parts of the strings that are conserved. You would then keep the conserved regions and insert placeholders where a "mutation" occurred to obtain the templates.

https://en.wikipedia.org/wiki/Multiple_sequence_alignment will get you started.
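To illustrate how this could work, here is a minimal sketch of a pairwise global alignment in the Needleman-Wunsch style, run over whitespace-separated tokens instead of DNA bases. It aligns only two strings, and the scoring values (+1 match, -1 mismatch, -1 gap) and function names are assumptions for illustration, not the answerer's method. Conserved tokens are kept, and each run of mismatches or gaps is collapsed into a single `{}` placeholder.

```python
def align(a, b, match=1, mismatch=-1, gap=-1):
    """Global (Needleman-Wunsch) alignment of two token lists.
    Returns a list of (token_a, token_b) pairs; None marks a gap."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)

    # Trace back from the bottom-right cell to recover one optimal alignment
    pairs = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + (
            match if a[i - 1] == b[j - 1] else mismatch
        ):
            pairs.append((a[i - 1], b[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            pairs.append((a[i - 1], None))
            i -= 1
        else:
            pairs.append((None, b[j - 1]))
            j -= 1
    return pairs[::-1]


def template_from_alignment(s1, s2, placeholder='{}'):
    """Keep conserved tokens; collapse each run of mismatches/gaps into one placeholder."""
    out, in_gap = [], False
    for t1, t2 in align(s1.split(), s2.split()):
        if t1 is not None and t1 == t2:
            out.append(t1)
            in_gap = False
        elif not in_gap:
            out.append(placeholder)
            in_gap = True
    return ' '.join(out)


print(template_from_alignment("This is a car", "This is a 2020-06-05 16:06:30"))  # This is a {}
```

For more than two strings you would need a genuine multiple sequence alignment; aligning each new string against the current template is a common shortcut, but it is order-dependent and is not shown here.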

Finn
  • As someone familiar with sequence alignment, I am not clear how this would help; maybe you could explain a little more in your answer? Thanks – user5359531 Jun 05 '20 at 11:38

You can find where the templated values are, but you won't be able to recover the placeholder names. By computing the difference between two strings, you can locate the templated parts.

Take a look at "Python - getting just the difference between strings" for suggestions on how to get the difference between two strings.

Below are some steps that may serve as a starting point:

  1. Get the differences between strings A and B as a list, collecting only the parts that come from A.
  2. Initialize template = A.
  3. Iterate over the differing parts and replace each one in template with {}.

At the end you will have a template string derived from A.
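A minimal sketch of these steps, assuming a word-level diff computed with the standard library's `difflib.SequenceMatcher` (the linked question discusses other ways to get the difference; the function name `template_from_diff` is hypothetical):

```python
import difflib


def template_from_diff(a, b, placeholder='{}'):
    """Start from A's words and replace every part of A that differs from B."""
    a_words, b_words = a.split(), b.split()
    matcher = difflib.SequenceMatcher(None, a_words, b_words, autojunk=False)
    template = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == 'equal':
            # Words shared by A and B are kept as literal template text
            template.extend(a_words[i1:i2])
        else:
            # 'replace' or 'delete': A's words here differ, so they become a placeholder.
            # 'insert': A has no words here; we assume its value was empty and still
            # add a placeholder (an assumption beyond the steps above).
            template.append(placeholder)
    return ' '.join(template)


print(template_from_diff("This is a car", "This is a 2020-06-05 16:06:30"))  # This is a {}
```

Diffing whole words rather than characters is a deliberate choice here: a character-level diff can match stray letters inside the changed values and split one value into several fragments.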

jonrsharpe
Andriy Ivaneyko