I have a (properly formed) large string variable that I turn into lists of dictionaries. I iterate over the massive string, split by newline characters, and run the following list(eval(i))
. This works for the majority of the cases, but for every exception thrown, I add the 'malformed' string into a failed_attempt
array. I have been inspecting the failed cases for an hour now, and believe what causes them to fail is whenever there is an extra quotation mark that is not part of the keys for a dictionary. For example,
eval('''[{"question":"What does "AR" stand for?","category":"DFB","answers":["Assault Rifle","Army Rifle","Automatic Rifle","Armalite Rifle"],"sources":["https://www.npr.org/2018/02/28/588861820/a-brief-history-of-the-ar-15"]}]''')
Will fail because there is quotation marks around the "AR." If you replace the quotation marks with single quotation marks, e.g.
eval('''[{"question":"What does 'AR' stand for?","category":"DFB","answers":["Assault Rifle","Army Rifle","Automatic Rifle","Armalite Rifle"],"sources":["https://www.npr.org/2018/02/28/588861820/a-brief-history-of-the-ar-15"]}]''')
It now succeeds.
Similarly:
eval('''[{"question":"Test Question, Test Question?","category":"DFB","answers":["2004","1930","1981","This has never occurred"],"sources":[""SOWELL: Exploding myths""]}]''')
Fails due to the quotes around "Sowell", but again succeeds if you replace them with single quotes.
So I need a way to identify quotes that appear anywhere other than around the keys of the dictionary (question
, category
, sources
) and replace them with single quotes. I'm not sure the right way to do this.
@Wiktor's submission nearly does the trick, but will fail on the following:
example = '''[{"question":"Which of the following is NOT considered to be "interstate commerce" by the Supreme Court, and this cannot be regulated by Congress?","category":"DFB","answers":["ANSWER 1","ANSWER 2","ANSWER 3","All of these are considered "Interstate Commerce""],"sources":["SOURCE 1","SOURCE 2","SOURCE 3"]}]'''
re.sub(r'("\w+":[[{]*")(.*?)("(?:,|]*}))', lambda x: "{}{}{}".format(x.group(1),x.group(2).replace('"', "'"),x.group(3)), example)
Out[170]: '[{"question":"Which of the following is NOT considered to be \'interstate commerce\' by the Supreme Court, and this cannot be regulated by Congress?","category":"DFB","answers":["ANSWER 1","ANSWER 2","ANSWER 3","All of these are considered "Interstate Commerce""],"sources":["SOURCE 1","SOURCE 2","SOURCE 3"]}]'
Notice that the second set of double quotation marks on "Interstate Commerce" in the answers is not replaced.