Extract all strings between start marker and end marker

Question

I have a single line string read from a txt file (file only has this string) in the following format:

[["abstract", "common", "reference", "introduction", "motivation"], ["real", "day", "night", "twenty", "blood", "truck", "major", "ship", "plane"], ["weapon", "guns", "nuclear", "revolver"], ["rose", "princess", "flower", "beauty", "sunflower", "soldier", "imaginary", "jasmine"], ["cloth", "shirt", "jeans", "trouser"]]

I want to retrieve all the text content between start marker = [" and end marker = "]

So my desired output should be (newline-separated):

"abstract", "common", "reference", "introduction", "motivation"

"real", "day", "night", "twenty", "blood", "truck", "major", "ship", "plane"

"weapon", "guns", "nuclear", "revolver"

"rose", "princess", "flower", "beauty", "sunflower", "soldier", "imaginary", "jasmine"

"cloth", "shirt", "jeans", "trouser"

I wrote the following code:

def fileRead(fpath):
    f = open(fpath, "r")
    for s in f:
        start = s.find('["')
        start += 1  # skip the bracket, move to the next character
        end = s.find('"]', start)
        print(s[start:end])
        return s[start:end]

But it is giving me the following output only:

"abstract", "common", "reference", "introduction", "motivation"

What needs to be changed to get the desired output?

remove return statement as it will break the for loop, upvote if it works — Syed Khizaruddin, Dec 11 '19 at 07:00
tried removing return, but the problem still persists, it's a single line input string — pankaj kashyap, Dec 11 '19 at 07:12
put your string inside list() function and then try it will convert it in list then use for loop — Syed Khizaruddin, Dec 11 '19 at 07:17
I strongly recommend the following article: https://ericlippert.com/2014/03/05/how-to-debug-small-programs/. Also, this is a blatant duplicate. — AMC, Dec 11 '19 at 08:18
Does this answer your question? [Convert string representation of list to list](https://stackoverflow.com/questions/1894269/convert-string-representation-of-list-to-list) — AMC, Dec 11 '19 at 08:19

can · Accepted Answer · 2019-12-11T08:30:44.207

literal_eval is perfect for this. It basically takes a list represented as a string and returns a Python list:

from ast import literal_eval

a = """["hello"]"""
b = literal_eval(a)
print(b[0])  # prints: hello

And for your case:

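from ast import literal_eval


def fileRead(fpath):
    # read() returns the whole file as one string; readlines() would return
    # a list of lines, which literal_eval cannot parse
    with open(fpath, "r") as f:
        f_string = f.read()
    f_list = literal_eval(f_string)
    print(f_list)
    for item in f_list:
        print(" ".join(item))  # joins words with a space between them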

Here is the documentation.
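
If the output has to match the format in the question exactly (each word still quoted, words separated by commas, one group per line), the parsed list can be re-formatted by hand. The following is only a sketch: format_groups is a hypothetical helper name, and it assumes the input is a list of lists of plain strings with no embedded double quotes.

```python
from ast import literal_eval


def format_groups(text):
    """Return one line per inner list, each word re-quoted and comma-separated."""
    groups = literal_eval(text)  # parse the text into a list of lists of str
    lines = []
    for group in groups:
        # put the double quotes back around each word, then join with ", "
        lines.append(", ".join(f'"{word}"' for word in group))
    return "\n".join(lines)


sample = '[["weapon", "guns"], ["cloth", "shirt", "jeans"]]'
print(format_groups(sample))
```

Because the words go through a real parser first, this does not depend on how the brackets and spaces happen to be laid out in the file.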

score 1 · Answer 2 · answered Dec 11 '19 at 07:56

This is a regex solution:

import re
s = '[["abstract", "common", "reference", "introduction", "motivation"], ["real", "day", "night", "twenty", "blood", "truck", "major", "ship", "plane"], ["weapon", "guns", "nuclear", "revolver"], ["rose", "princess", "flower", "beauty", "sunflower", "soldier", "imaginary", "jasmine"], ["cloth", "shirt", "jeans", "trouser"]]'

s = re.sub(r'\]\s*,\s*', '\n', s)  # Replace each "]," separator and the spaces after it with a line feed
s = re.sub(r'\[|\]', '', s)  # Remove the remaining [ and ]
print(s)
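
A variation on the same idea is to capture the text between the markers directly with re.findall, instead of deleting the brackets around it. This is a sketch under the assumption that none of the words contain [ or ] themselves; extract_groups is a made-up name for illustration.

```python
import re


def extract_groups(text):
    """Return the quoted contents of every innermost [...] group in text."""
    # \[ and \] match the literal brackets; the capture group keeps everything
    # from the first double quote to the last one before the closing bracket.
    return re.findall(r'\[("[^\[\]]*")\]', text)


sample = '[["weapon", "guns"], ["cloth", "shirt", "jeans"]]'
print("\n".join(extract_groups(sample)))
```

The result is a list with one string per group, so it can be printed line by line or processed further.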

score 1 · Answer 3 · answered Dec 11 '19 at 08:10

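You can try this code, which keeps searching for the next start marker after each end marker instead of stopping at the first match: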

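def fileRead(fpath):
    with open(fpath, "r") as f:
        s = f.read()
    groups = []
    start = s.find('["')
    while start != -1:
        end = s.find('"]', start)
        if end == -1:
            break  # start marker without a matching end marker
        groups.append(s[start + 1:end + 1])  # drop the brackets, keep the quotes
        start = s.find('["', end)  # look for the next start marker
    return "\n".join(groups)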

I hope this helps.
