
I have a single-line string, read from a .txt file (the file contains only this string), in the following format:

[["abstract", "common", "reference", "introduction", "motivation"], ["real", "day", "night", "twenty", "blood", "truck", "major", "ship", "plane"], ["weapon", "guns", "nuclear", "revolver"], ["rose", "princess", "flower", "beauty", "sunflower", "soldier", "imaginary", "jasmine"], ["cloth", "shirt", "jeans", "trouser"]]

I want to retrieve all the text between each start marker [" and the following end marker "]

So my desired output (newline-separated) is:

"abstract", "common", "reference", "introduction", "motivation"

"real", "day", "night", "twenty", "blood", "truck", "major", "ship", "plane"

"weapon", "guns", "nuclear", "revolver"

"rose", "princess", "flower", "beauty", "sunflower", "soldier", "imaginary", "jasmine"

"cloth", "shirt", "jeans", "trouser"

I wrote the following code:

def fileRead(fpath):
    f = open(fpath, "r")
    for s in f:
        start = s.find('["')
        start += 1  # skip the bracket, move to the next character
        end = s.find('"]', start)
        print(s[start:end])
        return s[start:end]

But it only gives me the following output:

"abstract", "common", "reference", "introduction", "motivation"

What do I need to change to get the desired output?

can
pankaj kashyap

3 Answers


literal_eval is perfect for this. It takes a string representation of a Python literal (here, a list) and returns the corresponding Python object:

from ast import literal_eval

a = """["hello"]"""
b = literal_eval(a)
print(b[0])  # prints: hello

And for your case:

from ast import literal_eval


def fileRead(fpath):
    with open(fpath, "r") as f:
        f_string = f.read()  # read() returns one string; readlines() would return a list
    f_list = literal_eval(f_string)  # parse the string into a nested Python list
    print(f_list)
    for item in f_list:
        print(" ".join(item))  # joins words with space between them

See the Python documentation for ast.literal_eval for details.
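If you want the output in exactly the format shown in the question (each word quoted, words separated by ", ", one inner list per line), a minimal sketch is to re-quote the words after parsing. The helper name format_groups is hypothetical, and the sketch assumes the whole file content is a single list-of-lists literal:

```python
from ast import literal_eval


def format_groups(text):
    """Hypothetical helper: one quoted, comma-separated line per inner list."""
    groups = literal_eval(text)  # e.g. [["a", "b"], ["c"]]
    lines = []
    for group in groups:
        # Wrap each word in double quotes, then separate the words with ", "
        lines.append(", ".join('"' + word + '"' for word in group))
    return "\n".join(lines)


sample = '[["weapon", "guns", "nuclear", "revolver"], ["cloth", "shirt", "jeans", "trouser"]]'
print(format_groups(sample))
```

For the file, pass it the file's content (for example the result of f.read()). Because this particular string is also valid JSON, json.loads(text) should parse it the same way as literal_eval.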

can

Here is a regex solution:

import re
s = '[["abstract", "common", "reference", "introduction", "motivation"], ["real", "day", "night", "twenty", "blood", "truck", "major", "ship", "plane"], ["weapon", "guns", "nuclear", "revolver"], ["rose", "princess", "flower", "beauty", "sunflower", "soldier", "imaginary", "jasmine"], ["cloth", "shirt", "jeans", "trouser"]]'

s = re.sub(r'\]\s*,\s*', '\n', s)  # Line feed in place of "], " between groups
s = re.sub(r'[\[\]]', '', s)  # Remove the remaining [ and ]
print(s)
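Instead of deleting the brackets, a regex can also capture the text between each [ and ] directly. A minimal sketch using re.findall, assuming no word itself contains [ or ] (the sample string is shortened for readability):

```python
import re

s = '[["abstract", "common", "reference"], ["weapon", "guns"], ["cloth", "shirt"]]'

# Match "[", capture everything up to the next "]" as long as it contains no
# brackets, then match "]". Only the innermost lists satisfy this pattern.
groups = re.findall(r'\[([^\[\]]*)\]', s)
print("\n".join(groups))
```

The captured groups keep the double quotes, so each printed line already matches the desired format.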
dabingsou

You can try this code:

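def fileRead(fpath):
    with open(fpath, "r") as f:
        s = f.read()
    parts = []
    start = s.find('["')
    while start != -1:  # keep going until no start marker is left
        end = s.find('"]', start)
        if end == -1:
            break
        parts.append(s[start + 1:end + 1])  # drop "[" and "]", keep the quotes
        start = s.find('["', end)  # search for the next group after this one
    return "\n".join(parts)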

I hope this helps.