
I have an input file (output.txt, with almost 2000 lines) like this:

    lorem ipsum
    lorem ipsum
    ["a","b","c","d"]
    lorem ipsum
    lorem ipsum
    ["e","f","g","h"]

My output1.txt should be:

    a
    b
    c
    d
    e
    f
    g
    h

As a first step, I am trying to put all the values inside [] into one file, but ultimately my aim is to produce output1.txt. It would be great if someone could help me do it in one shot: extract the data inside [], remove the double quotes and commas, and put each value on its own line.

My code so far:

    reg="\[([^]]+)\]"
    while IFS='' read -r line || [[ -n "$line" ]]; do
      if [[ $line = ~$reg ]] ; then echo "$line" >> home/hdpsrvc/sandeep/hbase/output1.txt ; fi
    done < /home/hdpsrvc/sandeep/hbase/output.txt

The file is not created at the specified path, and no error appears on the terminal. I followed these Stack Overflow questions to write the code above:

- Shell script. How to extract string using regular expressions
- Regular expression to extract text between square brackets

sandeep007
  • You used `= ~$reg` instead of `=~ $reg`. You'll make things easier on yourself if you test the code you link to first to make sure it works, then gradually adapt it and see exactly at which point you get stuck – that other guy Dec 17 '15 at 18:00
  • Awesome, thanks. You are right. Can you tell me how I should extract the values to get output1.txt? – sandeep007 Dec 17 '15 at 18:14
  • I tried this:

        reg="\[([^]]+)\]"
        reg1="(["'])(?:(?=(\\?))\2.)*?\1"
        while IFS='' read -r line || [[ -n "$line" ]]; do
          if [[ $line =~ $reg ]]
          then
            if [[ $line =~ $reg1 ]]
            then
              echo "1"
              echo "$line" >> /home/hdpsrvc/sandeep/hbase/output1.txt
            fi
          fi
        done < /home/hdpsrvc/sandeep/hbase/output.txt

    and I am getting these error messages:

        extract1.sh: line 3: unexpected EOF while looking for matching `''
        extract1.sh: line 13: syntax error: unexpected end of file

    I followed this link: http://stackoverflow.com/questions/171480/regex-grabbing-values-between-quotation-marks – sandeep007 Dec 17 '15 at 19:10
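A note on the errors above, with a sketch. With `= ~$reg` (space before the tilde), `[[ ... ]]` does a glob comparison against a pattern that starts with a literal `~`, which none of these input lines match, so the `echo ... >>` line never runs: no file and no error. The output path `home/hdpsrvc/...` in the question also lacks its leading `/`, so once the test does match, it would be resolved relative to the current directory. The `reg1` error is a quoting problem: the double-quoted string ends right after `(["`, and the following `'` opens a single-quoted string that never closes. Even quoted correctly, `(?=...)` and the lazy `*?` are Perl-style syntax that bash's POSIX extended regexes do not support. Below is a minimal pure-bash sketch with those fixes; `extract_items` is a hypothetical helper name, and it assumes each list sits on one line and its items are double-quoted with no embedded quotes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads lines on stdin and prints every "quoted" item
# found inside [ ... ] on its own line, without the quotes.
extract_items() {
  local list_re='\[([^]]*)]'   # a bracketed list; used with =~ (no space)
  local item_re='"([^"]*)"'    # one quoted item; POSIX ERE, no lookahead
  local line rest
  while IFS='' read -r line || [[ -n $line ]]; do
    if [[ $line =~ $list_re ]]; then
      rest=${BASH_REMATCH[1]}               # text between the brackets
      while [[ $rest =~ $item_re ]]; do
        printf '%s\n' "${BASH_REMATCH[1]}"  # the item without its quotes
        rest=${rest#*"${BASH_REMATCH[0]}"}  # drop text up to and incl. that item
      done
    fi
  done
}
```

With the question's paths (both absolute), it could be called as `extract_items < /home/hdpsrvc/sandeep/hbase/output.txt > /home/hdpsrvc/sandeep/hbase/output1.txt`. Because it matches quoted items rather than splitting on commas, a space after each comma does not produce extra lines.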

1 Answer


An awk one-liner can do this easily:

    cd /home/hdpsrvc/sandeep/hbase
    awk -F'[][," ]+' '/\[.*\]/{for (i=2; i<NF; i++) print $i}' output.txt > output1.txt

Resulting output1.txt:

    a
    b
    c
    d
    e
    f
    g
    h
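To see why the loop runs from `i=2` to `NF-1`, this small sketch prints every field awk produces for one sample line using the answer's separator (the `show_fields` name is just for illustration). Because `-F'[][," ]+'` treats any run of `[`, `]`, `,`, `"` and space as a single separator, the leading `["` and the trailing `"]` each leave an empty field at the ends, and the values sit in between.

```shell
#!/bin/sh
# Hypothetical helper: print NF and every field of one line, split with the
# same -F value as the answer's awk command.
show_fields() {
  printf '%s\n' "$1" |
    awk -F'[][," ]+' '{
      printf "NF=%d\n", NF
      for (i = 1; i <= NF; i++) printf "$%d=<%s>\n", i, $i
    }'
}

show_fields '["a","b","c","d"]'
```

For `["a","b","c","d"]` it reports `NF=6`, with `$1` and `$6` empty and `a` through `d` in `$2` to `$5`, which is exactly the range the answer's loop prints.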
anubhava
  • No, it's taking input from output.txt (the `line` variable reads each line of output.txt). So can you tell me where to put output.txt? – sandeep007 Dec 17 '15 at 19:34
  • I understood that output1.txt is my output, but my input file is output.txt. If you look at my code, there are two different files; I am asking about output.txt. Thanks for your time and effort – sandeep007 Dec 17 '15 at 19:37
  • Sorry for troubling you. The code in my question is in extract.sh. So can I remove all of it and put your awk code there instead? – sandeep007 Dec 17 '15 at 19:45
  • Yes, of course: remove all that code and use this awk. [Also check this demo](http://ideone.com/0QsSu4) – anubhava Dec 17 '15 at 19:47
  • OK, thanks for the nice demo. I'm a newbie, so I have all these questions; thanks for your patience. You used the lorem ipsum text directly in your cat command, but I need it to come from a file. So does the following work? cat << "output.txt" | awk -F'[][,"]+' '/\[.*\]/{for (i=2; i – sandeep007 Dec 17 '15 at 19:59
  • Very sorry, I just saw your updated code. I will try it and let you know. Thanks – sandeep007 Dec 17 '15 at 20:03
  • Wow, super. Thanks for your time and effort, and a very big thanks for revising your answer to fit my requirement. I am accepting your answer. Thanks once again – sandeep007 Dec 17 '15 at 20:09
  • Sorry to bother you again. Looking closely at my input, there is one space after each comma, like ["a", "b", "c"]. That is why my output has an empty line after each element. Do you have a fix for this? – sandeep007 Dec 17 '15 at 20:27
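The blank-looking lines come from the separator set. This minimal sketch, using a hypothetical `items_with_fs` helper that runs the answer's loop under a given `-F` value, compares `[][,"]+` (the separator quoted in the 19:59 comment) with `[][," ]+` (the one in the answer's current command, which includes a space).

```shell
#!/bin/sh
# Hypothetical helper: run the answer's loop with the -F value given as $1.
items_with_fs() {
  awk -F"$1" '/\[.*\]/ { for (i = 2; i < NF; i++) print $i }'
}

sample='["a", "b", "c"]'

# Without a space in the set, the text '", "' between items splits into the
# separator '",', a field holding a single space, and the separator '"'.
printf '%s\n' "$sample" | items_with_fs '[][,"]+'

# With a space in the set, the whole '", "' run is one separator.
printf '%s\n' "$sample" | items_with_fs '[][," ]+'
```

The first call prints `a`, a line containing one space, `b`, another such line, and `c`; the second prints only `a`, `b` and `c`. The trade-off of adding the space is that a value which itself contains a space (for example `"foo bar"`) would be split across two lines.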