0

I am trying to extract the middle value from a record that looks something like this:

NAME=PATH=USER=DATE

I need to get the path part and all I came up with is this:

=[^=]*=

The problem is that it includes the = at the beginning and at the end. How can I exclude them?

Also, which Unix command should I use to extract it? I was thinking of sed, but I usually use it to replace strings, not to extract them. grep, maybe?

I am new to bash programming...

Is there any place where I can learn some regex? I think I will really need to know how to work with them.

Uri Agassi
user3013172

7 Answers

3

Use captured groups:

if [[ $str =~ =([^=]+)= ]]
then
    echo "Part between = and = is ${BASH_REMATCH[1]}."
fi
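
The snippet above leaves `str` undefined, so here is a minimal self-contained sketch of the same idea. The sample value and the `middle` variable name are my own assumptions for illustration; it also handles the case where nothing matches.

```shell
#!/usr/bin/env bash
# Sample record from the question (assumed here; the answer does not define str).
str='NAME=PATH=USER=DATE'

# The regex is deliberately unquoted: a quoted pattern is matched literally.
if [[ $str =~ =([^=]+)= ]]; then
    middle=${BASH_REMATCH[1]}   # first capture group, without the surrounding "="
    echo "Part between = and = is $middle."
else
    echo "No '=...=' section found in: $str" >&2
fi
```

`BASH_REMATCH[0]` would hold the whole match including both `=` signs, which is the part the question wants to avoid.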
gniourf_gniourf
Wouter J
  • 1
    You mustn't double-quote the regex - otherwise, _literal_ string comparison will occur (applies since `bash` version `3.2`; exception: if `shopt -s compat31` is set). There's a typo in the variable name; should be `BASH_REMATCH`. – mklement0 Apr 27 '14 at 14:38
  • @mklement0 I was waiting for someone to point that out. Just wanted to see how long does it remain unnoticed :p – devnull Apr 27 '14 at 14:41
  • 1
    @devnull :) While this answer will be useful once fixed, I'm a little baffled that it got 2 up-votes in its present state. – mklement0 Apr 27 '14 at 14:44
  • @mklement0 Me too. I actually noticed this after I'd posted an answer. Pointing out mistakes after posting an answer doesn't seem great to me. Thought of fixing the problems, but decided against it. (It shows how people vote.) – devnull Apr 27 '14 at 14:46
2

In bash:

IFS="="
a="NAME=PATH=USER=DATE"
read -a b <<< "$a"
echo "${b[1]}"

UPDATE as suggested by mklement0

a="NAME=PATH=USER=DATE"
IFS="=" read -a b <<< "$a"
echo "${b[1]}"
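
To check that prefixing `IFS` to `read` really leaves the shell's own `$IFS` alone, something like the following sketch should do. The names `fields`, `saved_ifs` and `ifs_state` are made up for the example.

```shell
#!/usr/bin/env bash
a='NAME=PATH=USER=DATE'
saved_ifs=$IFS                       # remember the separator for comparison

# The assignment prefix applies to this one read command only.
IFS='=' read -r -a fields <<< "$a"

echo "${fields[1]}"

# Compare the current IFS with the value saved before the read.
if [[ $IFS == "$saved_ifs" ]]; then
    ifs_state=unchanged
else
    ifs_state=changed
fi
echo "IFS after read: $ifs_state"
```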
a5hk
  • 1
    +1, but better to directly prepend `IFS="=" ` to the `read` command so as to localize the effect of changing `$IFS`. – mklement0 Apr 27 '14 at 14:49
  • 2
    Moreover, you don't necessarily need an array. `IFS='=' read -r name path _ <<< "$a"; echo "$path"` – devnull Apr 27 '14 at 14:50
  • 1
    @mklement0, I didn't know that, Thanks, Updated. – a5hk Apr 27 '14 at 15:00
  • But now that becomes exactly my answer and you get more upvotes. I hate SO `:D`. – gniourf_gniourf Apr 27 '14 at 15:03
  • @devnull, Is `_` something like `/dev/null` in this case? – a5hk Apr 27 '14 at 15:13
  • 2
    @Ashkan No, `_` is just an ordinary variable (garbage one in this case) which will capture the remaining part of the line which is or will never be used. – jaypal singh Apr 27 '14 at 15:18
  • @gniourf_gniourf: I hear you; your answer (which got my vote too) still provides value in that it is more detailed. – mklement0 Apr 27 '14 at 15:21
  • @JS웃 It has some special meaning [see](http://www.gnu.org/software/bash/manual/html_node/Special-Parameters.html) and I was not able to find what happens when we use it like this – a5hk Apr 27 '14 at 15:27
  • 1
    @Ashkan Yes, it has special meaning, so it may be better not to use it as a variable name to avoid confusion; it'll still work, though, and have no unwanted side-effects. – mklement0 Apr 27 '14 at 15:30
  • 1
    The value of `_` will be set _after_ the execution of the `read` command, so this has absolutely no unwanted effects. Besides it is the common way of saying _I don't care about this field_. – gniourf_gniourf Apr 27 '14 at 16:14
  • 2
    @Ashkan In the current context, you could think of `_` as a placeholder that would consume the rest of the input. – devnull Apr 27 '14 at 16:47
2

So as to have several possibilities, you can also use read and make an array with all your fields:

var="NAME=PATH=USER=DATE"
IFS== read -r -a var_ary <<< "$var"
echo "field1: ${var_ary[0]}"
echo "field2: ${var_ary[1]}"
echo "field3: ${var_ary[2]}"
echo "field4: ${var_ary[3]}"

will output:

field1: NAME
field2: PATH
field3: USER
field4: DATE

this will also enable you to check that you have the correct number of fields:

if ((${#var_ary[@]}==4)); then
    echo "Cool I have 4 fields"
else
    echo "Oh no, I don't have 4 fields (I have ${#var_ary[@]} fields)"
fi
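
If you need this in more than one place, the split and the field-count check could be wrapped in a small function. This is only a sketch: `get_field` is a hypothetical helper name, and the fixed count of 4 is taken from the sample record.

```shell
#!/usr/bin/env bash
# get_field RECORD INDEX (hypothetical helper): print the zero-based field
# INDEX of an "="-separated RECORD; fail unless the record has 4 fields.
get_field() {
    local record=$1 index=$2
    local -a parts
    IFS='=' read -r -a parts <<< "$record"
    if (( ${#parts[@]} != 4 )); then
        echo "expected 4 fields, got ${#parts[@]}" >&2
        return 1
    fi
    printf '%s\n' "${parts[index]}"
}

get_field 'NAME=PATH=USER=DATE' 1
get_field 'NAME=PATH' 1 || echo "rejected"
```

The second call fails the count check, so the caller can decide what to do with a malformed record.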
gniourf_gniourf
2

No need for complicated regex, a simple awk can do it:

echo "NAME=PATH=USER=DATE" | awk -F= '{print $2}'
PATH
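
For use inside a script, the result can be captured in a variable, and the field number can be passed in from the shell. A small sketch, where `record`, `awk_result` and `cut_result` are names chosen just for this example:

```shell
record='NAME=PATH=USER=DATE'

# The field number is passed with -v, so the awk program itself stays fixed.
awk_result=$(printf '%s\n' "$record" | awk -F= -v n=2 '{ print $n }')

# cut does the same job when the delimiter is a single character.
cut_result=$(printf '%s\n' "$record" | cut -d= -f2)

echo "awk: $awk_result"
echo "cut: $cut_result"
```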
Jotne
  • 2
    +1; arguably even simpler (in this limited case): `echo "NAME=PATH=USER=DATE" | cut -d = -f 2` - @devnull had posted this in an answer he's since removed. – mklement0 Apr 27 '14 at 22:42
1

You can use bash parameter expansion to remove the leading and trailing pieces.

$ s='NAME=PATH=USER=DATE'
$ s=${s#*=} && echo "${s%%=*}"
PATH

%% removes the longest match from behind and # removes the shortest match from front. Using them together allows you to remove pieces you don't need. You can read more about bash parameter expansion here.
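
If the original record has to stay intact, the same two expansions can be applied to a separate variable. A minimal sketch, with `path_part` as an arbitrary name:

```shell
s='NAME=PATH=USER=DATE'

path_part=${s#*=}            # drop the shortest prefix ending in "="
path_part=${path_part%%=*}   # drop the longest suffix starting with "="

echo "$path_part"
echo "$s"                    # the original record is untouched
```

Both expansions are plain POSIX shell, so this does not depend on bash specifically.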

jaypal singh
  • Just a small caveat: `$s` gets modified in the process. – mklement0 Apr 27 '14 at 14:52
  • Absolutely; I should have clarified that _as written_, the input variable is modified. – mklement0 Apr 27 '14 at 14:57
  • Come to think of it: If we also capture the output in a variable like so: `path=$(s=${s#*=} && echo "${s%%=*}")`, the command substitution, due to happening in a _subshell_, would leave the original `$s` untouched (though that may be a tad confusing). – mklement0 Apr 27 '14 at 15:06
  • @mklement0 If you wish to save the variable then you'd do `p=${s#*=} && echo "${p%%=*}"` – jaypal singh Apr 27 '14 at 15:07
  • Got it. In your example `$p` is just an _auxiliary_ variable, though; my command substitution example was meant to show how to capture the _result_ in a variable - without the need for an intermediate aux. variable; that said, we can do this with `path=${s#*=} && path=${path%%=*}`, without a subshell. – mklement0 Apr 27 '14 at 15:11
  • 1
    Absolutely. Just two different use cases: mere echoing vs. capturing in a variable. I plead guilty to conflating them without saying so, but at least we've now covered them both. :) – mklement0 Apr 27 '14 at 15:24
1

NAME=PATH=USER=DATE

Multiple ways to extract this data. The easiest may be pattern filtering. Pattern filtering has four forms:

  • ${VAR#PATTERN} - Remove the smallest left most part of the string that matches the pattern.
  • ${VAR##PATTERN} - Remove the largest left most part of the string that matches.
  • ${VAR%PATTERN} - Remove the smallest right most part of the string that matches.
  • ${VAR%%PATTERN} - Remove the largest right most part of the string that matches.

You can remember that # is to the left of % on the keyboard, so # is left and % is right.
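
To see the four forms side by side, here is a quick sketch that applies each of them to the sample string with the pattern `*=` or `=*`. The variable names are just for illustration.

```shell
s='NAME=PATH=USER=DATE'

left_short=${s#*=}     # smallest left-most match removed
left_long=${s##*=}     # largest left-most match removed
right_short=${s%=*}    # smallest right-most match removed
right_long=${s%%=*}    # largest right-most match removed

printf '%s\n' "$left_short" "$left_long" "$right_short" "$right_long"
```

This prints `PATH=USER=DATE`, `DATE`, `NAME=PATH=USER` and `NAME`, one per line.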

STRING="NAME=PATH=USER=DATE"
# Caution: assigning to PATH overwrites the shell's command search path.
PATH=${STRING#*=}  # Removes NAME=
PATH=${PATH%%=*}   # Removes =USER=DATE
echo "$PATH"       # Echoes "PATH"

You might be able to use read to get all four at once. I am on an iPad, so I can't test this right now.

OLD_IFS="$IFS"
IFS="="
read -r NAME PATH USER DATE <<<"$STRING"
IFS="$OLD_IFS"

$IFS is the Internal Field Separator and is set to space/tab/NL by default. I save the value of $IFS before I change it. I set it to =, which separates the various values in your input string.

The read will read in the values using $IFS to separate each one. The <<< (a here-string) is a way to feed the shell variable to read as input.

Once I finish getting the values, I reset IFS. Otherwise, I would have problems later on.

David W.
  • Useful info, but it's better to stay away from all-uppercase variable names to avoid conflicts with environment variables; case in point `$PATH`. You can localize the effect of changing `$IFS`: `IFS="=" read name path user date << – mklement0 Apr 27 '14 at 14:55
  • I usually avoid uppercase variables, but I wanted to make it obvious the names of the variables. – David W. Apr 28 '14 at 13:38
0

An excellent place to start for regex on Stack Overflow is: Reference - What does this regex mean?

For your actual question - you were looking for a regex and a way to use it in bash so:

josephs-mbp$more temp.txt 
NAME=PATH=USER=DATE
josephs-mbp$sed 's/^.*=\([^=]*\)=.*=.*$/\1/' temp.txt 
PATH
josephs-mbp$

Let's break down the important bit. Occasionally you want to refer to bits of a regular expression separately. It turns out that the easy way of doing this is by putting things in parentheses '(' ')', but in sed's default regex syntax we have to escape them, so it's '\(' and '\)'. These make no difference at all to the match, but they make a difference to what happens next.

In general the bit of a regex between a '\(' and '\)' is stored in a location that can be accessed later with \1, \2, ..., with the first pair stored in \1, the second in \2 and so on. Here I just put the bit of the regex you wanted into the brackets and then used \1 as the replacement (that's the 's/fu/bar' bit) in sed. It's ugly and there are probably much more effective ways of doing it, but I think you are starting from about the same point I am and I think that this is the next step for you.
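
For comparison, here is a slightly tighter variant of the same capture-group idea, offered as a sketch rather than a replacement for the command above. It anchors on the first `=` using `[^=]*` instead of `.*`, and uses `-n` with the `p` flag so that lines without a match print nothing. `record` and `sed_result` are names I made up for the example.

```shell
record='NAME=PATH=USER=DATE'

# \( ... \) captures the text between the first and second "=";
# \1 in the replacement puts just that captured text back.
sed_result=$(printf '%s\n' "$record" | sed -n 's/^[^=]*=\([^=]*\)=.*$/\1/p')

echo "$sed_result"
```

Because the pattern only requires two `=` signs, it should also work on records with more or fewer trailing fields than the sample.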

Community
Joe
  • This is what I wanted! But I have a question. If I "read" the regex I would say something like: from the beginning of the line (^), any character any number of times (.*) and then an equal sign (=). There I would have gotten just: NAME= What I don't understand is what you do next, the \( \) part. I've seen this multiple times and I can't get it. I know it has something to do with the /\1 at the end of the expression but I don't quite get it. The other part (=.*=.*$) would be: from an equal sign (=), any character any number of times (.*) until it reads an equal sign (=) and again, any character until EOL – user3013172 Apr 27 '14 at 14:44
  • Thanks @user3013172 glad to help - I've added a bit of an explanation. Let me know if that clears it up nicely. I hope it does - this might be my first ever (non-self) accepted answer on the site... :) – Joe Apr 28 '14 at 09:17