
I am trying to understand how the following code sample, which is meant to extract the first Twitter handle mentioned in a tweet, works:

a = load '/user/pig/full_text.txt' AS (id:chararray, ts:chararray, location:chararray, lat:float, lon:float, tweet:chararray);
b = foreach a generate id, ts, location, LOWER(tweet) as tweet;
c = foreach b generate id, ts, location, REGEX_EXTRACT(tweet, '(.*)@user_(\\S{8})([:| ])(.*)',2) as tweet;
d = limit c 5;
dump d;

The data in the file full_text.txt are in the following format:

USER_79321756   2010-03-03T04:15:26 ÜT: 47.528139,-122.197916   47.528139   -122.197916 RT @USER_2ff4faca: IF SHE DO IT 1 MORE TIME......IMA KNOCK HER DAMN KOOFIE OFF.....ON MY MOMMA>>haha. #cutthatout
USER_79321756   2010-03-03T04:55:32 ÜT: 47.528139,-122.197916   47.528139   -122.197916 @USER_77a4822d @USER_2ff4faca okay:) lol. Saying ok to both of yall about to different things!:*
USER_79321756   2010-03-03T05:13:34 ÜT: 47.528139,-122.197916   47.528139   -122.197916 RT @USER_5d4d777a: YOURE A FOR GETTING IN THE MIDDLE OF THIS @USER_ab059bdc WHO THE FUCK ARE YOU ? A FUCKING NOBODY !!!!>>Lol! Dayum! Aye!
USER_79321756   2010-03-03T05:28:02 ÜT: 47.528139,-122.197916   47.528139   -122.197916 @USER_77a4822d yea ok..well answer that cheap as Sweden phone you came up on when I call.
USER_79321756   2010-03-03T05:56:13 ÜT: 47.528139,-122.197916   47.528139   -122.197916 A sprite can disappear in her mouth - lil kim hmmmmm the can not the bottle right?

However, I am unable to understand how the function REGEX_EXTRACT(tweet, '(.*)@user_(\\S{8})([:| ])(.*)',2) works. Can someone explain in simple terms what the regex in this case is searching for, and how an index of 2 selects the first Twitter handle?
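To make the behaviour concrete, here is a minimal sketch that applies the same pattern outside Pig, using Python's re module. It assumes the pattern behaves the same way as under the Java regex engine that Pig uses (it only uses syntax common to both), and the helper name extract_handle is made up for illustration. The pattern has four capture groups: (.*) for any leading text, (\S{8}) for the eight non-whitespace characters after @user_, ([:| ]) for a single colon, pipe, or space, and (.*) for the rest. Index 2 returns the second group, i.e. the eight-character ID.

```python
import re

# Same pattern as in the Pig script. Pig's '\\S' is an escaped backslash,
# so the regex engine sees \S (any non-whitespace character).
PATTERN = re.compile(r'(.*)@user_(\S{8})([:| ])(.*)')

def extract_handle(tweet):
    """Hypothetical helper: lowercase the tweet (as LOWER does in the script)
    and return capture group 2, or None if the pattern does not match."""
    m = PATTERN.search(tweet.lower())
    return m.group(2) if m else None

# One handle followed by a colon.
one = extract_handle("RT @USER_2ff4faca: IF SHE DO IT 1 MORE TIME")

# Two handles: the leading (.*) is greedy, so it swallows the first handle.
two = extract_handle("@USER_77a4822d @USER_2ff4faca okay:) lol.")

# No handle at all.
none = extract_handle("A sprite can disappear in her mouth")

print(one, two, none)
```

In this sketch the two-handle tweet yields 2ff4faca rather than 77a4822d, because the greedy leading (.*) consumes as much text as it can before @user_. If the same holds in Pig, the expression returns the last handle that is followed by a colon, pipe, or space, not the first; a lazy (.*?) at the start would be one way to prefer the first.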

Anonymouse
  • 3
    Possible duplicate of [Reference - What does this regex mean?](https://stackoverflow.com/questions/22937618/reference-what-does-this-regex-mean) – CAustin Aug 07 '18 at 21:04
  • 2
    Perhaps seeing how the expression works live can help you, check [here](https://regex101.com/r/WT7L4d/1/). – Paolo Aug 07 '18 at 21:41

0 Answers