
I have the following strings, which always follow a standard format:

'On 10/31/2018, Sally Brown picked 25 apples at the orchard.'
'On 11/01/2018, John Smith picked 12 peaches at the orchard.'
'On 09/15/2018, Jim Roe picked 10 pears at the orchard.'

I want to extract certain data fields into a series of lists:

['10/31/2018','Sally Brown','25','apples']
['11/01/2018','John Smith','12','peaches']
['09/15/2018','Jim Roe','10','pears']

As you can see, I need some of the sentence structure to be recognized but not captured, so that the pattern has context for where the data is located. The regex that I thought would work is:

(?<=On\s)\d{2}\/\d{2}\/\d{4},\s(?=[A-Z][a-z]+\s[A-Z][a-z]+)\s.+?(?=\d+)\s(?=[a-z]+)\sat\sthe\sorchard\.

But of course, that is incorrect somehow.

This may be a simple question for someone, but I'm having trouble finding the answer. Thanks in advance, and someday when I'm more skilled I'll pay it forward on here.

Alex Heebs
  • Do you just need anything starting with data and the number after that? – Ashok KS Oct 31 '18 at 04:33
  • Really not a whole lot of information to go off of, but a start: `u\w+?\\\(.*?)\\\` [Live](https://regex101.com/r/HGOTeT/1) – K.Dᴀᴠɪs Oct 31 '18 at 04:33
  • `re.findall('data\d',string)` should fetch you the list. – Ashok KS Oct 31 '18 at 04:36
  • Well, I'm using data1 and data2 as stand-ins for the actual data (because the actual string I want to analyze is huge). But basically, I need the "\u0025" items to be recognized by the regex, but not captured by it (because they are the only markers that indicate where the data fields are). I want a regex that says "Ah, here's a pattern, data comes next," then grabs that data. – Alex Heebs Oct 31 '18 at 04:42
  • OP, I'd like to refer you to the following meta post: https://meta.stackoverflow.com/questions/285733/should-give-me-a-regex-that-does-x-questions-be-closed – Mad Physicist Oct 31 '18 at 04:42
  • I'm not asking for someone to build me a regex. But I'm having trouble finding out which concept applies to a "pattern.for.reference-data.to.capture-pattern.for.reference-data.to.capture" type string. – Alex Heebs Oct 31 '18 at 04:46
  • I'm not clear on what you *are* asking then. Would you mind updating your question to include what you expect as an answer? – Mad Physicist Oct 31 '18 at 06:22
  • Ok. I thought of a much simpler way to present it, so I'll change the question. – Alex Heebs Oct 31 '18 at 06:23
  • And in response to your comment about not having read the docs, I highly recommend that you do. More than the documentation of the re module, which is excellent, you may want to check out the howto intro: https://docs.python.org/3/howto/regex.html – Mad Physicist Oct 31 '18 at 06:25
  • Question modified – hope it's better now. Thanks for the feedback. – Alex Heebs Oct 31 '18 at 06:44
  • Is this something you need? https://regex101.com/r/YY2KDp/2 – Asunez Oct 31 '18 at 09:36

1 Answer

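Put parentheses (capturing groups) around the fields you want and match the surrounding text outside them. Use \w+ to match a run of word characters, i.e. [a-zA-Z0-9_]+ (in Python 3, \w also matches other Unicode letters and digits by default).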

import re

# Sample input: one sentence per line, all in the same fixed format.
text = '''On 10/31/2018, Sally Brown picked 25 apples at the orchard.
On 11/01/2018, John Smith picked 12 peaches at the orchard.
On 09/15/2018, Jim Roe picked 10 pears at the orchard.'''

# "On", the comma and the verb are matched but sit outside the groups,
# so findall returns only the four parenthesized fields (as tuples).
arr = re.findall(r'On\s(.*?),\s(\w+\s\w+)\s\w+\s(\d+)\s(\w+)', text)
print(arr)

# [('10/31/2018', 'Sally Brown', '25', 'apples'),
#  ('11/01/2018', 'John Smith', '12', 'peaches'),
#  ('09/15/2018', 'Jim Roe', '10', 'pears')]
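
For comparison, here is a rough sketch of why the lookaround pattern from the question finds nothing, next to a stricter capturing variant that returns lists rather than tuples. The names `lookaround`, `capturing`, and `extract` are made up for this illustration, and the stricter pattern assumes (as the question's own pattern does) that every name is exactly two capitalized words and every fruit is lowercase.

```python
import re

lines = [
    'On 10/31/2018, Sally Brown picked 25 apples at the orchard.',
    'On 11/01/2018, John Smith picked 12 peaches at the orchard.',
    'On 09/15/2018, Jim Roe picked 10 pears at the orchard.',
]

# The pattern from the question. Each (?=...) only asserts what follows
# without consuming it, so the \s right after the name lookahead has to
# match at the position where the name itself starts, and the search fails.
lookaround = re.compile(
    r'(?<=On\s)\d{2}\/\d{2}\/\d{4},\s(?=[A-Z][a-z]+\s[A-Z][a-z]+)\s.+?'
    r'(?=\d+)\s(?=[a-z]+)\sat\sthe\sorchard\.'
)

# Capturing variant (hypothetical, stricter than the answer's pattern):
# the literal context is matched outside the parentheses, and only the
# four groups are returned.
capturing = re.compile(
    r'On (\d{2}/\d{2}/\d{4}), ([A-Z][a-z]+ [A-Z][a-z]+) picked (\d+) ([a-z]+) '
    r'at the orchard\.'
)

def extract(line):
    """Return the four fields as a list, or None if the line does not match."""
    match = capturing.fullmatch(line)
    return list(match.groups()) if match else None

failed = [lookaround.search(line) for line in lines]  # no line matches
rows = [extract(line) for line in lines]
print(rows)
```

Whether the looser \w+ pattern or a stricter one like this is preferable depends on how much the real input can vary; the stricter form fails loudly (returns None) on lines that drift from the expected format.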
ewwink