Extract string in cell from entire Dataframe

Question

I'm working on a PDF extraction tool. Say I have the following DataFrame. I don't know the column names or how many columns there are. All I know is that somewhere in this DataFrame there is a string of the form extract this: xxxx, and I need to extract that string.

import numpy as np
import pandas as pd
data = {'these': ['Value1', 'padding'], 'are': ['Value2', np.nan], 'random': [123, 'dont'], 'names': ['extract this: 1236', 'find']}
df = pd.DataFrame(data)


+---------+--------+--------+--------------------+
|  these  |  are   | random |       names        |
+---------+--------+--------+--------------------+
| Value1  | Value2 | 123    | extract this: 1236 |
| padding | nan    | dont   | find               |
+---------+--------+--------+--------------------+

I can get as far as an array (shown below), which I could then clean to remove all non-string elements before searching for the substring, but I don't like this approach. Is there a better way of doing this?

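# True where a cell contains the marker; NaN cells count as no match
mask = np.column_stack([df[col].str.contains(r"extract this: ", na=False) for col in df])
# First matching row as an array; the string still has to be picked out of it
inv_num_arr = df.loc[mask.any(axis=1)].values[0]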
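One possible alternative, sketched here under the assumption that the marker is the literal text extract this: , is to flatten every cell into a single Series and filter that, so the result is the matching string itself rather than the whole row. This is not from the thread; the names cells, matches and result are illustrative only.

```python
import numpy as np
import pandas as pd

data = {'these': ['Value1', 'padding'], 'are': ['Value2', np.nan],
        'random': [123, 'dont'], 'names': ['extract this: 1236', 'find']}
df = pd.DataFrame(data)

# Flatten all cells row by row into one Series, cast to str so numeric
# and NaN cells don't break the .str accessor (NaN becomes 'nan').
cells = pd.Series(df.to_numpy().ravel()).astype(str)

# Keep only cells containing the literal marker (no regex needed).
matches = cells[cells.str.contains("extract this: ", regex=False)]

# Take the first match, or None if the marker appears nowhere.
result = matches.iloc[0] if not matches.empty else None
print(result)
```

Because the cells are flattened row by row, "first" means the first match in reading order, which fits the comment below about taking the first one found.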

The output should be just the string extract this: 1236.

kindly post your expected output - will it be a series/dataframe/or simply a string? — sammywemmy, Aug 18 '20 at 09:36
In your whole DataFrame, is there only one string matching that pattern? If not, which one would you like to extract? The first one? The last one? All of them? — pierre_loic, Aug 18 '20 at 09:42
There should only be a single one. But as a safety measure, it's safe to assume one could take the first one that is found — notverygood, Aug 18 '20 at 09:44
Would this thread be of any help? https://stackoverflow.com/questions/11350770/select-by-partial-string-from-a-pandas-dataframe — pierre_loic, Aug 18 '20 at 09:46

score 1 · Answer 1 · answered Aug 18 '20 at 09:47

You can use re.search after converting the whole DataFrame into a single string with df.to_string():

import re
# Raw string so \s and \d reach the regex engine unchanged
re.search(r'extract this:\s\d+', df.to_string()).group(0)

'extract this: 1236'
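If the marker might be missing, re.search returns None and .group(0) raises AttributeError. A small wrapper can guard against that; find_in_frame is a hypothetical name, and the sketch assumes the value after the colon is all digits, as in the example. Note that to_string() also renders the index and column headers, so a header matching the pattern would be found too.

```python
import re

import numpy as np
import pandas as pd


def find_in_frame(df, pattern=r"extract this:\s\d+"):
    """Hypothetical helper: return the first regex match in the rendered
    DataFrame, or None when nothing matches (instead of raising)."""
    match = re.search(pattern, df.to_string())
    return match.group(0) if match else None


data = {'these': ['Value1', 'padding'], 'are': ['Value2', np.nan],
        'random': [123, 'dont'], 'names': ['extract this: 1236', 'find']}
df = pd.DataFrame(data)

print(find_in_frame(df))
print(find_in_frame(df, r"not here"))
```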

Dishin H Goyani