
I have sample data that looks like the following (these are two separate rows from a tab-delimited file):

Details
[{'name': 'Irrelevant_Data',
  'parentName': 'Irrelevant_Scrape',
  'parentId': '2662610',
  'id': '2684157'},
 {'name': 'Irrelevant_Data',
  'parentName': 'Irrelevant_Scrape',
  'parentId': '068111',
  'id': '291005'}]
[{'name': 'Desired_Data',
  'parentName': 'Relevant_Scrape',
  'parentId': '6123777',
  'id': '31568812'},
 {'name': 'Desired_Data2',
  'parentName': 'Relevant_Scrape',
  'parentId': '6123777',
  'id': '2892718'},
 {'name': 'Irrelevant',
  'parentName': 'Irrelevant_Scrape',
  'parentId': '068111',
  'id': '8001822'}]

It's stored in a single column of a pandas DataFrame (let's call the column "Details"). I want to select only the "name" values whose "parentName" in the same dictionary equals "Relevant_Scrape".
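Once a cell has been turned into an actual list of dicts, the selection itself is a plain list comprehension. A minimal sketch, assuming the second sample row has already been parsed (in the CSV it is still a string); `relevant_names` is a hypothetical helper name:

```python
# Assumes the "Details" cell is already a Python list of dicts
# (straight out of read_csv it is still a str).
details = [
    {'name': 'Desired_Data', 'parentName': 'Relevant_Scrape',
     'parentId': '6123777', 'id': '31568812'},
    {'name': 'Desired_Data2', 'parentName': 'Relevant_Scrape',
     'parentId': '6123777', 'id': '2892718'},
    {'name': 'Irrelevant', 'parentName': 'Irrelevant_Scrape',
     'parentId': '068111', 'id': '8001822'},
]

def relevant_names(items, parent="Relevant_Scrape"):
    """Hypothetical helper: return 'name' for every dict whose 'parentName' equals parent."""
    return [d['name'] for d in items if d.get('parentName') == parent]

print(relevant_names(details))  # ['Desired_Data', 'Desired_Data2']
```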

I'm familiar with Python's built-in data structures and somewhat familiar with pandas, but the combination of the two is throwing me off. When I loop through the Series, each value comes back as a string rather than a list of dictionaries, which makes extraction much harder.

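import pandas as pd

# The file is tab-separated; each "Details" cell is read in as a plain string
df = pd.read_csv('dataset.csv', sep='\t')

for row in df['Details']:
    # My attempt: this only compares two string literals,
    # so it never looks at the contents of row
    if "Relevant_Scrape" in "parentname":
        print("name")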
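A possible sketch of the missing parsing step, assuming the cells are Python-literal strings exactly like the sample (single quotes, so `json.loads` would reject them). The inline DataFrame stands in for `dataset.csv`, and the `relevant_names` column name is hypothetical:

```python
import ast
import pandas as pd

# Hypothetical stand-in for pd.read_csv('dataset.csv', sep='\t'):
# each "Details" cell is a str, just as read_csv returns it.
df = pd.DataFrame({"Details": [
    "[{'name': 'Irrelevant_Data', 'parentName': 'Irrelevant_Scrape', 'parentId': '2662610', 'id': '2684157'}]",
    "[{'name': 'Desired_Data', 'parentName': 'Relevant_Scrape', 'parentId': '6123777', 'id': '31568812'}, "
    "{'name': 'Irrelevant', 'parentName': 'Irrelevant_Scrape', 'parentId': '068111', 'id': '8001822'}]",
]})

# ast.literal_eval safely evaluates Python literals, turning each str
# into a real list of dicts (it will not run arbitrary code).
df["Details"] = df["Details"].apply(ast.literal_eval)

# Per row, keep only the names whose parentName is "Relevant_Scrape".
df["relevant_names"] = df["Details"].apply(
    lambda items: [d["name"] for d in items
                   if d.get("parentName") == "Relevant_Scrape"]
)
print(df["relevant_names"].tolist())  # [[], ['Desired_Data']]
```

The key point is that nothing can be matched until the string is converted back into Python objects; after `literal_eval`, ordinary dict access works inside the loop or `apply`.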

Thank you in advance.

Edit 2: expanded sample

queryName   date    summary tagging Details
query1  3/31/2016   negative    ['Dummy - Dummy']   [{'name': 'Irrelevant_Data', 'parentName': 'Irrelevant_Scrape', 'parentId': '2517840', 'id': '2565351'}]
query2  3/26/2016   positive    ['Dummy', 'Dummy', 'Dummy'] [{'name': 'Irrelevant_Data', 'parentName': 'Irrelevant_Scrape', 'parentId': '2662610', 'id': '2684157'}, {'name': 'Irrelevant_Data', 'parentName': 'Irrelevant_Scrape', 'parentId': '2517840', 'id': '2565351'}]
query3  3/26/2016   neutral ['Dummy']   [{'name': 'Irrelevant_Data', 'parentName': 'Irrelevant_Scrape', 'parentId': '2662610', 'id': '2684157'}, {'name': 'Irrelevant_Data', 'parentName': 'Irrelevant_Scrape', 'parentId': '2517840', 'id': '2565351'}]
query4  3/19/2016   positive    ['Dummy', 'Dummy']  [{'name': 'Relevant_Data', 'parentName': 'Relevant_Scrape', 'parentId': '2892458', 'id': '2892601'}, {'name': 'Relevant_Data', 'parentName': 'Relevant_Scrape', 'parentId': '2892458', 'id': '2892718'}, {'name': 'Irrelevant_Data', 'parentName': 'Irrelevant_Scrape', 'parentId': '2517840', 'id': '2565351'}]
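For the expanded layout, one way to get a flat table of matches is to parse "Details" while reading, give each dict its own row, and filter. This is a sketch under assumptions: the file is truly tab-separated, every "Details" cell is a non-empty Python-literal list, and `DataFrame.explode` requires pandas 0.25 or later. The inline excerpt (query1 and query4 only) replaces `dataset.csv` so it runs on its own:

```python
import ast
import io
import pandas as pd

# Hypothetical two-row excerpt of the tab-separated file.
tsv = (
    "queryName\tdate\tsummary\ttagging\tDetails\n"
    "query1\t3/31/2016\tnegative\t['Dummy - Dummy']\t"
    "[{'name': 'Irrelevant_Data', 'parentName': 'Irrelevant_Scrape', 'parentId': '2517840', 'id': '2565351'}]\n"
    "query4\t3/19/2016\tpositive\t['Dummy', 'Dummy']\t"
    "[{'name': 'Relevant_Data', 'parentName': 'Relevant_Scrape', 'parentId': '2892458', 'id': '2892601'}, "
    "{'name': 'Relevant_Data', 'parentName': 'Relevant_Scrape', 'parentId': '2892458', 'id': '2892718'}, "
    "{'name': 'Irrelevant_Data', 'parentName': 'Irrelevant_Scrape', 'parentId': '2517840', 'id': '2565351'}]\n"
)

# converters= applies ast.literal_eval to every Details cell during parsing.
df = pd.read_csv(io.StringIO(tsv), sep="\t",
                 converters={"Details": ast.literal_eval})

# One row per dict, then spread the dict keys into their own columns.
long = df[["queryName", "Details"]].explode("Details").reset_index(drop=True)
details = pd.DataFrame(long["Details"].tolist())
flat = pd.concat([long[["queryName"]], details], axis=1)

# Keep only the entries that belong to "Relevant_Scrape".
relevant = flat[flat["parentName"] == "Relevant_Scrape"]
print(relevant[["queryName", "name", "id"]])
```

Keeping `queryName` alongside each exploded dict preserves which query a matching name came from; if only the names are needed, the per-row list comprehension is simpler.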
dataelephant
