
I have imported a CSV using pandas, and I now want to remove rows that contain certain substrings, such as dev or testing-dashboard. These appear as part of a larger string in the field.

I've tried various ways to do this for my minus_testing_dashboard variable, but none of them work:

import numpy as np
import pandas as pd

raw_data = pd.read_csv('No License Key.csv', delimiter = ',', keep_default_na=False, low_memory=False)

selected_raw_data = raw_data[['App Config', 'App Name', 'App UUID', 'Machine ID', 'Estimated Company']].reset_index()

print(selected_raw_data.head(25))

minus_testing_dashboard = selected_raw_data.apply(lambda row: row.astype(str).str.contains('testing-dashboard').any(), axis=1).reset_index()

unique_desktops = minus_testing_dashboard['Machine ID'].nunique()
print(unique_desktops)
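The attempt above builds a per-row boolean mask, but then calls reset_index() on the mask itself, so minus_testing_dashboard becomes a two-column frame ('index' and 0) with no 'Machine ID' column. A minimal sketch of using the mask to filter the original frame instead, with made-up sample data standing in for 'No License Key.csv':

```python
import pandas as pd

# Hypothetical sample data standing in for the real CSV
selected_raw_data = pd.DataFrame({
    'App Config': ['prod', 'testing-dashboard-01', 'prod'],
    'App Name': ['Sales', 'Sales', 'dev-portal'],
    'App UUID': ['u1', 'u2', 'u3'],
    'Machine ID': ['m1', 'm2', 'm1'],
    'Estimated Company': ['Acme', 'Acme', 'Acme'],
})

# One boolean per row: True if any cell in that row contains the keyword
has_keyword = selected_raw_data.apply(
    lambda row: row.astype(str).str.contains('testing-dashboard').any(), axis=1
)

# Index the original frame with the negated mask to keep non-matching rows
minus_testing_dashboard = selected_raw_data[~has_keyword]

unique_desktops = minus_testing_dashboard['Machine ID'].nunique()
print(unique_desktops)
```

Only the row whose 'App Config' contains testing-dashboard is dropped here; 'dev-portal' survives because this mask checks a single keyword.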

1 Answer


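If I understand correctly, you can use str.contains after joining your keywords into a single regex pattern with | (regex alternation, meaning "match any of these"), then negate the result with ~ to keep only the rows that do not match.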

import pandas as pd

df = pd.DataFrame({'A' : ['dev_testing_123_456',
                         'just a test',
                         'testing-dashboard',
                         'keep me',
                         'and me']})

pat = '|'.join(['testing-dashboard','dev'])
#'testing-dashboard|dev'

print(df[~df['A'].str.contains(pat)])

             A
1  just a test
3      keep me
4       and me
Umar.H
  • Where you specify 'A', can you also specify other columns in the table, so the search would go over multiple columns? – Steve Wood May 19 '20 at 16:20
  • You'll need to stack the columns; provide a sample of data and I'll edit my answer @Steve Wood – Umar.H May 19 '20 at 16:54
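For the multi-column question in the comments, here is a minimal sketch of two ways to extend the answer's pattern across several columns. The column names and values are hypothetical, and the "stack" variant is one reading of the suggestion above, not necessarily what the answerer had in mind. It also wraps each keyword in re.escape, an addition not in the answer, so keywords containing regex metacharacters are matched literally.

```python
import re
import pandas as pd

# Hypothetical sample with two text columns (not the asker's real data)
df = pd.DataFrame({'App Config': ['prod', 'testing-dashboard-01', 'prod', 'prod'],
                   'App Name': ['Sales', 'Sales', 'dev-portal', 'Billing']})

# re.escape makes each keyword match literally, even if it contains '.', '+', '(' etc.
pat = '|'.join(re.escape(k) for k in ['testing-dashboard', 'dev'])

# Option 1: run str.contains on every column, then flag a row if any column matched
mask = df.apply(lambda col: col.astype(str).str.contains(pat)).any(axis=1)

# Option 2: stack the columns into one long Series indexed by (row, column),
# test it once, then collapse back to one flag per original row
mask_stacked = df.astype(str).stack().str.contains(pat).groupby(level=0).any()

# Drop every row where any column contained a keyword
print(df[~mask])
```

Both masks flag the same rows; Option 1 is usually easier to read, while Option 2 runs a single vectorized str.contains over all cells.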