
I am asking for any other algorithm or method that you would use to detect anomalies on a single column.

Filtering by columns not showing the data.

I am using the following approach to limit my dataframe only to two columns

import pandas as pd
X = pd.read_csv("C:/Users/Path/file.csv", usecols=["Describe_File", "numbers"])
Describe_File   numbers
0   This is the start   25
1   Ending is coming    42
2   Middle of the story 525
3   This is the start   65
4   This is the start   25
5   Middle of the story 35
6   This is the start   28
7   This is the start   24
8   Ending is coming    24
9   Ending is coming    35
10  Ending is coming    25
11  Ending is coming    24
12  This is the start   215

Now I want to filter the Describe_File column by the string "This is the start" and then show the matching values of numbers.

To do so I usually use the following code, but for some reason it returns nothing, even though the string exists in my CSV file.

X = X[X.Describe_File == "This is the start"]
E199504

1 Answer

You can use .str.contains(), which does a vectorized substring search, i.e.

df = X[X.Describe_File.str.contains("This is the start", regex=False)]
Oleg O
  • @Oleg O I am still getting the same result: `df.shape` returns `(0, 2)`. – E199504 Mar 02 '20 at 09:27
  • There's something wrong with the strings then (some crazy symbols that snuck in). You can try to narrow it down by shortening the substring, e.g. `contains("start", regex=False)` – Oleg O Mar 02 '20 at 09:42
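
Following up on the comment above, here is a minimal sketch of how one might look for hidden characters and clean them before filtering. The sample data is made up: a trailing space and a non-breaking space (`\u00a0`) stand in for whatever may actually be in the real file, which the thread does not show. The names `csv_text` and `cleaned` are only for illustration.

```python
import io
import pandas as pd

# Hypothetical sample: the real file's contents are unknown, so a trailing
# space and a non-breaking space are used here as example culprits.
csv_text = (
    "Describe_File,numbers\n"
    "This is the start ,25\n"
    "Ending is coming,42\n"
    "This\u00a0is the start,65\n"
)
X = pd.read_csv(io.StringIO(csv_text), usecols=["Describe_File", "numbers"])

# repr() makes stray whitespace and unusual code points visible.
print([repr(s) for s in X["Describe_File"].unique()])

# Replace non-breaking spaces with ordinary spaces, then trim both ends.
cleaned = (
    X["Describe_File"]
    .str.replace("\u00a0", " ", regex=False)
    .str.strip()
)

# Exact match on the raw column versus on the cleaned column.
exact_before = X[X["Describe_File"] == "This is the start"]
exact_after = X[cleaned == "This is the start"]
print(exact_before.shape, exact_after.shape)
```

In this made-up sample the exact match finds no rows on the raw column and two rows after cleaning. If the `repr()` output for the real file shows some other character, the same `str.replace` step can be pointed at that character instead.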