Pandas - Find some specific rows and output the filtered dataframe into a new CSV file

Question

I have a csv file with 15 columns and around 17000 rows. My problem is to search in a specific column (for example: column 'name') for an input string, if it matches, print the the row [i] that contains the string, the previous row [i-1] and the next row [i+1], in order i-1, i, i+1. Repeat the process till the last element of the column (my data file is formated so that it contains no duplicate).

I use this reference to find the rows and the program runs well. Below is my python code:

import pandas as pd
x = input('Please input the name: ')
df = pd.read_csv("input.csv", sep = ",")

idx = df[df.name.str.contains(x, na=False)].index.tolist()

for i in idx:
    print(df.iloc[[i-1, i, i+1]])

I would like to ask how to export the filtered data above to a new dataframe and output it to a new csv file? I follow this reference:

df.iloc[[i-1, i, i+1]].to_csv('result.csv', index=True, mode='a')

The output file is ok but it doesn't include the columns' names and I also think that it is not so formal and optimal acording to the author of the topic.

Thank you very much.

score 0 · Answer 1 · answered Nov 06 '18 at 08:04

0

You can try this:

tmp = pd.DataFrame() # A blank dataframe
for i in idx:
    tmp.append(df.iloc[[i-1, i, i+1]]) # Keep appending the rows to the tmp df

Then, assign column names to tmp

tmp.columns = df.columns

Write this to csv:

tmp.to_csv('result.csv', index=True)

Let me know if this helps.

answered Nov 06 '18 at 08:04

Mayank Porwal

27,201
7
25
45

Thank you for your help. I try your solution but it returned an error code: "ValueError: Length mismatch: Expected axis has 0 elements, new values have 16 elements". – flamingheart Nov 06 '18 at 08:29
File "C:/PycharmProjects/name/name-detection.py", line 9, in tmp.columns = df.columns ValueError: Length mismatch: Expected axis has 0 elements, new values have 16 elements. – flamingheart Nov 06 '18 at 08:41

jezrael · Accepted Answer · 2018-11-06T08:12:09.143

I think you need min and max for avoid selecting not exist row before first and after last matched rows, then for new file first save only columns names and then in loop save only data with no header:

df = pd.DataFrame({
        'A':list('abcdef'),
         'B':[4,5,4,5,5,4],
         'C':[7,8,9,4,2,3],
         'D':[1,3,5,7,1,0],
         'E':[5,3,6,9,2,4],
         'name':list('aaabbb')
})

print (df)

#tested matching first row
x = 'a'
#tested matching last row
#x = 'b'

idx = df[df.name.str.contains(x, na=False)].index.tolist()

pd.DataFrame(columns=df.columns).to_csv('result.csv')
for i in idx:
    df1 = df.iloc[[max(0, i-1), i, min(df.index[-1], i+1)]]

    df1.to_csv('result.csv', index=False, mode='a', header=None)
    #if need index values
    #df1.to_csv('result.csv', mode='a', header=None)

Another solution is use concat of list of DataFrames, then save to csv with no mode append:

x = 'a'
idx = df[df.name.str.contains(x, na=False)].index.tolist()

dfs = []
for i in idx:
    dfs.append(df.iloc[[max(0, i-1), i, min(df.index[-1], i+1)]])

#list comprehension alternative
#dfs = [df.iloc[[max(0, i-1), i, min(df.index[-1], i+1)]] for i in idx]

pd.concat(dfs).to_csv('result.csv', index=False)
#if need index
#pd.concat(dfs).to_csv('result.csv')

Thank you very much for your help! I tried your first solution, it returned the columns name but shifted one column: the first column name is empty, columns names start from the second column (I checked Index=False). The second solution works flawlessly, since I'm new to programming, this solution provides new information [concat](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.concat.html) for working on DataFrame. — flamingheart, Nov 06 '18 at 08:37
I figured it out in the first solutions, just add `index=False` to `to_csv` method and the problem solved. The shifted column is for index. `pd.DataFrame(columns=df.columns).to_csv('result.csv', index=False)`. Thank you again for your help. — flamingheart, Nov 06 '18 at 09:12

Pandas - Find some specific rows and output the filtered dataframe into a new CSV file

2 Answers2