
How can I delete rows containing missing values in my CSV file (in any of the columns a, b, c, …)? This is my code:

import numpy as np

FNAME = "C:/Users/lenovo/Desktop/table.csv"

my_data = np.genfromtxt(FNAME, delimiter=',')
a = my_data[:, 0]
b = my_data[:, 1]
c = my_data[:, 2]
d = my_data[:, 3]
e = my_data[:, 4]
f = my_data[:, 5]
g = my_data[:, 6]

An extract of my csv_file:

0,1,135,3,82,4,1
0,1,98,5,82,3,1
21175,1,98,5,82,3,1
9147,2,80,5,82,2,2
1829,2,80,5,82,2,2
3659,2,80,5,82,2,2
10976,2,80,5,82,2,2
0,2,40,2,24,1,2
0,2,40,2,24,1,2
29710,2,40,2,24,1,2
0,1,90,3,31,2,2
0,1,90,3,31,2,2
11434,1,90,3,31,2,2
0,2,85,4,72,3,2
6039,2,85,4,72,3,2
34758,1,100,,52,,
0,1,100,,52,,

Thanx

salma

2 Answers


Pandas has a built-in method for this:

import pandas as pd

FNAME = "C:/Users/lenovo/Desktop/table.csv"
df = pd.read_csv(FNAME, header=None, index_col=None)
print(df.dropna())
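If you also want the cleaned rows written back out, `DataFrame.to_csv` does that. A minimal sketch, using a few rows from the question as stand-in data (the output filename `clean.csv` is a hypothetical choice):

```python
import pandas as pd
from io import StringIO

# A few rows from the question; the middle one contains missing values
data = "0,1,135,3,82,4,1\n34758,1,100,,52,,\n0,2,40,2,24,1,2\n"

df = pd.read_csv(StringIO(data), header=None, index_col=None)
clean = df.dropna()  # drop every row that has a NaN in any column

# Write the surviving rows back to a CSV file (hypothetical filename)
clean.to_csv("clean.csv", header=False, index=False)
```

`dropna()` also takes a `subset` argument if you ever want to consider only some columns when deciding which rows to drop.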
Felix Zumstein

It isn't clear from your question whether you want to delete values from individual columns only (which sounds wrong to me) or drop the whole row from all columns. Either way, it is better to use the power of genfromtxt; I recommend reading the guide or the docs.

In there you will find a `missing_values` argument, with which you can specify how such occurrences should be handled on import. There are many ways to do this, but one example relies on the fact that genfromtxt replaces missing floats with nan: check each row for the occurrence of nan and disregard the row if one is found:

import numpy as np
from io import StringIO  # on Python 2: from StringIO import StringIO

data = """\
0,4,1
34758,1,100
52,,
"""

my_data = np.genfromtxt(StringIO(data), delimiter=",")

index_to_use = []
for i, row in enumerate(my_data):
    if not np.isnan(row).any():  # keep only rows without any NaN
        index_to_use.append(i)

print(my_data[index_to_use])

>>>
[[  0.00000000e+00   4.00000000e+00   1.00000000e+00]
[  3.47580000e+04   1.00000000e+00   1.00000000e+02]]

For readability I have reduced your data sample.

Greg
  • Instead of your `index_to_use` loop, you could use `np.isnan` in vectorized fashion: `my_data[~np.isnan(my_data).any(axis=1)]`. – DSM Sep 02 '13 at 14:30
  • In fact, I want to delete all the rows that contain missing values in any of a, b, c, d, e, f, g. – salma Sep 02 '13 at 14:31
  • The suggestion I have given removes the rows from all the columns (note the output has 2 rows while the input csv data has 3). @DSM A very good point; I did not include it because I think it is less readable (there are quite a few obscured features, such as `logical_not`, hidden away). It would, however, be the preferred final solution, and is most likely faster. – Greg Sep 02 '13 at 15:00
  • OK, thanks, but how can I write this result of `my_data[index_to_use]` to a csv file? – salma Sep 02 '13 at 20:39
  • If you mean save the resulting array in a csv file see [this related post](http://stackoverflow.com/questions/6081008/dump-a-numpy-array-into-a-csv-file) – Greg Sep 02 '13 at 21:40
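Pulling the comment thread together: DSM's vectorized mask avoids the loop, and `np.savetxt` writes the filtered result back to disk, as asked in the follow-up. A minimal sketch (the output filename `clean.csv` is a hypothetical choice):

```python
import numpy as np
from io import StringIO

data = """\
0,4,1
34758,1,100
52,,
"""

my_data = np.genfromtxt(StringIO(data), delimiter=",")

# Keep only the rows whose values are all non-NaN (vectorized filter)
clean = my_data[~np.isnan(my_data).any(axis=1)]

# Write the filtered array back to a CSV file (hypothetical filename)
np.savetxt("clean.csv", clean, delimiter=",", fmt="%g")
```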