0

I have a large csv file in which I want to replace values with zero in a particular range of time. For example in between 20:00:00 to 05:00:00 I want to replace all the values greater than zero with 0. How do I do it?

dff = pd.read_csv('108e.csv', header=None)  # reading the data set
data =  df.copy()
df = pd.DataFrame(data)
df['timeStamp'] =  pd.to_datetime(df['timeStamp'])

for i in df.set_index('timeStamp').between_time('20:00:00' , '05:00:00')['luminosity']:

   if( i > 0):
        df[['luminosity']] = df[["luminosity"]].replace({i:0})
  • Try this: https://stackoverflow.com/questions/43757977/replacing-values-greater-than-a-number-in-pandas-dataframe – alikhtag Jul 08 '20 at 12:14

2 Answers2

0

You can use the function select from numpy.

import numpy as np
df['luminosity'] = np.select((df['timeStamp']>='20:00:00') & (df['timeStamp']<='05:00:00') & (df['luminosity']>=0), 0, df['luminosity'])

Here are other examples to use it and here are the official docs.

PV8
  • 4,547
  • 3
  • 24
  • 52
0

Assume that your DataFrame contains:

             timeStamp  luminosity
0  2020-01-02 18:00:00          10
1  2020-01-02 20:00:00          11
2  2020-01-02 22:00:00          12
3  2020-01-03 02:00:00          13
4  2020-01-03 05:00:00          14
5  2020-01-03 07:00:00          15
6  2020-01-03 18:00:00          16
7  2020-01-03 20:10:00          17
8  2020-01-03 22:10:00          18
9  2020-01-04 02:10:00          19
10 2020-01-04 05:00:00          20
11 2020-01-04 05:10:00          21
12 2020-01-04 07:00:00          22

To only retrieve rows in the time range of interest you could run:

df.set_index('timeStamp').between_time('20:00' , '05:00')

But if you attempted to modify these data, e.g.

df = df.set_index('timeStamp')
df.between_time('20:00' , '05:00')['luminosity'] = 0

you would get SettingWithCopyWarning. The reason is that this function returns a view of the original data.

To circumvent this limitation, you can use indexer_between_time, on the index of a DataFrame, which returns a Numpy array - locations of rows meeting your time range criterion.

To update the underlying data, with setting index only to get row positions, you can run:

df.iloc[df.set_index('timeStamp').index\
    .indexer_between_time('20:00', '05:00'), 1] = 0

Note that to keep the code short, I passed the int location of the column of interest.

Access by iloc should be quite fast.

When you print the df again, the result is:

             timeStamp  luminosity
0  2020-01-02 18:00:00          10
1  2020-01-02 20:00:00           0
2  2020-01-02 22:00:00           0
3  2020-01-03 02:00:00           0
4  2020-01-03 05:00:00           0
5  2020-01-03 07:00:00          15
6  2020-01-03 18:00:00          16
7  2020-01-03 20:10:00           0
8  2020-01-03 22:10:00           0
9  2020-01-04 02:10:00           0
10 2020-01-04 05:00:00           0
11 2020-01-04 05:10:00          21
12 2020-01-04 07:00:00          22
Valdi_Bo
  • 24,530
  • 2
  • 17
  • 30