0

I combined two dataframes , and as an output I get this results :

 Proj    CF  VPC
0   A   [2021-01-26]  [NaT,2019-03-18]
1   B   [NaT]  [2016-03-18,2018-03-24]
2   C   [NaT,NaT]  [2018-01-26,NaT]

so I want to remove all the NaT , so the result expected is :

Proj    CF  VPC
    0   A   [2021-01-26]  [2019-03-18]
    1   B                 [2016-03-18,2018-03-24]
    2   C                 [2018-01-26]

I tried with this code below but it doesn't work (the comma and the [] stay):

df.fillna('', inplace=True)
df 

Any suggestion ?

  • Have you looked at this [Pandas Replace NaN with blank/empty string](https://stackoverflow.com/questions/26837998/pandas-replace-nan-with-blank-empty-string) for a possible solution – itprorh66 Jan 26 '21 at 14:37
  • Does this answer your question? [Pandas Replace NaN with blank/empty string](https://stackoverflow.com/questions/26837998/pandas-replace-nan-with-blank-empty-string) – itprorh66 Jan 26 '21 at 14:37
  • @itprorh66 yes I tried with this code but it doesn't work because in my case I want to replace NaT with nothing even with '' because when the dataframes combine, if there is something in a cell (nan or NaT or white space) it gives bad results – Fazia Chenna Jan 26 '21 at 14:55
  • I don't understand. You show the results of combining your data frames using an image, if you then replace the NaN values in the combined DataFrame with " " and output to excel, you should get the desired results. What do you mean it gives bad results? – itprorh66 Jan 26 '21 at 15:05
  • @itprorh66 I added the code I had tried in my question, I mean by "bad results" the NaT was still there! – Fazia Chenna Jan 26 '21 at 15:36
  • I have edited my answer to reflect your updated question – itprorh66 Jan 27 '21 at 15:34

1 Answers1

1

Now that you have explained the problem further. Given a DataFrame like:

di = {'Proj':['A', 'B', 'C'], 'CF':[[pd.to_datetime('2021/01/26')], [pd.to_datetime(np.nan)], [pd.to_datetime(np.nan), pd.to_datetime(np.nan)] ], 
      'VPC':[[pd.to_datetime(np.nan), pd.to_datetime('2019/03/18')], [pd.to_datetime('2016/03/18'), pd.to_datetime('2018/03/24')], [pd.to_datetime('2018/03/26'), pd.to_datetime(np.nan)]]}
df = pd.DataFrame(di)
df

The frame looks like:

    Proj    CF                  VPC
0   A   [2021-01-26 00:00:00]   [NaT, 2019-03-18 00:00:00]
1   B   [NaT]                   [2016-03-18 00:00:00, 2018-03-24 00:00:00]
2   C   [NaT, NaT]              [2018-03-26 00:00:00, NaT]  

Because the NaTs are embedded within Frame Row cell lists, I would proceed as follows:

def replaceNaTsvalue(col_data):
    rslt = []    
    for row in col_data:
        row_data = []
        for itm in row:
            if not pd.isnull(itm):
                row_data.append(itm)
        
        if len(row_data) > 0:
            rslt.append(row_data)
        else: 
            rslt.append(' ')
    return rslt  

def replace_all_NaTs(cols, dx):
    for col_name in cols:
        rslt = replaceNaTsvalue(dx[col_name])
        dx[col_name] = rslt
    return dx

Now by executing:

replace_all_NaTs(['CF', 'VPC'], df)  

The resulting DF looks like:

    Proj    CF                  VPC
0   A   [2021-01-26 00:00:00]   [2019-03-18 00:00:00]
1   B                           [2016-03-18 00:00:00, 2018-03-24 00:00:00]
2   C                           [2018-03-26 00:00:00]
itprorh66
  • 1,350
  • 3
  • 4
  • 14