How to delete a column in pandas if a given row does not contain the value SpaceX?

Question

I have an Excel file to analyze, but it contains a lot of data that I don't want to analyze. Can we delete a column if the string SpaceX does not appear in its first row, as in the following example?

SL#  State  District  10/01/2021  10/01/2021  10/01/2021  11/01/2021  11/01/2021  11/01/2021
                      SpaceX in   Star in     StarX out   SpaceX out  Star out    StarX in
1    wb     al        10          11          12          13          14          15
2    wb     not       23          22          20          24          25          25

Here I want to delete the columns whose first row does not contain SpaceX, and then delete the SpaceX row itself so that the remaining rows shift up. The final output should look like this:

SL#  State  District  10/01/2021  11/01/2021
1    wb     al        10          13
2    wb     not       23          24

I tried the loc and iloc functions but have had no luck so far.

I also checked the answer to "Drop columns if rows contain a specific value in Pandas", but my case is different: I'm checking for a substring, not an exact value match.

Anurag Dabas · Accepted Answer · 2021-05-23T08:08:30.733

2

First, create a boolean mask with the startswith() and fillna() methods. The fillna(True) call keeps the columns whose first-row cell is empty (SL#, State, District):

mask=df.loc[0].str.startswith('SpaceX').fillna(True)

Finally, use the transpose attribute (T), the loc accessor, and the drop() method to keep the matching columns and remove the label row:

df=df.T.loc[mask].T.drop(0)

Output of df:

    SL#     State   District    2021-01-10 00:00:00     2021-01-11 00:00:00     2021-01-12 00:00:00
1   1.0     wb      al          10                              13                  16
2   2.0     wb      not         23                              13                  16
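
For anyone who wants to try this without the Excel file, here is a minimal sketch on the small table from the question. The DataFrame below is a hypothetical reconstruction of how pandas would read that sheet (duplicate date headers get `.1`, `.2` suffixes and the labels land in row 0), and it swaps in a variant of the steps above: `startswith(..., na=True)` in place of `fillna(True)`, and direct column selection in place of the double transpose.

```python
import numpy as np
import pandas as pd

# Hypothetical reconstruction of the question's sheet as read by pandas:
# repeated date headers are suffixed, and the label row is row 0.
df = pd.DataFrame(
    [
        [np.nan, np.nan, np.nan, "SpaceX in", "Star in", "StarX out",
         "SpaceX out", "Star out", "StarX in"],
        [1, "wb", "al", 10, 11, 12, 13, 14, 15],
        [2, "wb", "not", 23, 22, 20, 24, 25, 25],
    ],
    columns=["SL#", "State", "District",
             "10/01/2021", "10/01/2021.1", "10/01/2021.2",
             "11/01/2021", "11/01/2021.1", "11/01/2021.2"],
)

# True where the label row starts with "SpaceX"; na=True also keeps the
# unlabeled identifier columns (SL#, State, District).
mask = df.iloc[0].str.startswith("SpaceX", na=True).astype(bool)

# Select the columns directly, drop the label row, and renumber the rows.
result = df.loc[:, mask.to_numpy()].drop(index=0).reset_index(drop=True)
print(result)
```

Selecting columns with `df.loc[:, ...]` avoids transposing the frame twice, which is the only reason for the change; the mask logic is the same.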

edited May 23 '21 at 08:08

answered May 23 '21 at 07:27

Anurag Dabas

Nope, it's not coming out like this. I've added a sample file on GitHub: https://github.com/DiceInstitute/data-analysis/blob/main/sample.xlsx – ThunderStorm May 23 '21 at 07:44
Updated the answer, kindly have a look **:)** – Anurag Dabas May 23 '21 at 07:49
It still doesn't give the required output: https://github.com/DiceInstitute/data-analysis/blob/main/required_output.xlsx – ThunderStorm May 23 '21 at 07:56
Updated the answer again, kindly have a look **:)** – Anurag Dabas May 23 '21 at 08:08
But it added some extra value to the dates, like 2021-01-16 00:00:00.3 instead of 2021-01-16 – ThunderStorm May 23 '21 at 08:29
It is not adding that by itself. You are reading a file from Excel with dates as column names, so it may be due to that – Anurag Dabas May 23 '21 at 08:33
Since the column names are the same for those dates, it's appending a positional suffix. I tried reading it as CSV as well, and the columns come out as 19/05/2021.3 20/05/2021.3 21/05/2021.3 22/05/2021.3. Can we clean that up, e.g. truncate it? – ThunderStorm May 23 '21 at 08:37
try: `df.columns=df.columns.astype(str).str.split('.').str[0]` – Anurag Dabas May 23 '21 at 08:48
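
A small sketch of that column clean-up, using made-up column names modelled on the ones quoted in the comments. It swaps the `split('.')` for a regular expression that removes only a trailing `.<digits>` suffix, on the assumption that a header could otherwise contain a dot of its own (for example a date written as `19.05.2021`):

```python
import pandas as pd

# Hypothetical column labels resembling the suffixed headers described above.
cols = pd.Index(["SL#", "19/05/2021", "19/05/2021.3", "20/05/2021.3"])

# Remove only a trailing ".<digits>" suffix; other dots are left untouched.
cleaned = cols.astype(str).str.replace(r"\.\d+$", "", regex=True)
print(list(cleaned))
```

Assigning the result back with `df.columns = cleaned` leaves duplicate column names in the frame, so it is probably best done after the unwanted columns have been dropped.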
