Have you tried using pandas' DataFrame.infer_objects()?
# importing pandas as pd
import pandas as pd
# Creating the dataframe
df = pd.DataFrame({"A": ["alpha", 15, 81, 1, 100],
                   "B": [2, 78, 7, 4, 12],
                   "C": ["beta", 21, 14, 61, 5]})
# data frame info and data
df.info()
print(df)
# slice all rows except first into a new frame
df_temp = df[1:]
# print it
print(df_temp)
df_temp.info()
# infer the object types
df_inferred = df_temp.infer_objects()
# print inferred
print(df_inferred)
df_inferred.info()
Here's the output from the Python script above.
Initially, columns A, B and C are inferred as object, int64 and object respectively.
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 3 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 A 5 non-null object
1 B 5 non-null int64
2 C 5 non-null object
dtypes: int64(1), object(2)
memory usage: 248.0+ bytes
A B C
0 alpha 2 beta
1 15 78 21
2 81 7 14
3 1 4 61
4 100 12 5
A B C
1 15 78 21
2 81 7 14
3 1 4 61
4 100 12 5
After removing the first row, which contains the strings, the DataFrame still reports the same dtypes: A and C remain object, because slicing does not re-run dtype inference.
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4 entries, 1 to 4
Data columns (total 3 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 A 4 non-null object
1 B 4 non-null int64
2 C 4 non-null object
dtypes: int64(1), object(2)
memory usage: 228.0+ bytes
A B C
1 15 78 21
2 81 7 14
3 1 4 61
4 100 12 5
After infer_objects(), columns A and C have been correctly inferred as int64, matching B.
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4 entries, 1 to 4
Data columns (total 3 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 A 4 non-null int64
1 B 4 non-null int64
2 C 4 non-null int64
dtypes: int64(3)
memory usage: 228.0 bytes
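One caveat worth checking against your own data: infer_objects() only performs a soft conversion, so it helps when the object column already holds real Python numbers, as in the example above. The sketch below contrasts that with a hypothetical column of numeric strings (not part of the example above), where pd.to_numeric() is the usual tool instead. The variable names are made up for illustration.

```python
import pandas as pd

# Object column holding real Python ints, like A and C after the string row is dropped
ints_as_objects = pd.Series([15, 81, 1, 100], dtype="object")

# Hypothetical case: the same values stored as strings
digits_as_strings = pd.Series(["15", "81", "1", "100"], dtype="object")

# Soft conversion: succeeds because the underlying values are already ints
soft = ints_as_objects.infer_objects()

# Soft conversion does not parse strings, so this does not become a numeric dtype
still_not_numeric = digits_as_strings.infer_objects()

# Explicit parsing of the strings into numbers
converted = pd.to_numeric(digits_as_strings)

print(soft.dtype)
print(pd.api.types.is_numeric_dtype(still_not_numeric))
print(converted.dtype)
```

If your real columns come from a CSV or similar text source, the values are more likely to be strings, in which case the pd.to_numeric() route is probably the one to try.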
Is this what you need?