objective
I am trying to automatically generate an EDA report for each column in my dataframe, starting with value_counts().
problem
The problem is that my function doesn't return anything, so although it prints to the console, that output never reaches my text file. Until now I have used it only to generate syntax that I then run line by line in my IDE to inspect each variable, but that is not a very programmatic solution.
notes
Once this is working, I am going to add some syntax for graphs and the output of df.describe(), but for now I can't even get the basics working.
Output doesn't have to be .txt, but I thought that would be easiest while getting this to work.
I tried
import pandas as pd

def EDA(df, name):
    df.name = name  # name == string version of df
    print('#', df.name)
    for val in df.columns:
        print('# ', val, '\n', df[val].value_counts(dropna=False), '\n', sep='')
        print(df[val].value_counts(dropna=False))

path = 'Data/nameofmyfile.csv'
# name of df
activeWD = pd.read_csv(path, skiprows=6)
f = open('Output/outtext.txt', 'a+', encoding='utf-8')
f.write(EDA(activeWD, 'activeWD'))
f.close()
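A Python function with no return statement returns None, so f.write(EDA(...)) has nothing to write (in Python 3, write() on a text file raises TypeError when passed None). A minimal sketch, assuming one plain-text report per DataFrame is the goal: build the report as a string and return it. The name eda_report and the toy DataFrame are hypothetical stand-ins.

```python
import pandas as pd


def eda_report(df: pd.DataFrame, name: str) -> str:
    """Return a plain-text value_counts report for every column of df."""
    lines = [f"# {name}"]
    for col in df.columns:
        lines.append(f"# {col}")
        # to_string() renders the counts as text instead of printing them
        lines.append(df[col].value_counts(dropna=False).to_string())
        lines.append("")  # blank line between columns
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical toy data standing in for activeWD
    demo = pd.DataFrame({"dept": ["A", "B", "A", None], "grade": [1, 2, 2, 2]})
    print(eda_report(demo, "demo"))
```

With the original file handle, the call would become f.write(eda_report(activeWD, 'activeWD')), since the function now returns a str.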
also tried
various versions of replacing print with return:
def EDA(df, name):
    df.name = name  # name == string version of df
    print('#', df.name)
    for val in df.columns:
        print('# ', val, '\n', df[val].value_counts(dropna=False), '\n', sep='')
        return(df[val].value_counts(dropna=False))
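If the return sits inside the for loop, the function exits during the first iteration, so only the first column's counts come back. A minimal sketch, assuming a mapping from column name to its counts is a useful return value: collect every Series first and return once after the loop. The name eda_counts and the toy data are hypothetical.

```python
import pandas as pd


def eda_counts(df: pd.DataFrame) -> dict:
    """Map each column name to its value_counts Series.

    The return comes after the loop; a return inside the loop would
    end the function after the first column.
    """
    counts = {}
    for col in df.columns:
        counts[col] = df[col].value_counts(dropna=False)
    return counts


if __name__ == "__main__":
    demo = pd.DataFrame({"dept": ["A", "B", "A"], "grade": [1, 2, 2]})
    for col, vc in eda_counts(demo).items():
        print("#", col)
        print(vc, end="\n\n")
```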
running the file from the Anaconda Prompt and redirecting its output:
Python Syntax\newdataEDA.5.py >> Output.outtext.txt
which results in the following codec error:
(base) C:\Users\auracoll\Analytic Projects\IDL Attrition>Python Syntax\newdatanewlife11.5.py >> Output.outtext.txt
sys:1: DtypeWarning: Columns (3,16,39,40,41,42,49) have mixed types. Specify dtype option on import or set low_memory=False.
Traceback (most recent call last):
  File "Syntax\newdatanewlife11.5.py", line 46, in <module>
    EDA(activeWD, name='activeWD')
  File "Syntax\newdatanewlife11.5.py", line 38, in EDA
    print(df[col].value_counts(dropna=False))
  File "C:\ProgramData\Anaconda3\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode characters in position 382-385: character maps to <undefined>
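Per the traceback, the failure happens inside print(): when stdout is redirected to a file, Python on Windows (for the versions in use here) encodes it with the ANSI code page, cp1252 in this case, and some characters in the data have no cp1252 mapping. A minimal sketch, assuming Python 3.7 or later (where TextIOWrapper.reconfigure() exists): switch stdout to UTF-8 before printing anything.

```python
import sys

# Assumption: Python 3.7+. Redirected stdout on Windows defaults to the
# ANSI code page (cp1252 here); switch it to UTF-8 before any output.
if hasattr(sys.stdout, "reconfigure"):
    sys.stdout.reconfigure(encoding="utf-8")

print("Z\u00fcrich \u6771\u4eac")  # the CJK characters have no cp1252 mapping
```

An alternative that needs no code change is to run set PYTHONIOENCODING=utf-8 in the prompt before launching the script, or to start Python with -X utf8 (UTF-8 mode, Python 3.7+).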
I tried encoding='utf-8' and encoding='ISO-8859-1', but neither resolves the problem.
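The encoding argument to open() governs only that file object; it has no effect on sys.stdout, which is where print() sends its output by default. print() accepts a file parameter, so a minimal sketch, assuming the report should go straight into a UTF-8 text file, passes the open handle through. The name eda, its out parameter, and the temporary output path are hypothetical.

```python
import os
import tempfile

import pandas as pd


def eda(df: pd.DataFrame, name: str, out) -> None:
    """Print a value_counts report for each column to the text stream out."""
    print("#", name, file=out)
    for col in df.columns:
        print(f"# {col}", file=out)
        print(df[col].value_counts(dropna=False), end="\n\n", file=out)


if __name__ == "__main__":
    demo = pd.DataFrame({"city": ["Z\u00fcrich", "\u6771\u4eac", "Z\u00fcrich"]})
    path = os.path.join(tempfile.gettempdir(), "outtext.txt")
    # encoding="utf-8" applies to this file object, so print(..., file=f)
    # never goes through the console's cp1252 encoder
    with open(path, "a", encoding="utf-8") as f:
        eda(demo, "demo", f)
```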
I have also tried saving intermediate variables, but they come back as None.
testvar = for val in df.columns: df[val].value_counts(dropna=False)
When I do this, testvar is a NoneType object (from the builtins module).
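A for statement is not an expression, so its result cannot be assigned to a variable, and calling a function whose body has no return statement always yields None. A minimal sketch of both points, assuming a dict of per-column counts is what testvar should hold; the toy DataFrame and the function no_return are hypothetical.

```python
import pandas as pd

demo = pd.DataFrame({"dept": ["A", "B", "A"], "grade": [1, 2, 2]})

# A dict comprehension is the expression form of the loop, so it can be assigned
testvar = {col: demo[col].value_counts(dropna=False) for col in demo.columns}


def no_return(df):
    # Computes value_counts for each column but discards the results
    for col in df.columns:
        df[col].value_counts(dropna=False)


result = no_return(demo)  # None, because the body never returns anything
print(type(result), list(testvar))
```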