0

I have a csv that looks like this:

someFile.csv

Header1 Header2 Header3
aa      aaa     a
bbbb    bbbbbb  aa

I want to calculate the average string length in each column and create a csv of the results. This is what it would look like in the example:

results.csv

Header1 Header2 Header3
3       4.5     1.5

I've been trying to use the csv library in Python but I'm not having success. Is there a simple way to accomplish this?

martineau
  • 99,260
  • 22
  • 139
  • 249
yalpsid eman
  • 1,521
  • 1
  • 20
  • 45

4 Answers4

2

You can zip the rows and map the columns to len and use statistics.mean to calculate averages:

import csv
from statistics import mean
with open('someFile.csv', 'r', newline='') as f, open('results.csv', 'w', newline='') as output:
    reader = csv.reader(f, delimiter=' ', skipinitialspace=True)
    headers = next(reader)
    writer = csv.writer(output, delimiter = ' ')
    writer.writerow(headers)
    writer.writerow([mean(map(len, col)) for col in zip(*reader)])
blhsing
  • 70,627
  • 6
  • 41
  • 76
1

You can try pandas. In case you don't have pandas installed, do pip install pandas to install pandas.

import pandas as pd
# df = pd.read_csv('my_csv.csv')
df = pd.DataFrame([['aa', 'aaa', 'a'], ['bbbb', 'bbbbbb', 'aa']], 
                  columns=['Header1', 'Header2', 'Header3'])
result = pd.DataFrame([[]])
for col in df:
    result[col] = df[col].apply(len).mean()

result.to_csv('result.csv')

Hope this helps!

Pinyi Wang
  • 563
  • 2
  • 13
0

Here is a simple code. I provided two blocks, if no null in dataframe and if Nulls are present.

import pandas as pd

#df = pd.DataFrame([['aa','aaa','a'],['bbbb','bbbbbb','aa']],columns=['Header1','Header2','Header3'])
df = pd.read_csv('file.csv')

#if No Null
No_of_Row=df.shape[0]
for Col in df.sum():
    print(len(Col)/No_of_Row)

#if Null are there
for Col,Header in zip(df.sum(),df.columns):
    print(len(Col)/df[[Header]].dropna().shape[0])
Bhanu Tez
  • 308
  • 2
  • 13
0

This isn't the best way to do it. There are other ways to do this quickly. However, I do think this is a fairly straight forward and easy-to-understand example that was put together very hastily. I used this on your example and it works.

import csv

# replace "yourusername" with your PC user name
input_file = 'C:/Users/yourusername/Desktop/someFile.csv' 
output_file = 'C:/Users/yourusername/Desktop/output.csv'

csv_file = open(input_file, newline='')  # opening csv file
info = list(csv.reader(csv_file))  # convert data in csv file to array/list
csv_file.close()

length = len(info[0])  # if you ever add more headers, this will account for it
avg_container = [0 for i in range(length)]  # creates empty array with zeros for each header
n = len(info[1:])  # for dividing by n to get average

# adding the lengths of all the items to one sum for each "column"
for k in info[1:]:
    for n,i in enumerate(k):
        avg_container[n] += len(i)

# diviving all sums by n
for i in range(len(avg_container)):
    avg_container[i] = avg_container[i]/n

# combine header and average array into one item to write to csv
avg_output = []
avg_output.extend((info[0],avg_container))
print(avg_output)  # just for you to see for yourself

# outputting the new file
output_csv = open(output_file, 'w', newline='')  # creates an instance of the file
csv_writer = csv.writer(output_csv)  # creates an "Writer" to write to the csv
csv_writer.writerows(avg_output)  # outputs the avg_output variable to the csv file
output_csv.close()  # finished

References

How to import a csv-file into a data array?

Create a .csv file with values from a Python list

Writing a Python list of lists to a csv file

Noctsol
  • 418
  • 7
  • 11