I have a file containing many variables such as A1, A2, A3... A333, B1, B2, B3 and C1, C2, C3... C151. The variables are separated by semicolons. How can I count how often each of these values occurs in my file?

Each value/variable refers to a specific crime committed by companies in Brazil.

After organizing the data, I should be able to say, for example, that the value A9 (which stands for slave labor) appears 100 times, or that the value C88 appears 20 times.

Here is a screenshot: http://tinypic.com/view.php?pic=4j4ac5&s=9#.WiGss2nyuUk

I also uploaded my txt file here: https://file.io/Bxlrgt and here: https://ufile.io/3wsmu

Here is part of my file, which is available at the links above:

A7; C38; C25; C3; C20; C18; C27; A1; A2; D1
A21; A22; C29; C7; C14
A1; A5; C4; C15; C23
A1; A5; C26; C23; C7
A1; A2; C4; C51; C52; C23
A12; C1; C53; C35
C30; C31; C22; C54; C51; C1; C55; C53; C56; C52; C57; C58; C59; C26; C3; C36; C60; C13; C15; C14; A12; A4
A9; A1; A2; A5; C47; 
A23; A1; A2; B1; A21; F1; A4; C29; C61; C26; C1; C56; C27; C37; C20; C23; C62; C5; C15; C63; C50
A24; A49; A46; A25; A26; A17; A12; A30; A51; A31; A53; A29; A54; A28; A27; A32
A1; C4; C26; C1; C3; A2; C23; A6
A1; A4; C13; C22; C65; C21; C64; A33; C19; C23; A7; C20
A1; A2; A3; A4; A5; A6; A7; B1; C1; C2; C3; C4; C5; C6; C7; C8; C9; C10; C11; C12; C13; C14; C15; C16; C17; C18; C19; C20; C21; C22
A1; A2; C23
Bhargav Rao
agccaesar
  • Is this a CSV or only in Google Docs? – jhpratt Dec 02 '17 at 23:04
  • Please post example data *as text*; a screenshot is not helpful. It looks like you need help parsing some text. Help us help you. – juanpa.arrivillaga Dec 02 '17 at 23:18
  • Here is a link to my txt file: https://file.io/Bxlrgt – agccaesar Dec 02 '17 at 23:25
  • @agccaesar Try again! – tonypdmtr Dec 02 '17 at 23:27
  • Uploaded again here: https://ufile.io/3wsmu – agccaesar Dec 02 '17 at 23:28
  • A Stack Exchange question needs to be self-contained. Links can be used to enhance the question, but the question needs to be understandable from the text it contains. We don't need to see your whole file, just a small typical sample consisting of a few lines, the expected output corresponding to that data, and your code attempt in the form of a [mcve] that we can run on that data. – PM 2Ring Dec 02 '17 at 23:41

3 Answers

Using Pandas:

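import pandas as pd

# csvfile is the path to the text file; with no commas, each line lands whole in column 0.
df = pd.read_csv(csvfile, header=None)
# Split on ';', flatten to one Series, trim whitespace, then count each value.
df[0].str.split(';', expand=True).stack().str.strip().value_counts()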

Output (most frequent values only):

A1     10
A2      7
C23     7
C1      5
C26     4
C3      4
A5      4
C15     4
A4      4
C20     4
dtype: int64
Scott Boston
This should work:

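ans = {}
with open('file.txt', 'r') as f:  # the with block closes the file afterwards
  for line in f:
    for item in line.strip().split(';'):
      item = item.strip()
      if not item:  # skip the empty field left by a trailing ';'
        continue
      try:
        ans[item] += 1
      except KeyError:  # first time this item is seen
        ans[item] = 1

for k in ans:
  print(f'Item {k} appears {ans[k]} time(s)')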
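
As a self-contained check of this line-by-line approach, here is a minimal sketch that runs on four lines copied from the question, one of which ends in a trailing ';'. The function name count_items and the inline sample string are assumptions made for illustration; sorting by frequency is an optional extra, not part of the loop above.

```python
# Sample lines copied from the question; the last one ends with a trailing ';'.
sample = """A1; A5; C4; C15; C23
A1; A5; C26; C23; C7
A1; A2; C4; C51; C52; C23
A9; A1; A2; A5; C47;
"""

def count_items(lines):
    """Count each non-empty, whitespace-trimmed field across all lines."""
    counts = {}
    for line in lines:
        for field in line.split(";"):
            item = field.strip()
            if item:  # ignore the empty field after a trailing ';'
                counts[item] = counts.get(item, 0) + 1
    return counts

counts = count_items(sample.splitlines())

# Most frequent first; ties are broken alphabetically.
for item, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
    print(f"Item {item} appears {n} time(s)")
```

Without the `if item` check, the trailing semicolon would add an empty string to the dictionary as if it were a real value.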
tonypdmtr
Based on the file shown in the question, you can read the file elements into a list like this:

def read_file(filename):
    data = []

    with open(filename) as file:
        for line in file.readlines():
            temp = line.replace(";", " ").split()

            for item in temp:
                data.append(item)

    return data

print(read_file("data.txt"))
# ['A7', 'C38', 'C25', 'C3', 'C20', 'C18', 'C27', 'A1', 'A2', 'D1', 'A21', 'A22', 'C29', 'C7', 'C14', 'A1', 'A5', 'C4', 'C15', 'C23', 'A1', 'A5', 'C26', 'C23', 'C7', 'A1', 'A2', 'C4', 'C51', 'C52', 'C23', 'A12', 'C1', 'C53', 'C35', 'C30', 'C31', 'C22', 'C54', 'C51', 'C1', 'C55', 'C53', 'C56', 'C52', 'C57', 'C58', 'C59', 'C26', 'C3', 'C36', 'C60', 'C13', 'C15', 'C14', 'A12', 'A4', 'A9', 'A1', 'A2', 'A5', 'C47', 'A23', 'A1', 'A2', 'B1', 'A21', 'F1', 'A4', 'C29', 'C61', 'C26', 'C1', 'C56', 'C27', 'C37', 'C20', 'C23', 'C62', 'C5', 'C15', 'C63', 'C50', 'A24', 'A49', 'A46', 'A25', 'A26', 'A17', 'A12', 'A30', 'A51', 'A31', 'A53', 'A29', 'A54', 'A28', 'A27', 'A32', 'A1', 'C4', 'C26', 'C1', 'C3', 'A2', 'C23', 'A6', 'A1', 'A4', 'C13', 'C22', 'C65', 'C21', 'C64', 'A33', 'C19', 'C23', 'A7', 'C20', 'A1', 'A2', 'A3', 'A4', 'A5', 'A6', 'A7', 'B1', 'C1', 'C2', 'C3', 'C4', 'C5', 'C6', 'C7', 'C8', 'C9', 'C10', 'C11', 'C12', 'C13', 'C14', 'C15', 'C16', 'C17', 'C18', 'C19', 'C20', 'C21', 'C22', 'A1', 'A2', 'C23']

Then you can use collections.defaultdict to count how often each item occurs, storing the counts in a dictionary:

import collections

def get_counts(data):
    freq = collections.defaultdict(int)

    for item in data:
        freq[item] += 1

    return freq

print(get_counts(read_file("data.txt")))
# defaultdict(<class 'int'>, {'A54': 1, 'C3': 4, 'C1': 5, 'C50': 1, 'A53': 1, 'C60': 1, 'C17': 1, 'C65': 1, 'C10': 1, 'C9': 1, 'A5': 4, 'C59': 1, 'A25': 1, 'C37': 1, 'A49': 1, 'C55': 1, 'C4': 4, 'C36': 1, 'C58': 1, 'A23': 1, 'C14': 3, 'A29': 1, 'A6': 2, 'A2': 7, 'C15': 4, 'C6': 1, 'C2': 1, 'B1': 2, 'C30': 1, 'C29': 2, 'A4': 4, 'C52': 2, 'C12': 1, 'A32': 1, 'C16': 1, 'C27': 2, 'C57': 1, 'A24': 1, 'C63': 1, 'D1': 1, 'C5': 2, 'C23': 7, 'A28': 1, 'C54': 1, 'A30': 1, 'C7': 3, 'A12': 3, 'A21': 2, 'C25': 1, 'C19': 2, 'C61': 1, 'C56': 2, 'A9': 1, 'C47': 1, 'C38': 1, 'C13': 3, 'A31': 1, 'C35': 1, 'A17': 1, 'A46': 1, 'A51': 1, 'A3': 1, 'C11': 1, 'C62': 1, 'C22': 3, 'A33': 1, 'A7': 3, 'C51': 2, 'C64': 1, 'C26': 4, 'F1': 1, 'A1': 10, 'C8': 1, 'C20': 4, 'C18': 2, 'A22': 1, 'A26': 1, 'A27': 1, 'C53': 2, 'C21': 2, 'C31': 1})

Or use collections.Counter, which does the counting for you:

import collections

def get_counts(data):
    return collections.Counter(data)

print(get_counts(read_file("data.txt")))
# Counter({'A1': 10, 'A2': 7, 'C23': 7, 'C1': 5, 'C3': 4, 'A5': 4, 'A4': 4, 'C4': 4, 'C15': 4, 'C26': 4, 'C20': 4, 'C22': 3, 'A12': 3, 'A7': 3, 'C7': 3, 'C14': 3, 'C13': 3, 'C56': 2, 'A21': 2, 'B1': 2, 'C27': 2, 'A6': 2, 'C18': 2, 'C5': 2, 'C51': 2, 'C53': 2, 'C19': 2, 'C29': 2, 'C52': 2, 'C21': 2, 'A46': 1, 'A24': 1, 'A31': 1, 'C65': 1, 'C12': 1, 'A23': 1, 'C11': 1, 'F1': 1, 'C16': 1, 'C10': 1, 'C36': 1, 'C61': 1, 'A27': 1, 'C8': 1, 'C37': 1, 'C59': 1, 'A28': 1, 'C50': 1, 'A49': 1, 'C38': 1, 'A25': 1, 'C35': 1, 'A32': 1, 'C25': 1, 'A9': 1, 'A17': 1, 'A54': 1, 'C58': 1, 'C2': 1, 'A26': 1, 'D1': 1, 'C6': 1, 'C63': 1, 'C47': 1, 'C62': 1, 'C31': 1, 'C30': 1, 'C55': 1, 'A29': 1, 'A51': 1, 'A53': 1, 'C9': 1, 'A3': 1, 'C17': 1, 'C60': 1, 'C57': 1, 'A30': 1, 'C64': 1, 'A22': 1, 'A33': 1, 'C54': 1})
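
To answer questions like "how many times does A9 appear?", the Counter can be queried directly. The following is a minimal sketch on three lines copied from the question; the inline sample string is an assumption for illustration, and the splitting mirrors read_file above.

```python
from collections import Counter

# First three sample lines from the question.
sample = """A7; C38; C25; C3; C20; C18; C27; A1; A2; D1
A21; A22; C29; C7; C14
A1; A5; C4; C15; C23
"""

# Replace ';' with a space and split on whitespace, as read_file does.
counts = Counter(sample.replace(";", " ").split())

print(counts["A1"])           # count for one specific code
print(counts["A9"])           # a code that never occurs gives 0, not a KeyError
print(counts.most_common(3))  # the three most frequent codes with their counts
```

Looking up a missing key returns 0 without adding it to the Counter, so checking codes that may not occur in the file is safe.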
RoadRunner