
I'm new to Python.

I wrote some code that should let me find the percentage of items that follow a given item in a list.

Given a list:

list1=["a", "b", "a", "c", "a", "b", "c", "d", "e", "a", "b", "d", "e", "a", "c"]

For each occurrence of a given item (say, "a"), I would like to find the percentage with which every item follows it. The code returns:

[(33, 'a'), (25, 'b'), (16, 'e'), (16, 'd'), (16, 'c')]
[(30, 'a'), (20, 'e'), (20, 'd'), (20, 'c'), (20, 'b')]
[(25, 'e'), (25, 'd'), (25, 'b'), (25, 'a'), (12, 'c')]
[(33, 'e'), (33, 'd'), (33, 'b'), (33, 'a')]
[]  

The output is correct, and it's what I wanted.
But I would also like to sum the values of each key across the different dictionaries, so that I get something like:

[(121, 'a'), (103, 'b'), (94, 'e'), (94, 'd'), (48, 'c')]

I couldn't find a way to do that. I know there are ways to sum the values of each key across different dictionaries, but the problem here is that the dictionaries are created inside a for loop, because I need as many dictionaries as there are occurrences of the target item (in this case, "a").

I tried to iterate over every dict with:

for key, value in dictio.items():
    dictio[key] = value + dictio.get(key, 0)
    print(dictio)

But the result is a mess, and it's not even close to what I want.

I would like to know if it is possible to join multiple dictionaries without knowing how many there are (because they are created in a for loop).

And, since I would like to understand Python's logic better, I would prefer not to use external libraries, if possible.
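A likely cause of the mess: the attempt above reads from and writes to the same `dictio`, so `dictio.get(key, 0)` just returns `value` again and every value ends up doubled. A minimal sketch without external libraries, assuming the per-occurrence dictionaries look like the output above (item as key, percentage as value; the name `per_occurrence` is hypothetical), keeps a separate `totals` dict outside the loop, so the number of dictionaries does not matter:

```python
# Hypothetical per-occurrence dictionaries, rebuilt from the output above
# (key = item, value = percentage).
per_occurrence = [
    {'a': 33, 'b': 25, 'e': 16, 'd': 16, 'c': 16},
    {'a': 30, 'e': 20, 'd': 20, 'c': 20, 'b': 20},
    {'e': 25, 'd': 25, 'b': 25, 'a': 25, 'c': 12},
    {'e': 33, 'd': 33, 'b': 33, 'a': 33},
    {},
]

# The accumulator is created once, outside the loop, so it works
# for however many dictionaries the loop produces.
totals = {}
for dictio in per_occurrence:
    for key, value in dictio.items():
        # Add to the running total instead of to dictio itself.
        totals[key] = totals.get(key, 0) + value

# (total, item) pairs, largest total first.
result = sorted(((v, k) for k, v in totals.items()), reverse=True)
print(result)  # [(121, 'a'), (103, 'b'), (94, 'e'), (94, 'd'), (48, 'c')]
```

In the real code, the inner `for key, value in dictio.items()` loop would sit right after each `dictio` is built, with `totals = {}` placed before the outer loop.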

Thank you in advance!

Niccolò

  • Couldn't `max(range(len(list1)))` just be `len(list1)`? – SuperBiasedMan Jul 15 '15 at 09:52
  • A dict can be combined with another using the `update` method: `dict1.update(dict2)` updates dict1 in place. If dict1 and dict2 have a key in common, the value of that key in dict2 wins. –  Jul 15 '15 at 10:01
  • @SuperBiasedMan you are right, but more precisely `len(list1)-1`, since he wants the index of the last element. – yosemite_k Jul 15 '15 at 10:10
  • Thank you SuperBiasedMan for your suggestion, and yes @yosemite_k your suggestion is even more accurate. – Niccolò Jul 15 '15 at 10:25
  • Are you just trying to do [this](http://stackoverflow.com/q/31430384/2336725)? You have far too much code for asking a question (which has been noticed on [meta](http://meta.stackoverflow.com/q/299361/2336725) as well). – Teepeemm Jul 16 '15 at 19:52
  • TMC (Too Much Code)... – AStopher Jul 16 '15 at 21:26
  • This should probably be a duplicate... maybe of an "item frequency" question like http://stackoverflow.com/questions/893417/item-frequency-count-in-python, but... (BTW, there is a META effect in progress on this question: http://meta.stackoverflow.com/questions/299361/question-with-too-much-proven-working-code-what-to-do?cb=1) – Alexei Levenkov Jul 16 '15 at 21:38
  • As Kevin pointed out on meta, you may want to check out http://codereview.stackexchange.com if you really want people to read over *all* of your code. – Zsw Jul 16 '15 at 21:46
  • OK, sorry everyone, I'm new to this forum too. Thanks @Zsw for your suggestion. I wasn't really looking for someone to debug my code; I was just trying to be specific so that others could understand what was going on. From now on I'll get straight to the point. Anyway, thanks for all your suggestions, I got what I wanted :) – Niccolò Jul 17 '15 at 16:13
  • @Niccolò For the record, Stack Overflow is *not* a forum, it is a Q&A site. – AStopher Jul 21 '15 at 15:23
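The `update` comment above is worth contrasting with `collections.Counter`, which the first answer below relies on: `dict.update` overwrites the value of a shared key, while `Counter.update` adds the values together. A small sketch with two made-up dictionaries (`dict1`, `dict2` are illustrative only):

```python
from collections import Counter

dict1 = {'a': 33, 'b': 25}
dict2 = {'a': 30, 'c': 20}

# dict.update overwrites: for the shared key 'a', the value from dict2 wins.
merged = dict(dict1)  # copy so dict1 itself is left unchanged
merged.update(dict2)
print(merged)  # {'a': 30, 'b': 25, 'c': 20}

# Counter.update adds: values for shared keys are summed.
summed = Counter(dict1)
summed.update(dict2)
print(dict(summed))  # {'a': 63, 'b': 25, 'c': 20}
```

Summing per-key values across several dictionaries is therefore a job for `Counter.update` (or a manual `get(key, 0) + value` loop), not for `dict.update`.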

3 Answers


Just a lazy way, using `Counter`:

from collections import Counter

d = Counter()

mylist = [[(33, 'a'), (25, 'b'), (16, 'e'), (16, 'd'), (16, 'c')],
          [(30, 'a'), (20, 'e'), (20, 'd'), (20, 'c'), (20, 'b')],
          [(25, 'e'), (25, 'd'), (25, 'b'), (25, 'a'), (12, 'c')],
          [(33, 'e'), (33, 'd'), (33, 'b'), (33, 'a')],
          []]

# Flip each (percentage, item) pair into item -> percentage;
# Counter.update adds the values of keys it already holds.
for i in mylist:
    d.update(dict((m, n) for n, m in i))

>>> [(j, i) for i, j in d.items()]
[(121, 'a'), (48, 'c'), (103, 'b'), (94, 'e'), (94, 'd')]

To sort by item:

>>> sorted([(j, i) for i, j in d.items()], key=lambda x: x[1])
[(121, 'a'), (103, 'b'), (48, 'c'), (94, 'd'), (94, 'e')]

To get percentages (assuming that is what you want), compute the total once instead of calling `sum(d.values())` on every iteration:

>>> total = sum(d.values())
>>> [(j * 100 // total, i) for i, j in d.items()]
[(26, 'a'), (10, 'c'), (22, 'b'), (20, 'e'), (20, 'd')]
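If the goal is the exact ordering asked for in the question (largest total first), `Counter.most_common()` already sorts by count in descending order, and swapping each pair gives `(total, item)` tuples. A sketch reusing the same `mylist`; ties keep the order in which the items were first encountered (Python 3.7+):

```python
from collections import Counter

mylist = [[(33, 'a'), (25, 'b'), (16, 'e'), (16, 'd'), (16, 'c')],
          [(30, 'a'), (20, 'e'), (20, 'd'), (20, 'c'), (20, 'b')],
          [(25, 'e'), (25, 'd'), (25, 'b'), (25, 'a'), (12, 'c')],
          [(33, 'e'), (33, 'd'), (33, 'b'), (33, 'a')],
          []]

d = Counter()
for pairs in mylist:
    # Flip (percentage, item) into item -> percentage and add it to the totals.
    d.update({item: pct for pct, item in pairs})

# most_common() returns (item, total) pairs, largest total first.
result = [(total, item) for item, total in d.most_common()]
print(result)
```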
itzMEonTV
  • Thank you for your suggestion, but it gives me: Traceback (most recent call last): File "", line 1, in function(list1) File "", line 31, in function d.update(dict([(m, n) for n, m in i])) File "", line 31, in d.update(dict([(m, n) for n, m in i])) TypeError: 'int' object is not iterable – Niccolò Jul 15 '15 at 10:33
  • Thank you, now I get it to work. – Niccolò Jul 18 '15 at 08:42
  • Good Answer. Do add the documentation links for `collections.Counter`. – Bhargav Rao Jul 22 '15 at 19:05

The following will sum your keys and calculate the percentages:

from __future__ import print_function  # lets the print() calls below work on Python 2 too
import collections
import itertools

d = collections.Counter()

mylist = [[(33, 'a'), (25, 'b'), (16, 'e'), (16, 'd'), (16, 'c')],
          [(30, 'a'), (20, 'e'), (20, 'd'), (20, 'c'), (20, 'b')],
          [(25, 'e'), (25, 'd'), (25, 'b'), (25, 'a'), (12, 'c')],
          [(33, 'e'), (33, 'd'), (33, 'b'), (33, 'a')],
          []]

# Flatten the sub-lists and count each item `count` times.
for count, item in itertools.chain.from_iterable(mylist):
    d.update(itertools.repeat(item, count))

print("Usage order:", d.most_common())
lsorted = sorted(d.items())
print("Key order:", lsorted)

total = sum(d.values())
print("Percentages:", [(key, (value * 100.0) / total) for key, value in lsorted])

Giving:

Usage order: [('a', 121), ('b', 103), ('e', 94), ('d', 94), ('c', 48)]
Key order: [('a', 121), ('b', 103), ('c', 48), ('d', 94), ('e', 94)]
Percentages: [('a', 26.304347826086957), ('b', 22.391304347826086), ('c', 10.434782608695652), ('d', 20.434782608695652), ('e', 20.434782608695652)]
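`d.update(itertools.repeat(item, count))` feeds `count` copies of `item` into the counter one at a time. An equivalent sketch (a variation, not the code above) adds each count directly with `d[item] += count`, which gives the same totals without building the repeated sequence:

```python
import collections
import itertools

mylist = [[(33, 'a'), (25, 'b'), (16, 'e'), (16, 'd'), (16, 'c')],
          [(30, 'a'), (20, 'e'), (20, 'd'), (20, 'c'), (20, 'b')],
          [(25, 'e'), (25, 'd'), (25, 'b'), (25, 'a'), (12, 'c')],
          [(33, 'e'), (33, 'd'), (33, 'b'), (33, 'a')],
          []]

d = collections.Counter()
for count, item in itertools.chain.from_iterable(mylist):
    # A Counter returns 0 for a missing key, so += also works the first time.
    d[item] += count

print(sorted(d.items()))  # [('a', 121), ('b', 103), ('c', 48), ('d', 94), ('e', 94)]
```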
Martin Evans

If you need the unique followers of each item in the list, you could take only the first occurrence of each item and then count the items after it; in that case item "e" has no new items after it (0%). But if the question is how often an item occurs after a given element, I would proceed as follows:

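from __future__ import print_function
from collections import Counter

list1 = ["a", "b", "a", "c", "a", "b", "c", "d", "e", "a", "b", "d", "e", "a", "c"]
# Unique items, in order of first appearance
newlist = sorted(set(list1), key=list1.index)
# How often each item occurs in the whole list
total_counts = Counter(list1)

for item in newlist:
    print()
    print(item, 'Followers:')
    # Everything after the first occurrence of item
    a = list1[list1.index(item) + 1:]
    tail_counts = Counter(a)
    for follower in sorted(tail_counts):
        if follower != item:
            # Percentage of all occurrences of follower that come after item
            fol = (follower, tail_counts[follower] * 100.0 / total_counts[follower])
            print(fol)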
yosemite_k
  • I can't understand your answer. With your code I cannot choose the target item, and it looks like a frequency counter to me, as I get: a has 100.0 % followers b has 75.0 % followers c has 50.0 % followers d has 25.0 % followers e has 0.0 % followers, and "e has 0.0% followers" also seems incorrect to me. – Niccolò Jul 15 '15 at 11:35
  • "e has 0.0% followers" means that no new items appear after e. I have modified my answer to include the percentage of each item that appears after a given element. Hope it helps. – yosemite_k Jul 15 '15 at 12:16