I have a very long array (over 2 million values) with repeating values. It looks something like this:

array  = [1,1,1,1,......,2,2,2.....3,3,3.....]

It contains a bunch of different values. I want to create an individual array for each group of points, i.e. an array for the ones, an array for the twos, and so forth. So something that would look like:

array1 = [1,1,1,1...]
array2 = [2,2,2,2.....]
array3 = [3,3,3,3....]
.
.
.
.


The values do not all occur the same number of times, however, and I don't know how many times each value occurs. Any advice?

wilada

5 Answers


Assuming that repeated values are grouped together (otherwise you simply need to sort the list), you can create a nested list (rather than a new list for every different value) using itertools.groupby:

from itertools import groupby
array  = [1,1,1,1,2,2,2,3,3]

[list(v) for k,v in groupby(array)]
# [[1, 1, 1, 1], [2, 2, 2], [3, 3]]

Note that this is more convenient than dynamically creating n new lists, as shown for instance in this post: you have no idea how many lists will be created, and you would have to refer to each list by its name rather than by simply indexing a nested list.
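
If the repeated values are not already grouped together, the sort-first variant mentioned above might look like the following minimal sketch. The names `data` and `groups` are illustrative only, and keying the result by value (instead of building a nested list) is an assumption about what is convenient here.

```python
from itertools import groupby

data = [3, 1, 2, 1, 3, 3, 2, 1, 1]  # illustrative, unsorted input

# groupby only merges adjacent equal items, so sort first
groups = {key: list(run) for key, run in groupby(sorted(data))}

print(groups)       # {1: [1, 1, 1, 1], 2: [2, 2], 3: [3, 3, 3]}
print(len(groups))  # 3
```

Taking `len` of the result gives the number of distinct groups, and `groups[2]` retrieves one group without needing a separately named variable for it.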

yatu
  • awesome, and would there be a way to see how many groups there are then? would len() tell you the length? – wilada Mar 19 '19 at 14:22
  • Yes just take the `len` of the list @wilada. Don't forget you can upvote/accept if it helped :) – yatu Mar 19 '19 at 14:26

You can use bisect.bisect_left to find the index of the first occurrence of each element. This works only if the list is sorted:

from bisect import bisect_left

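def count_values(l, values=None):
    """Count occurrences of each value in the sorted list l."""
    if values is None:
        values = range(1, l[-1]+1)  # Default: assume the values are 1..n
    counts = {}
    consumed = 0  # Number of elements already counted
    val_iter = iter(values)
    curr_value = next(val_iter)
    next_value = next(val_iter, None)
    while next_value is not None:
        # Everything before the first next_value belongs to curr_value
        ind = bisect_left(l, next_value, consumed)
        counts[curr_value] = ind - consumed
        consumed = ind
        curr_value, next_value = next_value, next(val_iter, None)
    # The last value takes whatever remains (also covers a single value)
    counts[curr_value] = len(l) - consumed
    return counts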

l = [1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3]

print(count_values(l))
# {1: 9, 2: 8, 3: 7}

This avoids scanning the entire list, trading that for one binary search per value: roughly O(k log n) for k distinct values in a list of length n, versus O(n) for a full scan. Expect this to be faster where there are very many of each element, and slower where there are only a few of each element.
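
The same binary-search idea can also produce the groups themselves instead of their counts. The following is only a sketch under the same assumption that the list is sorted. The helper name `split_sorted_runs` is hypothetical, and it uses `bisect_right` (rather than `bisect_left`) so that the distinct values do not need to be known in advance.

```python
from bisect import bisect_right

def split_sorted_runs(sorted_values):
    """Split a sorted list into one sub-list per distinct value."""
    runs = []
    start = 0
    while start < len(sorted_values):
        # First index past the last occurrence of the current value
        end = bisect_right(sorted_values, sorted_values[start], start)
        runs.append(sorted_values[start:end])
        start = end
    return runs

print(split_sorted_runs([1, 1, 1, 2, 2, 5]))
# [[1, 1, 1], [2, 2], [5]]
```

Each slice copies its run, so the total work is still linear in the list length. The binary search only saves the element-by-element comparisons.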

Patrick Haugh

Well, it seems to be wasteful and redundant to create all those arrays, each of which just stores repeating values.

You might want to just create a dictionary of unique values and their respective counts.

From this dictionary, you can always selectively create any of the individual arrays easily, whenever you want, and whichever particular one you want.

To create such a dictionary, you can use:

from collections import Counter

my_counts_dict = Counter(my_array)

Once you have this dict, you can get the number of 23's, for example, with my_counts_dict[23].

And if this returns 200, you can create your list of 200 23's with:

my_list23 = [23]*200
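
Putting the pieces together, a minimal end-to-end sketch could look like this. The sample data and the helper `make_group` are hypothetical and only there for illustration.

```python
from collections import Counter

my_array = [1, 1, 1, 2, 2, 23, 23, 23, 23]  # illustrative data
my_counts_dict = Counter(my_array)

def make_group(counts, value):
    """Rebuild the list of repeats for one value from its count."""
    return [value] * counts[value]

print(my_counts_dict[23])              # 4
print(make_group(my_counts_dict, 23))  # [23, 23, 23, 23]
print(make_group(my_counts_dict, 7))   # [] (a Counter reports 0 for missing values)
```

Note that Counter does not require the input to be sorted or grouped, since it simply tallies every element it sees.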
fountainhead

Use this code (PHP):

<?php 
$arrayName =  array(2,2,5,1,1,1,2,3,3,3,4,5,4,5,4,6,6,6,7,8,9,7,8,9,7,8,9);
$arr = array();
foreach ($arrayName as $value) {
    $arr[$value][] = $value;
}
sort($arr);
print_r($arr);
?>
Piyush Dhanotiya

Solution with no helper functions:

array  = [1,1,2,2,2,3,4]

result = [[array[0]]]
for i in array[1:]:
    if i == result[-1][-1]:
        result[-1].append(i)
    else:
        result.append([i])

print(result)
# [[1, 1], [2, 2, 2], [3], [4]]
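
The snippet above raises an IndexError on an empty list, because it reads array[0] up front. One possible way around that, sketched here with a hypothetical function name `group_runs`, is to start from an empty result and check for it inside the loop:

```python
def group_runs(values):
    """Group adjacent equal items; returns [] for empty input."""
    result = []
    for item in values:
        if result and item == result[-1][-1]:
            # Same value as the current run: extend it
            result[-1].append(item)
        else:
            # First item, or a new value: start a new run
            result.append([item])
    return result

print(group_runs([1, 1, 2, 2, 2, 3, 4]))  # [[1, 1], [2, 2, 2], [3], [4]]
print(group_runs([]))                      # []
```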
Mykola Zotko