2562

I have a list of arbitrary length, and I need to split it up into equal-size chunks and operate on it. There are some obvious ways to do this, like keeping a counter and two lists: when the second list fills up, add it to the first list and start a fresh second list for the next round of data. But this approach is potentially extremely expensive.
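
A minimal sketch of the counter-and-two-lists approach described above (my illustration, not code from the question):

def chunk_naive(data, size):
    result, current = [], []
    for item in data:
        current.append(item)
        if len(current) == size:
            result.append(current)
            current = []
    if current:  # keep the final, partial chunk
        result.append(current)
    return result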

I was wondering if anyone had a good solution to this for lists of any length, e.g. using generators.

I was looking for something useful in itertools but I couldn't find anything obviously useful. Might've missed it, though.

Related question: What is the most “pythonic” way to iterate over a list in chunks?

jespern
  • Before you post a new answer, consider there are already 60+ answers for this question. Please make sure that your answer contributes information that is not among existing answers. – janniks Feb 03 '20 at 12:17
  • simple one at https://stackoverflow.com/a/66967457/687896 – Brandt Apr 06 '21 at 11:13

66 Answers

3668

Here's a generator that yields the chunks you want:

def chunks(lst, n):
    """Yield successive n-sized chunks from lst."""
    for i in range(0, len(lst), n):
        yield lst[i:i + n]

import pprint
pprint.pprint(list(chunks(range(10, 75), 10)))
[[10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
 [20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
 [30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
 [40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
 [50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
 [60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
 [70, 71, 72, 73, 74]]

If you're using Python 2, you should use xrange() instead of range():

def chunks(lst, n):
    """Yield successive n-sized chunks from lst."""
    for i in xrange(0, len(lst), n):
        yield lst[i:i + n]

You can also use a list comprehension instead of writing a function, though it's a good idea to encapsulate operations like this in named functions so that your code is easier to understand. Python 3:

[lst[i:i + n] for i in range(0, len(lst), n)]

Python 2 version:

[lst[i:i + n] for i in xrange(0, len(lst), n)]
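
As a comment below suggests, the same one-liner also works as a generator expression, which avoids materializing all chunks at once (my addition; simply swap the brackets):

(lst[i:i + n] for i in range(0, len(lst), n))
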
Ned Batchelder
  • What happens if we can't tell the length of the list? Try this on itertools.repeat([1, 2, 3]), e.g. – jespern Nov 23 '08 at 12:51
  • That's an interesting extension to the question, but the original question clearly asked about operating on a list. – Ned Batchelder Nov 23 '08 at 13:53
  • @jespern I guess with an infinite or indefinite-length list you go to the [related question](http://stackoverflow.com/questions/434287/) that [J.F. Sebastian](http://stackoverflow.com/users/4279/j-f-sebastian) linked: [What is the most “pythonic” way to iterate over a list in chunks?](http://stackoverflow.com/questions/434287/) – n611x007 Apr 24 '14 at 09:19
  • this function needs to be in the damn standard library – dgan Feb 04 '18 at 14:19
  • I'd add a generator expression example, in addition to the list comprehension ones. (Simply use `()` rather than `[]`) – SomethingSomething Apr 03 '18 at 11:33
  • -1. The question specifically asks for "evenly sized chunks". This isn't a valid answer to the question as the last chunk will be arbitrarily small. – Calimo Jun 14 '18 at 11:51
  • @Calimo: what do you suggest? I hand you a list with 47 elements. How would you like to split it into "evenly sized chunks"? The OP accepted the answer, so they are clearly OK with the last differently sized chunk. Perhaps the English phrase is imprecise? – Ned Batchelder Jun 14 '18 at 15:29
  • @NedBatchelder I agree the question is pretty ill-defined, but you can split a list of 47 elements into 5 chunks of 9, 9, 9, 10 and 10 elements, instead of 7, 10, 10, 10 and 10. It is not exactly even, but that's what I had in mind when I googled the "even sized chunks" keywords. This means you need n to define the number of chunks, not their size. Another answer below suggests a way to do it actually. Your answer is basically the same as the ones in the linked "related question". – Calimo Jun 14 '18 at 15:46
  • Most people will be looking at this for batch processing and rate limiting, so it usually doesn't matter if the last chunk is smaller – Alvaro Jul 04 '19 at 12:46
  • On my system, I don't get the same output, but rather a list of range objects. To get the same output as you I used this: `[list(s) for s in chunks(range(10, 75), 10)]` – marsipan Oct 26 '19 at 15:41
  • @Alvaro you will find that if your last batch only has one element, you have a wasted core with nothing to do, whilst others may have quite a few left which could easily have been done elsewhere. – user2589273 May 12 '20 at 19:52
  • I prefer to pad the last one so that all chunks are the same size. So `def ichunks(it, n, pad=None):` etc. And yes, +1 for iterator on a (possibly infinite) stream. Much more general than assuming a finite-sized collection. – Pierre D May 20 '20 at 02:11
  • I'm surprised there isn't a function for this in the standard library already. – Mr. Lance E Sloan Aug 17 '20 at 03:17
  • Why is this the accepted answer? It doesn't spread them in the most even way. – Bob Bobster Feb 28 '21 at 21:32
597

If you want something super simple:

def chunks(l, n):
    n = max(1, n)
    return (l[i:i+n] for i in range(0, len(l), n))

Use xrange() instead of range() in the case of Python 2.x

oremj
  • Or (if we're doing different representations of this particular function) you could define a lambda function via: lambda x,y: [x[i:i+y] for i in range(0,len(x),y)]. I love this list-comprehension method! – J-P Aug 20 '11 at 13:54
  • after return there must be [, not ( – alwbtc Jun 01 '17 at 06:45
  • "Super simple" means not having to debug infinite loops -- kudos for the `max()`. – Bob Stein May 15 '18 at 17:49
  • there is nothing simple about this solution – mit Oct 19 '18 at 12:36
  • Note that the outcome with input list `['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J']` would be `[['A', 'B', 'C'], ['D', 'E', 'F'], ['G', 'H', 'I'], ['J']]` and not `[['A', 'B', 'C'], ['D', 'E', 'F'], ['G', 'H'], ['I', 'J']]` – np8 Aug 14 '19 at 08:59
  • @BobStein I am a newbie, how does max() have an effect? – CountDOOKU Apr 27 '20 at 02:55
  • @Nhoj_Gonk Oops, it's not an infinite loop, but chunks(L, 0) would raise a ValueError without the max(). Instead, the max() turns anything less than 1 into a 1. – Bob Stein Apr 27 '20 at 09:58
316

Directly from the (old) Python documentation (recipes for itertools):

from itertools import izip, chain, repeat

def grouper(n, iterable, padvalue=None):
    "grouper(3, 'abcdefg', 'x') --> ('a','b','c'), ('d','e','f'), ('g','x','x')"
    return izip(*[chain(iterable, repeat(padvalue, n-1))]*n)

The current version, as suggested by J.F.Sebastian:

#from itertools import izip_longest as zip_longest # for Python 2.x
from itertools import zip_longest # for Python 3.x
#from six.moves import zip_longest # for both (uses the six compat library)

def grouper(n, iterable, padvalue=None):
    "grouper(3, 'abcdefg', 'x') --> ('a','b','c'), ('d','e','f'), ('g','x','x')"
    return zip_longest(*[iter(iterable)]*n, fillvalue=padvalue)

I guess Guido's time machine works—worked—will work—will have worked—was working again.

These solutions work because [iter(iterable)]*n (or the equivalent in the earlier version) creates one iterator, repeated n times in the list. izip_longest then effectively performs a round-robin of "each" iterator; because this is the same iterator, it is advanced by each such call, resulting in each such zip-roundrobin generating one tuple of n items.
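
To see why, here is a quick illustration (my addition): the list contains the same iterator object n times, so each output tuple advances that one iterator n steps.

it = iter(range(6))
copies = [it] * 3
print(copies[0] is copies[2])  # True: three references to the SAME iterator
print(list(zip(*copies)))      # [(0, 1, 2), (3, 4, 5)]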

tzot
  • @ninjagecko: `list(grouper(3, range(10)))` returns `[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, None, None)]`, and all tuples are of length 3. Please elaborate on your comment because I can't understand it; what do you call a _thing_ and how do you define it being _a multiple of 3_ in “expecting your thing to be a multiple of 3”? Thank you in advance. – tzot Apr 19 '11 at 13:09
  • If it is incorrect behavior for the user's code to have a tuple with None, they need to explicitly raise an error if `len('0123456789')%3 != 0`. This is not a bad thing, but a thing which could be documented. Oh wait, my apologies... it is documented implicitly by the padvalue=None argument. (Also by '3' I meant 'n') Nice code. – ninjagecko Apr 19 '11 at 15:51
  • upvoted this because it works on generators (no len) and uses the generally faster itertools module. – Michael Dillon Jan 30 '12 at 23:47
  • A classic example of a fancy `itertools` functional approach turning out some unreadable sludge, when compared to a simple and naive pure Python implementation – wim Apr 12 '13 at 05:40
  • @wim Given that this answer began as a snippet from the Python documentation, I'd suggest you open an issue on http://bugs.python.org/ . – tzot Apr 12 '13 at 11:36
  • For reference, the solution is part of the docs, under itertools recipes: https://docs.python.org/3/library/itertools.html#itertools-recipes – Juan Carlos Ramirez Mar 14 '19 at 16:11
  • Can someone explain or point me to the right concept of why there is a `*` before `[iter(iterable)]*n`? – pedrosaurio Aug 20 '19 at 06:37
  • @pedrosaurio if `l==[1, 2, 3]` then `f(*l)` is equivalent to `f(1, 2, 3)`. See [that question](https://stackoverflow.com/questions/36901/) and [the official documentation](https://docs.python.org/3/reference/expressions.html#calls). – tzot Aug 21 '19 at 08:02
  • My problem with this solution is that the result is not only the contents of the input iterator split up, but additional elements are added to the last chunk to make it the "right" size. – Graham Lea Jul 23 '20 at 02:05
  • @GrahamLea I am not sure what you mean by “split up”; do you mean “consumed”? As for the padding of the last group, I'll come back with a non-padding version. EDIT: it's already provided by senderle. – tzot Jul 23 '20 at 16:04
303

I know this is kind of old, but no one has yet mentioned numpy.array_split:

import numpy as np

lst = range(50)
np.array_split(lst, 5)
# [array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]),
#  array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19]),
#  array([20, 21, 22, 23, 24, 25, 26, 27, 28, 29]),
#  array([30, 31, 32, 33, 34, 35, 36, 37, 38, 39]),
#  array([40, 41, 42, 43, 44, 45, 46, 47, 48, 49])]
Moj
  • This allows you to set the total number of chunks, not the number of elements per chunk. – FizxMike Sep 09 '15 at 03:03
  • you can do the math yourself. if you have 10 elements you can group them into two 5-element chunks or five 2-element chunks – Moj Sep 09 '15 at 07:27
  • +1 This is my favorite solution, as it splits the array into *evenly* sized arrays, while other solutions don't (in all other solutions I looked at, the last array may be arbitrarily small). – MiniQuark Jun 28 '16 at 17:26
  • @MiniQuark but what does this do when the number of blocks isn't a factor of the original array size? – Baldrickk May 18 '18 at 11:12
  • @Baldrickk If you split N elements into K chunks, then the first N%K chunks will have N//K+1 elements, and the rest will have N//K elements. For example, if you split an array containing 108 elements into 5 chunks, then the first 108%5=3 chunks will contain 108//5+1=22 elements, and the rest of the chunks will have 108//5=21 elements. – MiniQuark May 18 '18 at 15:31
  • Note that if there are more chunks than array elements, this solution will pad the result with empty arrays. – schrödingercöder Jan 30 '20 at 18:52
  • How do you split it into arrays of equal size? As an example, let's say I want to split the np.arange(14) array into chunks of size 5. The final output I am looking for is three arrays: two of size 5 and one of size 4. – sushmit Oct 19 '20 at 21:40
207

I'm surprised nobody has thought of using iter's two-argument form:

from itertools import islice

def chunk(it, size):
    it = iter(it)
    return iter(lambda: tuple(islice(it, size)), ())

Demo:

>>> list(chunk(range(14), 3))
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11), (12, 13)]
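
For readers who haven't met it before: the two-argument form iter(callable, sentinel) keeps calling the callable until it returns the sentinel, here the empty tuple. A tiny illustration of just that mechanism (my sketch):

from itertools import islice

nums = iter([1, 2, 3])

def read():
    return tuple(islice(nums, 2))

for block in iter(read, ()):  # call read() until it returns ()
    print(block)              # prints (1, 2), then (3,)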

This works with any iterable and produces output lazily. It returns tuples rather than iterators, but I think it has a certain elegance nonetheless. It also doesn't pad; if you want padding, a simple variation on the above will suffice:

from itertools import islice, chain, repeat

def chunk_pad(it, size, padval=None):
    it = chain(iter(it), repeat(padval))
    return iter(lambda: tuple(islice(it, size)), (padval,) * size)

Demo:

>>> list(chunk_pad(range(14), 3))
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11), (12, 13, None)]
>>> list(chunk_pad(range(14), 3, 'a'))
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11), (12, 13, 'a')]

Like the izip_longest-based solutions, the above always pads. As far as I know, there's no one- or two-line itertools recipe for a function that optionally pads. By combining the above two approaches, this one comes pretty close:

_no_padding = object()

def chunk(it, size, padval=_no_padding):
    if padval == _no_padding:
        it = iter(it)
        sentinel = ()
    else:
        it = chain(iter(it), repeat(padval))
        sentinel = (padval,) * size
    return iter(lambda: tuple(islice(it, size)), sentinel)

Demo:

>>> list(chunk(range(14), 3))
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11), (12, 13)]
>>> list(chunk(range(14), 3, None))
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11), (12, 13, None)]
>>> list(chunk(range(14), 3, 'a'))
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11), (12, 13, 'a')]

I believe this is the shortest chunker proposed that offers optional padding.

As Tomasz Gandor observed, the two padding chunkers will stop unexpectedly if they encounter a long sequence of pad values. Here's a final variation that works around that problem in a reasonable way:

_no_padding = object()
def chunk(it, size, padval=_no_padding):
    it = iter(it)
    chunker = iter(lambda: tuple(islice(it, size)), ())
    if padval == _no_padding:
        yield from chunker
    else:
        for ch in chunker:
            yield ch if len(ch) == size else ch + (padval,) * (size - len(ch))

Demo:

>>> list(chunk([1, 2, (), (), 5], 2))
[(1, 2), ((), ()), (5,)]
>>> list(chunk([1, 2, None, None, 5], 2, None))
[(1, 2), (None, None), (5, None)]
senderle
  • Wonderful, your simple version is my favorite. Others too came up with the basic `islice(it, size)` expression and embedded it (like I had done) in a loop construct. Only you thought of the two-argument version of `iter()` (I was completely unaware of), which makes it super-elegant (and probably most performance-effective). I had no idea that the first argument to `iter` changes to a 0-argument function when given the sentinel. You return a (pot. infinite) iterator of chunks, can use a (pot. infinite) iterator as input, have no `len()` and no array slices. Awesome! – ThomasH Sep 15 '16 at 19:58
  • This is why I read down through the answers rather than scanning just the top couple. Optional padding was a requirement in my case, and I too learned about the two-argument form of iter. – Kerr Aug 16 '17 at 14:30
  • I upvoted this, but still - let's not overhype it! First of all, lambda can be bad (slow closure over the `it` iterator). Secondly, and most importantly - you will end prematurely if a chunk of `padval` actually exists in your iterable, and should be processed. – Tomasz Gandor Nov 16 '18 at 11:34
  • @TomaszGandor, I take your first point! Although my understanding is that lambda isn't any slower than an ordinary function, of course you're right that the function call and closure look-up will slow this down. I don't know what the relative performance hit of this would be vs. the `izip_longest` approach, for example -- I suspect it might be a complex trade-off. But... isn't the `padval` issue shared by every answer here that offers a `padval` parameter? – senderle Nov 16 '18 at 12:38
  • I don't know if 'every' (just read top answers, and noticed this one). Let me illustrate this: `chunk_pad([1, 2, None, None, 5], 2)` SHOULD generate `(1, 2), (None, None), (5, None)`, instead it just generates `(1, 2)`. Same for `chunk([1, 2, (), (), 5], 2)`, but the second generated item should be `((), ())`. The problem can't be made to go away by adding more `if`s, it's intrinsic to using `iter` with a sentinel. – Tomasz Gandor Nov 16 '18 at 18:48
  • @TomaszGandor I see, I hadn't understood what you were saying. You're right, that problem is somewhat unique to this answer. I believe there is a way around it though — I'll give it a bit of thought. – senderle Nov 16 '18 at 22:09
  • You don't need a way around, if you know that all-padding chunks are indeed invalid. This is not always the case, but for many scenarios this will work OK. – Tomasz Gandor Nov 16 '18 at 22:14
  • @TomaszGandor, fair enough! But it wasn't too hard to create a version that fixes this. (Also, note that the very first version, which uses `()` as the sentinel, *does* work correctly. This is because `tuple(islice(it, size))` yields `()` when `it` is empty.) – senderle Nov 17 '18 at 01:19
103

Here is a generator that works on arbitrary iterables:

import itertools

def split_seq(iterable, size):
    it = iter(iterable)
    item = list(itertools.islice(it, size))
    while item:
        yield item
        item = list(itertools.islice(it, size))

Example:

>>> import pprint
>>> pprint.pprint(list(split_seq(xrange(75), 10)))
[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
 [10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
 [20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
 [30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
 [40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
 [50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
 [60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
 [70, 71, 72, 73, 74]]
Markus Jarderot
62

Simple yet elegant

L = range(1, 1000)
print [L[x:x+10] for x in xrange(0, len(L), 10)]

or if you prefer:

def chunks(L, n): return [L[x: x+n] for x in xrange(0, len(L), n)]
chunks(L, 10)
lebenf
  • Thou shalt not dub a variable in the likeness of an Arabic number. In some fonts, `1` and `l` are indistinguishable. As are `0` and `O`. And sometimes even `I` and `1`. – Alfe Aug 14 '13 at 23:02
  • @Alfe Defective fonts. People shouldn't use such fonts. Not for programming, not for **anything**. – Jerry B Oct 05 '13 at 08:14
  • Lambdas are meant to be used as unnamed functions. There is no point in using them like that. In addition it makes debugging more difficult as the traceback will report "in <lambda>" instead of "in chunks" in case of error. I wish you luck finding a problem if you have a whole bunch of these :) – Chris Koston Nov 26 '13 at 19:45
  • it should be 0 and not 1 inside xrange in `print [l[x:x+10] for x in xrange(1, len(l), 10)]` – scottydelta Dec 28 '13 at 19:11
  • **NOTE:** For Python 3 users use `range`. – Christian Dean Aug 31 '17 at 12:32
57
def chunk(input, size):
    return map(None, *([iter(input)] * size))
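
Note that map with None as the function is Python 2 only; under Python 3, a rough equivalent (my sketch, padding with None like the original) would be:

from itertools import zip_longest

def chunk(input, size):
    return list(zip_longest(*[iter(input)] * size))
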
Tomasz Wysocki
  • `map(None, iter)` equals `izip_longest(iter)`. – Thomas Ahle Jan 29 '12 at 15:18
  • @TomaszWysocki Can you explain the `*` in front of your iterator tuple? Possibly in your answer text, but I have not seen that `*` used that way in Python before. Thanks! – theJollySin Oct 07 '13 at 18:58
  • @theJollySin In this context, it is called the splat operator. Its use is explained here - http://stackoverflow.com/questions/5917522/unzipping-and-the-operator. – rlms Nov 15 '13 at 21:14
  • Close, but the last chunk has None elements to fill it out. This may or may not be a defect. Really cool pattern though. – Apr 25 '14 at 01:49
45

How do you split a list into evenly sized chunks?

"Evenly sized chunks", to me, implies that they are all the same length, or barring that option, at minimal variance in length. E.g. 5 baskets for 21 items could have the following results:

>>> import statistics
>>> statistics.variance([5,5,5,5,1]) 
3.2
>>> statistics.variance([5,4,4,4,4]) 
0.19999999999999998

A practical reason to prefer the latter result: if you were using these functions to distribute work, you've built-in the prospect of one likely finishing well before the others, so it would sit around doing nothing while the others continued working hard.

Critique of other answers here

When I originally wrote this answer, none of the other answers produced evenly sized chunks - they all leave a runt chunk at the end, so they're not well balanced, and they have a higher than necessary variance of lengths.

For example, the current top answer ends with:

[60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
[70, 71, 72, 73, 74]]

Others, like list(grouper(3, range(7))), and chunk(range(7), 3) both return: [(0, 1, 2), (3, 4, 5), (6, None, None)]. The None's are just padding, and rather inelegant in my opinion. They are NOT evenly chunking the iterables.

Why can't we divide these better?

Cycle Solution

A high-level balanced solution using itertools.cycle, which is the way I might do it today. Here's the setup:

from itertools import cycle
items = range(10, 75)
number_of_baskets = 10

Now we need our lists into which to populate the elements:

baskets = [[] for _ in range(number_of_baskets)]

Finally, we zip the elements we're going to allocate together with a cycle of the baskets until we run out of elements, which, semantically, is exactly what we want:

for element, basket in zip(items, cycle(baskets)):
    basket.append(element)

Here's the result:

>>> from pprint import pprint
>>> pprint(baskets)
[[10, 20, 30, 40, 50, 60, 70],
 [11, 21, 31, 41, 51, 61, 71],
 [12, 22, 32, 42, 52, 62, 72],
 [13, 23, 33, 43, 53, 63, 73],
 [14, 24, 34, 44, 54, 64, 74],
 [15, 25, 35, 45, 55, 65],
 [16, 26, 36, 46, 56, 66],
 [17, 27, 37, 47, 57, 67],
 [18, 28, 38, 48, 58, 68],
 [19, 29, 39, 49, 59, 69]]

To productionize this solution, we write a function, and provide the type annotations:

from itertools import cycle
from typing import List, Any

def cycle_baskets(items: List[Any], maxbaskets: int) -> List[List[Any]]:
    baskets = [[] for _ in range(min(maxbaskets, len(items)))]
    for item, basket in zip(items, cycle(baskets)):
        basket.append(item)
    return baskets

In the above, we take our list of items, and the max number of baskets. We create a list of empty lists, in which to append each element, in a round-robin style.

Slices

Another elegant solution is to use slices - specifically the less-commonly used step argument to slices. i.e.:

start = 0
stop = None
step = number_of_baskets

first_basket = items[start:stop:step]

This is especially elegant in that slices don't care how long the data are - the result, our first basket, is only as long as it needs to be. We'll only need to increment the starting point for each basket.

In fact this could be a one-liner, but we'll go multiline for readability and to avoid an overlong line of code:

from typing import List, Any

def slice_baskets(items: List[Any], maxbaskets: int) -> List[List[Any]]:
    n_baskets = min(maxbaskets, len(items))
    return [items[i::n_baskets] for i in range(n_baskets)]

And islice from the itertools module will provide a lazily iterating approach, like that which was originally asked for in the question.

I don't expect most use-cases to benefit very much, as the original data is already fully materialized in a list, but for large datasets, it could save nearly half the memory usage.

from itertools import islice
from typing import List, Any, Generator
    
def yield_islice_baskets(items: List[Any], maxbaskets: int) -> Generator[List[Any], None, None]:
    n_baskets = min(maxbaskets, len(items))
    for i in range(n_baskets):
        yield islice(items, i, None, n_baskets)

View results with:

from pprint import pprint

items = list(range(10, 75))
pprint(cycle_baskets(items, 10))
pprint(slice_baskets(items, 10))
pprint([list(s) for s in yield_islice_baskets(items, 10)])

Updated prior solutions

Here's another balanced solution, adapted from a function I've used in production in the past, that uses the modulo operator:

def baskets_from(items, maxbaskets=25):
    baskets = [[] for _ in range(maxbaskets)]
    for i, item in enumerate(items):
        baskets[i % maxbaskets].append(item)
    return list(filter(None, baskets))  # list() so the result prints as a list on Python 3 too

And I created a generator that does the same if you put it into a list:

def iter_baskets_from(items, maxbaskets=3):
    '''generates evenly balanced baskets from indexable iterable'''
    item_count = len(items)
    baskets = min(item_count, maxbaskets)
    for x_i in range(baskets):
        yield [items[y_i] for y_i in range(x_i, item_count, baskets)]
    

And finally, since all of the above functions return elements in a non-contiguous (round-robin) order, here is one that keeps the items in the order they were given:

def iter_baskets_contiguous(items, maxbaskets=3, item_count=None):
    '''
    generates balanced baskets from iterable, contiguous contents
    provide item_count if providing an iterator that doesn't support len()
    '''
    item_count = item_count or len(items)
    baskets = min(item_count, maxbaskets)
    items = iter(items)
    floor = item_count // baskets 
    ceiling = floor + 1
    stepdown = item_count % baskets
    for x_i in range(baskets):
        length = ceiling if x_i < stepdown else floor
        yield [next(items) for _ in range(length)]

Output

To test them out:

print(baskets_from(range(6), 8))
print(list(iter_baskets_from(range(6), 8)))
print(list(iter_baskets_contiguous(range(6), 8)))
print(baskets_from(range(22), 8))
print(list(iter_baskets_from(range(22), 8)))
print(list(iter_baskets_contiguous(range(22), 8)))
print(baskets_from('ABCDEFG', 3))
print(list(iter_baskets_from('ABCDEFG', 3)))
print(list(iter_baskets_contiguous('ABCDEFG', 3)))
print(baskets_from(range(26), 5))
print(list(iter_baskets_from(range(26), 5)))
print(list(iter_baskets_contiguous(range(26), 5)))

Which prints out:

[[0], [1], [2], [3], [4], [5]]
[[0], [1], [2], [3], [4], [5]]
[[0], [1], [2], [3], [4], [5]]
[[0, 8, 16], [1, 9, 17], [2, 10, 18], [3, 11, 19], [4, 12, 20], [5, 13, 21], [6, 14], [7, 15]]
[[0, 8, 16], [1, 9, 17], [2, 10, 18], [3, 11, 19], [4, 12, 20], [5, 13, 21], [6, 14], [7, 15]]
[[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10, 11], [12, 13, 14], [15, 16, 17], [18, 19], [20, 21]]
[['A', 'D', 'G'], ['B', 'E'], ['C', 'F']]
[['A', 'D', 'G'], ['B', 'E'], ['C', 'F']]
[['A', 'B', 'C'], ['D', 'E'], ['F', 'G']]
[[0, 5, 10, 15, 20, 25], [1, 6, 11, 16, 21], [2, 7, 12, 17, 22], [3, 8, 13, 18, 23], [4, 9, 14, 19, 24]]
[[0, 5, 10, 15, 20, 25], [1, 6, 11, 16, 21], [2, 7, 12, 17, 22], [3, 8, 13, 18, 23], [4, 9, 14, 19, 24]]
[[0, 1, 2, 3, 4, 5], [6, 7, 8, 9, 10], [11, 12, 13, 14, 15], [16, 17, 18, 19, 20], [21, 22, 23, 24, 25]]

Notice that the contiguous generator provides chunks in the same length patterns as the other two, but the items are all in order, and they are as evenly divided as one may divide a list of discrete elements.

Aaron Hall
  • You say that none of the above provides evenly-sized chunks. But [this one](http://stackoverflow.com/a/312644/577088) does, as does [this one](http://stackoverflow.com/a/3125186/577088). – senderle Feb 26 '14 at 15:00
  • @senderle, The first one, `list(grouper(3, xrange(7)))`, and the second one, `chunk(xrange(7), 3)` both return: `[(0, 1, 2), (3, 4, 5), (6, None, None)]`. The `None`'s are just padding, and rather inelegant in my opinion. They are NOT evenly chunking the iterables. Thanks for your vote! – Aaron Hall Feb 26 '14 at 16:07
  • You raise the question (without doing it explicitly, so I do that now here) whether equally-sized chunks (except the last, if not possible) or whether a balanced (as good as possible) result is more often what will be needed. You assume that the balanced solution is to prefer; this might be true if what you program is close to the real world (e.g. a card-dealing algorithm for a simulated card game). In other cases (like filling lines with words) one will rather like to keep the lines as full as possible. So I can't really prefer one over the other; they are just for different use cases. – Alfe Aug 02 '14 at 23:14
  • @ChristopherBarrington-Leigh Good point, for DataFrames, you should probably use slices, since I believe DataFrame objects do not usually copy on slicing, e.g. `import pandas as pd; [pd.DataFrame(np.arange(7))[i::3] for i in xrange(3)]` – Aaron Hall Sep 03 '14 at 17:10
  • @AaronHall Oops. I deleted my comment because I second-guessed my critique, but you were quick on the draw. Thanks! In fact, my claim that it doesn't work for dataframes is true. If items is a dataframe, just use yield items[range(x_i, item_count, baskets)] as the last line. I offered a separate (yet another) answer, in which you specify the desired (minimum) group size. – CPBL Sep 03 '14 at 17:47
  • @ChristopherBarrington-Leigh Thanks, very nice of you. I wouldn't use the code from my answer to do this, though. If you're iterating over a DataFrame, you can use iterrows. I wouldn't use range to slice, it creates an object in memory. I'd prefer a slice object, created with the slicing syntax e.g. `i::3`, or equivalently, `slice(i, None, 3)`. – Aaron Hall Sep 03 '14 at 18:06
  • I like this, but I wish it would work on an arbitrary length lambda instead of just len() – Steve Yeago Sep 23 '16 at 22:04
  • Don't be tempted to use `[[]]*maxbaskets`; it is not the same thing as `[[] for _ in range(maxbaskets)]`. In the first case there is really only a single instance of a bucket referenced multiple times. – qbolec Nov 14 '16 at 10:18
  • In many cases the "runt" is actually the more desirable option. Since nothing in this question specified whether there should appear a smaller chunk at end or not, perhaps your answer would have been more at home on the related 2010 question [Splitting a list into N parts of approximately equal length](https://stackoverflow.com/q/2130016/674039) instead. – wim Aug 19 '20 at 17:09
  • @wim, thank you for the criticism, I appreciate your input - I'm going to think about this some more and potentially update my answer. "evenly sized chunks", to me, implies that they are all the same length, or barring that, at minimal variance. E.g. 5 buckets for 21 items could have >>> import statistics >>> statistics.variance([5,5,5,5,1]) 3.2 >>> statistics.variance([5,4,4,4,4]) 0.19999999999999998 – Aaron Hall Aug 19 '20 at 17:49
  • Variance is but one way - another reasonable interpretation may be "number of buckets with equal length maximized" when having a runt is the better strategy. And in many real world cases you can't know the number of elements in advance (e.g. sending packets over a network when you don't know the data size, but you have an upper bound on the packet size, for example streaming compression). I guess I am repeating what Alfe said already - neither approach is objectively better, it depends on the problem :) – wim Aug 19 '20 at 18:27
43

I saw the most awesome Python-ish answer in a duplicate of this question:

from itertools import zip_longest

a = range(1, 16)
i = iter(a)
r = list(zip_longest(i, i, i))
>>> print(r)
[(1, 2, 3), (4, 5, 6), (7, 8, 9), (10, 11, 12), (13, 14, 15)]

You can create n-tuples for any n. If a = range(1, 15), then the result will be:

[(1, 2, 3), (4, 5, 6), (7, 8, 9), (10, 11, 12), (13, 14, None)]

If the list is divided evenly, then you can replace zip_longest with zip, otherwise the triplet (13, 14, None) would be lost. Python 3 is used above. For Python 2, use izip_longest.
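
You don't have to write out i once per column, either; as a comment below also observes, the splat operator generalizes this to any chunk size n (my sketch):

from itertools import zip_longest

n = 3
i = iter(range(1, 16))
print(list(zip_longest(*[i] * n)))
# [(1, 2, 3), (4, 5, 6), (7, 8, 9), (10, 11, 12), (13, 14, 15)]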

Noich
  • that is nice if your list and chunks are short; how could you adapt this to split your list into chunks of 1000 though? you're not going to code zip(i,i,i,i,i,i,i,i,i,i.....i=1000) – Tom Smith May 18 '15 at 14:21
  • `zip(i, i, i, ... i)` with "chunk_size" arguments to zip() can be written as `zip(*[i]*chunk_size)`. Whether that's a good idea or not is debatable, of course. – Wilson F Jun 28 '15 at 04:52
  • The downside of this is that if you aren't dividing evenly, you'll drop elements, as zip stops at the shortest iterable - & izip_longest would add default elements. – Aaron Hall Jul 08 '16 at 03:37
  • `zip_longest` should be used, as done in: https://stackoverflow.com/a/434411/1959808 – Ioannis Filippidis Jun 21 '17 at 13:28
  • The answer with `range(1, 15)` is already missing elements, because there are 14 elements in `range(1, 15)`, not 15. – Ioannis Filippidis Jun 21 '17 at 13:34
42

If you know list size:

def SplitList(mylist, chunk_size):
    return [mylist[offs:offs+chunk_size] for offs in range(0, len(mylist), chunk_size)]

If you don't (an iterator):

def IterChunks(sequence, chunk_size):
    res = []
    for item in sequence:
        res.append(item)
        if len(res) >= chunk_size:
            yield res
            res = []
    if res:
        yield res  # yield the last, incomplete, portion

In the latter case, it can be rephrased in a more beautiful way if you can be sure that the sequence always contains a whole number of chunks of given size (i.e. there is no incomplete last chunk).
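
For instance, a sketch of that simpler form (my addition), assuming len(sequence) is an exact multiple of chunk_size:

def IterChunksExact(sequence, chunk_size):
    res = []
    for item in sequence:
        res.append(item)
        if len(res) >= chunk_size:
            yield res
            res = []
    # no trailing 'if res' check needed: there is never an incomplete chunk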

atzz
22

If you had a chunk size of 3 for example, you could do:

zip(*[iterable[i::3] for i in range(3)]) 

source: http://code.activestate.com/recipes/303060-group-a-list-into-sequential-n-tuples/

I would use this when my chunk size is a fixed number I can type, e.g. '3', and would never change.
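
One caveat worth showing explicitly (my addition): zip() stops at the shortest column, so trailing elements that don't fill a complete chunk are silently dropped:

iterable = list(range(7))
print(list(zip(*[iterable[i::3] for i in range(3)])))
# [(0, 1, 2), (3, 4, 5)] -- the 6 is lost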

ninjagecko
21

The toolz library has the partition function for this:

from toolz import partition  # older toolz releases: from toolz.itertoolz.core import partition

list(partition(2, [1, 2, 3, 4]))
[(1, 2), (3, 4)]
zach
  • This looks like the simplest of all the suggestions. I am just wondering if it really can be true that one has to use a third party library to get such a partition function. I would have expected something equivalent with that partition function to exist as a language builtin. – kasperd Mar 29 '15 at 14:45
  • you can do a partition with itertools. but I like the toolz library. it's a clojure-inspired library for working on collections in a functional style. you don't get immutability but you get a small vocabulary for working on simple collections. As a plus, cytoolz is written in cython and gets a nice performance boost. https://github.com/pytoolz/cytoolz http://matthewrocklin.com/blog/work/2014/05/01/Introducing-CyToolz/ – zach Mar 30 '15 at 15:28
  • The link from zach's comment works if you omit the trailing slash: http://matthewrocklin.com/blog/work/2014/05/01/Introducing-CyToolz – mit Oct 19 '18 at 12:46
21
[AA[i:i+SS] for i in range(len(AA))[::SS]]

Where AA is array, SS is chunk size. For example:

>>> AA=range(10,21);SS=3
>>> [AA[i:i+SS] for i in range(len(AA))[::SS]]
[[10, 11, 12], [13, 14, 15], [16, 17, 18], [19, 20]]
# or [range(10, 13), range(13, 16), range(16, 19), range(19, 21)] in py3
Riaz Rizvi
20

With Assignment Expressions in Python 3.8 it becomes quite nice:

import itertools

def batch(iterable, size):
    it = iter(iterable)
    while item := list(itertools.islice(it, size)):
        yield item

This works on an arbitrary iterable, not just a list.

>>> import pprint
>>> pprint.pprint(list(batch(range(75), 10)))
[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
 [10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
 [20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
 [30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
 [40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
 [50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
 [60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
 [70, 71, 72, 73, 74]]
nirvana-msu
  • Now this is a worthy new answer to this question. I actually quite like this. I am skeptical of assignment expressions, but when they work they work. – juanpa.arrivillaga May 02 '20 at 06:08
17

I like the Python doc's version proposed by tzot and J.F.Sebastian a lot, but it has two shortcomings:

  • it is not very explicit
  • I usually don't want a fill value in the last chunk

I'm using this one a lot in my code:

from itertools import islice

def chunks(n, iterable):
    iterable = iter(iterable)
    while True:
        yield tuple(islice(iterable, n)) or iterable.next()

UPDATE: A lazy chunks version:

from itertools import chain, islice

def chunks(n, iterable):
   iterable = iter(iterable)
   while True:
       yield chain([next(iterable)], islice(iterable, n-1))
nikipore
  • What's the break condition for the `while True` loop? – wjandrea Sep 06 '19 at 13:40
  • @wjandrea: The `StopIteration` raised when the `tuple` is empty and `iterable.next()` gets executed. Doesn't work properly in modern Python though, where exiting a generator should be done with `return`, not raising `StopIteration`. A `try/except StopIteration: return` around the whole loop (and changing `iterable.next()` to `next(iterable)` for cross-version compat) fixes this with minimal overhead at least. – ShadowRanger Jan 22 '20 at 06:10
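
A version folding in ShadowRanger's suggested fix (my sketch, not the answerer's code): catch the StopIteration and return, and use next() for Python 3 compatibility:

from itertools import islice

def chunks(n, iterable):
    iterable = iter(iterable)
    try:
        while True:
            # 'or next(iterable)' raises StopIteration once the tuple comes back empty
            yield tuple(islice(iterable, n)) or next(iterable)
    except StopIteration:
        return
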
17

I was curious about the performance of different approaches and here it is:

Tested on Python 3.5.1

import time
batch_size = 7
arr_len = 298937

#---------slice-------------

print("\r\nslice")
start = time.time()
arr = [i for i in range(0, arr_len)]
while True:
    if not arr:
        break

    tmp = arr[0:batch_size]
    arr = arr[batch_size:]
print(time.time() - start)

#-----------index-----------

print("\r\nindex")
arr = [i for i in range(0, arr_len)]
start = time.time()
for i in range(0, round(len(arr) / batch_size + 1)):
    tmp = arr[batch_size * i : batch_size * (i + 1)]
print(time.time() - start)

#----------batches 1------------

def batch(iterable, n=1):
    l = len(iterable)
    for ndx in range(0, l, n):
        yield iterable[ndx:min(ndx + n, l)]

print("\r\nbatches 1")
arr = [i for i in range(0, arr_len)]
start = time.time()
for x in batch(arr, batch_size):
    tmp = x
print(time.time() - start)

#----------batches 2------------

from itertools import islice, chain

def batch(iterable, size):
    sourceiter = iter(iterable)
    while True:
        batchiter = islice(sourceiter, size)
        yield chain([next(batchiter)], batchiter)


print("\r\nbatches 2")
arr = [i for i in range(0, arr_len)]
start = time.time()
for x in batch(arr, batch_size):
    tmp = x
print(time.time() - start)

#---------chunks-------------
def chunks(l, n):
    """Yield successive n-sized chunks from l."""
    for i in range(0, len(l), n):
        yield l[i:i + n]
print("\r\nchunks")
arr = [i for i in range(0, arr_len)]
start = time.time()
for x in chunks(arr, batch_size):
    tmp = x
print(time.time() - start)

#-----------grouper-----------

from itertools import zip_longest # for Python 3.x
#from six.moves import zip_longest # for both (uses the six compat library)

def grouper(iterable, n, padvalue=None):
    "grouper(3, 'abcdefg', 'x') --> ('a','b','c'), ('d','e','f'), ('g','x','x')"
    return zip_longest(*[iter(iterable)]*n, fillvalue=padvalue)

arr = [i for i in range(0, arr_len)]
print("\r\ngrouper")
start = time.time()
for x in grouper(arr, batch_size):
    tmp = x
print(time.time() - start)

Results:

slice
31.18285083770752

index
0.02184295654296875

batches 1
0.03503894805908203

batches 2
0.22681021690368652

chunks
0.019841909408569336

grouper
0.006506919860839844
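
As an aside, the timeit module usually gives more stable numbers than manual time.time() deltas; a minimal sketch of the same kind of measurement (my addition):

import timeit

setup = "arr = list(range(298937))"
stmt = "for x in (arr[i:i+7] for i in range(0, len(arr), 7)): tmp = x"
print(timeit.timeit(stmt, setup=setup, number=10))
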
Alex T
14

code:

def split_list(the_list, chunk_size):
    result_list = []
    while the_list:
        result_list.append(the_list[:chunk_size])
        the_list = the_list[chunk_size:]
    return result_list

a_list = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

print split_list(a_list, 3)

result:

[[1, 2, 3], [4, 5, 6], [7, 8, 9], [10]]
Art B
14

You may also use the get_chunks function of the utilspie library:

>>> from utilspie import iterutils
>>> a = [1, 2, 3, 4, 5, 6, 7, 8, 9]

>>> list(iterutils.get_chunks(a, 5))
[[1, 2, 3, 4, 5], [6, 7, 8, 9]]

You can install utilspie via pip:

sudo pip install utilspie

Disclaimer: I am the creator of utilspie library.

Anonymous
12

At this point, I think we need a recursive generator, just in case...

In python 2:

def chunks(li, n):
    if li == []:
        return
    yield li[:n]
    for e in chunks(li[n:], n):
        yield e

In python 3:

def chunks(li, n):
    if li == []:
        return
    yield li[:n]
    yield from chunks(li[n:], n)

Also, in case of massive Alien invasion, a decorated recursive generator might become handy:

def dec(gen):
    def new_gen(li, n):
        for e in gen(li, n):
            if e == []:
                return
            yield e
    return new_gen

@dec
def chunks(li, n):
    yield li[:n]
    for e in chunks(li[n:], n):
        yield e
mazieres
12

Here is a list of additional approaches:

Given

import itertools as it
import collections as ct

import more_itertools as mit


iterable = range(11)
n = 3

Code

The Standard Library

list(it.zip_longest(*[iter(iterable)] * n))
# [(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, None)]

d = {}
for i, x in enumerate(iterable):
    d.setdefault(i//n, []).append(x)

list(d.values())
# [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10]]

dd = ct.defaultdict(list)
for i, x in enumerate(iterable):
    dd[i//n].append(x)

list(dd.values())
# [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10]]

more_itertools+

list(mit.chunked(iterable, n))
# [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10]]

list(mit.sliced(iterable, n))
# [range(0, 3), range(3, 6), range(6, 9), range(9, 11)]

list(mit.grouper(n, iterable))
# [(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, None)]

list(mit.windowed(iterable, len(iterable)//n, step=n))
# [(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, None)]

References

+ A third-party library that implements itertools recipes and more. > pip install more_itertools

pylang
9

Another more explicit version.

def chunkList(initialList, chunkSize):
    """
    This function chunks a list into sub lists 
    that have a length equals to chunkSize.

    Example:
    lst = [3, 4, 9, 7, 1, 1, 2, 3]
    print(chunkList(lst, 3)) 
    returns
    [[3, 4, 9], [7, 1, 1], [2, 3]]
    """
    finalList = []
    for i in range(0, len(initialList), chunkSize):
        finalList.append(initialList[i:i+chunkSize])
    return finalList
Ranaivo
8

Without calling len(), which is good for large lists:

def splitter(l, n):
    i = 0
    chunk = l[:n]
    while chunk:
        yield chunk
        i += n
        chunk = l[i:i+n]

And this is for iterables:

from itertools import islice

def isplitter(l, n):
    l = iter(l)
    chunk = list(islice(l, n))
    while chunk:
        yield chunk
        chunk = list(islice(l, n))

The functional flavour of the above:

from itertools import islice, repeat, takewhile

def isplitter2(l, n):
    return takewhile(bool,
                     (tuple(islice(start, n))
                            for start in repeat(iter(l))))

OR:

from itertools import imap, islice, repeat  # Python 2

def chunks_gen_sentinel(n, seq):
    continuous_slices = imap(islice, repeat(iter(seq)), repeat(0), repeat(n))
    return iter(imap(tuple, continuous_slices).next, ())

OR:

from itertools import imap, islice, repeat, takewhile  # Python 2

def chunks_gen_filter(n, seq):
    continuous_slices = imap(islice, repeat(iter(seq)), repeat(0), repeat(n))
    return takewhile(bool, imap(tuple, continuous_slices))
Mars
8

heh, one line version

In [48]: chunk = lambda ulist, step:  map(lambda i: ulist[i:i+step],  xrange(0, len(ulist), step))

In [49]: chunk(range(1,100), 10)
Out[49]: 
[[1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
 [11, 12, 13, 14, 15, 16, 17, 18, 19, 20],
 [21, 22, 23, 24, 25, 26, 27, 28, 29, 30],
 [31, 32, 33, 34, 35, 36, 37, 38, 39, 40],
 [41, 42, 43, 44, 45, 46, 47, 48, 49, 50],
 [51, 52, 53, 54, 55, 56, 57, 58, 59, 60],
 [61, 62, 63, 64, 65, 66, 67, 68, 69, 70],
 [71, 72, 73, 74, 75, 76, 77, 78, 79, 80],
 [81, 82, 83, 84, 85, 86, 87, 88, 89, 90],
 [91, 92, 93, 94, 95, 96, 97, 98, 99]]
slav0nic
  • Please, use "def chunk" instead of "chunk = lambda". It works the same. One line. Same features. MUCH easier for the n00bz to read and understand. – S.Lott Nov 23 '08 at 13:45
  • @S.Lott: not if the n00bz come from scheme :P this isn't a real problem. there's even a keyword to google! what other features should we avoid for the sake of the n00bz? i guess yield isn't imperative/c-like enough to be n00b friendly either then. – Janus Troelsen May 11 '12 at 21:10
  • The function object resulting from `def chunk` instead of `chunk=lambda` has a .__name__ attribute of 'chunk' instead of '<lambda>'. The specific name is more useful in tracebacks. – Terry Jan Reedy Jun 27 '12 at 04:20
  • @Alfe: I'm not sure if it could be called a main semantic difference, but whether there's a useful name in a traceback instead of `<lambda>` or not is, at least, a notable difference. – martineau Jan 11 '15 at 20:33
  • After testing a bunch of them for performance, THIS is great! – Sunny Patel Oct 05 '18 at 16:36
8
def split_seq(seq, num_pieces):
    start = 0
    for i in xrange(num_pieces):
        stop = start + len(seq[i::num_pieces])
        yield seq[start:stop]
        start = stop

usage:

seq = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

for seq in split_seq(seq, 3):
    print seq
Corey Goldberg
7

See this reference

>>> orange = range(1, 1001)
>>> otuples = list( zip(*[iter(orange)]*10))
>>> print(otuples)
[(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), ... (991, 992, 993, 994, 995, 996, 997, 998, 999, 1000)]
>>> olist = [list(i) for i in otuples]
>>> print(olist)
[[1, 2, 3, 4, 5, 6, 7, 8, 9, 10], ..., [991, 992, 993, 994, 995, 996, 997, 998, 999, 1000]]
>>> 

Python 3

macm
  • Nice, but drops elements at the end if the size does not match whole numbers of chunks, e.g. `zip(*[iter(range(7))]*3)` only returns `[(0, 1, 2), (3, 4, 5)]` and forgets the `6` from the input. – Alfe Aug 14 '13 at 23:17
  • OP wrote: 'I have a list of arbitrary length, and I need to split it up into equal size chunks and operate on it.' Maybe I miss something, but how do you get 'equal size chunks' from a list of arbitrary length without dropping the chunk which is shorter than 'equal size'? – Aivar Paalberg Sep 27 '20 at 09:57
7

Since everybody here is talking about iterators, boltons has a perfect method for that, called iterutils.chunked_iter.

from boltons import iterutils

list(iterutils.chunked_iter(list(range(50)), 11))

Output:

[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
 [11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21],
 [22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32],
 [33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43],
 [44, 45, 46, 47, 48, 49]]

But if you don't need to be so careful with memory, you can use the old way and store the full list up front with iterutils.chunked.

vishes_shell
6
a = [1, 2, 3, 4, 5, 6, 7, 8, 9]
CHUNK = 4
[a[i*CHUNK:(i+1)*CHUNK] for i in xrange((len(a) + CHUNK - 1) / CHUNK )]
AdvilUser
  • Can you explain your answer more, please? – Zulu Jul 16 '15 at 00:06
  • Working backwards: (len(a) + CHUNK - 1) / CHUNK gives you the number of chunks that you will end up with. Then, for each chunk at index i, we are generating a sub-array of the original array like this: a[i * CHUNK : (i + 1) * CHUNK], where i * CHUNK is the index of the first element to put into the subarray, and (i + 1) * CHUNK is 1 past the last element to put into the subarray. This solution uses list comprehension, so it might be faster for large arrays. – AdvilUser Jul 29 '15 at 00:29
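
A Python 3 version of the same idea (my sketch): / returns a float there, so use integer ceil-division instead:

a = [1, 2, 3, 4, 5, 6, 7, 8, 9]
CHUNK = 4
n_chunks = -(-len(a) // CHUNK)  # equivalent to ceil(len(a) / CHUNK)
print([a[i*CHUNK:(i+1)*CHUNK] for i in range(n_chunks)])
# [[1, 2, 3, 4], [5, 6, 7, 8], [9]]
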
6

One more solution

def make_chunks(data, chunk_size): 
    while data:
        chunk, data = data[:chunk_size], data[chunk_size:]
        yield chunk

>>> for chunk in make_chunks([1, 2, 3, 4, 5, 6, 7], 2):
...     print chunk
... 
[1, 2]
[3, 4]
[5, 6]
[7]
>>> 
6

Consider using matplotlib.cbook pieces

for example:

import numpy as np
import matplotlib.cbook as cbook

segments = cbook.pieces(np.arange(20), 3)
for s in segments:
     print s
schwater
6
def chunks(iterable,n):
    """assumes n is an integer>0
    """
    iterable=iter(iterable)
    while True:
        result=[]
        for i in range(n):
            try:
                a=next(iterable)
            except StopIteration:
                break
            else:
                result.append(a)
        if result:
            yield result
        else:
            break

g1=(i*i for i in range(10))
g2=chunks(g1,3)
print g2
'<generator object chunks at 0x0337B9B8>'
print list(g2)
'[[0, 1, 4], [9, 16, 25], [36, 49, 64], [81]]'
robert king
  • While this may not look as short or as pretty as many of the itertools based responses, this one actually works if you want to print out the second sub-list before accessing the first, i.e., you can set i0=next(g2); i1=next(g2); and use i1 before using i0 and it doesn't break!! – Peter Gerdes Dec 19 '17 at 10:25
5

As per this answer, the top-voted answer leaves a 'runt' at the end. Here's my solution to really get chunks as evenly sized as possible, with no runts. It basically tries to pick the exact fractional spot where it should split the list, but rounds it off to the nearest integer:

from __future__ import division  # not needed in Python 3
def n_even_chunks(l, n):
    """Yield n as even chunks as possible from l."""
    last = 0
    for i in range(1, n+1):
        cur = int(round(i * (len(l) / n)))
        yield l[last:cur]
        last = cur

Demonstration:

>>> pprint.pprint(list(n_even_chunks(list(range(100)), 9)))
[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
 [11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21],
 [22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32],
 [33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43],
 [44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55],
 [56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66],
 [67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77],
 [78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88],
 [89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99]]
>>> pprint.pprint(list(n_even_chunks(list(range(100)), 11)))
[[0, 1, 2, 3, 4, 5, 6, 7, 8],
 [9, 10, 11, 12, 13, 14, 15, 16, 17],
 [18, 19, 20, 21, 22, 23, 24, 25, 26],
 [27, 28, 29, 30, 31, 32, 33, 34, 35],
 [36, 37, 38, 39, 40, 41, 42, 43, 44],
 [45, 46, 47, 48, 49, 50, 51, 52, 53, 54],
 [55, 56, 57, 58, 59, 60, 61, 62, 63],
 [64, 65, 66, 67, 68, 69, 70, 71, 72],
 [73, 74, 75, 76, 77, 78, 79, 80, 81],
 [82, 83, 84, 85, 86, 87, 88, 89, 90],
 [91, 92, 93, 94, 95, 96, 97, 98, 99]]

Compare to the top-voted chunks answer:

>>> pprint.pprint(list(chunks(list(range(100)), 100//9)))
[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
 [11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21],
 [22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32],
 [33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43],
 [44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54],
 [55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65],
 [66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76],
 [77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87],
 [88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98],
 [99]]
>>> pprint.pprint(list(chunks(list(range(100)), 100//11)))
[[0, 1, 2, 3, 4, 5, 6, 7, 8],
 [9, 10, 11, 12, 13, 14, 15, 16, 17],
 [18, 19, 20, 21, 22, 23, 24, 25, 26],
 [27, 28, 29, 30, 31, 32, 33, 34, 35],
 [36, 37, 38, 39, 40, 41, 42, 43, 44],
 [45, 46, 47, 48, 49, 50, 51, 52, 53],
 [54, 55, 56, 57, 58, 59, 60, 61, 62],
 [63, 64, 65, 66, 67, 68, 69, 70, 71],
 [72, 73, 74, 75, 76, 77, 78, 79, 80],
 [81, 82, 83, 84, 85, 86, 87, 88, 89],
 [90, 91, 92, 93, 94, 95, 96, 97, 98],
 [99]]
Claudiu
  • This solution seems to fail in some situations: - when n > len(l) - for l = [0,1,2,3,4] and n=3 it returns [[0], [1], [2]] instead of [[0,1], [2,3], [4]] – DragonTux Sep 05 '16 at 10:45
  • @DragonTux: Ah I wrote the function for Python 3 - it gives `[[0, 1], [2], [3, 4]]`. I added the future import so it works in Python 2 as well – Claudiu Sep 05 '16 at 17:24
  • Thanks a lot. I keep forgetting the subtle differences between Python 2 and 3. – DragonTux Sep 09 '16 at 15:18
4

I realise this question is old (I stumbled over it on Google), but surely something like the following is far simpler and clearer than any of the huge complex suggestions, and it only uses slicing:

def chunker(iterable, chunksize):
    for i,c in enumerate(iterable[::chunksize]):
        yield iterable[i*chunksize:(i+1)*chunksize]

>>> for chunk in chunker(range(0,100), 10):
...     print list(chunk)
... 
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
[20, 21, 22, 23, 24, 25, 26, 27, 28, 29]
... etc ...
4
>>> def f(x, n, acc=[]): return f(x[n:], n, acc+[(x[:n])]) if x else acc
>>> f("Hallo Welt", 3)
['Hal', 'lo ', 'Wel', 't']
>>> 

If you are into brackets - I picked up a book on Erlang :)

hcvst
4

Letting r be the chunk size and L be the initial list, you can do:

chunkL = [L[r*k:r*(k+1)] for k in range(len(L) // r)]

(Note that this keeps only complete chunks; a final partial chunk, if any, is dropped.)
Be Wake Pandey
4

Use list comprehensions:

l = [1,2,3,4,5,6,7,8,9,10,11,12]
k = 5 #chunk size
print [tuple(l[x:y]) for (x, y) in [(x, x+k) for x in range(0, len(l), k)]]
Saksham Varma
4

You could use numpy's array_split function e.g., np.array_split(np.array(data), 20) to split into 20 nearly equal size chunks.

To make sure chunks are exactly equal in size use np.split.
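
For example (my sketch): np.split raises a ValueError when the array cannot be divided evenly, while np.array_split quietly makes the trailing chunks smaller:

import numpy as np

data = np.arange(12)
print(np.array_split(data, 5))  # chunk sizes 3, 3, 2, 2, 2
print(np.split(data, 4))        # four arrays of exactly 3 elements each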

AlexG
4

I have one solution below which does work, but more important than that solution are a few comments on the other approaches. First, a good solution shouldn't require that one loop through the sub-iterators in order. If I run

g = paged_iter(list(range(50)), 11)
i0 = next(g)
i1 = next(g)
list(i1)
list(i0)

The appropriate output for the last command is

 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

not

 []

As most of the itertools based solutions here return. This isn't just the usual boring restriction about accessing iterators in order. Imagine a consumer trying to clean up poorly entered data which reversed the appropriate order of blocks of 5, i.e., the data looks like [B5, A5, D5, C5] and should look like [A5, B5, C5, D5] (where A5 is just five elements not a sublist). This consumer would look at the claimed behavior of the grouping function and not hesitate to write a loop like

i = 0
out = []
for it in paged_iter(data, 5):
    if i % 2 == 0:
        swapped = it
    else:
        out += list(it)
        out += list(swapped)
    i += 1

This will produce mysteriously wrong results if you sneakily assume that sub-iterators are always fully used in order. It gets even worse if you want to interleave elements from the chunks.

Second, a decent number of the suggested solutions implicitly rely on the fact that iterators have a deterministic order (they don't, e.g. a set), and while some of the solutions using islice may be OK, it worries me.

Third, the itertools grouper approach works but the recipe relies on internal behavior of the zip_longest (or zip) functions that isn't part of their published behavior. In particular, the grouper function only works because in zip_longest(i0...in) the next function is always called in order next(i0), next(i1), ... next(in) before starting over. As grouper passes n copies of the same iterator object it relies on this behavior.

Finally, while the solution below can be improved if you make the assumption criticized above (that sub-iterators are accessed in order and fully perused), without this assumption one MUST implicitly (via the call chain) or explicitly (via deques or another data structure) store elements for each sub-iterator somewhere. So don't bother wasting time (as I did) assuming one could get around this with some clever trick.

import collections

def paged_iter(iterat, n):
    itr = iter(iterat)
    deq = None
    try:
        while True:
            deq = collections.deque(maxlen=n)
            for q in range(n):
                deq.append(next(itr))
            yield (i for i in deq)
    except StopIteration:
        if deq:  # yield any final, partial chunk; skip if the input divided evenly
            yield (i for i in deq)
Peter Gerdes
4

Here's an idea using itertools.groupby:

import itertools

def chunks(l, n):
    c = itertools.count()
    return (it for _, it in itertools.groupby(l, lambda x: next(c)//n))

This returns a generator of generators. If you want a list of lists, just replace the last line with

    return [list(it) for _, it in itertools.groupby(l, lambda x: next(c)//n)]

Example returning list of lists:

>>> chunks('abcdefghij', 4)
[['a', 'b', 'c', 'd'], ['e', 'f', 'g', 'h'], ['i', 'j']]

(So yes, this suffers from the "runt problem", which may or may not be a problem in a given situation.)

itub
  • 1,054
  • 1
  • 9
  • 11
  • Again this fails if the sub-iterators are not evaluated in order in the generator case. Let c = chunks('abcdefghij', 4) (as generator). Then set i0 = next(c); i1 = next(c); list(i1) //FINE; list(i0) //UHHOH – Peter Gerdes Dec 19 '17 at 10:19
  • @PeterGerdes, thank you for noting that omission; I forgot because I always used the groupby generators in order. The documentation does mention this limitation: "Because the source is shared, when the groupby() object is advanced, the previous group is no longer visible." – itub Dec 19 '17 at 16:12
  • @PeterGerdes I think this can be solved using enumerate instead, like so: `[[x for _, x in it] for _, it in itertools.groupby(enumerate(l), lambda x: x[0]//n)]` (list(it) is a list of (index, element) pairs due to enumerate) – Yuri Feldman May 13 '19 at 18:18
4

I don't think I saw this option yet, so just to add another one :) :

def chunks(iterable, chunk_size):
    i = 0
    while i < len(iterable):
        yield iterable[i:i + chunk_size]
        i += chunk_size
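
For example:

>>> list(chunks([1, 2, 3, 4, 5, 6, 7], 3))
[[1, 2, 3], [4, 5, 6], [7]]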
George B
  • 344
  • 5
  • 15
4

The Python pydash package could be a good choice.

from pydash.arrays import chunk
ids = ['22', '89', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '1']
chunk_ids = chunk(ids,5)
print(chunk_ids)
# output: [['22', '89', '2', '3', '4'], ['5', '6', '7', '8', '9'], ['10', '11', '1']]

For more, check out pydash chunk list.

Ravi Anand
  • 3,719
  • 6
  • 32
  • 66
  • neat! and this is what actualy sits under the hood of pydash.arrays.chunk: chunks = int(ceil(len(array) / float(size))) return [array[i * size:(i + 1) * size] for i in range(chunks)] – darkman Mar 27 '20 at 14:47
3

I wrote a small library expressly for this purpose, available here. The library's chunked function is particularly efficient because it's implemented as a generator, so a substantial amount of memory can be saved in certain situations. It also doesn't rely on the slice notation, so any arbitrary iterator can be used.

import iterlib

print list(iterlib.chunked(xrange(1, 1000), 10))
# prints [(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), (11, 12, 13, 14, 15, 16, 17, 18, 19, 20), ...]
rectangletangle
  • 42,965
  • 86
  • 186
  • 264
3

The answer above (by koffein) has a little problem: the list is always split into an equal number of splits, not an equal number of items per partition. This is my version. The ceiling division takes into account that the number of items may not be exactly divisible by the partition size, so the last partition may be only partially filled, and no empty partition is produced when the sizes divide evenly.

# Given 'l' is your list

chs = 12 # Your chunksize
partitioned = [ l[i*chs:(i*chs)+chs] for i in range(-(-len(l) // chs)) ]
Flo
  • 397
  • 3
  • 6
  • But if the chunk size *does* exactly divide the number of elements then this includes a zero-length list at the end. – Arthur Tacca Dec 03 '18 at 15:34
3
def chunk(lst):
    """Split lst into the smallest possible number (>= 2) of equal-sized chunks."""
    out = []
    factor = 1  # fallback for lists shorter than two elements
    for x in range(2, len(lst) + 1):
        if not len(lst) % x:
            factor = len(lst) // x
            break
    while lst:
        out.append([lst.pop(0) for _ in range(factor)])
    return out
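
For example, a six-element list ends up split into the smallest possible number (two) of equal chunks:

>>> chunk([1, 2, 3, 4, 5, 6])
[[1, 2, 3], [4, 5, 6]]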
dbr
  • 153,498
  • 65
  • 266
  • 333
J.T. Hurley
  • 521
  • 9
  • 12
3

At this point, I think we need the obligatory anonymous-recursive function.

Y = lambda f: (lambda x: x(x))(lambda y: f(lambda *args: y(y)(*args)))
chunks = Y(lambda f: lambda n: [n[0][:n[1]]] + f((n[0][n[1]:], n[1])) if len(n[0]) > 0 else [])
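
Note that the calling convention here is a single (sequence, size) tuple:

>>> chunks((list(range(7)), 3))
[[0, 1, 2], [3, 4, 5], [6]]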
Julien Palard
  • 6,644
  • 2
  • 30
  • 36
3

No magic, but simple and correct:

def chunks(iterable, n):
    """Yield successive n-sized chunks from iterable."""
    values = []
    for i, item in enumerate(iterable, 1):
        values.append(item)
        if i % n == 0:
            yield values
            values = []
    if values:
        yield values
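
For example:

>>> list(chunks(range(10), 4))
[[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]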
guyskk
  • 1,400
  • 10
  • 10
3

A simple loop-based version would be:

l = [1,2,3,4,5,6,7,8,9]
n = 3
outList = []
for i in range(n, len(l) + n, n):
    outList.append(l[i-n:i])

print(outList)

This will print:

[[1, 2, 3], [4, 5, 6], [7, 8, 9]]

Kandarp
  • 187
  • 2
  • 7
2
  • Works with any iterable
  • Inner data is generator object (not a list)
  • One liner
In [258]: import itertools

In [259]: get_in_chunks = lambda itr, n: ((v for _, v in g) for _, g in itertools.groupby(enumerate(itr), lambda t: t[0] // n))

In [260]: list(list(x) for x in get_in_chunks(range(30),7))
Out[260]:
[[0, 1, 2, 3, 4, 5, 6],
 [7, 8, 9, 10, 11, 12, 13],
 [14, 15, 16, 17, 18, 19, 20],
 [21, 22, 23, 24, 25, 26, 27],
 [28, 29]]
balki
  • 22,482
  • 26
  • 85
  • 135
  • g = get_in_chunks(range(30),7); i0=next(g);i1=next(g);list(i1);list(i0); Last evaluation is empty. Hidden requirement about accessing all the sublists in order seems really bad here to me because the goal with these kind of utils is often to shuffle data around in various ways. – Peter Gerdes Dec 19 '17 at 10:30
2

Like @AaronHall I got here looking for roughly evenly sized chunks. There are different interpretations of that. In my case, if the desired size is N, I would like each group to be of size>=N. Thus, the orphans which are created in most of the above should be redistributed to other groups.

This can be done using:

def nChunks(l, n):
    """ Yield n successive chunks from l.
    Works for lists, pandas DataFrames, etc.
    """
    newn = int(1.0 * len(l) / n + 0.5)
    for i in range(0, n-1):
        yield l[i*newn:i*newn+newn]
    yield l[n*newn-newn:]

(from Splitting a list into N parts of approximately equal length) by simply calling it as nChunks(l, len(l)//n), i.e., deriving the number of chunks from the desired group size.
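
For example, splitting an 11-element list into 3 chunks with the version above:

>>> list(nChunks(list(range(11)), 3))
[[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10]]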

Community
  • 1
  • 1
CPBL
  • 2,912
  • 2
  • 24
  • 37
  • seems to yield some empty chunks (len=26, 10) , or a final very unbalanced chunk (len=26, 11). – idij Nov 27 '14 at 13:11
2

I have come up with the following solution, which doesn't create a temporary list object and should work with any iterable object. Please note that this version is for Python 2.x:

def chunked(iterable, size):
    stop = []
    it = iter(iterable)
    def _next_chunk():
        try:
            for _ in xrange(size):
                yield next(it)
        except StopIteration:
            stop.append(True)
            return

    while not stop:
        yield _next_chunk()

for it in chunked(xrange(16), 4):
   print list(it)

Output:

[0, 1, 2, 3]
[4, 5, 6, 7]
[8, 9, 10, 11]
[12, 13, 14, 15] 
[]

As you can see, if len(iterable) % size == 0 then we get an additional empty iterator object at the end. But I do not think that is a big problem.

  • What do you think the following code should produce? i=0 – Peter Gerdes Dec 19 '17 at 10:03
  • Try only executing list(it) on every other iteration through the loop, i.e. add a counter and check if it 0 mod 2. The expected behavior is to only print every other line of your output. The actual behavior is to print every line. – Peter Gerdes Dec 19 '17 at 10:10
2

This works in v2/v3, is inlineable, generator-based and uses only the standard library:

import itertools
def split_groups(iter_in, group_size):
    return ((x for _, x in item) for _, item in itertools.groupby(enumerate(iter_in), key=lambda x: x[0] // group_size))
Andrey Cizov
  • 565
  • 7
  • 18
  • Just do a `(list(x) for x in split_groups('abcdefghij', 4))`, then iterate through them: as opposed to many examples here this would work with groups of any size. – Andrey Cizov Feb 24 '18 at 21:55
1

Since I had to do something like this, here's my solution given a generator and a batch size:

def pop_n_elems_from_generator(g, n):
    elems = []
    try:
        for idx in range(n):
            elems.append(next(g))
        return elems
    except StopIteration:
        return elems
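
For example, calling it repeatedly on the same iterator:

>>> g = iter(range(7))
>>> pop_n_elems_from_generator(g, 3)
[0, 1, 2]
>>> pop_n_elems_from_generator(g, 3)
[3, 4, 5]
>>> pop_n_elems_from_generator(g, 3)
[6]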
Evan Zamir
  • 6,460
  • 11
  • 44
  • 68
1

A generic chunker for any iterable, which gives the user a choice of how to handle a partial chunk at the end.

Tested on Python 3.

chunker.py

from enum import Enum

class PartialChunkOptions(Enum):
    INCLUDE = 0
    EXCLUDE = 1
    PAD = 2
    ERROR = 3

class PartialChunkException(Exception):
    pass

def chunker(iterable, n, on_partial=PartialChunkOptions.INCLUDE, pad=None):
    """
    A chunker yielding n-element lists from an iterable, with various options
    about what to do about a partial chunk at the end.

    on_partial=PartialChunkOptions.INCLUDE (the default):
                     include the partial chunk as a short (<n) element list

    on_partial=PartialChunkOptions.EXCLUDE
                     do not include the partial chunk

    on_partial=PartialChunkOptions.PAD
                     pad to an n-element list 
                     (also pass pad=<pad_value>, default None)

    on_partial=PartialChunkOptions.ERROR
                     raise a PartialChunkException if a partial chunk is encountered
    """

    on_partial = PartialChunkOptions(on_partial)        

    iterator = iter(iterable)
    while True:
        vals = []
        for i in range(n):
            try:
                vals.append(next(iterator))
            except StopIteration:
                if vals:
                    if on_partial == PartialChunkOptions.INCLUDE:
                        yield vals
                    elif on_partial == PartialChunkOptions.EXCLUDE:
                        pass
                    elif on_partial == PartialChunkOptions.PAD:
                        yield vals + [pad] * (n - len(vals))
                    elif on_partial == PartialChunkOptions.ERROR:
                        raise PartialChunkException
                    return
                return
        yield vals

test.py

import chunker

chunk_size = 3

for it in (range(100, 107),
          range(100, 109)):

    print("\nITERABLE TO CHUNK: {}".format(it))
    print("CHUNK SIZE: {}".format(chunk_size))

    for option in chunker.PartialChunkOptions.__members__.values():
        print("\noption {} used".format(option))
        try:
            for chunk in chunker.chunker(it, chunk_size, on_partial=option):
                print(chunk)
        except chunker.PartialChunkException:
            print("PartialChunkException was raised")
    print("")

output of test.py


ITERABLE TO CHUNK: range(100, 107)
CHUNK SIZE: 3

option PartialChunkOptions.INCLUDE used
[100, 101, 102]
[103, 104, 105]
[106]

option PartialChunkOptions.EXCLUDE used
[100, 101, 102]
[103, 104, 105]

option PartialChunkOptions.PAD used
[100, 101, 102]
[103, 104, 105]
[106, None, None]

option PartialChunkOptions.ERROR used
[100, 101, 102]
[103, 104, 105]
PartialChunkException was raised


ITERABLE TO CHUNK: range(100, 109)
CHUNK SIZE: 3

option PartialChunkOptions.INCLUDE used
[100, 101, 102]
[103, 104, 105]
[106, 107, 108]

option PartialChunkOptions.EXCLUDE used
[100, 101, 102]
[103, 104, 105]
[106, 107, 108]

option PartialChunkOptions.PAD used
[100, 101, 102]
[103, 104, 105]
[106, 107, 108]

option PartialChunkOptions.ERROR used
[100, 101, 102]
[103, 104, 105]
[106, 107, 108]

alani
  • 11,147
  • 2
  • 8
  • 20
0

I dislike the idea of splitting elements by chunk size, e.g. a script can divide 101 into 3 chunks as [50, 50, 1]. For my needs, I needed to split proportionally while keeping the order the same. First I wrote my own script, which works fine and is very simple. But I later saw this answer, where the script is better than mine; I recommend it. Here's my script:

def proportional_dividing(N, n):
    """
    N - length of array (bigger number)
    n - number of chunks (smaller number)
    output - arr, a list of n chunk sizes that sum to N, divided as evenly as possible
    """
    arr = []
    if N == 0:
        return arr
    elif n == 0:
        arr.append(N)
        return arr
    r = N // n
    for i in range(n-1):
        arr.append(r)
    arr.append(N-r*(n-1))

    last_n = arr[-1]
    # last number always will be r <= last_n < 2*r
    # when last_n == r it's ok, but when last_n > r ...
    if last_n > r:
        # ... and if difference too big (bigger than 1), then
        if abs(r-last_n) > 1:
            #[2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 7] # N=29, n=12
            # we need to give unnecessary numbers to first elements back
            diff = last_n - r
            for k in range(diff):
                arr[k] += 1
            arr[-1] = r
            # and we receive [3, 3, 3, 3, 3, 2, 2, 2, 2, 2, 2, 2]
    return arr

def split_items(items, chunks):
    arr = proportional_dividing(len(items), chunks)
    splitted = []
    for chunk_size in arr:
        splitted.append(items[:chunk_size])
        items = items[chunk_size:]
    print(splitted)
    return splitted

items = [1,2,3,4,5,6,7,8,9,10,11]
chunks = 3
split_items(items, chunks)
split_items(['a','b','c','d','e','f','g','h','i','g','k','l', 'm'], 3)
split_items(['a','b','c','d','e','f','g','h','i','g','k','l', 'm', 'n'], 3)
split_items(range(100), 4)
split_items(range(99), 4)
split_items(range(101), 4)

and output:

[[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11]]
[['a', 'b', 'c', 'd'], ['e', 'f', 'g', 'h'], ['i', 'g', 'k', 'l', 'm']]
[['a', 'b', 'c', 'd', 'e'], ['f', 'g', 'h', 'i', 'g'], ['k', 'l', 'm', 'n']]
[range(0, 25), range(25, 50), range(50, 75), range(75, 100)]
[range(0, 25), range(25, 50), range(50, 75), range(75, 99)]
[range(0, 25), range(25, 50), range(50, 75), range(75, 101)]
Arthur Sult
  • 497
  • 5
  • 9
0

If you don't care about the order:

> from itertools import groupby
> batch_no = 3
> data = 'abcdefgh'

> [
    [x[1] for x in x[1]] 
    for x in 
    groupby(
      sorted(
        (x[0] % batch_no, x[1]) 
        for x in 
        enumerate(data)
      ),
      key=lambda x: x[0]
    )
  ]

[['a', 'd', 'g'], ['b', 'e', 'h'], ['c', 'f']]

This solution doesn't generate chunks of the same size, but it distributes the values so the batches are as evenly filled as possible while keeping the number of generated batches fixed.

ajaest
  • 525
  • 4
  • 13
0

This question reminds me of the Raku (formerly Perl 6) .comb(n) method. It breaks up strings into n-sized chunks. (There's more to it than that, but I'll leave out the details.)

It's easy enough to implement a similar function in Python 3 as a lambda expression:

comb = lambda s,n: (s[i:i+n] for i in range(0,len(s),n))

Then you can call it like this:

some_list = list(range(0, 20))  # creates a list of 20 elements
generator = comb(some_list, 4)  # creates a generator that will generate lists of 4 elements
for sublist in generator:
    print(sublist)  # prints a sublist of four elements, as it's generated

Of course, you don't have to assign the generator to a variable; you can just loop over it directly like this:

for sublist in comb(some_list, 4):
    print(sublist)  # prints a sublist of four elements, as it's generated

As a bonus, this comb() function also operates on strings:

list( comb('catdogant', 3) )  # returns ['cat', 'dog', 'ant']
J-L
  • 1,290
  • 5
  • 12
0

I've created these two fancy one-liners, which are efficient and lazy; both input and output are iterables, and they don't depend on any module:

The first one-liner is totally lazy, meaning that it returns an iterator producing iterators (i.e. each chunk produced is an iterator iterating over the chunk's elements). This version is good for the case where chunks are very large or elements are produced slowly one by one and should become available immediately as they are produced:

Try it online!

chunk_iters = lambda it, n: ((e for i, g in enumerate(((f,), cit)) for j, e in zip(range((1, n - 1)[i]), g)) for cit in (iter(it),) for f in cit)

The second one-liner returns an iterator that produces lists. Each list is produced as soon as the elements of a whole chunk become available through the input iterator, or when the very last element of the last chunk is reached. This version should be used if the input elements are produced fast or are all available immediately. Otherwise, the first, lazier one-liner version should be used.

Try it online!

chunk_lists = lambda it, n: (l for l in ([],) for i, g in enumerate((it, ((),))) for e in g for l in (l[:len(l) % n] + [e][:1 - i],) if (len(l) % n == 0) != i)

Also, I provide a multi-line version of the first chunk_iters one-liner, which returns an iterator producing other iterators (going through each chunk's elements):

Try it online!

def chunk_iters(it, n):
    cit = iter(it)
    def one_chunk(f):
        yield f
        for i, e in zip(range(n - 1), cit):
            yield e
    for f in cit:
        yield one_chunk(f)
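
For example, consuming the chunks in order (which this generator-of-generators requires):

>>> [list(c) for c in chunk_iters(range(10), 4)]
[[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]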
Arty
  • 8,027
  • 3
  • 16
  • 26
0

Although there are a lot of answers, I have a very simple way:


x = list(range(10, 75))
indices = list(range(10, len(x) + 10, 10))  # chunk boundaries computed from positions, not element values
print("indices: ", indices)
xx = [x[i-10:i] for i in indices]
print("x= ", x)
print("xx= ", xx)

the result will be :

indices:  [10, 20, 30, 40, 50, 60, 70]

x=  [10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74]

xx = [[10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
[20, 21, 22, 23, 24, 25,26, 27, 28, 29],
[30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
[40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
[50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
[60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
[70, 71, 72, 73, 74]]

0

A simple solution

The OP has requested "equal sized chunks". I understand "equal sized" as "balanced" sizes: we are looking for groups of items of approximately the same size, not necessarily all equal.

Inputs here are:

  • the list of items: input_list (list of 23 numbers, for instance)
  • the number of groups to split those items into: n_groups (5, for instance)

Input:

input_list = list(range(23))
n_groups = 5

Groups of contiguous elements:

approx_sizes = len(input_list)/n_groups 

groups_cont = [input_list[int(i*approx_sizes):int((i+1)*approx_sizes)] 
               for i in range(n_groups)]

Groups of "every-Nth" elements:

groups_leap = [input_list[i::n_groups] 
               for i in range(n_groups)]

Results

print(len(input_list))

print('Contiguous elements lists:')
print(groups_cont)

print('Leap every "N" items lists:')
print(groups_leap)

Will output:

23

Contiguous elements lists:
[[0, 1, 2, 3], [4, 5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16, 17], [18, 19, 20, 21, 22]]

Leap every "N" items lists:
[[0, 5, 10, 15, 20], [1, 6, 11, 16, 21], [2, 7, 12, 17, 22], [3, 8, 13, 18], [4, 9, 14, 19]]
Brandt
  • 2,178
  • 3
  • 19
  • 29
0
from itertools import islice

l = [1, 2, 3, 4, 5, 6]
chunksize = int(input("Enter chunk size"))
m = []
obj = iter(l)  # islice must consume the iterator, not the list itself
m.append(list(islice(obj, chunksize)))
m.append(list(islice(obj, chunksize)))
print(m)  # with chunk size 3: [[1, 2, 3], [4, 5, 6]]
  • 1
    Code only answers are discouraged. Please provide a short summary of how your answer solves the problem and why it may be preferable to the other answers provided. – DaveL17 Apr 20 '21 at 22:44
  • Welcome to Stack Overflow. Code dumps without any explanation are rarely helpful. Stack Overflow is about learning, not providing snippets to blindly copy and paste. Please edit your question and explain how it answers the specific question being asked. See [answer]. This is particularly important when answering old questions (this one is nearly 12.5 years old) with existing answers (there are already _81 other answers_). How does this answer improve upon what's already here? – Chris Apr 21 '21 at 00:19
-1

Lazy loading version

import pprint
pprint.pprint(list(chunks(range(10, 75), 10)))
[range(10, 20),
 range(20, 30),
 range(30, 40),
 range(40, 50),
 range(50, 60),
 range(60, 70),
 range(70, 75)]

Confer this implementation's result with the example usage result of the accepted answer.

Many of the functions above assume that the length of the whole iterable is known up front, or at least is cheap to calculate.

For some streamed objects that would mean loading the full data into memory first (e.g. to download the whole file) to get the length information.

If, however, you don't know the full size yet, you can use this code instead:

def chunks(iterable, size):
    """
    Yield successive chunks from iterable, being `size` long.

    https://stackoverflow.com/a/55776536/3423324
    :param iterable: The object you want to split into pieces.
    :param size: The size each of the resulting pieces should have.
    """
    i = 0
    while True:
        sliced = iterable[i:i + size]
        if len(sliced) == 0:
            # to suppress stuff like `range(max, max)`.
            break
        # end if
        yield sliced
        if len(sliced) < size:
            # our slice is not the full length, so we must have passed the end of the iterator
            break
        # end if
        i += size  # so we start the next chunk at the right place.
    # end while
# end def

This works because the slice operation returns fewer elements, or none, if you have passed the end of an iterable:

"abc"[0:2] == 'ab'
"abc"[2:4] == 'c'
"abc"[4:6] == ''

We now use the result of that slice and calculate the length of the generated chunk. If it is shorter than we expect, we know we have reached the end and can stop iterating.

That way, the underlying object is not accessed until you actually iterate over the chunks.

luckydonald
  • 3,930
  • 2
  • 27
  • 46
-1

An old school approach that does not require itertools but still works with arbitrary generators:

def chunks(g, n):
  """divide a generator 'g' into small chunks
  Yields:
    a chunk that has 'n' or fewer items
  """
  n = max(1, n)
  buff = []
  for item in g:
    buff.append(item)
    if len(buff) == n:
      yield buff
      buff = []
  if buff:
    yield buff
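
For example:

>>> list(chunks(iter(range(7)), 3))
[[0, 1, 2], [3, 4, 5], [6]]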
Realfun
  • 327
  • 3
  • 8
-1
def main():
    print(chunkify([1, 2, 3, 4, 5, 6], 2))

def chunkify(lst, n):
    # 'lst' rather than 'list', to avoid shadowing the built-in
    chunks = []
    for i in range(0, len(lst), n):
        chunks.append(lst[i:i + n])
    return chunks

main()

I think it's simple, and it gives you the chunks of a list.

-1

Here is a bit of code written in Python 3 that divides the items as evenly as np.array_split does, although it deals them out round-robin rather than in contiguous runs. Note that a must be an iterator (not a list), and that a unique sentinel is used as the fill value so that falsy items such as 0 are not filtered out:

import itertools

fill = object()  # unique sentinel; filtering on identity keeps falsy items like 0
list(map(lambda t: [x for x in t if x is not fill], itertools.zip_longest(*iter(lambda: tuple(itertools.islice(a, n)), ()), fillvalue=fill)))

It's quite a long one-liner, but it does divide the items evenly amongst the resulting sublists.

-2

Using Python list comprehensions:

[list(range(t, t + 10)) for t in range(1, 1000, 10)]

[[1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
 [11, 12, 13, 14, 15, 16, 17, 18, 19, 20],
 [21, 22, 23, 24, 25, 26, 27, 28, 29, 30],....
 ....[981, 982, 983, 984, 985, 986, 987, 988, 989, 990],
 [991, 992, 993, 994, 995, 996, 997, 998, 999, 1000]]

Visit this link to learn more about list comprehensions.

BomberMan
  • 1,080
  • 3
  • 13
  • 33
-2

Yes, it is an old question, but I had to post this one, because it is even a little shorter than the similar ones. Yes, the result looks scrambled, but if it is just about even length...

>>> n = 3 # number of groups
>>> biglist = range(30)
>>>
>>> [ biglist[i::n] for i in xrange(n) ]
[[0, 3, 6, 9, 12, 15, 18, 21, 24, 27],
 [1, 4, 7, 10, 13, 16, 19, 22, 25, 28],
 [2, 5, 8, 11, 14, 17, 20, 23, 26, 29]]
koffein
  • 1,620
  • 11
  • 16