
I'm doing some analysis with pandas in a Jupyter notebook, and since my apply function takes a long time, I would like to see a progress bar. Through another post I found the tqdm library, which provides a simple progress bar for pandas operations. There is also a Jupyter integration that provides a really nice progress bar where the bar itself changes over time.

However, I would like to combine the two and don't quite understand how to do that. Let's take the same example as in the documentation:

import pandas as pd
import numpy as np
from tqdm import tqdm

df = pd.DataFrame(np.random.randint(0, 100, (100000, 6)))

# Register `pandas.progress_apply` and `pandas.Series.map_apply` with `tqdm`
# (can use `tqdm_gui`, `tqdm_notebook`, optional kwargs, etc.)
tqdm.pandas(desc="my bar!")

# Now you can use `progress_apply` instead of `apply`
# and `progress_map` instead of `map`
df.progress_apply(lambda x: x**2)
# can also groupby:
# df.groupby(0).progress_apply(lambda x: x**2)

It even says "can use `tqdm_notebook`", but I can't find out how. I've tried a few things like

tqdm_notebook(tqdm.pandas(desc="my bar!"))

or

tqdm_notebook.pandas

but they don't work. In the definition it looks to me like

tqdm.pandas(tqdm_notebook(desc="my bar!"))

should work, but the bar doesn't properly show the progress and there is still additional output.

Any other ideas?

grinsbaeckchen
  • There seems to be a bug. I'm experiencing it too. It works with groupby progress_apply... `df.groupby(0).progress_apply(lambda x: x**2)` – Julien Marrec Nov 08 '16 at 00:46
  • @JulienMarrec, I don't see that it works with groupby either. I get an instantly complete green bar and then the updating happens on another not so pretty bar that updates just below the green bar. – grinsbaeckchen Nov 08 '16 at 00:54
  • Yeah, I get the not so pretty bar too, but this one works... Maybe worth heading over to [GitHub](https://github.com/tqdm/tqdm/issues) to open an issue if there's no traction here in the future – Julien Marrec Nov 08 '16 at 00:55
  • I would probably be happy with the not-so-pretty bar, though I still wonder why. I also seem to have some weird dependency in my notebook. If I open a new notebook all is good (not pretty but working). But in my actual notebook running the same imports and function after having done some other stuff, the bar actually doesn't update itself but each update is in a new line – grinsbaeckchen Nov 08 '16 at 01:03

4 Answers


My working solution (copied from the documentation):

from tqdm.auto import tqdm
tqdm.pandas()
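
Applied to the DataFrame from the question, this might look like the sketch below. It assumes a reasonably recent tqdm release in which `tqdm.auto` is available; `tqdm.auto` is meant to pick the notebook widget bar when it detects a Jupyter kernel and to fall back to the plain text bar otherwise. The smaller row count is only there to keep the example quick.

```python
import numpy as np
import pandas as pd
from tqdm.auto import tqdm  # widget bar in Jupyter, text bar elsewhere

# Smaller frame than in the question, just to keep the sketch fast
df = pd.DataFrame(np.random.randint(0, 100, (1000, 6)))

# Register progress_apply / progress_map on pandas objects
tqdm.pandas(desc="my bar!")

# Same call as in the question, now routed through the progress bar
squared = df.progress_apply(lambda x: x**2)
```

In a notebook the widget bar needs ipywidgets to be installed; without it, the plain `from tqdm import tqdm` import still gives the text bar.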
Vincenzo Lavorini

You can use:

tqdm_notebook().pandas(*args, **kwargs)

This is because tqdm_notebook has a delayer adapter, so it's necessary to instantiate it before accessing its methods (including class methods).

In the future (>v5.1), you should be able to use a more uniform API:

tqdm_pandas(tqdm_notebook, *args, **kwargs)
gaborous
  • Thanks, this solves the problem. However, it shows two bars instead of one: one with 0 iterations and then the wanted one. Do you know if I can get rid of that? Maybe you could briefly add the usage with the above example to your answer to make it even easier to grasp. – grinsbaeckchen Jan 12 '17 at 19:10
  • @grinsbaeckchen This sounds like an old bug we had with notebooks. Could you [report it in an issue](https://github.com/tqdm/tqdm/issues) with a screenshot so we can fix it? Thanks! – gaborous Jan 13 '17 at 12:28

I found that I had to import tqdm_notebook also. A simple example is given below that works in Jupyter notebook.

Suppose you want to map a function over one variable (column) to create a new variable in your pandas DataFrame:

# progress bar
from tqdm import tqdm, tqdm_notebook

# instantiate
tqdm.pandas(tqdm_notebook)

# replace map with progress_map
# where df is a pandas dataframe
df['new_variable'] = df['old_variable'].progress_map(some_function)
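
For completeness, here is a self-contained sketch of the same pattern. `some_function`, the column names, and the `desc` text are placeholders. The import goes through `tqdm.auto` on the assumption that your tqdm version provides it; as far as I can tell, newer releases expect only keyword arguments in `tqdm.pandas()`, so there the bar class is chosen by the import rather than passed in.

```python
import pandas as pd
from tqdm.auto import tqdm  # assumption: a tqdm version that ships tqdm.auto


def some_function(value):
    # Placeholder for the slow per-element work
    return value * 2


df = pd.DataFrame({"old_variable": range(1000)})

# Register the progress_* methods on pandas objects
tqdm.pandas(desc="mapping")

# progress_map behaves like Series.map but advances the bar per element
df["new_variable"] = df["old_variable"].progress_map(some_function)
```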
mammykins

If you want to use more than one CPU for that slow apply step, consider using swifter. As a bonus, swifter automatically enables a tqdm progress bar on the apply step. To customize the bar description, use:

df.swifter.progress_bar(enable=True, desc='bar description').apply(...)

Espoir Murhabazi
crypdick