I've been exploring how to optimize my code and ran across pandas
.at
method. Per the documentation
Fast label-based scalar accessor
Similarly to loc, at provides label based scalar lookups. You can also set using these indexers.
So I ran some samples:
Setup
import pandas as pd
import numpy as np
from string import letters, lowercase, uppercase
lt = list(letters)
lc = list(lowercase)
uc = list(uppercase)
def gdf(rows, cols, seed=None):
"""rows and cols are what you'd pass
to pd.MultiIndex.from_product()"""
gmi = pd.MultiIndex.from_product
df = pd.DataFrame(index=gmi(rows), columns=gmi(cols))
np.random.seed(seed)
df.iloc[:, :] = np.random.rand(*df.shape)
return df
seed = [3, 1415]
df = gdf([lc, uc], [lc, uc], seed)
print df.head().T.head().T
df
looks like:
a
A B C D E
a A 0.444939 0.407554 0.460148 0.465239 0.462691
B 0.032746 0.485650 0.503892 0.351520 0.061569
C 0.777350 0.047677 0.250667 0.602878 0.570528
D 0.927783 0.653868 0.381103 0.959544 0.033253
E 0.191985 0.304597 0.195106 0.370921 0.631576
Lets use .at
and .loc
and ensure I get the same thing
print "using .loc", df.loc[('a', 'A'), ('c', 'C')]
print "using .at ", df.at[('a', 'A'), ('c', 'C')]
using .loc 0.37374090276
using .at 0.37374090276
Test speed using .loc
%%timeit
df.loc[('a', 'A'), ('c', 'C')]
10000 loops, best of 3: 180 µs per loop
Test speed using .at
%%timeit
df.at[('a', 'A'), ('c', 'C')]
The slowest run took 6.11 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 8 µs per loop
This looks to be a huge speed increase. Even at the caching stage 6.11 * 8
is a lot faster than 180
Question
What are the limitations of .at
? I'm motivated to use it. The documentation says it's similar to .loc
but it doesn't behave similarly. Example:
# small df
sdf = gdf([lc[:2]], [uc[:2]], seed)
print sdf.loc[:, :]
A B
a 0.444939 0.407554
b 0.460148 0.465239
where as print sdf.at[:, :]
results in TypeError: unhashable type
So obviously not the same even if the intent is to be similar.
That said, who can provide guidance on what can and cannot be done with the .at
method?