Questions tagged [pandas]

pandas is a Python library for data manipulation and analysis, e.g. DataFrames, multidimensional time series, and cross-sectional datasets commonly found in statistics, experimental science, econometrics, or finance. It is one of the main data science libraries in Python.

pandas (the name derives from PANel DAta) is a Python library for manipulating and analyzing multidimensional time series and cross-sectional datasets commonly found in statistics, experimental science, econometrics, or finance. pandas is implemented primarily using NumPy and Cython; it is designed to integrate easily with NumPy-based scientific libraries, such as statsmodels.

To create a reproducible pandas example:

Main Features:

Data structures for one- and two-dimensional labeled datasets (Series and DataFrames, respectively). Their main features include:
Automatically aligning data and interpolation
Handling missing observations in calculations
Convenient slicing and reshaping ("reindexing") functions
Categorical data types
'Group by' aggregation and transformation functionality
Tools for merging/joining together data sets
Simple matplotlib integration for plotting and graphing
Multi-indexing, which gives indices a structure that can represent an arbitrary number of dimensions
Date tools: objects for expressing date offsets or generating date ranges; some functionality similar to scikits.timeseries. Dates can be aligned to a specific time zone and converted or compared at will
Statistical models: convenient ordinary least squares and panel OLS implementations for in-sample or rolling time series / cross-sectional regressions. These will hopefully be the starting point for implementing models
Intelligent Cython offloading; complex computations are performed rapidly due to these optimizations.
Static and moving statistical tools: mean, standard deviation, correlation, covariance
Rich user documentation, built with Sphinx
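
A few of the features listed above (label alignment, missing-value handling, group-by aggregation, and merging) can be sketched in a handful of lines. This is only an illustration on made-up data; the variable names and values are assumptions, not part of the tag wiki.

```python
import pandas as pd

# Two Series with partially overlapping labels (made-up data).
a = pd.Series([1.0, 2.0, 3.0], index=["x", "y", "z"])
b = pd.Series([10.0, 20.0], index=["y", "z"])

# Arithmetic aligns on labels; "x" has no partner in b, so the result is NaN there.
total = a + b

# Reductions skip missing values by default.
total_sum = total.sum()

# Group-by aggregation on a small DataFrame.
df = pd.DataFrame(
    {"team": ["red", "blue", "red", "blue"], "points": [50, 25, 90, 20]}
)
points_by_team = df.groupby("team")["points"].sum()

# Merging two tables on a shared key column.
cities = pd.DataFrame({"team": ["red", "blue"], "city": ["Oslo", "Lima"]})
merged = df.merge(cities, on="team", how="left")

print(total)
print(points_by_team)
print(merged)
```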

Asking Questions:

Before asking a question, make sure you have gone through the 10 Minutes to pandas introduction. It covers the basic functionality of pandas.
See this question on asking good questions: How to make good reproducible pandas examples
Please provide the versions of pandas and NumPy, and platform details (if appropriate), in your questions.
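
One possible way to collect the version details requested above is shown here. The `versions` dictionary is just an illustrative name; `pd.show_versions()` prints a fuller report if more detail is needed.

```python
import platform

import numpy as np
import pandas as pd

# Version strings worth pasting into a question.
versions = {
    "pandas": pd.__version__,
    "numpy": np.__version__,
    "python": platform.python_version(),
    "platform": platform.platform(),
}
for name, value in versions.items():
    print(f"{name}: {value}")

# pd.show_versions() prints a longer report that also covers optional dependencies.
```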

Answering Questions:

How can I effectively load data on Stack Overflow questions using pandas read_clipboard? (useful for copy pasting data from questions into your terminal as DataFrames)
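
As a rough sketch of the idea: `pd.read_clipboard()` reads whatever text is on the clipboard and parses it like `pd.read_csv` with a whitespace separator. Because a clipboard is not always available (for example on a headless machine), the example below parses the same kind of pasted text from a string instead. The sample table and the name `pasted` are assumptions for illustration.

```python
import io

import pandas as pd

# Text as it might be copied from a question (printed DataFrame with an index column).
pasted = """\
gi ptt_loc
0 384444683 593
1 384444684 594
2 384444686 596
"""

# With the table on the clipboard, this would be: df = pd.read_clipboard()
# The header has one field fewer than the rows, so the first column becomes the index.
df = pd.read_csv(io.StringIO(pasted), sep=r"\s+")

print(df)
```

Data whose values contain spaces usually needs a different `sep` or some manual cleanup before it parses correctly.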

Useful Canonicals:

Resources and Tutorials:

Books:

202712 questions

844 votes, 7 answers

Convert list of dictionaries to a pandas DataFrame

I have a list of dictionaries like this: [{'points': 50, 'time': '5:00', 'year': 2010}, {'points': 25, 'time': '6:00', 'month': "february"}, {'points':90, 'time': '9:00', 'month': 'january'}, {'points_h1':20, 'month': 'june'}] And I want to turn…

asked Dec 17 '13 at 15:24 by appleLover (reputation 11,323; badges 8, 30, 46)

840 votes, 13 answers

Pretty-print an entire Pandas Series / DataFrame

I work with Series and DataFrames on the terminal a lot. The default __repr__ for a Series returns a reduced sample, with some head and tail values, but the rest missing. Is there a builtin way to pretty-print the entire Series / DataFrame? …

python pandas dataframe

asked Oct 01 '13 at 19:46 by Dun Peal (reputation 12,539; badges 11, 27, 40)

817 votes, 8 answers

Writing a pandas DataFrame to CSV file

I have a dataframe in pandas which I would like to write to a CSV file. I am doing this using: df.to_csv('out.csv') And getting the error: UnicodeEncodeError: 'ascii' codec can't encode character u'\u03b1' in position 20: ordinal not in…

python csv pandas dataframe

asked Jun 04 '13 at 16:46 by user7289 (reputation 25,989; badges 27, 64, 86)

766 votes, 20 answers

How do I expand the output display to see more columns of a pandas DataFrame?

Is there a way to widen the display of output in either interactive or script-execution mode? Specifically, I am using the describe() function on a pandas DataFrame. When the DataFrame is 5 columns (labels) wide, I get the descriptive statistics…

python pandas printing column-width

asked Jul 29 '12 at 07:44 by beets (reputation 8,051; badges 4, 14, 11)

738 votes, 5 answers

How are iloc and loc different?

Can someone explain how these two methods of slicing are different? I've seen the docs, and I've seen these answers, but I still find myself unable to understand how the three are different. To me, they seem interchangeable in large part, because…

python pandas indexing dataframe

asked Jul 23 '15 at 16:34 by AZhao (reputation 11,271; badges 6, 26, 48)

691 votes, 13 answers

Deleting DataFrame row in Pandas based on column value

I have the following DataFrame: daysago line_race rating rw wrating line_date 2007-03-31 62 11 56 1.000000 56.000000 2007-03-10 83 11 …

python pandas

asked Aug 11 '13 at 14:14 by TravisVOX (reputation 15,002; badges 13, 31, 36)

645 votes, 19 answers

Combine two columns of text in pandas dataframe

I have a 20 x 4000 dataframe in Python using pandas. Two of these columns are named Year and quarter. I'd like to create a variable called period that makes Year = 2000 and quarter= q2 into 2000q2. Can anyone help with that?

python pandas dataframe

asked Oct 15 '13 at 09:42 by user2866103 (reputation 6,877; badges 6, 13, 13)

610 votes, 6 answers

Creating an empty Pandas DataFrame, then filling it?

I'm starting from the pandas DataFrame docs here: http://pandas.pydata.org/pandas-docs/stable/dsintro.html I'd like to iteratively fill the DataFrame with values in a time series kind of calculation. So basically, I'd like to initialize the…

python dataframe pandas

asked Dec 09 '12 at 02:50 by Matthias Kauer (reputation 7,007; badges 5, 15, 18)

599 votes, 22 answers

Set value for particular cell in pandas DataFrame using index

I've created a Pandas DataFrame df = DataFrame(index=['A','B','C'], columns=['x','y']) and got this x y A NaN NaN B NaN NaN C NaN NaN Then I want to assign value to particular cell, for example for row 'C' and column 'x'. I've…

python pandas dataframe cell nan

asked Dec 12 '12 at 14:40 by Mitkp (reputation 6,087; badges 3, 11, 7)

598 votes, 14 answers

Select by partial string from a pandas DataFrame

I have a DataFrame with 4 columns of which 2 contain string values. I was wondering if there was a way to select rows based on a partial string match against a particular column? In other words, a function or lambda function that would do something…

python string pandas dataframe

asked Jul 05 '12 at 18:57 by euforia (reputation 6,065; badges 3, 12, 5)

590 votes, 28 answers

How to count the NaN values in a column in pandas DataFrame

I want to find the number of NaN in each column of my data so that I can drop a column if it has fewer NaN than some threshold. I looked but wasn't able to find any function for this. value_counts is too slow for me because most of the values are…

python pandas dataframe

asked Oct 08 '14 at 21:00 by user3799307 (reputation 5,909; badges 3, 9, 3)

586 votes, 8 answers

How to filter Pandas dataframe using 'in' and 'not in' like in SQL

How can I achieve the equivalents of SQL's IN and NOT IN? I have a list with the required values. Here's the scenario: df = pd.DataFrame({'country': ['US', 'UK', 'Germany', 'China']}) countries_to_keep = ['UK', 'China'] #…

python pandas dataframe sql-function

asked Nov 13 '13 at 17:11 by LondonRob (reputation 53,478; badges 30, 110, 152)

583 votes, 8 answers

How to convert index of a pandas dataframe into a column

This seems rather obvious, but I can't seem to figure out how to convert an index of data frame to a column? For example: df= gi ptt_loc 0 384444683 593 1 384444684 594 2 384444686 596 To, df= index1 …

python pandas dataframe indexing series

asked Dec 09 '13 at 00:34 by msakya (reputation 6,911; badges 5, 20, 29)

580 votes, 10 answers

Shuffle DataFrame rows

I have the following DataFrame: Col1 Col2 Col3 Type 0 1 2 3 1 1 4 5 6 1 ... 20 7 8 9 2 21 10 11 12 2 ... 45 13 14 15 3 46 16 17 18 3 ... The DataFrame…

python pandas dataframe permutation shuffle

asked Apr 11 '15 at 09:47 by JNevens (reputation 8,412; badges 7, 35, 67)

571 votes, 8 answers

Get statistics for each group (such as count, mean, etc) using pandas GroupBy?

I have a data frame df and I use several columns from it to groupby: df['col1','col2','col3','col4'].groupby(['col1','col2']).mean() In the above way I almost get the table (data frame) that I need. What is missing is an additional column that…

python pandas dataframe group-by pandas-groupby

asked Oct 15 '13 at 15:00 by Roman (reputation 97,757; badges 149, 317, 426)
