Questions tagged [pandas]

pandas is a Python library for manipulating and analyzing data such as DataFrames, multidimensional time series, and cross-sectional datasets commonly found in statistics, experimental science, econometrics, and finance. It is one of the main data science libraries in Python.

pandas is a Python library for panel data (PAN-el DA-ta) manipulation and analysis, i.e. for working with multidimensional time series and cross-sectional data sets commonly found in statistics, experimental science, econometrics, and finance. pandas is implemented primarily using NumPy and Cython and is designed to integrate easily with NumPy-based scientific libraries such as statsmodels.

Main Features:

  • Data structures for 1- and 2-dimensional labeled datasets (Series and DataFrames, respectively). Their main features include:
      • Automatic data alignment and interpolation
      • Handling of missing observations in calculations
      • Convenient slicing and reshaping ("reindexing") functions
      • Categorical data types
      • "Group by" aggregation and transformation functionality
      • Tools for merging and joining data sets
      • Simple matplotlib integration for plotting and graphing
      • Multi-indexing, which gives indices a structure that can represent an arbitrary number of dimensions
  • Date tools: objects for expressing date offsets or generating date ranges, with some functionality similar to scikits.timeseries. Dates can be aligned to a specific time zone and converted or compared at will.
  • Statistical models: convenient ordinary least squares and panel OLS implementations for in-sample or rolling time series and cross-sectional regressions, intended as a starting point for implementing models.
  • Intelligent Cython offloading: complex computations run quickly thanks to these optimizations.
  • Static and moving statistical tools: mean, standard deviation, correlation, covariance
  • Rich user documentation, built with Sphinx
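
A few of the data-structure features listed above (label alignment, missing-data handling, and "group by" aggregation) can be illustrated with a minimal sketch. The variable names and sample values here are made up for illustration and do not come from the tag description.

```python
import numpy as np
import pandas as pd

# Two Series whose index labels only partially overlap (illustrative data)
s1 = pd.Series([1.0, 2.0, 3.0], index=["a", "b", "c"])
s2 = pd.Series([10.0, 20.0, 30.0], index=["b", "c", "d"])

# Arithmetic aligns on index labels; a label missing on either side gives NaN
total = s1 + s2

# Reductions such as sum() skip NaN values by default
total_sum = total.sum()

# "Group by" aggregation: the mean of each group also ignores NaN
df = pd.DataFrame(
    {"group": ["x", "x", "y", "y"], "value": [1.0, np.nan, 3.0, 5.0]}
)
means = df.groupby("group")["value"].mean()

print(total)
print(total_sum)
print(means)
```

Only labels "b" and "c" appear in both Series, so the result is NaN for "a" and "d". If the default NaN-skipping is not wanted, reductions accept `skipna=False`.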

Asking Questions:

  • Before asking a question, make sure you have gone through the 10 Minutes to pandas introduction, which covers the basic functionality of pandas.
  • See this question on asking good questions: How to make good reproducible pandas examples
  • Please provide your pandas and NumPy versions, and platform details if relevant, in your question.
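
One possible way to collect the version and platform details requested above is sketched below. The dictionary name `version_info` is just an illustrative choice.

```python
import platform

import numpy as np
import pandas as pd

# Gather the details typically useful in a question
version_info = {
    "pandas": pd.__version__,
    "numpy": np.__version__,
    "python": platform.python_version(),
    "platform": platform.platform(),
}

for name, value in version_info.items():
    print(f"{name}: {value}")
```

pandas also provides `pd.show_versions()`, which prints a longer report that includes optional dependencies.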

202712 questions
2756 votes, 27 answers

How to iterate over rows in a DataFrame in Pandas

I have a DataFrame from Pandas: import pandas as pd inp = [{'c1':10, 'c2':100}, {'c1':11,'c2':110}, {'c1':12,'c2':120}] df = pd.DataFrame(inp) print df Output: c1 c2 0 10 100 1 11 110 2 12 120 Now I want to iterate over the rows of this…
Roman • 97,757 • 149 • 317 • 426
2539 votes, 11 answers

How to select rows from a DataFrame based on column values

How can I select rows from a DataFrame based on values in some column in Pandas? In SQL, I would use: SELECT * FROM table WHERE column_name = some_value. I tried to look at Pandas' documentation, but I did not immediately find the answer.
szli • 28,045 • 8 • 26 • 37
2216 votes, 29 answers

Renaming columns in Pandas

I have a DataFrame using Pandas and column labels that I need to edit to replace the original column labels. I'd like to change the column names in a DataFrame A where the original column names are: ['$a', '$b', '$c', '$d', '$e'] to ['a', 'b', 'c',…
user1504276 • 22,263 • 3 • 12 • 7
1647 votes, 17 answers

Delete column from pandas DataFrame

When deleting a column in a DataFrame I use: del df['column_name'] And this works great. Why can't I use the following? del df.column_name Since it is possible to access the column/Series as df.column_name, I expected this to work.
John • 32,659 • 27 • 74 • 102
1377 votes, 21 answers

Selecting multiple columns in a Pandas dataframe

I have data in different columns, but I don't know how to extract it to save it in another variable. index a b c 1 2 3 4 2 3 4 5 How do I select 'a', 'b' and save it in to df1? I tried df1 = df['a':'b'] df1 = df.ix[:,…
user1234440 • 18,511 • 17 • 51 • 88
1275 votes, 15 answers

How do I get the row count of a Pandas DataFrame?

I'm trying to get the number of rows of dataframe df with Pandas, and here is my code. Method 1: total_rows = df.count print total_rows + 1 Method 2: total_rows = df['First_columnn_label'].count print total_rows + 1 Both the code snippets give me…
yemu • 18,591 • 8 • 25 • 29
1155 votes, 20 answers

Get list from pandas DataFrame column headers

I want to get a list of the column headers from a pandas DataFrame. The DataFrame will come from user input so I won't know how many columns there will be or what they will be called. For example, if I'm given a DataFrame like this: >>>…
natsuki_2002 • 19,933 • 18 • 42 • 49
1133 votes, 28 answers

Adding new column to existing DataFrame in Python pandas

I have the following indexed DataFrame with named columns and non-continuous row numbers: a b c d 2 0.671399 0.101208 -0.181532 0.241273 3 0.446172 -0.243316 0.051767 1.577318 5 0.614758 0.075793 -0.451460…
tomasz74 • 13,747 • 10 • 32 • 49
1117 votes, 38 answers

How to change the order of DataFrame columns?

I have the following DataFrame (df): import numpy as np import pandas as pd df = pd.DataFrame(np.random.rand(10, 5)) I add more column(s) by assignment: df['mean'] = df.mean(1) How can I move the column mean to the front, i.e. set it as first…
Timmie • 11,359 • 3 • 12 • 7
1116 votes, 30 answers

Create pandas Dataframe by appending one row at a time

I understand that pandas is designed to load a fully populated DataFrame, but I need to create an empty DataFrame and then add rows one by one. What is the best way to do this? I successfully created an empty DataFrame with: res =…
PhE • 12,544 • 3 • 18 • 18
1086 votes, 15 answers

"Large data" workflows using pandas

I have tried to puzzle out an answer to this question for many months while learning pandas. I use SAS for my day-to-day work and it is great for its out-of-core support. However, SAS is horrible as a piece of software for numerous other…
Zelazny7 • 35,102 • 16 • 63 • 76
1037 votes, 10 answers

Change column type in pandas

I want to convert a table, represented as a list of lists, into a Pandas DataFrame. As an extremely simplified example: a = [['a', '1.2', '4.2'], ['b', '70', '0.03'], ['x', '5', '0']] df = pd.DataFrame(a) What is the best way to convert the columns…
user1642513
983 votes, 12 answers

How to drop rows of Pandas DataFrame whose value in a certain column is NaN

I have this DataFrame and want only the records whose EPS column is not NaN: >>> df STK_ID EPS cash STK_ID RPT_Date 601166 20111231 601166 NaN NaN 600036 20111231 600036 NaN 12 600016 20111231 600016 …
bigbug • 40,984 • 35 • 71 • 92
905 votes, 3 answers

Use a list of values to select rows from a pandas dataframe

Let's say I have the following pandas dataframe: df = DataFrame({'A' : [5,6,3,4], 'B' : [1,2,3, 5]}) df A B 0 5 1 1 6 2 2 3 3 3 4 5 I can subset based on a specific value: x = df[df['A'] == 3] x A B 2 3 …
zach • 22,141 • 16 • 57 • 86
868 votes, 15 answers

How to deal with SettingWithCopyWarning in Pandas

Background: I just upgraded my Pandas from 0.11 to 0.13.0rc1. Now the application is producing many new warnings. One of them looks like this: E:\FinReporter\FM_EXT.py:449: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a…
bigbug • 40,984 • 35 • 71 • 92