pandas is a Python library for data manipulation and analysis. Its name comes from "panel data", and it targets the labeled tabular data, multidimensional time series, and cross-sectional datasets commonly found in statistics, experimental science, econometrics, and finance. pandas is one of the main data science libraries in Python.
pandas is implemented primarily using NumPy and Cython, and it is intended to integrate easily with NumPy-based scientific libraries such as statsmodels.
To create a reproducible pandas example:
- How to make good reproducible pandas examples
- How to provide a reproducible copy of your DataFrame with to_clipboard()
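As a minimal sketch of what such an example might look like, the snippet below builds a small DataFrame inline and prints a constructor-ready copy that can be pasted into a question. The column names and values are made up for illustration.

```python
import pandas as pd

# Hypothetical toy data: small enough to paste into a question in full.
df = pd.DataFrame({
    "key": ["a", "b", "a", "c"],
    "value": [1.0, 2.5, 3.0, 4.0],
})

# A dict-of-lists representation that readers can pass back to pd.DataFrame.
as_dict = df.to_dict(orient="list")
print(f"df = pd.DataFrame({as_dict})")

# Round trip: rebuilding from the printed dict gives an equal frame.
rebuilt = pd.DataFrame(as_dict)
```

When a system clipboard is available, `df.to_clipboard()` copies the frame there instead, which suits larger tables.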
Main Features:
- Data structures for 1- and 2-dimensional labeled datasets (Series and DataFrames, respectively). Some of their main features include:
- Automatically aligning data and interpolation
- Handling missing observations in calculations
- Convenient slicing and reshaping ("reindexing") functions
- Categorical data types
- 'Group by' aggregation or transformation functionality
- Tools for merging/joining together data sets
- Simple matplotlib integration for plotting and graphing
- Multi-indexing: hierarchical indices that allow an arbitrary number of dimensions to be represented
- Date tools: objects for expressing date offsets or generating date ranges; some functionality similar to scikits.timeseries. Dates can be aligned to a specific time zone and converted or compared at will
- Statistical models: early versions of pandas included convenient ordinary least squares and panel OLS implementations for in-sample or rolling time series / cross-sectional regressions. These were later removed from pandas; libraries such as statsmodels now cover regression modeling
- Cython-accelerated internals: performance-critical computations are implemented in Cython so that they run quickly
- Static and moving statistical tools: mean, standard deviation, correlation, covariance
- Rich User Documentation, using Sphinx
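A few of the features listed above can be sketched in a handful of lines. This is only an illustrative example with made-up data (the region and manager names are hypothetical), not a tour of the full API.

```python
import pandas as pd

# Automatic alignment: arithmetic matches on index labels, not position.
s1 = pd.Series([1.0, 2.0, 3.0], index=["a", "b", "c"])
s2 = pd.Series([10.0, 20.0, 30.0], index=["b", "c", "d"])
aligned_sum = s1 + s2  # labels present on only one side become NaN

# Missing observations are skipped by default in reductions.
total = aligned_sum.sum()

# Group-by aggregation.
sales = pd.DataFrame({
    "region": ["north", "south", "north", "south"],
    "units": [3, 5, 7, 11],
})
units_by_region = sales.groupby("region")["units"].sum()

# Merging two data sets on a shared key column.
managers = pd.DataFrame({"region": ["north", "south"], "manager": ["Ann", "Bo"]})
merged = sales.merge(managers, on="region", how="left")

# Date ranges and moving statistics.
dates = pd.date_range("2024-01-01", periods=5, freq="D")
ts = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0], index=dates)
rolling_mean = ts.rolling(window=3).mean()  # first two windows are incomplete
```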
Asking Questions:
- Before asking the question, make sure you have gone through the 10 Minutes to pandas introduction. It covers all the basic functionality of pandas.
- See this question on asking good questions: How to make good reproducible pandas examples
- Please provide the version of pandas, NumPy, and platform details (if appropriate) in your questions
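One way to collect the version and platform details is sketched below; the `version_info` dictionary is just an illustrative container, not a pandas convention.

```python
import platform

import numpy as np
import pandas as pd

# Gather the details that are usually worth quoting in a question.
version_info = {
    "pandas": pd.__version__,
    "numpy": np.__version__,
    "python": platform.python_version(),
    "platform": platform.platform(),
}

for name, value in version_info.items():
    print(f"{name}: {value}")
```

`pd.show_versions()` prints a fuller report that also covers optional dependencies.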
Answering Questions:
- How can I effectively load data on Stack Overflow questions using pandas read_clipboard? (useful for copy pasting data from questions into your terminal as DataFrames)
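`pd.read_clipboard()` passes the clipboard text to `read_csv`, using a whitespace separator by default. The sketch below mimics that step without a clipboard by reading the same kind of text from a string; the table contents are invented for illustration.

```python
import io

import pandas as pd

# Hypothetical table text as it might appear in a question.
pasted = """\
name  score  passed
ann   91.5   True
bob   78.0   False
"""

# Reading from a string with a whitespace separator approximates what
# read_clipboard does with copied text when no clipboard is available.
df = pd.read_csv(io.StringIO(pasted), sep=r"\s+")
print(df)
```

With the table copied to the clipboard, `df = pd.read_clipboard()` would typically give the same frame.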
Useful Canonicals:
- How to pivot a dataframe?
- Pandas Merging 101
- How to deal with SettingWithCopyWarning in Pandas
- What are the 'levels', 'keys', and names arguments for in Pandas' concat function?
- Selecting multiple columns in a Pandas dataframe
- Delete column from pandas DataFrame
- How to iterate over rows in a DataFrame in Pandas
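For orientation, the operations these canonicals discuss can be sketched on a toy frame as follows. The data and column names are hypothetical, and the linked questions cover the many variations and pitfalls in far more depth.

```python
import pandas as pd

df = pd.DataFrame({
    "day": ["mon", "mon", "tue", "tue"],
    "city": ["x", "y", "x", "y"],
    "temp": [20, 25, 22, 27],
    "note": ["", "", "", ""],
})

# Pivot: one row per day, one column per city.
wide = df.pivot(index="day", columns="city", values="temp")

# Select multiple columns with a list of names.
subset = df[["day", "temp"]]

# Delete a column (returns a new frame; the original is unchanged).
trimmed = df.drop(columns="note")

# Assign through .loc on the frame itself to avoid SettingWithCopyWarning.
df.loc[df["city"] == "x", "temp"] += 1

# concat with keys labels each input in an outer index level.
stacked = pd.concat([subset, subset], keys=["first", "second"])

# Iterate over rows as namedtuples; vectorized operations are usually faster.
labels = [f"{row.day}-{row.city}" for row in df.itertuples(index=False)]
```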
More FAQs at this link.