Questions tagged [dataframe]

A data frame is a 2D tabular data structure. Usually, it contains data where rows are observations and columns are variables and are allowed to be of different types (as distinct from an array or matrix). While "data frame" or "dataframe" is the term used for this concept in several languages (R, Apache Spark, deedle, Maple, the pandas library in Python and the DataFrames library in Julia), "table" is the term used in MATLAB and SQL.

A data frame is a tabular data structure. Usually, it contains data where rows are observations and columns are variables of various types. While data frame or dataframe is the term used for this concept in several languages (R, Apache Spark, deedle, Maple, the pandas library in Python and the DataFrames library in Julia), table is the term used in MATLAB and SQL.

The sections below correspond to each language that uses this term and are aimed at the level of an audience only familiar with the given language.

`data.frame` in R

Data frames (object class data.frame) are one of the basic tabular data structures in the R language, alongside matrices. Unlike matrices, each column can be a different data type. In terms of implementation, a data frame is a list of equal-length column vectors.

Type ?data.frame for help constructing a data frame. An example:

data.frame(
  x = letters[1:5], 
  y = 1:5, 
  z = (1:5) > 3
)
#   x y     z
# 1 a 1 FALSE
# 2 b 2 FALSE
# 3 c 3 FALSE
# 4 d 4  TRUE
# 5 e 5  TRUE

Related functions include is.data.frame, which tests whether an object is a data.frame; and as.data.frame, which coerces many other data structures to data.frame (through S3 dispatch, see ?S3). base r data.frames have been extended or modified to create new data structures by several R packages, including data.table and tibble. For further reading, see the paragraph on Data frames in the CRAN manual Intro to R

DataFrame in Python's pandas library

The pandas library in Python is the canonical tabular data framework on the SciPy stack, and the DataFrame is its two-dimensional data object. It is basically a rectangular array like a 2D numpy ndarray, but with associated indices on each axis which can be used for alignment. As in R, from an implementation perspective, columns are somewhat prioritized over rows: the DataFrame resembles a dictionary with column names as keys and Series (pandas' one-dimensional data structure) as values.

After importing numpy and pandas under the usual aliases (import numpy as np, import pandas as pd), we can construct a DataFrame in several ways, such as passing a dictionary of column names and values:

>>> pd.DataFrame({"x": list("abcde"), "y": range(1,6), "z": np.arange(1,6) > 3})
   x  y      z
0  a  1  False
1  b  2  False
2  c  3  False
3  d  4   True
4  e  5   True

DataFrame in Apache Spark

A Spark DataFrame is a distributed collection of data organized into named columns. It is conceptually equivalent to a table in a relational database or a data frame in R/Python, but with richer optimizations under the hood. DataFrames can be constructed from a wide array of sources such as: structured data files, tables in Hive, external databases, or existing RDDs. (source)

DataFrame in Maple

A DataFrame is one of the basic data structures in Maple. Data frames are a list of variables, known as DataSeries, which are displayed in a rectangular grid. Every column (variable) in a DataFrame has the same length, however, each variable can have a different type, such as integer, float, string, name, boolean, etc.

When printed, Data frames resemble matrices in that they are viewed as a rectangular grid, but a key difference is that the first row corresponds to the column (variable) names, and the first column corresponds to the row (individual) names. These row and columns are treated as header meta-information and are not a part of the data. Moreover, the data stored in a DataFrame can be accessed using these header names, as well as by the standard numbered index. For more details, see the Guide to DataFrames in the online Maple Programming Help.

95558 questions

931

votes

16 answers

Remove rows with all or some NAs (missing values) in data.frame

I'd like to remove the lines in this data frame that: a) contain NAs across all columns. Below is my example data frame. gene hsap mmul mmus rnor cfam 1 ENSG00000208234 0 NA NA NA NA 2 ENSG00000199674 0 2 2 2 …

r dataframe filter missing-data r-faq

asked Feb 01 '11 at 11:52

Benoit B.

10,466
8
23
29

905

votes

3 answers

Use a list of values to select rows from a pandas dataframe

Lets say I have the following pandas dataframe: df = DataFrame({'A' : [5,6,3,4], 'B' : [1,2,3, 5]}) df A B 0 5 1 1 6 2 2 3 3 3 4 5 I can subset based on a specific value: x = df[df['A'] == 3] x A B 2 3 …

python pandas dataframe

asked Aug 23 '12 at 16:31

zach

22,141
16
57
86

868

votes

15 answers

How to deal with SettingWithCopyWarning in Pandas

Background I just upgraded my Pandas from 0.11 to 0.13.0rc1. Now, the application is popping out many new warnings. One of them like this: E:\FinReporter\FM_EXT.py:449: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a…

python pandas dataframe chained-assignment

asked Dec 17 '13 at 03:48

bigbug

40,984
35
71
92

844

votes

7 answers

Convert list of dictionaries to a pandas DataFrame

I have a list of dictionaries like this: [{'points': 50, 'time': '5:00', 'year': 2010}, {'points': 25, 'time': '6:00', 'month': "february"}, {'points':90, 'time': '9:00', 'month': 'january'}, {'points_h1':20, 'month': 'june'}] And I want to turn…

python dictionary pandas dataframe

asked Dec 17 '13 at 15:24

appleLover

11,323
8
30
46

840

votes

13 answers

Pretty-print an entire Pandas Series / DataFrame

I work with Series and DataFrames on the terminal a lot. The default __repr__ for a Series returns a reduced sample, with some head and tail values, but the rest missing. Is there a builtin way to pretty-print the entire Series / DataFrame? …

python pandas dataframe

asked Oct 01 '13 at 19:46

Dun Peal

12,539
11
27
40

819

votes

22 answers

How do I replace NA values with zeros in an R dataframe?

I have a data frame and some columns have NA values. How do I replace these NA values with zeroes?

r dataframe na missing-data imputation

asked Nov 17 '11 at 03:45

Renato Dinhani

30,005
49
125
194

817

votes

8 answers

Writing a pandas DataFrame to CSV file

I have a dataframe in pandas which I would like to write to a CSV file. I am doing this using: df.to_csv('out.csv') And getting the error: UnicodeEncodeError: 'ascii' codec can't encode character u'\u03b1' in position 20: ordinal not in…

python csv pandas dataframe

asked Jun 04 '13 at 16:46

user7289

25,989
27
64
86

738

votes

5 answers

How are iloc and loc different?

Can someone explain how these two methods of slicing are different? I've seen the docs, and I've seen these answers, but I still find myself unable to understand how the three are different. To me, they seem interchangeable in large part, because…

python pandas indexing dataframe

asked Jul 23 '15 at 16:34

AZhao

11,271
6
26
48

645

votes

19 answers

Combine two columns of text in pandas dataframe

I have a 20 x 4000 dataframe in Python using pandas. Two of these columns are named Year and quarter. I'd like to create a variable called period that makes Year = 2000 and quarter= q2 into 2000q2. Can anyone help with that?

python pandas dataframe

asked Oct 15 '13 at 09:42

user2866103

6,877
6
13
13

610

votes

6 answers

Creating an empty Pandas DataFrame, then filling it?

I'm starting from the pandas DataFrame docs here: http://pandas.pydata.org/pandas-docs/stable/dsintro.html I'd like to iteratively fill the DataFrame with values in a time series kind of calculation. So basically, I'd like to initialize the…

python dataframe pandas

asked Dec 09 '12 at 02:50

Matthias Kauer

7,007
5
15
18

599

votes

22 answers

Set value for particular cell in pandas DataFrame using index

I've created a Pandas DataFrame df = DataFrame(index=['A','B','C'], columns=['x','y']) and got this x y A NaN NaN B NaN NaN C NaN NaN Then I want to assign value to particular cell, for example for row 'C' and column 'x'. I've…

python pandas dataframe cell nan

asked Dec 12 '12 at 14:40

Mitkp

6,087
3
11
7

598

votes

14 answers

Select by partial string from a pandas DataFrame

I have a DataFrame with 4 columns of which 2 contain string values. I was wondering if there was a way to select rows based on a partial string match against a particular column? In other words, a function or lambda function that would do something…

python string pandas dataframe

asked Jul 05 '12 at 18:57

euforia

6,065
3
12
5

590

votes

28 answers

How to count the NaN values in a column in pandas DataFrame

I want to find the number of NaN in each column of my data so that I can drop a column if it has fewer NaN than some threshold. I looked but wasn't able to find any function for this. value_counts is too slow for me because most of the values are…

python pandas dataframe

asked Oct 08 '14 at 21:00

user3799307

5,909
3
9
3

586

votes

8 answers

How to filter Pandas dataframe using 'in' and 'not in' like in SQL

How can I achieve the equivalents of SQL's IN and NOT IN? I have a list with the required values. Here's the scenario: df = pd.DataFrame({'country': ['US', 'UK', 'Germany', 'China']}) countries_to_keep = ['UK', 'China'] #…

python pandas dataframe sql-function

asked Nov 13 '13 at 17:11

LondonRob

53,478
30
110
152

583

votes

8 answers

How to convert index of a pandas dataframe into a column

This seems rather obvious, but I can't seem to figure out how to convert an index of data frame to a column? For example: df= gi ptt_loc 0 384444683 593 1 384444684 594 2 384444686 596 To, df= index1 …

python pandas dataframe indexing series

asked Dec 09 '13 at 00:34

msakya

6,911
5
20
29

Prev 1

…

99 100 Next

Questions tagged [dataframe]

data.frame in R

DataFrame in Python's pandas library

DataFrame in Apache Spark

DataFrame in Maple

`data.frame` in R