Questions tagged [dataframe]

A data frame is a 2D tabular data structure. Usually, its rows are observations and its columns are variables, and the columns are allowed to be of different types (as distinct from an array or matrix). While "data frame" or "dataframe" is the term used for this concept in several languages and libraries (R, Apache Spark, Deedle, Maple, the pandas library in Python and the DataFrames library in Julia), "table" is the term used in MATLAB and SQL.

Each section below covers one language that uses this term and is aimed at readers who are familiar only with that language.

`data.frame` in R

Data frames (object class data.frame) are one of the basic tabular data structures in the R language, alongside matrices. Unlike matrices, each column can be a different data type. In terms of implementation, a data frame is a list of equal-length column vectors.

Type ?data.frame for help constructing a data frame. An example:

data.frame(
  x = letters[1:5], 
  y = 1:5, 
  z = (1:5) > 3
)
#   x y     z
# 1 a 1 FALSE
# 2 b 2 FALSE
# 3 c 3 FALSE
# 4 d 4  TRUE
# 5 e 5  TRUE

Related functions include is.data.frame, which tests whether an object is a data.frame, and as.data.frame, which coerces many other data structures to data.frame (through S3 dispatch, see ?S3). Base R data frames have been extended or modified to create new data structures by several R packages, including data.table and tibble. For further reading, see the paragraph on data frames in the CRAN manual An Introduction to R.

DataFrame in Python's pandas library

The pandas library in Python is the canonical tabular data framework on the SciPy stack, and the DataFrame is its two-dimensional data object. It is basically a rectangular array like a 2D numpy ndarray, but with associated indices on each axis which can be used for alignment. As in R, from an implementation perspective, columns are somewhat prioritized over rows: the DataFrame resembles a dictionary with column names as keys and Series (pandas' one-dimensional data structure) as values.

After importing numpy and pandas under the usual aliases (import numpy as np, import pandas as pd), we can construct a DataFrame in several ways, such as passing a dictionary of column names and values:

>>> pd.DataFrame({"x": list("abcde"), "y": range(1,6), "z": np.arange(1,6) > 3})
   x  y      z
0  a  1  False
1  b  2  False
2  c  3  False
3  d  4   True
4  e  5   True
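The dictionary-like structure and the index alignment described above can be illustrated with a minimal sketch. The Series names a and b and their values are made up for illustration; only the df columns come from the example above.

```python
import pandas as pd

# A smaller version of the frame constructed above.
df = pd.DataFrame({"x": list("abcde"), "y": range(1, 6)})

# Dictionary-like access: each column label maps to a Series.
y = df["y"]
print(type(y).__name__)   # the column is a pandas Series
print(list(df.keys()))    # column labels play the role of dict keys
print("y" in df)          # membership tests check the column labels

# Alignment: arithmetic pairs values by index label, not by position.
# a and b are hypothetical Series holding the same labels in opposite order.
a = pd.Series([1, 2, 3], index=[0, 1, 2])
b = pd.Series([10, 20, 30], index=[2, 1, 0])
total = a + b

# Label 0 pairs a's 1 with b's 30, label 2 pairs a's 3 with b's 10.
print(total.loc[0], total.loc[1], total.loc[2])
```

Because values are matched on labels, reordering one operand does not change the result for any given label. This is the main behavioural difference from adding two plain numpy arrays.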

DataFrame in Apache Spark

A Spark DataFrame is a distributed collection of data organized into named columns. It is conceptually equivalent to a table in a relational database or a data frame in R/Python, but with richer optimizations under the hood. DataFrames can be constructed from a wide array of sources, such as structured data files, tables in Hive, external databases, or existing RDDs. (source)

DataFrame in Maple

A DataFrame is one of the basic data structures in Maple. A data frame is a list of variables, known as DataSeries, which are displayed in a rectangular grid. Every column (variable) in a DataFrame has the same length; however, each variable can have a different type, such as integer, float, string, name, boolean, etc.

When printed, data frames resemble matrices in that they are viewed as a rectangular grid, but a key difference is that the first row corresponds to the column (variable) names, and the first column corresponds to the row (individual) names. These row and column names are treated as header meta-information and are not a part of the data. Moreover, the data stored in a DataFrame can be accessed using these header names, as well as by the standard numbered index. For more details, see the Guide to DataFrames in the online Maple Programming Help.

95558 questions

2756

votes

27 answers

How to iterate over rows in a DataFrame in Pandas

I have a DataFrame from Pandas: import pandas as pd inp = [{'c1':10, 'c2':100}, {'c1':11,'c2':110}, {'c1':12,'c2':120}] df = pd.DataFrame(inp) print df Output: c1 c2 0 10 100 1 11 110 2 12 120 Now I want to iterate over the rows of this…

python pandas dataframe

asked May 10 '13 at 07:04

Roman

2539

votes

11 answers

How to select rows from a DataFrame based on column values

How can I select rows from a DataFrame based on values in some column in Pandas? In SQL, I would use: SELECT * FROM table WHERE colume_name = some_value I tried to look at Pandas' documentation, but I did not immediately find the answer.

python pandas dataframe

asked Jun 12 '13 at 17:42

szli

2216

votes

29 answers

Renaming columns in Pandas

I have a DataFrame using Pandas and column labels that I need to edit to replace the original column labels. I'd like to change the column names in a DataFrame A where the original column names are: ['$a', '$b', '$c', '$d', '$e'] to ['a', 'b', 'c',…

python pandas replace dataframe rename

asked Jul 05 '12 at 14:21

user1504276

1647

votes

17 answers

Delete column from pandas DataFrame

When deleting a column in a DataFrame I use: del df['column_name'] And this works great. Why can't I use the following? del df.column_name Since it is possible to access the column/Series as df.column_name, I expected this to work.

python pandas dataframe

asked Nov 16 '12 at 06:26

John

1396

votes

19 answers

How to sort a dataframe by multiple column(s)

I want to sort a data.frame by multiple columns. For example, with the data.frame below I would like to sort by column z (descending) then by column b (ascending): dd <- data.frame(b = factor(c("Hi", "Med", "Hi", "Low"), levels = c("Low",…

r sorting dataframe r-faq

asked Aug 18 '09 at 21:33

Christopher DuBois

1377

votes

21 answers

Selecting multiple columns in a Pandas dataframe

I have data in different columns, but I don't know how to extract it to save it in another variable. index a b c 1 2 3 4 2 3 4 5 How do I select 'a', 'b' and save it in to df1? I tried df1 = df['a':'b'] df1 = df.ix[:,…

python pandas dataframe select

asked Jul 01 '12 at 21:03

user1234440

1358

votes

13 answers

How to join (merge) data frames (inner, outer, left, right)

Given two data frames: df1 = data.frame(CustomerId = c(1:6), Product = c(rep("Toaster", 3), rep("Radio", 3))) df2 = data.frame(CustomerId = c(2, 4, 6), State = c(rep("Alabama", 2), rep("Ohio", 1))) df1 # CustomerId Product # 1 Toaster # …

r join merge dataframe r-faq

asked Aug 19 '09 at 13:18

Dan Goldstein

1275

votes

15 answers

How do I get the row count of a Pandas DataFrame?

I'm trying to get the number of rows of dataframe df with Pandas, and here is my code. Method 1: total_rows = df.count print total_rows + 1 Method 2: total_rows = df['First_columnn_label'].count print total_rows + 1 Both the code snippets give me…

python pandas dataframe

asked Apr 11 '13 at 08:14

yemu

1155

votes

20 answers

Get list from pandas DataFrame column headers

I want to get a list of the column headers from a pandas DataFrame. The DataFrame will come from user input so I won't know how many columns there will be or what they will be called. For example, if I'm given a DataFrame like this: >>>…

python pandas dataframe

asked Oct 20 '13 at 21:18

natsuki_2002

1133

votes

28 answers

Adding new column to existing DataFrame in Python pandas

I have the following indexed DataFrame with named columns and rows labelled with non-continuous numbers: a b c d 2 0.671399 0.101208 -0.181532 0.241273 3 0.446172 -0.243316 0.051767 1.577318 5 0.614758 0.075793 -0.451460…

python pandas dataframe chained-assignment

asked Sep 23 '12 at 19:00

tomasz74

1117

votes

38 answers

How to change the order of DataFrame columns?

I have the following DataFrame (df): import numpy as np import pandas as pd df = pd.DataFrame(np.random.rand(10, 5)) I add more column(s) by assignment: df['mean'] = df.mean(1) How can I move the column mean to the front, i.e. set it as first…

python pandas dataframe

asked Oct 30 '12 at 22:22

Timmie

1116

votes

30 answers

Create pandas Dataframe by appending one row at a time

I understand that pandas is designed to load a fully populated DataFrame, but I need to create an empty DataFrame and then add rows, one by one. What is the best way to do this? I successfully created an empty DataFrame with: res =…

python pandas dataframe append

asked May 23 '12 at 08:12

PhE

1037

votes

10 answers

Change column type in pandas

I want to convert a table, represented as a list of lists, into a Pandas DataFrame. As an extremely simplified example: a = [['a', '1.2', '4.2'], ['b', '70', '0.03'], ['x', '5', '0']] df = pd.DataFrame(a) What is the best way to convert the columns…

python pandas dataframe types casting

asked Apr 08 '13 at 23:53

user1642513

983

votes

12 answers

How to drop rows of Pandas DataFrame whose value in a certain column is NaN

I have this DataFrame and want only the records whose EPS column is not NaN: >>> df STK_ID EPS cash STK_ID RPT_Date 601166 20111231 601166 NaN NaN 600036 20111231 600036 NaN 12 600016 20111231 600016 …

python pandas dataframe nan

asked Nov 16 '12 at 09:17

bigbug

941

votes

20 answers

Drop data frame columns by name

I have a number of columns that I would like to remove from a data frame. I know that we can delete them individually using something like: df$x <- NULL But I was hoping to do this with fewer commands. Also, I know that I could drop columns using…

r dataframe r-faq

asked Jan 05 '11 at 14:34

Btibert3
