Questions tagged [dataframe]

A data frame is a 2D tabular data structure. Usually, it contains data where rows are observations and columns are variables and are allowed to be of different types (as distinct from an array or matrix). While "data frame" or "dataframe" is the term used for this concept in several languages (R, Apache Spark, deedle, Maple, the pandas library in Python and the DataFrames library in Julia), "table" is the term used in MATLAB and SQL.

The sections below cover each language that uses this term and are aimed at readers familiar only with that language.

`data.frame` in R

Data frames (object class data.frame) are one of the basic tabular data structures in the R language, alongside matrices. Unlike matrices, each column can be a different data type. In terms of implementation, a data frame is a list of equal-length column vectors.

Type ?data.frame for help constructing a data frame. An example:

data.frame(
  x = letters[1:5], 
  y = 1:5, 
  z = (1:5) > 3
)
#   x y     z
# 1 a 1 FALSE
# 2 b 2 FALSE
# 3 c 3 FALSE
# 4 d 4  TRUE
# 5 e 5  TRUE

Related functions include is.data.frame, which tests whether an object is a data.frame, and as.data.frame, which coerces many other data structures to a data.frame (through S3 dispatch; see ?S3). Several R packages, including data.table and tibble, have extended or modified base R data frames to create new data structures. For further reading, see the section on data frames in the CRAN manual An Introduction to R.

DataFrame in Python's pandas library

The pandas library in Python is the canonical tabular data framework on the SciPy stack, and the DataFrame is its two-dimensional data object. It is basically a rectangular array like a 2D numpy ndarray, but with associated indices on each axis which can be used for alignment. As in R, from an implementation perspective, columns are somewhat prioritized over rows: the DataFrame resembles a dictionary with column names as keys and Series (pandas' one-dimensional data structure) as values.
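To make the dictionary analogy and the label-based alignment concrete, here is a minimal sketch. The column names, index labels and values are invented purely for illustration.

```python
import pandas as pd

# Hypothetical example data: two columns, three labelled rows
df = pd.DataFrame(
    {"a": [1, 2, 3], "b": [10.0, 20.0, 30.0]},
    index=["r1", "r2", "r3"],
)

# Dictionary-like access: a column name acts as a key and returns a Series
col = df["a"]
print(type(col).__name__)   # Series
print(list(df.keys()))      # ['a', 'b'] (keys() returns the column labels)

# Alignment: arithmetic matches values by index label, not by position
s = pd.Series([100, 200], index=["r3", "r1"])
aligned = df["a"] + s
print(aligned)  # r1 -> 201.0, r2 -> NaN (no matching label in s), r3 -> 103.0
```

Because `r2` has no counterpart in `s`, its result is missing (`NaN`) rather than being paired with a value from some other row by position.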

After importing numpy and pandas under the usual aliases (import numpy as np, import pandas as pd), we can construct a DataFrame in several ways, such as passing a dictionary of column names and values:

>>> pd.DataFrame({"x": list("abcde"), "y": range(1,6), "z": np.arange(1,6) > 3})
   x  y      z
0  a  1  False
1  b  2  False
2  c  3  False
3  d  4   True
4  e  5   True
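Each column keeps its own dtype, which is what sets a DataFrame apart from a 2D numpy ndarray. The sketch below reuses the same frame; the exact dtype name reported for the string column may differ between pandas versions.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": list("abcde"), "y": range(1, 6), "z": np.arange(1, 6) > 3})

# One dtype per column: strings, integers and booleans side by side
print(df.dtypes)

# A 2D ndarray holds a single dtype, so mixed columns are upcast to object
arr = df.to_numpy()
print(arr.shape, arr.dtype)   # (5, 3) object

# Columns can still be selected by dtype, e.g. only the numeric ones
# (booleans are not counted as "number")
print(df.select_dtypes("number").columns.tolist())   # ['y']
```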

DataFrame in Apache Spark

A Spark DataFrame is a distributed collection of data organized into named columns. It is conceptually equivalent to a table in a relational database or a data frame in R/Python, but with richer optimizations under the hood. DataFrames can be constructed from a wide array of sources such as: structured data files, tables in Hive, external databases, or existing RDDs. (source)

DataFrame in Maple

A DataFrame is one of the basic data structures in Maple. A DataFrame is a list of variables, known as DataSeries, displayed in a rectangular grid. Every column (variable) in a DataFrame has the same length; however, each variable can have a different type, such as integer, float, string, name or boolean.

When printed, DataFrames resemble matrices in that they are displayed as a rectangular grid, but a key difference is that the first row holds the column (variable) names and the first column holds the row (individual) names. This header row and column are treated as meta-information and are not part of the data. Moreover, the data stored in a DataFrame can be accessed using these header names as well as by the standard numbered index. For more details, see the Guide to DataFrames in the online Maple Programming Help.

95558 questions

4 answers

How to partition when ranking on a particular column?

All: I have a data frame like the follow.I know I can do a global rank order like this: dt <- data.frame( ID = c('A1','A2','A4','A2','A1','A4','A3','A2','A1','A3'), Value = c(4,3,1,3,4,6,6,1,8,4) ); > dt ID Value 1 A1 4 2 A2 3 3…

r dataframe rank database-partitioning

asked Apr 01 '12 at 03:39

RobinMin

3 answers

Random sample of rows from subset of an R dataframe

Is there a good way of getting a sample of rows from part of a dataframe? If I just have data such as gender <- c("F", "M", "M", "F", "F", "M", "F", "F") age <- c(23, 25, 27, 29, 31, 33, 35, 37) then I can easily sample the ages of three of the…

r dataframe sample

asked Mar 09 '12 at 02:29

Henry

7 answers

Using one data.frame to update another

Given 2 data frames that are identical in terms of column names/datatypes, where some columns uniquely identify the rows, is there an efficient function/method for one data.frame to "update" the other? For example, in the following, original and…

r indexing dataframe

asked Nov 01 '11 at 19:04

SFun28

2 answers

Identify records in data frame A not contained in data frame B

This is my first time posting here, so please be kind ;-) EDIT My question was closed before I had a chance to make the changes suggested to me. So I'm trying to do a better job now, thanks for everyone that answered so far! QUESTION How can I…

r join merge match dataframe

asked Oct 11 '11 at 15:23

Rappster

3 answers

Unpickling dictionary that holds pandas dataframes throws AttributeError: 'Dataframe' object has no attribute '_data'

I have a class that performs analyses and attaches the results, which are pandas dataframes, as object attributes: >>> print(test.image.locate_DF) y x mass ... raw_mass ep frame 0 60.177142 59.788709 …

python-3.x pandas dataframe pickle

asked Aug 24 '20 at 10:12

Steven

0 answers

Speed up odbc::dbFetch

I'm trying to analyze data stored in an SQL database (MS SQL server) in R, and on a mac. Typical queries might return a few GB of data, and the entire database is a few TB. So far, I've been using the R package odbc, and it seems to work pretty…

r sql-server dataframe odbc fread

asked Mar 04 '20 at 23:30

Michael

3 answers

Get the nearest distance with two geodataframe in pandas

Here is my first geodatframe : !pip install geopandas import pandas as pd import geopandas city1 = [{'City':"Buenos Aires","Country":"Argentina","Latitude":-34.58,"Longitude":-58.66}, …

python pandas dataframe geolocation geopandas

asked Jan 26 '20 at 17:15

Bussiere

4 answers

Calculate percentage of similar values in pandas dataframe

I have one dataframe df, with two columns : Script (with text) and Speaker Script Speaker aze Speaker 1 art Speaker 2 ghb Speaker 3 jka Speaker 1 tyc Speaker 1 avv Speaker 2 bhj Speaker 1 And I have the following list…

python python-3.x pandas dataframe

asked Dec 27 '19 at 15:36

Alex Dana

1 answer

PySpark DataFrame Column Reference: df.col vs. df['col'] vs. F.col('col')?

I have a concept I hope you can help to clarify: What's the difference between the following three ways of referring to a column in PySpark dataframe. I know different situations need different forms, but not sure why. df.col:…

dataframe reference pyspark

asked Mar 11 '19 at 15:32

Zilong Z

3 answers

Delete group if NaN is present anywhere in multiple columns

I am trying to clean my dataframe such that if my "Base_2007" and "Base_2011" column contains NA, then I should completely drop that county. In my case since both Counties contains NA both of them will be dropped. Thus empty dataset will be…

python pandas dataframe

asked Feb 14 '19 at 03:41

Data_is_Power

4 answers

How to check if a pandas dataframe contains only numeric column wise?

I want to check every column in a dataframe whether it contains only numeric. How can i find it.

python pandas dataframe series

asked Jan 29 '19 at 17:47

Raja Sahe S

2 answers

Check if all elements in a group are equal using pandas GroupBy

Is there a pythonic way to group by a field and check if all elements of each resulting group have the same value? Sample data: datetime rating signal 0 2018-12-27 11:33:00 IG 0 1 2018-12-27 11:33:00 HY -1 2 …

python pandas dataframe group-by pandas-groupby

asked Dec 27 '18 at 21:07

Yuca

2 answers

How can I populate a pandas DataFrame with the result of a Snowflake sql query?

Using the Python Connector I can query Snowflake: import snowflake.connector # Gets the version ctx = snowflake.connector.connect( user=USER, password=PASSWORD, account=ACCOUNT, authenticator='https://XXXX.okta.com', …

pandas dataframe snowflake-cloud-data-platform

asked Nov 02 '18 at 07:58

RubenLaguna

3 answers

Pandas: how to merge two dataframes on a column by keeping the information of the first one?

I have two dataframes df1 and df2. df1 contains the information of the age of people, while df2 contains the information of the sex of people. Not all the people are in df1 nor in df2 df1 Name Age 0 Tom 34 1 Sara 18 2 Eva …

python pandas dataframe

asked Oct 26 '18 at 13:59

emax

5 answers

Pandas DataFrame check if column value exists in a group of columns

I have a DataFrame like this (simplified example) id v0 v1 v2 v3 v4 1 10 5 10 22 50 2 22 23 55 60 50 3 8 2 40 80 110 4 15 15 25 100 101 And would like to create an additional column that is either 1 or 0. 1 if v0…

python pandas numpy dataframe

asked Sep 18 '18 at 19:50

EGM8686
