Questions tagged [subset]

A subset consists of those elements selected from a larger set of elements, by their position in the larger set or other features, such as their value.

Definition:

From Wikipedia:

a set A is a subset of a set B, or equivalently B is a superset of A, if A is 'contained' inside B, that is, all elements of A are also elements of B.
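
As a minimal sketch of this definition (Python is used here only for illustration; the names `is_subset`, `a`, and `b` are hypothetical), the condition "all elements of A are also elements of B" can be checked directly:

```python
def is_subset(a, b):
    """Return True if every element of a is also an element of b."""
    b_set = set(b)  # assumption: elements are hashable
    return all(x in b_set for x in a)


a = {1, 5}
b = {1, 3, 5}

print(is_subset(a, b))  # a is contained in b
print(is_subset(b, a))  # b has an element (3) that a lacks
print(a <= b)           # Python's built-in set comparison expresses the same test
```

Under this definition the empty set is a subset of every set, because there is no element that could fail the check.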

Uses:

  • In R, subset is a function that selects a subset of elements from a vector, matrix, or data frame, given some logical expression (caution: subset drops rows for which the expression evaluates to NA; see How to subset data in R without losing NA rows?). However, for programmatic use (as opposed to interactive use) it is better to use the [ (or [[) operators or the filter function from dplyr. substring is used to extract parts of character strings.
  • In , a subset of an array can be obtained with array[indices].
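
The two selection styles described above, by position and by value, might look like this in Python (chosen only for illustration; the variable names are hypothetical). Plain Python lists do not accept a list of indices inside the brackets, so this sketch uses a comprehension for that case:

```python
values = [41, 36, 12, 18, 28]

# Select by position: a contiguous slice...
first_three = values[:3]

# ...or an arbitrary set of positions.
indices = [0, 2, 4]
by_position = [values[i] for i in indices]

# Select by value: keep only the elements that satisfy a condition.
by_value = [v for v in values if v > 20]

print(first_three)
print(by_position)
print(by_value)
```

In each case the result contains only elements of the original list, so it is a subset in the sense defined above.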
5825 questions
417
votes
2 answers

Why is `[` better than `subset`?

When I need to filter a data.frame, i.e., extract rows that meet certain conditions, I prefer to use the subset function: subset(airquality, Month == 8 & Temp > 90) Rather than the [ function: airquality[airquality$Month == 8 & airquality$Temp >…
flodel
  • 82,429
  • 18
  • 167
  • 205
327
votes
11 answers

How to drop columns by name in a data frame

I have a large data set and I would like to read specific columns or drop all the others. data <- read.dta("file.dta") I select the columns that I'm not interested in: var.out <- names(data)[!names(data) %in% c("iden", "name", "x_serv",…
leroux
  • 3,358
  • 3
  • 14
  • 8
298
votes
14 answers

Opposite of %in%: exclude rows with values specified in a vector

A categorical variable V1 in a data frame D1 can have values represented by the letters from A to Z. I want to create a subset D2, which excludes some values, say, B, N and T. Basically, I want a command which is the opposite of %in% D2 = subset(D1,…
user702432
  • 10,330
  • 16
  • 49
  • 68
183
votes
3 answers

How can I get the intersection, union, and subset of arrays in Ruby?

I want to create different methods for a class called Multiset. I have all the required methods, but I'm unsure of how to write intersection, union, and subset methods. For intersection and union, my code starts like this: def intersect(var) x =…
user487743
  • 2,007
  • 4
  • 16
  • 11
164
votes
9 answers

Filter data.frame rows by a logical condition

I want to filter rows from a data.frame based on a logical condition. Let's suppose that I have data frame like expr_value cell_type 1 5.345618 bj fibroblast 2 5.195871 bj fibroblast 3 5.247274 bj fibroblast 4 5.929771 …
lhahne
  • 5,269
  • 8
  • 31
  • 39
154
votes
10 answers

Check whether an array is a subset of another

Any idea on how to check whether one list is a subset of another? Specifically, I have List<int> t1 = new List<int> { 1, 3, 5 }; List<int> t2 = new List<int> { 1, 5 }; How to check that t2 is a subset of t1, using LINQ?
Graviton
  • 76,900
  • 138
  • 399
  • 575
150
votes
3 answers

Subset data frame based on multiple conditions

I wish to filter a data frame based on conditions in several columns. For example, how can I delete rows if column A = B and Column E = 0.
AME
  • 4,956
  • 20
  • 67
  • 78
120
votes
17 answers

Python: Check if one dictionary is a subset of another larger dictionary

I'm trying to write a custom filter method that takes an arbitrary number of kwargs and returns a list containing the elements of a database-like list that contain those kwargs. For example, suppose d1 = {'a':'2', 'b':'3'} and d2 = the same thing.…
Jamey
  • 3,550
  • 2
  • 20
  • 19
113
votes
3 answers

Check if list contains any of another list

I have a list of parameters like this: public class parameter { public string name {get; set;} public string paramtype {get; set;} public string source {get; set;} } IEnumerable parameters; And an array of strings I want to…
gdp
  • 7,194
  • 10
  • 36
  • 60
110
votes
4 answers

Selecting data frame rows based on partial string match in a column

I want to select rows from a data frame based on partial match of a string in a column, e.g. column 'x' contains the string "hsa". Using sqldf - if it had a like syntax - I would do something like: select * from <> where x like 'hsa'. Unfortunately,…
Asda
  • 1,125
  • 2
  • 9
  • 4
100
votes
6 answers

Subset of rows containing NA (missing) values in a chosen column of a data frame

We have a data frame from a CSV file. The data frame DF has columns that contain observed values and a column (VaR2) that contains the date at which a measurement has been taken. If the date was not recorded, the CSV file contains the value NA, for…
John
  • 1,121
  • 2
  • 9
  • 12
90
votes
1 answer

How to subset matrix to one column, maintain matrix data type, maintain row/column names?

When I subset a matrix to a single column, the result is of class numeric, not matrix (i.e. myMatrix[ , 5 ] to subset to the fifth column). Is there a compact way to subset to a single column, maintain the matrix format, and maintain the row/column…
SFun28
  • 32,209
  • 43
  • 123
  • 233
88
votes
1 answer

Select multiple elements from a list

I have a list in R some 10,000 elements long. Say I want to select only elements, 5, 7, and 9. I'm not sure how I would do that without a for loop. I want to do something like mylist[[c(5,7,9]] but that doesn't work. I've also tried the lapply…
user1357015
  • 9,314
  • 16
  • 56
  • 100
80
votes
1 answer

Undefined columns selected when subsetting data frame

I have a data frame, str(data) to show more about my data frame the result is the following: > str(data) 'data.frame': 153 obs. of 6 variables: $ Ozone : int 41 36 12 18 NA 28 23 19 8 NA ... $ Solar.R: int 190 118 149 313 NA NA 299 99 19 194…
CreamStat
  • 2,013
  • 6
  • 20
  • 42
78
votes
3 answers

Select rows from a data frame based on values in a vector

I have data similar to this: dt <- structure(list(fct = structure(c(1L, 2L, 3L, 4L, 3L, 4L, 1L, 2L, 3L, 1L, 2L, 3L, 2L, 3L, 4L), .Label = c("a", "b", "c", "d"), class = "factor"), X = c(2L, 4L, 3L, 2L, 5L, 4L, 7L, 2L, 9L, 1L, 4L, 2L, 5L, 4L, 2L)),…
Joe King
  • 2,463
  • 4
  • 22
  • 39