Questions tagged [r-factor]

The factor is a data type in the R language, used to encode categorical or enumerated data. This data type is often used in statistical models.

A factor is stored as a vector of integer codes, along with a lookup table of factor levels. The factor levels are represented as a vector of character strings. This representation allows easy conversion to character and efficient use in statistical computations.
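The encoding described above can be inspected directly in base R. The following is a minimal sketch; the vector `x` and its labels are made up for illustration and do not come from any of the questions listed here.

```r
# Hypothetical example data: a small factor built from character labels.
# With no explicit `levels` argument, the levels are the sorted unique values.
x <- factor(c("low", "high", "medium", "high"))

# The lookup table of levels is a character vector.
lv <- levels(x)

# The underlying integer codes index into that lookup table.
codes <- as.integer(x)

# Converting back to character is a lookup of each code in the levels.
labels_direct <- as.character(x)
labels_lookup <- lv[codes]

print(lv)
print(codes)
print(identical(labels_direct, labels_lookup))
```

Because the codes follow the order of the levels rather than the order in which values appear, passing an explicit `levels` argument to `factor()` changes the codes without changing the labels.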

437 questions

15 votes, 2 answers

How to remove ordering of the levels from factor variable in R?

The title says it all: I ordered a factor variable when I generated it, and now I would like to remove the ordering and use it as an unordered factor variable. Another question: if I use my factor variable as a predictor in a regression does it…
Pulse

14 votes, 2 answers

Convert Factor to Date/Time in R

This is the information contained within my dataframe: ## minuteofday: factor w/ 89501 levels "2013-06-01 08:07:00",... ## dDdt: num 7.8564 2.318 ... ## minutes: POSIXlt, format: NA NA NA I need to convert the minute of day column to a date/time…
Michelle

13 votes, 2 answers

How to convert factor levels to list, in R

Imagine a data frame such as df1 below: df1 <- data.frame(v1 = as.factor(c("m0p1", "m5p30", "m11p20", "m59p60", "m59p60"))) How do I create a list of all the levels of a variable? Thank you.
jpinelo

11 votes, 2 answers

How do I get discrete factor levels to be treated as continuous?

I have a data frame with columns initially labeled arbitrarily. Later on, I want to change these levels to numerical values. The following script illustrates the problem. library(ggplot2) library(reshape2) m <- 10 n <- 6 nam <-…
lafras

11 votes, 3 answers

Linear model (lm) when dependent variable is a factor/categorical variable?

I want to do linear regression with the lm function. My dependent variable is a factor called AccountStatus: 1:0 days in arrears, 2:30-60 days in arrears, 3:60-90 days in arrears and 4:90+ days in arrears. (4) As independent variable I have several…
Tim_Utrecht

11 votes, 4 answers

R: factor levels, recode rest to 'other'

I use factors somewhat infrequently and generally find them comprehensible, but I often am fuzzy about the details for specific operations. Currently, I am coding/collapsing categories with few observations into "other" and am looking for a quick…
ako

10 votes, 1 answer

geom_boxplot() from ggplot2 : forcing an empty level to appear

I can't find a way to ask ggplot2 to show an empty level in a boxplot without imputing my dataframe with actual missing values. Here is reproducible code : # fake data dftest <- expand.grid(time=1:10,measure=1:50) dftest$value <-…
Marc C

10 votes, 3 answers

Select row by level of a factor

I have a data frame, df2, containing observations grouped by an ID factor that I would like to subset. I have used another function to identify which rows within each factor group I want to select. This is shown below in df: df <- data.frame(ID…
Chris. Z

9 votes, 2 answers

Subset a factor by NA levels

I have a factor in R, with an NA level. set.seed(1) x <- sample(c(1, 2, NA), 25, replace=TRUE) x <- factor(x, exclude = NULL) > x [1] 1 2 2 1 2 2 1 1 [12] 1 2 2 2 1 …
Zach

9 votes, 1 answer

Best way in R to pick which level is the base category for a factor in an lm regression

Suppose I want to run a regression using lm and a factor as a right hand side variable. What is the best way to choose which level in the factor is the base category (the one that is excluded to avoid multicollinearity). Note that I am not…
Xu Wang

9 votes, 4 answers

counting unique factors in r

I would like to know the number of unique dams which gave birth on each of the birth dates recorded. My data frame is similar to this one: dam <- c("2A11","2A11","2A12","2A12","2A12","4D23","4D23","1X23") bdate <-…
baz

9 votes, 1 answer

How do I quickly find out whether two (large) factors are relabelings of each other?

I have two vectors of factors and suspect that they carry the same information up to relabeling. How can I find out whether this is correct? My problem is that both vectors are pretty long (200,000 entries), with a large number of levels (4,000).…
Stephan Kolassa

9 votes, 3 answers

How to make the levels of a factor in a data frame consistent across all columns?

I have a data frame with 5 different columns: Test1 Test2 Test3 Test4 Test5 Sample1 PASS PASS FAIL WARN WARN Sample2 PASS PASS FAIL PASS WARN Sample3 PASS FAIL FAIL PASS WARN Sample4 PASS FAIL …
gaelgarcia

9 votes, 2 answers

Order factor levels in order of appearance in data set

I have a survey in which a unique ID must be assigned to questions. Some questions appear multiple times. This means that there is an extra layer of questions. In the sample data below only the first layer is included. Question: how do I assign a…
Henk

9 votes, 5 answers

Why does R change the variable type when prepending NA values to a data frame with factors?

I have a problem with the way R coerces variable types when using rbind of two data.frames with NA values. I illustrate by…
tomka