I'm looking for a way to quickly perform in-place subsetting of my data frames in R.
Say I have n dfs of varying lengths, all containing 3 columns (chr, start, end) that together describe genomic regions.
I created a vector of the legitimate chr labels for this purpose, let's say ["chr1", "chr2", ... , "chr12"].
Some (or all) of my dfs have rows that contain invalid chromosome names (e.g. "chr4_UG5432_random" or "chrX"; the names don't really matter - just that they don't appear in my vector of "valid" labels), and I want to efficiently filter out the rows with these invalid labels.
So the best solution I've found so far is putting them all in a list and using lapply on them with
subset(df, chr %in% paste0("chr", 1:12))
and I understand that afterwards I can use the function list2env to retrieve the variables "holding" the modified dfs.
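For reference, here is a minimal sketch of that list-based approach. The data frames df1 and df2 and their contents are made up for illustration, and valid_chr is just a name I'm using here for the vector of valid labels:

```r
# Toy data frames; df1, df2 and their values are made up for illustration
df1 <- data.frame(chr   = c("chr1", "chrX", "chr2"),
                  start = c(100, 200, 300),
                  end   = c(150, 250, 350))
df2 <- data.frame(chr   = c("chr4_UG5432_random", "chr12"),
                  start = c(10, 20),
                  end   = c(15, 25))

# Vector of valid labels (valid_chr is a hypothetical name)
valid_chr <- paste0("chr", 1:12)

# Put the data frames in a named list; the names must match the variable names
dfs <- list(df1 = df1, df2 = df2)

# Keep only the rows whose chr value is one of the valid labels
dfs <- lapply(dfs, function(df) subset(df, chr %in% valid_chr))

# Write the filtered data frames back over the original variables
invisible(list2env(dfs, envir = environment()))
```

After this runs, df1 and df2 hold only the rows with valid labels, but it relies on the list names matching the variable names exactly.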
But I'm sure there is a much simpler way to perform this filtering in place for each data frame, without having to throw them all in a list. Any help is appreciated!