1
data frame A1
Type  Worker
A  SHARON
B  RUN
C  JACK
D  NICK
K  EMILY

data frame A2
Type  Rate
A     0.39
A     0.78
A     0.22
A     -1.09

...

K     0.56
K     0.73
K     0.88
K     0.04
K     0.69

This is my coding.

K = merge(A1, A2,all.x=T)
Type  Rate
A     0.39
B     0.78
C     0.22
D     -1.09
K     -1.28
A     0.93
B     0.39
...

A     0.56
B     0.73
C     0.88
D     0.04
K     0.69

But my desired output is sorted by worker and type then show the corresponding value. Any ideas on different length of data frame merge?

3 Answers3

0

This can be achieved by the tidyverse way.

I'll create a dummy data similar to what you have:

library(dplyr)

A1 <- tibble(
  Type = c('A', 'B', 'C', 'D', 'K'),
  Worker = c('SHARON', 'RUN', 'JACK', 'NICK', 'EMILY')
)

A2 <- tibble(
  Type = sample(c('A', 'B', 'C', 'D', 'K'), size = 100, replace = TRUE),
  Rate = rnorm(100, mean = 0.8, sd = 1)
)
A2
## A tibble: 100 x 2
#   Type    Rate
#   <chr>  <dbl>
# 1 K      1.76 
# 2 A      1.07 
# 3 A      2.24 
# 4 K      2.53 
# 5 D      1.92 
# 6 B     -0.294
# 7 D      1.08 
# 8 D      1.94 
# 9 A      1.27 
#10 A     -0.224
## ... with 90 more rows

so you want to merge (or join) two data frames:

A1 %>% right_join(A2, by = 'Type')
## A tibble: 100 x 3
#   Type  Worker   Rate
#   <chr> <chr>   <dbl>
# 1 A     SHARON  1.07 
# 2 A     SHARON  2.24 
# 3 A     SHARON  1.27 
# 4 A     SHARON -0.224
# 5 A     SHARON  1.50 
# 6 A     SHARON -0.352
# 7 A     SHARON  1.85 
# 8 A     SHARON  0.444
# 9 A     SHARON -0.340
#10 A     SHARON -0.181
## ... with 90 more rows

the resulting table contains the 100 observations I created from A2 and a new column with the worker name.

NOTE tibble is a enhanced data.frame version. https://tibble.tidyverse.org/

0

We can use full_join and arrange from dplyr package.

library(dplyr)
K <- A1 %>% 
  full_join(A2, by = "Type") %>% 
  arrange(Type, Worker)

Output:

  # A tibble: 12 x 3
   Type  Worker  Rate
   <chr> <chr>  <dbl>
 1 A     SHARON  0.39
 2 A     SHARON  0.78
 3 A     SHARON  0.22
 4 A     SHARON -1.09
 5 B     RUN    NA   
 6 C     JACK   NA   
 7 D     NICK   NA   
 8 K     EMILY   0.56
 9 K     EMILY   0.73
10 K     EMILY   0.88
11 K     EMILY   0.04
12 K     EMILY   0.69
TarJae
  • 8,026
  • 2
  • 8
  • 25
0

If we want to use data.table, on which operations in R are generally faster, we can first merge A1 and A2, setting by the "by" field as "Type" in order to retain all of the variables after we merge. Then order by worker and type, the data.table way:

 A1 <- data.table(
    Type = c('A', 'B', 'C', 'D', 'K'),
     Worker = c('SHARON', 'RUN', 'JACK', 'NICK', 'EMILY')
 )
 
A2 <- data.table(
     Type = sample(c('A', 'B', 'C', 'D', 'K'), size = 100, replace = TRUE),
     Rate = rnorm(100, mean = 0.8, sd = 1)
 )

K = merge(A1, A2, by = "Type", all.x = TRUE)
setorder(K, "Worker", "Type")

Then we get the following results:

K

Type Worker        Rate
  1:    K  EMILY  3.03436918
  2:    K  EMILY -0.43926123
  3:    K  EMILY  1.51949381
  4:    K  EMILY  0.68492265
  5:    K  EMILY  2.21078017
  6:    K  EMILY  2.83141791
  7:    K  EMILY -0.19588209
  8:    K  EMILY -0.93980497
  9:    K  EMILY  0.55941370
 10:    K  EMILY  0.68850875
 11:    K  EMILY  0.07640872
 12:    K  EMILY  1.93482533
 13:    K  EMILY  1.18638298
 14:    K  EMILY  0.76216663
 15:    K  EMILY  0.28404814
 16:    K  EMILY  0.82054706
 17:    K  EMILY  0.51706340
 18:    K  EMILY -0.52274670
 19:    K  EMILY  1.44324550
 20:    K  EMILY -0.53222671
 21:    K  EMILY  0.45271088
 22:    K  EMILY -1.77868504
 23:    K  EMILY  0.04365219
 24:    K  EMILY  2.03788999
 25:    K  EMILY -0.60246720
 26:    C   JACK  1.98340019
 27:    C   JACK  0.80640231
 28:    C   JACK  1.42432058
...
gabesolomon10
  • 195
  • 1
  • 10