1

I'm aware of the general ins and outs of the different OO systems in R and I'm all for staying in S3 whenever/as long as I can.

However, the need for multiple dispatch keeps haunting me in professional projects and kept pulling me into S4 in the past.

But I refuse to accept that I can't find a S3 emulation of multi dispatch à la S4 that is at least able to support my primary use case of avoiding unsystematic data drift:

  1. My analytical programs create data structures that I publish to stakeholders for them to consume as part of some data pipelines
  2. Usually I publish to a DBMS such as MongoDB, thus my structures somewhat manifest themselves and could/should be thought of as DB schemas
  3. I follow a "release early, release often" approach to stay agile, so It's natural that "things change" - often and over time
  4. As data structures eventually evolve, I want to be explicit about describing the structure so I can define internal migration routines and/or make stakeholders aware of which version of data structures they are dealing with in the DB

If I go about describing my data structures via some versioning scheme (which is the best I could think of so far), I basically need to base my methods on three signature arguments regarding method dispatch:

  1. One for the input (class) of a given data transformation task
  2. One for the version of the input (class)
  3. One for the version of the output (class)

Due diligence

Apart from the OO field guide I've read:

My approach

Sequential S3-based approach for an input_class and an output_class that both can evolve over time structure-wise, which is captures by respective version classes such as v1, v2 etc.

Step 1: Define necessary base components

# Generic that dispatches on `x`
new_output_class <- function(x, schema_version_in, schema_version_out, ...) {
  UseMethod("new_output_class", x)
}

# Generic that dispatches on 'schema_version_in' and 
# can thus be used to define distinct methods for 
# different versions of 'input_class'
new_output_class__input_class <- function(x, schema_version_in, 
  schema_version_out, ...) {
  UseMethod("new_output_class__input_class", schema_version_in)
}

# Generic that dispatches on 'schema_version_out' and 
# can thus be used to define distinct methods for 
# different versions of 'output_class' for inputs of class `input_class`
# with version class `v1`
new_output_class__input_class__v1 <- function(x, schema_version_in, 
  schema_version_out, ...) {
  UseMethod("new_output_class__input_class__v1", schema_version_out)
}

# Top-level method for dispatch based on `input_class`
new_output_class.input_class <- function(
  x, 
  schema_version_in,
  schema_version_out
) {
  message("Level 1: method for `input_class`")

  new_output_class__input_class(x, schema_version_in, schema_version_out)
}

Step 2: Define methods for the initial case of input_class = v1 and output_class = v1

library(magrittr)
library(stringr)

# Method for `input_class` of version `v1`
new_output_class__input_class.v1 <- function(
  x, 
  schema_version_in,
  schema_version_out
) {
  # message("Sub-level 1: `input_class/in:v1`")
  message(stringr::str_glue("Level 2: method for `input_class/in:{schema_version_in}`"))

  new_output_class__input_class__v1(x, schema_version_in, schema_version_out)
}

# Method for `output_class` of version `v1` for inputs 
# of `input_class` of version `v1`
new_output_class__input_class__v1.v1 <- function(
  x, 
  schema_version_in,
  schema_version_out
) {
  message(stringr::str_glue("Level 3: method for `input_class/in:{schema_version_in}/out:{schema_version_out}`"))

  structure(x, 
    class = c(stringr::str_glue("output_class_{schema_version_out}"),
      "output_class")
  )
}

x <- structure(letters[1:3], class = "input_class")

schema_version_of_x <- structure("v1", class = "v1")
schema_version_of_new_output <- structure("v1", class = "v1")

new_output_class(x, schema_version_of_x, schema_version_of_new_output)
# Level 1: method for `input_class`
# Level 2: method for `input_class/in:v1`
# Level 3: method for `input_class/in:v1/out:v1`
# [1] "a" "b" "c"
# attr(,"class")
# [1] "output_class_v1" "output_class"   

Step 3: Handling evolution of output_class to v2

# Method for `output_class` of version `v2` for inputs 
# of `input_class` of version `v1`
new_output_class__input_class__v1.v2 <- function(
  x,
  schema_version_in,
  schema_version_out
) {
  message(stringr::str_glue("Level 3: method for `input_class/in:{schema_version_in}/out:{schema_version_out}`"))

  structure(x %>% rep(2),
    class = c(stringr::str_glue("output_class_{schema_version_out}"),
      "output_class")
  )
}

schema_version_of_new_output <- structure("v2", class = "v2")
new_output_class(x, schema_version_of_x, schema_version_of_new_output)
# Level 1: method for `input_class`
# Level 2: method for `input_class/in:v1`
# Level 3: method for `input_class/in:v1/out:v2`
# [1] "a" "b" "c" "a" "b" "c"
# attr(,"class")
# [1] "output_class_v2" "output_class"   

Step 4: Handling evolution of input_class to v2

# Generic for input `input_class` of version `v2`
new_output_class__input_class__v2 <- function(x, schema_version_in, 
  schema_version_out, ...) {
  UseMethod("new_output_class__input_class__v2", schema_version_out)
}

# Method for `input_class` of version `v2`
new_output_class__input_class.v2 <- function(
  x, 
  schema_version_in,
  schema_version_out
) {
  message(stringr::str_glue("Level 2: method for `input_class/in:{schema_version_in}`"))

  new_output_class__input_class__v2(x, schema_version_in, schema_version_out)
}

# Method for `output_class` of version `v1` for inputs 
# of `input_class` of version `v2`
new_output_class__input_class__v2.v1 <- function(
  x, 
  schema_version_in,
  schema_version_out
) {
  message(stringr::str_glue("Level 3: method for `input_class/in:{schema_version_in}/out:{schema_version_out}`"))

  structure(x %>% tolower(),
    class = c(stringr::str_glue("output_class_{schema_version_out}"),
      "output_class")
  )
}

x <- structure(LETTERS[1:3], class = "input_class")
schema_version_of_x <- structure("v2", class = "v2")
schema_version_of_new_output <- structure("v1", class = "v1")
new_output_class(x, schema_version_of_x, schema_version_of_new_output)
# Level 1: method for `input_class`
# Level 2: method for `input_class/in:v2`
# Level 3: method for `input_class/in:v2/out:v1`
# [1] "a" "b" "c"
# attr(,"class")
# [1] "output_class_v1" "output_class"   

# Method for `output_class` of version `v2` for inputs 
# of `input_class` of version `v2`
new_output_class__input_class__v2.v2 <- function(
  x, 
  schema_version_in,
  schema_version_out
) {
  message(stringr::str_glue("Level 3: method for `input_class/in:{schema_version_in}/out:{schema_version_out}`"))

  structure(x %>% tolower() %>% rep(2),
    class = c(stringr::str_glue("output_class_{schema_version_out}"),
      "output_class")
  )
}

schema_version_of_x <- structure("v2", class = "v2")
schema_version_of_new_output <- structure("v2", class = "v2")
new_output_class(x, schema_version_of_x, schema_version_of_new_output)
# Level 1: method for `input_class`
# Level 2: method for `input_class/in:v2`
# Level 3: method for `input_class/in:v2/out:v2`
# [1] "a" "b" "c" "a" "b" "c"
# attr(,"class")
# [1] "output_class_v2" "output_class"  

Pros

  1. It works
  2. It's S3
  3. It's systematic

Cons

  1. It's even more verbose than any S4-based solution
  2. I don't like it at all

Did anyone come up with a better emulation of multi dispatch in S3 yet?

Rappster
  • 11,680
  • 7
  • 58
  • 113

0 Answers0