Consider this much simpler case that doesn't involve circe or generic derivation at all:
```scala
package demo

import org.openjdk.jmh.annotations._

@State(Scope.Thread)
@BenchmarkMode(Array(Mode.Throughput))
class OrderingBench {
  val items: List[(Char, Int)] = List('z', 'y', 'x').zipWithIndex
  val tupleOrdering: Ordering[(Char, Int)] = implicitly

  @Benchmark
  def sortWithResolved(): List[(Char, Int)] = items.sorted

  @Benchmark
  def sortWithVal(): List[(Char, Int)] = items.sorted(tupleOrdering)
}
```
On Scala 2.11 on my desktop machine I get this:
```
Benchmark                        Mode  Cnt         Score        Error  Units
OrderingBench.sortWithResolved  thrpt   40  15940745.279 ± 102634.860  ops/s
OrderingBench.sortWithVal       thrpt   40  16420078.932 ± 102901.418  ops/s
```
And if you look at allocations the difference is a little bigger:
```
Benchmark                                            Mode  Cnt    Score  Error  Units
OrderingBench.sortWithResolved:gc.alloc.rate.norm  thrpt   20  176.000 ± 0.001   B/op
OrderingBench.sortWithVal:gc.alloc.rate.norm       thrpt   20  152.000 ± 0.001   B/op
```
You can tell what's going on by breaking out `reify`:

```scala
scala> val items: List[(Char, Int)] = List('z', 'y', 'x').zipWithIndex
items: List[(Char, Int)] = List((z,0), (y,1), (x,2))

scala> import scala.reflect.runtime.universe._
import scala.reflect.runtime.universe._

scala> showCode(reify(items.sorted).tree)
res0: String = $read.items.sorted(Ordering.Tuple2(Ordering.Char, Ordering.Int))
```
The `Ordering.Tuple2` here is a generic method that instantiates an `Ordering[(Char, Int)]`. This is exactly the same thing that happens when we define our `tupleOrdering`, but the difference is that in the `val` case it happens once, while in the case where it's resolved implicitly it happens every time `sorted` is called.
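You can observe the fresh allocation directly with a reference-equality check (a small sketch; `ResolutionDemo` is a made-up name, and this reflects Scala 2 behavior, where `Ordering.Tuple2` is an implicit `def`):

```scala
object ResolutionDemo {
  // Each implicit resolution expands to
  // Ordering.Tuple2(Ordering.Char, Ordering.Int),
  // which allocates a fresh Tuple2Ordering instance.
  def resolvedTwice: (Ordering[(Char, Int)], Ordering[(Char, Int)]) =
    (implicitly[Ordering[(Char, Int)]], implicitly[Ordering[(Char, Int)]])

  // Capturing the resolved instance in a val pays the allocation exactly once.
  val cached: Ordering[(Char, Int)] = implicitly[Ordering[(Char, Int)]]
}
```

Two separate resolutions give two distinct instances, while the `val` is always the same object.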
So the difference you're seeing is just the cost of instantiating the `Decoder` instance on every operation, as opposed to instantiating it a single time, outside the benchmarked code, before the benchmark starts. This cost is relatively tiny, and for larger benchmarks it will be harder to see.
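The same fix applies to any instance that's nontrivial to materialize: resolve it once into a `val` and pass it explicitly on the hot path. A minimal sketch under assumed names (`Point`, `sortPoints`, and `CachedInstance` are made up for illustration; in the circe case the `val` would hold the derived `Decoder` instead):

```scala
object CachedInstance {
  // Hypothetical record type standing in for a decoded case class.
  final case class Point(x: Int, y: Int)

  // Resolved and allocated once, at initialization time, rather than on
  // every call; this mirrors the sortWithVal benchmark above.
  val pointOrdering: Ordering[Point] = Ordering.by((p: Point) => (p.x, p.y))

  // Hot path: passes the cached instance explicitly.
  def sortPoints(ps: List[Point]): List[Point] = ps.sorted(pointOrdering)
}
```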