The discussion in that particular StackOverflow thread spans several earlier questions, so instead of digging through them, we will just take the latest benchmark code and wrap it up with JMH. JMH already has bindings for both Java and Scala, which somewhat alleviates the differences in testing methodology. You can find the full benchmark code here (warning: it contains spoilers).
The original benchmark may be expressed with the help of JMH as follows:
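import org.openjdk.jmh.annotations._
import scala.annotation.tailrec

@State(Scope.Benchmark)
class ScalaBench {

  @Param(Array("1", "5", "10", "15", "20"))
  var lim: Int = _

  // True when v is divisible by every integer from div up to lim.
  @tailrec private def isEvenlyDivisible(v: Int, div: Int, lim: Int): Boolean = {
    if (div > lim) true
    else (v % div == 0) && isEvenlyDivisible(v, div + 1, lim)
  }

  @Benchmark
  def test(): Int = {
    var v = 10
    while (!isEvenlyDivisible(v, 2, lim))
      v += 2
    v
  }
}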
What does this thing do? This code looks for the first even integer, starting from 10, that is evenly divisible by all integers in [2..lim]. isEvenlyDivisible is the tail-recursive implementation of the search through div.
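To make the semantics concrete, here is a minimal plain-Java sketch of the same search, run outside of JMH. The class name DivisibleSearch and the method name firstMatch are made up for this illustration and are not part of the original benchmark.

```java
// Illustrative only: a plain-Java restatement of the search, without JMH.
// DivisibleSearch and firstMatch are hypothetical names, not from the benchmark.
final class DivisibleSearch {

    // True when v is divisible by every integer in [div..lim];
    // vacuously true when div > lim.
    static boolean isEvenlyDivisible(int v, int div, int lim) {
        if (div > lim) return true;
        return (v % div == 0) && isEvenlyDivisible(v, div + 1, lim);
    }

    // Walks the even integers upward from 10, the same way test() does.
    static int firstMatch(int lim) {
        int v = 10;
        while (!isEvenlyDivisible(v, 2, lim)) v += 2;
        return v;
    }

    public static void main(String[] args) {
        // lim = 20 is left out here because it takes on the order of a second.
        for (int lim : new int[] {1, 5, 10, 15}) {
            System.out.println("lim=" + lim + " -> " + firstMatch(lim));
        }
    }
}
```

For lim = 1 the check is vacuous and the answer is 10; for lim = 5, 10 and 15 the answers are 60, 2520 and 360360. The number of candidates visited grows with the answer, which is why the run time climbs so steeply with lim.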
The straight rewrite of the same benchmark in Java yields:
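import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Param;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;

@State(Scope.Benchmark)
public class JavaBench {

    @Param({"1", "5", "10", "15", "20"})
    int lim;

    // True when val is divisible by every integer from div up to lim.
    private boolean isEvenlyDivisible(int val, int div, int lim) {
        if (div > lim)
            return true;
        else
            return (val % div == 0) && isEvenlyDivisible(val, div + 1, lim);
    }

    @Benchmark
    public int test() {
        int val = 10;
        while (!isEvenlyDivisible(val, 2, lim))
            val += 2;
        return val;
    }
}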
Okay, now we have two seemingly similar benchmarks, so let's run them, shall we? But before we do that, let us test our intuition and make some predictions. Most people would predict that, since JVMs do not have tail recursion elimination yet and are also rather pessimistic about inlining recursive calls, the @tailrec variant, which scalac itself rewrites into a loop, would deliver the performance of an ordinary loop. Therefore, one would conclude that the Scala benchmark should win big time over the Java benchmark.
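For readers who have not seen tail recursion elimination spelled out, the sketch below shows the idea by hand in Java: the self-call in tail position becomes a jump back to the top of the method with updated arguments. This is a hand-written approximation of the transformation, not the bytecode scalac actually emits, and the names LoopForm, recursive and looped are invented for this example.

```java
// Illustrative only: a hand-written loop form of the tail-recursive check.
// This approximates the idea of tail recursion elimination; it is NOT
// scalac's actual output. LoopForm, recursive and looped are hypothetical names.
final class LoopForm {

    // Recursive form, as written in the benchmark.
    static boolean recursive(int v, int div, int lim) {
        if (div > lim) return true;
        return (v % div == 0) && recursive(v, div + 1, lim);
    }

    // Loop form: instead of calling itself, the method updates div
    // and goes around again, so the stack depth stays constant.
    static boolean looped(int v, int div, int lim) {
        while (true) {
            if (div > lim) return true;
            if (v % div != 0) return false;
            div = div + 1;
        }
    }

    public static void main(String[] args) {
        System.out.println(recursive(2520, 2, 10) + " " + looped(2520, 2, 10));
        System.out.println(recursive(2518, 2, 10) + " " + looped(2518, 2, 10));
    }
}
```

Both forms return the same answer for every input; the prediction above rests on the assumption that the loop form is also the cheaper one to execute.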
Good, let’s see. Running this on my laptop (2x2 i5-2520M, 2.0 GHz, Linux x86_64, JDK 8 GA) yields:
Benchmark             (lim)  Mode  Samples        Score  Score error  Units
n.s.ScalaBench.test       1  avgt       15        0.002        0.000  us/op
n.s.ScalaBench.test       5  avgt       15        0.494        0.005  us/op
n.s.ScalaBench.test      10  avgt       15       24.228        0.268  us/op
n.s.ScalaBench.test      15  avgt       15     3457.733       33.070  us/op
n.s.ScalaBench.test      20  avgt       15  2505634.259    15366.665  us/op

Benchmark             (lim)  Mode  Samples        Score  Score error  Units
n.s.JavaBench.test        1  avgt       15        0.002        0.000  us/op
n.s.JavaBench.test        5  avgt       15        0.252        0.001  us/op
n.s.JavaBench.test       10  avgt       15       12.782        0.325  us/op
n.s.JavaBench.test       15  avgt       15     1615.890        7.647  us/op
n.s.JavaBench.test       20  avgt       15  1053187.731    20502.217  us/op
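That’s odd: the Scala benchmark is actually about twice as slow. Consistently. At this point, people try to muck the benchmark into showing what they want it to show. The obvious thing to try is inhibiting tail recursion elimination altogether. Since scalac only rewrites self-recursive tail calls in methods that cannot be overridden, dropping private along with @tailrec does the trick: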
--- ScalaBench.scala
+++ ScalaBenchNoTailrec.scala
@@ -10,12 +12,12 @@
 @State(Scope.Benchmark)
-class ScalaBench {
+class ScalaBenchNoTailrec {
   @Param(Array("1", "5", "10", "15", "20"))
   var lim: Int = _
-  @tailrec private def isEvenlyDivisible(v: Int, div: Int, lim: Int): Boolean = {
+  def isEvenlyDivisible(v: Int, div: Int, lim: Int): Boolean = {
     if (div > lim) true
     else (v % div == 0) && isEvenlyDivisible(v, div + 1, lim)
   }
This gets the performance back to Java level:
Benchmark                      (lim)  Mode  Samples        Score  Score error  Units
n.s.ScalaBenchNoTailrec.test       1  avgt       15        0.002        0.000  us/op
n.s.ScalaBenchNoTailrec.test       5  avgt       15        0.253        0.002  us/op
n.s.ScalaBenchNoTailrec.test      10  avgt       15       12.482        0.056  us/op
n.s.ScalaBenchNoTailrec.test      15  avgt       15     1615.506       11.100  us/op
n.s.ScalaBenchNoTailrec.test      20  avgt       15  1055345.330    10848.845  us/op
Everyone is happy, except for everyone else who is sad. "Why would such a nice scalac optimization turn the code into a slowish nightmare?", they would ask themselves.
