
From the docs for GHC 7.6:

[Y]ou often don't even need the SPECIALIZE pragma in the first place. When compiling a module M, GHC's optimiser (with -O) automatically considers each top-level overloaded function declared in M, and specialises it for the different types at which it is called in M. The optimiser also considers each imported INLINABLE overloaded function, and specialises it for the different types at which it is called in M.

and

Moreover, given a SPECIALIZE pragma for a function f, GHC will automatically create specialisations for any type-class-overloaded functions called by f, if they are in the same module as the SPECIALIZE pragma, or if they are INLINABLE; and so on, transitively.
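To make the two quoted behaviours concrete, here is a minimal single-module sketch. The names sumSq, sumSqInts and norm2 are hypothetical. Whether GHC actually specialises these small functions, rather than simply inlining them, is up to the optimiser. One way to check is to compile with -O and -ddump-simpl and look for specialised copies, which typically have names starting with `$s`.

```haskell
-- (1) Automatic specialisation: sumSq is a top-level overloaded function
-- that is called at a concrete type (Int) in this same module, so with -O
-- GHC's specialiser may create an Int-only copy of it.
sumSq :: Num a => [a] -> a
sumSq = foldr (\x acc -> x * x + acc) 0

sumSqInts :: [Int] -> Int
sumSqInts = sumSq

-- (2) Transitive specialisation via a pragma: norm2 is overloaded and
-- calls the overloaded helper sumSq at the type variable a, not at a
-- concrete type, so (1) alone does not cover that call.
norm2 :: Floating a => [a] -> a
norm2 xs = sqrt (sumSq xs)

-- Requesting a Double copy of norm2. Per the quoted docs, GHC should then
-- also specialise sumSq at Double, because sumSq is overloaded and is
-- defined in the same module as the pragma.
{-# SPECIALIZE norm2 :: [Double] -> Double #-}
```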

So GHC should automatically specialize some/most/all(?) functions marked INLINABLE without a pragma, and if I use an explicit pragma, the specialization is transitive. My question is: is the auto-specialization transitive?

Specifically, here's a small example:

Main.hs:

```haskell
import Data.Vector.Unboxed as U
import Foo

main =
    let y = Bar $ Qux $ U.replicate 11221184 0 :: Foo (Qux Int)
        (Bar (Qux ans)) = iterate (plus y) y !! 100
    in putStr $ show $ foldl1' (*) ans
```

Foo.hs:

```haskell
module Foo (Qux(..), Foo(..), plus) where

import Data.Vector.Unboxed as U

newtype Qux r = Qux (Vector r)
-- GHC inlines `plus` if I remove the bangs or the Baz constructor
data Foo t = Bar !t
           | Baz !t

instance (Num r, Unbox r) => Num (Qux r) where
    {-# INLINABLE (+) #-}
    (Qux x) + (Qux y) = Qux $ U.zipWith (+) x y

{-# INLINABLE plus #-}
plus :: (Num t) => (Foo t) -> (Foo t) -> (Foo t)
plus (Bar v1) (Bar v2) = Bar $ v1 + v2
```

GHC specializes the call to plus, but does not specialize (+) in the Qux Num instance, which kills performance.

However, an explicit pragma

```haskell
{-# SPECIALIZE plus :: Foo (Qux Int) -> Foo (Qux Int) -> Foo (Qux Int) #-}
```

results in transitive specialization as the docs indicate, so (+) is specialized and the code is 30x faster (both compiled with -O2). Is this expected behavior? Should I only expect (+) to be specialized transitively with an explicit pragma?


UPDATE

The docs for 7.8.2 haven't changed, and the behavior is the same, so this question is still relevant.

  • I don't know the answer, but it sounds like it might be related to https://ghc.haskell.org/trac/ghc/ticket/5928. Probably worth opening a new ticket, or adding your information there if you think it's likely related to 5928. – jberryman Feb 08 '14 at 20:26
  • @jberryman There seem to be two differences between that ticket and my question: 1) In the ticket, the equivalent of `plus` was *not* marked as INLINABLE and 2) simonpj indicated that there was some inlining going on with the ticket code, but the core from my example shows that none of the functions were inlined (in particular, I couldn't get rid of the second `Foo` constructor, otherwise GHC inlined stuff). – crockeea Feb 10 '14 at 21:17
  • Ah, okay. What happens when you define `plus (Bar v1) = \(Bar v2)-> Bar $ v1 + v2`, so that the LHS is fully applied at the call site? Does it get inlined, and does specialization kick in then? – jberryman Feb 11 '14 at 17:32
  • @jberryman Funny you should ask. I've been down that road with [this question](http://stackoverflow.com/questions/19803949/style-vs-performance-using-vectors) which led to this [trac report](https://ghc.haskell.org/trac/ghc/ticket/8099). I originally had the call to `plus` fully applied specifically due to those links, but in fact I got *less* specialization: the call to `plus` was not specialized either. I have no explanation for that, but was intending to leave it for another question, or hope that it would get resolved in an answer to this one. – crockeea Feb 12 '14 at 01:57
  • Well, there you go; the circle of life. I'd definitely recommend filing a bug report though. – jberryman Feb 12 '14 at 02:30
  • @jberryman I feel a bit bad about filing a bug report, since I don't know whether or not it's a bug. – crockeea Feb 12 '14 at 03:02
  • From https://ghc.haskell.org/trac/ghc/wiki/ReportABug: "If in doubt, just report your bug." You shouldn't feel bad, especially since a number of really experienced Haskellers here don't know how to answer your question. Test cases like this are probably really valuable for the GHC devs. Anyway, good luck! Update the question if you file a ticket. – jberryman Feb 12 '14 at 03:45
  • @jberryman I filed GHC trac [8774](https://ghc.haskell.org/trac/ghc/ticket/8774) – crockeea Feb 12 '14 at 17:36
  • So I filed the GHC report a week ago. It's nearly identical to the question above, which 57 people seem to think is pretty clear, well-written, and reasonable. But I haven't gotten any response on the ticket yet. Anything I can do to get some attention there? – crockeea Feb 19 '14 at 17:43
  • If GHC already has the issue in their bug tracker, why would them coming here get a resolution any faster? – Robert Harvey Feb 20 '14 at 02:01
  • @RobertHarvey "some attention *there*", not that it really matters where. – crockeea Feb 20 '14 at 02:07
  • I suppose you can poke them, but it's still up to them to schedule some time to fix it. It looks like GHC is MIT or BSD licensed, so it's basically run by volunteers. – Robert Harvey Feb 20 '14 at 02:09
  • If you would like to see an answer, CCing yourself on my GHC trac might help it get some attention. – crockeea Apr 04 '14 at 13:44

1 Answer


Short answers:

The question's key points, as I understand them, are the following:

  • "is the auto-specialization transitive?"
  • Should I only expect (+) to be specialized transitively with an explicit pragma?
  • (apparently intended) Is this a bug in GHC? Is it inconsistent with the documentation?

As far as I know, the answers are: no; mostly yes, though there are other means; and no.

Code inlining and type-application specialization are a trade-off between speed (execution time) and code size. The default level gets some speedup without bloating the code. Choosing a more exhaustive level is left to the programmer's discretion via the SPECIALISE pragma.

Explanation:

The optimiser also considers each imported INLINABLE overloaded function, and specialises it for the different types at which it is called in M.

Suppose f is a function, defined in module B, whose type includes a type variable a constrained by a type class, C a. GHC by default specializes f with respect to a type application (substituting a concrete type t for a) if f is called at that type in the source code of (a) any function in B itself, or (b), if f is marked INLINABLE, any other module A that imports f from B. Thus auto-specialization is not transitive: it only touches INLINABLE functions that are imported and called in the source code of A.
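As a rough illustration of what "specializing f at a type" means, the sketch below writes out by hand the kind of copy GHC produces internally. The names scale and scaleInt are hypothetical. GHC builds its specialised copy in Core and redirects matching calls to it automatically, so you would not normally write scaleInt yourself.

```haskell
-- Overloaded original: at runtime it receives a Num dictionary as a
-- hidden argument, so (*) is an unknown call through that dictionary.
scale :: Num a => a -> [a] -> [a]
scale k = map (* k)

-- What specialising scale at a ~ Int amounts to: the same body with the
-- dictionary fixed to Num Int, so (*) is known to be Int multiplication
-- and can be optimised further.
scaleInt :: Int -> [Int] -> [Int]
scaleInt k = map (* k)
```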

In your example, if you rewrite the instance of Num as follows:

```haskell
instance (Num r, Unbox r) => Num (Qux r) where
    (+) = quxAdd

quxAdd :: (Num r, Unbox r) => Qux r -> Qux r -> Qux r
quxAdd (Qux x) (Qux y) = Qux $ U.zipWith (+) x y
```

  • quxAdd is not imported by Main. Main imports the Num (Qux r) instance, and the Num (Qux Int) dictionary built from it contains quxAdd in its (+) field. However, although the dictionary is available to Main, the functions stored in it are not themselves imported, so they are not candidates for auto-specialization in Main.
  • plus does not call quxAdd directly; it calls whatever function is stored in the (+) field of the Num t dictionary it receives. That dictionary is supplied at the call site (in Main) by the compiler.
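The dictionary-passing argument in these two points can be modelled in plain Haskell. This is a hand-written sketch, not GHC's actual Core: NumDict, numQux and intDict are hypothetical stand-ins for the compiler-generated Num dictionaries, and a list replaces the unboxed Vector so that the example needs only base.

```haskell
-- A class dictionary modelled as a record of methods (only (+) here).
newtype NumDict a = NumDict { plusD :: a -> a -> a }

newtype Qux r = Qux [r] deriving (Eq, Show)

data Foo t = Bar !t | Baz !t deriving (Eq, Show)

-- Counterpart of quxAdd: it needs the element dictionary to add elements.
quxAdd :: NumDict r -> Qux r -> Qux r -> Qux r
quxAdd d (Qux x) (Qux y) = Qux (zipWith (plusD d) x y)

-- Counterpart of the instance (Num r, ...) => Num (Qux r): a function from
-- the element dictionary to the Qux dictionary; quxAdd is hidden inside
-- the (+) field.
numQux :: NumDict r -> NumDict (Qux r)
numQux d = NumDict { plusD = quxAdd d }

-- Counterpart of the built-in Num Int dictionary.
intDict :: NumDict Int
intDict = NumDict { plusD = (+) }

-- Counterpart of plus: it never names quxAdd; it only calls whatever
-- function sits in the (+) field of the dictionary it is given.
plus :: NumDict t -> Foo t -> Foo t -> Foo t
plus d (Bar v1) (Bar v2) = Bar (plusD d v1 v2)
plus _ _ _ = error "plus: only Bar is handled, as in the question"

-- The call site (Main, in the question) picks the concrete dictionary.
example :: Foo (Qux Int)
example = plus (numQux intDict) (Bar (Qux [1, 2, 3])) (Bar (Qux [10, 20, 30]))
```

In this model, specializing plus at Foo (Qux Int) only fixes d to numQux intDict; the element-wise addition still goes through quxAdd and plusD intDict unless quxAdd is specialized as well.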