72

Is there a complete list of allowed characters somewhere, or a rule that determines what can be used in an identifier vs an operator?

Matthias Braun
Peter Hall

3 Answers

65

From the Haskell 98 report, this is the syntax for allowed symbols:

a | b means a or b and

a<b> means a except b

special    ->   ( | ) | , | ; | [ | ] | ` | { | }
symbol     ->   ascSymbol | uniSymbol<special | _ | : | " | '>
ascSymbol  ->   ! | # | $ | % | & | * | + | . | / | < | = | > | ? | @
            |   \ | ^ | | | - | ~
uniSymbol  ->   any Unicode symbol or punctuation 

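So a symbol is any of the ASCII symbols listed above, or any Unicode symbol or punctuation character except those in special | _ | : | " | ', which are reserved.

This means the following characters can't be used as symbol characters: ( ) , ; [ ] ` { } _ : " ' (the colon is handled separately by the operator grammar).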
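As a rough sketch, the `symbol` production can be written as a predicate over `Char`. The names `ascSymbols`, `excluded` and `isSymbolChar` are made up for this illustration, and it assumes that `isSymbol` and `isPunctuation` from `Data.Char` are a fair stand-in for "any Unicode symbol or punctuation". A real compiler's lexer may be stricter than the report for some characters.

```haskell
import Data.Char (isAscii, isPunctuation, isSymbol)

-- Characters listed under `ascSymbol` in the grammar.
ascSymbols :: String
ascSymbols = "!#$%&*+./<=>?@\\^|-~"

-- Characters subtracted from `uniSymbol`: `special` plus _ : " '
excluded :: String
excluded = "(),;[]`{}_:\"'"

-- True when the character matches the `symbol` production.
-- Assumption: Data.Char's isSymbol/isPunctuation approximate
-- "any Unicode symbol or punctuation".
isSymbolChar :: Char -> Bool
isSymbolChar c
  | c `elem` ascSymbols = True
  | c `elem` excluded   = False
  | isAscii c           = False  -- letters, digits, whitespace, control
  | otherwise           = isSymbol c || isPunctuation c
```

Loaded in GHCi, `isSymbolChar '⊗'` evaluates to `True`, while `isSymbolChar 'λ'` and `isSymbolChar '_'` both evaluate to `False`.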

A few paragraphs below, the report gives the complete definition for Haskell operators:

varsym     -> ( symbol {symbol | :})<reservedop | dashes>
consym     -> (: {symbol | :})<reservedop>
reservedop -> .. | : | :: | = | \ | | | <- | -> | @ | ~ | =>
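To make the three productions concrete, here is a small sketch that classifies a string according to them. `OpKind`, `classifyOp` and `isSymbolChar` are hypothetical names for this example, not part of any library, and `isSymbolChar` again assumes that `Data.Char`'s `isSymbol` and `isPunctuation` approximate the report's "Unicode symbol or punctuation".

```haskell
import Data.Char (isPunctuation, isSymbol)

data OpKind = VarSym | ConSym | Reserved | Dashes | NotAnOperator
  deriving (Eq, Show)

-- The `reservedop` production, copied from the grammar.
reservedOps :: [String]
reservedOps = ["..", ":", "::", "=", "\\", "|", "<-", "->", "@", "~", "=>"]

-- Approximation of `symbol`: symbol or punctuation, minus the reserved set.
isSymbolChar :: Char -> Bool
isSymbolChar c =
  (isSymbol c || isPunctuation c) && c `notElem` "(),;[]`{}_:\"'"

-- Classify a candidate operator name following varsym / consym.
classifyOp :: String -> OpKind
classifyOp s
  | null s || not (all opChar s)    = NotAnOperator
  | s `elem` reservedOps            = Reserved
  | length s >= 2 && all (== '-') s = Dashes   -- starts a line comment
  | take 1 s == ":"                 = ConSym
  | otherwise                       = VarSym
  where
    opChar c = isSymbolChar c || c == ':'
```

For example, `classifyOp "<:>"` gives `VarSym`, `classifyOp ":+:"` gives `ConSym`, `classifyOp "::"` gives `Reserved`, and `classifyOp "--"` gives `Dashes`.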

Operator symbols are formed from one or more symbol characters, as defined above, and are lexically distinguished into two namespaces (Section 1.4):

  • An operator symbol starting with a colon is a constructor.
  • An operator symbol starting with any other character is an ordinary identifier.

Notice that a colon by itself, ":", is reserved solely for use as the Haskell list constructor; this makes its treatment uniform with other parts of list syntax, such as "[]" and "[a,b]".

Other than the special syntax for prefix negation, all operators are infix, although each infix operator can be used in a section to yield partially applied operators (see Section 3.5). All of the standard infix operators are just predefined symbols and may be rebound.
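A short illustration of the quoted rules. All names here (`+++`, `<:>`, `:+:`, `Expr`, and so on) are invented for the example.

```haskell
-- An ordinary operator (varsym): starts with a symbol other than ':'.
infixr 5 +++
(+++) :: [a] -> [a] -> [a]
xs +++ ys = foldr (:) ys xs

-- A colon may appear after the first character of an ordinary operator.
(<:>) :: a -> [a] -> [a]
x <:> xs = x : xs

-- A constructor operator (consym): must start with ':'.
infixr 5 :+:
data Expr = Lit Int | Expr :+: Expr
  deriving (Eq, Show)

eval :: Expr -> Int
eval (Lit n)   = n
eval (l :+: r) = eval l + eval r

-- A section partially applies an infix operator.
prependOneTwo :: [Int] -> [Int]
prependOneTwo = ([1, 2] +++)

-- `(- 1)` would be prefix negation, not a section, so use `subtract`.
decrement :: Int -> Int
decrement = subtract 1
```

With these definitions, `eval (Lit 1 :+: Lit 2 :+: Lit 3)` is `6` and `prependOneTwo [3]` is `[1,2,3]`.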

Matthias Braun
Riccardo T.
  • 10
    Should probably be citing the [haskell2010 report](http://www.haskell.org/onlinereport/haskell2010/haskellch2.html#x7-180002.4) instead of the haskell98 report these days (although in this case they say the same thing, as far as I can see). – Ben Millwood May 11 '12 at 10:00
  • FWIW, tryhaskell.org currently gives a lexical error for trying to use one of the mathematical bracketing symbols, e.g. `let a ⟬ b = 1` – rampion May 14 '19 at 13:58
33

From the Haskell 2010 Report §2.4:

Operator symbols are formed from one or more symbol characters...

§2.2 defines symbol characters as being any of !#$%&*+./<=>?@\^|-~: or "any [non-ascii] Unicode symbol or punctuation".

NOTE: Ordinary (non-constructor) user-defined operators cannot begin with a : because, quoting the language report, "An operator symbol starting with a colon is a constructor."

Sridhar Ratnakumar
dave4420
  • 2
    Interesting that you can use arbitrary Unicode. So, for instance, λ or ⊗ would be valid Haskell operators? – Chris Taylor May 11 '12 at 08:57
  • 14
    No, `λ` is a Unicode letter, not a Unicode symbol or a Unicode punctuation character. So you can't use it as part of an operator name (but you can use it as part of an ordinary identifier). – dave4420 May 11 '12 at 08:59
  • 2
    I expect you could use `⊗` as a Haskell operator, but I don't know for sure. – dave4420 May 11 '12 at 09:01
  • 7
    You can. Its `generalCategory` is `MathSymbol` (just to make sure, I actually defined an operator `(⊗)` in ghci, and it was accepted). – Daniel Fischer May 11 '12 at 09:19
26

What I was looking for was the complete list of characters. Based on the other answers, the full list is:

Unicode Punctuation:

Unicode Symbols:

But excluding the following characters with special meaning in Haskell:

(),;[]`{}_:"'

An operator whose first character is : is a constructor (the report states: "An operator symbol starting with a colon is a constructor"). A : may also appear after the first character of any operator, as the grammar rule {symbol | :} allows.
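If you want to generate the list rather than look it up, something like the following sketch should work. It assumes `Data.Char`'s `isSymbol` and `isPunctuation` match what the report means by Unicode symbols and punctuation. `operatorChars` and `excluded` are names chosen for this example. The total number of characters depends on the Unicode version your compiler's `base` library was built against, and a given compiler may reject some of them.

```haskell
import Data.Char (isPunctuation, isSymbol)

-- Characters with special meaning in Haskell, as listed above.
excluded :: String
excluded = "(),;[]`{}_:\"'"

-- Every code point that is a Unicode symbol or punctuation character
-- and is not in the excluded set, in code point order.
operatorChars :: String
operatorChars =
  [ c
  | c <- [minBound .. maxBound]
  , isSymbol c || isPunctuation c
  , c `notElem` excluded
  ]

-- The ASCII part of the list, for a quick sanity check.
asciiOperatorChars :: String
asciiOperatorChars = takeWhile (< '\x80') operatorChars
```

In GHCi, `putStrLn asciiOperatorChars` prints `!#$%&*+-./<=>?@\^|~`, which is the same set as `ascSymbol` in the report, and `length operatorChars` gives the size of the full list.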

Peter Hall
  • This is absolutely insane! It is great that using any Unicode symbols is possible, but unfortunately they are usually very hard to type on current keyboards. – Qqwy Jul 08 '16 at 14:18
  • 5
    @Qqwy - Haskell is designed for use in a literate programming environment. You could be producing a document designed for typesetting with code fragments, and have those code fragments actually executable in your source document. The ability to define unicode operators is invaluable for that purpose. – Jules Apr 11 '17 at 03:07