The revised syntax is an alternative syntax for OCaml. Its purposes
are 1/ fix some problems of the normal syntax (unclosed constructions
sometimes introducing ambiguities, constructors arity, end of top
level phrases and structure items, etc) 2/ avoid unjustified double
constructions (":="
vs ``<-
'', ``fun'' vs ``function'',
``begin..end'' vs parentheses) or concepts (types and types
declarations) 3/ bring some ideas (lists, types). In a word, propose a
syntax which be more logical, simpler, more consistent and easier to
parse and to pretty print.
The revised syntax, being few used, is less constrained by the
history than the normal one, and can try to answer the question: ``how
things should be done'' instead of ``how to remain compatible with old
versions''.
Other motivations are: 1/ show that syntax is just a ``shell'' of the
language: you can change it without modifying the background 2/
experiment right to the end the ability of Camlp4 of doing syntax
extensions.
It is a syntax of the complete language, therefore it can be used for
all OCaml programs: by the way, Camlp4 is itself completely written in
that syntax. Notice that it is not a constraint: it is always possible
to convert from and to the normal syntax, using the pretty print
facilities of Camlp4.
Remark: syntax in programming languages is much a question of personal
taste. This syntax represents mine, with some ideas taken here and
there. Some choices may seem arbitrary (other solutions are possible),
but I tried to keep some consistency, and without being too far from
the normal syntax: I guess that it is possible to understand a program
written in revised syntax even without having read this chapter.
Most of the constructions in revised syntax are therefore the same
than in the normal syntax. This chapter presents only the differences,
and the motivations of them.
The quotations for OCaml syntax trees, which we shall see in next
chapter, use the revised syntax.
To compile the file foo.ml
written in revised syntax, use:
$ ocamlc -pp camlp4r foo.ml
To use the revised syntax in the toplevel, do:
$ ocaml
#load "camlp4r.cma";;
- In revised syntax, simple semicolons end the items of structures,
signatures and objects. These semicolons are mandatory. The
double semicolon is no more a token. There is no ambiguity with the
sequence, which has a special construction (see further).
- The declaration of a global variable is introduced by the keyword
``
value
'', ``let
'' being reserved to the construction
``let..in
'':
OCaml | Revised |
let x = 23;; | value x = 23; |
let x = 23 in x + 7;; | let x = 23 in x + 7; |
- In interfaces, one must use ``
value
'', too, instead of
``val
''.
OCaml | Revised |
val x : int;; | value x : int; |
Motivation of the simple semicolon
The double semicolon in OCaml exists for historical reasons: the first
parsers were driven by the tokens, not by the rules: all constructions
needed to have a specific token.
But because of the introduction of modules in OCaml, the double
semicolon, which was mandatory in Caml Light to end sentences, became
optional: the reason is that in OCaml, a ``phrase'' and a
``structure item'' are actually the same notion. The problem is that
the double semicolon is associated with the idea of ``terminating''
something: for a phrase, it is exact, but not for a structure item
inside a structure, since other structure items and the keyword
``end'' follow.
That choice of letting the double semicolon be optional in normal
syntax has introduced several problems:
- A structure item is actually ended by the beginning of the next
structure item; it means that all structure items must start with a
keyword; otherwise there is an ambiguity. For example, you cannot write:
print_string "hello, world"
print_newline ()
because it is interpreted as a call to print_string
with 3
parameters (and typing error). The advocated solution is to write:
let _ = print_string "hello, world"
let _ = print_newline ()
Mmm....
- But this solution does not work interactively: in the toplevel, you
cannot ask people to type the beginning of the next sentence to see
the result of the current one. Therefore the double semicolon still
remains! The property that we write in the toplevel like in source
files has been lost.
- In structures and objects, the fact that you don't end the
structure items and object items make the programs more difficult to
read. If you write a short object or structure item in one only line,
it is very difficult to see where the items start and end.
My opinion is that the structure items should end with a token in a
context where there is never need to read another token. This ensures
a correct behavior in the interactive toplevel. The fact that the
sequence is closed, in the revised syntax, frees the simple semicolon.
And a simple semicolon is perfectly acceptable inside structures and
objects, to end their item, the same way they close a record item. In
the revised syntax, this ending semicolon is mandatory.
It is easier to treat a language whose all phrases end with a token:
at end of the sentences, the characters and the tokens streams are
synchronized (no need to read an extra token to be sure that the
phrase is ended). This property can bring simplifications in other
treatments (extraction of comments or code for documentation,
indentation, editors modes, interactive tools).
Motivation of ``value''
The choice of having a different keyword value
instead of
let
, for a toplevel value definition, is to mark the difference
with the let..in
construct. At toplevel, to see if it is a
let
or or let..in
, we have to look at the end of the let binding.
In the abstract syntax tree, let
and let...in
are very
different: they do not even have the same type: let
is a
structure item, while let...in
is an expression. This deserves
to be more visible in the concrete syntax.
Why not val
instead of value
? It is to be coherent with
the other declarations type
and exception
, which are not
abbreviations: we don't write typ
for type declarations, nor
exc
for exception declarations.
5.3 |
Imperative constructions |
|
-
The sequence is introduced by the keyword
``
do
'' followed by ``{
'' and terminated by ``}
''
(it is possible to put a semicolon after the last expression):
OCaml | Revised |
e1; e2; e3; e4 | do { e1; e2; e3; e4 } |
- The body of ``
for
'' and ``while
'' has the same
syntax:
OCaml | Revised |
while e1 do | while e1 do { |
e2; e3; e4 | e2; e3; e4 |
done | } |
- The ``lets'' apply up to the end of the sequences.
Motivation of ``do'' and braces
First, the sequence needed to be closed. For the reason of the
previous section (toplevel phrases), but also because there are too
many ambiguities with other constructions. For example in the list:
[ a; b; c ]
We know that it is the list of a
, b
and c
. But it
could be interpreted as a list one element, the sequence
"a; b; c"
. In the grammar, it supposes that list items are not
``top'' expressions (expressions of the first level of the ``expr''
grammar entry): it is mandatory to use things like ``expression-1'' or
``simple expression'' in the grammar.
In revised syntax, this case never occurs: when a rule needs an
expression, it always uses the top level of the ``expr'' entry. The
grammar is then simpler and easier to read and understand.
The choice of "do"
followed by braces has something
arbitrary. However, the keyword "do"
let us easily think of
something imperative (not functional). And the braces remind the
sequence in the C language.
Why not do..done
? Question of taste. It could have been
do..done
. The idea is to remain relatively discrete. And the
proposed construction saves a keyword.
Note that a let...in
in the sequence applies up to the end of
the sequence, like in normal syntax. However, in normal syntax,
because of the fact that the sequence is an opened construction, you
can obtain strange results. In the example:
if condition then
a-simple-statement;
statement-2;
statement-3;
Let us suppose that you need to add a let binding for the ``simple
statement'': if you just add it, this is what you see:
if condition then
let v = expr in
a-simple-statement;
statement-2;
statement-3;
But what you get is actually:
if condition then
let v = expr in
a-simple-statement;
statement-2;
statement-3;
The let
has ``absorbed'' the rest of the sequence, which is now
included in the if condition. To be correct, you need to add an
enclosing begin..end
or parentheses.
-
Parentheses are mandatory in tuples:
OCaml | Revised |
1, "hello", World | (1, "hello", World) |
- Lists are always enclosed with ``
[
'' and ``]
''.
Their syntax is:
list |
::= |
[ elem-list opt-cons ] |
elem-list |
::= |
expression ; elem-list |
expression |
opt-cons |
::= |
:: expression | (*empty*) |
A list is a sequence of expressions separated by semicolons, optionally
ended by a ``::
'' and an expression, the whole being always enclosed
by brackets.
Examples:
OCaml | Revised |
x::y | [x::y] |
[x; y; z] | [x; y; z] |
x::y::z::t | [x::[y::[z::t]]] |
x::y::z::t | [x; y; z :: t] |
Note the two ways to write the last case.
Motivation to close the tuples by parentheses
In mathematics, tuples are always between parentheses.
Moreover, it is in a general policy of the revised syntax: close more
constructions: it is easier to read and don't need to learn certain
subtle precedences levels.
Motivation for the syntax of lists
In revised syntax, the lists are always closed. Be a ``cons''
[a :: b]
or an enumeration of all items
[a; b; c]
, we always know syntactically where a list starts
and when it ends.
This syntax have something similar of the lists in Lisp: the brackets
are like the parentheses, the semicolons are like the spaces and the
double colon is like the dot.
Moreover, the syntax:
[ x; y; z :: t ]
is more understandable and more logical than the equivalent in normal
syntax:
x :: y :: z :: t
Indeed, reading it in normal syntax, the types are not clear:
x
, y
and z
are not of same type than t
, we
have to remember that this double colon is right associative, which is
generally not natural. In revised syntax, x
, y
, and
z
are at the same level (separated by semicolons), different
from the one of t
(separated from the rest by the double
colon).
In revised syntax, it is clear that x
, y
and
z
are the first items of the list, because the syntax is
identical when the list is ended by a ``cons'' and when it is not,
what is not the case in normal syntax.
There is a notion of ``irrefutable patterns'' used by some syntactic
constructions (next sections). Matching against these patterns never
fails. An ``irrefutable pattern'' is either:
-
A variable.
- The wildcard ``
_
''.
- The constructor ``
()
''.
- A tuple with irrefutable patterns.
- A record with irrefutable patterns.
- An irrefutable pattern with a type constraint.
Note that the term ``irrefutable'' does not apply to all patterns
which never fail: constructors alone in their type definition,
except ``()
'', are not said ``irrefutable'' (the fact that
they be alone or not cannot be determined at parsing time).
5.6 |
Constructions with matching |
|
Motivation for one alone keyword ``fun''
The presence of fun
and function
is somewhat strange,
since they have the same semantics.
In revised syntax, by adding this notion of ``irrefutable patterns'',
there is no ambiguity: a list not being an irrefutable pattern, the
construction with brackets is not a parsing problem. When using
an irrefutable pattern, there must be only one case, and therefore no
close construction is necessary, allowing us to keep the simple
frequent form: fun x -> x
.
Motivation to close the constructions
It is to avoid the problem of the ``dangling bar'' (the same than the
``dangling else'' in the ``if'' construct). In normal syntax, this
program:
match ... with
case1 ->
match ... with
case11 -> ...
| case12 -> ...
| case2 -> ...
is wrongly interpreted: to obtain what you want, you need to use
parentheses or begin..end
to close the internal match
construct. There is a same problem with the if
construct,
because of the optional else
(see further).
I admit that the fact that all cases do not start with the same token
(the first starting with a left brace, the other ones with a vertical
bar) is not practical in editing programs: it is indeed complicated to
exchange the first case and the other ones. However readability and
absence of ambiguity are more important than easiness to use and
absence of verbosity: when it is easy to edit but risk to introduce
bugs or irregularities, it is not sure that it be better.
Why not close the construction by a keyword, end
for example,
like the Ada
language does? It is because an ending keyword
gives an idea of something imperative, it does not make think that
something is returned, which is however the case in the match
construct, like most of OCaml
constructs.
Motivation for the empty forms
The empty function is useful for initial cases of iterations or initial
references values. It is not absolutely essential since it is possible
to write:
fun _ -> assert False
The empty match
existed before the introduction of the
assert
construction in OCaml
. Like the assert, it
indicates the position of the error in the file.
These constructions are there because they are the limit when the
number of the matching cases reach zero.
Motivation for irrefutable patterns in ``let''
In normal syntax, if you use a ``let'' binding with a non irrefutable
pattern, you get a typing message ``pattern matching is not
exhaustive''. If you want to be clean and add the missing cases, you
have to torture your sources. Indeed, for example, the
let x :: y = a in b
must be changed into:
match a with x :: y -> b | ...
In revised syntax, since it is forbidden, you are never in this situation.
Motivation for the ``where'' construct
This construction existed in the old ``Caml'' V3.1 (whose development
was stopped by the beginning of the 90ies) and I liked it much. There
was a problem in this construct, because it was possible to add
several bindings separated with ``and'', which sometimes could enter
in conflict (another ``dangling'' case) with a possible ``and'' in an
enclosing ``let'':
let a =
b where c = d
and e = f in ...
In this situation, the ``where'' construct used to ``absorb'' the
``and'' of the ``let'' binding. The program was interpreted as:
let a =
b where c = d and e = f
in ...
Because of that, in Caml Light
and OCaml
, the ``where''
construction were removed. But a ``where'' with only one binding could
works. Anyway, having several bindings is not interesting nor useful nor
readable, in this construction.
I personally use this construction in the case when the ``let''
binding is a function definition and the expression a call to this
function. I generally prefer to write:
loop 0 where rec loop i = ...
than the equivalent form:
let loop i = ... in loop 0
I consider the form with where
more readable in this situation.
5.7 |
Mutables and assignment |
|
-
The statement ``
<-
'' is written ``:=
'':
OCaml | Revised |
x.f <- y | x.f := y |
- The ``
ref
'' type is used as if its field label was
named ``val
'', instead of ``contents
''. The operator
``!
'' does not exist any more, and references are assigned like
the other mutables:
OCaml | Revised |
x := !x + y | x.val := x.val + y |
Motivation
Having two constructions for the assignment is abnormal. In normal
syntax, the ":="
, specific to the ref
type, is an old
rest of the time when references where implemented with a constructor
(there were mutable constructors, then), and the codes to extract a
reference value and to change it were complicated:
match x with Ref x -> x
match x with Ref x -> x <- y
It was then justified to have specific constructions "!x"
and
"x := y"
for these cases. Now, references are implemented with
a record type, and these constructions can be written:
x.contents
x.contents <- y
In normal syntax, there are 2 ways to access and assign references,
although the method using the label ``contents'' is rarely used. In
revised syntax, it is the only method. However, I consider
``contents'' as a too long identifier, it is why I changed it into
``val''. It is actually not a change in the definition of ref
(since Camlp4
does only syntax), it is changed in the syntax
trees, the real name of the field remaining ``contents''.
As ":="
is no more necessary with the semantics of assigning a
reference value, it can be used in the place of "<-"
, a token
less natural and introducing confusions (when we read it) with the
"->"
of the functions and pattern matchings.
The construction !x
is no more necessary either since we can
write x.val
. We then save two tokens which were used only for
the reference type.
-
The type constructors are before their type parameters, which
are written with the currified form:
OCaml | Revised |
int list | list int |
('a, bool) Hashtbl.t | Hashtbl.t 'a bool |
type 'a foo = | type foo 'a = |
'a list list;; | list (list 'a); |
- The abstract types are represented by a unbound type variable:
OCaml | Revised |
type 'a foo;; | type foo 'a = 'b; |
type bar;; | type bar = 'a; |
- Parentheses are mandatory in tuples of types:
OCaml | Revised |
int * bool | (int * bool) |
- In declaration of a concrete type, brackets must enclose
the constructors declarations:
OCaml | Revised |
type t = A of i | B;; | type t = [ A of i | B ]; |
- It is possible to make the empty type, without constructor:
type foo = [];
- There is a syntax difference between data constructors with
several parameters and data constructors with one parameter of type
tuple.
The declaration of a data constructor with several parameters is
done by separating the types with ``and
''. In expressions and
patterns, this constructor parameters must be currified:
OCaml | Revised |
type t = C of t1 * t2;; | type t = [ C of t1 and t2 ]; |
C (x, y);; | C x y; |
The declaration of a data constructor with one parameter of type
tuple is done by using a tuple type. In expressions and patterns,
the parameter has not to be currified, since it is alone:
OCaml | Revised |
type t = D of (t1 * t2);; | type t = [ D of (t1 * t2) ]; |
D (x, y);; | D (x, y); |
- The predefined constructors ``
True
'' and ``False
''
start with an uppercase letter.
- In record types, the keyword ``
mutable
'' must appear
after the colon:
OCaml | Revised |
type t = {mutable x : t1};; | type t = {x : mutable t1}; |
Motivation for the applying order of type constructors
The order is to look like the constructors values: you can then read
value in the same order than their types. The syntax with
currification style is used also for value constructors.
Motivation for the abstract types syntax
It was to look like existential types, because abstract types are
actually some kind of existential types. This may have a meaning if
existential types are included one day in OCaml
.
Motivation for the parentheses around tuple types
Close more constructions. Closed like tuples are. Moreover it is more
visible in constructor declarations to differentiate the case of two
parameters and one parameter being a tuple.
Motivation for the constructor declaration type
The revised syntax have tried to be the most general possible, to plan
the possible future extensions of the language.
Record types are closed by braces (no change). Symmetrically, the sum
types (declaring constructors) are closed by brackets. This is also a
way to consider them just as ``types''. We could imagine that they be
authorized one day outside type declarations. For example like this:
fun (x : [ A | B ]) -> ...
type t = { lab : [ A | B ] }
type u = [ C of { lab : ...} ]
The form of the last line is, by the way, the method used in the
language SML
, where record types are always anonymous.
In Camlp4
abstract syntax, there is no notion of ``type
declaration'': a type declaration is just a type. The fact that sum
types and record types are accepted only in type declarations is done
when converting into the abstract syntax which ocamlc
uses.
Motivation for the empty type
As the type constructor definition is closed, it is possible to
imagine the empty type. Not very useful, but we have it without any
cost: a type inhabited by nothing (empty set).
Motivation for the currified syntax for constructors
This reflects the actual semantics. There are indeed two cases, and
the values in the two cases are implemented differently. The arity of
constructors are more clear.
In normal syntax, it is difficult to understand (and to explain) why
if C is a constructor with two parameters, this is accepted:
fun C (x, y) -> (x, y)
but not that:
fun C x -> x
In revised syntax you have to write:
fun [ C x y -> (x, y) ]
The revised syntax reflects the fact that the two parameters of the
constructor C
cannot be considered as a tuple.
This does not mean that the ``partial evaluation'' of constructors is
accepted: accept it or not is a semantic issue, treated at
OCaml
typing time.
Motivation for the uppercase for True and False
In normal syntax, true
and false
are the only
constructors which start with a lowercase letter. It is due to
historical reasons: in Caml Light
, no constructors (of any
type) need to be capitalized. When OCaml
was created, this was
changed, but strangely, true
and false
escaped to this
rule. They are even now considered as keywords, what they should not
be, since they are not syntactic constructs or part of syntactic
constructs.
In revised syntax, they must be written True
and False
and are not keywords.
Motivation for mutable syntax in records
It is just to read: ``the label x is a mutable integer'' instead of
``the mutable label x is an integer'', which is less clear.
Modules application uses the currified form:
OCaml | Revised |
type t = Set.Make(M).t;; | type t = (Set.Make M).t; |
Motivation
Currification syntax is more natural in functional languages. There is
no reason to have two different syntaxes for applications (whatever we
apply): one with parentheses, one with currification.
The classes and objects also have a revised syntax. To see it, the
simplest way is to write examples in normal syntax and to convert them
into revised syntax using the command:
camlp4o pr_r.cmo file.ml
(documentation to be updated)
Motivation for the ``else''
The else
is mandatory to avoid the ``dangling else''
problem. In normal syntax, you can write:
if a then
if b then c
else d
In the above program, the ``else d'' will actually corresponds to the
``if b'' not to the ``if a''. In revised syntax, the ``else'' being
mandatory, the problem does not exist.
OCaml
being a functional language, it is normal that the ``else'' case
be mandatory: indeed if the condition is false, what is returned by
the statement is not clear in normal syntax.
All these ``dangling'' problems cause also problems in pretty
printing: it is not easy to know if the constructions have to be
parenthesized or not. In revised syntax, there are no dangling
problems and no problem in pretty printing. To pretty print in normal
syntax, a solution had to be used, using an extra parameter
transmitted in all functions.
We remark that in revised syntax, the if
construct is not
closed, it does not need to be.
Motivation for the ``or'' and ``and'' operators
There is no reason to accept two syntaxes for the ``or'' operator and
two for the ``and'' operator. The syntaxes or
and &
are
actually old constructions, kept for an old backward compatibility.
Motivation for the suppression of begin..end
In normal syntax, the construction with begin
and end
is
actually the same than the parentheses: often a question of personal
taste. In normal syntax, when parenthesis is necessary, some
programmers prefer "begin match...end"
, other "(match...)"
.
In revised syntax, the cases when such a parenthesization is necessary
is much less frequent, since most constructions are already
parenthesized. Two constructions for that are not necessary.
Motivation for syntax for alone operators
To avoid the case of the *
operator which must be specifically
written with spaces around it, since (*)
in lexically
interpreted as a beginning of a comment.
Motivation for the fact that there are no automatic infixes
Since we are under Camlp4, we can use Camlp4 features.
Motivation for the ``declare'' construction
Essential when a syntax extension in OCaml
structure item
generates several structure items. For example, if you make a syntax
change in order that a type declaration generates 1/ the type
declaration itself and 2/ functions to be applied to this type.
When converted into OCaml
normal syntax tree, this construct is
inlined.
Motivation for the keyword "parser"
, rather than
"parse"
Actually, it is not different from the choice of the normal syntax,
since the same keyword is used.
The keyword ``parser'' is like ``function'', not like ``match''. The
``match'' and ``try'' statements are direct actions, with their
immediate parameters. On the other hand, the parsers and functions are
just ``concepts'': they are not immediately applied with their
parameters. One must read: ``this is a parser'' just like ``this is
a function''.
The word ``parse'' might have been used if the construction was
``parse xxx with''. This is written ``match xxx with parser'' in order
to save a keyword.
Motivation for [:
instead of [<
It is a question of readability, because of the presence of quotations
in our extended language, whose syntax use many ``less'' and
``greater'' characters. And it is a problem for a list of quoted
things:
[<:expr< xx >>; <:expr< yy >>]
Motivation for quotes and backquotes
Actually, this should have been done in OCaml
normal syntax,
since from Caml Light
to OCaml
, the character used to
enclose characters changed from backquote into right quote. It would
have been then normal to invert that for the streams terminals, but
it was forgotten.
In normal syntax, this creates sometimes problems in characters streams:
parser [< '('a' | 'b') >] -> ...
The lexer interprets the first parenthesis as a character, which causes
thus parsing error. You must add a space before the left parenthesis:
parser [< ' ('a' | 'b') >] -> ...
In revised syntax, which backquotes, this problem does not appear.
Motivation for closing the syntax of parsers
To resolve the same problem of ``dangling bar'' than for functions,
matches and tries. This syntax is closed the same way.
Motivation for the empty parser
Useful in initial cases in iterations or initial references values.