Database queries
The implementation of a database, its interface, and its query
language is a project far too ambitious for the scope of this book and
for the Objective CAML knowledge of the reader at this point. However,
restricting the problem and using the functional programming style
at its best allows us to create an interesting tool for query
processing. For instance, we show how to use iterators as well as
partial application to formulate and execute queries. We also show the
use of a data type encapsulating functional values.
For this application, we use as an example a database on the members
of an association. It is presumed to be stored in the file
association.dat.
Data format
Most database programs use a ``proprietary'' format to store
the data they manipulate. However, it is usually possible to store the
data as some text that has the following structure:
-
the database is a list of cards separated by
carriage-returns;
- each card is a list of fields separated by some given
character,
':'
in our case;
- a field is a string which contains no carriage-return nor the
character
':'
;
- the first card is the list of the names associated with the
fields, separated by the character
'|'
.
The association data file starts with:
Num|Lastname|Firstname|Address|Tel|Email|Pref|Date|Amount
0:Chailloux:Emmanuel:Université P6:0144274427:ec@lip6.fr:email:25.12.1998:100.00
1:Manoury:Pascal:Laboratoire PPS::pm@lip6.fr:mail:03.03.1997:150.00
2:Pagano:Bruno:Cristal:0139633963::mail:25.12.1998:150.00
3:Baro:Sylvain::0144274427:baro@pps.fr:email:01.03.1999:50.00
The meaning of the fields is the following:
-
Num is the member number;
- Lastname, Firstname, Address, Tel, and Email
are obvious;
- Pref indicates the means by which the member wishes to be
contacted: by mail (mail), by email (email), or by
phone (tel);
- Date and Amount are the date and the amount of the last
membership fee received, respectively.
We need to decide what represention the program should use internally
for a database. We could use either a list of cards
or an array of cards. On the one hand, a list has the nice property of
being easily modified: adding and removing a card are simple operations.
On the other hand, an array allows constant access time to any card.
Since our goal is to work on all the cards and not on some of them,
each query accesses all the cards. Thus a list is a good choice. The
same issue arises concerning the cards themselves: should they be
lists or arrays of strings? This time an array is a good choice,
since the format of a card is fixed for the whole database. It not
possible to add a new field. Since a query might access only a
few fields, it is important for this access to be fast.
The most natural solution for a card would be to use an array indexed
by the names of the fields. Since such a type is not available in
Objective CAML, we can use an array (indexed by integers) and a
function associating a field name with the array index
corresponding to the field.
# type
data_card
=
string
array
;;
# type
data_base
=
{
card_index
:
string
->
int
;
data
:
data_card
list
}
;;
Access to the field named n of a card dc of
the database db is implemented by the function:
# let
field
db
n
(dc
:
data_card)
=
dc.
(db.
card_index
n)
;;
val field : data_base -> string -> data_card -> string = <fun>
The type of dc has been set to data_card to
constrain the function field to only accept string arrays and
not arrays of other types.
Here is a small example:
# let
base_ex
=
{
data
=
[
[|
"Chailloux"
;
"Emmanuel"
|]
;
[|
"Manoury"
;
"Pascal"
|]
]
;
card_index
=
function
"Lastname"
->0
|
"Firstname"
->1
|
_->
raise
Not_found
}
;;
val base_ex : data_base =
{card_index=<fun>;
data=[[|"Chailloux"; "Emmanuel"|]; [|"Manoury"; "Pascal"|]]}
# List.map
(field
base_ex
"Lastname"
)
base_ex.
data
;;
- : string list = ["Chailloux"; "Manoury"]
The expression field
base_ex
"Lastname"
evaluates to a function
which takes a card and returns the value of its
"Lastname"
field. The library function List.map applies the
function to each card of the database base_ex, and returns
the list of the results: a list of the
"Lastname"
fields of the database.
This example shows how we wish to use the functional style in our
program. Here, the partial application of field allows us to
define an access function for a given field, which we can use on any
number of cards. This also shows us that the implementation of the
field function is not very efficient, since although we are
always accessing the same field, its index is computed for each
access. The following implementation is better:
# let
field
base
name
=
let
i
=
base.
card_index
name
in
fun
(card
:
data_card)
->
card.
(i)
;;
val field : data_base -> string -> data_card -> string = <fun>
Here, after applying the function to two arguments, the index of the
field is computed and is used for any subsequent application.
Reading a database from a file
As seen from Objective CAML, a file containing a database is just a list of
lines. The first work that needs to be done is to read each line as a
string, split it into smaller parts according to the separating
character, and then extract the corresponding data as well as the field
indexing function.
Tools for processing a line
We need a function split that splits a string at every occurrence of
some separating character. This function uses the function
suffix which returns the suffix of a string s after
some position i. To do this, we use three
predefined functions:
-
String.length returns the length of a string;
- String.sub returns the substring of s
starting at position i and of length l;
- String.index_from computes the position of the first occurrence
of character c in the string s,
starting at position n.
# let
suffix
s
i
=
try
String.sub
s
i
((String.length
s)-
i)
with
Invalid_argument("String.sub"
)
->
""
;;
val suffix : string -> int -> string = <fun>
# let
split
c
s
=
let
rec
split_from
n
=
try
let
p
=
String.index_from
s
n
c
in
(String.sub
s
n
(p-
n))
::
(split_from
(p+
1
))
with
Not_found
->
[
suffix
s
n
]
in
if
s=
""
then
[]
else
split_from
0
;;
val split : char -> string -> string list = <fun>
The only remarkable characteristic in this implementation is the use
of exceptions, specifically the exception Not_found.
Computing the data_base structure
There is no difficulty in creating an array of strings from a list of
strings, since this is what the of_list function in the
Array module does. It might seem more complicated to compute
the index function from a list of field names, but the List
module provides all the needed tools.
Starting from a list of strings, we need to code a function that
associates each string with an index corresponding to its position in
the list.
# let
mk_index
list_names
=
let
rec
make_enum
a
b
=
if
a
>
b
then
[]
else
a::(make_enum
(a+
1
)
b)
in
let
list_index
=
(make_enum
0
((List.length
list_names)
-
1
))
in
let
assoc_index_name
=
List.combine
list_names
list_index
in
function
name
->
List.assoc
name
assoc_index_name
;;
val mk_index : 'a list -> 'a -> int = <fun>
To create the association function between field names and indexes, we
combine the list of indexes and the list of names to obtain a
list of associations of the type string * int list. To look up
the index associated with a name, we use the function assoc
from the List library. The function mk_index
returns a function that takes a name and calls assoc on this
name and the previously built association list.
It is now possible to create a function that reads a file of
the given format.
# let
read_base
filename
=
let
channel
=
open_in
filename
in
let
split_line
=
split
':'
in
let
list_names
=
split
'|'
(input_line
channel)
in
let
rec
read_file
()
=
try
let
data
=
Array.of_list
(split_line
(input_line
channel
))
in
data
::
(read_file
())
with
End_of_file
->
close_in
channel
;
[]
in
{
card_index
=
mk_index
list_names
;
data
=
read_file
()
}
;;
val read_base : string -> data_base = <fun>
The auxiliary function read_file reads records from the
file, and works recursively on the input channel. The base case of
the recursion corresponds to the end of the file, signaled by the
End_of_file exception. In this case, the empty list is
returned after closing the channel.
The association's file can now be loaded:
# let
base_ex
=
read_base
"association.dat"
;;
val base_ex : data_base =
{card_index=<fun>;
data=
[[|"0"; "Chailloux"; "Emmanuel"; "Universit\233 P6"; "0144274427";
"ec@lip6.fr"; "email"; "25.12.1998"; "100.00"|];
[|"1"; "Manoury"; "Pascal"; "Laboratoire PPS"; ...|]; ...]}
General principles for database processing
The effectiveness and difficulty of processing the data in a
database is proportional to the power and complexity of the query
language. Since we want to use Objective CAML as query language, there is no
limit a priori on the requests we can express! However, we
also want to provide some simple tools to manipulate cards and their
data. This desire for simplicity requires us to limit the power of the
Objective CAML
language, through the use of general goals and principles for database
processing.
The goal of database processing is to obtain a state of the
database. Building such a state may be decomposed into three
steps:
-
selecting, according to some given criterion, a set of
cards;
- processing each of the selected cards;
- processing all the data collected on the cards.
Figure 6.1 illustrates this decomposition.
Figure 6.1: Processing a request.
According to this decomposition, we need three functions of the
following types:
-
(data_card -> bool) -> data_card list -> data_card list
- (data_card -> 'a) -> data_card list -> 'a list
- ('a -> 'b -> 'b) -> 'a list -> 'b -> 'b
Objective CAML provides us with three higher-order function, also known as
iterators, introduced page ??, that satisfy our
specification:
# List.find_all
;;
- : ('a -> bool) -> 'a list -> 'a list = <fun>
# List.map
;;
- : ('a -> 'b) -> 'a list -> 'b list = <fun>
# List.fold_right
;;
- : ('a -> 'b -> 'b) -> 'a list -> 'b -> 'b = <fun>
We will be able to use them to implement the three steps of building a
state by choosing the functions they take as an argument.
For some special requests, we will also use:
# List.iter
;;
- : ('a -> unit) -> 'a list -> unit = <fun>
Indeed, if the required processing consists only of displaying some
data, there is nothing to compute.
In the next paragraphs, we are going to see how to define functions
expressing simple selection criteria, as well as simple queries. We
conclude this section with a short example using these functions
according to the principles stated above.
Selection criteria
Concretely, the boolean function corresponding to the selection
criterion of a card is a boolean combination of properties of some or
all of the fields of the card. Each field of a card, even though it
is a string, can contain some information of another type: a float, a
date, etc.
Selection criteria on a field
Selecting on some field is usually done using a function of the type
data_base -> 'a -> string -> data_card -> bool. The
'a type parameter corresponds to the type of the information
contained in the field. The string argument corresponds to
the name of the field.
String fields
We define two simple tests on strings: equality with another string,
and non-emptiness.
# let
eq_sfield
db
s
n
dc
=
(s
=
(field
db
n
dc))
;;
val eq_sfield : data_base -> string -> string -> data_card -> bool = <fun>
# let
nonempty_sfield
db
n
dc
=
(""
<>
(field
db
n
dc))
;;
val nonempty_sfield : data_base -> string -> data_card -> bool = <fun>
Float fields
To implement tests on data of type float, it is enough to translate
the string representation of a decimal number into its
float value. Here are some examples obtained from a generic
function
tst_ffield:
# let
tst_ffield
r
db
v
n
dc
=
r
v
(float_of_string
(field
db
n
dc))
;;
val tst_ffield :
('a -> float -> 'b) -> data_base -> 'a -> string -> data_card -> 'b = <fun>
# let
eq_ffield
=
tst_ffield
(=
)
;;
# let
lt_ffield
=
tst_ffield
(<
)
;;
# let
le_ffield
=
tst_ffield
(<=
)
;;
(* etc. *)
These three functions have type:
data_base -> float -> string -> data_card -> bool.
Dates
This kind of information is a little more complex to deal with, as it
depends on the representation format of dates, and requires that we define
date comparison.
We decide to represent dates in a card as a string with format
dd.mm.yyyy
. In order to be able to define additional
comparisons, we also allow the replacement of the day, month or year
part with the underscore character ('_'
). Dates are
compared according to the lexicographic order of lists of integers of
the form [year; month; day]. To express queries such as: ``is
before July 1998'', we use the date pattern:
"_.07.1998". Comparing a date with a pattern is
accomplished with the function tst_dfield which analyses the
pattern to create the ad hoc comparison function. To define this
generic test function on dates, we need a few auxiliary functions.
We first code two conversion functions from dates
(ints_of_string) and date patterns
(ints_of_dpat) to lists of ints.
The character '_'
of a pattern will be replaced by the integer
0:
# let
split_date
=
split
'.'
;;
val split_date : string -> string list = <fun>
# let
ints_of_string
d
=
try
match
split_date
d
with
[
d;m;y]
->
[
int_of_string
y;
int_of_string
m;
int_of_string
d]
|
_
->
failwith
"Bad date format"
with
Failure("int_of_string"
)
->
failwith
"Bad date format"
;;
val ints_of_string : string -> int list = <fun>
# let
ints_of_dpat
d
=
let
int_of_stringpat
=
function
"_"
->
0
|
s
->
int_of_string
s
in
try
match
split_date
d
with
[
d;m;y]
->
[
int_of_stringpat
y;
int_of_stringpat
m;
int_of_stringpat
d
]
|
_
->
failwith
"Bad date format"
with
Failure("int_of_string"
)
->
failwith
"Bad date pattern"
;;
val ints_of_dpat : string -> int list = <fun>
Given a relation r on integers, we now code the test function.
It simply consists of implementing the lexicographic order, taking
into account the particular case of 0:
# let
rec
app_dtst
r
d1
d2
=
match
d1,
d2
with
[]
,
[]
->
false
|
(0
::d1)
,
(_::
d2)
->
app_dtst
r
d1
d2
|
(n1::d1)
,
(n2::d2)
->
(r
n1
n2)
||
((n1
=
n2)
&&
(app_dtst
r
d1
d2))
|
_,
_
->
failwith
"Bad date pattern or format"
;;
val app_dtst : (int -> int -> bool) -> int list -> int list -> bool = <fun>
We finally define the generic function tst_dfield which
takes as arguments a relation r, a database db, a
pattern dp, a field name nm, and a card
dc. This function checks that the pattern and the field from
the card satisfy the relation.
# let
tst_dfield
r
db
dp
nm
dc
=
r
(ints_of_dpat
dp)
(ints_of_string
(field
db
nm
dc))
;;
val tst_dfield :
(int list -> int list -> 'a) ->
data_base -> string -> string -> data_card -> 'a = <fun>
We now apply it to three relations.
# let
eq_dfield
=
tst_dfield
(=
)
;;
# let
le_dfield
=
tst_dfield
(<=
)
;;
# let
ge_dfield
=
tst_dfield
(>=
)
;;
These three functions have type:
data_base -> string -> string -> data_card -> bool.
Composing criteria
The tests we have defined above all take as first arguments a
database, a value, and the name of a field. When we write a query, the
value of these three arguments are known. For instance, when we work
on the database base_ex, the test ``is before July 1998''
is written
# ge_dfield
base_ex
"_.07.1998"
"Date"
;;
- : data_card -> bool = <fun>
Thus, we can consider a test as a function of type data_card
-> bool. We want to obtain boolean combinations of the results of
such functions applied to a given card. To this end, we implement the
iterator:
# let
fold_funs
b
c
fs
dc
=
List.fold_right
(fun
f
->
fun
r
->
c
(f
dc)
r)
fs
b
;;
val fold_funs : 'a -> ('b -> 'a -> 'a) -> ('c -> 'b) list -> 'c -> 'a = <fun>
Where b is the base value, the function c is the
boolean operator, fs is the list of test functions on a
field, and dc is a card.
We can obtain the conjunction and the disjunction of a list of tests with:
# let
and_fold
fs
=
fold_funs
true
(&
)
fs
;;
val and_fold : ('a -> bool) list -> 'a -> bool = <fun>
# let
or_fold
fs
=
fold_funs
false
(or)
fs
;;
val or_fold : ('a -> bool) list -> 'a -> bool = <fun>
We easily define the negation of a test:
# let
not_fun
f
dc
=
not
(f
dc)
;;
val not_fun : ('a -> bool) -> 'a -> bool = <fun>
For instance, we can use these combinators to define a selection
function for cards whose date field is included in a given range:
# let
date_interval
db
d1
d2
=
and_fold
[
(le_dfield
db
d1
"Date"
);
(ge_dfield
db
d2
"Date"
)]
;;
val date_interval : data_base -> string -> string -> data_card -> bool =
<fun>
Processing and computation
It is difficult to guess how a card might be processed, or the data that
would result from that processing. Nevertheless, we can consider two
common cases: numerical computation and data formatting for printing.
Let's take an example for each of these two cases.
Data formatting
In order to print, we wish to create a string containing the name
of a member of the association, followed by some information.
We start with a function that reverses the splitting of a line
using a given separating character:
# let
format_list
c
=
let
s
=
String.make
1
c
in
List.fold_left
(fun
x
y
->
if
x=
""
then
y
else
x^
s^
y)
""
;;
val format_list : char -> string list -> string = <fun>
In order to build the list of fields we are interested in, we code the
function extract that returns the fields associated
with a given list of names in a given card:
# let
extract
db
ns
dc
=
List.map
(fun
n
->
field
db
n
dc)
ns
;;
val extract : data_base -> string list -> data_card -> string list = <fun>
We can now write the line formatting function:
# let
format_line
db
ns
dc
=
(String.uppercase
(field
db
"Lastname"
dc))
^
" "
^
(field
db
"Firstname"
dc)
^
"\t"
^
(format_list
'\t'
(extract
db
ns
dc))
^
"\n"
;;
val format_line : data_base -> string list -> data_card -> string = <fun>
The argument ns is the list of requested fields. In the
resulting string, fields are separated by a tab ('\t'
)
and the string is terminated with a newline character.
We display the list of last and first names of all members with:
# List.iter
print_string
(List.map
(format_line
base_ex
[])
base_ex.
data)
;;
CHAILLOUX Emmanuel
MANOURY Pascal
PAGANO Bruno
BARO Sylvain
- : unit = ()
Numerical computation
We want to compute the total amount of received fees for a given set
of cards. This is easily done by composing the extraction and
conversion of the correct field with the addition. To get nicer
code, we define an infix composition operator:
# let
(++
)
f
g
x
=
g
(f
x)
;;
val ++ : ('a -> 'b) -> ('b -> 'c) -> 'a -> 'c = <fun>
We use this operator in the following definition:
# let
total
db
dcs
=
List.fold_right
((field
db
"Amount"
)
++
float_of_string
++
(+.
))
dcs
0
.
0
;;
val total : data_base -> data_card list -> float = <fun>
We can now apply it to the whole database:
# total
base_ex
base_ex.
data
;;
- : float = 450
An example
To conclude, here is a small example of an application that uses the
principles described in the paragraphs above.
We expect two kinds of queries on our database:
-
a query returning two lists, the elements of the first
containing the name of a member followed by his mail address,
the elements of the other containing the name of the member
followed by his email address, according to his preferences.
- another query returning the state of received fees for a given
period of time. This state is composed of the list of last and
first names, dates and amounts of the fees as well as the total amount
of the received fees.
List of addresses
To create these lists, we first select the relevant cards according to
the field "Pref"
, then we use the formatting function
format_line:
# let
mail_addresses
db
=
let
dcs
=
List.find_all
(eq_sfield
db
"mail"
"Pref"
)
db.
data
in
List.map
(format_line
db
[
"Mail"
]
)
dcs
;;
val mail_addresses : data_base -> string list = <fun>
# let
email_addresses
db
=
let
dcs
=
List.find_all
(eq_sfield
db
"email"
"Pref"
)
db.
data
in
List.map
(format_line
db
[
"Email"
]
)
dcs
;;
val email_addresses : data_base -> string list = <fun>
State of received fees
Computing the state of the received fees uses the same technique:
selection then processing. In this case however the processing part is
twofold: line formatting followed by the computation of the total
amount.
# let
fees_state
db
d1
d2
=
let
dcs
=
List.find_all
(date_interval
db
d1
d2)
db.
data
in
let
ls
=
List.map
(format_line
db
[
"Date"
;"Amount"
]
)
dcs
in
let
t
=
total
db
dcs
in
ls,
t
;;
val fees_state : data_base -> string -> string -> string list * float = <fun>
The result of this query is a tuple containing a list of strings
with member information, and the total amount of received fees.
Main program
The main program is essentially an
interactive loop that displays the result of queries asked by the
user through a menu. We use here an imperative style, except for the
display of the results which uses an iterator.
# let
main()
=
let
db
=
read_base
"association.dat"
in
let
finished
=
ref
false
in
while
not
!
finished
do
print_string" 1: List of mail addresses\n"
;
print_string" 2: List of email addresses\n"
;
print_string" 3: Received fees\n"
;
print_string" 0: Exit\n"
;
print_string"Your choice: "
;
match
read_int()
with
0
->
finished
:=
true
|
1
->
(List.iter
print_string
(mail_addresses
db))
|
2
->
(List.iter
print_string
(email_addresses
db))
|
3
->
(let
d1
=
print_string"Start date: "
;
read_line()
in
let
d2
=
print_string"End date: "
;
read_line()
in
let
ls,
t
=
fees_state
db
d1
d2
in
List.iter
print_string
ls;
print_string"Total: "
;
print_float
t;
print_newline())
|
_
->
()
done;
print_string"bye\n"
;;
val main : unit -> unit = <fun>
This example will be extended in chapter 21
with an interface using a web browser.
Further work
A natural extension of this example would consist of adding type
information to every field of the database. This information would be
used to define generic comparison operators with type
data_base -> 'a -> string -> data_card -> bool
where the name of the field (the third argument) would trigger the
correct conversion and test functions.