The first chapter on Algebraic Data Types discussed the fundamental aspects of algebraic data types. In the following, we go deeper by discussing parameterised and recursive data types. The former are also called generic in other programming languages and the latter enable us to construct arbitrary tree structures.
In Fundamentals, we discussed the concept of partial functions; that is, functions which only successfully return result values for a subset of their possible argument values — we call this subset the domain of the function. If the function is applied to arguments outside its domain, it raises a runtime error. For example, the index function
(!!) :: [a] -> Int -> a only returns a value if the second argument, the index, is greater than or equal to zero, and the list has sufficiently many elements. Applying it to other combinations of values will trigger a runtime error. Consider the following examples, where we assume
xs = ['a', 'b', 'c', 'd']:
xs !! 5  ⇒ *** Exception: Prelude.!!: index too large
xs !! (-1)  ⇒ *** Exception: Prelude.!!: negative index
If we want to avoid such runtime errors, we need to test whether arguments are admissible before calling a partial function, such as
(!!), as in the following example:
showFifthElement :: Show a => [a] -> String
showFifthElement xs
  = if length xs < 5
    then "there is no fifth element in this list"
    else "the fifth element of the list is: " ++ show (xs !! 4)
While that test is simple (even if inefficient) in the above example, that is not always the case. In general, especially if the tests become more involved, it is nicer to turn the partial function (here
(!!)) into a total one by wrapping its return type into a new data type that avoids failure by turning it into a special return value:
data MaybeInt = Just Int | Nothing
We can use this new data type to define a total indexing function on lists of
Ints. (Note how infix functions can be defined for custom symbols using the infix notation in the function definition.)
(!+!) :: [Int] -> Int -> MaybeInt
[]       !+! _ = Nothing
(x : _)  !+! 0 = Just x
(_ : xs) !+! n = xs !+! (n - 1)
Using this new function, the definition of showFifthElement becomes:
showFifthElement :: [Int] -> String
showFifthElement xs
  = case xs !+! 4 of
      Nothing -> "there is no fifth element in this list"
      Just n  -> "the fifth element of the list is: " ++ show n
We can call the total index function
!+! without first having to calculate the length of the list to check that it has at least five elements. This is more efficient and avoids an awkward dependency between the test and the implementation of the function whose pre-condition is being tested. However, since the result of the total indexing function is not a straight
Int value, we have to use
case to check whether the call was a failure, and extract the
Int value in case of success.
It would be awkward to define separate data type wrappers, such as
MaybeInt, for every result type of an otherwise partial function. Just as the builtin Haskell list type (and many functions defined on it) can be used with lists of varying element types, we can define a parametric version of
MaybeInt by replacing the
Int argument to
Just with a type variable that becomes an argument to the type constructor
data Maybe a = Just a | Nothing
We can read this definition as: the type
Maybe a, where
a can be any type, is either
Just x, where
x is a value of type
a, or Nothing. In other words, this definition describes a whole family of types, where the type variable
a can be instantiated to any type. For example, to define
!+!, we use
Maybe Int, which has the values
Just n for any Int value n, as well as Nothing. The Maybe type is so commonly used that it is already provided in the
Prelude. We use it to provide a polymorphic definition of total list indexing:
(!+!) :: [a] -> Int -> Maybe a
[]       !+! _ = Nothing
(x : _)  !+! 0 = Just x
(_ : xs) !+! n = xs !+! (n - 1)
With this definition,
showFifthElement works on lists of any type in the Show type class:
showFifthElement :: Show a => [a] -> String
showFifthElement xs
  = case xs !+! 4 of
      Nothing -> "there is no fifth element in this list"
      Just n  -> "the fifth element of the list is: " ++ show n
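As an aside, the case analysis over a Maybe result can also be expressed with the Prelude's maybe function, which takes a default value for the Nothing case and a function for the Just case. The following is a sketch of this variant; (!+!) is repeated so that the snippet is self-contained:

```haskell
-- Total list indexing, as defined above.
(!+!) :: [a] -> Int -> Maybe a
[]       !+! _ = Nothing
(x : _)  !+! 0 = Just x
(_ : xs) !+! n = xs !+! (n - 1)

-- 'maybe :: b -> (a -> b) -> Maybe a -> b' replaces the explicit case.
showFifthElement :: Show a => [a] -> String
showFifthElement xs
  = maybe "there is no fifth element in this list"
          (\n -> "the fifth element of the list is: " ++ show n)
          (xs !+! 4)
```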
The use of
Maybe is illustrated in the following screencast.
In Algebraic Data Types, we briefly discussed the generalised notation for algebraic data types enabled with the
GADTs language pragma. In that generalised notation,
Maybe is defined as
data Maybe a where
  Just    :: a -> Maybe a
  Nothing :: Maybe a
The data constructor
Just can be viewed as a function that takes a value of type
a and returns a
Maybe a, and
Nothing as a constant which returns a
Maybe a value of any type
a. In other words, the data constructors of a parametric data type are themselves parametric polymorphic functions — i.e., functions whose type signature contains unconstrained type variables, here the type variable a.
Just like the
Maybe type constructor, the list type constructor
[] is parametric — i.e., it takes as a type argument the list's element type. And although lists come with special, builtin syntax (the square bracket notation), we can define a data type that is functionally equivalent to standard Haskell lists. Standard Haskell lists come with two parametric polymorphic data constructors:
[] :: [a], which returns an empty list of any type, and
(:) :: a -> [a] -> [a], which adds an element to the front of a list. The new and interesting point about lists is that the type of the second argument of the data constructor
(:) is the list type itself — in other words, the definition of lists depends on itself. Hence, we call such types recursive (just as with functions whose definition depends on itself).
The following recursive data type definition corresponds functionally to that of the builtin list type. However, it uses two alphanumeric data constructors, Cons and Nil, instead of the builtin special syntax, [] and (:). (Strictly speaking, user-defined data constructors can be formed from infix symbols starting with a colon character, such as (:+:); however, the bracket syntax of [] is not in the realm of notation that can be used in user-provided definitions.)
data List a = Cons a (List a) | Nil
or, alternatively, in the generalised syntax
data List a where
  Cons :: a -> List a -> List a
  Nil  :: List a
Conceptually, our new
List type is the same as builtin lists, just without the syntactic sugar of the bracket notation. Hence, we need to write
Cons 1 (Cons 2 (Cons 3 (Cons 4 Nil)))
instead of the more compact
[1, 2, 3, 4]. We can, however, use infix notation as with
1 : 2 : 3 : 4 : [] by enclosing
Cons in backquotes:
1 `Cons` (2 `Cons` (3 `Cons` (4 `Cons` Nil)))
(We could even avoid the parentheses by using a suitable infix operator declaration to fix the associativity of Cons.)
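Such a fixity declaration might look as follows. We declare Cons to be right-associative at precedence 5, mirroring the declaration for the builtin (:); the deriving clause is our own addition for convenience:

```haskell
-- Our List type again, with derived Eq and Show instances added for
-- convenience (the deriving clause is our own addition).
data List a = Cons a (List a) | Nil
  deriving (Eq, Show)

-- Right-associative at precedence 5, like the builtin (:), so chains
-- of `Cons` need no parentheses.
infixr 5 `Cons`

oneToFour :: List Int
oneToFour = 1 `Cons` 2 `Cons` 3 `Cons` 4 `Cons` Nil
```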
So in practice, we will rarely find the need to define lists ourselves (and instead use the syntactically more convenient builtin ones). However, lists as a collection type are limited, and we will see that we can often do better with user-defined recursive data types of a different structure.
If we store a collection of items in a list
[a], we can easily insert a new item by making this item the new head of the list using the list constructor
(:) :: a -> [a] -> [a].
insert :: a -> [a] -> [a]
insert = (:)
Checking if an item is in the list is also straightforward: we can use the
elem function from the Prelude or define it ourselves — in either case, we need the element type
a to conform to the type class Eq:
isElement :: Eq a => a -> [a] -> Bool
isElement _ []     = False
isElement a (x:xs)
  | a == x    = True
  | otherwise = isElement a xs
As we can see from this definition, the
isElement function has no choice but to potentially look at every single item in the list to determine whether it is the value we are looking for. Depending on the length of the list, this can be rather time consuming. Can we do better?
If we had to do such a task in the physical world, we might try to keep our sequence sorted, so we can stop the search as soon as we reach an item which is greater than the item we are looking for.
-- precondition: list is sorted in increasing order
isElementSorted :: Ord a => a -> [a] -> Bool
isElementSorted _ []     = False
isElementSorted a (x:xs)
  | a == x    = True
  | a < x     = False
  | otherwise = isElementSorted a xs
Now that we need greater-than comparisons as well as equality tests, the
Eq a constraint is no longer sufficient and we need to use
Ord to ensure the element type
a supports a total ordering. Nevertheless, Haskell cannot ensure the informally stated precondition, namely that the argument list is actually sorted. Hence, if we apply
isElementSorted inadvertently to an unsorted list, we may get an incorrect result:
isElementSorted 3 [1, 2, 5, 3] ⇒ False
Nevertheless, if applied to sorted lists,
isElementSorted only has to inspect every single item if the value of the item we are looking for is greater or equal to the last one in the list. This is slightly better than our original definition
isElement, but it does not actually solve the problem. In particular, we do not get this improvement for free, as we have to change the implementation of the
insert function, to ensure that, if we insert in a sorted list, the resulting list is sorted as well:
-- precondition: list is sorted in increasing order
-- postcondition: returned list is sorted in increasing order
insertSorted :: Ord a => a -> [a] -> [a]
insertSorted x []     = [x]
insertSorted x (y:ys)
  | x <= y    = x : y : ys
  | otherwise = y : insertSorted x ys
In contrast to
insert, which takes just a single step, no matter how long the list,
insertSorted may have to traverse the whole list to find the correct spot for the new element; so, overall, merely using sorted lists is not likely to improve our program.
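As an aside, the insertSorted interface gives us a sorting function for free: folding it over a list, starting from the empty (and hence trivially sorted) list, is exactly insertion sort. This is a self-contained sketch; the name insertionSort is our own:

```haskell
-- As defined above: insert a value at the right spot in a sorted list.
insertSorted :: Ord a => a -> [a] -> [a]
insertSorted x []     = [x]
insertSorted x (y:ys)
  | x <= y    = x : y : ys
  | otherwise = y : insertSorted x ys

-- Inserting every element into an initially empty (hence sorted) list
-- yields a sorted permutation of the input: insertion sort.
insertionSort :: Ord a => [a] -> [a]
insertionSort = foldr insertSorted []
```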
To see what went wrong, let's again consider how we would deal with such a task in real life. To search for a particular item in a sorted stack, for example, we definitely wouldn't start from the first item and proceed, one by one, to the last. Instead, we might split the stack into roughly equal halves. Then, we would check if the item we are looking for is the middle one and, if not, whether it is bound to be in the first or second half, depending on whether the item we are looking for is smaller or greater than the one in the middle. Based on this check we can discard either the first or second half of the pile and continue our search with the same strategy in the remaining stack. This is significantly faster, in particular for large sequences, as we discard about half of the remaining items in every step, instead of just a single one. This strategy is called binary search.
Unfortunately, binary search isn't directly applicable to lists! There is simply no way in which we can directly get to the middle of a list without traversing the first half, element by element.
What is the alternative? What we need to implement binary search is a data structure that lets us get to the middle of a collection of sorted elements in one step; then, we need to be able to do the same with whichever half of the original collection we select after the first comparison. Such a recursive structure, which has more than one sub-structure of the same type, is a tree. Our tree has exactly two subtrees, and is therefore a binary tree. Its definition is (apart from the names) quite similar to that of List:
data BinaryTree a = Node a (BinaryTree a) (BinaryTree a) | Leaf
Or, if you prefer the generalised syntax:
data BinaryTree a where
  Node :: a -> BinaryTree a -> BinaryTree a -> BinaryTree a
  Leaf :: BinaryTree a
Leaf corresponds to Nil and Node corresponds to Cons, with the difference that Cons has only one sublist, whereas Node has a left and a right subtree.
In contrast to lists, for which Haskell provides syntactic sugar, such that we can write
[1, 2, 3] instead of
1 : 2 : 3 : [], we don't have this luxury for user-defined data types. A tree which contains only the number
5, for example, is represented by the term
Node 5 Leaf Leaf. A tree which contains the numbers
5 and 6 can be represented by any of the terms
Node 5 (Node 6 Leaf Leaf) Leaf,
Node 6 (Node 5 Leaf Leaf) Leaf,
Node 6 Leaf (Node 5 Leaf Leaf), or
Node 5 Leaf (Node 6 Leaf Leaf) — not very readable! That's why we'll use diagrams to represent
BinaryTree terms instead. A
Leaf term is just a crossed box, and a
Node term is a box which contains the node value, and two edges pointing to the left and right subtree.
The following screencast illustrates the construction of binary trees.
To experiment with binary trees yourself, please download the Haskell for Mac project with the
BinaryTree definition and an implementation of
renderTree to draw trees: Trees.hsproj. This project includes two Haskell modules:
BinTree. The former contains some of the graphics definitions we already used in previous chapters. The latter is where you can add code while working through this chapter. (We already included some definitions.)
Open Trees.hsproj in Haskell for Mac and proceed as in the screencasts.
Run GHCi inside the
Trees.hsproj directory after downloading and unpacking. Then, load the
BinTree module. To visualise your drawings, you need to write them to disk as a PNG image file. To do so, use the writePng function as follows:
writePng FILENAME (drawPicture LINEWIDTH (renderTree BINARY_TREE))
How do we implement functions corresponding to isElementSorted and
insertSorted for tree types in Haskell? Let us start with the insertion function. As a precondition, we require that the input tree is sorted: for each node, all the values in the left subtree are less than the value stored in the node, and all the values in the right subtree are equal or greater than the value stored in the node. As a postcondition, we promise that the resulting tree will be sorted as well.
The base case of an empty input tree (a
Leaf) merely requires us to construct a node which contains the value of the new item and whose left and right subtrees are both Leaf:
-- precondition: tree is sorted in increasing order
-- postcondition: returned tree is sorted in increasing order
insertTree :: Ord a => a -> BinaryTree a -> BinaryTree a
insertTree x Leaf = Node x Leaf Leaf
If the tree is non-empty, we have to check whether to insert the item into the left or the right subtree. If the new item is less than the node value, we have to insert it in the left subtree; otherwise in the right subtree:
insertTree newValue (Node nodeValue leftSubtree rightSubtree)
  | newValue < nodeValue = Node nodeValue (insertTree newValue leftSubtree) rightSubtree
  | otherwise            = Node nodeValue leftSubtree (insertTree newValue rightSubtree)
The isElementTree function is quite similar, as we have to consider the same cases. If a tree is empty,
isElementTree should always return
False, independent of the value we are looking for:
-- precondition: tree is sorted in increasing order
isElementTree :: Ord a => a -> BinaryTree a -> Bool
isElementTree x Leaf = False
Otherwise, there are three possible cases: (1) the current node value is already the value we are looking for and we are done; (2) we have to search in the left subtree if the value is smaller than the value in the node; or (3) we have to search in the right subtree, otherwise.
isElementTree value (Node nodeValue leftSubtree rightSubtree)
  | value == nodeValue = True
  | value <  nodeValue = isElementTree value leftSubtree
  | otherwise          = isElementTree value rightSubtree
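To see both functions in action, we can build a search tree by folding insertTree over a list of values and then test membership. The helper treeFromList is our own naming; the definitions from above are repeated to keep the snippet self-contained:

```haskell
data BinaryTree a = Node a (BinaryTree a) (BinaryTree a) | Leaf

-- As defined above: insert into a binary search tree.
insertTree :: Ord a => a -> BinaryTree a -> BinaryTree a
insertTree x Leaf = Node x Leaf Leaf
insertTree newValue (Node nodeValue left right)
  | newValue < nodeValue = Node nodeValue (insertTree newValue left) right
  | otherwise            = Node nodeValue left (insertTree newValue right)

-- As defined above: membership test in a binary search tree.
isElementTree :: Ord a => a -> BinaryTree a -> Bool
isElementTree _ Leaf = False
isElementTree value (Node nodeValue left right)
  | value == nodeValue = True
  | value <  nodeValue = isElementTree value left
  | otherwise          = isElementTree value right

-- Build a search tree by successive insertion (our own helper).
treeFromList :: Ord a => [a] -> BinaryTree a
treeFromList = foldr insertTree Leaf
```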
After all this work, let's see if this indeed fixed the problem: is inserting and searching in a binary search tree faster than in a sorted list? As we observed previously, if we can reduce the size of the search space by half in each step —that is, if the number of nodes in each of the two subtrees is roughly the same— we need significantly fewer steps. More formally,
we need at most log2 n recursive steps for a tree with
n nodes. This means, for example, that we can search a tree with about 1000 items in 10 steps — a huge improvement over lists.
There is a catch, though: to halve the search space in each step, the two subtrees of each node need to contain about the same number of values; in other words, the tree needs to be balanced. Now, while
insertTree ensures that, given a sorted binary tree, the resulting tree will be sorted again, it does not ensure that the result will be balanced.
As an example, let's consider two trees built by inserting the values from
1 to 7 in different orders. Specifically, the following images display the binary trees resulting from evaluating
insertTree 5 $ insertTree 7 $ insertTree 1 $ insertTree 3 $ insertTree 6 $ insertTree 2 $ insertTree 4 Leaf
insertTree 7 $ insertTree 6 $ insertTree 5 $ insertTree 4 $ insertTree 3 $ insertTree 2 $ insertTree 1 Leaf
This next screencast demonstrates how insertion order affects tree shapes.
These examples clearly demonstrate that if we are not careful about the order in which we are inserting elements, the resulting binary tree will not just be unbalanced, it might even degenerate into a list. Searching for elements in such a degenerate tree is as inefficient as it is on plain lists. There exist many different approaches to ensuring that a tree is balanced in situations where we cannot control the order of insertions. They all come with different trade-offs, and discussing them in any detail is beyond the scope of this chapter.
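We can make this degeneration observable with a small height function (treeHeight is our own name; the definitions from above are repeated). Inserting seven values in monotonic order yields a tree of height 7, while a suitably mixed insertion order produces a balanced tree of height 3:

```haskell
data BinaryTree a = Node a (BinaryTree a) (BinaryTree a) | Leaf

-- As defined above: insert into a binary search tree.
insertTree :: Ord a => a -> BinaryTree a -> BinaryTree a
insertTree x Leaf = Node x Leaf Leaf
insertTree x (Node v l r)
  | x < v     = Node v (insertTree x l) r
  | otherwise = Node v l (insertTree x r)

-- Length of the longest path from the root down to a Leaf.
treeHeight :: BinaryTree a -> Int
treeHeight Leaf         = 0
treeHeight (Node _ l r) = 1 + max (treeHeight l) (treeHeight r)
```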
Recall the definition of the type String:
type String = [Char]
String is a type synonym — i.e., the types
String and [Char] are the exact same thing as far as the type checker is concerned. We can use them interchangeably;
String is just a more readable name. This is what we want, in the case of
String, as we would like to be able to use all the standard list operations on Strings.
In other situations, we would like to introduce a new type name for an already existing type and, at the same time, prevent the original and new type from being used interchangeably. For example, we may want distinct types for various physical units, such as
Celsius and Fahrenheit, both of which we represent with
Floats. In that case, we certainly don't want values of
Celsius and Fahrenheit to be used interchangeably, even if they are both represented by the same basic type, namely
Float. We can achieve this as follows by wrapping the basic type into a data type with only one constructor taking one argument:
data Celsius = Celsius Float
data Fahrenheit = Fahrenheit Float
In contrast to using a
type declaration, this ensures that
Celsius and Fahrenheit are distinct types, and using values of one for the other will lead to a type error.
As this is a common situation and because the wrapper data types introduce significant inefficiencies, especially when wrapping elementary types, such as
Int, Float, and so forth, Haskell supports
newtype declarations for the special case of data types with one data constructor with a single argument.
newtype Celsius = Celsius Float
newtype Fahrenheit = Fahrenheit Float
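With distinct newtypes, any conversion between the two units must be explicit. The following sketch (the function name is our own) unwraps the Float by pattern matching on the newtype constructor, applies the standard conversion formula, and rewraps the result:

```haskell
newtype Celsius = Celsius Float
newtype Fahrenheit = Fahrenheit Float

-- c degrees Celsius corresponds to c * 9/5 + 32 degrees Fahrenheit.
celsiusToFahrenheit :: Celsius -> Fahrenheit
celsiusToFahrenheit (Celsius c) = Fahrenheit (c * 9 / 5 + 32)
```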
With those definitions, we have
Celsius 0 == Fahrenheit 0 ⇒ *** Type mismatch
The standard functions
head :: [a] -> a and
tail :: [a] -> [a] are partial. Implement total variants
safeHead and safeTail by making use of
Maybe in the function results.
Write a function
myLength :: [a] -> Int that, given a list
l, returns the same result as
length l. However, implement
myLength without any explicit pattern matching on lists; instead, use the function
safeTail from the previous exercise to determine whether you have reached the end of the list and to get the list tail in the case where the end has not yet been reached.
Implement a function
deleteSorted :: Ord a => a -> [a] -> [a] which removes a value passed as first argument from a sorted list given as the second argument. If the value does not occur in the list, the list is returned unchanged. Exploit the fact that the list is sorted: if an element is not present in the list, stop the search as early as possible.
Implement a function
deleteSmallestTree :: Ord a => BinaryTree a -> BinaryTree a that removes the smallest value from the binary search tree passed as an argument. If the tree is empty to begin with, simply return the empty tree. If you don't know how to get started, draw some example input and result trees, and think about where to delete and how the result can be constructed.
Implement a recursive function
treeToList :: BinaryTree a -> [a] that extracts all the values contained in a binary tree and returns them in a list. There are essentially three different ways in which the tree values can be serialised into a list. Which are they? What does the corresponding code look like?
Implementing a function
deleteTree :: Ord a => a -> BinaryTree a -> BinaryTree a is considerably more difficult than the corresponding functions on lists. To see why, consider the balanced tree with seven elements from just before the last screencast, and think about what the result tree should look like if we want to remove the element
4. We will discuss a solution in detail in one of the next chapters, but as a challenge, try and implement the
deleteTree function. If your solution only works correctly for some cases, return an error with an appropriate error message for the cases your function doesn't handle.