# More about Algebraic Data Types

The first chapter on Algebraic Data Types discussed the fundamental aspects of algebraic data types. In the following, we go deeper by discussing parameterised and recursive data types. The former are also called generic in other programming languages and the latter enable us to construct arbitrary tree structures.

## Parameterised Data Types

In Fundamentals, we discussed the concept of partial functions; that is, functions which only successfully return result values for a subset of their possible argument values — we call this subset the domain of the function. If the function is applied to arguments outside its domain, it raises a runtime error. For example, the index function `(!!) :: [a] -> Int -> a` only returns a value if the second argument, the index, is greater or equal to zero, and the list has sufficiently many elements. Applying it to other combinations of values will trigger a runtime error. Consider the the following examples, where we assume `xs = ['a', 'b', 'c', 'd']`:

`xs!!2`   ⇒   `'c'`

`xs!!4`   ⇒   `*** Exception: Prelude.!!: index too large`

`xs!!(-1)`   ⇒   `*** Exception: Prelude.!!: negative index`

If we want to avoid such runtime errors, we need to test whether arguments are admissible before calling a partial function, such as `(!!)`, as in the following example:

``````showFifthElement :: Show a => [a] -> String
showFifthElement xs
= if length xs < 5
then "there is no fifth element in this list"
else "the fifth element of the list is: " ++ show (xs !! 4)``````

While that test is simple (even if inefficient) in the above example, that is not always the case. In general, especially if the tests become more involved, it is nicer to turn the partial function (here `(!!)`) into a total one by wrapping its return type into a new data type that avoids failure by turning it into a special return value:

``````data MaybeInt
= Just Int
| Nothing``````

We can use this new data type to define a total indexing function on lists of `Int`s. (Note how infix functions can be defined for custom symbols using the infix notation in the function definition.)

``````(!+!) :: [Int] -> Int -> MaybeInt
[]      !+! _ = Nothing
(x : _) !+! 0 = Just x
(_ :xs) !+! n = xs !+! (n - 1)``````

Using this new function, the definition of `showFifthElement` becomes:

``````showFifthElement :: [Int] -> String
showFifthElement xs
= case xs !+! 4 of
Nothing -> "there is no fifth element in this list"
Just n  -> "the fifth element of the list is: " ++ show n``````

We can call the total index function `!+!` without first having to calculate the length of the list to check that it has at least five elements. This is more efficient and avoids an awkward dependency between the test and the implementation of the function whose pre-condition is being tested. However, since the result of the total indexing function is not a straight `Int` value, we have to use `case` to check whether the call was a failure, and extract the `Int` value in case of success.

It would be awkward to define separate data type wrappers, such as `MaybeInt`, for every result type of an otherwise partial function. Just as the builtin Haskell list type (and many functions defined on it) can be used with lists of varying element types, we can define a parametric version of `MaybeInt` by replacing the `Int` argument to `Just` with a type variable that becomes an argument to the type constructor `Maybe`:

``````data Maybe a
= Just a
| Nothing``````

We can read this definition as: the type `Maybe a`, where `a` can be any type, is either `Nothing` or `Just x`, where `x` is a value of type `a`. In other words, this definition describes a whole family of types, where the type variable `a` can be instantiated to any type. For example, to define `!+!`, we use `Maybe Int`, which has the values `Nothing` and `Just n` for any `Int` value `n`.

The `Maybe` type is so commonly used that it is already provided in the `Prelude`. We use it to provide a polymorphic definition of total list indexing:

``````(!+!) :: [a] -> Int -> Maybe a
[]      !+! _ = Nothing
(x : _) !+! 0 = Just x
(_ :xs) !+! n = (!+!) xs (n - 1)``````

With this definition, `showFifthElement` works on lists on any type in the `Show` class:

``````showFifthElement :: Show a => [a] -> String
showFifthElement xs
= case xs !+! 4 of
Nothing ->  "there is no fifth element in this list"
Just n  ->  "the fifth element of the list is: " ++ show n``````

The use of `Maybe` is illustrated in the following screencast.

### Generalised Syntax

In Algebraic Data Types, we briefly discussed the generalised notation for algebraic data types enabled with the `GADTs` language pragma. In that generalised notation, `Maybe` is defined as

``````data Maybe a where
Just     :: a -> Maybe a
Nothing  ::      Maybe a``````

The data constructor `Just` can be viewed as a function that takes a value of type `a` and returns a `Maybe a`, and `Nothing` as a constant which returns a `Maybe a` value of any type `a`. In other words, the data constructors of a parametric data type are themselves parametric polymorphic functions — i.e., functions whose type signature contain unconstrained type variables, here the `a`.

## Recursive Type Constructors

Just like the `Maybe` type constructor, the list type constructor `[]` is parametric — i.e., it takes as a type argument the list's element type. And although lists come with special, builtin syntax (the square bracket notation), we can define a data type that is functionally equivalent to standard Haskell lists. Standard Haskell lists come with two parametric polymorphic data constructors: `[] :: [a]`, which returns an empty list of any type `a`, and `(:) :: a -> [a] -> [a]`, which adds an element to the front of a list. The new and interesting point about lists is that the type of the second argument of the data constructor `(:)` is the list type itself — in other words, the definition of lists depends on itself. Hence, we call such types recursive (just as with functions whose definition depends on itself).

The following recursive data type definition corresponds functionally to that of the builtin list type. However, it uses two alphanumeric data constructor, `Cons` and `Nil`, instead of the builtin special syntax, `[]` and `(:)`. (Strictly speaking, user-defined data constructors can be formed from infix symbols, starting with a colon character (`:`). However, `[]` is not in the realm of notation that can be used in user-provided definitions.)

``````data List a
= Cons a (List a)
| Nil``````

or, alternatively, in the generalised syntax

``````data List a where
Cons :: a -> List a -> List a
Nil  ::                List a``````

Conceptually, our new `List` type is the same as builtin lists, just without the syntactic sugar of the bracket notation. Hence, we need to write

``Cons (Cons (Cons (Const 4 Nil) 3) 2) 1)``

instead of the more compact `[1, 2, 3, 4]`. We can, however, use infix notation as with `1 : 2 : 3 : 4 : []` by enclosing `Cons` in backquotes:

``1 `Cons` (2 `Cons` (3 `Cons` (4 `Cons` Nil)))``

(We could even avoid the parentheses by using a suitable infix operator declaration to fix the associativity of `Cons`.)

So in practice, we will rarely find the need to define lists ourselves (and instead use the syntactically more convenient builtin ones). However, lists as a collection type are limited, and we will see that we can often do better with user-defined recursive data types of a different structure.

## Binary Trees

If we store a collection of items in a list `[a]`, we can easily insert a new item by making this item the new head of the list using the list constructor `(:) :: a -> [a] -> [a]`.

``````insert :: a -> [a] -> [a]
insert  = (:)``````

Checking if an item is in the list is also straightforward: we can use the `elem` function from the Prelude or define it ourselves — in either case, we need the element type `a` to conform to the type class `Eq`, though:

``````isElement :: Eq a => a -> [a] -> Bool
isElement _ []     = False
isElement a (x:xs)
| a == x         = True
| otherwise      = isElement a xs``````

As we can see from this definition, the `isElement` function has no choice but to potentially look at every single item in the list to determine whether it is the value we are looking for. Depending on the length of the list, this can be rather time consuming. Can we do better?

### Sorted lists

If we had to do such a task in the physical world, we might try to keep our sequence sorted, so we can stop the search as soon as we reach an item which is smaller than the item we are looking for.

``````-- precondition: list is sorted in increasing order
isElementSorted :: Ord a  => a -> [a] -> Bool
isElementSorted _ []     = False
isElementSorted a (x:xs)
| a == x               = True
| a > x                = False
| otherwise            = isElementSorted a xs``````

Now that we need greater `(>)`, the `Eq a` constraint is no longer sufficient and we need to use `Ord` to ensure the element type `a` supports a total ordering. Nevertheless, Haskell cannot ensure the informally stated precondition, namely that the argument list is actually sorted. Hence, if we apply `elemSorted` inadvertently to an unsorted list, we may get an incorrect result:

`isElementSorted 3 [1, 2, 5, 3]`   ⇒   `False`

Nevertheless, if applied to sorted lists, `isElementSorted` only has to inspect every single item if the value of the item we are looking for is greater or equal to the last one in the list. This is slightly better than our original definition `isElement`, but it does not actually solve the problem. In particular, we do not get this improvement for free, as we have to change the implementation of the `insert` function, to ensure that, if we insert in a sorted list, the resulting list is sorted as well:

``````-- precondition: list is sorted in increasing order
-- postcondition: return list is sorted in increasing order
insertSorted :: (Eq a, Ord a)  => a -> [a] -> Bool
insertSorted x []     = [x]
insertSorted x (y:ys)
| x <= y            = x : y : ys
| otherwise         = y : insertSorted x ys``````

In contrast to `insert`, which takes just a single step, no matter how long the list, `insertSorted` may have to traverse the whole list to find the correct spot for the new element; so, overall, merely using sorted lists is not likely to improve our program.

To see what went wrong, let's again consider how we would deal with such a task in real life. To search for a particular item in a sorted stack, for example, we definitely wouldn't start from the first item and proceed, one by one, to the last. Instead, we might split the stack into roughly equal halves. Then, we would check if the item we are looking for is the middle one and, if not is, whether it is bound to be in the first or second half, depending on whether the item we are looking for is smaller or greater than we one in the middle. Based on this check we can discard either the first or second half of the pile and continue our search with the same strategy in the remaining stack. This is significantly faster, in particular for large sequences, as we discard about half of the remaining items in every step, instead of just a single one. This strategy is called binary search.

### Binary trees

Unfortunately, binary search isn't directly applicable to lists! There is simply no way in which we can directly get to the middle of a list without traversing the first half, element by element.

What is the alternative? What we need to implement binary search is a data structure that lets us get to the middle of a collection of sorted elements in one step; then, we need to be able to do the same with whichever half of the original collection we select after the first comparison. Such a recursive structure, which has more than one sub-structures of the same type is a tree. Our tree has exactly two subtrees, and is therefore a binary tree. Its definition is (apart from the names) quite similar to that of `List`:

``````data BinaryTree a
= Node a (BinaryTree a) (BinaryTree a)
| Leaf``````

Or, if you prefer the generalised syntax:

``````data BinaryTree a where
Node :: a  -> BinaryTree a -> BinaryTree a -> BinaryTree a
Leaf ::                                       BinaryTree a``````

A `Leaf` corresponds to `Nil` and `Node` to `Cons`, with the difference that `Cons` only had one sublist, whereas `Node` has a left and the right subtree.

In contrast to lists, for which Haskell provides syntactic sugar, such that we can write `[1, 2, 3]` instead of `1 : 2 : 3 : []`, we don't have this luxury for user-defined data types. A tree which contains only the number `5`, for example, is represented by the term `Node 5 Leaf Leaf`. A tree which contains the numbers `5` and `6` can either be represented by term `Node 5 (Node 6 Leaf Leaf) Leaf`, `Node 6 (Node 5 Leaf Leaf) Leaf`, `Node 6 Leaf (Node 5 Leaf Leaf)`, or `Node 5 Leaf (Node 6 Leaf Leaf)` — not very readable! That's why we'll use diagrams to represent `BinaryTree` terms instead. A `Leaf` term is just a crossed box, and a `Node` term is a box which contains the node value, and two edges pointing to the left and right subtree.

The following screencast illustrates the construction of binary trees.

To experiment with binary trees yourself, please download the Haskell for Mac project with the `BinaryTree` definition and an implementation of `renderTree` to draw trees: Trees.hsproj. This project includes two Haskell modules: `ShapeGraphics` and `BinTree`. The former contains some of the graphics definitions we already used in previous chapters. The latter is where you can add code while working through this chapter. (We already included some definitions.) Just open `Trees.hsproj` in Haskell for Mac and proceed as in the screencasts. Run GHCi inside the `Trees.hsproj` directory after downloading and unpacking. Then, load the `BinTrees` module. To visualise your drawings, you need to write them to disk as a PNG image file. To do so, use the writePng function as follows:

``writePng FILENAME (drawPicture LINEWIDTH (renderTree BINARY_TREE))``

How do we implement functions corresponding to `isElementSorted` and `insertSorted` for tree types in Haskell? Let us start with the insertion function. As a precondition, we require that the input tree is sorted: for each node, all the values in the left subtree are less than the value stored in the node, and all the values in the right subtree are equal or greater than the value stored in the node. As a postcondition, we promise that the resulting tree will be sorted as well.

The base case of an empty input tree (a `Leaf`) merely requires us to construct a node which contains the value of the new item and whose left and right subtrees are both `Leaf`s:

``````-- precondition: tree is sorted in increasing order
-- postcondition: return tree is sorted in increasing order
insertTree :: Ord a => a -> BinaryTree a -> BinaryTree a
insertTree x Leaf
= Node x Leaf Leaf``````

If the tree is non-empty, we have to check whether to insert the item into the left or the right subtree. If the new item is less than the node value, we have to insert it in the left subtree; otherwise in the right subtree:

``````insertTree newValue (Node nodeValue leftSubtree rightSubtree)
| newValue < nodeValue = Node nodeValue (insertTree newValue leftSubtree) rightSubtree
| otherwise            = Node nodeValue leftSubtree (insertTree newValue rightSubtree)``````

The `isElementTree` function is quite similar, as we have to consider the same cases. If a tree is empty, `isElementTree` should always return `False`, independent of the value we are looking for:

``````-- precondition: tree is sorted in increasing order
isElementTree :: Ord a => a -> BinaryTree a -> Bool
isElementTree x Leaf = False``````

Otherwise, there are three possible cases: (1) the current node value is already the value we are looking for and we are done; (2) we have to search in the left subtree if the value is smaller than the value in the node; or (3) we have to search in the right subtree, otherwise.

``````isElementTree value (Node nodeValue leftSubtree rightSubtree)
| value == nodeValue  = True
| value  <  nodeValue = isElementTree value leftSubtree
| otherwise           = isElementTree value rightSubtree``````

After all this work, let's see if this indeed fixed the problem: is inserting and searching in a binary search tree faster than in a sorted list? As we observed previously, if we can reduce the size of the search space by half in each step —that is, if the number of nodes in each of the two subtrees is roughly the same— we need significantly fewer steps. More formally, `log2 n` recursive steps for a tree with `n` nodes. This means, for example, that we can search a tree with about 1000 items in 10 steps — a huge improvement over lists.

### Balancing

There is a catch, though: to half the search space in each step, the two subtrees of each node need to contain about the same number of values, or in other words, the tree needs to be balanced. Now, while `insertTree` ensures that, given a sorted binary tree, the resulting tree will be sorted again, it does not ensure that the result will be balanced.

As an example, let's consider two trees built by inserting the values from `1` to `7` in different order. Specifically, the following images display the binary trees resulting from evaluation

``````insertTree 5 \$
insertTree 7 \$
insertTree 1 \$
insertTree 3 \$
insertTree 6 \$
insertTree 2 \$
insertTree 4 Leaf``````

and

``````insertTree 7 \$
insertTree 6 \$
insertTree 5 \$
insertTree 4 \$
insertTree 3 \$
insertTree 2 \$
insertTree 1 Leaf``````

This next screencast demonstrates how insertion order affects tree shapes.

These examples clearly demonstrates that if we are not careful about the order in which we are inserting elements, the resulting binary tree will not just be unbalanced, it might even degenerate into a list. Searching for elements in such a degenerate tree is as inefficient as it is on plain lists. There exist many different approaches to ensuring that a tree is balanced in situations where we cannot control the order of insertions. They all come with different trade offs and discussing them in any detail is beyond the scope of this chapter.

## New Types

Recall the definition of the `String` type:

``type String = [Char]``

`String` is a type synonym — i.e., the types `[Char]` and `String` are the exact same thing as far as the type checker is concerned. We can use them interchangeably; `String` is just a more readable name. This is what we want, in the case of `String`, as we would like to be able to use all the standard list operations on `String` values.

In other situations, we like to introduce a new type name for an already existing type and, at the same time, prevent the original and new type to be used interchangeably. For example, we may want distinct types for various physical units, such as `Celsius` and `Fahrenheit`, both of which we represent with `Float`s. In that case, we certainly don't want values of `Celsius` and `Fahrenheit` to be used interchangeably, even if they are both represented by the same basic type, namely `Float`. We can achieve this as follows by wrapping the basic type into a data type with only one constructor taking one argument:

``````data Celsius    = Celsius    Float
data Fahrenheit = Fahrenheit Float``````

In contrast to using a `type` declaration, this ensures that `Celsius` and `Fahrenheit` are distinct types and using values of one for the other will lead to a type error.

As this is a common situation and because the wrapper data types introduce significant inefficiencies, especially when wrapping elementary types, such as `Float`, `Int`, and so forth, Haskell supports `newtype` declarations for the special case of data types with one data constructor with a single argument.

``````newtype Celsius    = Celsius    Float
newtype Fahrenheit = Fahrenheit Float``````

With those definitions, we have

`Celsius 0 == Fahrenheit 0`   ⇒   `*** Type mismatch`

## Exercises

1. The standard functions `head :: [a] -> a` and `tail :: [a] -> [a]` are partial. Implement total variants `safeHead` and `safeTail` by making use of `Maybe` in the function results.

2. Write a function `myLength :: [a] -> Int` that, given a list `l`, returns the same result as `length l`. However, implement `myLength` without any explicit pattern matching on lists; instead, use the function `safeTail` from the previous exercise to determine whether you reached the end of the list and to get the list tail in case where the end has not been reached yet.

3. Implement a function `deleteSorted :: Ord a => a -> [a] -> [a]` which removes a value passed as first argument from a sorted list given as the second argument. If the value does not occur in the list, the list is returned unchanged. Exploit the fact that the list is sorted: if an element is not present in the list, stop the search as early as possible.

4. Implement a function `deleteSmallestTree :: Ord a => BinaryTree a -> BinaryTree a` that removes the smallest value from the binary search tree passed as an argument. If the tree is empty to begin with, simply return the empty tree. If you don't know how to get started, draw some example input and result trees, and think about where to delete and how the result can be constructed.

5. Implement a recursive function `treeToList :: BinaryTree a -> [a]` that extracts all the values contained in a binary tree and returns them in a list. There are essentially three different ways in which the tree values can be serialised into a list. Which are they? How does the corresponding code look like?

6. Implementing a function `deleteTree :: Ord a => a -> BinaryTree a -> BinaryTree a` is considerably more difficult than the corresponding functions on lists. To see why, consider the balanced tree with seven elements from just before the last screencast, and think about what the result tree should look like if we want to remove the element `4`. We will discuss a solution in detail in one of the next chapters, but as a challenge, try and implement the `deleteTree` function. If your solution only works correctly for some cases, return an error with an appropriate error message for the cases your function doesn't handle.