Algebraic Data Types

All the types that we used so far were either already provided by the standard Prelude, such as Float, Char, Double, or Int, or they were lists or tuples of these types. We also used type synonyms to define more convenient and descriptive names, as in

type Point = (Float, Float)
type Line  = (Point, Point)

While type synonyms are convenient, they only associate new names with existing types. Sometimes we need entirely new types to be able to model data properly. For example, to create more varied graphics, we might want to extent the Line type with a line style; then, instead of always drawing solid lines, we can offer dotted and dashed lines as alternative styles. How can we represent this attribute? We might be tempted to use strings and define

type LineStyle = String
type FancyLine = (Point, Point, LineStyle)

and use it as in

myLine :: FancyLine
myLine = ((0, 0), (1, 1), "dashed")

Unfortunately, this has a number of serious drawbacks. For one, it's easy to introduce mistakes that will lead to unpredictable behaviour — we might accidentally write

myLine :: FancyLine
myLine = ((0, 0), (1, 1), "bashed")

which (as far as the compiler is concerned) conforms to the type definition. After all, "bashed" is a String, even if it is not a valid line style. Consequently, each function using the line styles has to check —at runtime— if the string is a valid style. Here is an example of such a function (where the function elem checks if an item is an element of a given list):

changeLineStyle :: FancyLine -> LineStyle -> FancyLine
changeLineStyle (x, y, _) newStyle
  | newStyle `elem` ["solid", "dashed", "dotted"] = (x, y, newStyle)
  | otherwise 
  = error $ "error in changeLineStyle: " ++ newStyle ++ " is not a valid style"

This is unsatisfactory! It is a hassle to implement all the error checking — and easily forgotten, too. It slows down code execution (string comparison is a relatively time consuming operation), but worst of all: invalid line styles cannot be spotted by the compiler before we run the program; instead, we get runtime errors.

A much better approach is to introduce a new data type that comprises exactly the three admissible line style value, so that the compiler can reject any program that attempts to use an illegal value. In Haskell, the keyword data introduces the definition of a data type. It is followed by the name of the new type and its values (separated by | characters), as in

data LineStyle
  = Solid
  | Dashed
  | Dotted

The identifiers representing the elements of the new type (here, Solid, Dashed, and Dotted) are called the data constructors of type LineStyle. Just as type names, they must start with an uppercase letter. Now, we can define

myLine :: FancyLine
myLine = ((0, 0), (1, 1), Dashed)

and, if we misspell,

myLine :: FancyLine
myLine = ((0, 0), (1, 1), Bashed)   -- Error: unknown data constructor

the compiler will be able to assist us by pointing out that the data constructor Bashed is unknown. If we accidentally start the data constructor with a lower case letter, as in

myLine :: Line
myLine = ((0,0), (1, 1), dashed)    -- Error: identifier not in scope

the compiler will treat it as a variable and report that dashed is not in scope. Moreover, the check for a runtime error in our previous definition of changeLineStyle is now obsolete and we can simply write

changeLineStyle :: FancyLine -> LineStyle -> FancyLine
changeLineStyle (x, y, _) newStyle = (x, y, newStyle)

All around, a better solution!

Enumeration Types

Types, such as LineStyle, which consist of fixed set of simple data constructors, are called enumeration types. Another example of an enumeration type are the days of the week:

data Day
  = Sunday
  | Monday
  | Tuesday
  | Wednesday
  | Thursday
  | Friday
  | Saturday
  deriving (Enum)

In addition to defining the new data type Day, the deriving clause in this example instructs the Haskell compiler to include the new type Day into the standard type class Enum — the code for the methods of that type class are automatically synthesised by the Haskell compiler for us. Hence, we can now make use of Haskell's range syntax for values of Day — for example,

[Monday .. Friday]   ⇒   [Monday, Tuesday, Wednesday, Thursday, Friday]

Another well-known enumeration type is Bool, which is not a somewhat magical builtin type, but —in the Prelude— simply defined as

data Bool = False | True

To summarise, definitions of enumeration types have the general form

data <TypeName> = <DataConstructorName1> || <DataConstructorNameN>
                deriving <Classes>

where the deriving clause is optional and the name of the new type as well as all the names of data constructors need to start with an uppercase letter.

Pattern matching and case expressions

Values of enumeration types are often scrutinised by way of pattern matching. For example, let us write a function that checks whether a given Day is a week day. If it is a week day, we want to return True; otherwise, False.

isWeekday :: Day -> Bool
isWeekday Sunday   = False
isWeekday Saturday = False
isWeekday _        = True

With this definition, we have

isWeekday Monday   ⇒   True

Pattern matching by way of multiple equations in a function definition (such as in the definition of isWeekday) is merely a convenient shorthand for the more verbose, but also more flexible case expressions, which can occur anywhere in an expression (not just in the argument position of a function definition). Hence, we can rewrite isWeekend as follows:

isWeekday :: Day -> Bool
isWeekday day = case day of
                  Sunday   -> False
                  Saturday -> False
                  _        -> True

Case expressions have the following general form

case <Expr> of
  <Pattern1> -> <BodyExpr1>
  <Pattern2> -> <BodyExpr2>

where <Expr> can be an arbitrary expression. Evaluation attempts to match the value of <Expr> against the patterns from top to bottom, selecting the first alternative whose pattern matches. Evaluation, then, continues with the body expression of the selected alternative. If no pattern matches, evaluation raises a runtime exception signalling a pattern match failure. In the definition of isWeekday, the pattern of the last alternative is an anonymous variable, which matches all values — it effectively serves as the default case.

Case expressions are another example of Haskell notation where the indentation matters: all the patterns need to be indented to the same level (c.f., Layout). However, we can always opt to group the patterns explicitly using curly braces, separated by semicolon. It is usually considered good style to nevertheless align the patterns, such as in

isWeekday :: Day -> Bool
isWeekday day = case day of
                { Sunday   -> False
                ; Saturday -> False
                ; _        -> True
                }

With explicit braces and semicolon, layout is no longer required — i.e., the following code is also correct, albeit not particularly readable:

isWeekday :: Day -> Bool
isWeekday day = case day of {Sunday -> False ; Saturday -> False; _ -> True}

Deriving type classes

Even simple functions, such as isWeekday, can usually be implemented in a variety of ways. As an alternative to the explicit pattern matching in isWeekday, we could construct a list containing the two days of the weekend, and then, test —with the Predude function elem— whether the argument to isWeekday is contained in that list. This leads to a compact one-liner:

isWeekday :: Day -> Bool
isWeekday day = not $ day `elem` [Saturday, Sunday]

Unfortunately, it gives us a compiler error of the form

No instance for (Eq Day) arising from a use of ‘elem’
In the second argument of ‘($)’, namely
  ‘day `elem` [Saturday, Sunday]’
In the expression: not $ day `elem` [Saturday, Sunday]
In an equation for ‘isWeekday’:
    isWeekday day = not $ day `elem` [Saturday, Sunday]

What went wrong? The type of elem is

elem :: Eq a => a -> [a] -> Bool

The function elem needs to compare its first argument with the elements of the list for equality. As we discussed in A First Glance at Overloading, this requires the type of compared data items to be a member of the type class Eq. Unfortunately, that is not the case for Day.

How can we fix this? In general, if we want a type to be a member of a type class, we have to provide a definition for the functions constituting the methods of the type class. Such a definition is called a type class instance for that particular type and we will discuss this topic in detail in a subsequent chapter.

However, for some type classes, the compiler can derive (that is, generate) a standard definition automatically. We have encountered this before in the form of the deriving clause for the Enum type class. The derivation process is quite straight forward when generating Eq instances for enumeration types. Presented with a new enumeration type of the form

data <TypeName> = <DataConstructorName1> || <DataConstructorNameN>

the compiler simply generates

<DataConstructorName1> == <DataConstructorName1> = True<DataConstructorNameN> == <DataConstructorNameN> = True
_                      == _                      = False

Hence, if we define Day as

data Day
  = Sunday
  | Monday
  | Tuesday
  | Wednesday
  | Thursday
  | Friday
  | Saturday
  deriving (Eq, Enum)

we can immediately use ==, and hence, also elem on Day values — thus, our one-liner for isWeekday will now work as intended. The second method defined by Eq is inequality /= and that is generated in an analogous manner.

In addition to Eq and Enum, it is common to derive instances for Ord (to obtain comparison operators, such as <, <=, and so on) as well as Show (to be able to use show, print, and similar).

Whether you want to derive Eq, Ord, and Enum depends very much on how you use the values of a particular data type; in contrast, deriving Show is almost always a good idea to help with experimentation and debugging. Without being a member of Show, neither Haskell for Mac nor GHCi can display values of a user-defined type. Hence, we almost always add at least deriving (Show) to a data defnition.

Product Types as Parametrised Data Constructors

In Spirals, Snowflakes & Trees: Recursion in Pictures, we defined a simple interface for line graphics including the following type synonym definitions:

-- LineGraphics Interface
type Point   = (Float, Float)
type Vector  = (Float, Float)
type Line    = (Point, Point)
type Colour  = (Int, Int, Int, Int)    -- red, green, blue, opacity

Type synonyms are convenient. They shorten type signatures (e.g., we can write Colour instead of (Int, Int, Int, Int)) and they document the meaning of function arguments (e.g., they let us decide whether a particular use of the type (Float, Float) represents a Point or a Vector). They however do not improve type safety. More specifically, if we accidentally pass a value of type Point instead of a Vector, the Haskell compiler will not complain — after all, in either case, we have got a pair of two floating-point values.

In other words, type synonyms give new names to existing types, but during type checking, the Haskell compiler does not distinguish between the new type name and the existing type that the new name refers to. To illustrate this, let us consider a function that moves a point by n-times a given vector:

movePointN :: Float -> Vector -> Point -> Point
movePointN n (vx, vy) (x, y) = (n * vx + x, n * vy + y)

Now, if we accidentally pass the arguments in the incorrect order, as in movePointN 5 point vector, the Haskell system will happily accept our program and perform a computation that we did not intend.

To avoid this problem —and thus, to increase type safety— a type synonym is not sufficient, we need to introduce a new data type instead:

data Point = Point Float Float
           deriving (Show, Eq)

Just as our previous type synonym to represent points, this data type definition introduces the new type name Point. In addition, it also introduces a new data constructor, which is also named Point. This new data constructor gets two parameters (or arguments), each of type Float; hence, we can construct a value of type Point as follows:

zeroPoint :: Point
zeroPoint = Point 0 0   -- point at (0, 0)

The type signature of the data constructor is

Point :: Float -> Float -> Point

Here, the Point on the left is the data constructor (a value) and the Point on the right is the newly introduced type. We could also use different names for the type and the data constructor — e.g., we could call the data constructor MkPoint instead. However, it is a common idiom in Haskell to use the same name if a type has only got a single data constructor. After all, there is no risk to confuse them — one is a value and the other is a type, and types and values are kept strictly separate in Haskell.

The general form of a data type definition with just one data constructor is

data <Type> = <Constructor> <Type1><TypeN>
            deriving <Classes>

where the type of the data constructor is

<Constructor> :: <Type1> ->-> <TypeN> -> <Type>

Such data type definitions are called product types, just like a tuple, they combine values of multiple types into a new compound value. In order for the deriving clause to be valid, all parameter types <Type1> to <TypeN> need to be members of all the type classes listed in <Classes>.

Analogous to Point, we introduce a corresponding definition for vectors:

data Vector = Vector Float Float
            deriving (Show, Eq)

Now, we can no longer accidentally confuse expressions that build a vector, such as Vector 1 1, with those that construct a point, such as Point 1 1. Moreover, they have got distinct types, which enables the type checker to keep points and vectors apart as well. If we want to convert one into the other, we need an explicit casting function that performs the conversion:

pointToVector :: Point -> Vector
pointToVector (Point x y) = Vector x y

Pattern matching decomposes the function argument of type Point into its two components x and y, corresponding to the two arguments of the data constructor.

Now, we are in a position define a type safe version of the function movePointN, using the new definitions of Point and Vector:

movePointN :: Float -> Vector -> Point -> Point
movePointN n (Vector vx vy) (Point x y) 
  = Point (n * vx + x) (n * vy + y)

We now have

movePointN 5 (Vector 1 3) (Point 0 0)   ⇒   Point (5 * 1 + 0) (5 * 3 + 0)   ⇒   Point 5 15

If we accidentally confuse the order of the parameters, the type checker will complain as vectors and points are now two distinct types.

Finally, we can also redefine Colour as a product type instead of as a type synonym for a quadruple (4-tuple):

data Colour = Colour Int Int Int Int  -- red, green, blue, and opacity component
            deriving (Show, Eq)

We of course also need to adapt the definition of the pre-defined colour, such as:

red :: Colour
red = Colour 255 0 0 255

Sum Types as Alternative Data Constructors

Now that we improved the type safety of the simple interface for line graphics from Spirals, Snowflakes & Trees: Recursion in Pictures, we are ready to take the next step and to extent its functionality. So far, all pictures were composed of simple paths (i.e., sequences of points connected by coloured lines). Next, we will enrich the representation of pictures to include closed shapes, such as circles, ellipses, and polygons. In addition to a customisable line colour, we will add line styles, and for closed shapes, also a fill style.

To this end, we introduce a new data type PictureObject. A value of type PictureObject, can be either a path, a circle, an ellipse, or a polygon. A simple enumeration type is not sufficient, though; in contrast to Day and LineStyle, where the name of the data constructor was all the information needed to characterise a variant, picture objects are more demanding. For example, for a path, we need to maintain the list of points that constitute the path, along with a line style and colour. As in the case of Point and Vector, we need a product type; i.e., the data constructor Path requires parameters that describe the characteristics of the represented path. More precisely, we want the data constructor Path to have the following type:

Path :: [Point] -> Colour -> LineStyle -> PictureObject

In other words, to construct a value of type PictureObject with the data constructor Path, we need to apply the latter to a list of points, a colour, and a line style. For example, we can now define a path object as follows

myPath :: PictureObject
myPath = Path [Point 210 200, Point 270 200, Point 545 600,
               Point 525 600, Point 380 390, Point 250 600,
               Point 230 600, Point 370 380, Point 260 215,
               Point 210 215] red Solid

If we render that path, we get the picture in the following display.

Lambda path

How about circles? A circle is usually characterised by its center point as well as its radius. For our purposes, we also want to fix a line style and a fill style, where we use the following enumeration type for the latter:

data FillStyle
  = NoFill
  | SolidFill
  deriving (Eq, Show)

Overall, we want the Circle constructor to have the type

Circle :: Point -> Float -> Colour -> LineStyle -> FillStyle -> PictureObject

where the line style has no effect whenever the fill style is Solid. Equipped with this data constructor, we can define three different circles as follows:

dashedCircle, dottedCircle, solidCircle :: PictureObject
dashedCircle = Circle (Point 400 400) 180 blue  Dashed NoFill
dottedCircle = Circle (Point 400 400)  90 green Dotted NoFill
solidCircle  = Circle (Point 400 400)  20 red   Solid  SolidFill

If we render all three into a single picture, we get

Three circles

In a similar manner, we characterise an ellipse by its center point, its length and width, and the rotation of the x-axis of the ellipse, augmented by a colour, a line style, and fill style. Finally, a polygon, like a path, is characterised by a list of points for the polygon edges, but unlike a path, it has a fill style.

To combine the Path, Circle, Ellipse, and Polygon constructors into a single data type definition for PictureObject, we combine the notation for enumeration types with that for product types. More precisely, we separate the alternative data constructors by a vertical bar | and add the types of the data constructor parameters as arguments, which gets us

data PictureObject 
  = Path    [Point]                   Colour LineStyle 
  | Circle  Point   Float             Colour LineStyle FillStyle 
  | Ellipse Point   Float Float Float Colour LineStyle FillStyle 
  | Polygon [Point]                   Colour LineStyle FillStyle
  deriving (Show, Eq)

As before, we derive the type classes Show and Eq, which requires that the type parameters of all data constructors in that data definition are members of both type classes. Enumeration types where some or all of the data constructors are parameterised are more generally called sum types, or, put differently, conventional enumeration types are sum types where no data constructor has got any parameters.

Data type definitions in Haskell are generally speaking sum types (the various alternative data constructors) of product types (the multiple arguments of data constructors). As we will discuss in detail in a subsequent chapter, these definitions can also be recursive and they may be parameterised. Data types of this nature are also known as algebraic data types.

The following three definitions make use of the Ellipse constructor:

redEllipse, greenEllipse, blueEllipse :: PictureObject
redEllipse   = Ellipse (Point 400 400) 300 100      0 red   Solid SolidFill
greenEllipse = Ellipse (Point 400 400) 300 100 (pi/4) green Solid SolidFill
blueEllipse  = Ellipse (Point 400 400) 300 100 (pi/2) blue  Solid SolidFill

In the following display, we render these three types of ellipses into individual images.

red spiral

purple spiral

blue spiral

Generating pictures

To produce more interesting pictures, we programmatically create picture objects and combine them into a list to obtain a composite picture:

type Picture = [PictureObject]

For example, to generate a sequence of n ellipses, which are identical except for their rotation around their center, we start with a list of angles generated from a range:

[0, pi/n..(n-1) * pi/n]

We turn this into an ellipse per angle using the Prelude function map from Higher-order Functions:

simpleEllipsePic :: Float -> Picture
simpleEllipsePic n = map greenEllipse [0, pi/n..(n-1) * pi/n]
  where
    greenEllipse angle = Ellipse (Point 400 400) 250 70 angle myGreen Dashed SolidFill
    myGreen            = Colour 27 230 34 80

Alternatively, for a more compact definition, we can use a lambda abstraction:

simpleEllipsePic :: Float -> Picture
simpleEllipsePic n
= map (\angle -> Ellipse (Point 400 400) 250 70 angle myGreen Dashed SolidFill)
      [0, pi/n..(n-1) * pi/n]
  where
    myGreen = Colour 27 230 34 80

By applying simpleEllipsePic to the values 8, 12, and 60, respectively, we get the following images.

red spiral

purple spiral

blue spiral

The following screencast illustrates picture generation by way of a few examples.

Transforming pictures

In addition to generating picture objects, we can transform them; for example, by translating a picture object along a vector.

movePictureObject :: Vector -> PictureObject -> PictureObject

For each variant of the data type PictureObject, the function has to do something different. For instance, to move a path, every point in the path has to be translated by the given vector. We do this by defining a helper function that reuses the function that we defined before — the definition is compact due to its use of partial application:

movePoint :: Vector -> Point -> Point
movePoint = movePointN 1

We map this function, partially applied to vector, over the points in path that we want to move:

movePictureObject vector (Path points colour lineStyle)
  = Path (map (movePoint vector) points) colour lineStyle

For circles and ellipses, only the center point has to be moved:

movePictureObject vector (Circle center radius colour lineStyle fillStyle)
  = Circle (movePoint vector center) radius colour lineStyle fillStyle
movePictureObject vector (Ellipse center width height rotation colour lineStyle fillStyle)
  = (Ellipse (movePoint vector center) width height rotation colour lineStyle fillStyle)

Finally, the code for a polygon is almost identical to that for a path:

movePictureObject vector (Polygon points colour lineStyle fillStyle)
  = Polygon (map (movePoint vector) points) colour lineStyle fillStyle

To illustrate the behaviour of movePictureObject graphically, we plot a few different shapes along the path described by sine and cosine curves:

curves :: Picture
curves
  = map makeCircleSin  xvals ++
    map makePolygonCos xvals ++
    map makeEllipseSin xvals ++
    map makePathCos    xvals
  where
    redCircle      = Circle  (Point 20 380) 10 red Solid SolidFill
    greenEllipse   = Ellipse (Point 20 380) 5 30 0 green Solid SolidFill
    whitePolygon   = Polygon [Point 20 380, Point 40 380, Point 40 420] white Solid SolidFill
    bluePath       = Path    [Point 20 380, Point 40 380, Point 40 420] blue Solid 
    xvals          = [0,10..780]
    --
    makeCircleSin  x = movePictureObject (Vector x $ 100 * sin (pi * x/200)) redCircle
    makePolygonCos x = movePictureObject (Vector x $ 100 * cos (pi * x/200)) whitePolygon
    makeEllipseSin x = movePictureObject (Vector x $ 250 * sin (pi * x/200)) greenEllipse
    makePathCos    x = movePictureObject (Vector x $ 200 * cos (pi * x/200)) bluePath

In this function, we define the shapes using a position on the left hand side of the picture. The list xvals contains the various distances by which each shape is to be moved along the x-axis; these are [0, 10, 20, 30, ... 780]. The function makeCircleSin, given such an x-distance yields a circle shape identical to the initial one, but moved to the left by x and up (or down) by a value calculated by a scaled sine function. We do the same for the other shapes, but vary the scaling factor and whether we use the sine or cosine. The resulting image is the following.

Shapes along sine and cosine curves

Download

To work through the examples in this chapter, please download the Haskell for Mac project with the PictureObject definition and an implementation of drawPicture: ShapeGraphics.hsproj. This project includes two Haskell modules: ShapeGraphics and ShapeExamples. The former contains the graphics definitions for you to use. The latter is where you can add code while working through this chapter. (We already included some of the definitions.)

Haskell for Mac

Haskell for Mac

Just open ShapeGraphics.hsproj in Haskell for Mac and proceed as in the screencasts.

Other Haskell

Command Line Haskell

Run GHCi inside the ShapeGraphics.hsproj directory after downloading and unpacking. Then, load the ShapeExamples module. To visualise your drawings, you need to write them to disk as a PNG image file. To do so, use the writePng function as follows:

writePng FILENAME (drawPicture LINEWIDTH PICTURE)

Records

By adopting user-defined data types for the graphics interface, we improved type safety and made the interface more expressive than the original line graphics from Spirals, Snowflakes & Trees: Recursion in Pictures. However, there is still room for improvement. For example, the definition of Colour as a product type improved type safety over our original use of a type synonym. However, the type data definition still provides no indication as to which argument of the Colour data constructor represents the, say, blue component of the colour. Moreover, as all colour components are of the same type, confusing them will not lead to a type error, but simply result in an incorrect program. To improve this situation, Haskell offers record syntax for product types.

data Colour
  = Colour { redC      :: Int
           , greenC    :: Int
           , blueC     :: Int
           , opacityC  :: Int
           }
  deriving (Show, Eq)

Here we associate each parameter of the data constructor Colour with a name, which ought to indicate its meaning. Now, quite clearly, the blue component is the one labelled blueC. We call each parameter of a data constructor using record syntax a field and its name a field label. Product types in record syntax are also called structs.

With that definition, we can create values of type Colour either exactly as before (and ignore the field names when applying the data constructor)

red :: Colour
red = Colour 255 0 0 255

or we can use the names to clarify the meaning of each argument:

red :: Colour
red = Colour{redC = 255, opacityC = 255, blueC = 0, greenC = 0}

In the latter case, the fields can be initialised in any order. If we forget a field, the compiler will issue a warning —not a fatal error— and keep the field undefined.

In the same manner in which we can use field names to construct a value of record type, we can use them in pattern matching to identify data constructor arguments. Here we extract the green component of a colour:

greenComponent :: Colour -> Int
greenComponent Color{redC = red, opacityC = alpha, blueC = blue, greenC = green}
  = green

In fact, we can keep this shorter: as we are only interested in the green component, we can ignore the rest and write

greenComponent :: Colour -> Int
greenComponent Color{greenC = green} = green

As extracting of individual fields of a record is a very common operation, Haskell's record syntax actually provides these projection functions automatically and they carry the same name as the field that they project. In other words, the definition of Colour in record syntax implicitly defines four projection functions

redC     :: Colour -> Int
greenC   :: Colour -> Int
blueC    :: Colour -> Int
opacityC :: Colour -> Int

Given those projections, the definition of greenComponent becomes trivial:

greenComponent :: Colour -> Int
greenComponent = greenC

Although record projection functions are convenient, they come with a caveat. The projection functions are toplevel functions, and hence, all record field names need to be unique in the toplevel name space in which the data type is defined. For example, we cannot have a toplevel user-defined function called greenC in the module defining Colour; otherwise, the name of the projection function would clash with that user-defined function.

In addition to pattern matching using record syntax, we can also still pattern match using the conventional positional syntax, even if the definition of the product type does use record syntax — for example,

isOpaqueColour :: Colour -> Bool
isOpaqueColour (Colour _ _ _ opacity) = opacity == 255

We can also just use the projection function implied by the field name, as in

isOpaqueColour :: Colour -> Bool
isOpaqueColour colour = opacityC colour == 255

Projection functions and field names

When choosing names for record fields, we not only need to consider the potential for name clashes with toplevel functions, but also name classes with other record field names. For example, we might be tempted to use the following definitions:

data Point
  = Point { x :: Float, y :: Float}

data Vector
  = Vector { x :: Float, y :: Float}  -- Error: conflicting definitions of `x` and `y`

However, this leads to conflicting definitions of the two projection functions x and y — for example, x would simultaneously have to be typed as x :: Point -> Float, and as x :: Vector -> Float. As a general rule of good style, we should avoid very short and general field names (even if they do not conflict with other field names) to avoid polluting the toplevel name space. Hence, we choose

data Point
  = Point { xPoint :: Float
          , yPoint :: Float}
  deriving (Show, Eq)

data Vector
  = Vector { xVector :: Float
           , yVector :: Float}
  deriving (Show, Eq)

On the basis of these definitions, we can rewrite the function movePointN, such that it uses the field names:

movePointN :: Float -> Vector -> Point -> Point
movePointN n vector point
  = Point { xPoint = n * xVector vector + xPoint point 
          , yPoint = n * yVector vector + yPoint point
          }

However, it is important to note that we do not have to rewrite the function in this manner. The original definition of movePointN is still valid as pattern matching using positional syntax is generally valid for product types defined using record syntax. It is generally wise to choose whichever variant leads to the clearest code.

Record updates

As we saw above in the discussion of the function greenComponent, pattern matching in record syntax leads to concise and clear code when only one or a few of the record fields are being used. However, this is of no immediate help in the common situation, where we want to produce a new value of record type by altering only one or a few of the components of the argument. For instance, recall the fade function from Spirals, Snowflakes & Trees: Recursion in Pictures, which takes a colour and returns the same colour only slightly less opaque. We might define it as follows using record syntax:

fade :: Colour -> Colour
fade Colour{ redC = red, opacityC = alpha, blueC = blue, greenC = green }
  = Colour{ redC     = red
          , greenC   = green
          , blueC    = blue
          , opacityC = max 0 (opacity - 10)
          }

This is unnecessarily verbose as three out of four fields are simply copied from the argument to the result. To address this problem, record types support record update syntax of the form

<Expr> { <FieldName1> = <Expr1>, … <FieldNameN> = <ExprN> }

Here, <Expr> must evaluate to a value of a record type that has fields named <FieldName1> to <FieldNameN>. It will then, produce a new value of the same type, where fields named <FieldName1> to <FieldNameN> have got the values <Expr1> to <ExprN>, but all other fields remain as in the value of <Expr>. For example,

(Point 10 20){ xPoint = 0 }   ⇒   Point 0 20

We can use this notation to simplify the fade function considerably:

fade :: Colour -> Colour
fade colour = colour{opacityC = max 0 (opacityC colour - 10)}

For more complicated data types, the code improvement is even more significant. Let us consider a function

setLineStyle :: PictureObject -> LineStyle -> PictureObject

that replaces the line style of the PictureObject passed as a first argument by the line style given in the second argument. Given the current definition of PictureObject, we will need to write code as the following

setLineStyle :: PictureObject -> LineStyle -> PictureObject
setLineStyle (Path points colour _) newLineStyle
  = Path points colour newLineStyle
…

For each alternative in the definition of PictureObject, we need an equation that copies all constructor parameters except the line style. Haskell supports record syntax not only for pure product types (i.e., data types with only one data constructor), but also for each alternative of a sum type (with multiple data constructors). Hence, we can use record syntax to improve the definition of PictureObject:

data PictureObject 
  = Path    
    { pointsPO    :: [Point] 
    , colourPO    :: Colour
    , lineStylePO :: LineStyle
    }
  | Circle  
    { centerPO    :: Point
    , radiusPO    :: Float
    , colourPO    :: Colour
    , lineStylePO :: LineStyle
    , fillStylePO :: FillStyle 
    }
  | Ellipse 
    { centerPO    :: Point
    , widthPO     :: Float
    , heightPO    :: Float
    , rotationPO  :: Float
    , colourPO    :: Colour
    , lineStylePO :: LineStyle
    , fillStylePO :: FillStyle
    }
  | Polygon 
    { pointsPO    :: [Point]
    , colourPO    :: Colour
    , lineStylePO :: LineStyle
    , fillStylePO :: FillStyle 
    }
  deriving (Show, Eq)

We earlier discussed how all field names need to be distinct as they imply toplevel projection functions. There is an exception to this rule: if two fields belong to different data constructors of the same data definition and they have the same type, they may use the same field name. In our example, all data constructors of PictureObject have got a field lineStylePO :: LineStyle and all, but one data constructor have got a field fillStylePO :: FillStyle.

This implies that a single projection function is used for all data constructors that share the field name. For example,

lineStylePO :: PictureObject -> LineStyle

obtains the line style of a picture object regardless of whether it represents a path, circle, ellipse, or polygon. It also means that the record update expression picObj{lineStylePO = newLineStyle} sets the line style of any of the four variants that picObj :: PictureObject might assume. Hence, we can define setLineStyle simply as

setLineStyle :: PictureObject -> LineStyle -> PictureObject
setLineStyle picObj newLineStyle = picObj{lineStylePO = newLineStyle}

As a general rule, the more complex a data type gets (i.e., the more alternatives and more data constructor arguments there are), the more likely it is that record syntax improves code clarity. As a second example, let us rewrite the function movePictureObject using record updates:

movePictureObject :: Vector -> PictureObject -> PictureObject
movePictureObject vec picObj@(Path _ _ _) 
  = picObj{pointsPO = map (movePoint vec) $ pointsPO picObj}
movePictureObject vec picObj@(Polygon _ _ _ _) 
  = picObj{pointsPO = map (movePoint vec) $ pointsPO picObj}
movePictureObject vec picObj
  = picObj{centerPO = movePoint vec $ centerPO picObj}

The big win is again that we only need to mention the fields that change without any need to explicitly copy those that remain the same. We use @-patterns (discussed in Spirals, Snowflakes & Trees: Recursion in Pictures) to match a particular data constructor, while binding the matched PictureObject to the variable picObj. The computations that need to be performed for the Path and Polygon data constructors as well as those for the Circle and Ellipse data constructors are identical. We can exploit that easily for the latter two by way of the third, catch all, equation. The Path and Polygon equations, however, repeat the same computation. We could use an additional helper function to avoid that duplication, but that seems hardly worth the trouble in this particular function definition.

Finally, we can even simplify the constructor matching —which ignores all arguments— by using record syntax without any fields.

movePictureObject :: Vector -> PictureObject -> PictureObject
movePictureObject vec picObj@Path{}
  = picObj{pointsPO = map (movePoint vec) $ pointsPO picObj}
movePictureObject vec picObj@Polygon{}
  = picObj{pointsPO = map (movePoint vec) $ pointsPO picObj}
movePictureObject vec picObj
  = picObj{centerPO = movePoint vec $ centerPO picObj}

Total versus partial projections

An important consideration when using projection functions is whether they are total or partial functions — a concept that we discussed in Fundamentals. In our example, PictureObject, colourPO and lineStylePO are total functions. They always produce a result, regardless of the passed picture object. In contrast, pointsPO and fillStylePO are partial. For instance, applying pointsPO to a Circle results in a runtime exception.

The following screencast steps through the process of rewriting a function from using positional pattern matching to using record syntax.

A more complex example

Let us develop a function

rotatePictureObject :: Float -> Point -> PictureObject -> PictureObject

such that rotatePictureObject alpha pnt obj rotates obj clockwise by alpha radians around the point pnt, which represents the center of the rotation, as illustrated by the following diagram.

Rotate Object

To implement rotation, we need to carefully consider which of the four variants of PictureObject we are dealing with.

To rotate a path, we have to rotate each point constituting the path. Consequently, we need a helper function that rotates a single point around the rotation's center point as in the following illustration.

Rotate Path

To simplify this task, we reuse the rotateLine function from Spirals, Snowflakes & Trees: Recursion in Pictures. It effectively rotates the end point of the line around the start point. Using the old definition of Point based on a type synonym, the code was the following:

-- Using old definition of 'Point' as type synonym
rotateLine :: Float -> Line -> Line
rotateLine alpha l@(p1@(x1, y1), p2@(x2, y2)) 
  = ((x1, y1), (cos alpha * nx - sin  alpha * ny + x1,
                sin alpha * nx + cos  alpha * ny + y1))
  where
    nx = x2 - x1
    ny = y2 - y1

Adapting this code to the use the data type-based definition of Point, we get

rotatePoint :: Float -> Point -> Point -> Point
rotatePoint alpha (Point x0 y0) (Point x y)
  = Point (cos alpha * nx - sin alpha * ny + x0)
          (sin alpha * nx + cos alpha * ny + y0)
  where
    nx = x - x0
    ny = y - y0

We can easily map this function over the points in the path, to rotate them all in the same manner.

rotatePictureObject :: Float -> Point -> PictureObject -> PictureObject
rotatePictureObject angle point picObj@Path{}
  = picObj{ pointsPO = map (rotatePoint angle point) $ pointsPO pic }

In an analogous definition, we rotate the points constituting a polygon (which is essentially a closed path):

rotatePictureObject angle point picObj@Polygon{}
  = picObj{ pointsPO = map (rotatePoint angle point) $ pointsPO picObj }

Rotating a circle is easier — we simply rotate its center point around the rotation center:

rotatePictureObject angle point picObj@Circle{}
  = picObj{ centerPO = rotatePoint angle point $ centerPO picObj }

However, an ellipse presents an additional challenge as we also need to add the rotation factor to the rotation of the x-axis that is part of characterising an ellipse. If we add the equations for all four shapes, we get the following definition, where we combine the cases for Path and Polygon much as in the definition of movePictureObject.

rotatePictureObject :: Float -> Point -> PictureObject -> PictureObject
rotatePictureObject angle point picObj@Circle{}
  = picObj{ centerPO = rotatePoint angle point $ centerPO picObj }
rotatePictureObject angle point picObj@Ellipse{}
  = picObj{ centerPO   = rotatePoint angle point $ centerPO picObj
          , rotationPO = rotationPO picObj + angle
          }
rotatePictureObject angle point picObj
  = picObj{ pointsPO = map (rotatePoint angle point) $ pointsPO pic }

We use the rotation function in the following definition that combines multiple instances of the same magenta polygon rotated by varying degrees around a common center.

magentaPoly :: PictureObject
magentaPoly
  = Polygon [Point 430 400, Point 500 420, Point 680 400, Point 500 380] magenta Solid SolidFill
  where
    magenta = Colour{redC = 153, blueC = 153, greenC = 0, opacityC = 130}

rotatePic :: Int -> PictureObject -> Picture
rotatePic n shape = map (\n -> rotatePictureObject (0.2 * fromIntegral n) (Point 400 400) shape) [0..n]

one magenta polygon

32 magenta polygons

62 magenta polygons

By combining rotation with object translation, we can generate more dynamic pictures.

dynamicRotate:: PictureObject -> Point -> Vector -> Float -> Int -> Picture
dynamicRotate picObj _   _   _     0 = [picObj]
dynamicRotate picObj pnt vec alpha n 
  = rotatePicObj : dynamicRotate (movePictureObject vec rotatePicObj)(movePoint vec pnt) vec alpha (n-1)
  where
    rotatePicObj = rotatePictureObject alpha pnt picObj

purple spiral

green lattice

red spiral

Data Constructors Versus Functions

Data constructors and functions are somewhat similar. Both are applied to arguments, and then, produce a new entity. In fact, both share the same types. For example, we earlier discussed the type of Circle:

Circle :: Point -> Float -> Colour -> LineStyle -> FillStyle -> PictureObject

Hence, we may regard data constructors as a special kind of function, a function that wraps its arguments up, such that they can be unwrapped again by pattern matching. This is in contrast to other functions, where we cannot recover the arguments after the function has been applied. For example, if we know that the application of the function (+) resulted in the Int value 10, we can't undo this application and determine the arguments. Hence, data constructors are sometimes called free or uninterpreted functions — they merely store, but do not process their arguments.

Moreover, as data type definitions consist of a fixed number of data constructors, we (and the compiler) can infer from the type of an expression a set of patterns that will exhaustively cover all possible values that the expression may evaluate to. In fact, the Haskell system uses that information to generate a warning where patterns are not exhaustive, as this often constitutes an oversight that likely leads to incorrect program behaviour. For example, in the case of a PictureObject, we know the only way to produce a value of this type is as the result of the application of either Path, Circle, Ellipse, or Polygon. We can always determine which one it was, and what the arguments were, either by pattern matching in a function equation or by using a case expression:

showComponents :: PictureObject -> String
showComponents pic
  = case pic of
      Path points colour lineStyle
        -> "The value is the result of the application of `Path` to " ++
           show points ++
           " and " ++
           show colour ++
           " and " ++
           show linestyle
      Circle ...

As data constructors are just a special form of function, we can also partially apply data constructors: the function zeroX is defined in terms of a partial application of the Point constructor, which has type Float -> Float -> Point, to only the first argument:

zeroX :: Float -> Point
zeroX = Point 0

Applying zeroX to 5 yields, as expected, Point 0 5. Pattern matching can only be used on a data constructor once it has been applied to all its arguments — that is, we cannot use pattern matching on zeroX to extract the 0 without first applying zeroX to a second argument.

Generalised syntax for algebraic data types

The similarity between data constructors and functions is the foundation for an alternative syntax for declaring algebraic data types, often called the generalised syntax for algebraic data types, as it enables declarations that cannot be expressed with the standard syntax. We defer discussing the new capabilities enabled by the generalised syntax to a subsequent chapter of this tutorial, as it is an extensive topic in itself. However, we outline the basic notation here, as it nicely illustrates the notion of data constructors as functions. Specifically, the generalised syntax simply consists out of declaring data constructors by way of their function signature and is identified by using the where keyword instead of an equality symbol (=) after the declaration head. For example,

data Point = Point Float Float

becomes

data Point where
  Point :: Float -> Float -> Point

Similarly, the initial version of the PictureObject data type can be defined as follows with the generalised syntax:

data PictureObject where
  Path    :: [Point] ->                            Colour -> LineStyle              -> PictureObject
  Circle  :: Point   -> Float ->                   Colour -> LineStyle -> FillStyle -> PictureObject
  Ellipse :: Point   -> Float -> Float -> Float -> Colour -> LineStyle -> FillStyle -> PictureObject
  Polygon :: [Point] ->                            Colour -> LineStyle -> FillStyle -> PictureObject

To declare records with the generalised syntax, we put the record fields in braces, much like in the standard syntax. For example, we have

data PictureObject where
  Path :: { pointsPO    :: [Point] 
          , colourPO    :: Colour
          , lineStylePO :: LineStyle
          } -> PictureObject

The generalised syntax for algebraic data types (also known as GADT syntax) is available as an extension to the standard Haskell language. To use that extension, we need to put the following language pragma at the top of the Haskell module or playground, where we want to use it:

{-# LANGUAGE GADTSyntax #-}

Record Puns and Wildcards

We already saw that the record notation permits some freedom in how to decompose records. For instance, we had the example

greenComponent :: Colour -> Int
greenComponent Color{greenC = green} = green

where we bind the value of the record field greenC to the local variable green. We also saw that we can use projection functions instead:

greenComponent :: Colour -> Int
greenComponent colour = greenC colour

Even more compact notation is enabled by a language extension known as record punning, which we can use if we put the pragma

{-# LANGUAGE NamedFieldPuns #-}

at the top of a module or playground. Punning introduces a shorthand for binding record fields to local variables having the same name as the record field. For example,

greenComponent :: Colour -> Int
greenComponent Color{greenC} = greenC   -- with punning

is the same as

greenComponent :: Colour -> Int
greenComponent Color{greenC = greenC} = greenC   -- without punning

We can also use punning in the construction of records, as in

red = Colour{redC, opacityC, blueC, greenC}
  where
    redC     = 255
    opacityC = 255
    blueC    = 0
    greenC   = 0

Finally, record wildcards, enabled by {-# LANGUAGE RecordWildCards #-}, take this idea one step further. Instead of enumerating the field names, we can use ellipsis to enable punning for all field names that are not explicitly mentioned. This enables us to write the previous example as

red = Colour{..}
  where
    redC     = 255
    opacityC = 255
    blueC    = 0
    greenC   = 0

and the projection of the green component as

greenComponent :: Colour -> Int
greenComponent Color{..} = greenC   -- with wildcard punning

By explicitly mentioning some names, we can, for example, use pattern matching on selected fields, as in this example of a function that returns the opacity of any colour with a fully saturated green channel:

brightGreenOpacity :: Colour -> Int
brightGreenOpacity Color{greenC = 255, ..} = opacityC
brightGreenOpacity _                       = error "green not saturated"

Exercises

  1. Rewrite the defintion of map

    map :: (a -> b) -> [a] -> [b]
    map f [] = []
    map f (x : xs) = f x : map f xs

    to use case notation — i.e., complete the following definition

    map f xs = case xs of
  2. Write a function which, given a day, returns the data constructor representing the following day:

    nextDay :: Day -> Day
  3. How would you define a data type to represent the different cards of a deck of poker cards? How would you represent a hand of cards?

    Can you define a function which, given a hand of cards calculates its values according to the 21-rules: that is, all the cards from 2 to 10 are worth their face value. Jack, Queen, King count as 10. The Ace card is worth 11, but if this would mean the overall value of the hand exceeds 21, it is valued at 1.

  4. Can you define a function simpleCirclePic :: Float -> Picture, similar to simpleEllipsePic that we defined earlier, such that it generates a picture of circles, which increase in size in dependence on the floating point argument, such that the radius of the biggest circle does not exceed 400. The generated pictures should look like those in the following display.

    four stacked purple circles

    many stacked purple circles

    stacked purple circles bleeding into each other

  5. Sierpinski triangles are another well-know fractal pattern. Can you define a function

    sierpinski :: Int -> Picture

    which, when applied to 0 to 5, produces the following pictures?

    Sierpinski triangle for 0

    Sierpinski triangle for 1

    Sierpinski triangle for 2

    Sierpinski triangle for 3

    Sierpinski triangle for 4

    Sierpinski triangle for 5

  6. Can you adapt the colouredFTree tree function from Higher-order Functions to the new point, line, colour, and picture types, such that they produce trees consisting of coloured polygons (specifically, pentagons formed from the base square and the triangle used to construct the tree components) such that the function can produce the trees displayed below?

    blue ragged tree

    red and blue wide tree

    green low tree

    As another variation on tree generation, change the algorithm to produce rectangles of various length-width ratios instead of just squares as the basis for polygon construction. This leads to somewhat more organic looking shapes as those in the next set of images.

    blue fern-like tree

    round red tree

    green and brown organic tree