Warning! This blog's new home now is here.

# CIS 194, week 2, Algebraic Data Types

14 Aug 2014

This is a post about week 2 of the CIS 194: Introduction to Haskell course by Brent Yorgey, from the Penn University of Pennsilvania.

This week is all about Algebraic Data Types (ADT), not to be confused with Abstract Data Types (also ADT) which are another topic.

Haskell has enumeration types (like Java, but still less verbose and more intuitive). An example:

``````data Food = Pizza
| Bacon
deriving Show
``````

We have just declared a new (algebraic data) type called `Food`, with three data constructors which are the only values of the type `Food`.

We can now define new functions on the new data type using pattern matching:

``````isTempting:: Food -> Boolean
isTempting Salad = False
isTempting _ = True
``````

But in Haskell enumeration types are only a special case of Algebraic Data Types. One common class of ADT is called sum type (a.k.a. tagged union). A simple example of ADT which is not an enumeration is this:

``````data OperationResult = OK
| Error Integer
deriving Show
``````

Here we see that `Error` is a data constructor that takes an argument of type `Integer`. We can construct new `OperationResult` values using the `Error` data constructor:

``````success:: OperationResult
success = OK

failure:: OperationResult
failure = Error 404
``````

`OK` is a value of type `OperationResult` (since it's a data constructor with zero arguments), but `Error` by itself it's not. We have to pass an `Integer` value to it to build an `OperationResult` with it.

We've just introduced polymorphic data types. Specifically, we can have type signatures with variables just as we can have function implementations with variables. The difference here is that while in actual code variables are symbols bound to values, in type variables are bound to types of values. In other words, types are actually values in type signatures. We're reasoning on a higher and more abstract level. Take a moment to contemplate this fact.

Formally, in Haskell an ADT is a type with one or more data constructors, each one of them can have zero or more arguments.

A general example that shows how to build values is:

``````isSafeDiv:: Double -> Double -> OperationResult
isSafeDiv _ 0 = Error 1000
isSafeDiv _ _ = OK
``````

We can also use pattern matching to make decisions based on the structure of the `OperationResult` value and bind variables to the arguments:

``````isSuccessful:: OperationResult -> Boolean
isSuccessful OK = True
isSuccessful (Error n) = False
``````

It's idiomatic in Haskell when you have an algebraic data type with a single data constructor, to name it like the data type itself. Example:

``````data Person = Person String String Int
deriving Show
``````

This can be done since types and data constructors live in separate namespaces.

## Pattern Matching

In general, pattern matching is a way to know what data constructor has been used to create a value of a certain ADT, and to take apart its arguments. Effectively, in Haskell this is the only way to make decisions.

``````pat ::= _
| var
| var @ (pat)
| (Constructor pat1 pat2 ... patn)
``````

In order:

• `_` is a wildcard.
• We can pattern match against literal values (for example: `OK`).
• We can pattern match against a pattern, and still bind the entire value to a variable.
• We can pattern match against a data constructor (even recursively).

It's worth noting that types like `Int` can be viewed like an ADT:

``````data Int = 0 | 1 | 2 | ...
``````

Indeed, we can pattern match on its values. But, perhaps obviously, they are not implemented like that in the compiler.

### Case expressions

A way (the only one actually) to do pattern matching is by using case expressions:

``````case exp of
pat1 -> exp1
pat2 -> exp2
...
``````

For example, we could reimplement the `isSuccessful` function from earlier using a case expression:

``````isSuccessful:: OperationResult -> Boolean
isSuccessful op = case op of
OK -> True
(Error n) -> False
``````

However it's more elegant to use the first version. Indeed, the syntax for doing pattern matching in a function definition is just syntactic sugar on case expressions.

## Recursive algebraic data types

It's interesting to note that ADTs can be recursive. For example, let's define a list of integers:

``````data IntList = Empty
| Cons Int IntList
``````

This definition can be read as: "an `IntList` is either an `Empty` one or an `Int` value followed by an `IntList`". This kind of definition is quite clear and elegant (see Church encoding). For example:

``````-- [1,2,3] can be represented as an IntList:
l:: IntList
l = Cons 1 (Cons 2 (Cons 3 Empty))
``````

A recursive data ADT naturally leads to recursive functions. For example, to calculate the sum of all the values in an `IntList`:

``````calcSum:: IntList -> Int
calcSum Empty = 0
calcSum (Cons n ns) = n + calcSum ns
``````

So, we've seen so far that type signatures can have variables, and can be recursive. Sounds like we could have a Turing-complete type system... indeed, we have one. Someone even implemented a LISP interpreter that completely runs on the Haskell type system!

That's all for this week. Remember: do the exercises!