Warning! This blog's new home now is here.

CIS 194, week 2, Algebraic Data Types

14 Aug 2014

This is a post about week 2 of the CIS 194: Introduction to Haskell course by Brent Yorgey, from the Penn University of Pennsilvania.

This week is all about Algebraic Data Types (ADT), not to be confused with Abstract Data Types (also ADT) which are another topic.

Haskell has enumeration types (like Java, but still less verbose and more intuitive). An example:

data Food = Pizza
          | Bacon
          | Salad
    deriving Show

We have just declared a new (algebraic data) type called Food, with three data constructors which are the only values of the type Food.

We can now define new functions on the new data type using pattern matching:

isTempting:: Food -> Boolean
isTempting Salad = False
isTempting _ = True

But in Haskell enumeration types are only a special case of Algebraic Data Types. One common class of ADT is called sum type (a.k.a. tagged union). A simple example of ADT which is not an enumeration is this:

data OperationResult = OK
                     | Error Integer
    deriving Show

Here we see that Error is a data constructor that takes an argument of type Integer. We can construct new OperationResult values using the Error data constructor:

success:: OperationResult
success = OK

failure:: OperationResult
failure = Error 404

OK is a value of type OperationResult (since it's a data constructor with zero arguments), but Error by itself it's not. We have to pass an Integer value to it to build an OperationResult with it.

We've just introduced polymorphic data types. Specifically, we can have type signatures with variables just as we can have function implementations with variables. The difference here is that while in actual code variables are symbols bound to values, in type variables are bound to types of values. In other words, types are actually values in type signatures. We're reasoning on a higher and more abstract level. Take a moment to contemplate this fact.

Formally, in Haskell an ADT is a type with one or more data constructors, each one of them can have zero or more arguments.

A general example that shows how to build values is:

isSafeDiv:: Double -> Double -> OperationResult
isSafeDiv _ 0 = Error 1000
isSafeDiv _ _ = OK

We can also use pattern matching to make decisions based on the structure of the OperationResult value and bind variables to the arguments:

isSuccessful:: OperationResult -> Boolean
isSuccessful OK = True
isSuccessful (Error n) = False

It's idiomatic in Haskell when you have an algebraic data type with a single data constructor, to name it like the data type itself. Example:

data Person = Person String String Int
    deriving Show

This can be done since types and data constructors live in separate namespaces.

Pattern Matching

In general, pattern matching is a way to know what data constructor has been used to create a value of a certain ADT, and to take apart its arguments. Effectively, in Haskell this is the only way to make decisions.

pat ::= _
    | var
    | var @ (pat)
    | (Constructor pat1 pat2 ... patn)

In order:

_ is a wildcard.
We can pattern match against literal values (for example: OK).
We can pattern match against a pattern, and still bind the entire value to a variable.
We can pattern match against a data constructor (even recursively).

It's worth noting that types like Int can be viewed like an ADT:

data Int = 0 | 1 | 2 | ...

Indeed, we can pattern match on its values. But, perhaps obviously, they are not implemented like that in the compiler.

Case expressions

A way (the only one actually) to do pattern matching is by using case expressions:

case exp of
    pat1 -> exp1
    pat2 -> exp2
    ...

For example, we could reimplement the isSuccessful function from earlier using a case expression:

isSuccessful:: OperationResult -> Boolean
isSuccessful op = case op of
                    OK -> True
                    (Error n) -> False

However it's more elegant to use the first version. Indeed, the syntax for doing pattern matching in a function definition is just syntactic sugar on case expressions.

Recursive algebraic data types

It's interesting to note that ADTs can be recursive. For example, let's define a list of integers:

data IntList = Empty
             | Cons Int IntList

This definition can be read as: "an IntList is either an Empty one or an Int value followed by an IntList". This kind of definition is quite clear and elegant (see Church encoding). For example:

-- [1,2,3] can be represented as an IntList:
l:: IntList
l = Cons 1 (Cons 2 (Cons 3 Empty))

A recursive data ADT naturally leads to recursive functions. For example, to calculate the sum of all the values in an IntList:

calcSum:: IntList -> Int
calcSum Empty = 0
calcSum (Cons n ns) = n + calcSum ns

So, we've seen so far that type signatures can have variables, and can be recursive. Sounds like we could have a Turing-complete type system... indeed, we have one. Someone even implemented a LISP interpreter that completely runs on the Haskell type system!

That's all for this week. Remember: do the exercises!

CIS 194, week 2, Algebraic Data Types

Pattern Matching

Case expressions

Recursive algebraic data types

Post Correlati