Haskell - Typeclasses

by Zebulun Arendsee

Contents:

in progress

While Haskell is fundamentally simple, its pure paradigm and expressive type system allow the design of deeply abstract patterns. Learning a few of these basic patterns is essential to understanding Haskell.

Type classes: Foldable and the Monoid

A typeclass in Haskell is a collection of functions and terms that a type instance must implement. Typeclasses may be defined as follows (reusing our Foldable example from before):

class Foldable f where
    fold :: (a -> b -> b) -> b -> f a -> b

class Monoid a where
    mappend :: a -> a -> a
    mempty :: a

A default module that is loaded into every Haskell program is called the “Prelude”. There is one blessed (or more often cursed) Prelude that is the default default. Many Haskell projects replace the default Prelude with a more logical and performant one.

There are some who say terms like “Monoid” are unnecessarily academic. I disagree. It is better to use an unfamiliar but exact technical word than a familiar but misleading common word.

To show the limitations of common words, let’s define a few things that may be monoids. A Monoid can be a string with concatenation as an operator. Here the word “Appendable” may be suitable. A Monoid can be a numeric type with an operator. One such Monoid is addition over integers. Alternatively, the operator could be multiplication. What single word covers both these cases? I am not sure, but certainly “Appendable” is not it. A Monoid can be an unordered set with set union as the operator. What might we call this? “Joinable”, perhaps? It would not be right to call it “Appendable” since term assumes an ordered set.

Perhaps the best candidate for alternative names is “Composable”. This captures the intuition that Monoids make one thing from two things. But the word “Composable” doesn’t immediately imply a zero value must exist, so it could equally well be used as a name for a “semigroup”. A semigroup is another word from abstract algebra that refers to a type that has a closed operator (a->a->a) but no special “zero” value. “Composable” is ambiguous; “Monoid” is not.

Personally, I believe precise words lead to precise thoughts and communication. Learning a little extra vocabulary is not so hard.

I’ll end my rant and get back to typeclasses. Once we’ve defined the typeclasses, they can be used in a type signature as follows:

generalSum :: (Foldable f, Monoid a) => f a -> a

So what’s the point? That doesn’t seem much cleaner than our prior function passing approach.

In fact, some Haskell-like languages don’t have typeclasses at all. Elm, for instance. Instead, they rely on explicit dictionary or function passing. It works fine. However, there are some conveniences that explicit typeclasses add. The biggest one, perhaps, is default values.

The actual Haskell Foldable typeclasses defines 17 functions. Of these, only one is required. The rest have defaults defined in terms of the required function. This means that when I implement a Foldable instance for a new data type and provide just the required function, I get 16 additional functions for free.

It is even better than this, though. Many type signatures are sufficiently specific that there exists a single reasonable implementation. When this is true, writing the implementation is just boilerplate, why write something when you can automatically derive it? fold is actually one such function. So Foldable instance can be generated automatically for any type where folding is meaningful. Other typeclass instances that can be generated include Eq, Ord and Show. These allow comparisons on the datatype and string representations. These three typeclasses are usually derived for every defined type. Thus a more likely definition of Bool would probably be:

data Bool = True | False
    deriving(Eq, Ord, Show)

Typeclasses may also inherit from one another. Monoid is often implemented as inheriting from the more general Semigroup typeclass. As below:

class Semigroup a where
    (<>) :: a -> a -> a

class (Semigroup a) => Monoid a where
    mempty :: a

A Semigroup is any type that has an operator that is closed over the type. That is, for any two values from the type, the operator can return a third value. A Monoid is a Semigroup with a zero element.

Type classes: Eq, Ord, Show, and Num

The main Haskell typeclasses are very general and defining new ones is not often necessary. A type in Haskell is usually an instance of many typeclasses. There are maybe a dozen or so typeclasses in Haskell that vast majority of what you will see in the wild. I’ll list a few here with their simplified definitions (I leave out the defaults):

class Eq a where
   -- define a binary operator
   (==) :: a -> a -> Bool 

   -- you can define default functions
   -- implementations of this class may use the default or provide a custom
   -- alternative (though the type must remain the same)
   (/=) :: a -> a -> Bool
   (/=) x y = not (x == y) 

class Ord a where
    (<=) :: a -> a -> Bool

class Show a where
    show :: a -> String

Eq, Ord and Show allow a data to be compared for equality (Eq), order (Ord) and to be converted to strings (Show). Instances for all three type classes can be automatically derived by the compiler if desired. Typically a programmer will always ask for there to be derived. The syntax is as follows:

data Maybe a = Nothing | Just
    deriving(Eq, Ord, Show)

Other type classes, like Foldable and Functor, may also be derived for some data types. It is also possible to define automatic derivation for custom typeclasses, though this is a more advanced topic.

There is a whole family of numeric type classes. The most important one is Num:

class Num a where
    (+) :: a -> a -> a
    (-) :: a -> a -> a
    (*) :: a -> a -> a
    negate :: a -> a
    abs :: a -> a
    signum :: a -> a
    fromIntegral :: Integer -> a

None of the types classes so far have been too mind bending. The next three typeclasses, though, are where people start getting lost. Say hello to Functor, Applicative, and Monad.

Type classes: Functors, Applicatives, and Monads

If you are going to be productive in Haskell, you are going to have to learn about Functors, Applicatives, and Monads.

class Functor f where
    fmap :: (a -> b) -> f a -> f b
    (<$>) = fmap  -- a common infix alias

class Functor f => Applicative f where
    pure a = f a 
    (<*>) = f (a -> b) -> f a -> f b

class Applicative m => Monad m where
    (>>=) :: m a -> (a -> m b) -> m b

The typeclasses listed above provide a framework for managing the context of a computation. Contexts that are “Foldable” can be reduced out of existence. A list is foldable. Maybe and Either are as well.

The complexity of typeclasses like Monad are comparable to the complexity of object oriented patterns such as the Visitor pattern. They tend to be more general than the OOP patterns. And yes, they are mostly drawn from category theory. While this lineage gives them their obscure names, it also makes them systematic.

Haskell is simple, but the implications are deep.

Type classes: Functor

class Functor f where
    fmap :: (a -> b) -> f a -> f b

The typeclasses can get very abstract. This is where the real learning curve in Haskell is. You wrap your mind around some very abstract mathematical ideas. These ideas will be familiar from other languages, they encapsulate general categories of problems we solve daily as programmers without, perhaps, even thinking of them as discrete patterns. But Haskell takes these general patterns and make them explicit and central. Then, from the abstract, the particular patterns found in other languages are derived. This is hard. But also beautiful. Would you rather spend your time trying to understand the logical consequences of math or the conventions of some idiosyncratic API? At the risk of being overly-dramatic, we here we meet the beauty and horror of Haskell. Everything you build in Haskell is founded on math. If you want a web server, you need to work out the Platonic idea of a server. This abstract bent is why Haskellers love Haskell and why everyone else thinks they are pretentious blowhards.

I do not want to delve into the details of these typeclasses. Plenty has been written about them already. However, learning to think with these patterns is the first step to becoming productive in Haskell. So I will briefly describe the roles of each typeclass. If you want more info, google is your friend.

A Functor is a type with one parameter that may be transformed. The output type f b knows nothing about type a. So whatever fmap does, no a values can remain. Let’s say f is a list. What possible implementations could fmap have? Here are a couple:

fmap1 _ xs = []
fmap2 _ _ = error "Hahaha" 
fmap3 f xs = [f (head xs)]
fmap4 f xs = map f (reverse xs)

Well, those are clearly pathological. They have the correct type, but the first ignores the functions and returns the empty list, the second is undefined, the third only maps the first value, the fourth reverses the list and then maps it.

The problem is that the Functor typeclass specification is not strict enough. There is an implicit law of Functors that Haskell’s type system is too weak to specify. Specifically:

fmap id x == x

This law states that mapping the identity function over a value should return the original value. Functor’s must preserve structure.

Here are a few examples of functors:

> fmap (+ 1) [1,2,3]
[2,3,4]
> fmap (+ 1) (Just 1)
Just 2
> fmap (+ 1) Nothing
Nothing

Type classes: Applicative

class Functor f => Applicative f where pure :: a -> f a (<*>) :: f (a -> b) -> f a -> f b

Type classes: Traversable

class (Functor t, Foldable t) => Traversable t where traverse :: Applicative f => (a -> f b) -> t a -> f (t b) sequenceA :: Applicative f => t (f a) -> f (t a)

Type classes: The infamous Monad

The Monad is infamous. Mostly because of its unfamiliar name and

class Applicative m => Monad a where (>>=) :: m a -> (a -> m b) -> m b



## Type classes: `Monad`


Let's return to the "Hello World" program:

``` haskell
main = print "Hello World"

Write this into a program called Main.hs and then you can compile it with GHC:

ghc Main.hs

This will create the executable file Main, which, when run, will say hello.

The filename Main.hs and the function name main are both special. Every executable Haskell program has Main.hs file with a main function that specifies a single IO operation.

The type of the main function is IO (). The print function has the following type:

print :: Show a => a -> IO ()

The () type is the “Unit” type. Here it means that nothing is returned. Here is the type of readFile, a function that (surprise!) reads a file and returns its contents as a string:

type FilePath = String  -- define an alias just for convenience
readFile :: FilePath -> IO String

How do you get this String out of IO? You don’t. Technically, there is an unsafe function that can extract it, but in all my years of programming in Haskell, I have never used it. Neither have I ever wanted to.

So let’s say we wanted to read a file and then print the first 10 lines. Here are a few functions from the default Prelude that will be helpful:

lines :: String -> [String]    -- break a string on newlines
take :: Int -> [a] -> [a]      -- get the first n elements from a list
unlines :: [String] -> String  -- concatenate lines

We can compose these functions:

textHead x = unlines( take 10 (lines x ) ) 

Or equivalently, we can use the function composition operator .:

textHead = unlines . take 10 . lines

Now what? We could try this:

main = print( textHead (readFile "/dev/stdin") )   -- FAIL

But this does not typecheck. The type of textHead is String -> String but the type of readFile "/dev/stdin" is IO String. textHead expects a String but is given an IO String.

What can we do?

Well, IO is the member of several type classes. These include Monoid, Functor, Applicative, and Monad. If we look over the methods in these type classes, we can find a few that apply to our problem. One is fmap. Since IO is a functor, we can apply a pure function to its contents. Here I’ve written out the types of the relevant expression (this isn’t Haskell code):

textHead |- String -> String
readFile "/dev/stdin"  |- IO String
fmap |- (a -> b) -> f a -> f b

Here I am using |- to mean “has type” so as not to confuse this with Haskell code. textHead is a instance of the general type (a -> b), where a |- String and b |- String. So we can use IO’s fmap instance to change its contents to the file head:

main = print( fmap textHead (readFile "/dev/stdin") )   -- FAIL

But this still doesn’t work. print has type String -> IO (). It expects a String but is getting IO String. We’ve just shifted the type error from readFile to print.

We need a function that can apply String -> IO () to IO String. More generally, we need a function of type IO a -> (a -> IO b) -> IO b. Looking back over our type classes, we see this the signature of the Monad (>>=) operator (with the generic substituted for IO).

Using the (>>=) operator, we finally get correct expression:

main = fmap textHead (readFile "dev/stdin") >>= print

This examples shows two ways to interact with data that is hidden in the IO monad. You can apply a pure function to it using the Functor function fmap. Or you can use the monadic bind operation to chain together operations of type (a -> IO b). The input to each function is the pure a type. If a previous step failed, there won’t be a pure a type and the function will not be called. All this is handled in the implementation of the bind operator.

This is probably all terribly confusing. Let me try another angle. Suppose we have the following Bash script:

grep eggs myfile.txt | sed 's/eggs/EGGS/' | sort > "EGGS.txt"
rm myfile.txt

We could implement this in Haskell with the following functions:

grep :: String -> String -> [String]
sed :: String -> String
sort :: [String] -> [String]
saveAs :: String -> String -> IO ()
rm :: String -> IO ()

main = grep "eggs" "myfile.txt" >>= pure . sed "s/eggs/EGGS/" >>= pure . sort >>= saveAs

Enter :i IO in GHCI and you will find that IO is Monoid, Functor, Applicative, and Monad. It is not, however, Traversable.

This function takes any value that can be mapped to a String type and writes it to the console. It returns nothing wrapped in an IO datatype.

Overview

In Haskell, there are no classes and objects. Data and functions are separate. In contrast, in Python everything is an object and objects all have a small set of methods that the object programmer decided you will want. For example, in Python v3.10, the builtin string type has 47 methods. This creates an artificial divide between the object methods and external functions. And since, with a few specific exceptions, data is mutable, there are 4 ways to change a value:

x = f(x)  -- point x to a modified version using a function
f(x)      -- mutate x by function
x.f()     -- mutate x by method
x = x.f() -- reasign x to a modified version using a method

In Haskell, there is only function application, f x.

From this comparison, we can begin to see a design difference between Python and Haskell. Python has a builtin support for multi-paradigm programming. There are builtin classes, support for inheritance, method over-riding, and such. There are builtin imperative structures like for loops, while loops, and if-else statements. There are functions and support for function passing. There is also builtin decorators and builtin generators for adding laziness.

Haskell, in contrast, has none of this. It is a pure functional language.

built on 2024-08-12 11:47:46.22895832 UTC from file 2024-03-18-type-classes