Contents:
- Why I write and who you are
- Haskell - The language of terms
- Haskell - The language of types
- Haskell - Typeclasses
- Interlude: how typing simplifies development
- Parser combinators: Fast streaming of huge and complex files
in progress
While Haskell is fundamentally simple, its pure paradigm and expressive type system allow the design of deeply abstract patterns. Learning a few of these basic patterns is essential to understanding Haskell.
Type classes: Foldable and the Monoid
A typeclass in Haskell is a collection of functions and terms that a type instance must implement. Typeclasses may be defined as follows (reusing our Foldable example from before):
class Foldable f where
fold :: (a -> b -> b) -> b -> f a -> b
class Monoid a where
mappend :: a -> a -> a
mempty :: a
A default module that is loaded into every Haskell program is called the “Prelude”. There is one blessed (or more often cursed) Prelude that is the default default. Many Haskell projects replace the default Prelude with a more logical and performant one.
There are some who say terms like “Monoid” are unnecessarily academic. I disagree. It is better to use an unfamiliar but exact technical word than a familiar but misleading common word.
To show the limitations of common words, let’s define a few things that may be monoids. A Monoid can be a string with concatenation as an operator. Here the word “Appendable” may be suitable. A Monoid can be a numeric type with an operator. One such Monoid is addition over integers. Alternatively, the operator could be multiplication. What single word covers both these cases? I am not sure, but certainly “Appendable” is not it. A Monoid can be an unordered set with set union as the operator. What might we call this? “Joinable”, perhaps? It would not be right to call it “Appendable” since term assumes an ordered set.
Perhaps the best candidate for alternative names is “Composable”. This captures
the intuition that Monoids make one thing from two things. But the word
“Composable” doesn’t immediately imply a zero value must exist, so it could
equally well be used as a name for a “semigroup”. A semigroup is another word
from abstract algebra that refers to a type that has a closed operator
(a->a->a)
but no special “zero” value. “Composable” is ambiguous; “Monoid” is
not.
Personally, I believe precise words lead to precise thoughts and communication. Learning a little extra vocabulary is not so hard.
I’ll end my rant and get back to typeclasses. Once we’ve defined the typeclasses, they can be used in a type signature as follows:
generalSum :: (Foldable f, Monoid a) => f a -> a
So what’s the point? That doesn’t seem much cleaner than our prior function passing approach.
In fact, some Haskell-like languages don’t have typeclasses at all. Elm, for instance. Instead, they rely on explicit dictionary or function passing. It works fine. However, there are some conveniences that explicit typeclasses add. The biggest one, perhaps, is default values.
The actual Haskell Foldable
typeclasses defines 17 functions. Of these, only
one is required. The rest have defaults defined in terms of the required
function. This means that when I implement a Foldable
instance for a new data
type and provide just the required function, I get 16 additional functions for
free.
It is even better than this, though. Many type signatures are sufficiently
specific that there exists a single reasonable implementation. When this is
true, writing the implementation is just boilerplate, why write something when
you can automatically derive it? fold
is actually one such function. So
Foldable
instance can be generated automatically for any type where folding is
meaningful. Other typeclass instances that can be generated include Eq
, Ord
and Show
. These allow comparisons on the datatype and string representations.
These three typeclasses are usually derived for every defined type. Thus a more
likely definition of Bool
would probably be:
data Bool = True | False
deriving(Eq, Ord, Show)
Typeclasses may also inherit from one another. Monoid is often implemented as inheriting from the more general Semigroup typeclass. As below:
class Semigroup a where
(<>) :: a -> a -> a
class (Semigroup a) => Monoid a where
mempty :: a
A Semigroup is any type that has an operator that is closed over the type. That is, for any two values from the type, the operator can return a third value. A Monoid is a Semigroup with a zero element.
Type classes: Eq
, Ord
, Show
, and Num
The main Haskell typeclasses are very general and defining new ones is not often necessary. A type in Haskell is usually an instance of many typeclasses. There are maybe a dozen or so typeclasses in Haskell that vast majority of what you will see in the wild. I’ll list a few here with their simplified definitions (I leave out the defaults):
class Eq a where
-- define a binary operator
(==) :: a -> a -> Bool
-- you can define default functions
-- implementations of this class may use the default or provide a custom
-- alternative (though the type must remain the same)
(/=) :: a -> a -> Bool
/=) x y = not (x == y)
(
class Ord a where
(<=) :: a -> a -> Bool
class Show a where
show :: a -> String
Eq
, Ord
and Show
allow a data to be compared for equality (Eq
), order
(Ord
) and to be converted to strings (Show
). Instances for all three type
classes can be automatically derived by the compiler if desired. Typically a
programmer will always ask for there to be derived. The syntax is as follows:
data Maybe a = Nothing | Just
deriving(Eq, Ord, Show)
Other type classes, like Foldable
and Functor
, may also be derived for some
data types. It is also possible to define automatic derivation for custom
typeclasses, though this is a more advanced topic.
There is a whole family of numeric type classes. The most important one is
Num
:
class Num a where
(+) :: a -> a -> a
(-) :: a -> a -> a
(*) :: a -> a -> a
negate :: a -> a
abs :: a -> a
signum :: a -> a
fromIntegral :: Integer -> a
None of the types classes so far have been too mind bending. The next three
typeclasses, though, are where people start getting lost. Say hello to Functor
,
Applicative
, and Monad
.
Type classes: Functors, Applicatives, and Monads
If you are going to be productive in Haskell, you are going to have to learn about Functors, Applicatives, and Monads.
class Functor f where
fmap :: (a -> b) -> f a -> f b
<$>) = fmap -- a common infix alias
(
class Functor f => Applicative f where
pure a = f a
<*>) = f (a -> b) -> f a -> f b
(
class Applicative m => Monad m where
(>>=) :: m a -> (a -> m b) -> m b
The typeclasses listed above provide a framework for managing the context of a
computation. Contexts that are “Foldable” can be reduced out of existence. A
list is foldable. Maybe
and Either
are as well.
The complexity of typeclasses like Monad are comparable to the complexity of object oriented patterns such as the Visitor pattern. They tend to be more general than the OOP patterns. And yes, they are mostly drawn from category theory. While this lineage gives them their obscure names, it also makes them systematic.
Haskell is simple, but the implications are deep.
Type classes: Functor
class Functor f where
fmap :: (a -> b) -> f a -> f b
The typeclasses can get very abstract. This is where the real learning curve in Haskell is. You wrap your mind around some very abstract mathematical ideas. These ideas will be familiar from other languages, they encapsulate general categories of problems we solve daily as programmers without, perhaps, even thinking of them as discrete patterns. But Haskell takes these general patterns and make them explicit and central. Then, from the abstract, the particular patterns found in other languages are derived. This is hard. But also beautiful. Would you rather spend your time trying to understand the logical consequences of math or the conventions of some idiosyncratic API? At the risk of being overly-dramatic, we here we meet the beauty and horror of Haskell. Everything you build in Haskell is founded on math. If you want a web server, you need to work out the Platonic idea of a server. This abstract bent is why Haskellers love Haskell and why everyone else thinks they are pretentious blowhards.
I do not want to delve into the details of these typeclasses. Plenty has been written about them already. However, learning to think with these patterns is the first step to becoming productive in Haskell. So I will briefly describe the roles of each typeclass. If you want more info, google is your friend.
A Functor
is a type with one parameter that may be transformed. The output
type f b
knows nothing about type a
. So whatever fmap
does, no a
values
can remain. Let’s say f
is a list. What possible implementations could fmap
have? Here are a couple:
= []
fmap1 _ xs = error "Hahaha"
fmap2 _ _ = [f (head xs)]
fmap3 f xs = map f (reverse xs) fmap4 f xs
Well, those are clearly pathological. They have the correct type, but the first ignores the functions and returns the empty list, the second is undefined, the third only maps the first value, the fourth reverses the list and then maps it.
The problem is that the Functor typeclass specification is not strict enough. There is an implicit law of Functors that Haskell’s type system is too weak to specify. Specifically:
fmap id x == x
This law states that mapping the identity function over a value should return the original value. Functor’s must preserve structure.
Here are a few examples of functors:
> fmap (+ 1) [1,2,3]
2,3,4]
[> fmap (+ 1) (Just 1)
Just 2
> fmap (+ 1) Nothing
Nothing
Type classes: Applicative
class Functor f => Applicative f where pure :: a -> f a (<*>) :: f (a -> b) -> f a -> f b
Type classes: Traversable
class (Functor t, Foldable t) => Traversable t where traverse :: Applicative f => (a -> f b) -> t a -> f (t b) sequenceA :: Applicative f => t (f a) -> f (t a)
Type classes: The infamous Monad
The Monad is infamous. Mostly because of its unfamiliar name and
class Applicative m => Monad a where (>>=) :: m a -> (a -> m b) -> m b
## Type classes: `Monad`
Let's return to the "Hello World" program:
``` haskell
main = print "Hello World"
Write this into a program called Main.hs
and then you can compile it with GHC:
ghc Main.hs
This will create the executable file Main
, which, when run, will say hello.
The filename Main.hs
and the function name main
are both special. Every
executable Haskell program has Main.hs
file with a main
function that
specifies a single IO operation.
The type of the main
function is IO ()
. The print
function has the following type:
print :: Show a => a -> IO ()
The ()
type is the “Unit” type. Here it means that nothing is returned. Here
is the type of readFile
, a function that (surprise!) reads a file and returns
its contents as a string:
type FilePath = String -- define an alias just for convenience
readFile :: FilePath -> IO String
How do you get this String out of IO? You don’t. Technically, there is an unsafe function that can extract it, but in all my years of programming in Haskell, I have never used it. Neither have I ever wanted to.
So let’s say we wanted to read a file and then print the first 10 lines. Here are a few functions from the default Prelude that will be helpful:
lines :: String -> [String] -- break a string on newlines
take :: Int -> [a] -> [a] -- get the first n elements from a list
unlines :: [String] -> String -- concatenate lines
We can compose these functions:
= unlines( take 10 (lines x ) ) textHead x
Or equivalently, we can use the function composition operator .
:
= unlines . take 10 . lines textHead
Now what? We could try this:
= print( textHead (readFile "/dev/stdin") ) -- FAIL main
But this does not typecheck. The type of textHead
is String -> String
but
the type of readFile "/dev/stdin"
is IO String
. textHead
expects a
String
but is given an IO String
.
What can we do?
Well, IO is the member of several type classes. These include Monoid, Functor,
Applicative, and Monad. If we look over the methods in these type classes, we
can find a few that apply to our problem. One is fmap
. Since IO is a functor,
we can apply a pure function to its contents. Here I’ve written out the types of
the relevant expression (this isn’t Haskell code):
textHead |- String -> String
readFile "/dev/stdin" |- IO String
fmap |- (a -> b) -> f a -> f b
Here I am using |-
to mean “has type” so as not to confuse this with Haskell
code. textHead
is a instance of the general type (a -> b)
, where a |- String
and
b |- String
. So we can use IO
’s fmap
instance to change its contents to
the file head:
= print( fmap textHead (readFile "/dev/stdin") ) -- FAIL main
But this still doesn’t work. print
has type String -> IO ()
. It expects a
String
but is getting IO String
. We’ve just shifted the type error from
readFile
to print
.
We need a function that can apply String -> IO ()
to IO String
. More
generally, we need a function of type IO a -> (a -> IO b) -> IO b
. Looking
back over our type classes, we see this the signature of the Monad
(>>=)
operator (with the generic substituted for IO
).
Using the (>>=)
operator, we finally get correct expression:
= fmap textHead (readFile "dev/stdin") >>= print main
This examples shows two ways to interact with data that is hidden in the IO
monad. You can apply a pure function to it using the Functor function fmap
. Or
you can use the monadic bind operation to chain together operations of type (a -> IO b)
. The input to each function is the pure a
type. If a previous step
failed, there won’t be a pure a
type and the function will not be called. All
this is handled in the implementation of the bind operator.
This is probably all terribly confusing. Let me try another angle. Suppose we have the following Bash script:
grep eggs myfile.txt | sed 's/eggs/EGGS/' | sort > "EGGS.txt"
rm myfile.txt
We could implement this in Haskell with the following functions:
grep :: String -> String -> [String]
sed :: String -> String
sort :: [String] -> [String]
saveAs :: String -> String -> IO ()
rm :: String -> IO ()
= grep "eggs" "myfile.txt" >>= pure . sed "s/eggs/EGGS/" >>= pure . sort >>= saveAs main
Enter :i IO
in GHCI and you will find that IO
is Monoid, Functor,
Applicative, and Monad. It is not, however, Traversable.
This function takes any value that can be mapped to a String
type and writes
it to the console. It returns nothing wrapped in an IO datatype.
Overview
In Haskell, there are no classes and objects. Data and functions are separate. In contrast, in Python everything is an object and objects all have a small set of methods that the object programmer decided you will want. For example, in Python v3.10, the builtin string type has 47 methods. This creates an artificial divide between the object methods and external functions. And since, with a few specific exceptions, data is mutable, there are 4 ways to change a value:
= f(x) -- point x to a modified version using a function
x -- mutate x by function
f(x) -- mutate x by method
x.f() = x.f() -- reasign x to a modified version using a method x
In Haskell, there is only function application, f x
.
From this comparison, we can begin to see a design difference between Python and Haskell. Python has a builtin support for multi-paradigm programming. There are builtin classes, support for inheritance, method over-riding, and such. There are builtin imperative structures like for loops, while loops, and if-else statements. There are functions and support for function passing. There is also builtin decorators and builtin generators for adding laziness.
Haskell, in contrast, has none of this. It is a pure functional language.