In this blog post, I would like to talk about trees in Haskell, mostly because I do not like the way trees are presented in various Haskell educational materials. So here are trees in Haskell, presented in a way that I hope will make more sense than the material that is already available out there.
Like any other language that I know of, trees are not supported in Haskell. I mean, trees are not built in. You create the notion of a tree. The language itself knows nothing about trees. Haskell knows about lists. It does not know about trees. You program the notion and the operations of trees in Haskell and in all other languages (well, someone may create a language that has built in support for trees, but until then…)
Now when people talk about trees in Haskell they start by giving the following code:
data Tree a = Nil | Node a (Tree a) (Tree a)
The problem with this code is that although it perfectly represents a binary tree in Haskell, it gives the impression that Haskell implicitly supports trees. Let us see why this code may mislead newcomers:
First of all, the word data is a reserved word and we are creating a new data type, so we have to use this word. So far, so good. But the word Tree is user-defined even though it does not look like it. Indeed, we define Tree to be our data type, our tree type, our tree that holds a‘s. Now a‘s can be Integers, Strings, whatever. We will define them when we create a new tree. But the word Tree as it stands there gives the impression of a built in word.
So it would be best to change it to MyTree or something that denotes that we have coined the name and not the creators of Haskell itself. We could also call it MyRecursiveDataStructure because that is what it is, but more on this later.
Then we come to the right part of the equation, the definition of our data structure. The first word we encouter is Nil and this again gives the impression of a builtin word that amounts to a not defined value or a zeroed value or a value that amounts to nothing. Wrong! Nil is also user defined. The user has to define a value that corresponds to an empty tree. The user has to somehow name this value. Well, Nil may be a bad choice, since it looks like something that is built in. A better choice would be MyEmptyNode or InHereWeHaveNothing. And again, after the guard (|) the second option for a value for our data structure is Node. Again, this looks like something that is built in to the language, although it is not. Again, it is user defined. A better option would be MyFilledNode or InHereWeHaveSomething.
So, here is a clearer version of the code above:
data MyTree a = MyEmptyNode | MyFilledNode a (MyTree a) (MyTree a)
All that Haskell knows is the data keyword and the guard (|). We define the words MyTree, MyEmptyNode and MyFilledNode and they carry no “power” whatsoever. They are mere words. No tree-like functionality can come from them.
In the code above, we defined a binary tree. But how is this possible, you may ask, since Haskell knows nothing about trees? Well, Haskell may not know about trees, but it knows about recursion and recursive data structures. And in the the code above, we defined a recursive data structure. And Haskell knows about that and can appreciate that.
OK, you might ask, we defined a data structure, because we used the keyword data, but what on earth makes our data structure a recursive one? Well, our data structure is a recursive one, because its name (MyTree) is used in the right side of the definition as well as on the left side. I mean, you have to have the name (MyTree) on the left side because that is what you are defining, but if you include it in the right side as well, then you have a recursive declaration, a data structure that is defined recursively, thus a recursive data structure.
So, for a data structure to be recursive, its name has to appear at least once in the right part of the declaration equation; in our example, MyTree appears two times. It appears two times, because we defined a binary tree. It would have appeared more if our tree would have more branches. More on this later.
So, what exactly does the declaration of MyTree in the code above means? Well, it means that we have a data structure that we named MyTree and it also takes a type a, which should be given when we create a new MyTree. MyTree can have a value of MyEmptyNode or a value of MyFilledNode a (MyTree a) (MyTree a), which means that MyTree is recursive and wherever we have a value of the type a, we also have two MyTrees there as well. Thus, we have a tree that in each node it either has nothing or it has a value and two branches (subtrees) emanating from there. But here is the thing: one or both of these two branches can as well be nonexistent, since a MyTree can be equal to MyEmptyNode. So, all this is the definition of a binary tree.
Here is a small program that will help you get a grasp of the concepts:
module Main where data MyTree a = MyEmptyNode | MyFilledNode a (MyTree a) (MyTree a) deriving (Eq,Ord,Show,Read) main :: IO () main = do putStrLn "Begin program" let aMyTree = MyFilledNode 5 (MyFilledNode 3 MyEmptyNode MyEmptyNode) (MyFilledNode 2 MyEmptyNode MyEmptyNode) print aMyTree print (sumMyTree aMyTree) let bMyTree = MyFilledNode "r" (MyFilledNode "s" MyEmptyNode MyEmptyNode) (MyFilledNode "a" MyEmptyNode MyEmptyNode) print bMyTree putStrLn "End program" sumMyTree :: Num a => MyTree a -> a sumMyTree MyEmptyNode = 0 sumMyTree (MyFilledNode n t1 t2) = n + sumMyTree t1 + sumMyTree t2
OK, and now I am ready to take questions from my audience… Yes, the gentleman in the middle who is smoking… Oh, you are a lady! Please tell me the brand of your cigarettes, so I can avoid them… And your question is…
All right, the request from the lady is that I should explain the last program a little bit. OK, we define a data structure called MyTree which is a binary tree. Again, Haskell knows about recursive data structure declaration, but knows nothing about trees. In the main program we create a MyTree of Integers and a MyTree of Strings. As far as the binary tree that holds integers, we can find the sum of all its integers by defining a function called sumMyTree. We can see from its definition that this function is recursive, since the function’s name appears in the right side of the equation as well. Obviously, recursive data structures will be served well by recursive functions.
OK, another question… Yes, the gentleman in the front, wearing the high stiletto heels – I guess I never got the memo… OK, the gentleman’s request is for me to provide some examples of binary trees in drawing and, correspondingly, in Haskell. Right:
An empty tree would be:
A tree with just a root (the root’s value being equal to 123) would be:
MyFilledNode 123 MyEmptyNode MyEmptyNode
A tree like the following would be:
MyFilledNode 11 (MyFilledNode 12 MyEmptyNode MyEmptyNode) (MyFilledNode 13 (MyFilledNode 14 MyEmptyNode MyEmptyNode) MyEmptyNode)
And so on. Again, out data structure can hold the data. But we have to tell Haskell how to manipulate our data structure. We have to manipulate our tree. We have to tell Haskell how to traverse it, calculate its sum of the values of the nodes, balance it, unbalance it, graph it, graft it, prune it, copy it etc. Of course, in functional programming we have immutability, but we can perform changes to a data structure by creating a new data structure.
OK, another question, please. Anyone?… Yes, the gentleman with the leather mask and the chainsaw standing in the aisle… Yes, the question is how other recursive structures may look like. Right. The MyFilledNode part of the declaration can have one, two, three etc, as many subnodes as we want. If it has two subnodes, as in all the examples above, we are talking about binary trees. But we can have trees with three or more branches from each node. A tree in which each node has at most three child nodes is called a ternary tree and in Haskell we would denote it as follows:
data MyTree a = MyEmptyNode | MyFilledNode a (MyTree a) (MyTree a) (MyTree a)
OK, now say that we want to denote a tree that has a variable number of subnodes for each node. We might decide to use a list of nodes for that, as follows:
data MyTree a = MyEmptyNode | MyFilledNode a [(MyTree a)]
I hope this answers your question, sir. No, don’t start the chainsaw… Oh, thank you. I did answer your question then. I am glad. Are there any other questions… No? OK, thank you for listening!