The first way that we will construct evolutionary trees is

by using a distance matrix.

And one of the most common ways of constructing such a matrix is to begin

with a multiple alignment.

Here is a toy multiple alignment that we'll use throughout this talk.

We can form a distance matrix from this multiple alignment in which D sub i,j

is simply the number of differing symbols between the i-th and

j-th rows of the multiple alignment.

i.e., there are three mismatches between the first and second rows.

So we simply set D sub i,2 and D sub 2,1 of this distance matrix equal to 3.

Of course, we have encountered several other different notions of distance

already that we could also use to construct a distance matrix.

But constructing a distance matrix from a multiple alignment is very common.

In order to construct a multiple alignment of coronaviruses,

researchers chose the spike protein, which identifies and

binds to receptor sites on the host's cell membrane.

Here is the spike protein for the SARS coronavirus.

You may have noticed that evolutionary trees take the shape of a graph.

So we'll define a tree as a connected, acyclic graph.

Because the tree is connected, it means that it holds the tree in one piece.

And because it's acyclic, it means that the tree can branch out

without growing back in on itself and forming a cycle.

When we use a tree to model a phylogeny,

the present day species should be at the ending nodes of the tree.

More formally, these ending nodes are called "leaves", which I've shown in dark in

this tree of life and they're simply nodes that have degree one.

The remaining nodes, shown in a lighter color, are called "internal nodes", and

they're simply nodes of a larger degree than one.

Now, trees come in all different shapes and sizes, but the fact that they

are connected in a cyclic by definition leads to some common properties.

For example, we can always find two leaves in a tree as long as the tree

has at least two nodes, of course.

Also, a tree with n nodes has exactly n - 1 edges.

For example, the tree on the left has 14 nodes.

So regardless of what shape the tree has or how it's nodes are connected,

we know that it must have 13 edges.

In order to model the most common ancestor of all the species in trees,

we will sometimes place a "root" in a tree.

If a tree is rooted, shown here by the green node, then the root can be

drawn at the top of tree with the leaves at the bottom of the tree.

By reorienting the tree in this way, we know that time flows

downward from the root to the leaves in the sense that each edge

of the tree connects an older species to a more recent species.

For now though, let's assume that we are working with unrooted trees.

This leads us to a computational

problem of finding the tree that "fits" a distance matrix.

Of course, I should have known that Son would somehow show up today.

The problem that he has is that we have been careless and

we haven't defined what we mean for a tree to "fit" a distance matrix.

If we come back to the toy distance matrix that we had for four species,

then I will go ahead and show you a tree that fits it.

Here it is.

We will need to assign weights to the edges of this tree so

that the sum of weights along a path that connects two leaves

corresponds to the distance matrix value for those two leaves.

Here we have weights of 1 and 2 and

they combine to give us a distance of 3 between chimp and human.

The distance between chimp and seal is 6, and

edge weights 1, 3, and 2 give us 6, and so on.

So you can verify that each possible path through this tree,

if you add up the weights of the edges along the path,

that gives you the distance matrix value.

Now we will use lower case d to indicate the distance between leaves and a tree.

If you're given a tree whose edges are weighted,

then it's a pretty straightforward problem to just compute the distances

between each pair of leaves in that tree.

Of course, what we really want to do is solve the reverse problem,

which is finding a tree whose distance between leaves fits a given matrix.

You think this is now well defined?

If you're not sure what I'm driving at with this question,

feel free to pause and take moment to try this exercise.