0:00
And now we're going to go into a little bit more detail about BitTorrent.
We'll highlight three ideas that are used by BitTorrent to enable the kind of
scalability that it enjoys. Okay?
So out of the three ideas, the first one we start with is the idea of smaller
granularity. This, in turn, gives us more flexibility in
resource allocation. Okay?
In this case, the resources are the content and the upload capacity shared by
each peer. Now, what is the granularity?
Instead of looking at the whole file to be shared as a single unit, we'll look
at chunks. These chunks could be, say, 256
kilobytes each. Now that is a very small number compared
to, let's say, a 100-megabyte or even multi-gigabyte file size.
And then, each chunk can traverse a different multicast tree.
Okay? Each chunk can go down a different multicast tree.
What's the advantage of doing that? Well, first of all, as we will see in
the next module, employing multiple trees to support the same multicast
session is a very smart idea. Multicasting here refers to one source and
multiple destinations. Multiple trees here refers to the fact
that different parts of the file being multicast can go down different trees.
Now we can compare this to packet switching, from two lectures ago.
As we went from circuit to packet switching, we used a smaller unit of
granularity called a packet, and each packet can go down a possibly different
path. And this led to one of the two advantages
of scalability: efficiency of resource usage in the Internet.
And now it is not a packet but a chunk of a file, and it goes down not just
multiple paths, but multiple trees. Okay. So one tree could be like this,
another tree could be like this. Okay? And it turns out that going down
multiple trees, perhaps because of the rich features of the topology called
a tree, is actually easier to handle and manage than going down multiple paths.
And this provides basically what we call a spatial pipeline.
What's a pipeline? It's like when you know you're going to wash
something and dry something. You can first put one
pile in the washer, and then, when it's done, put it in the
dryer while the washer is used for the second pile.
That is called temporal pipelining: scheduling different components of an
overall job. And now we can do spatial pipelining,
because some chunks will traverse this tree,
and some other chunks go through another tree. This enhances overall efficiency.
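The chunking idea can be made concrete with a small sketch. This is only an illustration of the principle, not BitTorrent's actual code: we cut a file into fixed-size chunks and spread the chunks across several distribution trees round-robin, so different chunks travel down different trees in parallel.

```python
# Illustrative sketch (not BitTorrent's actual implementation): split a file
# into fixed-size chunks and assign each chunk to one of several multicast
# trees round-robin -- the "spatial pipeline" from the lecture.

CHUNK_SIZE = 256 * 1024  # 256 KB, the chunk size mentioned in the lecture


def split_into_chunks(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[bytes]:
    """Cut the file into fixed-size chunks (the last one may be shorter)."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def assign_to_trees(num_chunks: int, num_trees: int) -> dict[int, int]:
    """Map chunk index -> tree index, round-robin."""
    return {c: c % num_trees for c in range(num_chunks)}


# Example: a 1 MB file becomes four 256 KB chunks spread over two trees.
chunks = split_into_chunks(b"x" * (1024 * 1024))
print(len(chunks))                      # 4
print(assign_to_trees(len(chunks), 2))  # {0: 0, 1: 1, 2: 0, 3: 1}
```

With, say, a 1 GB file this yields about four thousand chunks, so there is plenty of material to keep multiple trees busy at the same time.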
Okay? That's smart idea number one. Now, the smart idea number two.
4:27
So, we can have those that are just sources, like servers; those that are only
destinations, like clients; and those that are both, like peers.
Also those that are neither: routers.
Okay? They just relay information.
Alright. So now I have to think about: how do I
share different chunks? For example, one peer says, I've got chunks
one, two, three. Another says, I've got chunks two, three, four.
Another says, I've got chunks one, two, four.
And then another says, I've got chunks two, three, five.
Okay? When they exchange this information, this
is called bitmap exchange. They will realize that, hey, we all want
to collect, let's say, chunks one, two, three, four, five.
So there are only five chunks that constitute this small file.
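The bitmap exchange just described can be sketched in a few lines. This is only an illustration under the lecture's four-peer example, not BitTorrent's wire format: each peer advertises the set of chunks it holds, and from the combined bitmaps every peer can take the union (the whole file) and count how many neighbors hold each chunk.

```python
# Illustrative sketch of bitmap exchange, using the lecture's example.
# Peer names are placeholders; a real bitmap is a bit field per chunk.
from collections import Counter

bitmaps = {
    "peer1": {1, 2, 3},
    "peer2": {2, 3, 4},
    "peer3": {1, 2, 4},
    "peer4": {2, 3, 5},
}

# Union of all bitmaps: the full set of chunks making up the file.
all_chunks = set().union(*bitmaps.values())
print(sorted(all_chunks))  # [1, 2, 3, 4, 5]

# Availability: how many peers hold each chunk.
availability = Counter()
for held in bitmaps.values():
    availability.update(held)
print(availability[5])  # 1 -- chunk 5 is held by only one peer
```

Counting availability like this is exactly the information a peer needs for deciding which chunk to fetch next.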
Right? So that's multicast: we all want the whole
file. We all want, therefore, chunks one to
five. But chunk five is the rarest; only one peer has this chunk,
whereas chunks one, two, three, and four each have two or more different peers
possessing them. So, when we exchange the chunks, let's
start with the rarest one. This is the idea of rarest chunk first:
it tries to equalize the popularity, or availability, of different chunks
across different peers. And this is clearly a good idea, so that it
won't happen that all the peers have got chunks one, two, three, four and
everybody is waiting for this rare chunk five. And also, just in case this peer
gets the whole file and decides to leave the system, the other peers will still
have chunk number five available. So that's smart idea number two: rarest chunk
first. Smart idea three, behind the design of BitTorrent, is perhaps the one
that most people talk about.
It's about the peering construction. Not only does it matter which chunks you
choose; it also matters which neighbors you choose as peers.
So again, remember that there is an underlay network, which is the actual
physical IP connection network. Above that, there's one overlay, okay,
overlay one. That is the set of neighbors who possess
some part of the relevant file, and the tracker tells a new peer, hey, here are
your possible neighbors. And then, on top of that is another layer
of overlay, overlay two, which is the actual peers.
This first overlay network changes over time, but not that dynamically.
The underlay changes very slowly; on the time scale of sharing a file, it
pretty much remains static. But this top layer, the second layer of
overlay network, changes very dynamically. Okay?
These peers can change from one time slot to another.
So the question is: how can I select a subset of my neighbors,
okay, the ones told to me by the tracker, call those my actual peers, and start
sharing file chunks with them? So how do I go from a list of neighbors
to peering relationships? As you might remember, we talked about 1999 and 2000,
KaZaA and Gnutella, and those first-generation P2P networks
had this free-rider problem. That is, everybody would like to get the
content from others,
but they may not want to contribute, either their content or their ability to
share the content, which is the upload capacity.
8:32
So, how do I solve this free-rider problem?
We have to provide some kind of incentive, and this may remind you of the kind
of incentives we talked about back in lectures eleven and twelve for Internet
pricing, and back in lecture, say, two for auctions.
Okay? So the kind of incentive that can
be provided could be, say, tit for tat. That is, I'm going to pick five peers
out of the list of neighbors at each time slot.
And four out of the five peers will be picked by looking at the top four peers
who have contributed content to me, okay? So the top four that provided content
to me at the fastest upload rates; those would be four of the five peers I
choose. So, if you don't contribute your upload
capacity to others, then soon no one will want to be your peer, and you will
get no help from others either. Okay? What about the remaining one?
Well, that actually goes back to randomization.
This is, I guess, the third time we've seen randomization, because a lot of the
design in the network requires algorithms acting on things that you don't quite
know exactly. In this case, you also don't quite know exactly what the optimal
configuration is. In the advanced material, we will talk about a very difficult
optimization problem, by far the most difficult one in this course:
peering relationship optimization. But that requires some centralized control
and a global view; short of that, you really have no idea about the globally
optimal configuration. So,
what can you do to prevent yourself from getting stuck in an undesirable
configuration? One possibility is randomization.
It also helps with unfairness to those who actually have little to contribute
to start with. They simply may not have the right content,
or their upload capacity is very small. In the traditional design of the
Internet, the upload capacity from the clients, the end-user devices, is much
smaller than the download capacity, because back then, five, ten, fifteen years
ago, people thought that, hey, why would you need so much upload capacity?
Okay? You are mostly consuming content from servers somewhere in the Internet:
web servers, e-mail servers, video servers and so on.
But of course, in the age of YouTube and user-generated content, upload
capacity is becoming increasingly important, and indeed for P2P it is often the
bottleneck.
11:27
Okay? So maybe you just don't have enough
upload capacity, and therefore a little randomization will give you a lift.
So the remaining one of the five peers in each time slot will be just randomly
chosen. And this is similar to lecture three, when
we talked about randomization in Google's ranking of webpages.
PageRank has, say, fifteen percent randomization going on; this is basically
twenty percent randomization, one out of five.
So tit for tat, together with randomization in peer selection, is the
powerful idea number three in BitTorrent.
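The per-time-slot peer selection just described can be sketched as follows. This is an illustration, not BitTorrent's actual code, and the neighbor names and upload rates are made up: we rank neighbors by how fast they have uploaded to us, keep the top four (tit for tat), and add one neighbor chosen uniformly at random from the rest (in BitTorrent terminology, the random pick is called the optimistic unchoke).

```python
# Illustrative sketch of tit-for-tat plus random peer selection.
# Neighbor names and upload rates are hypothetical example values.
import random


def select_peers(upload_rate_to_me: dict[str, float],
                 num_tit_for_tat: int = 4,
                 num_random: int = 1) -> list[str]:
    """Pick top contributors by upload rate, plus a few random neighbors."""
    ranked = sorted(upload_rate_to_me, key=upload_rate_to_me.get, reverse=True)
    chosen = ranked[:num_tit_for_tat]          # tit for tat: top uploaders
    rest = ranked[num_tit_for_tat:]            # everyone else
    chosen += random.sample(rest, min(num_random, len(rest)))  # random pick
    return chosen


neighbors = {"A": 900.0, "B": 750.0, "C": 640.0, "D": 500.0,
             "E": 120.0, "F": 80.0, "G": 20.0}
print(select_peers(neighbors))  # e.g. ['A', 'B', 'C', 'D', 'F']
```

A neighbor that never uploads anything can still be picked by the random slot, which is what gives newcomers with nothing to offer yet a chance to get started.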
Now, there are many other engineering details, but they don't concern us in
this lecture. Now we can summarize the essential steps
in a BitTorrent operation. Let's say a new peer A arrives. It will contact the
web server and receive a .torrent file, which contains the IP
address and the port number of another server, the tracker.
So it registers with the tracker to make sure that the tracker knows its
existence; the tracker updates its database, saying that, hey, peer A is here,
alive. In return, peer A also receives a list of neighbors, the possible peers.
Then, using some mechanism, it's going to select, say, five peers,
possibly four by tit for tat and one by randomization.
And then it can start exchanging bitmaps directly with these peers, which
indicate the availability of chunks of content.
Then, based on, say, the rarest-chunk-first policy, it selects a chunk, and
finally it starts sharing the content on the data plane.
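The steps above can be written out as high-level pseudocode. All the function names here are hypothetical placeholders for illustration, not a real BitTorrent API:

```
join_swarm(torrent_url):
    metainfo  = download_torrent_file(torrent_url)   # from the web server
    tracker   = (metainfo.tracker_ip, metainfo.tracker_port)
    neighbors = register_with_tracker(tracker)       # tracker learns we exist
    peers     = select_peers(neighbors)              # e.g. 4 tit-for-tat + 1 random
    bitmaps   = exchange_bitmaps(peers)              # who has which chunks
    while not have_all_chunks():
        chunk = rarest_chunk(bitmaps)                # rarest chunk first
        download_chunk(chunk, peers)                 # data-plane transfer
```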
13:13
Now, so far we have not talked about the quantitative side of all this
qualitative description of the scaling property. We keep saying that,
hey, if you are smart enough, the P2P law says that, unlike Metcalfe's law, you
can maintain a constant number of neighbors, and yet the overall system can
scale very well. How do we quantify this self-scaling
property, the most important property in P2P networks?
So we're going to do a little back-of-the-envelope calculation, and then a
simple numerical example in an idealized situation to illustrate these points,
leaving the full-scale version to the advanced material.