Now, we're going to talk about how web pages get constructed.
Now, you might say, Why are we looking at all this detail?
Partly because what I want you to really understand is how
web applications work because web applications and,
frankly, many mobile applications, have multiple layers of components,
multiple parts that work together.
In a web application, you've got the browser
and then you've got the server and then you've got a database server.
What I want is for you to be able to,
as you write code,
know which part of this multi-part system you're working with.
The browser is the thing that runs on your computer,
it might run on your phone or something,
and the technologies that we're going to learn there are things like HTML and CSS.
Later, we'll learn about the document object model.
HTML is what formats the page,
CSS kind of picks the fonts and things,
and the document object model is the way that JavaScript goes in and plays with stuff,
changes words, we'll learn all that, how that happens,
and jQuery makes that even easier.
We have a lot of technologies in the browser that we're going
to learn and then we're going to have a lot of
technologies in the server that we're going to learn.
We're going to learn about PHP so that we can write code in the server,
we're going to write SQL so we can find stuff
hidden in our databases and bring that up and send it back.
And then we're going to learn about the request response cycle,
and that is where the browser is talking to
the server and we're writing code in the server to respond,
and what goes back and forth,
and that's HTTP, the HyperText Transfer Protocol.
Also, JSON is part of what we're going to send back and forth,
so when we have JavaScript talking to the server and getting data back and forth, that uses JSON.
I have to start somewhere, right?
Where I want to start is I want to start to give you a sense of what happens here,
because this is the basic separation point
between what goes on in the browser and what goes on in the server.
I'll talk a little bit about the HyperText Transfer Protocol,
but mostly I'm trying to give you a way to realize that there are
different pieces of code and technologies that are being used in these applications.
HTTP is the dominant application layer protocol.
It was invented around 1989 to 1990 by Tim Berners-Lee and Robert Cailliau when they invented the web.
There are other protocols that predate HTTP, like FTP for file transfer,
Telnet, and SMTP for simple mail transfer,
and these are all well developed.
But Tim Berners-Lee and Robert Cailliau,
they wanted an easy protocol,
mostly because they weren't protocol developers.
They just wanted to build an application that could
show pages and let people edit those pages.
So, they kind of built the simplest possible protocol.
These protocols go back and forth and back and forth. With HTTP,
you make a connection,
you ask for a document, and you get the document back.
It's really quite simple, and I'm not
sure if it was engineered to be simple, but what happened was that
we were able to layer things on top of it, like
web services and other things that are just a slight variation on it.
Most of the new protocols that we use are simply
protocols that take a different convention on top of HTTP.
It's beautiful and it's elegant and it's the
only one that we can hack because it's so simple that we can hack it.
I used to be able to hack the mail protocol but then
security got in the way and made it difficult to use.
Another of the core contributions of HTTP,
again by Tim and Robert in the early 1990s,
is the notion of a Uniform Resource Locator.
Now, you'll pick up a brochure about HTTP colon blah blah and like, Oh,
okay, I know what that is,
and I type that into my browser and I go get a document or a set of documents.
But there's actually some science that goes into
these Uniform Resource Locators and before the web,
you had to know what host to go to, what protocol to use,
what set of commands to send to that host,
and then what kind of things on that host you might want to retrieve.
A URL just comes up with a convention and concatenates these things together.
So, http:// is the protocol,
there are more than just that one.
Then there is the server that you talk to. Out in the cloud there's, like, server,
server, server, server, server, server.
Which of these servers has the document?
The protocol is how we talk to it.
And then within the server, there are files, and so this document
within the server tells us what we're going to look at.
Now, it's more complex than that.
You can put parameters on the end of this,
you can put what are called anchors on the end of it,
so there's all kinds of things, X=2.
But the basic idea of a URL is: how to get it,
where to go get it and what to get,
all concatenated into one long string.
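To make those pieces concrete, here is a small sketch in Python that pulls a URL apart using the standard library. The host and document come from the example used in this lecture; the ?x=2 parameter and the #top anchor are made up just to show where those optional parts go.

```python
from urllib.parse import urlsplit

# Example URL. The query string (?x=2) and the anchor (#top) are
# made up for illustration; they are not part of the real page.
url = "http://data.pr4e.org/page1.htm?x=2#top"

parts = urlsplit(url)

print(parts.scheme)    # the protocol: how to get it
print(parts.netloc)    # the host: where to go get it
print(parts.path)      # the document: what to get
print(parts.query)     # parameters on the end
print(parts.fragment)  # the anchor on the end
```

The point is only that one long string carries all of these separate answers, and a program can split them back apart.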
The idea is if we have one of these URLs and we type it into the browser we get it.
The other way that we can do this is not just typing the URL into the browser.
Embedded in the HTML are what are called anchor
tags, which are clickable links that have an href,
a hypertext reference,
inside them, and we'll talk about this in the next HTTP lecture.
The href says, Oh when someone clicks on this link,
throw this page away and go to a different page.
It's the hypertext aspect of it.
You click on a link and then you go to the next link.
The link turns into
a GET request, which retrieves a page and gives us back new HTML, which
is then shown in our browser.
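As a rough sketch of how a program could find those links, the Python below scans a piece of HTML and collects the href of every anchor tag. The markup string is a made-up stand-in for a page like page1.htm, not the actual source of that page, and LinkCollector is just a name chosen for this example.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href value of every anchor (<a>) tag it sees."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


# Hypothetical markup, written for illustration only.
html = (
    "<h1>The First Page</h1>\n"
    '<p>Go to the <a href="http://data.pr4e.org/page2.htm">Second Page</a>.</p>'
)

collector = LinkCollector()
collector.feed(html)
print(collector.links)
```

A browser does far more than this, but finding the href is the first step in turning a click into the next GET request.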
The first thing we're going to do is type a URL.
If you type data.pr4e.org/page1.htm,
and then you hit enter,
you have told the browser to make a GET request.
And if I take a look at the source code here,
View Page Source, you might have to put, like,
a developer mode on or something to see
View Page Source but we can see the page source here.
You see this markup, this is HTML.
We'll talk about this in some detail.
And this is an anchor tag that says,
When this second page is clicked on,
go and get this URL.
We're on page1.htm and this is page2.htm.
If I click on this,
it will just be, like, instantaneous.
But we've been to another page and there's a link back to the first page.
Back to the second page.
Back to the first page.
Okay? That's basically hypertext navigation
but our browsers are doing things and seeing
these little clicks and then doing something and getting a different page.
Now, of course, it's far more complex than that and we'll see that in just a moment.
But I want to go and take a very close look at exactly the steps
that happen with this simple hypertext navigation.
At some point, we typed our URL into the browser,
we hit the enter key and that caused a GET request to happen and now, we see this page.
And now, we have an anchor tag in there,
that's a hypertext reference,
and in the early days,
they were all colored blue and they all were
underlined because people needed permission to click on things.
We were so used to looking at pages that you couldn't click on.
Now, everyone thinks hypertext is right and you
know you'll just click on everything and say,
What can I click on here?
Well, in the old days, we used blue text and underlines to say, Click here.
When you click on this, remember that
your browser is a piece of code that's running on your computer.
This big white box is your computer,
whether it's your phone or whatever. The browser is a piece of software, like Chrome, Firefox,
Internet Explorer, I guess they call it Edge now,
Safari, Opera, dot dot dot, lots of browsers.
Those are software applications that are
your client that views the web. They're a browser.
The browser, an application,
it's the thing that is showing you this page and when you click on it,
the browser on your computer says, Oh,
somebody clicked on a link and it goes and looks to figure out what link you want.
And then when it knows what link you asked for,
it then makes a connection based on parsing that link.
It makes a connection to the right web server on a port
called port 80 which is the normal port for web servers to live on.
And then it sends a request,
it sends a little line of text that looks just like this. This whole line says,
That is the document I want.
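Those steps (parse the link, connect to port 80, send a line of text) can be sketched in Python roughly like this. It is a minimal sketch, not what a real browser does: it assumes a plain http:// URL, it chooses HTTP/1.0 with a Host header as the request format, and the function names build_get_request and fetch are made up for this example.

```python
import socket
from urllib.parse import urlsplit


def build_get_request(url):
    """Return (host, port, request_bytes) for a plain http:// URL."""
    parts = urlsplit(url)
    host = parts.hostname
    port = parts.port or 80          # port 80 unless the URL names another
    path = parts.path or "/"
    # Request line, one header, then a blank line to end the request.
    request = f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n"
    return host, port, request.encode("ascii")


def fetch(url):
    """Connect, send the request, and return the raw response bytes."""
    host, port, request = build_get_request(url)
    with socket.create_connection((host, port), timeout=10) as conn:
        conn.sendall(request)
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:             # server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)


host, port, request = build_get_request("http://data.pr4e.org/page2.htm")
print(host, port)
print(request.decode("ascii"))
```

Calling fetch("http://data.pr4e.org/page2.htm") would actually go out over the network; the lines at the bottom only build and print the request so you can see what would be sent.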
Then, somehow, in this web server,
this is the Internet, right?
This is the Internet and, somehow, on this web server,
it either generates a response or reads it off of
a disk or something and out comes the response,
and it comes back to us and that response itself is in the format HTML,
Hyper Text Markup Language,
just like I showed you in the view source.
These are tags, end tag, tag, tag,
and then there is a hypertext reference here that turns this first page
into a hypertext link
and this was purple because I clearly clicked on it before I took the screenshot.
When this HTML comes back,
your browser parses it and then renders it.
There is the document object model that's kind of in the middle here: the browser parses the HTML,
builds a document object model, and then renders that to your page.
But that's the basic idea, you click, request,
page is done, there's a response,
parsed, shown, and you get the page.
Click, request, response, click, request, response.
Now, it's going to be way more complex by the time we're done.
There'll be, like, many request response cycles, okay?
But that's the basic request response cycle.
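To see the response half of the cycle, here is a hedged Python sketch that splits a raw response into its status code, its headers, and its HTML body. The response bytes are canned and made up for illustration (a real server sends more headers), and split_response is a hypothetical helper name.

```python
def split_response(raw):
    """Split raw HTTP response bytes into (status_code, headers, body)."""
    # A blank line (CRLF CRLF) separates the headers from the document.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")

    # The status line looks like: HTTP/1.1 200 OK
    version, code, *reason = lines[0].split(" ")

    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()

    return int(code), headers, body.decode("utf-8")


# Canned response, made up for illustration only.
raw = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/html\r\n"
    b"\r\n"
    b"<h1>The Second Page</h1>\n"
)

status, headers, body = split_response(raw)
print(status)
print(headers["content-type"])
print(body)
```

The body is the HTML that the browser would go on to parse and show.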
Now, this is all governed by Internet standards, and
the Internet standards come from a very open culture.
They were developed by a group called the Internet Engineering Task Force, whose roots
predate what we think of as the modern Internet, which dates from the mid-80s.
The IETF's way of working goes back even to the 70s and
an earlier network called the ARPANET, and out of that came the idea of the IETF,
the Internet Engineering Task Force, and these standards.
And the standards are open,
they're free, they're unencumbered,
meaning that anyone can read them and implement things that
comply with the standards and then,
as such, build yourself a web browser or
build yourself a web server if such a thing doesn't exist.
Of course, these days, all these things exist and we just download them.
But the sort of thing that governs the protocols that we
all use so that our code interoperates is these IETF standards.
There's other sources of standards like
the World Wide Web Consortium for things like HTML and CSS,
but the protocol stuff is very much through IETF.
Now, the fun thing about this is that these standards are
called RFCs, which stands for Request for Comments, and that's
kind of a nerdy engineering acknowledgement
that no standard is ever perfect and even when
they're done and they are 10 or 15 years old or 1981 to 2017,
that's like 30 some years old now,
they still might need to be improved so that even though it's 30 years old plus,
it's still waiting for comments.
If you have a comment, if you find something wrong or have a better way to build it,
then engineers want to hear about that.
So, that's kind of a fun,
ironic naming convention for these Internet standards, RFCs.
And so, if you were to pick them up,
there are many standards that govern HTTP,
the HyperText Transfer Protocol.
You go read them and you could page down, page, page,
page, and you will realize quickly that you don't want to write a browser,
you'd much rather use a browser.
But eventually, you'll get down to a page way, way,
way down that basically says how to make a request.
And if you keep looking, it tells you that you're supposed to
put the method followed by the URI,
a protocol version and then end with a Carriage Return Line Feed, right?
That's what it's saying right here.
The method token, which is going to be GET in our case,
followed by the URI,
URL, and then with the Carriage Return Line Feed, et cetera.
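That rule (method, space, URI, space, protocol version, then Carriage Return Line Feed) is simple enough to check in a few lines of Python. This is only a sketch of the shape described here, not a full implementation of the standard; parse_request_line is a made-up name, and the example line assumes the HTTP/1.0 version and the page2.htm URL from earlier.

```python
def parse_request_line(line):
    """Split 'METHOD URI VERSION' followed by CRLF into its three tokens."""
    if not line.endswith("\r\n"):
        raise ValueError("request line must end with CRLF")
    tokens = line[:-2].split(" ")    # drop the CRLF, split on single spaces
    if len(tokens) != 3:
        raise ValueError("expected method, URI, and protocol version")
    method, uri, version = tokens
    return method, uri, version


# Example request line, assumed for illustration.
line = "GET http://data.pr4e.org/page2.htm HTTP/1.0\r\n"
print(parse_request_line(line))
```

A line without the trailing Carriage Return Line Feed, or with a missing piece, raises an error instead of being accepted.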
If you read long enough,
you could eventually see how we're supposed to do that.
Based on how we're supposed to do that, up next,
I'm going to show you how to manually hack an HTTP request.