So let's talk about Compute Engine and Cloud Storage. It's useful to know how Compute Engine instances and Cloud Storage work because a Datalab instance is going to run on them. For persistent data in the Cloud, you will use Cloud Storage, so you need to understand Cloud Storage as well. Think of Compute Engine as a global distributed CPU, and Cloud Storage as a global distributed disk. Datalab, though, is a single-node program, so it runs on a single Compute Engine instance. However, when we launch Dataflow jobs or Cloud ML Engine jobs, we kick off the processing to many Compute Engine instances.

Compute Engine essentially allows you to rent a virtual machine on the Cloud to run your workloads. So what are some of the things you can customize? Things like the number of cores, the amount of memory, the disk size, and the operating system, while things like load balancing, networking, et cetera come baked in. But you are not tied to your initial choices; you can always change them. And the billing discounts are automatic, depending on how much you use the machine.

The disks attached to Compute Engine instances are fast, but they're ephemeral: when the VM goes away, the disk goes away. Google also offers persistent disks, but let's ignore those for now. Cloud Storage, on the other hand, is durable. That is, blobs in Cloud Storage are replicated and stored in multiple places. Cloud Storage is also accessible from any machine. And because of the speed of the network, petabit bisection bandwidth within a Google data center, which essentially means that 100,000 machines can talk to each other at 10 gigabits per second each, you can directly read off Cloud Storage. In fact, that's what we will do when we write our TensorFlow programs.

The purpose of Cloud Storage is to give you a durable, global file system, but how is it organized? A typical Cloud Storage URL might look like gs://acme-sales/data/sales003.csv. The acme-sales part is called a bucket. The name of the bucket is globally unique; think of it like a domain name in an internet URL. One way to get a globally unique bucket name is to use a reverse domain name, in which case Google Cloud Platform will ask you to prove that you own the domain name in question, or you can simply use your project ID. Unless you are extremely unlucky, your project ID, which is also globally unique, will not have already been used for a bucket name. The rest of the gs URL is, by convention, like a folder structure, with a complete gs URL referring to an object in Cloud Storage.

So, how do you work with it? You can use gsutil, a command-line tool that comes with the Google Cloud SDK. If you spin up a Compute Engine instance, gsutil is already available. On your laptop, you can download the Google Cloud SDK to get gsutil. Gsutil uses a familiar Unix command-line syntax. So, for example, mb and rb are make bucket and remove bucket, and you can do cp to do a copy. Instead of the command line, you can also use the GCP console, a programming API, or a REST API. Here, I'm showing you how to copy a bunch of files, sales*.csv, to a specific Cloud Storage location.

Remember I said Cloud Storage buckets are durable. This means that they're stored redundantly. You also get edge caching and failover simply by putting your object in Cloud Storage. However, just because Cloud Storage gives you a global file system doesn't mean you can forget about latency considerations. You are better off storing the data close to your compute nodes.
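To make those gsutil commands concrete, here is a minimal sketch, assuming a hypothetical project ID acme-sales used as the bucket name and local files matching sales*.csv:

# Make a bucket, here named after the (hypothetical) project ID
gsutil mb gs://acme-sales

# Copy a bunch of local files to a folder-like path inside the bucket
gsutil cp sales*.csv gs://acme-sales/data/

# List the objects that were copied
gsutil ls gs://acme-sales/data/

# Remove the objects, then the now-empty bucket, when you are done
gsutil rm gs://acme-sales/data/sales*.csv
gsutil rb gs://acme-sales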
But what happens in the case of a service disruption? You need to distribute your apps and data across multiple zones to protect yourself in case a single zone goes down, for example, if a zone suffers a power outage. You can also leverage zones in different regions if you need even more redundancy. A zone is an isolated location within a region; it is named with the region name followed by a zone letter, for example, us-central1-a. And then finally, for global availability: if you're building a global application with customers spread across the globe, you would want to distribute your apps and data across regions.
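Here is a small sketch of what that looks like with the gcloud and gsutil command-line tools, assuming a hypothetical application in the us-central1 region; the instance and bucket names are made up for illustration:

# Put two instances in different zones of the same region for zone redundancy
gcloud compute instances create my-app-vm-1 --zone=us-central1-a
gcloud compute instances create my-app-vm-2 --zone=us-central1-b

# Keep the data in a regional bucket close to those compute nodes
gsutil mb -l us-central1 gs://my-app-unique-bucket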