The first case study is a fictional company called Flowlogistic. Flowlogistic is a global logistics and transport company. They started off as a regional trucking company, have since grown internationally, and want to continue growing, but they've run into problems with their current data engineering solution. As Flowlogistic has grown, the company has come to encompass railway, trucking, air transport, and ocean-based shipping of parcels for its customers. One of the factors that's allowed them to grow is proprietary real-time tracking technology. The case information doesn't go into much detail about this technology or explicitly describe how information is transmitted from parcels back to the data center. However, we can assume it's something like a small device that's tossed in with the shipment and reports the location of the cargo in real time, and that there's sufficient connectivity to send tracking information as it's generated.

Immediately, without any more information, we should be thinking about the multiplicity of these devices and how to manage them and their connections. That suggests one part of the solution could be Cloud IoT Core, which manages device connections for Internet of Things devices. Cloud IoT Core doesn't store or accumulate data, so we would need something that can handle the message demand from the devices and control the flow of information to a data center resource. In the case study, we read that Flowlogistic isn't able to deploy or scale their solution because of a messaging issue in their technology stack, which is based on Kafka. The case study tells us that Kafka has a design-based bottleneck that limits performance and prevents it from scaling beyond a certain volume. The Google Cloud Platform service that provides sophisticated messaging is Cloud Pub/Sub, and it doesn't have Kafka's design limitation, so capacity isn't going to be a problem. We could use Cloud Pub/Sub to buffer and aggregate messages. It seems like this is going to be an essential element of the solution.

Notice that we haven't really gotten to the values or goals of the company, and we already have elements of a solution forming. The core business value of the company is analysis. It wants to identify target customers, new customers or existing customers that might offer new business, and use historical data to perform predictive analysis. In other words, the company wants to know in advance when a shipment will be delayed, and use that intelligence to help mitigate the business impact of the delay. A little bit of foreknowledge is a very valuable thing. Now, if the company is interested in analysis, and not just analysis but predictive analysis, what kinds of technologies are candidates? What specific technologies or services on Google Cloud Platform should be considered? Well, analysis probably means BigQuery, at least if it's interactive. That might mean Data Studio for visualization, and that predictive analysis sounds like a machine learning model. But it sounds like they're already doing this kind of work in their data center, which probably means they're using Hadoop. If they were starting from scratch, we might send them directly to Cloud ML Engine and TensorFlow and so forth. But given that this is a big and expanding company, they probably already have a significant investment in their current data center solution.
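Before we move on, here's a minimal sketch of the ingestion idea we just described: tracking devices (or a gateway in front of them, such as Cloud IoT Core) publishing location readings into a Cloud Pub/Sub topic. It uses the google-cloud-pubsub Python client; the project ID, topic name, and message fields are hypothetical placeholders, not details from the case study.

import json
from google.cloud import pubsub_v1

publisher = pubsub_v1.PublisherClient()
# Hypothetical project and topic names.
topic_path = publisher.topic_path("flowlogistic-demo", "parcel-tracking")

def publish_location(device_id, lat, lon, timestamp):
    """Publish a single tracking reading as a JSON-encoded Pub/Sub message."""
    payload = json.dumps({
        "device_id": device_id,
        "lat": lat,
        "lon": lon,
        "timestamp": timestamp,
    }).encode("utf-8")
    # Attributes can carry routing metadata without parsing the payload.
    future = publisher.publish(topic_path, payload, device_id=device_id)
    return future.result()  # blocks until Pub/Sub accepts the message

publish_location("device-001", 37.42, -122.08, "2024-05-01T12:00:00Z")

On the consuming side, a subscriber such as a Dataflow pipeline or a pull subscription would drain the topic and land the data in storage for analysis; Pub/Sub itself just buffers and delivers the messages.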
And that means we might want to first help them migrate to the cloud, overcome their immediate business limitation, which is not being able to scale, and then look at improvements and other services in a later phase. So I'm thinking this is going to be a variation on a lift-and-shift scenario, where we pick and choose which technologies to move as-is from the data center and which to supplant or replace with other technologies.

In the immediate business goals, their key issue is overcoming data center scaling limits. Real-time inventory tracking is important, and analytics followed by predictive analytics is next most important. So the case itself is telling us what the priorities are for the business, and that helps define what will be most important in a solution. Notice that the case study doesn't include this table about the business. It's important for you to start filling out a table like this mentally when you're reading a case study. You won't have scratch paper or a notepad during the exam, so you can't actually write anything down. But what you can do is start preparing to be an effective reader of case studies, and identify what's most important and how basic information starts to shape a possible solution.

Flowlogistic has found that their data center technologies have become a ceiling for growth. The data center can't keep up with the capacity demands and doesn't offer the machine learning and analysis options the company would like to leverage in the future. The data center, and Kafka messaging in particular, is now a key limiting factor. Moving to the cloud will enable global expansion and will allow Flowlogistic to continue to meet its business goals.

Over the next several slides we're going to tear apart and analyze the case, one step at a time. First are business requirements, second are technical requirements, and finally are technical watchpoints. Requirements are usually conditions; the solution has to meet those conditions, or it won't be accepted. A technical watchpoint is information that starts to indicate a solution.

Let's start with the business requirements. In the actual data engineer job, conversations tend to use the language of the industry rather than the language of data engineering. Technical leaders will likely discuss infrastructure and architecture rather than communicate in data engineering terms. You need to consider these statements in the context of the business requirements to identify elements that are important to the data engineering solution. There are keywords and phrases in the case study that map directly to data engineering solutions, or imply data engineering solutions. Here are a few common business requirements that executives often state: create more revenue opportunities, reduce costs to increase profits, differentiate from competitors. Keywords and phrases are clues that help the data engineer narrow the possible solutions to a specific or best solution. The same is true for the exam questions: can you identify information that will help drive data engineering recommendations? Here are some examples. Keywords and phrases like "track every shipment", "aggregate", and "analytics" map directly to the ETL paradigm: extract, transform, and load. Phrases like "rapid provisioning" and "business agility" also suggest that speed and latency should be important considerations in the solution. Notice that these business requirements don't clearly indicate any particular technology solution.
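To make the "aggregate and analytics" keyword mapping a bit more concrete, here's one way an interactive analysis step could look once tracking events have been landed in BigQuery, using the google-cloud-bigquery Python client. The project, dataset, table, and column names are invented for this sketch; the case study doesn't specify a schema.

from google.cloud import bigquery

client = bigquery.Client(project="flowlogistic-demo")  # hypothetical project

# Find shipments that haven't reported a location recently -- a starting
# point for the kind of delay analysis Flowlogistic is interested in.
query = """
    SELECT shipment_id,
           MAX(event_timestamp) AS last_seen,
           COUNT(*) AS readings
    FROM `flowlogistic-demo.tracking.parcel_events`
    GROUP BY shipment_id
    HAVING MAX(event_timestamp) < TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 6 HOUR)
"""

for row in client.query(query).result():
    print(row.shipment_id, row.last_seen, row.readings)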
So the tip here is to separate business requirements from technical requirements, and to begin noticing words and phrases that will help you find the solution. On the exam, the technical requirements will often result in a couple of equally likely candidate solutions. You should use the business requirements as the tiebreaker to determine the solution that's best for the business and not just technically feasible.

Let's take a look at the technical requirements. The technical requirements are the most important indicators and guide for shaping the data engineering solution. "Managed services" means they want to do as little infrastructure administration as they can, with a scaling, elastic service. Google Cloud in general, and the big data products specifically, are designed with a managed services philosophy. Remember that in Google Cloud jargon, a managed service means that the service may still expose an individual instance or a cluster to the user, so there's still potentially some IT overhead along with customization and control. A serverless service, on the other hand, means that there is no individual instance or cluster visible to the user, so the IT overhead is abstracted away. If you were consulting for a real customer, you'd want to clarify a statement like this to make sure you understand which kind of service the customer is willing to consider.

Migrating Hadoop points to the client's thinking and maturity with data engineering technology. You should expect to encounter systems like HDFS, Spark, Hive, and Kafka, popular open source implementations of different workloads. This is a clear indicator of the GCP product to use in the solution, which is Cloud Dataproc. Notice that this technical requirement starts to directly imply one or more technical solutions. "Both streaming and batch" is a phrase that indicates how they want to transform their data streams. You should immediately be thinking that Cloud Dataflow can process both streaming and batch with the same pipeline, so it should be a candidate. However, they may have an investment in existing software running on Hadoop, so streaming into Cloud Dataproc might be our first step, and Cloud Dataflow might be a consideration for a later phase. "Encrypt and connect with a VPN" highlights the client's bias toward security. This isn't the core of data engineering solutions on GCP, but it is part of the infrastructure and part of the requirements of the job, so it's important to be familiar with networking and security best practices.

These seven categories are a great way to organize evaluation of the case study or question. I'll just point out some of the details in a few of them. Location and distribution: whether the solution is currently in a single data center or in multiple locations, which has a direct influence on where cloud resources will be located in the solution design. Storage: which technology they are using to store different kinds of data. In this case, a storage area network, or SAN, is being used for SQL Server storage, and NAS, or network-attached storage, is being used for backups, logs, and system images. For databases, they're using two SQL Servers, and for data processing, they're using Spark. There are 60 application servers and another 20 servers used for hosting and infrastructure. They don't have dedicated infrastructure for machine learning or predictive analysis, so whatever modeling they're doing is probably on Hadoop and Spark.
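To ground the Cloud Dataproc point, here's a rough sketch of what a first lift-and-shift step might look like with the google-cloud-dataproc Python client: create a cluster and submit an existing Spark job unchanged. The project, region, cluster name, main class, and jar path are placeholders made up for illustration.

from google.cloud import dataproc_v1 as dataproc

project, region = "flowlogistic-demo", "us-central1"  # hypothetical
endpoint = {"api_endpoint": f"{region}-dataproc.googleapis.com:443"}

cluster_client = dataproc.ClusterControllerClient(client_options=endpoint)
job_client = dataproc.JobControllerClient(client_options=endpoint)

# A small cluster sized arbitrarily for the example.
cluster = {
    "project_id": project,
    "cluster_name": "flowlogistic-hadoop",
    "config": {
        "master_config": {"num_instances": 1, "machine_type_uri": "n1-standard-4"},
        "worker_config": {"num_instances": 4, "machine_type_uri": "n1-standard-4"},
    },
}
cluster_client.create_cluster(
    request={"project_id": project, "region": region, "cluster": cluster}
).result()

# Submit the existing Spark job as-is from a jar staged in Cloud Storage.
job = {
    "placement": {"cluster_name": "flowlogistic-hadoop"},
    "spark_job": {
        "main_class": "com.flowlogistic.TrackingAggregator",  # hypothetical class
        "jar_file_uris": ["gs://flowlogistic-demo/jobs/tracking-aggregator.jar"],
    },
}
job_client.submit_job_as_operation(
    request={"project_id": project, "region": region, "job": job}
).result()

The point of this phase is that the Spark and Hive code itself doesn't have to change; only the cluster it runs on does.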
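And to illustrate the "both streaming and batch" point about Cloud Dataflow, here's a minimal Apache Beam sketch in Python that keeps the same transform logic whether it reads from a Pub/Sub subscription (streaming) or from files in Cloud Storage (batch). The subscription, file path, and output table are hypothetical, and the per-minute count is just a stand-in for whatever aggregation Flowlogistic actually needs.

import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions
from apache_beam.transforms import window

def run(streaming=True):
    options = PipelineOptions(streaming=streaming)
    with beam.Pipeline(options=options) as p:
        if streaming:
            events = p | "ReadPubSub" >> beam.io.ReadFromPubSub(
                subscription="projects/flowlogistic-demo/subscriptions/parcel-tracking-sub"
            )
        else:
            # Batch backfill from newline-delimited JSON files.
            events = p | "ReadFiles" >> beam.io.ReadFromText(
                "gs://flowlogistic-demo/backfill/*.json"
            )

        (events
         | "Parse" >> beam.Map(json.loads)
         | "KeyByShipment" >> beam.Map(lambda e: (e["shipment_id"], 1))
         | "Window" >> beam.WindowInto(window.FixedWindows(60))
         | "CountPerShipment" >> beam.CombinePerKey(sum)
         | "Format" >> beam.Map(lambda kv: {"shipment_id": kv[0], "readings": kv[1]})
         | "Write" >> beam.io.WriteToBigQuery(
             "flowlogistic-demo:tracking.readings_per_minute",
             schema="shipment_id:STRING,readings:INTEGER",
         ))

if __name__ == "__main__":
    run(streaming=True)

The design choice worth noticing is that only the read step changes between modes; everything downstream is the same pipeline, which is exactly the Dataflow advantage mentioned above.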
You've read the case study, and we've been over the analysis. You might want to redo the analysis based on your own experience and understanding. Now it's time for you to define the solution you would recommend to Flowlogistic. Take a brief period to decide what kind of recommendation you would make. This is the activity you would do on the job, and it's what you will want to be able to do when reading a case study as part of an exam question. When you return, we'll examine one possible solution.