Cloud Computing Applications, Part 2: Big Data and Applications in the Cloud

University of Illinois at Urbana-Champaign

This course is part of Cloud Computing Specialization

Taught in English

Instructors: Reza Farivar

30,376 already enrolled

Course

Gain insight into a topic and learn the fundamentals

4.3 (327 reviews)

19 hours (approximately)

Flexible schedule: learn at your own pace

Details to know

Shareable certificate

Add to your LinkedIn profile

Assessments

5 quizzes

When you enroll in this course, you'll also be enrolled in the Cloud Computing Specialization.

There are 5 modules in this course

Welcome to the Cloud Computing Applications course, the second part of a two-course series designed to give you a comprehensive view of the world of Cloud Computing and Big Data!

In this second course we continue Cloud Computing Applications by exploring how the Cloud opens up data analytics on huge volumes of data that are static or streamed at high velocity and that represent an enormous variety of information. Cloud applications and data analytics represent a disruptive change in the ways that society is informed by, and uses, information.

We start the first week by introducing some major systems for data analysis, including Spark, and the major frameworks and distributions of analytics applications, including Hortonworks, Cloudera, and MapR. By the middle of week one we introduce HDFS, the robust distributed file system used by many applications such as Hadoop. We finish week one by exploring the powerful MapReduce programming model and how distributed operating systems like YARN and Mesos support a flexible and scalable environment for Big Data analytics.

In week two, the course introduces large-scale data storage and the difficulties of reaching consensus in enormous stores that span large numbers of processors, memories, and disks. We discuss eventual consistency, ACID, and BASE, along with the consensus and coordination technologies used in data centers, including Paxos and ZooKeeper. The course presents distributed key-value stores and in-memory databases like Redis, which are used in data centers for performance. Next we present NoSQL databases. We visit HBase, the scalable, low-latency database that supports database operations in applications that use Hadoop. We then show how Spark SQL can run SQL queries on huge data sets. We finish week two with a presentation on distributed publish/subscribe systems using Kafka, a distributed log messaging system that is finding wide use in connecting Big Data and streaming applications together to form complex systems.

Week three moves to fast data and real-time streaming and introduces Storm, a technology used widely in industry, for example at Yahoo. We continue with Spark Streaming, Lambda and Kappa architectures, and a presentation of the streaming ecosystem.

Week four focuses on graph processing, machine learning, and deep learning. We introduce the ideas of graph processing and present Pregel, Giraph, and Spark GraphX. Then we move to machine learning with examples from Mahout and Spark: k-means, Naive Bayes, and frequent pattern mining (FPM). Spark ML and MLlib continue the theme of programmability and application construction. The last topic in week four introduces deep learning technologies including Theano, TensorFlow, CNTK, MXNet, and Caffe on Spark.

You will become familiar with the course, your classmates, and our learning environment. The orientation will also help you obtain the technical skills required for the course.

What's included

1 video, 4 readings, 1 quiz, 1 discussion prompt, 1 plugin

1 video (total 26 minutes)

Welcome to Cloud Applications, Part 2! (26 minutes)

4 readings (total 40 minutes)

Syllabus (10 minutes)
About the Discussion Forums (10 minutes)
Updating Your Profile (10 minutes)
Social Media (10 minutes)

1 quiz (total 30 minutes)

Orientation Quiz (30 minutes)

1 discussion prompt (total 60 minutes)

Getting to Know Your Classmates (60 minutes)

1 plugin (total 15 minutes)

Welcome! Please tell us about yourself. (15 minutes)

In Module 1, we introduce you to the world of Big Data applications. We start by introducing you to Apache Spark, a common framework used for many different tasks throughout the course. We then introduce some Big Data distro packages, the HDFS file system, and finally the idea of batch-based Big Data processing using the MapReduce programming paradigm.

What's included

13 videos, 1 reading, 1 quiz

13 videos (total 108 minutes)

1.1.1 Motivation for Spark (8 minutes)
1.1.2 Apache Spark (11 minutes)
1.1.3 Spark Example: Log Mining (9 minutes)
1.1.4 Spark Example: Logistic Regression (7 minutes)
1.1.5 RDD Fault Tolerance (4 minutes)
1.1.6 Interactive Spark (4 minutes)
1.1.7 Spark Implementation (4 minutes)
1.2.1 Introduction to Distros (3 minutes)
1.2.2 Hortonworks (23 minutes)
1.2.3 Cloudera CDH (2 minutes)
1.2.4 MapR Distro (2 minutes)
1.3.1 HDFS Introduction (15 minutes)
1.3.2 YARN and MESOS (9 minutes)

1 reading (total 10 minutes)

Module 1 Overview (10 minutes)

1 quiz (total 30 minutes)

Module 1 Quiz (30 minutes)

In Module 2, you will learn about large-scale data storage technologies and frameworks. We start by exploring the challenges of storing large data in distributed systems. We then discuss in-memory key/value storage systems, NoSQL distributed databases, and distributed publish/subscribe queues.

What's included

24 videos, 1 reading, 1 quiz

24 videos (total 303 minutes)

Module 2 Introduction (5 minutes)
2.1.1 Introduction to MapReduce with Spark (3 minutes)
2.1.2 MapReduce: Motivation (15 minutes)
2.1.3 MapReduce Programming Model with Spark (9 minutes)
2.1.4 MapReduce Example: Word Count (9 minutes)
2.1.5 MapReduce Example: Pi Estimation & Image Smoothing (15 minutes)
2.1.6 MapReduce Example: Page Rank (13 minutes)
2.1.7 MapReduce Summary (4 minutes)
2.2.1 Eventual Consistency – Part 1 (10 minutes)
2.2.2 Eventual Consistency – Part 2 (20 minutes)
2.2.3 Consistency Trade-Offs (4 minutes)
2.2.4 ACID and BASE (19 minutes)
2.2.5 Zookeeper and Paxos: Introduction (10 minutes)
2.2.6 Paxos (17 minutes)
2.2.7 Zookeeper (16 minutes)
2.3.1 Cassandra Introduction (27 minutes)
2.3.2 Redis (7 minutes)
2.3.3 Redis Demonstration (14 minutes)
2.4.1 HBase Usage API (15 minutes)
2.4.2 HBase Internals - Part 1 (17 minutes)
2.4.3 HBase Internals - Part 2 (9 minutes)
2.4.4 Spark SQL (8 minutes)
2.5.5 Spark SQL Demo (8 minutes)
2.5.1 Kafka (17 minutes)

1 reading (total 10 minutes)

Module 2 Overview (10 minutes)

1 quiz (total 30 minutes)

Module 2 Quiz (30 minutes)

Module 3 introduces you to real-time streaming systems, also known as Fast Data. We discuss Apache Storm at length, then cover Apache Spark Streaming and the Lambda and Kappa architectures. Finally, we compare these technologies as parts of a streaming ecosystem.

What's included

18 videos, 1 reading, 1 quiz

18 videos (total 216 minutes)

Module 3 Introduction (10 minutes)
3.1.1 Streaming Introduction (9 minutes)
3.1.2 "Big Data Pipelines: The Rise of Real-Time" (7 minutes)
3.1.3 Storm Introduction: Protocol Buffers & Thrift (15 minutes)
3.1.4 A Storm Word Count Example (3 minutes)
3.1.5 Writing the Storm Word Count Example (10 minutes)
3.1.6 Storm Usage at Yahoo (3 minutes)
3.2.1 Anchoring and Spout Replay (17 minutes)
3.2.2 Trident: Exactly Once Processing (10 minutes)
3.3.1 Inside Apache Storm (9 minutes)
3.3.2 The Structure of a Storm Cluster (4 minutes)
3.3.3 Using Thrift in Storm (10 minutes)
3.3.4 How Storm Schedulers Work (12 minutes)
3.3.5 Scaling Storm to 4000 Nodes (14 minutes)
3.3.6 Q&A with Bobby Evans (Yahoo) on Storm (32 minutes)
3.4.1 Spark Streaming (18 minutes)
3.4.2 Lambda and Kappa Architecture (4 minutes)
3.4.3 Streaming Ecosystem (24 minutes)

1 reading (total 10 minutes)

Module 3 Overview (10 minutes)

1 quiz (total 30 minutes)

Module 3 Quiz (30 minutes)

In Module 4, we discuss the applications of Big Data. In particular, we focus on two topics: graph processing, where massive graphs (such as the web graph) are processed for information, and machine learning, where massive amounts of data are used to train models and run algorithms such as clustering and frequent pattern mining. We also introduce you to deep learning, where large data sets are used to train neural networks with effective results.

What's included

18 videos, 1 reading, 1 quiz, 1 discussion prompt, 1 plugin

18 videos (total 173 minutes)

4.1.1 Graph Processing (22 minutes)
4.1.2 Pregel - Part 1 (7 minutes)
4.1.3 Pregel - Part 2 (11 minutes)
4.1.4 Pregel - Part 3 (6 minutes)
4.1.5 Giraph Introduction (6 minutes)
4.1.6 Giraph Example (4 minutes)
4.1.7 Spark GraphX (15 minutes)
4.2.1 Big Data Machine Learning Introduction (13 minutes)
4.2.2 Mahout: Introduction (8 minutes)
4.2.3 Mahout kmeans (5 minutes)
4.2.4 Mahout: Naïve Bayes (9 minutes)
4.2.5 Mahout: fpm (6 minutes)
4.2.6 Spark Naïve Bayes (2 minutes)
4.2.7 Spark fpm (2 minutes)
4.2.8 Spark ML/MLlib (11 minutes)
4.2.9 Introduction to Deep Learning (20 minutes)
4.2.10 Deep Neural Network Systems (17 minutes)
4.3.1 Closing Remarks (1 minute)

1 reading (total 10 minutes)

Module 4 Overview (10 minutes)

1 quiz (total 30 minutes)

Module 4 Quiz (30 minutes)

1 discussion prompt (total 30 minutes)

Final Reflections (30 minutes)

1 plugin (total 15 minutes)

How was the course? (15 minutes)

Instructors

Instructor ratings

4.8 (17 ratings)

Reza Farivar

University of Illinois at Urbana-Champaign

5 courses, 65,286 learners

Roy H. Campbell

University of Illinois at Urbana-Champaign

5 courses, 66,102 learners

Offered by

University of Illinois at Urbana-Champaign

Prepare for a degree

Taking this course by University of Illinois at Urbana-Champaign may provide you with a preview of the topics, materials and instructors in a related degree program which can help you decide if the topic or university is right for you.

Learner reviews

4.3 (327 reviews)

5 stars: 52.29%
4 stars: 30.27%
3 stars: 11.92%
2 stars: 3.36%
1 star: 2.14%

There are 5 modules in this course

Course Orientation
Module 1: Spark, Hortonworks, HDFS, CAP
Module 2: Large Scale Data Storage
Module 3: Streaming Systems
Module 4: Graph Processing and Machine Learning

Prepare for a degree

Master of Computer Science
