Name: Distributed Computing with Spark SQL
Brand: University of California, Davis
Availability: OnlineOnly
Rating: 4.5 (163 reviews)


Learner Reviews & Feedback for Distributed Computing with Spark SQL by University of California, Davis

4.5 stars, 661 ratings

About the Course

This course is all about big data. It’s for students with SQL experience who want to take the next step on their data journey by learning distributed computing using Apache Spark. Students will gain a thorough understanding of this open-source standard for working with large datasets and of the fundamentals of data analysis using SQL on Spark, setting the foundation for combining data with advanced analytics at scale and in production environments. The four modules build on one another, and by the end of the course you will understand the Spark architecture, queries within Spark, common ways to optimize Spark SQL, and how to build reliable data pipelines. The first module introduces Spark and the Databricks environment, including how Spark distributes computation and Spark SQL. Module 2 covers the core concepts of Spark such as storage vs. compute, caching, partitions, and troubleshooting performance issues via the Spark UI. It also covers new features in Apache Spark 3.x such as Adaptive Query Execution. The third module focuses on engineering data pipelines, including connecting to databases, schemas and data types, file formats, and writing reliable data. The final module covers data lakes, data warehouses, and lakehouses. Students build production-grade data pipelines by combining Spark with the open-source project Delta Lake. By the end of this course, students will have honed their SQL and distributed computing skills to become more adept at advanced analysis and to set the stage for transitioning to more advanced analytics as Data Scientists....

Top reviews

Jun 9, 2020

I highly recommend this course for anyone in the BI and Data space interested in learning Spark. The course gives an easy-to-understand introduction to the framework and applicable hands-on examples.

May 13, 2020

Amazing course that really cuts through the fundamentals of using distributed computing power to analyze and manipulate data. Well-organised structure on the fundamentals.


1 - 25 of 162 Reviews for Distributed Computing with Spark SQL

By Steven O

•

Apr 5, 2020

A more appropriate title for the class would be "A Brief Introduction to Databricks". Very disappointing class. There are YouTube tutorials out there with more content than this class. This is one of the only classes I have ever taken on Coursera where I could complete 2 weeks' worth of lectures, assignments, and quizzes in a Sunday afternoon. I think this class was hastily slapped together; there is so little content. If your organization uses Spark and is not a Databricks client (as mine is), you will learn absolutely nothing here. The lectures are extremely short and devoid of any substance. I am still looking for a good online class in Spark. It certainly is not this one.

By Sacha v W

•

Feb 19, 2020

Very superficial use of Databricks. The course lacks the depth to be of any use. It is more of a Databricks commercial: executing pieces of the available course material without sufficient practice.

By Alex C

•

May 27, 2020

It was an interesting course inasmuch as it got me interested in Spark, and it was doable. I think it tried to cover too much ground in not enough depth. After completing it I went off and am now doing the DataCamp Spark courses, which are also interesting.

The implementation stuff in Databricks was really annoying in that the platform used a ´´ (whatever it actually was, I still don't know!!!!) and I just had to copy and paste it every time... It was never mentioned that it didn't work like SQL with [], or that it wasn't an apostrophe or whatever.

The use of Jupyter notebooks itself was nice, and the exercises were also nice as a learning exercise. I got a lot out of them by having to actually find some things out and see, "ah ha, that's how it works".

The presenters were very good. I could be critical of a few points but I won't, as I am guessing it's their first MOOC or so, and my personal opinions and annoyances are irrelevant :-)

All in all a nice course, as it has got me interested and actually up and running with Spark, so I can see where and how it fits and will look further...

Many thanks!

By Bryan B

•

Jul 5, 2020

The first module felt more like a sales pitch for Databricks than anything else, and the last module was about machine learning, not distributed computing. So, in my opinion, only 2 of the weeks attempted to focus on distributed computing, but even they failed. The course seemed to focus way more on SQL and less on Spark and how it works. Sure, there were pieces of information on how to change the number of partitions, but the coverage of how partitions work, or how Spark actually handles distributed computing, was lackluster at best. If you have even a rudimentary understanding of data engineering, you should be able to ace this course with minimal effort, but you'll likely not take much away from it. Great course for absolute beginners though.

By Palak S

•

Jun 6, 2020

I did not like the flow of the content! I expected a lot from this course, but at the end I just have a basic idea of queries! Nothing in depth about Spark's core concepts. Also, the assignment quizzes on queries were very weird and not properly formed! The Week 3 assignment was not displaying feedback! It was a really messy course!

By Sahil G

•

Aug 15, 2022

The course assumed that the learner already knows a lot about Apache Spark. It did not explain concepts very deeply, giving only a very brief overview in a very hasty manner. It should have provided a more solid understanding of the concepts.

By George T

•

Jun 10, 2020

I highly recommend this course for anyone in the BI and Data space interested in learning Spark. The course gives an easy-to-understand introduction to the framework and applicable hands-on examples.

By Elliot T

•

Jul 13, 2020

Great introduction to Spark with Databricks, which seems to be an intuitive tool! Really cool to make the link between SQL and Data Science with a basic ML example!

By Dilin J K J

•

Feb 11, 2020

This has been an amazing course. What is worth mentioning is how the content was delivered. Nice hands on. Highly recommended for anyone who is new to Spark

By Joseph B

•

Jan 6, 2020

Extremely informative for those who are seeking to learn the fundamentals for distributed computing using Spark SQL.

By Daniel Y

•

Sep 9, 2020

very useful

By Noah M

•

May 10, 2020

A highly polished presentation; however, I still feel I have only a superficial understanding of partitions and other Spark optimisation techniques. In Course 4 of this Specialization, I had to google for myself how best to set partition parameters (i.e. how to choose a value), which perhaps should've been covered in this course.

High-level definitions are given, but not so much in way of actual application to clarify the concepts.

By Kumar S

•

May 14, 2020

Amazing course that really cuts through the fundamentals of using distributed computing power to analyze and manipulate data. Well-organised structure on the fundamentals.

By Shubhang K

•

Jun 13, 2022

A good course to learn the fundamentals of Databricks, distributed computing, and the Spark unified analytics platform.

By Cheuk M J H

•

Aug 17, 2022

Solid course. Taught me lots of useful things

By Yolanda S C A

•

Jun 20, 2022

This course was very interesting and helped me to better understand how I can use SQL.

By Zaynul A

•

Mar 4, 2020

Was expecting more advanced material.

By Colleen

•

Dec 21, 2020

This course was a good introduction to Spark SQL, but they tried to cram too much into one course and ended up not being very clear about most of the material. This could be broken down into a specialization and it would be more effective. I got through by Googling a lot, and I already have a background in Machine Learning, but for someone new to the material it would be overwhelming. I wish they'd explained things like how parallelization works more effectively instead of rushing through too much material. The instructors clearly know their stuff, but they need to work on breaking these concepts down for people who don't have much experience in computer science.

By k b

•

Dec 23, 2021

Even though the course is informative, it lacks support from course instructors. You would find many unanswered questions on the discussion board. Coursera should look into that.

By Sizhe L

•

Oct 26, 2019

Video quality needs to be improved. Be careful about the last assignment: the accuracy asked for in the question is the accuracy over the training data.

By Chen W

•

May 20, 2020

Great course, but the last assignment has too many coding problems to fix after Q2. Don't know why.

By Ho y L

•

Dec 18, 2021

Difficult for beginners to learn.

By Federico G

•

Apr 14, 2023

The course was too introductory.

It is named "Distributed Computing with Spark SQL" but it seems to be a Databricks advertisement course. It uses their platform and only shows their tools. Also, after starting this course I was contacted by email by a Databricks salesperson... I wasn't expecting this course to be a way to collect people's information in order to spam them with sales emails.

The Peer Review assignment asks you to upload a file produced by Databricks which includes your email, so your reviewer is able to contact you at your personal/work email!

By Renzo S M A

•

Jun 3, 2022

Databricks is a good platform, but its free version lacks support for the Coursera lessons.

By Obira D

•

Apr 30, 2022

This was a very nice introduction to Apache Spark and using it on Databricks, with guidance on large data, parallel SQL workloads, caches, Parquet and Delta files vs. normal text files, partitioning data, and how you can use SQL, Python, Java, and Scala concurrently in Databricks notebooks for queries, analysis, and data management.

Then there is the portion on traditional data warehouses, data lakes, and the benefits of the Delta data lake (lakehouse), which achieves both data warehouse and data lake functionality through a medallion architecture (bronze, silver, and gold versions, each cleaner than the one before, plus a Delta transaction log for a kind of versioning).

Highly recommended, though Databricks runs entirely in the cloud. I wonder whether one can have an on-premises setup in one's own data center or private cloud.

Highly recommended.