This lecture is about Apache Spark. Apache Spark is a general-purpose big data processing technology. It is currently one of the most popular and most active open source big data projects in the world, and therefore you need to know the details about it. Before Spark, Hadoop was the most widely used open source big data technology, and we are going to see how Hadoop is used by Spark. Spark conducts its own cluster management, and it supports batch applications, iterative algorithms, and interactive queries. Spark is independent of Hadoop, but Spark can use Hadoop for storage or for processing. Spark has a built-in machine learning library, which makes it very powerful, and it supports stream processing: it has a streaming mode for real-time applications which uses micro-batch technology.

Spark has been reported to be tens to hundreds of times faster than Hadoop, and we will soon see why. Spark is very fast because it uses improved data processing techniques, which include in-memory (RAM) processing, Resilient Distributed Datasets (RDDs), Directed Acyclic Graphs (DAGs), advanced scheduling, persistence techniques, real-time streaming, and others.

Spark does not have its own distributed storage system; instead, it is built to use various third-party distributed file systems. Many Spark systems are connected to Hadoop systems because a lot of data is already in the Hadoop Distributed File System, HDFS. Hadoop's MapReduce is replaced with Spark's RDDs, DAGs, transformations, and actions. Spark uses HDFS through the YARN resource manager, and we will look at the details of this. Spark's advanced analytics, applications, and built-in machine learning library functions enable remarkable information extraction from data stored in HDFS and various other datasets. Hadoop, in contrast, does not have a built-in machine learning library, so it uses a third-party machine learning library, Apache Mahout.
Hadoop was slow because MapReduce saves all of its processed data to a physical storage medium after each operation in order to be fault tolerant. Normally, the storage medium is a hard disk drive, and we will soon see the speed limits of hard disk drives. Hadoop repeats this process multiple times in a job, which makes it even slower.

Spark applications include retailer recommendation engines; industrial machinery and manufacturing monitoring and automation; prediction systems that estimate when parts will malfunction, when it is best to replace them, and when to order replacement components; and controllers for the Internet of Things and cyber-physical systems.

Spark and Mesos were developed by the AMPLab at UC Berkeley. In 2010, Spark became open source software under a BSD (Berkeley Software Distribution) license. In 2013, Spark was donated to the Apache Software Foundation, and in 2014, Apache Spark became a top-level Apache project. In May 2014, Apache Spark 1.0 was released.

Looking into Spark's characteristics, first, Spark scales very well: it can be executed on clusters consisting of thousands of nodes, processing petabyte-scale datasets.

Looking into the relationship between Spark and Hadoop, both are big data technologies, both provide some of the most popular big data tools, and both are Apache Software Foundation projects. Hadoop and Spark systems can work together, and we will look at examples of this. Many Spark systems are connected to Hadoop HDFS through YARN. Both are scalable: more data drives can be added to the network as the dataset grows, so it is easy to expand and include more data. Their task management and data processing schemes, however, are different.

These are the references that I used, and I recommend them to you. Thank you.
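The disk-versus-memory point above can be illustrated with a small plain-Python sketch. This is a conceptual illustration, not real Hadoop or Spark code: a MapReduce-style pipeline writes each stage's output to a file for fault tolerance before the next stage reads it back, while a Spark-style pipeline keeps intermediate results in memory.

```python
# Conceptual contrast (NOT real Hadoop/Spark code): the same two-stage job run
# MapReduce-style (spill every stage to disk) vs. Spark-style (stay in memory).
import json
import os
import tempfile

def stage1(records):  # e.g. a first map step
    return [r * 2 for r in records]

def stage2(records):  # e.g. a second map step
    return [r + 1 for r in records]

def mapreduce_style(records):
    """Write each stage's output to disk, then read it back for the next stage."""
    for stage in (stage1, stage2):
        out = stage(records)
        path = os.path.join(tempfile.gettempdir(), "stage_out.json")
        with open(path, "w") as f:   # persisted after every operation
            json.dump(out, f)
        with open(path) as f:        # next stage re-reads from slow storage
            records = json.load(f)
    return records

def spark_style(records):
    """Chain the stages entirely in memory; no intermediate disk writes."""
    return stage2(stage1(records))

data = [1, 2, 3]
print(mapreduce_style(data))  # [3, 5, 7]
print(spark_style(data))      # [3, 5, 7]
```

Both pipelines compute the same answer; the difference is that the MapReduce-style version pays a disk write and a disk read between every pair of stages, which is exactly the cost that iterative algorithms multiply and that Spark's in-memory RDD processing avoids.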