Spatial information (maps) is considered a core infrastructure of the modern IT world. The business transactions of major IT companies such as Apple, Google, Microsoft, Amazon, Intel, and Uber, and even of automakers such as Audi, BMW, and Mercedes, bear this out. Consequently, these companies are bound to hire more and more spatial data scientists. In response to this business trend, this course is designed for learners who have a basic knowledge of data science and data analysis. It aims to give them a firm understanding of spatial data science and, ultimately, to differentiate their expertise from that of other data scientists and data analysts. The course also aims to help learners realize the value of spatial big data and the power of open source software for solving spatial data science problems.
The first week defines spatial data science and explains why spatial is special from three perspectives: business, technology, and data. The second week introduces four disciplines related to spatial data science (GIS, DBMS, data analytics, and big data systems) together with the related open source software: QGIS, PostgreSQL, PostGIS, R, and Hadoop tools. During the third, fourth, and fifth weeks, you will learn the four disciplines one by one, from principles to applications. In the final week, five real-world problems and their solutions are presented with step-by-step procedures in an open source software environment.


Practical Applications of Spatial Data Science

The sixth module, entitled "Practical Applications of Spatial Data Science", introduces five real-world problems. It presents their solutions with step-by-step procedures, using the solution structures and related open source software discussed in Module 2. The first lecture presents an example of desktop GIS, in which only QGIS is used, to find the top 5 counties for timberland investment in the southeastern states of the U.S. Simple differencing of timber demand and supply identifies the counties with the largest deficit of timber supply relative to demand. The second lecture presents an example of server GIS, in which QGIS and PostgreSQL/PostGIS are used. It serves as a solution for the NYC spatial data center, which required multiple-user access with different levels of privileges. The third lecture presents an example of spatial data analytics, in which QGIS and R are used, to find regional factors that contribute to higher or lower disease prevalence in administrative districts. To do so, spatial autocorrelation analysis is conducted and decision tree analysis is applied. The fourth lecture is another example of spatial data analytics: finding an optimal infiltration route with network analysis, in which a cost surface is produced and Dijkstra's algorithm is applied. The fifth lecture is an example of spatial big data management and analytics, in which QGIS, PostGIS, R, and Hadoop MapReduce are all used. It builds "Passenger Finder", a solution that guides taxi drivers to places where more passengers are waiting for cabs. For this solution, taxi trajectories (spatial big data) are collected, and noise removal and map matching are conducted in a Hadoop environment. Then a series of spatial data processing and analysis steps, such as a spatial join in PostGIS and hotspot analysis in R, are conducted to produce the solution.
All in all, learners will realize the value of spatial big data and the power of a solution structure that combines the four disciplines.
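The ranking step of the first lecture works as follows. It takes timber demand minus supply per county and keeps the counties with the largest deficit. It can be sketched in a few lines of Python. This is a minimal, non-authoritative sketch: the course performs this step in QGIS, and the function name `top_deficit_counties` and the county figures below are hypothetical.

```python
from typing import Dict, List, Tuple


def top_deficit_counties(
    demand: Dict[str, float], supply: Dict[str, float], n: int = 5
) -> List[Tuple[str, float]]:
    """Rank counties by deficit (demand minus supply), largest deficit first."""
    # A county missing from the supply table is treated as having zero supply.
    deficits = {county: demand[county] - supply.get(county, 0.0) for county in demand}
    ranked = sorted(deficits.items(), key=lambda item: item[1], reverse=True)
    return ranked[:n]


if __name__ == "__main__":
    # Hypothetical county figures in arbitrary units, for illustration only.
    demand = {"County A": 120.0, "County B": 80.0, "County C": 200.0,
              "County D": 50.0, "County E": 90.0, "County F": 150.0}
    supply = {"County A": 100.0, "County B": 95.0, "County C": 110.0,
              "County D": 20.0, "County E": 40.0, "County F": 140.0}
    for county, deficit in top_deficit_counties(demand, supply):
        print(f"{county}: {deficit:+.1f}")
```

A positive value means demand exceeds local supply. Those counties are the candidates for timberland investment.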
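The spatial autocorrelation analysis in the third lecture is commonly measured with global Moran's I:

I = (n / S0) * sum_ij w_ij (x_i - mean)(x_j - mean) / sum_i (x_i - mean)^2

Here S0 is the sum of all weights w_ij. The sketch below assumes this statistic and a binary contiguity weight matrix (1 for neighboring districts, 0 otherwise). The course itself uses R, and the name `morans_i` is illustrative only. Values well above the null expectation of -1/(n-1) suggest that similar values cluster. Negative values suggest an alternating, dispersed pattern.

```python
from typing import Sequence


def morans_i(values: Sequence[float], weights: Sequence[Sequence[float]]) -> float:
    """Global Moran's I for one attribute over n areal units.

    values:  attribute per district (e.g. disease prevalence)
    weights: n x n spatial weight matrix; weights[i][j] > 0 if i and j are neighbors
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]          # deviations from the mean
    s0 = sum(sum(row) for row in weights)     # sum of all weights
    cross = sum(weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    variance_term = sum(d * d for d in dev)
    return (n / s0) * (cross / variance_term)


if __name__ == "__main__":
    # Four hypothetical districts in a row; neighbors share an edge (rook contiguity).
    w = [
        [0, 1, 0, 0],
        [1, 0, 1, 0],
        [0, 1, 0, 1],
        [0, 0, 1, 0],
    ]
    print(morans_i([1, 2, 3, 4], w))  # smoothly increasing values: positive I
    print(morans_i([1, 4, 1, 4], w))  # alternating values: negative I
```

In practice, a significance test (for example, a permutation test) would accompany the statistic before the result feeds the decision tree step.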
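The fourth lecture's least-cost route over a raster cost surface can be found by running Dijkstra's algorithm on the grid of cells. The sketch below makes two assumptions: moves go to the 4 neighboring cells, and each move is charged the average of the two cells' costs, a common cost-distance convention. The source does not say how its cost surface or network was built, and `least_cost_path` is a hypothetical name.

```python
import heapq
from typing import Dict, List, Set, Tuple

Cell = Tuple[int, int]


def least_cost_path(cost: List[List[float]], start: Cell, goal: Cell) -> Tuple[float, List[Cell]]:
    """Dijkstra's algorithm on a raster; returns (total cost, list of cells)."""
    rows, cols = len(cost), len(cost[0])
    dist: Dict[Cell, float] = {start: 0.0}
    prev: Dict[Cell, Cell] = {}
    heap: List[Tuple[float, Cell]] = [(0.0, start)]
    visited: Set[Cell] = set()
    while heap:
        d, cell = heapq.heappop(heap)
        if cell in visited:
            continue  # stale heap entry; a shorter distance was already settled
        visited.add(cell)
        if cell == goal:
            break
        r, c = cell
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):  # 4-neighbor moves
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                # Assumed convention: moving between cells costs their average value.
                nd = d + (cost[r][c] + cost[nr][nc]) / 2.0
                if nd < dist.get((nr, nc), float("inf")):
                    dist[(nr, nc)] = nd
                    prev[(nr, nc)] = cell
                    heapq.heappush(heap, (nd, (nr, nc)))
    if goal not in dist:
        return float("inf"), []
    # Walk predecessors back from the goal to rebuild the route.
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    path.reverse()
    return dist[goal], path


if __name__ == "__main__":
    # Hypothetical cost surface: the 9s are expensive cells (e.g. steep or exposed terrain).
    surface = [
        [1, 1, 1],
        [9, 9, 1],
        [1, 1, 1],
    ]
    total, route = least_cost_path(surface, (0, 0), (2, 0))
    print(total, route)
```

The route detours around the high-cost cells because crossing them costs more than the longer path through cheap cells.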
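The spatial join in the fifth lecture attaches each taxi record to the zone that contains it. In PostGIS, this is typically a join on a spatial predicate such as `ST_Contains`. As a conceptual illustration only, the sketch below counts points per polygon with a ray-casting point-in-polygon test. The zone names and coordinates are hypothetical, and points lying exactly on a boundary are not handled specially.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def point_in_polygon(pt: Point, polygon: List[Point]) -> bool:
    """Ray casting: toggle 'inside' each time a rightward ray from pt crosses an edge."""
    x, y = pt
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]  # wrap around to close the ring
        if (y1 > y) != (y2 > y):       # edge straddles the ray's horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def count_points_per_zone(points: List[Point], zones: Dict[str, List[Point]]) -> Dict[str, int]:
    """Count how many points fall inside each named polygon (a point-in-polygon join)."""
    return {name: sum(point_in_polygon(p, poly) for p in points) for name, poly in zones.items()}


if __name__ == "__main__":
    # Hypothetical zones and taxi pick-up locations in planar coordinates.
    zones = {
        "zone_a": [(0, 0), (2, 0), (2, 2), (0, 2)],
        "zone_b": [(2, 0), (4, 0), (4, 2), (2, 2)],
    }
    pickups = [(1, 1), (0.5, 1.5), (3, 1), (5, 5)]
    print(count_points_per_zone(pickups, zones))
```

The per-zone counts produced by such a join are the kind of input a hotspot analysis would then test for significant clustering.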