Course Outline

Outline

The course materials provide an overview of the subject matter and cover four topics in detail: big data analytics, the big data computing environment, machine learning techniques, and scaling up machine learning. The materials are therefore divided into five areas:

Conceptualization and summarization: Students will be introduced to a technique for conceptualizing and summarizing big data problems, called "thinking with examples for effective learning." It will help them with the representation of data, the modeling of machine learning, and the application of big data computing technologies.

Understanding of data and big data: This area includes representation learning, publicly available datasets, scalability and scaling-up techniques, and report writing using LaTeX.

Understanding of big data systems: This area includes modern data analytics technologies such as Hadoop and MapReduce; suitable programming languages such as Python, Java, and C; big-data-friendly machine learning libraries; and software platforms such as MATLAB or R.

Understanding of machine learning techniques: This area includes the three phases of machine learning, types of learning, support vector machines, decision trees, random forests, and deep learning.

Understanding of scaling up machine learning: This area includes dimensionality reduction techniques such as principal component analysis and feature hashing, as well as an online processing technique called stochastic gradient descent.
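
As a rough illustration of two of the scaling-up ideas named above, the following minimal Python sketch combines feature hashing with stochastic gradient descent. It is not taken from the course materials: the toy data, the bucket count, the learning rate, the choice of logistic regression as the model, and all function names are assumptions made only for this example.

```python
import math
import zlib

NUM_BUCKETS = 2 ** 16  # assumed size of the hashed feature space


def hash_features(tokens, num_buckets=NUM_BUCKETS):
    """Feature hashing: map string tokens to a sparse {index: count} dict."""
    features = {}
    for token in tokens:
        # crc32 is deterministic across runs, unlike Python's built-in hash() for str
        index = zlib.crc32(token.encode("utf-8")) % num_buckets
        features[index] = features.get(index, 0.0) + 1.0
    return features


def predict(weights, features):
    """Logistic-regression probability for one sparse example."""
    score = sum(weights[i] * value for i, value in features.items())
    return 1.0 / (1.0 + math.exp(-score))


def sgd_step(weights, features, label, learning_rate=0.1):
    """One stochastic gradient descent update on the log loss of a single example."""
    error = predict(weights, features) - label
    for i, value in features.items():
        weights[i] -= learning_rate * error * value


def train(stream, epochs=20):
    """Process examples one at a time, as an online learner would."""
    weights = [0.0] * NUM_BUCKETS
    for _ in range(epochs):
        for tokens, label in stream:
            sgd_step(weights, hash_features(tokens), label)
    return weights


# Hypothetical toy data: label 1 for "big data" text, label 0 for other text.
TOY_STREAM = [
    (["big", "data", "cluster"], 1),
    (["hadoop", "cluster", "data"], 1),
    (["poem", "novel", "verse"], 0),
    (["novel", "verse", "story"], 0),
]

weights = train(TOY_STREAM)
```

After training, `predict(weights, hash_features(tokens))` returns a probability for a new token list. The memory used by the model depends only on `NUM_BUCKETS`, not on the vocabulary size, and each update touches only the features present in the current example; these two properties are what make the combination attractive for large datasets.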

Open Source

Several open-source materials will be used to deliver the course to both undergraduate and graduate students. Big data analytics using machine learning is an emerging topic, and thus there are many useful Internet resources that can benefit students.