MATH-511 Advanced Mathematical and Statistical Computing
Fall 2016-2017
J K Shaw
Course description. The topics covered all concern analytics and computation on data sets that may be very large. The term analytics refers to modeling and computation within specific mathematical frameworks, such as large matrices and other definite file types, together with tools for measuring, parsing, understanding, and visualizing the data.
The course goals are to learn about big data concepts, the Hadoop ecosystem, distributed computing, and the MapReduce framework, and to apply these concepts to working with and understanding large datasets. Additional topics include big data analysis and data munging using Hadoop clusters. This is a hands-on course.
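As an informal illustration of the map reduce pattern named in the course goals, the following minimal Python sketch counts words entirely in memory. It is not Hadoop code and is not taken from the course materials; the function names (map_phase, shuffle, reduce_phase, word_count) and the sample input are assumptions made for illustration only.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every whitespace-separated word in one line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group emitted values by key, the step a framework performs between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values collected for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over an iterable of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    sample = ["big data big clusters", "data munging"]  # hypothetical input
    print(word_count(sample))
```

On a real cluster the map and reduce steps would run in parallel on separate machines and the framework would handle the grouping; this single-process version only shows how the three stages fit together.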
Data Analytics with Hadoop - Ben Bengfort, Jenny Kim (Early Edition available, publishing date May 2016)
Mining of Massive Datasets - Jure Leskovec, Anand Rajaraman, Jeffrey David Ullman (available online)
Field Guide to Hadoop - Kevin Sitto, Marshall Presser
Prerequisites: Math 510 Mathematical and Statistical Computing, or equivalent.
- Comfort with the Linux command line: students should work through The Command Line Crash Course, http://cli.learncodethehardway.org/book/
- Familiarity with SQL, R, and/or Python: students must be able to write a script that reads, writes, and manipulates data
- Familiarity with data mining techniques
- Math skills: probability, statistics, matrices
Must be enrolled in one of the following Levels:
MN or MC Graduate
Must be enrolled in one of the following Majors:
Mathematics and Statistics