Hadoop Traffic Analysis

This module was created for CSInParallel by Jeffrey Lyman (JLyman@macalester.edu) in 2014.

This module teaches students how to use the Hadoop framework to analyze datasets that are distributed over multiple files. It assumes that students are already familiar with the basics of Hadoop and with CSInParallel’s Web Map Reduce interface to Hadoop.

The exercises in this module use a dataset from the UK Department for Transport. It contains detailed records of traffic accidents and is split across three separate files.

The dataset can be obtained from Academic Torrents. More information about the source of this public United Kingdom data can be found on its Wikipedia page.