Map-reduce Computing for Introductory Students using WebMapReduce

Last Updated: Nov. 13, 2012

The first section below is preliminary reading for each of the three sections that follow it. It describes a map-reduce system, or framework, which uses distributed computers in a cluster to analyze massive amounts of data concurrently, or “in parallel”, by having processes on each computer work on small portions of a much larger dataset. The next section, Using WebMapReduce (WMR), walks you through a basic example of counting words in text files. The third section, entitled Counting words with WebMapReduce (WMR): adding efficiency, presents a slightly revised version of the word-counting example that is more efficient than the basic one, thanks to small modifications to the code and to the files it works on. The fourth section provides further example data and analyses that show the range of what you can do with map-reduce systems.
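To make the idea concrete before you start, here is a minimal sketch of word counting in the map-reduce style, written in plain Python and run on a single machine. It is not WebMapReduce's own interface: the names `mapper`, `reducer`, and `run_word_count` are hypothetical, chosen only for illustration, and the grouping step that a real framework performs across the cluster is simulated here with a dictionary.

```python
from collections import defaultdict


def mapper(line):
    """Emit a (word, 1) pair for every word in one line of text."""
    for word in line.lower().split():
        yield word, 1


def reducer(word, counts):
    """Combine all the counts emitted for one word into a single total."""
    return word, sum(counts)


def run_word_count(lines):
    """Simulate the map, shuffle, and reduce phases on one machine.

    Assumption: in a real cluster, different computers would run the
    mapper on different portions of the data; here we simply loop.
    """
    grouped = defaultdict(list)
    for line in lines:                      # map phase
        for word, count in mapper(line):
            grouped[word].append(count)     # shuffle: group values by key
    # reduce phase: one reducer call per distinct word
    return dict(reducer(word, counts) for word, counts in sorted(grouped.items()))


if __name__ == "__main__":
    print(run_word_count(["the cat sat", "the dog sat"]))
    # prints {'cat': 1, 'dog': 1, 'sat': 2, 'the': 2}
```

The key design point is that each mapper call sees only one line and each reducer call sees only one word, so neither depends on the rest of the data. That independence is what lets a framework spread the work over many computers.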
