Assignment Description
Assignment
This coursework requires you to write a number of MapReduce programs. These programs should be written using the Python mrjob library. Each solution should distribute computation across multiple map and/or reduce tasks.
Part 1
Given a CSV file where each line contains a set of numbers, write a MapReduce program which determines the minimum of all numbers in the file. For example, consider the following sample CSV file:
2,2,3
4,3
Given this CSV file, the minimum is 2.
Name the Python program part1.py. Entering the following command at the terminal should apply your MapReduce program to fileName.txt and print the result in the terminal:
python part1.py fileName.txt
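The map and reduce steps for Part 1 could be sketched as follows. This is a plain-Python illustration rather than a finished mrjob solution: the `mapper` and `reducer` generators use the same (key, value) shape that mrjob methods use, while `run_local` and the key name "min" are hypothetical stand-ins introduced here so the sketch runs without a cluster or the library.

```python
from itertools import groupby


def mapper(_, line):
    """Emit the smallest number on one CSV line under a single shared key."""
    numbers = [float(tok) for tok in line.split(",") if tok.strip()]
    if numbers:
        yield "min", min(numbers)


def reducer(key, values):
    """Reduce the per-line minima to the overall minimum."""
    yield key, min(values)


def run_local(lines):
    """Hypothetical driver: map every line, group by key, then reduce."""
    pairs = sorted(kv for line in lines for kv in mapper(None, line))
    results = []
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        results.extend(reducer(key, (value for _, value in group)))
    return results


result = run_local(["2,2,3", "4,3"])
print(result)  # [('min', 2.0)]
```

In an mrjob program these two functions would become the `mapper` and `reducer` methods of an `MRJob` subclass (each taking `self` as a first argument). Because taking a minimum is associative, the same reducer logic could probably also serve as a combiner, which keeps most of the work in the distributed map tasks.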
Part 2
Given a CSV file where each line contains a set of numbers, write a MapReduce program which determines the standard deviation of all numbers in the file. For example, consider the following sample CSV file:
2,2,3
4,3
Given this CSV file, the standard deviation is 0.84. This is the sample standard deviation, which divides by n - 1; the population form, which divides by n, would give 0.75.
Name the Python program part2.py. Entering the following command at the terminal should apply your MapReduce program to fileName.txt and print the result in the terminal:
python part2.py fileName.txt
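One way to distribute the standard deviation is to have each map task emit partial sums (count, sum, sum of squares) and let the reduce step combine them. The sketch below assumes the sample standard deviation (dividing by n - 1), since that is what reproduces the 0.84 in the example. As before, `run_local` and the key names are hypothetical helpers for running the logic locally, not part of mrjob.

```python
import math
from itertools import groupby


def mapper(_, line):
    """Emit (count, sum, sum of squares) for the numbers on one CSV line."""
    numbers = [float(tok) for tok in line.split(",") if tok.strip()]
    if numbers:
        yield "stats", (len(numbers), sum(numbers), sum(x * x for x in numbers))


def reducer(key, values):
    """Add up the partial sums and derive the sample standard deviation."""
    n, total, total_sq = 0, 0.0, 0.0
    for count, partial_sum, partial_sq in values:
        n += count
        total += partial_sum
        total_sq += partial_sq
    if n > 1:
        # Sample variance: (sum of squares - (sum)^2 / n) / (n - 1).
        variance = (total_sq - total * total / n) / (n - 1)
        # Guard against a tiny negative value caused by rounding.
        yield "stddev", math.sqrt(max(variance, 0.0))


def run_local(lines):
    """Hypothetical driver: map every line, group by key, then reduce."""
    pairs = sorted(kv for line in lines for kv in mapper(None, line))
    results = []
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        results.extend(reducer(key, (value for _, value in group)))
    return results


result = run_local(["2,2,3", "4,3"])
print(round(result[0][1], 2))  # 0.84
```

The sum-of-squares formula is convenient because the partial triples can be added in any order, but it can lose precision when the values are large relative to their spread; a two-step job that first computes the mean is a possible alternative.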
Part 3
Uniform Resource Locator (URL) links describe the structure of the web. Consider a CSV file
where each line contains two URLs which specify a single link. That is, the first and second
values on each line specify the source and destination of the link in question. For example,
consider the following sample CSV file:
url1,url2
url1,url3
url2,url3
url4,url5
url2,url4
Given such a CSV file, write a MapReduce program which finds all paths of length two in the graph formed by these links. That is, it should find every triple of URLs (u, v, w) such that there is a link from u to v and a link from v to w.
For example, the sample CSV file above contains the following paths of length two:
url2, url4, url5
url1, url2, url3
url1, url2, url4
Name the Python program part3.py. Entering the following command at the terminal should apply your MapReduce program to fileName.txt and print the result in the terminal:
python part3.py fileName.txt
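Finding paths of length two amounts to joining the link list with itself on the middle URL. A minimal sketch, assuming each link is emitted twice (once keyed by its destination, once by its source) so that a reduce task sees all incoming and outgoing links of one URL together. The "in"/"out" tags and `run_local` are illustrative names chosen here, not part of the assignment or of mrjob.

```python
from itertools import groupby


def mapper(_, line):
    """Emit each link under both endpoints, tagged by direction."""
    source, destination = [part.strip() for part in line.split(",")]
    yield destination, ("in", source)    # source links into destination
    yield source, ("out", destination)   # source links out to destination


def reducer(middle, values):
    """Pair every incoming link of `middle` with every outgoing link."""
    incoming, outgoing = [], []
    for tag, url in values:
        (incoming if tag == "in" else outgoing).append(url)
    for u in incoming:
        for w in outgoing:
            yield u, middle, w


def run_local(lines):
    """Hypothetical driver: map every line, group by key, then reduce."""
    pairs = sorted(kv for line in lines for kv in mapper(None, line))
    results = []
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        results.extend(reducer(key, (value for _, value in group)))
    return results


links = ["url1,url2", "url1,url3", "url2,url3", "url4,url5", "url2,url4"]
paths = run_local(links)
print(sorted(paths))
```

For the sample file this yields the three triples listed above. The sketch does not filter out triples where u and w are the same URL, since the assignment text does not ask for that.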
Part 4
Write a MapReduce program which takes as input a file where each line contains a comma-separated set of words and outputs, for each word, the lines in which that word appears. This is an inverted index. For example, consider a file containing the following text:
goat,chicken,horse
cat,horse
dog,cat,sheep
buffalo,dolphin,cat
sheep
The corresponding inverted index will be the following:
"buffalo" ["buffalo,dolphin,cat"]
"cat" ["buffalo,dolphin,cat", "cat,horse", "dog,cat,sheep"]
"chicken" ["goat,chicken,horse"]
"dog" ["dog,cat,sheep"]
"dolphin" ["buffalo,dolphin,cat"]
"goat" ["goat,chicken,horse"]
"horse" ["cat,horse", "goat,chicken,horse"]
"sheep" ["dog,cat,sheep", "sheep"]
Name the Python program part4.py. Entering the following command at the terminal should apply your MapReduce program to fileName.txt and print the result in the terminal:
python part4.py fileName.txt
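The inverted index above can be sketched by emitting each line once per distinct word it contains and collecting the lines per word in the reduce step. The sorting of each list is an assumption made here to match the ordering shown in the example, and `run_local` is again a hypothetical local driver rather than an mrjob function.

```python
from itertools import groupby


def mapper(_, line):
    """Emit (word, line) once for each distinct word on the line."""
    line = line.strip()
    for word in sorted({w.strip() for w in line.split(",") if w.strip()}):
        yield word, line


def reducer(word, lines):
    """Collect all lines containing `word`, sorted for stable output."""
    yield word, sorted(lines)


def run_local(lines):
    """Hypothetical driver: map every line, group by key, then reduce."""
    pairs = sorted(kv for line in lines for kv in mapper(None, line))
    results = []
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        results.extend(reducer(key, (value for _, value in group)))
    return results


text = [
    "goat,chicken,horse",
    "cat,horse",
    "dog,cat,sheep",
    "buffalo,dolphin,cat",
    "sheep",
]
index = dict(run_local(text))
for word in sorted(index):
    print(word, index[word])
```

If the input contained two identical lines, this sketch would list that line twice for each of its words; whether such duplicates should be collapsed is not specified in the assignment.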