An implementation of the PageRank algorithm using the MapReduce model in Hadoop
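For reference, a common formulation of the PageRank update is shown below. The exact variant used by this repository is not stated here. In this formula, d is the damping factor (0.85 is the usual choice), N is the number of nodes, B(v) is the set of nodes linking to v, and L(u) is the out-degree of u. Some implementations use (1 - d) instead of (1 - d)/N as the first term.

```latex
PR(v) = \frac{1 - d}{N} + d \sum_{u \in B(v)} \frac{PR(u)}{L(u)}
```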
Requirements:
Hadoop 3.2.x
Python 3.6 or later
Clone this repository:
git clone https://github.com/roshan-d21/page-rank.git
Start Hadoop:
$HADOOP_HOME/sbin/start-all.sh
Copy the SNAP dataset (web-Google.txt) into HDFS:
cd page-rank
$HADOOP_HOME/bin/hdfs dfs -put ./web-Google.txt /input_SNAP
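SNAP edge-list files such as web-Google.txt begin with `#` comment lines. Each remaining line holds one whitespace-separated `FromNodeId ToNodeId` pair. The sketch below shows how such lines can be grouped into an adjacency list. The function name `load_adjacency` and the sample lines are illustrative and do not come from the repository. Node IDs are kept as strings because Hadoop passes records around as text.

```python
from collections import defaultdict


def load_adjacency(lines):
    """Group SNAP edge-list lines into {source: [destinations]}.

    Lines starting with '#' are comments; every other non-empty line
    holds a FromNodeId and a ToNodeId separated by whitespace.
    """
    adj = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and SNAP header comments
        src, dst = line.split()
        adj[src].append(dst)
    return dict(adj)


# Illustrative lines in the SNAP format (not real dataset contents)
sample = [
    "# Directed graph (illustrative sample)",
    "# FromNodeId\tToNodeId",
    "0\t1",
    "0\t2",
    "1\t0",
]
print(load_adjacency(sample))  # {'0': ['1', '2'], '1': ['0']}
```

Node `2` has no out-links, so it never appears as a key. Any real implementation has to decide how to treat such dangling nodes.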
Pick one of the two implementations and cd into its directory:
cd AdjacencyList
OR
cd SparseMatrix
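The sketch below simulates in memory how one PageRank iteration can be split into a map step and a reduce step over an adjacency list. It is not the code in AdjacencyList, only an illustration of the technique. The damping value, the function names, and the `(1 - d)/N` term are assumptions. The mapper re-emits each node's link list because a MapReduce job must pass the graph structure through to the next iteration. The graph must list every node as a key. Rank held by nodes without out-links is simply dropped in this sketch.

```python
from collections import defaultdict

DAMPING = 0.85  # hypothetical damping factor; the repository may use another value


def mapper(node, rank, out_links):
    """Map step: pass the link list through and split this node's rank among its out-links."""
    yield node, ("links", out_links)
    if out_links:
        share = rank / len(out_links)
        for dst in out_links:
            yield dst, ("rank", share)


def reducer(values, n_nodes):
    """Reduce step: sum the incoming shares for one node and apply damping."""
    links, incoming = [], 0.0
    for kind, payload in values:
        if kind == "links":
            links = payload
        else:
            incoming += payload
    return (1 - DAMPING) / n_nodes + DAMPING * incoming, links


def iterate_once(graph, ranks):
    """One map -> shuffle -> reduce pass, simulated in memory."""
    shuffled = defaultdict(list)  # stands in for Hadoop's shuffle-and-sort phase
    for node, out_links in graph.items():
        for key, value in mapper(node, ranks[node], out_links):
            shuffled[key].append(value)
    new_graph, new_ranks = {}, {}
    for node in graph:
        new_ranks[node], new_graph[node] = reducer(shuffled[node], len(graph))
    return new_graph, new_ranks


graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = {node: 1 / len(graph) for node in graph}
graph, ranks = iterate_once(graph, ranks)
print(ranks)
```

The name of the SparseMatrix variant suggests that it treats the same update as a sparse matrix-vector product. The map/reduce split above corresponds to computing one row sum of that product per reducer key.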
Configure the file paths in iterate-hadoop.sh.
Set the convergence threshold in check_conv.py.
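A convergence check of this kind typically compares the ranks from two successive iterations and stops once they barely change. The sketch below is a minimal version under several assumptions, not the contents of check_conv.py. It assumes that rank files hold `node rank` lines, that the total (L1) change is the metric, and that the threshold is `1e-4`. All of the function names are hypothetical.

```python
def parse_ranks(lines):
    """Parse hypothetical 'node<whitespace>rank' lines into a dict."""
    ranks = {}
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:
            ranks[fields[0]] = float(fields[1])
    return ranks


def l1_change(old_ranks, new_ranks):
    """Sum of absolute rank changes; a node missing from one side counts as 0."""
    nodes = set(old_ranks) | set(new_ranks)
    return sum(abs(new_ranks.get(n, 0.0) - old_ranks.get(n, 0.0)) for n in nodes)


def has_converged(old_ranks, new_ranks, threshold=1e-4):
    """True once the total change between iterations falls below `threshold`."""
    return l1_change(old_ranks, new_ranks) < threshold


old = parse_ranks(["A\t0.5", "B\t0.5"])
new = parse_ranks(["A\t0.50003", "B\t0.49997"])
print(has_converged(old, new))  # True: total change is about 6e-5
```

A smaller threshold gives more precise ranks at the cost of more Hadoop iterations.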
Grant the necessary file permissions:
chmod -R 755 .
Finally, execute the script using:
sh iterate-hadoop.sh SNAP
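Judging by the file names, iterate-hadoop.sh presumably reruns the Hadoop job until check_conv.py reports convergence. The sketch below shows the same loop in memory as repeated sparse matrix-vector multiplication, v ← (1 - d)/N + d·M·v. Here M is the column-stochastic link matrix, and only its non-zero entries are stored. This is an illustration, not the repository's script. The damping factor, threshold, iteration cap, and function name are assumptions. Dangling nodes are not handled.

```python
from collections import defaultdict


def pagerank(edges, damping=0.85, threshold=1e-8, max_iter=100):
    """Repeat v <- (1 - d)/N + d * M v until the L1 change drops below `threshold`.

    M is stored sparsely as {dst: [(src, 1 / outdeg(src)), ...]},
    so only the non-zero entries of the link matrix are kept.
    """
    nodes = sorted({node for edge in edges for node in edge})
    n = len(nodes)
    out_deg = defaultdict(int)
    for src, _ in edges:
        out_deg[src] += 1
    matrix = defaultdict(list)
    for src, dst in edges:
        matrix[dst].append((src, 1.0 / out_deg[src]))

    v = {node: 1.0 / n for node in nodes}  # uniform starting vector
    for iteration in range(1, max_iter + 1):
        new_v = {
            node: (1 - damping) / n + damping * sum(w * v[src] for src, w in matrix[node])
            for node in nodes
        }
        delta = sum(abs(new_v[node] - v[node]) for node in nodes)
        v = new_v
        if delta < threshold:  # plays the role of the convergence check between Hadoop runs
            break
    return v, iteration


ranks, iterations = pagerank([("A", "B"), ("A", "C"), ("B", "C"), ("C", "A")])
print(iterations, {node: round(r, 4) for node, r in ranks.items()})
```

Each pass of the `for iteration` loop corresponds to one full MapReduce job in the Hadoop version. That is why the shell script, rather than a single job, drives the iteration.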
The PageRank computed for each node is written to the file v.