Some tutorials and demos on Hadoop, Spark, etc., mostly in the form of Jupyter notebooks.
- Hadoop single-node setup on Google Colab
  - Hadoop_Setting_up_a_Single_Node_Cluster.ipynb Set up a single-node Hadoop cluster on Google Colab and run some basic HDFS and MapReduce examples
  - Hadoop_single_node_cluster_setup_Python.ipynb Set up a single-node Hadoop cluster on Google Colab using Python
- Apache Spark standalone on Google Colab
  - Hadoop_Setting_up_Spark_Standalone_on_Google_Colab.ipynb Set up a single-node Spark server on Google Colab and estimate π with a Monte Carlo method
- mapreduce_with_bash.ipynb An introduction to MapReduce using Hadoop Streaming, with the mapper and reducer written in bash
- simplest_mapreduce_bash_wordcount.ipynb A very basic MapReduce wordcount example
- mrjob_wordcount.ipynb A simple MapReduce job with mrjob
- Hadoop_spilling.ipynb Hadoop spilling explained
- TestDFSio.ipynb Demo of TestDFSIO for benchmarking Hadoop clusters
- demoSparkSQLPython.ipynb Basic PySpark demo
- ngrams_with_pyspark.ipynb Basic example of n-gram generation with PySpark
- Encoding+dataframe+columns.ipynb Encoding Spark dataframe columns
- Unicode.ipynb Exploring Unicode categories
- polynomial_regression.ipynb Worked-out example of polynomial regression with NumPy
- generate_data_with_Faker.ipynb Generate fake data with the Faker Python library
- Virtualization
  - docker_for_beginners.md Docker for beginners: an introduction to the world of containers
  - Terraform for beginners.md Getting started with Terraform
- online_resources.md Online resources for learning Big Data
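
For a quick taste of what the wordcount notebooks cover, here is a minimal sketch of the map, shuffle and reduce steps in plain Python. It is not taken from any of the notebooks and does not use Hadoop; the function names (`mapper`, `shuffle`, `reducer`, `wordcount`) and the sample text are made up for illustration.

```python
from collections import defaultdict


def mapper(line):
    """Emit a (word, 1) pair for every whitespace-separated word (hypothetical helper)."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group values by key, mimicking what the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reducer(word, counts):
    """Sum the counts collected for one word."""
    return word, sum(counts)


def wordcount(lines):
    """Run the three phases in memory over an iterable of text lines."""
    pairs = (pair for line in lines for pair in mapper(line))
    return dict(reducer(word, counts) for word, counts in shuffle(pairs).items())


# Illustrative input, not data from the repository.
sample = ["to be or not to be", "to see or not to see"]
counts = wordcount(sample)
print(counts)
```

In a real Hadoop Streaming job the mapper and reducer would be separate programs reading from standard input, and the framework would take care of the shuffle step.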