Jonathan Reis's Projects
A curated list to learn about distributed systems
Best Practices and Style Guides in BEEVA
CKAN is an open-source DMS (data management system) for powering data hubs and data portals. CKAN makes it easy to publish, share and use data. It powers datahub.io, catalog.data.gov and europeandataportal.eu/data/en/dataset among many other sites.
Data Engineering Repository
The Data Engineering Cookbook
Repository for the Data Migration Team
Docker Ubuntu Focal base image
A Docker image containing Refine and many extensions (including RDF)
Collection of tools and code examples to demonstrate best practices in using Amazon EC2 Spot Instances.
Amazon EC2 instance comparison site
[Deprecated and unmaintained] Uses boto to retrieve current spot instance prices on Amazon EC2.
A command-line tool for launching Apache Spark clusters.
Ready-to-run Docker images containing Jupyter applications
Katacoda Scenarios
This is a collection of tutorials for learning how to use Docker with various tools. Contributions welcome.
Code repository for Learning PySpark by Packt
Scripts and tools for troubleshooting and performance analysis in Linux. This includes dynamic tracing scripts with SystemTap both for system calls and for userspace function tracing.
Scripts and code examples. Includes Spark notes, Jupyter notebook examples for Spark, Impala and Oracle.
Serverless proxy for Spark cluster
Run MapReduce jobs on Hadoop or Amazon Web Services
📘 The interactive computing suite for you! ✨
Tips and tricks for getting through on-call
A collective list of free APIs
An open collection of Python anti-patterns and worst practices.
"The mother of all demo apps" — Exemplary fullstack Medium.com clone powered by React, Angular, Node, Django, and many more 🏅
Script SQL Server configuration information in a format suitable for DR purposes or checking into a source control system