Richard Pelgrim's Projects
Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
Pre-trained Transformers for Arabic language understanding and generation (Arabic BERT, Arabic GPT2, Arabic ELECTRA)
Largest list of Arabic stop words on GitHub.
This is a repo for my missing persons database written in Python
Find real-time sales with an AI-powered Python API using ChatGPT and LLM (Large Language Model) App.
Coiled resources, including notebooks to accompany blog posts and videos.
This repo contains testing material related to Coiled / Dask
The repository contains a collection of Arabic tweet IDs associated with the novel coronavirus COVID-19. The dataset contains tweet IDs from 2020-01-01 to 2020-04-30. The Twitter search API was used to gather real-time tweets that contained specific keywords in the Arabic language. The dataset contains almost four and a half million Arabic tweets.
An orchestration platform for the development, production, and observation of data assets.
Getting started with Dagster
Parallel computing with task scheduling
Dask development blog
A Delta Lake reader for Dask
Easy-to-run example notebooks for Dask
This repo contains a short version of a Dask tutorial.
Scrape and analyse Dask-tagged questions on StackOverflow
Dask tutorial
Fork of mrocklin's dask tutorial
A Pythonic introduction to methods for scaling your data science and machine learning work to larger datasets and larger models, using the tools and APIs you know and love from the PyData stack (such as numpy, pandas, and scikit-learn).
Reproducible benchmark of database-like ops
Tutorial for onboarding dbt
Delta Lake examples
A distributed task scheduler for Dask
Exercise adapted from https://github.com/Featuretools/predict-customer-churn/blob/master/churn/3.%20Feature%20Engineering.ipynb