A 30-minute demo of some of the 'less used in data science' software tools that I’ve been using. For example: Clojure instead of Python, Clerk instead of Jupyter, DVC instead of run-all.sh.
The following need to be installed on the computer. Their exact version numbers aren’t crucial, but they probably need to be 2022-ish versions.
# Git
$ git --version
git version 2.32.1
# Java
$ java --version
openjdk 17.0.6 2023-01-17
OpenJDK Runtime Environment Homebrew (build 17.0.6+0)
OpenJDK 64-Bit Server VM Homebrew (build 17.0.6+0, mixed mode, sharing)
# Clojure
$ clj --version
Clojure CLI version 1.11.1.1208
# Python
$ python3 -VV
Python 3.8.9 (default, Apr 13 2022, 08:48:06)
[Clang 13.1.6 (clang-1316.0.21.2.5)]
# R
$ R --version
R version 4.2.2 (2022-10-31) -- "Innocent and Trusting"
# DVC
$ dvc --version
2.43.0
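A quick way to confirm that the prerequisites above are on the PATH is sketched below. This is only an illustrative helper, not part of the repository: the function name `check_tools` is made up here, and it checks only that each command exists, not its version.

```shell
#!/bin/sh
# Hypothetical helper (not from this repo): report which commands are missing from PATH.
# Prints one "missing: NAME" line per absent tool and returns 1 if any are missing.
check_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}

# The command names used in the version checks above; adjust them if your install differs.
check_tools git java clj python3 R dvc || echo "Install the missing tools before continuing."
```

If nothing is printed, all six commands were found; the version checks above are still worth running by hand.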
Install a local copy of this repository.
# Clone this repo
$ git clone https://github.com/ash-mcc/demo-alt-datasco-tools.git
$ cd demo-alt-datasco-tools
Python libraries
# Install Python libraries
$ python3 -m pip install -r requirements.txt
R libraries
# Install R libraries
$ R
> install.packages("Rserve")
> install.packages("GGally")
> install.packages("Cairo")
> x <- installed.packages(); x[ is.na(x[,"Priority"]), c("Package", "Version")]
Package Version
"Cairo" "1.6-0"
"GGally" "2.1.2"
"Rserve" "1.8-12"
# On macOS, I also needed to install XQuartz (X11 for Mac)
# for clojisr.v1.applications.plotting/plot->file
# to work fully
Commands to demo tracked ML experiments over an Iris dataset.
# Get the data
$ dvc update data/*.dvc
# Run the whole pipeline
$ dvc repro
# Show the metrics
$ dvc exp show
# Pipe those metrics into a CSV
$ dvc exp show --csv --all-branches > show.csv
# Show those metrics as a parallel coordinates plot
$ dvc exp show --pcp --all-branches
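The exported `show.csv` can be wide, so it may help to list its column names before opening it elsewhere. The sketch below is an assumption-laden convenience, not part of the repository: `csv_columns` is a hypothetical name, and it splits the header naively on commas, so it would misreport a header containing quoted commas.

```shell
# Hypothetical helper (not from this repo): list the column names of a CSV file, one per line.
# Naive: assumes the header row contains no quoted commas.
csv_columns() {
  head -n 1 "$1" | tr ',' '\n'
}

# Only inspect show.csv if the earlier export command has created it.
if [ -f show.csv ]; then csv_columns show.csv; fi
```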