I attended the 15th European Conference on Python in Science (EuroSciPy 2023). In this repository I have summarized the teaching materials (GitHub links) from the workshops I attended.
Network analysis tutorial: https://github.com/ericmjl/Network-Analysis-Made-Simple
Project Homepage / Git: github.com/networkx/networkx
Public link to supporting material: https://ericmjl.github.io/Network-Analysis-Made-Simple/
Workshop outline:
- Networks of all kinds: biological, transportation, web.
- Representation of networks, NetworkX data structures
- Introduction to NetworkX API for modelling and graph operations.
- Finding important nodes; applications
- Pathfinding algorithms and their applications
- Hands-on: implementing path-finding algorithms
- Visualize degree and betweenness centrality distributions.
- Quick introduction to GraphBLAS
- Moving between GraphBLAS and NetworkX.
- Speed up your NetworkX code by changing one line of code!
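The path-finding and "important nodes" items above can be sketched without any library. This is a minimal illustration in plain Python, not the workshop's own code: the adjacency-dictionary representation, the function names, and `toy_graph` are assumptions made for the example. It uses breadth-first search for unweighted shortest paths and the usual degree centrality definition (degree divided by the number of other nodes).

```python
from collections import deque


def shortest_path(graph, source, target):
    """Return one shortest path from source to target using breadth-first search.

    `graph` maps each node to a list of its neighbours (unweighted graph).
    Returns a list of nodes, or None if the target cannot be reached.
    """
    # parent[node] records the node from which `node` was first reached.
    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            # Follow the parent links back to the source to rebuild the path.
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for neighbour in graph[node]:
            if neighbour not in parent:
                parent[neighbour] = node
                queue.append(neighbour)
    return None


def degree_centrality(graph):
    """Fraction of the other nodes that each node is directly connected to."""
    n = len(graph)
    return {node: len(neighbours) / (n - 1) for node, neighbours in graph.items()}


# Hypothetical toy network stored as an adjacency dictionary.
toy_graph = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
    "F": [],  # isolated node, unreachable from the others
}

path = shortest_path(toy_graph, "A", "E")
centrality = degree_centrality(toy_graph)
print(path)
print(max(centrality, key=centrality.get))
```

NetworkX offers ready-made equivalents (for example `networkx.shortest_path` and `networkx.degree_centrality`), so a hand-written version like this is mainly useful for understanding what those functions compute.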
geopandas-tutorial: https://github.com/jorisvandenbossche/geopandas-tutorial
srai-tutorial: https://github.com/kraina-ai/srai-tutorial
By the end of the tutorial, attendees will be able to:
- Install and set up the SRAI library.
- Use SRAI to download and process geospatial data.
- Apply various regionalization and embedding techniques to geospatial data.
- Utilize pre-trained embedding models for clustering and similarity search.
- Build predictive models on top of SRAI embeddings
- Pre-train available models from scratch.
- Understand the potential applications and future enhancements of the SRAI library.
Tutorial notebooks: https://vincent-maladiere.github.io/survival-analysis-demo
Here is a tentative agenda:
- What is time-censored data and why it is a problem to train time-to-event regression models.
- Single event survival analysis with Kaplan-Meier using scikit-survival.
- Evaluation of the calibration of survival analysis estimators using the integrated Brier score (IBS) metric.
- Predictive survival analysis modeling with Cox Proportional Hazards, Survival Forests using scikit-survival, GradientBoostedIBS implemented from scratch with scikit-learn.
- How to use a trained GradientBoostedIBS model to estimate the median survival time and the probability of survival at a fixed time horizon.
- Inspecting the learned statistical association between input features and survival probabilities using partial dependence plots.
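To make the censoring and Kaplan-Meier items above more concrete, here is a minimal from-scratch sketch of the Kaplan-Meier estimator in plain Python. It is not the tutorial's code and does not use scikit-survival; the function names and the toy data are assumptions for illustration. At each distinct event time the survival estimate is multiplied by `1 - deaths / n_at_risk`, and censored individuals leave the risk set without counting as events.

```python
def kaplan_meier(durations, event_observed):
    """Return [(time, survival_probability)] at each distinct event time.

    durations:      observed time for each individual
    event_observed: True if the event occurred at that time,
                    False if the individual was censored
    """
    data = sorted(zip(durations, event_observed))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all individuals that share the same observed time t.
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                deaths += 1
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / n_at_risk
            curve.append((t, survival))
        # Both events and censored observations leave the risk set.
        n_at_risk -= removed
    return curve


def median_survival_time(curve):
    """First time at which the estimated survival drops to 0.5 or below."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None  # the curve never reaches 0.5 (heavy censoring)


# Hypothetical toy data: five individuals, two of them censored.
durations = [2, 3, 3, 5, 8]
events = [True, True, False, True, False]

curve = kaplan_meier(durations, events)
median = median_survival_time(curve)
print(curve, median)
```

Simply dropping the censored rows, or treating their censoring time as an event time, would bias the estimate downwards, which is the problem the first agenda item refers to.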
Workshop Outline:
- Machine Learning 101 (10 min.)
- What is scikit-learn? (5 min.)
- Practical Part (+60 min.)
- Predictive modeling pipeline
- Evaluation of models
- Hyperparameter tuning
Public link to supporting material: https://inria.github.io/scikit-learn-mooc/
Tutorial: https://github.com/glemaitre/euroscipy-2023-scikit-image
Workshop Outline:
- image histogram and contrast
- image filtering: transformations of an image resulting in a new image of similar size (for example, thresholding, edge enhancement, etc.)
- image segmentation: partitioning an image into several regions (objects)
- image descriptors
Github link: scikit-learn/scikit-learn#26120
Scikit-learn does not provide any flexibility to go from "soft" to "hard" predictions: it uses a cut-off point at a confidence score of 0.5 (or 0 when using decision_function) to get class labels. However, optimizing a classifier to get confidence scores close to the true probabilities (i.e. a calibrated classifier) does not guarantee accurate "hard" predictions with this heuristic. Conversely, training a classifier for optimal "hard" prediction accuracy (with the cut-off constrained to 0.5) does not guarantee a calibrated classifier. In this talk, we will present a new scikit-learn meta-estimator that gives us the best of both worlds: a calibrated classifier providing optimal "hard" predictions.
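The idea of decoupling the cut-off from the probability estimate can be illustrated in plain Python. This is only a sketch of post-hoc threshold tuning, not the meta-estimator from the linked pull request: the helper names, the choice of balanced accuracy as the metric, the `>=` convention, and the toy scores are all assumptions. In practice the threshold should be tuned on held-out data rather than on the data shown here.

```python
def hard_predictions(scores, threshold):
    """Turn soft scores into 0/1 labels using a cut-off."""
    return [1 if s >= threshold else 0 for s in scores]


def balanced_accuracy(y_true, y_pred):
    """Mean of the recall on the positive class and on the negative class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    return 0.5 * (tp / n_pos + tn / n_neg)


def best_threshold(y_true, scores, metric):
    """Try every observed score as a cut-off and keep the best one."""
    candidates = sorted(set(scores))
    return max(candidates, key=lambda th: metric(y_true, hard_predictions(scores, th)))


# Hypothetical imbalanced problem: positives are rare and get modest scores.
y_true = [0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.60]

default_score = balanced_accuracy(y_true, hard_predictions(scores, 0.5))
tuned = best_threshold(y_true, scores, balanced_accuracy)
tuned_score = balanced_accuracy(y_true, hard_predictions(scores, tuned))
print(default_score, tuned, tuned_score)
```

The scores themselves are left untouched, so a calibrated model stays calibrated; only the decision rule applied on top of it changes.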
Github link: https://github.com/Kai-Striega/EuroSciPy-2023-Speech/blob/main/EuroSciPy_Speech.pdf
Scientific code is often complex, resource-intensive, and sensitive to performance issues, making accurate timing and benchmarking critical for optimising performance and ensuring reproducibility. However, benchmarking scientific code presents several challenges, including variability in input data, hardware and software dependencies, and optimisation trade-offs. In this talk, I discuss the importance of timing and benchmarking for scientific code and outline strategies for addressing these challenges. Specifically, I emphasise the need for representative input data, controlled benchmarking environments, appropriate metrics, and careful documentation of the benchmarking process. By following these strategies, developers can effectively optimise code performance, select efficient algorithms and data structures, and ensure the reliability and reproducibility of scientific computations.
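As a small illustration of repeated, documented timing, the sketch below uses only the standard library's `timeit` and `statistics` modules. It is not taken from the talk: the `benchmark` helper, its defaults, and the `sum_of_squares` workload are hypothetical. Reporting the minimum alongside the median and spread is a common convention, since the minimum is least affected by interference from other processes.

```python
import statistics
import timeit


def benchmark(func, *args, repeat=5, number=100):
    """Time func(*args) in `repeat` independent runs of `number` calls each."""
    timer = timeit.Timer(lambda: func(*args))
    # Each entry returned by repeat() is the total time for `number` calls,
    # so divide to obtain the average time per call in that run.
    per_call = [total / number for total in timer.repeat(repeat=repeat, number=number)]
    return {
        "min": min(per_call),
        "median": statistics.median(per_call),
        "stdev": statistics.stdev(per_call),
        "repeat": repeat,
        "number": number,
    }


def sum_of_squares(n):
    """Hypothetical workload used as a stand-in for real scientific code."""
    return sum(i * i for i in range(n))


result = benchmark(sum_of_squares, 10_000)
print(result)
```

The input size passed to the workload should be representative of real use, and the hardware, Python version, and library versions are worth recording next to the numbers.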
Github link: https://github.com/widgetti/solara/
Public link to supporting material: https://solara.dev
Solara is a pure Python web framework designed to scale complex applications. Leveraging a React-like API, Solara offers the scalability, component-based coding, and simple state management that have made React a standard for large web applications. Solara uses a pure Python implementation of React, Reacton, to create ipywidgets-based applications that work both in the Jupyter Notebook environment and as standalone web apps with frameworks like FastAPI. This talk will explore the design principles of Solara, illustrate its potential with case studies and live examples, and provide resources for attendees to incorporate Solara into their own projects. Whether you're a researcher developing interactive visualizations or a data scientist building complex web applications, Solara provides a Python-centric solution for scaling your projects effectively.
Github link: https://github.com/ifpen/chalk-it
Please also visit the template galleries, which can be explored online using the hosted version of Chalk'it: https://ifpen.github.io/chalk-it/
Chalk'it is an open-source framework that transforms Python scripts into distributable web app dashboards. It utilizes drag-and-drop widgets to establish an interface linked to a dataflow connecting Python code and various data sources. Chalk'it supports multiple Python graphics libraries, including Plotly, Matplotlib and Folium for interactive mapping and visualization. The framework operates entirely in web browsers using Pyodide. In our presentation, we will showcase Chalk'it, emphasizing its primary features, software architecture, and key applications, with a special focus on geospatial data visualization.
Github link: https://github.com/mljar/mljar-supervised
mljar-supervised is an Automated Machine Learning Python package that works with tabular data. It is designed to save time for data scientists. It abstracts the common way to preprocess the data, construct the machine learning models, and perform hyperparameter tuning to find the best model 🏆. It is not a black box, as you can see exactly how the ML pipeline is constructed (with a detailed Markdown report for each ML model).
Github link: https://github.com/skrub-data/skrub
Poster link: https://drive.google.com/file/d/1TtmJ3VjASy6rGlKe0txKacM-DdvJdIvB/view?pli=1
skrub provides tools (TableVectorizer, fuzzy_join...) and encoders (GapEncoder, MinHashEncoder...) for handling morphological similarities between entries, for which we usually identify three common cases: similarities, typos and variations.
Github link: https://scverse.org/packages/
This repository contains the list of scverse ecosystem packages that are displayed on scverse.org. The goal is to increase the visibility of ecosystem packages and make it easier for users to find appropriate software. Registered ecosystem packages can also get their own tag to use on the scverse forum for user discussion. Authors of these packages can be added to the scverse GitHub organization. In the future, we may also test releases of core packages against the test suites of ecosystem packages.
If a package is part of this list, it means it fulfills certain minimum requirements as outlined below. It does not imply endorsement or that an in-depth review has been performed.
Github link: https://github.com/genialis/RNAnorm
Python implementation of common RNA-seq normalization methods:
- CPM (Counts per million)
- FPKM (Fragments per kilobase million)
- TPM (Transcripts per million)
- UQ (Upper quartile)
- CUF (Counts adjusted with UQ factors)
- TMM (Trimmed mean of M-values)
- CTF (Counts adjusted with TMM factors)
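For orientation, the two simplest methods in this list can be written out for a single sample in plain Python. This sketch does not use or reproduce the RNAnorm API; the function names and the toy counts and gene lengths are assumptions. CPM rescales raw counts so they sum to one million, while TPM first divides each count by the gene length in kilobases and then rescales to one million.

```python
def cpm(counts):
    """Counts per million for one sample: scale counts to sum to 1e6."""
    total = sum(counts)
    return [c / total * 1e6 for c in counts]


def tpm(counts, gene_lengths_bp):
    """Transcripts per million for one sample.

    Counts are first converted to reads per kilobase of gene length,
    then rescaled so that the values sum to 1e6.
    """
    rpk = [c / (length / 1e3) for c, length in zip(counts, gene_lengths_bp)]
    scale = sum(rpk)
    return [r / scale * 1e6 for r in rpk]


# Hypothetical sample with three genes whose counts are proportional to length.
counts = [10, 20, 70]
gene_lengths_bp = [1000, 2000, 7000]

cpm_values = cpm(counts)
tpm_values = tpm(counts, gene_lengths_bp)
print(cpm_values)
print(tpm_values)
```

In this toy sample the CPM values differ between genes purely because of gene length, whereas the TPM values come out equal, which shows why length-aware methods are preferred when comparing genes within a sample.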
Github link: https://github.com/coffeine-labs/coffeine
Coffeine is designed for building biomedical prediction models from M/EEG signals. The library provides a high-level interface that facilitates the use of M/EEG covariance matrices as a representation of the signal. The methods implemented here make use of tools and concepts implemented in PyRiemann. The API is fully compatible with scikit-learn and naturally integrates with MNE.