Introduction to Data Science in Python by Appsilon

Introduction

Welcome to the course Introduction to Data Science in Python by Appsilon!

Target audience

This course introduces people who already know how to code in Python to the world of Data Science. In particular, I show tips and tricks useful for STEM and economics students. A secondary goal is to show students how to use free tools that are also industry standards, instead of Matlab/Statistica/SAS and so on.

Covered topics

  1. The course starts by introducing what a Data Scientist does at work and why this job is so important in the 21st century. Then we start the technical part of the course.
  2. numpy - numbers and vectors, fundamentals of all calculations in Python
  3. pandas - data frames - SQL-like, in-memory data, fundamentals of data processing in Python
  4. matplotlib and plotly - plots, basics of data visualization
  5. scikit-learn - introduction to machine learning, examples from the go-to library in Python
  6. streamlit, quarto, fastapi - simple, useful and creative ways to share your work in Python and to generate beautiful reports
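
To give a flavour of the first two libraries in the list, here is a minimal sketch. It is not taken from the course materials; the data and the names `heights_cm`, `people` and `mean_by_city` are made up for illustration. It shows numpy operating on a whole vector at once and pandas answering a SQL-like question on an in-memory data frame.

```python
import numpy as np
import pandas as pd

# numpy: vectorized arithmetic on a whole array, no explicit Python loop
heights_cm = np.array([170.0, 182.0, 165.0, 175.0])  # illustrative data
heights_m = heights_cm / 100.0
mean_height_m = heights_m.mean()

# pandas: an in-memory data frame queried in a SQL-like way, roughly
# SELECT city, AVG(height_m) FROM people GROUP BY city
people = pd.DataFrame(
    {
        "city": ["Warsaw", "Krakow", "Warsaw", "Krakow"],
        "height_m": heights_m,
    }
)
mean_by_city = people.groupby("city")["height_m"].mean()

print(mean_height_m)
print(mean_by_city)
```

The group-by result is a pandas Series indexed by city, so a single value can be read with `mean_by_city["Warsaw"]`.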

Apart from those libraries, I present and benchmark the polars library, a high-performance replacement for pandas if you work with datasets of roughly 0.5 GB to 5 GB and pandas starts to be too slow.
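
One way such a comparison could be set up is sketched below. This is an assumption-laden illustration, not the benchmark used in the course: the synthetic data, the group-by-mean workload and the helper names `make_data`, `time_call` and `benchmark_groupby_mean` are all hypothetical. It also assumes a recent polars release where the method is spelled `group_by`, and it skips the polars timing if polars is not installed.

```python
import time

import numpy as np
import pandas as pd

try:
    import polars as pl  # optional; the polars timing is skipped if missing
except ImportError:
    pl = None


def make_data(n_rows: int, n_groups: int = 100, seed: int = 0) -> dict:
    """Build synthetic columns: an integer group key and a float value."""
    rng = np.random.default_rng(seed)
    return {
        "key": rng.integers(0, n_groups, size=n_rows),
        "value": rng.random(n_rows),
    }


def time_call(func, repeats: int = 3) -> float:
    """Return the best wall-clock time in seconds over several runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        best = min(best, time.perf_counter() - start)
    return best


def benchmark_groupby_mean(n_rows: int = 1_000_000) -> dict:
    """Time a group-by mean in pandas and, if installed, in polars."""
    data = make_data(n_rows)
    pdf = pd.DataFrame(data)
    timings = {"pandas": time_call(lambda: pdf.groupby("key")["value"].mean())}
    if pl is not None:
        pldf = pl.DataFrame(data)
        timings["polars"] = time_call(
            lambda: pldf.group_by("key").agg(pl.col("value").mean())
        )
    return timings


if __name__ == "__main__":
    for library, seconds in benchmark_groupby_mean().items():
        print(f"{library}: {seconds:.4f} s")
```

Timings from a toy workload like this depend heavily on the machine, the data size and the operation, so treat them as a starting point for your own measurements rather than a verdict on either library.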

Course materials

All course materials are located either here or on Google Drive. Code and small datasets are in the repo, while large datasets are on Google Drive.

I suggest using the HTML files generated from the qmd and ipynb sources with quarto.

A guide to setting up an environment is included in the introduction presentation.

tl;dr: you can try

conda create -n ds-course python=3.10
conda activate ds-course
pip install -r requirements.txt

Homeworks

Each lecture also has a homework assignment. For every homework, a solution is provided in a separate directory. Note that the solutions are not necessarily the best possible, but they may present an interesting approach. Very often there are multiple ways to approach the same problem.

License

The course has been prepared by Piotr Pasza Storożenko from Appsilon. It is available under the CC BY 4.0 license. Feel free to use these materials; you just have to attribute the original author.

Some exercises were inspired by exercises the author had to solve while studying.

Contributors

nerwosolek, pstorozenko
