
fintech540-group_project's Introduction

Fintech540 - Machine Learning for Fintech - Group_project :smiling_imp:

Description:

This is the repo for a machine learning model-building project that takes cryptocurrency data as input and uses supervised (regression, classification) and unsupervised (density estimation, clustering) machine learning algorithms to find interesting patterns in the dataset.

Architecture:

[Figure: project architecture diagram]

Regression (1 model, linear regression: 2-3 people):

  1. Function: Serve as a benchmark to show that the clustering models do better.
  2. Tasks: a. Identify factors that influence our cryptocurrencies' prices (S&P 500, etc.). b. Choose three cryptocurrencies to present. c. Finish in one week.
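
A minimal sketch of what such a benchmark could look like, assuming daily returns of one coin are regressed on daily S&P 500 returns with scikit-learn. The data here is synthetic and the variable names (`sp500_ret`, `coin_ret`) are hypothetical; the repo's actual factors and preprocessing may differ.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic daily returns (illustrative only, not project data).
rng = np.random.default_rng(0)
n_days = 250
sp500_ret = rng.normal(0.0, 0.01, n_days)
# Assumed relationship for the demo: the coin moves 1.5x the index plus noise.
coin_ret = 1.5 * sp500_ret + rng.normal(0.0, 0.01, n_days)

# Fit the single-factor linear regression benchmark.
factors = sp500_ret.reshape(-1, 1)
model = LinearRegression().fit(factors, coin_ret)

beta = model.coef_[0]               # sensitivity of the coin to the index
r2 = model.score(factors, coin_ret)  # in-sample R^2 of the benchmark
print(f"beta={beta:.2f}, R^2={r2:.2f}")
```

More factors can be added as extra columns of `factors`; the in-sample R^2 then gives a simple reference number for later comparisons.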

Requirements

  1. One model
  2. Focus on one kind of cryptocurrency or the top 50/100 by market cap
  3. Presentation deadline: 11/17/2022

Motivation:

Machine learning in finance is now considered a key aspect of several financial services and applications, including managing assets, evaluating levels of risk, calculating credit scores, and even approving loans. Machine learning is a subset of data science that provides the ability to learn and improve from experience without being explicitly programmed.
In this project, we will explore some possible ways that unsupervised learning algorithms (clustering) could be applied to cryptocurrency data and assess their performance to find out whether there are any interesting discoveries.

Take a peep into our dataset:

You can find our dataset here: web_link. The dataset has 1,243,590 entries and 12 columns:

  • time_open :
  • time_high : Time the cryptocurrency reaches its highest price.
  • time_low : Time the cryptocurrency reaches its lowest price.
  • quote.USD.open :
  • quote.USD.high :
  • quote.USD.low :
  • quote.USD.close :
  • quote.USD.volume :
  • quote.USD.market_cap : The total market value of a cryptocurrency's circulating supply. It is analogous to the free-float capitalization in the stock market.
  • quote.USD.timestamp :
  • symbol : The symbol of the cryptocurrency.
  • id : Together with symbol, it uniquely identifies a cryptocurrency.

Additional Notes:

We might not try out all machine learning algorithms in the first stage. We might focus on unsupervised learning algorithms such as clustering.

Overall Progress:

  • README file and some EDA work
  • Assign work
  • Building models
  • Interpret the results
  • Make PPT

Problems we're facing

Heyyy! Write your own progress here! 👻

Patrick Duan

  • K-means: finished inertia_vs_k_plot
  • I did some EDA work and feature engineering on our data
    • extracted minute and second as new features from time_high and time_low
    • dropped the other categorical columns
    • standardized all numerical columns, since distance matters in our model
  • Remaining to do:
    • result interpretation
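
The feature engineering and elbow-plot steps above could be sketched roughly as follows. This is an illustration on synthetic data, not the repo's code: the helper name `engineer`, the generated column names such as `time_high_minute`, and the range of k values are assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a few of the dataset's columns (illustrative values only).
rng = np.random.default_rng(0)
n = 200
base = pd.Timestamp("2022-10-01")
df = pd.DataFrame({
    "time_high": base + pd.to_timedelta(rng.integers(0, 86400, n), unit="s"),
    "time_low": base + pd.to_timedelta(rng.integers(0, 86400, n), unit="s"),
    "quote.USD.close": rng.lognormal(3, 1, n),
    "quote.USD.volume": rng.lognormal(10, 2, n),
    "symbol": ["XYZ"] * n,
})

def engineer(frame):
    """Add minute/second features, then keep only numeric columns."""
    out = frame.copy()
    for col in ("time_high", "time_low"):
        ts = pd.to_datetime(out[col])
        out[f"{col}_minute"] = ts.dt.minute
        out[f"{col}_second"] = ts.dt.second
    # Drops the datetime and categorical (object) columns.
    return out.select_dtypes(include="number")

features = engineer(df)

# Standardize so that no single column dominates the Euclidean distance.
X = StandardScaler().fit_transform(features)

# Inertia for each k; plotting ks against inertias gives the elbow plot.
ks = list(range(1, 8))
inertias = [KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_ for k in ks]
for k, inertia in zip(ks, inertias):
    print(k, round(inertia, 1))
```

The k where the inertia curve flattens out is the usual candidate for the number of clusters.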

Chenxi Rong & Yiwei Cheng

  • What we mainly did were the steps before Stacey's fancy plots!
    • checked the raw data and dropped the missing values after testing.
    • added a new column combining the symbol and id.
    • extracted the date from the timestamp.
    • drew a rough plot and made a few assumptions about clustering.
  • Remaining to do:
    • building new models
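
A minimal pandas sketch of the cleaning steps listed above. The toy rows, the timestamp format, and the new column names `coin` and `date` are assumptions for illustration; the `symbol_id` naming follows the selected-coin labels used elsewhere in this README (for example "BTC_1").

```python
import pandas as pd

# Tiny made-up frame with a few of the dataset's columns (values are illustrative only).
raw = pd.DataFrame({
    "symbol": ["BTC", "BTC", "ETH", "ETH"],
    "id": [1, 1, 1027, 1027],
    "quote.USD.close": [20000.0, None, 1500.0, 1520.0],
    "quote.USD.timestamp": [
        "2022-10-01T23:59:59.999Z",
        "2022-10-02T23:59:59.999Z",
        "2022-10-01T23:59:59.999Z",
        "2022-10-02T23:59:59.999Z",
    ],
})

def clean(frame):
    """Drop rows with missing values, add a symbol_id key, extract the date."""
    out = frame.dropna().copy()
    # Combine symbol and id into one unique label, e.g. "BTC_1".
    out["coin"] = out["symbol"] + "_" + out["id"].astype(str)
    # Keep only the calendar date of the timestamp.
    out["date"] = pd.to_datetime(out["quote.USD.timestamp"]).dt.date
    return out

cleaned = clean(raw)
print(cleaned[["coin", "date", "quote.USD.close"]])
```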

Yiwei Cheng

  • Worked with Stacey and Chenxi on EDA before clustering
  • Built GMM and DBSCAN models on the data frame produced by Patrick's feature engineering
    • GMM package (model and probability)
    • DBSCAN: found and visualized the best eps and min_samples
    • DBSCAN result: with eps=1.5, min_samples=4, and data=df[0:10000], we get 3 clusters: cluster 0, cluster 1, and cluster 2
    • DBSCAN result: cluster -1 is the noise
  • Wrote PowerPoint slides for the introduction, interpretation, and conclusion of the DBSCAN model and fixed some formatting problems in the presentation slides
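
For readers unfamiliar with these models, here is a hedged sketch using scikit-learn's `DBSCAN` and `GaussianMixture` with the `eps=1.5` and `min_samples=4` settings reported above. The data is synthetic (three blobs plus one outlier), so the resulting clusters only illustrate the mechanics and say nothing about the project's results. The k-distance step is one common heuristic for choosing eps and is an assumption about how the search was done.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.mixture import GaussianMixture
from sklearn.neighbors import NearestNeighbors

# Synthetic data: three tight blobs and one far-away outlier (illustrative only).
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])
blobs = [c + rng.normal(scale=0.5, size=(50, 2)) for c in centers]
X = np.vstack(blobs + [np.array([[40.0, -40.0]])])

# k-distance heuristic: sort each point's distance to its 4th nearest neighbor
# (the point itself counts as the first); a sharp bend suggests a value for eps.
distances, _ = NearestNeighbors(n_neighbors=4).fit(X).kneighbors(X)
k_dist = np.sort(distances[:, -1])

# DBSCAN with the settings quoted above; label -1 marks noise points.
labels = DBSCAN(eps=1.5, min_samples=4).fit_predict(X)
print("DBSCAN labels:", sorted(set(labels)))

# GMM gives soft assignments: one membership probability per component.
gmm = GaussianMixture(n_components=3, random_state=0).fit(X)
proba = gmm.predict_proba(X)
print("GMM probabilities for first point:", proba[0].round(3))
```

Unlike K-means, DBSCAN does not need the number of clusters up front, and GMM reports how confidently each point belongs to each component.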

Stacey Fang

[Figures: Check_FinalType, CheckNull, Description, All_Coin]

  • EDA for the whole dataset finished

Zhuo Yang

  • On the basis of Steven's multi-model fitting, random forest was selected for further optimization.
    1. Pull BTC price data directly from the parquet file
    2. Routine and targeted data processing
    3. Restructure the data into groups of seven days in order to extract feature values
    4. Extract feature values using tsfresh
    5. Use train_test_split to partition the data into training and testing sets
    6. Train the model
    7. Use the model to make predictions
    8. Evaluate the model through numerical evaluation and visualization
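
The pipeline above can be sketched end to end as follows. Several parts are assumptions made to keep the example self-contained: the price series is a synthetic random walk rather than the parquet data, hand-rolled summary statistics stand in for tsfresh features, the target is taken to be the next day's price, and `shuffle=False` is chosen so the test set lies after the training set in time.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic random-walk "BTC" prices (illustrative only, not the parquet data).
rng = np.random.default_rng(0)
prices = 20000 + np.cumsum(rng.normal(0, 200, 400))

WINDOW = 7  # group of seven days, as described in the steps above

def make_windows(series, window=WINDOW):
    """Turn a price series into (features of 7 days, next-day price) pairs."""
    X, y = [], []
    for start in range(len(series) - window):
        chunk = series[start:start + window]
        # Simple summary statistics; a stand-in for tsfresh feature extraction.
        X.append([chunk.mean(), chunk.std(), chunk.min(), chunk.max(),
                  chunk[-1], chunk[-1] - chunk[0]])
        y.append(series[start + window])
    return np.array(X), np.array(y)

X, y = make_windows(prices)

# Chronological split (assumption): no shuffling, so no future data leaks into training.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=False)

model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_train, y_train)
pred = model.predict(X_test)

mae = mean_absolute_error(y_test, pred)
print(f"test MAE: {mae:.1f}")
```

Note that a random forest cannot predict values outside the range seen in training, which matters for trending price series; predicting returns or price differences instead is a common workaround.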

Yiwei Cheng & Stacey Fang

Selected coins: "BTC_1", "ETH_1027", "BNB_1839", "ADA_2010"

[Figures: boxplot, selectedCoin_Description]


fintech540-group_project's People

Contributors

staceyff, petriiick, mestrada21, yic221, joeyang-1010, petrrick, stevenwei1, elainercx
