STAT4012

Completed:

  1. Web scraping from CryptoSlate
  2. Gathering BTC price data from the exchange
  3. Labelling data

To do list:

  1. Feature engineering / transforming title data - Winnie
  2. Model training (construction, tuning, evaluation): RNN (LSTM) - Winnie
  3. 2-3 models for comparison: KNN, logistic regression, classification tree (proposed)
  4. Backtest (calculate return and win rate of the model) - Jadon
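The backtest item (return and win rate) could start from a sketch like the one below. It rests on assumptions that are ours, not decisions recorded in this outline: "buy" opens a long position and "sell" a short position held for the 5-minute window, "wait" opens nothing, trades are compounded one after another (overlapping windows are ignored), and `fee` is a hypothetical flat cost per trade. The function name `backtest` is likewise hypothetical.

```python
def backtest(signals, returns, fee=0.0):
    """Return (total compounded return, win rate, number of trades).

    signals: "buy", "sell" or "wait" per headline (assumed label names).
    returns: realised 5-minute BTC return after each headline, e.g. 0.01 for +1%.
    fee:     hypothetical flat cost deducted from every trade.
    """
    if len(signals) != len(returns):
        raise ValueError("signals and returns must have the same length")

    trade_returns = []
    for signal, r in zip(signals, returns):
        if signal == "buy":      # long: profit when the price rises
            trade_returns.append(r - fee)
        elif signal == "sell":   # short: profit when the price falls
            trade_returns.append(-r - fee)
        # "wait": no position is opened

    if not trade_returns:
        return 0.0, 0.0, 0

    growth = 1.0
    for tr in trade_returns:     # compound the trades sequentially
        growth *= 1.0 + tr
    win_rate = sum(tr > 0 for tr in trade_returns) / len(trade_returns)
    return growth - 1.0, win_rate, len(trade_returns)
```

For example, `backtest(["buy", "sell", "wait", "buy"], [0.01, 0.02, 0.05, -0.01])` makes three trades (+1%, -2%, -1%), so the win rate is 1/3 and the total return is about -2.01%.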

STAT4012 Group 2 Project Outline

Fong Yuen Tung 1155158139, Ng Weng Sam 1155133815, Leung Ho Kwan 1155144270, Cheng Chu Fung 1155143199, So Fu Shing 1155159902

Objective: The aim of this project is to develop a model that accurately predicts the direction of Bitcoin price movements from news sentiment, using natural language processing (NLP) techniques. Specifically, we will generate Bitcoin trading signals indicating whether to buy, sell, or wait, based on an analysis of news headlines from CoinDesk. In this analysis, each position is held open for five minutes.

Dataset: The dataset for this project consists of two parts. The first is a set of news headlines from CoinDesk. We chose CoinDesk because it is a popular, well-established media outlet that covers the cryptocurrency and blockchain space extensively. We will gather about 2,200 headlines, together with their publishing times, over a one-year period (2022/03/01 - 2023/02/28). The second is the corresponding Bitcoin price movement in the 5 minutes following the release of each headline, obtained via the Binance API for the same period.

(CoinDesk website: https://www.coindesk.com/tag/bitcoin/)
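The second dataset could be collected roughly as sketched here, assuming 1-minute candles from Binance's public spot klines endpoint for the `BTCUSDT` pair. The pair, the 1-minute interval, flooring the headline time to the minute, and treating naive datetimes as UTC are all our assumptions; `kline_request_params` and `window_return` are hypothetical helper names. The endpoint's parameters should be checked against Binance's own documentation.

```python
from datetime import datetime, timezone

# Public Binance spot endpoint for candlestick (kline) data.
BINANCE_KLINES_URL = "https://api.binance.com/api/v3/klines"


def kline_request_params(published_at: datetime, symbol="BTCUSDT", minutes=5):
    """Query parameters for the 1-minute klines covering `minutes` after a headline.

    Assumptions: naive datetimes are treated as UTC, and the window starts at
    the minute in which the headline was published.
    """
    if published_at.tzinfo is None:
        published_at = published_at.replace(tzinfo=timezone.utc)
    start = published_at.replace(second=0, microsecond=0)
    start_ms = int(start.timestamp() * 1000)   # Binance timestamps are in milliseconds
    end_ms = start_ms + minutes * 60_000 - 1    # last millisecond of the window
    return {
        "symbol": symbol,
        "interval": "1m",
        "startTime": start_ms,
        "endTime": end_ms,
        "limit": minutes,
    }


def window_return(klines):
    """Relative change from the open of the first kline to the close of the last.

    Each kline is a list whose index 1 is the open price and index 4 the close
    price; the API returns both as strings.
    """
    first_open = float(klines[0][1])
    last_close = float(klines[-1][4])
    return last_close / first_open - 1.0
```

With the third-party `requests` library, the candles for one headline could then be fetched with `requests.get(BINANCE_KLINES_URL, params=kline_request_params(ts), timeout=10).json()` and passed to `window_return`.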

Method: The project will involve several steps. First, we will collect news headlines from CoinDesk and preprocess them to ensure consistent formatting and to remove irrelevant information. After gathering price data from Binance, we will label each 5-minute price movement as one of three classes, using a classification threshold chosen from the statistics of the gathered data.
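The outline leaves the threshold rule open. One plausible choice, sketched below as an assumption, is a symmetric threshold t taken as the first tercile cut point of the absolute 5-minute returns, so that roughly a third of the windows fall into the "wait" class: returns above t are labelled "buy", below -t "sell", and the rest "wait". The helper names are hypothetical.

```python
import statistics


def choose_threshold(returns):
    """Symmetric threshold t derived from the data.

    Assumption: t is the first tercile cut point of |return|, which keeps the
    three classes roughly balanced.
    """
    abs_returns = [abs(r) for r in returns]
    return statistics.quantiles(abs_returns, n=3)[0]


def label_returns(returns, threshold):
    """Map each 5-minute return to "buy" (up), "sell" (down) or "wait" (flat)."""
    labels = []
    for r in returns:
        if r > threshold:
            labels.append("buy")
        elif r < -threshold:
            labels.append("sell")
        else:
            labels.append("wait")
    return labels
```

A fixed multiple of the standard deviation (`statistics.stdev`) would be an equally simple alternative; the tercile rule is shown only because it guarantees no class is nearly empty.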

The dataset will be split into training and testing sets, with the training set used to train the model and the testing set used to evaluate its performance. Next, we will use NLP techniques to analyze the sentiment of each headline.
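Because the headlines form a time series, one reasonable reading of the split is chronological: earlier headlines for training, later ones for testing, so the model never sees the future. The sketch below assumes that, plus a simple bag-of-words representation whose vocabulary is built from the training headlines only. Bag-of-words is a swapped-in illustration, not the project's chosen NLP method (an RNN would consume token sequences rather than counts), and all function names and the 20% test fraction are hypothetical.

```python
import re
from collections import Counter


def chronological_split(rows, test_fraction=0.2):
    """Split (published_at, headline, label) rows by time: earlier rows train, later rows test."""
    ordered = sorted(rows, key=lambda row: row[0])
    n_test = round(len(ordered) * test_fraction)
    cut = len(ordered) - n_test
    return ordered[:cut], ordered[cut:]


def tokenize(headline):
    """Lower-case a headline and keep runs of letters, digits and apostrophes."""
    return re.findall(r"[a-z0-9']+", headline.lower())


def build_vocabulary(train_rows, min_count=2):
    """Tokens appearing at least `min_count` times in the training headlines."""
    counts = Counter(tok for _, text, _ in train_rows for tok in tokenize(text))
    return sorted(tok for tok, c in counts.items() if c >= min_count)


def bag_of_words(headline, vocabulary):
    """Count vector of a headline over a fixed vocabulary (unknown tokens are ignored)."""
    index = {tok: i for i, tok in enumerate(vocabulary)}
    vector = [0] * len(vocabulary)
    for tok in tokenize(headline):
        if tok in index:
            vector[index[tok]] += 1
    return vector
```

Building the vocabulary from the training rows alone keeps words that only appear in later headlines from leaking into training.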

We will feed the sentiment analysis results into a Multi-layer Perceptron and a Recurrent Neural Network to generate buy, sell, or wait signals for Bitcoin trading, based on the expected price movement in the 5 minutes following each news release. We will evaluate our models using metrics such as accuracy, precision, and recall, and compare them with other machine learning models such as KNN and logistic regression.
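The evaluation metrics could be computed as in this sketch, which assumes the three labels are the strings "buy", "sell" and "wait" (hold); `evaluate_signals` is a hypothetical name. Accuracy is the share of correct predictions; for each class, precision is the share of that class's predictions that were right and recall is the share of that class's true occurrences the model caught.

```python
def evaluate_signals(y_true, y_pred, classes=("buy", "sell", "wait")):
    """Overall accuracy plus per-class (precision, recall) for three-class signals."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty label lists of equal length")

    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    per_class = {}
    for c in classes:
        true_pos = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        predicted = sum(p == c for p in y_pred)  # how often the model emitted c
        actual = sum(t == c for t in y_true)     # how often c really occurred
        precision = true_pos / predicted if predicted else 0.0
        recall = true_pos / actual if actual else 0.0
        per_class[c] = (precision, recall)
    return accuracy, per_class
```

Running the same function on the MLP, RNN, KNN and logistic-regression predictions for one shared test set keeps the comparison like-for-like; scikit-learn's `classification_report` reports the same quantities (plus F1) if that library is already in use.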

Contributors: jadonleung, fongyuentung, donaldccf
