
stock-market-prediction-via-google-trends's Introduction

stock-market-prediction-via-google-trends's People

Contributors

cristianpjensen, dependabot[bot]


stock-market-prediction-via-google-trends's Issues

Update README.md with Data Collection information

Description

Update the README.md with information on how the data_collector.py script works.

Acceptance Criteria

When a technical paragraph about the data_collector.py script has been added to the README.md.

Why?

To improve the documentation.

new complementary tool

My name is Luis. I'm a big-data machine-learning developer, a fan of your work, and I usually check your updates.

I was afraid that my savings would be eaten by inflation, so I created a powerful tool that builds on past technical patterns (volatility, moving averages, statistics, trends, candlesticks, support and resistance, stock index indicators):
all the ones you know (RSI, MACD, STOCH, Bollinger Bands, SMA, DeMark, Japanese candlesticks, Ichimoku, Fibonacci, Williams %R, Balance of Power, Murrey Math, etc.) and more than 200 others.

The tool creates prediction models of correct trading points (buy and sell signals, so every stock is traded at the right time and in the right direction).
For this I have used big-data tools like pandas, and stock market libraries like tablib, TAcharts, and pandas_ta, for data collection and calculation.
And powerful machine-learning libraries such as sklearn.RandomForest, sklearn.GradientBoosting, XGBoost, Google TensorFlow, and TensorFlow LSTM.

With the models trained with the selection of the best technical indicators, the tool is able to predict trading points (where to buy, where to sell) and send real-time alerts to Telegram or Mail. The points are calculated based on the learning of the correct trading points of the last 2 years (including the change to bear market after the rate hike).

I think it could be useful to you. I would like to share it with you, and if you are interested in improving it and collaborating, I am willing; if not, just file it away.

Use K-fold cross-validation

Description

Utilise the powers of k-fold cross-validation in the machine learning model.

Acceptance Criteria

When K-fold cross-validation has been implemented.

Why?

All of the machine learning models that perform well so far are heavily overfitted (training accuracy = 1.0). K-fold cross-validation may help against that.
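As a sketch of how this could look with scikit-learn (the data here is synthetic; the real features and labels would come from the project's dataset):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the real feature matrix and up/down labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + rng.normal(scale=0.5, size=200) > 0).astype(int)

model = RandomForestClassifier(n_estimators=100, random_state=0)

# 5-fold CV: each fold is held out once, so every score is measured
# on data the model never trained on.
scores = cross_val_score(model, X, y, cv=5)
print(scores.mean(), scores.std())
```

A large gap between training accuracy and the cross-validated mean is exactly the overfitting signal described above.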

README for the deployment folder

The deployment needs to be shown via GIFs. No one will open index.html in a live server, so GIFs are the easiest and best way to show it.

Hyperparameter tuning

Description

Use a cloud computing service (AWS, Google Cloud, Microsoft Azure ...) to find the best hyperparameters for the model.

Acceptance Criteria

When a model with > 0.65 accuracy has been found.

Why?

Hyperparameter tuning is a big part of machine learning and can make or break the algorithm. Good hyperparameters mean better accuracy, and in this case better accuracy means more money.

Feed data to neural network

Description

There has to be a method for how the data will be fed to the neural network; that is what this issue has to solve. It has to answer questions like "How many weeks back will the model be fed?".

Acceptance Criteria

This issue will be considered done when a viable method of feeding data to the neural network has been made.

Why?

This is required because the data the neural network is fed also determines its accuracy.
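One common answer, shown here as an assumption rather than the project's decided method, is a sliding window: each input row contains the previous n weeks and the target is the value that follows.

```python
import numpy as np

def make_windows(series: np.ndarray, n_weeks: int) -> tuple[np.ndarray, np.ndarray]:
    """Turn a 1-D series into (samples, n_weeks) inputs and next-step targets."""
    X = np.array([series[i:i + n_weeks] for i in range(len(series) - n_weeks)])
    y = series[n_weeks:]  # the value immediately after each window
    return X, y

series = np.arange(10, dtype=float)  # stand-in for weekly trends data
X, y = make_windows(series, n_weeks=3)
print(X.shape, y.shape)  # (7, 3) (7,)
```

The choice of n_weeks is precisely the open question this issue has to answer.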

Search box change

Description

The search box needs to only change the graph upon selecting a search term.

Acceptance Criteria

When the graph changes only after hitting "enter" or clicking a search term suggestion, and not in any other way.

Why?

Otherwise the graph will be empty half the time, and the constant updating causes lag.

Figures for README.md (and deployment).

Plots needed

  • A graph, where the Google Trends data is a heatmap with a line plot of the stock price data over it;
    • This is to indicate the correlation between the Google Trends data and the stock price.
  • Various graphs where the adjustments made are clear and concise, perhaps an example.
    • This is to indicate why the adjustments are needed, and how they were made.

Various graphs could be added to this issue.

How can I set the location to Global?

Hello, thank you for this repository. I've been doing some research and your methodology was very helpful. But I have a question: how can I set the API to get daily data without the geo parameter? Every time I try, I get an error. Thanks for your time!

Opacity problem

Description

The opacity doesn't go to 1 when the page is being scrolled quickly.

Feature engineering

Description

Determine which features are worth keeping and which aren't.

Acceptance Criteria

When a model has been made - with features - which can outperform a buy-and-hold strategy on the stock market.

Why?

This is an essential part of the machine learning workflow.
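The buy-and-hold benchmark in the acceptance criteria is straightforward to compute; a minimal sketch with made-up prices and a hypothetical hold/no-hold signal column:

```python
import numpy as np

prices = np.array([100.0, 102.0, 101.0, 105.0, 110.0])  # hypothetical closes

# Buy-and-hold benchmark: total return from first to last price.
buy_and_hold = prices[-1] / prices[0] - 1
print(f"{buy_and_hold:.1%}")  # 10.0%

# A strategy beats the benchmark if its compounded return is higher.
signals = np.array([1, 0, 1, 1])  # hypothetical: 1 = hold the stock that day
daily_returns = prices[1:] / prices[:-1] - 1
strategy = np.prod(1 + signals * daily_returns) - 1
print(f"{strategy:.1%}")
```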

Update the main README

Description

The main README is quite outdated; a lot more progress has been made since its creation, so an updated README should be made. Things to cover:

  • New plots (made with Seaborn);
  • Look over the existing text and determine whether it is still usable;
  • More text, containing information on:
    • Machine Learning model;
    • The deployment of the webpage;
    • The feature engineering and its various methods.

Acceptance Criteria

When a good-looking README has been made. It should be up to par with the Google documentation style.

Why?

This feature is required to get more people interested in the project. It helps others with understanding the project and the decisions made.

Pull data from Google Trends

Description

The pulling of the data from Google Trends has to be quick and automatic. Google Trends doesn't have an API, so it will have to be done from scratch.

Acceptance Criteria

This issue will be considered closed when a script is able to pull data from Google Trends, effectively an unofficial API.

Why?

This feature is necessary because if a user wants to use another search term for their instance, they would have to spend hours collecting all the data by hand. With this feature it takes only seconds or minutes. It also provides a foundation for collecting new data from Google Trends while actively using this program.

Initial letter doesn't work on Google Chrome

Description

On Google Chrome, initial-letter is not an option (it is in safari). Thus a way of making sure the initial letter also works on Google Chrome will have to be figured out. This could be the fix.

Acceptance Criteria

When the drop cap works on all web browsers.

Why?

Because accessibility is important, and accessibility means all browsers.

Search box compare to stock price.

Description

There has to be the ability to compare the search terms to the stock price of ^DJI.

Acceptance Criteria

When the line for the stock price of ^DJI is also in the "Explore" graph.

Why?

The interesting part of the project is to compare the search terms to the stock price, since that is the point of the project.

Update docstrings in `make_dataset.py`

Description

Make all docstrings in make_dataset.py comply with the Google Docstrings Style.

Why?

This is the convention that has been chosen for this project. It has already been implemented in build_features.py.

Combine ML features

Description

Combine features and figure out which features are the best for making predictions.

Acceptance Criteria

When the best possible machine learning algorithm, with the best possible features, has been found.

Why?

Better features mean a better algorithm. This could reinforce the machine learning model and improve its accuracy.
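One common way to rank candidate features, sketched here with synthetic stand-ins rather than the project's real features, is a random forest's feature importances:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 300
# Hypothetical feature columns; only "trend_delta" carries signal here.
features = {
    "trend_delta": rng.normal(size=n),
    "ma_7": rng.normal(size=n),
    "noise": rng.normal(size=n),
}
X = np.column_stack(list(features.values()))
y = (features["trend_delta"] > 0).astype(int)

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
for name, imp in zip(features, model.feature_importances_):
    print(f"{name}: {imp:.2f}")  # informative features score highest
```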

Adjust Google Trends data

Description

The data pulled from Google Trends has to be adjusted after the data has been pulled. The adjustments have to be made according to Method in the README.md.

Acceptance Criteria

This issue will be considered done when, after the data from Google Trends has been pulled, it is automatically adjusted and exported to a .csv file, as described in Method.

Why?

This feature is required, because in order to feed data to the neural network, the structure of the data has to stay consistent.
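The exact adjustment is the one described under Method in the README. Purely as an illustration of the general idea (an assumption, not the project's formula), each month's independently scaled daily pull can be rescaled by the monthly-level series so months become comparable:

```python
import pandas as pd

# Hypothetical inputs: each month's daily pull is independently scaled 0-100
# by Google Trends, plus a monthly-level series covering the same span.
daily = pd.Series(
    [50.0, 100.0, 80.0, 100.0, 60.0, 90.0],
    index=pd.to_datetime(
        ["2020-01-10", "2020-01-20", "2020-01-30",
         "2020-02-10", "2020-02-20", "2020-02-28"]
    ),
)
monthly = pd.Series([40.0, 80.0],
                    index=pd.PeriodIndex(["2020-01", "2020-02"], freq="M"))

# Rescale each month's daily values by that month's level in the monthly
# series, so values from different months sit on one consistent scale.
scale = monthly / 100.0
adjusted = daily * daily.index.to_period("M").map(scale)
print(adjusted)
```

The adjusted series is what would then be exported to the .csv file.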

README for `data`-folder

Description

Create a README.md for the data-folder with multiple graphs visualising the data in this folder. The plots should be made using Seaborn; all plots for this project will be made with Seaborn so that all READMEs are consistent with each other.

Acceptance Criteria

When a README has been made. The graphs should be clear, and there should be multiple plots (look in plot galleries for inspiration).

Why?

This is required because it shows people what the data looks like. They can of course also look at the deployment, but graphs in a README are more accessible.

Text under graphs

Description

There has to be a text under the graphs to explain what is being visualised.

Merge data_adjuster.py and data_collector.py

Description

Merge data_adjuster and data_collector into one script, so that not all files from Google Trends are downloaded and only the adjusted daily data is output.

Acceptance Criteria

When one script does the job of both of these scripts.

Why?

This saves space on hard drives, declutters, and looks cleaner.

Change all " to '

Description

Consistency, consistency, consistency...

A lot of " in make_dataset.py in particular.

Use many features with one keyword

Description

Use only one keyword (stock market) and create many types of features from it (Bollinger Bands, EMA, MA, etc.). Use stock market because, from what I can find, it is the search term that correlates best with the stock market.

Acceptance Criteria

When good features for this problem have been found, implemented, and visualised.

Why?

This will make for some great visualisations, and I might be able to find out which feature(s) work best for multiple keywords.
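A sketch of how such features could be computed from a single keyword's series with pandas; the numbers are made up and the window size is an arbitrary choice:

```python
import pandas as pd

# Hypothetical weekly interest for the single keyword "stock market".
trend = pd.Series([55, 60, 58, 70, 65, 80, 75, 90, 85, 100], dtype=float)

window = 5
ma = trend.rolling(window).mean()        # simple moving average
ema = trend.ewm(span=window).mean()      # exponential moving average
std = trend.rolling(window).std()
upper_band = ma + 2 * std                # Bollinger Bands around the MA
lower_band = ma - 2 * std

features = pd.DataFrame(
    {"trend": trend, "ma": ma, "ema": ema, "upper": upper_band, "lower": lower_band}
)
print(features.tail())
```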

Some data in the CSV files overlap

Description

The last day of one daily data file is the same day as the first day of the next. The daily data files shouldn't end on the first of a month, but on the last day of the month.

Acceptance Criteria

When - after export - the data doesn't overlap anymore.

Why?

It makes the data easier to manipulate as pandas dataframes.
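A sketch of the de-duplication, assuming the overlap is the repeated boundary day between consecutive monthly files (the repeated day may carry a differently scaled value in each pull):

```python
import pandas as pd

# Hypothetical monthly pulls: the first row of each file repeats the
# last day of the previous file.
jan = pd.DataFrame({"interest": [40.0, 40.0]},
                   index=pd.to_datetime(["2020-01-30", "2020-01-31"]))
feb = pd.DataFrame({"interest": [42.0, 45.0]},
                   index=pd.to_datetime(["2020-01-31", "2020-02-01"]))

combined = pd.concat([jan, feb])
# Keep one row per day, so each file effectively ends on the last day
# of its month rather than spilling into the next.
combined = combined[~combined.index.duplicated(keep="last")]
print(len(combined))  # 3
```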

Feature engineer percentage changes.

Description

Instead of using absolute values, relative values would be more valuable. They would also be easier to compute in the future, to make predictions when deployed.

Why?

Because it is easier to retrieve after the fact: there is no need to normalise future data points against the existing data points (Google Trends is quite annoying in this regard), while percentage changes are always the same. That the Google Trends data is relative doesn't affect percentage changes.
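The invariance claim can be checked directly: rescaling the series (as Google Trends does to fit 0-100) leaves percentage changes untouched. A small sketch with made-up values:

```python
import numpy as np
import pandas as pd

# Hypothetical trends values; Google Trends rescales pulls to 0-100,
# so absolute levels shift between pulls, but ratios do not.
trend = pd.Series([40.0, 50.0, 45.0, 90.0])

pct = trend.pct_change()                              # relative changes
rescaled = (trend / trend.max() * 100).pct_change()   # after a 0-100 rescale
print(np.allclose(pct.dropna(), rescaled.dropna()))   # True: identical changes
```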

Search box suggestions

Description

The suggestions made by the search box while typing need to 1. look nice, and 2. be clickable.

Acceptance Criteria

When above criteria are fulfilled.

Why?

This makes the search box easier to use and nicer to look at.

Pull stock data

Description

The stock price data has to be pulled, so it can be fed to the neural network. However, the weekends aren't incorporated in the stock price data, so it has to be manipulated so that they fit. Either the weekends have to be skipped, indicated as NaN, or given the same data as the Friday before (or the workday before the NaN). Research has to be done on this subject.

Acceptance Criteria

When a viable solution to the weekend/holiday problem has been found, and incorporated.

Why?

The neural network has to be given an outcome of all the Google Trends data.
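The "same data as the Friday before" option from the description can be sketched with pandas forward-filling; the dates and prices here are hypothetical:

```python
import pandas as pd

# Hypothetical trading-day closes; Sat/Sun are missing, as in real stock data.
close = pd.Series(
    [100.0, 101.0, 103.0],
    index=pd.to_datetime(["2020-01-02", "2020-01-03", "2020-01-06"]),  # Thu, Fri, Mon
)

# Reindex to every calendar day, then forward-fill: weekends and holidays
# take the previous trading day's close.
daily = close.asfreq("D").ffill()
print(daily.loc["2020-01-04"])  # 101.0 (Saturday gets Friday's close)
```

This aligns the stock series with the seven-day Google Trends data so every trends value has an outcome.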

Configurable model

Description

Make a neural network, which is configurable to the user's likings.

Acceptance Criteria

When a configurable neural network model has been made. This will most likely be done using TensorFlow.

Why?

Because one size doesn't fit all in neural networks. The model has to be configurable to account for this.

Complete the README

Description

Make sure that all information that should be in the documentation (README) is present. Also write it following the Google developer documentation style guide.

Acceptance Criteria

When all required documentation is written according to the Google developer documentation style guide, such that anyone who is not necessarily an expert (but is interested) understands everything in it.

Why?

This feature is required because the documentation is the first thing people look at, and good-looking documentation inclines people to leave a star or get engaged in the project.

Feature based on days since last peak

Description

A feature which is essentially how many days ago the last peak was. Peaks can be found using the scipy.signal library. Only major peaks should be used, not all the small little peaks (of which there are hundreds), for example only the peaks preceding a stock crash. Define the characteristics of these peaks, so that the algorithm can start looking for the same kind of peaks in the future. This could definitely help with prediction.

Also, this feature could be one-hot encoded. For example:

Days since last peak

 0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  >15
 1   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
 0   1   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
 0   0   1   0   0   0   0   0   0   0   0   0   0   0   0   0   0
 0   0   0   1   0   0   0   0   0   0   0   0   0   0   0   0   0
 0   0   0   0   1   0   0   0   0   0   0   0   0   0   0   0   0

Acceptance Criteria

Three criteria:

  • Whether or not this feature is possible;
  • Definition of the characteristics of the peaks preceding a stock crash (may have to be done per keyword);
  • Best way of presenting this feature to the machine learning model.

Why?

Any feature which could help with better accuracy is a feature that should be explored.

Structure

Description

Structure the project according to the cookiecutter data science template.

Why?

Because it makes the project more structured and helps people navigate through it.
