
stock-market-prediction-via-google-trends's Introduction

stock-market-prediction-via-google-trends's People

Contributors

cristianpjensen, dependabot[bot]


stock-market-prediction-via-google-trends's Issues

Update README.md with Data Collection information

Description

Update the README.md with information on how the data_collector.py script works.

Acceptance Criteria

When a technical paragraph about the data_collector.py script has been added to the README.md.

Why?

To improve the documentation.

new complementary tool

My name is Luis. I'm a big-data machine-learning developer, a fan of your work, and I usually check your updates.

I was afraid that my savings would be eaten by inflation, so I created a powerful tool that builds on past technical patterns (volatility, moving averages, statistics, trends, candlesticks, support and resistance, stock index indicators):
all the ones you know (RSI, MACD, STOCH, Bollinger Bands, SMA, DeMark, Japanese candlesticks, Ichimoku, Fibonacci, Williams %R, Balance of Power, Murrey Math, etc.) and more than 200 others.

The tool creates prediction models of correct trading points (buy and sell signals, so every stock is traded at the right time and in the right direction).
For this I have used big-data tools like pandas, and stock market libraries like tablib, TAcharts, and pandas_ta, for data collection and calculation.
And powerful machine-learning libraries such as sklearn.RandomForest, sklearn.GradientBoosting, XGBoost, Google TensorFlow, and TensorFlow LSTM.

With the models trained with the selection of the best technical indicators, the tool is able to predict trading points (where to buy, where to sell) and send real-time alerts to Telegram or Mail. The points are calculated based on the learning of the correct trading points of the last 2 years (including the change to bear market after the rate hike).

I think it could be useful to you. I would like to share it with you, and if you are interested in improving it and collaborating, I am willing; if not, just file it away.

Use K-fold cross-validation

Description

Utilise the powers of k-fold cross-validation in the machine learning model.

Acceptance Criteria

When K-fold cross-validation has been implemented.

Why?

All of the machine learning models that perform well so far are heavily overfitted (training accuracy = 1.0). K-fold cross-validation may help against that.
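As a sketch of how this could look with scikit-learn (the data here is synthetic; the real features and labels would come from the project's dataset):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the real feature matrix and up/down labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + rng.normal(scale=0.5, size=200) > 0).astype(int)

model = RandomForestClassifier(n_estimators=100, random_state=0)

# 5-fold CV: each fold is held out once, so every score is measured
# on data the model never trained on.
scores = cross_val_score(model, X, y, cv=5)
print(scores.mean(), scores.std())
```

A large gap between training accuracy and the cross-validated mean is exactly the overfitting signal described above.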

README for the deployment folder

The deployment needs to be shown via GIFs. No one will open index.html in a live server, so GIFs are the easiest and best way to show it.

Hyperparameter tuning

Description

Use a cloud computing service (AWS, Google Cloud, Microsoft Azure ...) to find the best hyperparameters for the model.

Acceptance Criteria

When a model with > 0.65 accuracy has been found.

Why?

Hyperparameter tuning is a big part of machine learning and can make or break the algorithm. Good hyperparameters mean better accuracy, and in this case better accuracy means more money.

Feed data to neural network

Description

There has to be a method for how the data will be fed to the neural network; that is what this issue has to solve. It has to answer questions like "How many weeks back will the model be fed?".

Acceptance Criteria

This issue will be considered done when a viable method of feeding data to the neural network has been made.

Why?

This is required because the data the neural network is fed also determines its accuracy.
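One common answer, shown here as an assumption rather than the project's decided method, is a sliding window: each input row contains the previous n weeks and the target is the value that follows.

```python
import numpy as np

def make_windows(series: np.ndarray, n_weeks: int) -> tuple[np.ndarray, np.ndarray]:
    """Turn a 1-D series into (samples, n_weeks) inputs and next-step targets."""
    X = np.array([series[i:i + n_weeks] for i in range(len(series) - n_weeks)])
    y = series[n_weeks:]  # the value immediately after each window
    return X, y

series = np.arange(10, dtype=float)  # stand-in for weekly trends data
X, y = make_windows(series, n_weeks=3)
print(X.shape, y.shape)  # (7, 3) (7,)
```

The choice of n_weeks is precisely the open question this issue has to answer.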

Search box change

Description

The search box needs to only change the graph upon selecting a search term.

Acceptance Criteria

When the graph changes only after hitting "enter" or clicking a search term suggestion, and not in any other way.

Why?

Otherwise the graph will be empty half the time, and the constant updating causes lag.

Figures for README.md (and deployment).

Plots needed

  • A graph, where the Google Trends data is a heatmap with a line plot of the stock price data over it;
    • This is to indicate the correlation between the Google Trends data and the stock price.
  • Various graphs where the adjustments made are clear and concise, perhaps an example.
    • This is to indicate why the adjustments are needed, and how they were made.

Various graphs could be added to this issue.

How can I set the location to Global?

Hello, thank you for this repository. I've been doing some research and your methodology was very helpful. But I have a question: how can I set the API to get daily data without the geo parameter? Every time I try, I get an error. Thanks for your time!

Opacity problem

Description

The opacity doesn't go to 1 when the page is being scrolled quickly.

Feature engineering

Description

Determine which features are worth keeping and which aren't.

Acceptance Criteria

When a model has been made - with features - which can outperform a buy-and-hold strategy on the stock market.

Why?

This is an essential part of the machine learning workflow.
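The buy-and-hold benchmark in the acceptance criteria is straightforward to compute; a minimal sketch with made-up prices and a hypothetical hold/no-hold signal column:

```python
import numpy as np

prices = np.array([100.0, 102.0, 101.0, 105.0, 110.0])  # hypothetical closes

# Buy-and-hold benchmark: total return from first to last price.
buy_and_hold = prices[-1] / prices[0] - 1
print(f"{buy_and_hold:.1%}")  # 10.0%

# A strategy beats the benchmark if its compounded return is higher.
signals = np.array([1, 0, 1, 1])  # hypothetical: 1 = hold the stock that day
daily_returns = prices[1:] / prices[:-1] - 1
strategy = np.prod(1 + signals * daily_returns) - 1
print(f"{strategy:.1%}")
```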

Update the main README

Description

The main README is quite outdated; a lot more progress has been made since its creation, so an updated README should be made. Things to cover:

  • New plots (made with Seaborn);
  • Look over the existing text and determine whether it is still usable;
  • More text, containing information on:
    • Machine Learning model;
    • The deployment of the webpage;
    • The feature engineering and its various methods.

Acceptance Criteria

When a good-looking README has been made. It should be up to par with the Google documentation style.

Why?

This feature is required to get more people interested in the project. It helps others with understanding the project and the decisions made.

Pull data from Google Trends

Description

The pulling of the data from Google Trends has to be quick and automatic. Google Trends doesn't have an API, so it will have to be done from scratch.

Acceptance Criteria

This issue will be considered closed when a script is able to pull data from Google Trends, effectively an unofficial API.

Why?

This feature is necessary because if a user wants to use another search term for their instance, they would have to spend hours collecting all the data by hand. With this feature it takes only seconds or minutes. It also provides a foundation for collecting new data from Google Trends while actively using this program.

Initial letter doesn't work on Google Chrome

Description

On Google Chrome, initial-letter is not an option (it is in safari). Thus a way of making sure the initial letter also works on Google Chrome will have to be figured out. This could be the fix.

Acceptance Criteria

When the drop cap works on all web browsers.

Why?

Because accessibility is important, and accessibility means all browsers.

Search box compare to stock price.

Description

There has to be the ability to compare the search terms to the stock price of ^DJI.

Acceptance Criteria

When the line for the stock price of ^DJI is also in the "Explore" graph.

Why?

The interesting part of the project is to compare the search terms to the stock price, since that is the point of the project.

Update docstrings in `make_dataset.py`

Description

Make all docstrings in make_dataset.py comply with the Google Docstrings Style.

Why?

This is the convention that has been chosen for this project. It has already been implemented in build_features.py.

Combine ML features

Description

Combine features and figure out which features are the best for making predictions.

Acceptance Criteria

When the best possible machine learning algorithm, with the best possible features, has been found.

Why?

Better features mean a better algorithm. This could reinforce the machine learning model and improve its accuracy.
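One common way to rank candidate features, sketched here with synthetic stand-ins rather than the project's real features, is a random forest's feature importances:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 300
# Hypothetical feature columns; only "trend_delta" carries signal here.
features = {
    "trend_delta": rng.normal(size=n),
    "ma_7": rng.normal(size=n),
    "noise": rng.normal(size=n),
}
X = np.column_stack(list(features.values()))
y = (features["trend_delta"] > 0).astype(int)

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
for name, imp in zip(features, model.feature_importances_):
    print(f"{name}: {imp:.2f}")  # informative features score highest
```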

Adjust Google Trends data

Description

The data pulled from Google Trends has to be adjusted after the data has been pulled. The adjustments have to be made according to Method in the README.md.

Acceptance Criteria

This issue will be considered done when, after the data from Google Trends has been pulled, it is automatically adjusted and exported to a .csv file, as described in Method.

Why?

This feature is required, because in order to feed data to the neural network, the structure of the data has to stay consistent.
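The exact adjustment is the one described under Method in the README. Purely as an illustration of the general idea (an assumption, not the project's formula), each month's independently scaled daily pull can be rescaled by the monthly-level series so months become comparable:

```python
import pandas as pd

# Hypothetical inputs: each month's daily pull is independently scaled 0-100
# by Google Trends, plus a monthly-level series covering the same span.
daily = pd.Series(
    [50.0, 100.0, 80.0, 100.0, 60.0, 90.0],
    index=pd.to_datetime(
        ["2020-01-10", "2020-01-20", "2020-01-30",
         "2020-02-10", "2020-02-20", "2020-02-28"]
    ),
)
monthly = pd.Series([40.0, 80.0],
                    index=pd.PeriodIndex(["2020-01", "2020-02"], freq="M"))

# Rescale each month's daily values by that month's level in the monthly
# series, so values from different months sit on one consistent scale.
scale = monthly / 100.0
adjusted = daily * daily.index.to_period("M").map(scale)
print(adjusted)
```

The adjusted series is what would then be exported to the .csv file.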

README for `data`-folder

Description

Create a README.md for the data-folder with multiple graphs visualising the data in this folder. The plots should be made using Seaborn; all plots for this project will be made with Seaborn so that all READMEs are consistent with each other.

Acceptance Criteria

When a README has been made. The graphs should be clear, and there should be multiple plots (look in plot galleries for inspiration).

Why?

This is required because it shows people what the data looks like. They can of course also look at the deployment, but graphs in a README are more accessible.

Text under graphs

Description

There has to be a text under the graphs to explain what is being visualised.

Merge data_adjuster.py and data_collector.py

Description

Merge data_adjuster and data_collector into one script, so that not all files from Google Trends are downloaded and only the adjusted daily data is output.

Acceptance Criteria

When one script does the job of both of these scripts.

Why?

This saves space on hard drives, declutters, and looks cleaner.

Change all " to '

Description

Consistency, consistency, consistency...

A lot of " in make_dataset.py in particular.

Use many features with one keyword

Description

Use only one keyword (stock market) and create many types of features from it (Bollinger Bands, EMA, MA, etc.). Use stock market because, from what I can find, it is the search term that correlates best with the stock market.

Acceptance Criteria

When good features for this problem have been found, implemented, and visualised.

Why?

This will make for some great visualisations, and I might be able to find out which feature(s) work best for multiple keywords.
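A sketch of how such features could be computed from a single keyword's series with pandas; the numbers are made up and the window size is an arbitrary choice:

```python
import pandas as pd

# Hypothetical weekly interest for the single keyword "stock market".
trend = pd.Series([55, 60, 58, 70, 65, 80, 75, 90, 85, 100], dtype=float)

window = 5
ma = trend.rolling(window).mean()        # simple moving average
ema = trend.ewm(span=window).mean()      # exponential moving average
std = trend.rolling(window).std()
upper_band = ma + 2 * std                # Bollinger Bands around the MA
lower_band = ma - 2 * std

features = pd.DataFrame(
    {"trend": trend, "ma": ma, "ema": ema, "upper": upper_band, "lower": lower_band}
)
print(features.tail())
```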

Some data in the CSV files overlap

Description

The last day of one daily data file is the same day as the first day of the next. The daily data files shouldn't end on the first of a month, but on the last day of the month.

Acceptance Criteria

When - after export - the data doesn't overlap anymore.

Why?

It makes the data easier to manipulate as pandas dataframes.
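A sketch of the de-duplication, assuming the overlap is the repeated boundary day between consecutive monthly files (the repeated day may carry a differently scaled value in each pull):

```python
import pandas as pd

# Hypothetical monthly pulls: the first row of each file repeats the
# last day of the previous file.
jan = pd.DataFrame({"interest": [40.0, 40.0]},
                   index=pd.to_datetime(["2020-01-30", "2020-01-31"]))
feb = pd.DataFrame({"interest": [42.0, 45.0]},
                   index=pd.to_datetime(["2020-01-31", "2020-02-01"]))

combined = pd.concat([jan, feb])
# Keep one row per day, so each file effectively ends on the last day
# of its month rather than spilling into the next.
combined = combined[~combined.index.duplicated(keep="last")]
print(len(combined))  # 3
```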

Feature engineer percentage changes.

Description

Instead of using absolute values, relative values would be more valuable. They would also be easier to compute in the future, to make predictions when deployed.

Why?

Because it is easier to retrieve after the fact: there is no need to normalise future data points against the existing data points (Google Trends is quite annoying in this regard), while percentage changes are always the same. That the Google Trends data is relative doesn't affect percentage changes.
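The invariance claim can be checked directly: rescaling the series (as Google Trends does to fit 0-100) leaves percentage changes untouched. A small sketch with made-up values:

```python
import numpy as np
import pandas as pd

# Hypothetical trends values; Google Trends rescales pulls to 0-100,
# so absolute levels shift between pulls, but ratios do not.
trend = pd.Series([40.0, 50.0, 45.0, 90.0])

pct = trend.pct_change()                              # relative changes
rescaled = (trend / trend.max() * 100).pct_change()   # after a 0-100 rescale
print(np.allclose(pct.dropna(), rescaled.dropna()))   # True: identical changes
```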

Search box suggestions

Description

The suggestions made by the search box while typing need to 1. look nice, and 2. be clickable.

Acceptance Criteria

When above criteria are fulfilled.

Why?

This makes the search box easier to use and nicer to look at.

Pull stock data

Description

The stock price data has to be pulled, so it can be fed to the neural network. However, the weekends aren't incorporated in the stock price data, so it has to be manipulated so that they fit. Either the weekends have to be skipped, indicated as NaN, or given the same data as the Friday before (or the workday before the NaN). Research has to be done on this subject.

Acceptance Criteria

When a viable solution to the weekend/holiday problem has been found, and incorporated.

Why?

The neural network has to be given an outcome of all the Google Trends data.
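The "same data as the Friday before" option from the description can be sketched with pandas forward-filling; the dates and prices here are hypothetical:

```python
import pandas as pd

# Hypothetical trading-day closes; Sat/Sun are missing, as in real stock data.
close = pd.Series(
    [100.0, 101.0, 103.0],
    index=pd.to_datetime(["2020-01-02", "2020-01-03", "2020-01-06"]),  # Thu, Fri, Mon
)

# Reindex to every calendar day, then forward-fill: weekends and holidays
# take the previous trading day's close.
daily = close.asfreq("D").ffill()
print(daily.loc["2020-01-04"])  # 101.0 (Saturday gets Friday's close)
```

This aligns the stock series with the seven-day Google Trends data so every trends value has an outcome.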

Configurable model

Description

Make a neural network, which is configurable to the user's likings.

Acceptance Criteria

When a configurable neural network model has been made. This will most likely be done using TensorFlow.

Why?

Because one size doesn't fit all in neural networks. The model has to be configurable to account for this.

Complete the README

Description

Make sure that all information that should be in the documentation (README) is present. Also write it following the Google developer documentation style guide.

Acceptance Criteria

When all required documentation is written according to the Google developer documentation style guide, such that anyone who is not necessarily an expert (but is interested) understands everything in it.

Why?

This feature is required because the documentation is the first thing people look at, and good-looking documentation inclines people to leave a star or get engaged in the project.

Feature based on days since last peak

Description

A feature which is essentially how many days ago the last peak was. Peaks can be found using the scipy.signal library. Only major peaks should be used, not all the small little peaks (of which there are hundreds), for example only the peaks preceding a stock crash. Define the characteristics of these peaks, so that the algorithm can start looking for the same kind of peaks in the future. This could definitely help with prediction.

Also, this feature could be one-hot encoded. For example:

Days since last peak

 0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  >15
 1   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
 0   1   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
 0   0   1   0   0   0   0   0   0   0   0   0   0   0   0   0   0
 0   0   0   1   0   0   0   0   0   0   0   0   0   0   0   0   0
 0   0   0   0   1   0   0   0   0   0   0   0   0   0   0   0   0

Acceptance Criteria

Three criteria:

  • Whether or not this feature is possible;
  • Definition of the characteristics of the peaks preceding a stock crash (may have to be done per keyword);
  • Best way of presenting this feature to the machine learning model.

Why?

Any feature which could help with better accuracy is a feature that should be explored.

Structure

Description

Structure the project according to the cookiecutter data science template.

Why?

Because it makes the project more structured and helps people navigate through it.
