Due to the volatility of human portfolio managers' investment performance, many investment firms rely on trading algorithms to produce consistent returns capable of outperforming the markets. One such strategy combines Machine Learning algorithms with High Frequency Trading. This strategy has become very popular and requires skills from several of the previous units, such as automatically retrieving the data used to make investment decisions, training a model, and building an algorithm to execute the trades.
You have been tasked by the investment firm Renaissance High Frequency Trading (RHFT) to develop such an algorithm. RHFT wants the algorithm to be based on stock market data for FB, AMZN, AAPL, NFLX, GOOGL, MSFT, and TSLA at the minute level. It should conduct buys and sells every minute based on 1 min, 5 min, and 10 min Momentum. The CIO asked you to choose the Machine Learning Algorithm best suited for this task and wants you to execute the trades via Alpaca's API.
You will need to:
- Prepare historical return data for training and testing (optional)
- Compare prediction performance of multiple ML-Algorithms
- Implement a fully functional trading algorithm that buys and sells the stocks selected by the Model
NOTE: If you choose to do Part 1, you can ignore the provided csv-file and use the optional notebook. The regular notebook begins at Part 2.
Starter Notebook (Optional Section)
To begin, create a `.env` file to store your Alpaca credentials, using `ALPACA_API_KEY` and `ALPACA_SECRET_KEY` as the variable names. Store this file in the same folder as your starter notebook.
Open the starter notebook. The code for this section has been provided; however, the steps are still outlined below to aid understanding. The code contains functions to acquire and clean data, compute returns, and create a final cleaned DataFrame to hold the momentums.
- You should have already created a `.env` file to store your Alpaca credentials, using `ALPACA_API_KEY` and `ALPACA_SECRET_KEY` as the variable names.
- Using the starter notebook, run the provided code cells to:
- Load your environment variables.
- Generate your Alpaca API object, specifying use of the paper trading account with the base url "https://paper-api.alpaca.markets".
- Create a ticker list, beginning and end dates, and timeframe interval.
- Ping the Alpaca API for the data and store it in a DataFrame called `prices` by using the `get_barset` function combined with the `df` method from the Alpaca Trade SDK.
- Store only the close prices from the `prices` DataFrame in a new DataFrame called `df_closing_prices`, then view the head and tail to confirm the following:
  - The first price for each stock on the open is at 9:30 am Eastern Time.
  - The last price for the day on the close is at 3:59 pm Eastern Time.
- When viewing the head and tail, you'll notice several `NaN` values.
  - Alpaca reports `NaN` for minutes in which no trades occurred.
  - These values are filled using pandas' `ffill()` function to "forward fill", or replace, those prices with the previous values (since the price has not changed).
- Compute the percentage change values for 1 minute as follows:
  - Create a variable called `forecast` to hold the forecast, in this case `1` for 1 minute.
  - Use the `pct_change` function, passing in the `forecast`, on the `df_closing_prices` DataFrame, storing the newly generated DataFrame in a variable called `returns`.
  - Convert the `returns` DataFrame to show forward returns by passing `-(forecast)` into the `shift` function.
- Convert the DataFrame into long form for merging later using `unstack` and `reset_index`.
- Compute the 1, 5, and 10 minute momentums that will be used to predict the forward returns, then merge them with the forward returns as follows:
  - Create the list of momentums: `list_of_momentums = [1, 5, 10]`
  - Write a for-loop to loop through the `list_of_momentums`, applying each to `pct_change` on the `df_closing_prices` DataFrame with each iteration.
  - With each loop, the temporary DataFrame, `returns_temp`, will need to be prepped with `unstack` and `reset_index`, then merged with the original `returns` DataFrame.
  - Complete this step by dropping the null values from `returns` and creating a multi-index based on date and ticker.
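The steps above can be sketched as follows. The synthetic price DataFrame stands in for the Alpaca `df_closing_prices` data, and the column names (`F_1_m_returns`, `1_m_momentum`, …) are illustrative guesses at the notebook's naming:

```python
import numpy as np
import pandas as pd

# Synthetic minute-level closing prices standing in for the Alpaca data.
idx = pd.date_range("2021-01-04 09:30", periods=15, freq="min")
df_closing_prices = pd.DataFrame(
    {"AAPL": np.linspace(100.0, 101.4, 15), "MSFT": np.linspace(200.0, 202.8, 15)},
    index=idx,
)

forecast = 1  # predict 1 minute ahead

# Forward 1-minute returns: pct_change, then shift back by the horizon.
returns = df_closing_prices.pct_change(forecast).shift(-forecast)
returns = returns.unstack().reset_index()  # long form for merging
returns.columns = ["ticker", "date", f"F_{forecast}_m_returns"]

# Trailing 1, 5 and 10 minute momentums, merged onto the forward returns.
list_of_momentums = [1, 5, 10]
for m in list_of_momentums:
    returns_temp = df_closing_prices.pct_change(m).unstack().reset_index()
    returns_temp.columns = ["ticker", "date", f"{m}_m_momentum"]
    returns = returns.merge(returns_temp, on=["ticker", "date"])

# Drop rows without a full momentum history, then index by date and ticker.
returns = returns.dropna().set_index(["date", "ticker"])
```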
In this section, you'll train each of the requested algorithms and compare performance. Be sure to use the same parameters and training steps for each model. This is necessary to compare each model accurately.
Using the `results` DataFrame from Part 1, you'll preprocess your data to make it ready for machine learning.
- Generate your feature data (`X`) and target data (`y`):
  - Create a DataFrame `X` that contains all the columns from the returns DataFrame that will be used to predict `F_1_m_returns`.
  - Create a variable called `y` that is equal to 1 if `F_1_m_returns` is larger than 0. This will be our target variable.
- Use the `train_test_split` function to split the dataset into training and testing datasets, with 70% used for training.
  - Set the `shuffle` parameter to `False` to use only the first 70% of the data for training (this prevents look-ahead bias).
  - Make sure you have these 4 variables: `X_train`, `X_test`, `y_train`, `y_test`.
- Use the `Counter` function to test the distribution of the data. The result of `Counter({1: 668, 0: 1194})` reveals the data is indeed unbalanced.
- Balance the dataset with the `RandomOverSampler` library, setting `random_state=1`.
- Test the distribution once again with `Counter`. The new result of `Counter({1: 1194, 0: 1194})` shows the data is now balanced.
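A sketch of the split-and-balance flow on toy data. Note the notebook balances with imblearn's `RandomOverSampler`; to keep this sketch dependency-light, sklearn's `resample` stands in for it here, and the toy arrays stand in for the real momentum features:

```python
import numpy as np
from collections import Counter
from sklearn.model_selection import train_test_split
from sklearn.utils import resample

# Toy features/targets standing in for the momentum features and the
# binary "positive forward return" target from the notebook.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
y = (rng.random(100) > 0.65).astype(int)  # deliberately imbalanced

# shuffle=False keeps chronological order: the first 70% of the rows
# train the model, which prevents look-ahead bias.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.70, shuffle=False
)

counts = Counter(y_train)  # far more of one class than the other

# Dependency-light stand-in for imblearn's RandomOverSampler:
# resample the minority class (with replacement) up to the majority count.
minority = min(counts, key=counts.get)
majority = max(counts, key=counts.get)
X_min, y_min = X_train[y_train == minority], y_train[y_train == minority]
X_extra, y_extra = resample(
    X_min, y_min, n_samples=counts[majority] - counts[minority], random_state=1
)
X_resampled = np.vstack([X_train, X_extra])
y_resampled = np.concatenate([y_train, y_extra])
```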
With the data preprocessed, you can now train the various algorithms and evaluate them based on precision using the `classification_report` function from the sklearn library.
- The first cells in this section provide an example of how to fit and train your model using the `LogisticRegression` model from sklearn:
  - Import the selected model.
  - Instantiate the model object.
  - Fit the model to the resampled data, `X_resampled` and `y_resampled`.
  - Make predictions with the model using `X_test`.
  - Print the classification report.
- Use the same approach as above to train and test the following ML Algorithms:
- Use the classification reports to answer the following questions in a markdown cell:
  - Which model produces the highest Accuracy?
  - Which model produces the highest performance over time?
  - Which model produces the highest Sharpe Ratio?
- Using the classification report for each model, choose the model with the highest precision for use in your algo-trading program.
- Save the selected model with the `joblib` library to avoid retraining every time you wish to use it.
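On toy data, the fit/predict/report/save cycle might look like this (the arrays stand in for the real resampled and test splits, and `"best_model.joblib"` is an illustrative file name):

```python
import joblib
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report

# Toy balanced training data and a held-out test split (stand-ins for
# X_resampled / y_resampled / X_test / y_test from the notebook).
rng = np.random.default_rng(0)
X_resampled = rng.normal(size=(200, 3))
y_resampled = (X_resampled[:, 0] + 0.1 * rng.normal(size=200) > 0).astype(int)
X_test = rng.normal(size=(60, 3))
y_test = (X_test[:, 0] > 0).astype(int)

model = LogisticRegression()          # 1-2. import and instantiate
model.fit(X_resampled, y_resampled)   # 3. fit on the resampled data
y_pred = model.predict(X_test)        # 4. predict on the test split
print(classification_report(y_test, y_pred))  # 5. precision per class

# Persist the winning model so it need not be retrained every run.
joblib.dump(model, "best_model.joblib")
model = joblib.load("best_model.joblib")
```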
In this final section, you'll create your algo-trading program by pulling live data at the minute frequency and ensuring that the model is buying the selected stocks. If you need a refresher on methods available using the Alpaca SDK, you can view their docs here.
- Use the provided code to ping the Alpaca API and create the DataFrame needed to feed data into the model.
  - This code will also store the correct feature data in `X` for later use.
- Using `joblib`, load the chosen model.
- Use the model file to make predictions:
  - Use `predict` on `X` and save the result as `y_pred`.
  - Convert `y_pred` to a DataFrame, setting its index to the index of `X`.
  - Rename the column `0` to `'buy'`; be sure to set `inplace=True`.
- Filter the stocks where `'buy'` is equal to 1, saving the filter as `y_pred`.
- Using the `y_pred` filter, create a dictionary called `buy_dict` and assign `'n'` to each ticker (key value) as a placeholder.
- Obtain the total available equity in your account from the Alpaca API and store it in a variable called `total_capital`. You will split the capital equally between all selected stocks per the CIO's request.
- Use a for-loop to iterate through `buy_dict` to determine the number of shares you need to buy for each ticker.
- Cancel all previous orders in the Alpaca API (so you don't buy more than intended) and sell all currently held stocks to close all positions.
- Iterate through `buy_dict` and send a buy order for each ticker with its corresponding number of shares.
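The capital-splitting and order-submission steps might be sketched as below. `allocate_shares` and `place_orders` are illustrative helper names, not from the starter code; the `cancel_all_orders`, `close_all_positions`, and `submit_order` calls are methods of the `alpaca_trade_api.REST` client, and `place_orders` is not executed here since it needs live paper-trading credentials:

```python
import math

def allocate_shares(total_capital, latest_prices):
    """Split the account equity equally across the selected tickers and
    convert each slice into a whole number of shares.

    Both this function and the `latest_prices` dict (ticker -> last
    price) are illustrative helpers, not part of the Alpaca SDK.
    """
    if not latest_prices:
        return {}
    per_stock = total_capital / len(latest_prices)
    return {t: math.floor(per_stock / p) for t, p in latest_prices.items()}

def place_orders(api, buy_dict):
    """Order flow against an `alpaca_trade_api.REST` client (sketch only,
    not executed here)."""
    api.cancel_all_orders()     # drop stale orders so we don't over-buy
    api.close_all_positions()   # flatten existing positions first
    for ticker, qty in buy_dict.items():
        if qty > 0:
            api.submit_order(symbol=ticker, qty=qty, side="buy",
                             type="market", time_in_force="gtc")
```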
To automate the algorithm, you'll combine all the steps above into one function that can be executed automatically with Python's `schedule` module. For more information on this module, you can view the docs here.
- Make a function called `trade()` that incorporates all of the steps above. Note: the data cleaning and calculations section from earlier has already been incorporated. Your task is to complete the function starting where you see `# YOUR CODE HERE`.
- Import Python's `schedule` module.
- Use the `schedule` module to automate the algorithm:
  - Clear the schedule with `.clear()`.
  - Define a schedule to run the trade function every minute at 5 seconds past the minute mark (e.g. `10:31:05`).
  - Use the Alpaca API to check whether the market is open.
  - Use the `run_pending()` function from `schedule` to execute the schedule you defined while the market is open.
Experiment with the model architecture and parameters to see which provides the best results, but be sure to use the same architecture and parameters when comparing each model.
- Complete the starter Jupyter Notebook for the homework and host the notebook on GitHub.
- Include a README.md that summarizes your homework, and include this report in your GitHub repository.
- Submit the link to your GitHub project to Bootcamp Spot.
- Save a PNG image of the cumulative return plot that shows the actual returns vs. the strategy returns. (8 points)
- Slice the training dataset into different time periods in order to tune the model by adjusting the dataset size. (8 points)
- Change one or both window sizes in order to tune the model by adjusting the SMA input features. (8 points)
- Save a PNG image of the cumulative product of the actual returns vs. the strategy returns. (8 points)
- Import a new classifier. (9 points)
- Fit the new classifier using the original training data. (9 points)
- Backtest the new model. (11 points)
- Save a PNG image of the cumulative product of actual returns vs. strategy returns. (9 points)
- Place imports at the beginning of the file, just after any module comments and docstrings and before module globals and constants. (3 points)
- Name functions and variables with lowercase characters and with words separated by underscores. (2 points)
- Follow Don't Repeat Yourself (DRY) principles by creating maintainable and reusable code. (3 points)
- Use concise logic and creative engineering where possible. (2 points)
- Submit a link to a GitHub repository that’s cloned to your local machine and contains your files. (5 points)
- Include appropriate commit messages in your files. (5 points)
- Be well commented with concise, relevant notes that other developers can understand. (10 points)
© 2021 Trilogy Education Services, a 2U, Inc. brand. All Rights Reserved.