
mergerarbitrage's Issues

Proposal Peer Review

This project involves predicting the success of a merger as well as the best investment strategy. The project will look at data containing information about previous merger strategies and outcomes, as well as price and valuation data. The goal is to learn which features are predictive of merger success or failure, and further, to use the model to create a trading strategy around the merger.

Strengths:
-The application of this project is clear; many companies would find great use in this model if it proves successful.
-The extension of determining an optimal trading strategy given the merger success prediction is creative and seems useful.
-Multiple datasets will be used to extract information, which will add to the strength of the model.

Concerns:
-The input features to be drawn from the datasets could have been explained in more detail.
-For the second part of the project, I am unsure what exactly "determining the optimal trading strategy" entails.
-Why do you plan on restricting the data to only include public entities as targets?

Final review by Wenchang Yang

This project tries to predict whether a company's acquisition case will succeed or fail, which, according to the report, reduces to a classification problem. The idea is convincing and clearly founded. Potentially, the results of this project could help with the decision-making process of an acquisition case.

Looking into the details, however, some parts of the report could be further improved.
First, in the report's exploratory data analysis section, the authors put many details into explaining the sources of the data sets and their missing-value problems. The model selection and model interpretation sections, however, did not provide a thorough explanation or mathematical interpretation of why they chose those particular models (logistic regression, random forest, etc.) to perform the analysis, or how the models' results reflected the nuances between features. All four models' results were given interpretations of 3-4 sentences without detailed explanation, which made the report a little hard to follow.

Second, the practice of putting the model results at the end of the report created difficulties for readers. I would suggest the authors guide readers through the report with both mathematical explanation and text interpretation instead of making readers jump around the report.

To summarize, this project is built on an enlightening idea and a convincing starting point. If the authors could give a more thorough explanation of the models' results, this project would be helpful to the acquisition decision-making process.

Peer Review

The goal of this project is to determine a merger’s probability of success and then determine an optimal trading strategy. This involves determining which features contribute to the success or failure of pending mergers.

The project proposal does a great job of outlining the high-level attributes of the model and how the group is going to achieve its goal. The model will integrate multiple datasets, which I think is a big plus! Overall, the application and benefits are clear.

My question to the group is: what is considered an optimal trading strategy? Furthermore, what are some features you intend to use in your model? The objective and project question are very concrete, but the project data is very vague, especially to someone without a strong background in investment strategies and finance.

Best of luck with your project!
Artina

Midterm Peer Review - kz233

What's working:
This midterm report is very thorough and solid. It hits almost all of the required guidelines. I like that you provided good context for the project, concisely summarizing the motivations, especially for someone with little finance knowledge. You also did a good job describing your data sources: both their origin and all the decisions behind collapsing categorical variables into fewer levels. Your table in the appendix with the description of each variable was nice and informative. You've also made thoughtful progress on model development, having already tried both logistic regression with regularization and random forests.

Some things to work on:

  1. More informative visuals, perhaps showing relationships between variables, instead of just histograms that give a high-level overview of how each variable is distributed.
  2. You mention "the table below" but I think it got moved to the appendix.
  3. Also, you say how they're separated into three categories, but I don't see that specified.
  4. You're aware of this/make note of it in the report, but maybe there's a better way to address the NaNs?
  5. Maybe include a performance section comparing how well your different models are working.

Overall this was very nicely done.
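To make the first suggestion concrete: a correlation matrix (or a scatter of two features colored by outcome) shows relationships that per-variable histograms cannot. A minimal sketch using synthetic data and made-up feature names (the group's actual variables are not shown in this review):

```python
# Sketch only: synthetic data and illustrative column names, not the
# project's real features.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({"offer_premium": rng.normal(size=100)})
# Fabricate a second feature correlated with the first
df["implied_vol"] = 0.8 * df["offer_premium"] + rng.normal(scale=0.3, size=100)

# A pairwise correlation matrix; feed this to a heatmap instead of
# plotting one histogram per variable
corr = df.corr()
print(corr.round(2))
```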

Peer Review by Tiansheng Liu

This project involves research on firm mergers. The group is trying to predict whether a merger will be successful and to derive a strategy based on this prediction. They are going to use data about firm financials, stock prices, and other relevant data from SDC Platinum, CRSP and COMPUSTAT.

Strengths:

  1. This is a great potential application of machine learning in the financial industry. If the strategy succeeds, it will offer profitable results with low risk. It is also highly related to the coursework in this class; some models and methods discussed in class, such as linear regression and classification, may be used in this project.
  2. This project uses multiple databases, including SDC Platinum, CRSP and COMPUSTAT. It is certainly good for research to be based on more exhaustive data, but this also raises the risk of overfitting and forces choices about which information to use, which makes the project more interesting and complex.
  3. The proposal itself is well organized and makes what the group wants to do very clear.

Concerns:

  1. A whole host of investors and professionals already study these events. The gap between the no-arbitrage price and fair value may have become too small to generate a profitable strategy.
  2. Also, investment banks sometimes profit from event-driven trades partly because of asymmetric information: they have more resources and information than ordinary investors. Competing with them is another difficulty for this project.
  3. Personal factors in a merger. Sometimes the leaders of both participants are crucial to the merger; after all, the final decisions are made by them. However, this factor is difficult to take into account in the model.

Looking forward to your progress on the project. I hope my suggestions help a little.

Midterm report peer review by jz552

I like this project about using data on the target company and acquirer to predict whether a merger will succeed. It is original, and the results of this study could be used to predict the stock market price of the target company, as shown in section 1. The data the team has collected comes from SDC Platinum, a database for corporate finance, and there are many methods that can be used to learn from the data. I think the team has a good method for preprocessing the data: simplifying the status column and removing the rows with no 'Price Per Share'. I also like that the team has chosen a benchmark for comparing the models and discards the models that can't beat it. Another good point made in this report is showing which features are important in the random forest classifier. I think finding the most important features will help in developing more robust models.

The team could improve on dealing with NaNs. For example, if 'Percent of Shares Acq.' is not an important feature, it could be dropped from the X matrix, resulting in 28 features; I think this is a good way to deal with having 1600 data points with NaN for 'Percent of Shares Acq.'. Another place for improvement is the feature importance graph: I think it would be great to see what these features are. Currently, I only know what the most important feature is, but the team could include more information on the second through fifth most important features, as they are also interesting to know. Finally, I think the team could explain more in the future steps section. It would be great to know whether the team is going to try other models like linear models, cubic models, or neural networks.
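The suggestion about labeling the feature-importance graph can be sketched in a few lines. This is a hypothetical illustration with synthetic data and placeholder feature names standing in for the group's variables (scikit-learn and pandas assumed):

```python
# Sketch: naming the top-5 features by random-forest importance.
# Data and column names are made up, not the group's actual features.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
cols = [f"feat_{i}" for i in range(8)]          # placeholder names
X = pd.DataFrame(rng.normal(size=(300, 8)), columns=cols)
y = (X["feat_2"] + X["feat_5"] > 0).astype(int)  # synthetic success labels

rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Pair each importance with its feature name and keep the top 5
top5 = (pd.Series(rf.feature_importances_, index=cols)
          .sort_values(ascending=False)
          .head(5))
print(top5)
```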

Final Report Peer Review - Siyao Gu (sg2238)

The project aims at predicting the outcome of pending mergers in order to capture the opportunity of investing in merger arbitrage. The data came from SDC Platinum, the Fama French Database, Compustat and OptionMetrics, ranging from 1990 to 2016. The group made use of this 8963x73 feature matrix, mainly focusing on the "Status" column as the dependent variable for a binary classification problem. They first tried logistic regression with an L1 penalty and eliminated some features via the Lasso. They then implemented a simple decision tree, random forest and nearest neighbors, but the overall results turned out to be disappointing. However, they believed that sentiment analysis could potentially improve the model, and therefore the topic would be worthy of further research.
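For readers unfamiliar with the Lasso step described above, here is a minimal sketch of L1-penalized logistic regression used for feature elimination. It runs on synthetic data standing in for the group's 8963x73 matrix; scikit-learn is an assumption, since the report's tooling is not stated here:

```python
# Sketch of L1-penalized logistic regression as a feature selector.
# X, y, and all sizes are synthetic stand-ins.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))                       # fake feature matrix
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)        # fake Completed/Failed

# liblinear supports the L1 penalty; smaller C means stronger sparsity,
# so uninformative features get exactly-zero coefficients
clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
clf.fit(X, y)

kept = np.flatnonzero(clf.coef_[0])                  # surviving features
print(len(kept), "of", X.shape[1], "features survive the L1 penalty")
```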

Three things I like about the report:
• The group had a deep sense of what was going on in the topic. To predict the outcome of a merger, they considered features in four categories: valuation, deal characteristics, volatility variables and time series variables, which together captured the nature, volatility and future movement of the deals. Also, compared to other groups, they made use of more than one database for data collection, which significantly decreased the possibility of underfitting in their models.
• They took clever approaches to deal with NA values in the data. The fact that 1600 out of 8963 data points included NA values definitely posed a challenge to the group, but they tried 3 useful ways to fill the NA values effectively instead of deleting all of them. And their final decision to fill NAs with 0 to avoid the look-ahead issue did make financial sense.
• They closely followed the principle of avoiding look-ahead bias throughout the whole project, and that was truly meaningful. I particularly like their decision when splitting the data. A simple way to separate the data into training and test sets would be to split all the data randomly, but they thought one step ahead, considering the nature of their dataset and splitting the data along the timeline to eliminate look-ahead bias. The group did not find a solid model with an extremely high prediction accuracy in the end, but the details revealed the reliability of their research and how much they cared about the topic.

Three things that may need improvement:
• The benchmark the group set is debatable. They mentioned in the report that the original success rate of 83.6% was the benchmark they used, which was significantly high and thus hard to outperform. I have two questions about this. Firstly, I did not understand how the benchmark was generated, even after checking the group's proposal and midterm report. Secondly, even setting aside how the benchmark was generated, I believe a false-positive rate or a false-negative rate would be a better benchmark, since the problem itself is a classification problem.
• There could be more intuition written into the Model Selection and Analysis section. The group tried 4 models, but I did not get the connection or the intuition behind them. For instance, the results of logistic regression were not satisfying, as mentioned; so what was the idea behind using decision trees afterwards, and why did the group use nearest neighbors in the end? The report would be better if the group clarified the motivation for moving from one model to another.
• The last issue is the relationship between the deliverables of the report and the group's overall topic. Since the report also mentions that the benchmark was very high and it could be tough to beat the market, the opportunity to capture the arbitrage might be even slimmer than we thought in a real setting. Therefore, I believe developing a trading strategy merely by predicting the success/failure of a merger is not enough; even a model with high prediction accuracy may not directly lead to a robust trading strategy. But overall, the group has done an excellent job, and the topic is truly worthy of studying.
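The false-positive-rate point can be made concrete: a classifier that blindly predicts "success" matches an 83.6%-ish base rate in accuracy while misclassifying every failed deal. A small sketch with synthetic labels (scikit-learn assumed; the 84/16 split is illustrative):

```python
# Sketch: why accuracy alone is a weak benchmark at an 84% base rate.
# Labels are synthetic, roughly matching the report's 83.6% success rate.
import numpy as np
from sklearn.metrics import confusion_matrix

y_true = np.array([1] * 84 + [0] * 16)   # 1 = completed, 0 = failed
y_pred = np.ones(100, dtype=int)         # the "always Completed" benchmark

tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
accuracy = (tp + tn) / 100
fpr = fp / (fp + tn)                     # failures wrongly called successes
print(f"accuracy={accuracy:.2f}, false-positive rate={fpr:.2f}")
```

High accuracy with a 100% false-positive rate is exactly the case the review warns about: every failed merger, i.e. every potential loss, goes undetected.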

Final Report Review - David Zhao (dfz8)

Hi,

I think the topic of predicting merger success is very interesting, both for what it reveals about successful/failed mergers and for its applications in the field of finance. After reading your report, I have the following feedback:

What you did well:

  • The paper was neatly sectioned and utilized an appendix to de-clutter the main report with tables and large figures.
  • I like how you guys tried out a lot of different types of models to see which one would work, instead of just one model with different parameters and regularizers.

Things to improve on:

  • Overall the report was very terse and would benefit a lot from explaining more of your approach (e.g. why you could group all non-"Completed" and non-"Pending" statuses as "Failed", or why you chose 0 as your NA value). Furthermore, more discussion of why you chose certain models and their parameters would be enlightening and would also help the reader recreate your results.
  • Even though you guys made several nice plots, there was no analysis of them: why was it important for you to include those plots? Even a simple figure label with a short caption would go a long way!
  • I think all of the foreshadowing about future work in the paper should be moved to the final section. That way your main report talks about what you've done without any distractions, and your future work section can be more fleshed out.

Peer Review by Junqing Zou (jz862)

The project aims to create a model to predict whether a pending merger will be successful or not. The group used a data set from SDC Platinum covering the years 1990 to 2016 and only included public-entity targets.

Things I like about the report:

  1. The group described clearly how they cleaned the data (simplifying multiple statuses into only two, deleting samples with missing or extremely low share prices, etc.).
  2. The group used appropriate models to tackle the binary classification problem, such as logistic regression, a simple classification tree, and a random forest.
  3. The group set a reasonable benchmark to evaluate the models (comparing with blindly guessing "Completed").

Areas for improvement:

  1. Maybe try interpolation to fill the NaNs rather than filling with zeros.
  2. It would be better if there are more illustrations on the graphs. For example, what are the features in the feature importance plot?

Midterm Peer Review - jz858

Summary:
The purpose of this project is to create a model that can accurately predict the success/failure of pending mergers, so that it can help companies develop investment strategies. The data set is obtained from SDC Platinum and includes target data from 1990 to 2016.

Likes:

  1. The introduction is nicely written, with an illustration of the background and purpose of the project.
  2. The model selection and analysis part describes the pros and cons of every model the group used, which is great.
  3. It is good to have a feature description table in the appendix, which makes the data clearer and easier to understand.

Improvements:

  1. The figures are hard to read, especially the decision tree in the model selection part. It would be better to add a figure legend or a descriptive paragraph under the figures.
  2. It is mentioned in the data cleaning part that all values of "Status" are treated as "Failed" except the two labeled "Completed" and "Pending". But what if some of the cases with status "Intended" or "Part Comp" could still be completed? Maybe it would be better to show that the percentage of such cases is very low, so that this cleaning method is supported.
  3. There is no project summary in the report. It would be better to have a short paragraph that summarizes the results next time.
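The check suggested in point 2 is cheap to run. A sketch in pandas: the status labels come from the report, but the counts below are made up for illustration:

```python
# Sketch: how rare are the ambiguous statuses being collapsed to "Failed"?
# The proportions here are fabricated, not the report's actual counts.
import pandas as pd

status = pd.Series(["Completed"] * 80 + ["Pending"] * 10
                   + ["Intended"] * 3 + ["Part Comp"] * 2
                   + ["Withdrawn"] * 5)

# Share of deals whose true outcome the "Failed" relabeling might get wrong
ambiguous = status.isin(["Intended", "Part Comp"]).mean()
print(f"{ambiguous:.1%} of deals have an ambiguous status")
```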

Jiang Zhu

Final Peer Review - Meiyi Li (ml2549)

The project aims to predict whether a merger will be successful, using merger-related data such as implied volatility, offer price, whether the acquirer is experienced... etc. The motivation for the project is the belief that there is a stock price difference pre-merger and post-merger, and that if one can predict merger success, one can take a position in the market pre-merger and exit post-merger to gain from the arbitrage.

What impresses me about the project is the way they dealt with the data. First of all, the data was split into pre-2008 and post-2008 portions as the training and test sets. They did this to avoid look-ahead bias, the bias that arises when future information leaks into patterns learned from the past. While they did a good job of recognizing and eliminating this potential bias, a new potential problem was introduced: merger behavior changes over time, and past patterns may not generalize to the future. Besides the way they split the data, I am also impressed that the project finally picked 0 to fill in all NA data. This made me think that filling missing data with the mean/median/PCA may actually introduce noise into the system. I wonder whether the features with 0 as missing data were dropped by the L1 selection. Last, the report is simple and easy to read, with good usage of tables and charts.
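The time-based split described above can be sketched in a few lines of pandas. The column names and the tiny table are illustrative stand-ins for the group's data:

```python
# Sketch of a look-ahead-safe split: train strictly on deals announced
# before 2008, test on the rest. Data is fabricated for illustration.
import pandas as pd

deals = pd.DataFrame({
    "year":   [1995, 2001, 2007, 2009, 2012, 2016],
    "status": [1, 1, 0, 1, 0, 1],        # 1 = completed, 0 = failed
})

train = deals[deals["year"] < 2008]      # fit models only on the past
test  = deals[deals["year"] >= 2008]     # evaluate on unseen later deals
print(len(train), "training deals,", len(test), "test deals")
```

Unlike a random split, this guarantees that no post-2008 information can influence a model evaluated on post-2008 deals.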

There are several things I have questions about or am interested in knowing more about. First, I would like to see the training and test results of logistic regression and the decision tree with mean, median, and PCA as the data-filling methods. Second, I don't think failing to beat the benchmark means we can't capture the arbitrage. Does your model always succeed in predicting the mergers that bring the most profit? And does your model always help you prevent large potential losses? Instead of focusing on the overall success rate, the project could focus on the success rate for mergers that bring large potential profit/loss. By predicting the significant events right, the strategy may still be able to outperform the strategy that just guesses 'success' for every merger. Last, for the report, it would be better if you could write out the methodology for each model you used and provide a brief explanation.

Peer Review

The proposed project is to predict whether a merger will succeed or fail, based on data about the two companies (the acquirer and the target). The goal is to know whether to buy or short the target company's stock. The training data will be sourced from SDC Platinum, CRSP, and COMPUSTAT.

What I liked:

  1. The problem being tackled is clearly interesting. Our investment company would benefit greatly from having a strong predictive model for merger success. We could make a great deal of money from merger arbitrage.
  2. The authors seem well informed about the problem domain. Their summary of the problem was well written and indicates that they have an idea of which features would be especially predictive.
  3. There are multiple proposed databases. This can be helpful for sourcing features from many aspects of the companies involved in the merger.

What can improve:

  1. The proposed datasets are absolutely massive in width. The process of feature engineering from income statements, balance sheets, etc. may take a long time. Perhaps it's worth limiting the scope of the features being looked at, in order to finish the project in a reasonable time.
  2. The proposed datasets are also massive in depth. The number of mergers and companies is huge. It may be worth limiting the types of companies looked at to a smaller scope (e.g. startups only).
  3. The first paragraph mentions that in practice, much predictive value comes from reading the press releases and news around a merger. If these features are not something you intend to pursue, it is worth mentioning that in the writeup.

Peer Review from Zhiwei Zhou

The goal of this project is to predict whether a pending acquisition is going to succeed. So far, they tried Decision Tree model and Random Forest model.

Pros:

  1. They picked appropriate models, because the outcome is a True/False variable.
  2. Using the implied volatility of the company's options as a feature is a good idea.

Advice:

  1. Maybe try interpolation when dealing with missing data.
  2. Maybe show some statistics for the fitted models.

Final Peer Review (wl596)

The project aims to build a model to predict the success/failure of pending mergers, which can be used in merger arbitrage. The data set comes from a variety of sources, including SDC Platinum, the Fama French Database, Compustat, and OptionMetrics.

Things I like:

  1. The group made great efforts on data collection, since their data came from a variety of databases. As a result, compared to other groups, they had more features to fit their models.
  2. The group did a great job in data preprocessing, like simplifying their problem from a multiclass classification to a binary classification, and removing data points that lacked the Price per Share feature.
  3. The group tried many techniques we learned in class to build models. And they also had great ideas about future improvement.

Things that may need improvement:

  1. Filling NAs with 0 may be inaccurate, and may waste the information in the non-NA entries. I do agree with the group's concern about avoiding look-ahead bias; however, the group may try to implement imputation and other methods in a way that avoids look-ahead bias. For example, to fill an NA at time T, use the mean of the non-NA values from time 0 to T-1.
  2. The group tried 4 different models to fit the data; however, I could not quite understand the underlying reasons for choosing the fourth model, i.e. the nearest neighbor classifier. The group may add more explanations about the intuition for choosing this model.
  3. The analysis of the model results may be a little short.
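The expanding-mean idea in point 1 can be sketched in a few lines of pandas. The series below is made up; the point is that each fill uses only values strictly before time T:

```python
# Sketch of look-ahead-safe imputation: fill each NaN with the mean of
# earlier observations only. Data is fabricated for illustration.
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0, np.nan, 5.0])   # time-ordered feature

# shift(1) ensures the mean at time T sees only values from 0..T-1
past_mean = s.shift(1).expanding().mean()
filled = s.fillna(past_mean)
print(filled.tolist())   # -> [1.0, 1.0, 3.0, 2.0, 5.0]
```

Unlike filling with the full-sample mean, no value from the test period ever influences an earlier fill.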
