silviaruiz44 / hranalytics

Jupyter Notebook 99.99% Julia 0.01%

hranalytics's Issues

Midterm peer review zl683

This project aims to predict whether an employee is likely to leave the company using data from three different sources: an employee survey, an employee database, and a direct manager survey. The data are from three different departments within the company.

Three things I like about the report:

The topic itself is very realistic, and I can imagine that companies would find this model very useful because recruiting costs are very high. The report is very well written and detailed. The structure is clear, the layout allows more information, and there are many figures and charts illustrating the points.
Logistic regression is a suitable model for this project. The whole preliminary analysis process is quite clear and standard: exploratory data analysis, model fitting, train-test split and confusion matrix.
Exploratory data analysis is very detailed. Data types are listed, and statistics are given for each feature. This is also used to determine how big a role a given feature is going to play in the prediction.

Three areas for improvement:

A correlation plot could be added to further explore the relations between different features. After looking into each feature one by one, it is necessary to see how they interact with each other. For example, if you see that two features are highly correlated, you may consider dropping one of them.
Exploring more models. Since at this stage you are doing preliminary data analysis, you can try more models and pick one with good performance or higher interpretability. For example, if we picture using this model in a real-world scenario, explaining logistic regression to non-technical staff may take some time, and a decision tree could be more easily understood.
Considering applying this model in real life, a false positive and a false negative could call for completely different actions from a company, and their costs are different. It’s good that you emphasized this. However, the confusion matrix looked a little confusing. From the table I cannot tell whether the columns or the rows are the predictions.
Inspired by one of our homeworks, I think comparing department models and a company model would be interesting because models for different departments might be very different.
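The correlation check suggested in the first improvement point can be sketched in a few lines. This is only an illustration under stated assumptions: the column names and values are made up (not taken from the project's data), and the 0.9 threshold is an arbitrary choice.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def highly_correlated_pairs(columns, threshold=0.9):
    """Return (name_a, name_b, r) for every pair with |r| >= threshold."""
    pairs = []
    for (name_a, col_a), (name_b, col_b) in combinations(columns.items(), 2):
        r = pearson(col_a, col_b)
        if abs(r) >= threshold:
            pairs.append((name_a, name_b, round(r, 3)))
    return pairs


# Hypothetical toy columns, invented for this example.
toy = {
    "years_at_company": [1, 3, 5, 7, 9],
    "years_with_manager": [1, 2, 5, 6, 9],
    "distance_from_home": [12, 3, 25, 8, 4],
}
flagged = highly_correlated_pairs(toy)
for name_a, name_b, r in flagged:
    print(f"{name_a} and {name_b} are highly correlated (r = {r})")
```

Each flagged pair is a candidate for dropping one of the two features; which one to keep is a modeling decision the code does not make.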

Peer review - yz2522

Summary:
The team is trying to figure out why people leave their companies. The dataset is from Kaggle and contains information regarding job level, work-life balance, etc. The objective of the team is to find out why people are leaving their companies and what can be done to make employees stay.
Three things I like
(1) The theme and objective seem close to students' lives, especially for those who will graduate soon. This could help students better understand their job choices.
(2) The dataset is quite large, which makes it seem reliable.
(3) The dataset is listed for the reviewer to check.
Three improvements
(1) Better to split different topics as single paragraphs.
(2) Better to add more information about background research.
(3) Better to dig a little bit into the data and find some other interesting objectives.

Peer Review - hg426

Summary: The project team is trying to model the HR aspect of a company, in particular employee turnover, using some workplace data. They will use an HR Analytics dataset from Kaggle, where the dependent variable is whether the employee left the company.

Pros:

Report is very succinct. I got your objective very well.
Interesting topic - could really help a company keep its talent.

Cons:

The report lacks background. Why is the research necessary?
There should be more information on the data. How was it obtained?
What is the dependent variable in your research?

Peer Review - fx43

Summary: This project studies the factors that influence the turnover rate in companies. The data they are using is the Kaggle HR-Analytics dataset. Their objective is to predict employees' decisions to leave from the given factors within the dataset.

Three things you like about the proposal:

The project has an obvious significant background in studying the factors influencing employees' choices.
The project considers factors from the employees' side.
The project has clear labels so supervised learning will be practical.

Three areas for improvement:

The question the project proposes is so broad that the dataset may not cover the entire picture. For example, different industries have different factors influencing turnover rates. Sometimes it is the company's performance that makes the turnover rate change over time. These factors may need to be taken into account.
The result of the project may not reveal the actual reasons why people leave, since HR may not receive genuine responses or feedback from previous employees about why they left. The team needs to consider the quality of the dataset to make sure it can address this concern.
More specific analysis techniques could be mentioned, such as logistic regression as a baseline case and others for comparison, so that the proposal becomes more concrete and well planned.

Final Report Peer Review

This HR Analytics project attempted to predict the attrition of workers based on several features for each employee. Their project was looking to predict a boolean variable for attrition, encoded as -1 or 1. Their initial feature analysis was comprehensive, and they did a good job finding the correlations between variables and identifying useful techniques to handle overfitting and underfitting. Even though their project only attempted to predict attrition, it would've been interesting to also see if they could build a model that predicted how long an employee would remain in the company. The errors in the model seem to be pretty acceptable, but it is not clear how the test set was created and how certain they were that the model was not overfitting to the test set.
Their recommendations and conclusions are comprehensive. They take into account all the variables, and their final product would be quite useful for their client. With some more testing, I would personally recommend this model to their client. It would have also been nice to see a little more interpretation of why they initially chose the models they did, and how they think each regularizer affected the accuracy of each model. Overall, a great project with a very useful premise and good application.

Peer review by zl722

The project seeks to find the reasons behind employee turnover. The data is "HR Analytics Case Study" from Kaggle.com. The project is very meaningful and useful, since employers can use the result to provide a better working environment for employees and thus reduce turnover rates.

Likes:

The project discusses a contemporary problem, which concerns many millennials like us.
There are multiple tables with multiple different columns, which makes this project interesting. There are many factors/variables that can be worked on.

Improvements:

Employees may leave their current companies for different reasons (dislike of the current company, a better opportunity elsewhere). What does their turnover indicate?
It is mentioned that "relationship with manager" is part of the predictor variables; how to quantify this?

Final Peer Review-jvg28

The goal of this project is to predict whether an employee is likely to leave a company the following year. The dataset that this team is using is from a Human Resources department and is made up of 3 different sources: an employee survey, general information about each employee together with the response variable attrition, and the results of a manager survey. To get the final dataset that they worked on, the team merged all the data sources by employee ID, which left them with 4,410 observations and 28 features to evaluate.
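The merge by employee ID described above could look roughly like the following sketch. It assumes an inner join (only employees present in all three sources are kept); the report does not say which join the team used, and the table names, column names, and values here are hypothetical.

```python
def merge_by_employee_id(*tables):
    """Inner-join several {employee_id: {column: value}} tables on the ID."""
    shared_ids = set(tables[0])
    for table in tables[1:]:
        shared_ids &= set(table)

    merged = {}
    for emp_id in sorted(shared_ids):
        row = {}
        for table in tables:
            row.update(table[emp_id])  # later tables add their own columns
        merged[emp_id] = row
    return merged


# Hypothetical toy sources, invented for this example.
employee_survey = {1: {"job_satisfaction": 4}, 2: {"job_satisfaction": 2}}
general_info = {1: {"age": 41, "attrition": 0}, 2: {"age": 29, "attrition": 1},
                3: {"age": 35, "attrition": 0}}
manager_survey = {1: {"performance_rating": 3}, 2: {"performance_rating": 4}}

final_dataset = merge_by_employee_id(employee_survey, general_info, manager_survey)
print(len(final_dataset), "observations")
```

Employee 3 appears in only one source, so the inner join drops that row. A different join type would keep it with missing values, which then have to be imputed.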

The first thing that stuck out to me was how nice the format of the project paper was; the first page seemed so official, as if it was from Cornell itself. The team visualized the data nicely, so the reader gets a better idea of what the data is portraying. The methods that they used to compare the different models that they ran on the data were useful for identifying which model classified their data the best, which was the Random Forest model. To conclude, this project was done very nicely and it provided key insights into which factors will help an employee stay in a company.

Final Report Peer Review

The project seeks to classify employees as likely to leave the company based on their employee survey and department reviews. The data comes from one company’s employee surveys, employee demographics, and the results of a manager survey evaluation. The project will help the company identify which features are most important in determining if an employee will stay, so they know what to improve on in the company.

Likes:
-- I think balancing the data set was a good idea to make sure that the model trains on both employees that leave and those that stay. I liked the discussion about how you picked between oversampling and undersampling because it showed that you thought about the consequences of each choice.
-- I like your graphs in the exploratory analysis because they show the distributions of many features overlaid together, showing how they may affect each other and giving an idea of the importance of different features.
-- I liked that you interpreted each model. I think that this shows the merits of each model and shows that you thought about more than minimizing error when picking a model.
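For readers unfamiliar with the balancing step praised in the first point, here is a minimal sketch of random oversampling. It is not the team's implementation: the function name, the toy data, and the fixed seed are assumptions made for illustration.

```python
import random


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until classes are equal."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)

    target = max(len(group) for group in by_class.values())
    out_rows, out_labels = [], []
    for label, group in by_class.items():
        # Sample with replacement to fill the gap up to the majority size.
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for row in group + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


# Hypothetical toy data: 6 employees stayed (0), 2 left (1).
rows = [[25], [31], [40], [38], [52], [47], [23], [29]]
labels = [0, 0, 0, 0, 0, 0, 1, 1]
balanced_rows, balanced_labels = random_oversample(rows, labels)
print(balanced_labels.count(0), balanced_labels.count(1))
```

Undersampling is the mirror image: it discards majority-class rows instead of duplicating minority-class ones. Either way, resampling is normally applied to the training split only, so duplicated rows cannot leak into the test set.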

Suggestions:
-- I do not agree that employees that do not answer a question would have given a bad rating. Many people do not rate things because they do not care enough to answer and provide good feedback. I think an external source would have helped me believe this.
-- Labels on the correlation chart would have helped me understand the correlation between features and the value of the correlation.
-- I think that a fairness metric, like parity, could have been useful in your report because it would show the current biases in the model.
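One way to read the parity suggestion is as a demographic parity check: compare the share of employees predicted to leave across groups. The sketch below assumes binary predictions and a single group attribute; the group labels and predictions are invented for illustration.

```python
def demographic_parity_difference(predictions, groups):
    """Return (gap, rates): the largest difference in positive-prediction
    rate between any two groups, and the per-group rates."""
    by_group = {}
    for pred, group in zip(predictions, groups):
        by_group.setdefault(group, []).append(pred)
    rates = {g: sum(preds) / len(preds) for g, preds in by_group.items()}
    gap = max(rates.values()) - min(rates.values())
    return gap, rates


# Hypothetical predictions (1 = predicted to leave) for two made-up groups.
predictions = [1, 0, 1, 1, 0, 0, 0, 1]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
gap, rates = demographic_parity_difference(predictions, groups)
print(rates, gap)
```

A gap near zero means the model flags the groups at similar rates. It says nothing about whether the flags are correct, so it complements error-rate comparisons and does not replace them.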

Peer Review -sns224

The NYPD Arrests Dataset project proposal examines crime from 2006-2018 in NYC. The dataset is called the NYPD Arrests Data and is owned by NYC OpenData. It is updated every quarter by the NYPD, so it is a reliable data source with very frequent updates. It contains 18 columns with over 4 million records. The group hopes to find some correlation among economic prosperity, demographics, and crime over the time period, since the dataset contains data from the financial crisis up until a year ago, over which they presume New York City has been increasing in prosperity (financial wellness). Overall, they want to analyze crime records and ascertain how the type of crime has changed over the past 13 years.

One thing I liked about the proposal is that there is a lot of data and strong features that provide a ton of room for exploring patterns and hypothesizing. I also like one part of the objective which is exploring the crime demographics to potentially find certain areas getting better or worse, or certain races doing better than others crime wise. Lastly, I like the reliability of the data source since it is professionally maintained by NYPD which adds to the accuracy of findings.

One area for improvement is the clarity of the objective itself. There seems to be a general objective of finding out how crime has changed overall during the time period, but then the proposal also mentions that the dataset contains data from 2008 (the financial crash), which can show how an increase in prosperity has changed the type of crime.
Another area for improvement is to identify what features can be used for attacking the objective. Lastly, you should add something about what you are going to try to predict. Besides exploring the data, what specific questions can you explore using an input space and corresponding output?

Final peer review aa2686

The goal of this project is to predict attrition in employees and the factors that most influence an employee leaving the company.

I liked that the analysis identifies the issue of imbalance in the data and uses oversampling and undersampling techniques to address it. The methods used for the study have been explained and the results have been presented well. The conclusions section summarized all the findings and discussed the potential for use as a weapon of math destruction, as well as fairness using false positive and false negative rates, which are relevant to the problem.

Overall, it's a well-written report.

Midterm Peer Review

This looks to be a very interesting problem, and it probably has real-world applications. You seem to have already identified a good number of 'predictors' which make sense for your project (such as age, department, etc.).

You seem to have done an excellent job with initial data exploration, transformation and analysis. Your project objective and the report are also very clear. You also seem to have a strong understanding of what the next steps in your project are, which is a good sign.

I would be concerned about the amount of data you have. Training data of 3200 examples is not very much, and it is very easy to overfit. Also, you say that you would like to use simpler models so that you do not overfit. One concern would be that the simpler model you plan to use might miss many subtle trends. Another concern would be your error reporting. Just by predicting that no one leaves the company, you could attain a high accuracy, but what matters is your false positive and false negative rates.
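The point about accuracy can be made concrete with a small sketch. The 16% attrition share below is a made-up figure for illustration, not a number from the report; the label convention (1 = left, 0 = stayed) is also an assumption.

```python
def confusion_counts(y_true, y_pred):
    """Count outcomes, treating 1 = employee left (positive), 0 = stayed."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for truth, pred in zip(y_true, y_pred):
        if pred == 1:
            counts["tp" if truth == 1 else "fp"] += 1
        else:
            counts["fn" if truth == 1 else "tn"] += 1
    return counts


def rates(counts):
    """Accuracy plus false positive and false negative rates."""
    total = sum(counts.values())
    positives = counts["tp"] + counts["fn"]
    negatives = counts["tn"] + counts["fp"]
    return {
        "accuracy": (counts["tp"] + counts["tn"]) / total,
        "false_positive_rate": counts["fp"] / negatives if negatives else 0.0,
        "false_negative_rate": counts["fn"] / positives if positives else 0.0,
    }


# Hypothetical population: 16 of 100 employees leave.
y_true = [1] * 16 + [0] * 84
y_pred = [0] * 100  # baseline: predict that no one leaves
baseline = rates(confusion_counts(y_true, y_pred))
print(baseline)
```

The baseline reaches 84% accuracy while missing every employee who leaves (false negative rate of 1.0), which is why accuracy alone is a weak summary on imbalanced data.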

Final Peer Review

Hi guys,

I would like to start off by saying well done! This project was very well executed and the results were displayed in a very professional manner. To start, I love the fact that the report was organized and that the section headings were very clearly labelled and easy to identify. Including an abstract at the beginning of the report helps the reader to know exactly what your research question and expected goals are before they are even introduced to the technical parts of the report.

The data description section was very detailed, and I thought the example provided for the one-hot encoding was very useful in understanding the setup of your features. The section on imbalanced data was of particular interest to me because I was not familiar with oversampling and undersampling techniques. The initial analysis section had too many visuals and not enough text in my opinion. I think here you could have chosen two of the most interesting visuals, because the font size on the four you chose is quite small and you need to zoom in a lot to be able to really read them.

The models that you chose to include were very well developed and tested, and I think the choice of error metrics and loss functions was appropriate for each respective model. Finally, the recommendations and fairness considerations were ethical and well thought out.

Well done!

Final Report Peer Review

HR Analytics

The HR Analytics group was interested in discovering reasons for employees leaving a company and in identifying those employees who are the most likely to leave. They divided their dataset of employees into two groups: “attritioners” (people who left the company within the year) and non-attritioners. The data used was actually compiled from three different datasets containing employee information. The datasets were combined using the employee ID.
I really liked that this group spent time and effort identifying the least important features and removing them in order to reduce overfitting of their models. For some removed variables there was the same response for each employee, so the variable was rendered useless. For other removed variables, however, the group strategically removed those variables with very strong correlation.
I didn’t particularly like the group’s handling of the missing data. I think that it was too bold of a decision to replace all missing values in “environment satisfaction,” “job satisfaction” and “work life balance” with the minimum of that column. I similarly thought it was too bold to replace all missing values in “number of companies worked” and “total working years” with those columns’ median values. One can easily notice in the histograms of these two variables that there are two distinct bins which must represent the medians because of how tall these bins are. I bet that imputing these missing values in such a way had a negative impact on all subsequent models. I think that attempting matrix completion would have been a much safer bet.
I was very impressed by the exploratory analysis of this group. I thought it was cool that the group showed the age distribution for attritioners and non-attritioners and discussed the intuition of younger people being more likely to move around. Throughout the exploratory analysis and modeling I kept noticing that the most important predictors were what intuitively made sense (although I wish the group had emphasized this a bit more). For example, years since last promotion was very important in determining if someone was likely to leave. Similarly, I liked that this group repeatedly (through various models) demonstrated the important features. I particularly liked the odds ratio from the logistic regression and the feature importance graph from the random forest.
Something that concerned me here, though, is that the top feature from the logistic regression (by a factor of 2) was number 8 from the random forest. An explanation of these differences would be useful. In general, I wish that the group had written a little bit more in their conclusion regarding what to do with the information they gained from the models. It is pretty obvious that higher wages and more promotions would lead to fewer employees leaving. However, these goals aren’t really feasible for a company with limited cash and limited management positions. I would’ve liked it if the group had identified some variables that would be easier for a company to manipulate. Finally, I think the report could’ve used a couple more proofreads. I found a few grammar mistakes.
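The concern about median imputation producing an unusually tall histogram bin can be reproduced on toy data. The values below are invented and much smaller than the real columns; `None` stands in for a missing entry.

```python
from collections import Counter
from statistics import median


def impute_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values], fill


# Hypothetical "total working years" column with four missing entries.
total_working_years = [0, 1, 1, 2, 3, None, 4, None, 6, None, 8, 9, None]
filled, fill_value = impute_median(total_working_years)

before = Counter(v for v in total_working_years if v is not None)
after = Counter(filled)
print("fill value:", fill_value)
print("count at fill value before:", before[fill_value], "after:", after[fill_value])
```

Every missing entry lands on the same value, so the count at the median jumps from 1 to 5 and shows up as a spike in a histogram. Methods that predict a separate value per row, such as the matrix completion suggested in the review, avoid piling all imputed entries into one bin.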
