My dataset comes from the field of sports, specifically the game of baseball. As many of you may know, statistics play a central role in evaluating the progress of players and teams. Because a baseball game has natural breaks in its flow, and players usually act individually rather than in groups, the sport lends itself to easy record-keeping and statistics. Traditionally, statistics such as batting average (the number of hits divided by the number of at-bats) and earned run average (the average number of earned runs allowed by a pitcher per nine innings) have dominated the statistical study of baseball. However, the more recent advent of sabermetrics has produced statistics that draw on a broader range of player performance measures and playing-field variables. Sabermetrics and comparative statistics attempt to provide a better measure of a player’s performance and contribution to his team from year to year, often relative to a statistical performance average.
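The two traditional formulas mentioned above are simple ratios. The following is a minimal sketch of them in Python; the function names are my own, and the ERA helper assumes innings pitched are given as a true fraction of innings (three outs per inning) rather than box-score notation, where "6.2" means six and two-thirds innings.

```python
def batting_average(hits: int, at_bats: int) -> float:
    """Hits divided by at-bats (conventionally shown to three decimals, e.g. .300)."""
    if at_bats <= 0:
        raise ValueError("at_bats must be positive")
    return hits / at_bats


def innings_from_outs(outs: int) -> float:
    """Convert outs recorded to innings pitched, assuming three outs per inning."""
    return outs / 3


def earned_run_average(earned_runs: int, innings_pitched: float) -> float:
    """Earned runs allowed, scaled to a nine-inning game.

    innings_pitched is a true fraction here (6 2/3 innings = 20 outs / 3),
    not box-score notation such as "6.2".
    """
    if innings_pitched <= 0:
        raise ValueError("innings_pitched must be positive")
    return 9 * earned_runs / innings_pitched
```

For example, 150 hits in 500 at-bats gives a .300 batting average, and 50 earned runs over 180 innings gives an ERA of 2.50. Working from outs avoids the common mistake of treating "6.2" innings as 6.2 rather than 6 2/3.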
Throughout modern baseball, a few core statistics have traditionally been referenced: for batters, batting average, runs batted in (RBI), and home runs; for pitchers, wins, ERA, and strikeouts. General managers and scouts have long used these major statistics, among other factors, to assess player value. Managers, catchers, and pitchers use the statistics of opposing batters to develop pitching strategies and set defensive positioning on the field. Conversely, managers and batters study opposing pitchers’ performance and pitching motions to improve their hitting.
For this study, I attempt to answer the question: “Which variables are significant predictors of the number of home runs a baseball player hits?” In baseball, a home run is scored when the batter hits the ball in such a way that he is able to circle the bases and reach home safely in one play, without any errors being committed by the defensive team in the process. Home runs are among the most popular aspects of baseball; as a result, prolific home run hitters are usually among the most popular players with fans and, consequently, among the highest paid by their teams. It is therefore worth studying how the number of home runs relates to other statistics, so that players can improve their game and managers can evaluate players’ batting ability more holistically. In my study, I found 6 significant predictors that are strongly associated with the number of home runs a player hits: games played, doubles (hits on which the batter reached second base), triples (hits on which the batter reached third base), runs batted in, stolen bases, and bases on balls (walks).
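This section does not state which model produced the significance results. Assuming an ordinary least squares (OLS) multiple regression of home runs on the six predictors (an assumption, not necessarily the method used in this study), the following sketch shows how coefficients and their t-statistics could be computed in plain Python. The column names in `PREDICTORS` and all function names are hypothetical; no real data are included.

```python
import math
from typing import Sequence

# Hypothetical column order for the six predictors: games (G), doubles (2B),
# triples (3B), runs batted in (RBI), stolen bases (SB), walks (BB).
PREDICTORS = ["G", "2B", "3B", "RBI", "SB", "BB"]


def invert(M: list[list[float]]) -> list[list[float]]:
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(M)
    A = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular (predictors may be collinear)")
        A[col], A[pivot] = A[pivot], A[col]
        scale = A[col][col]
        A[col] = [v / scale for v in A[col]]
        for r in range(n):
            if r != col and A[r][col] != 0.0:
                factor = A[r][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [row[n:] for row in A]  # right half now holds the inverse


def ols_with_t(rows: Sequence[Sequence[float]], y: Sequence[float]):
    """Fit y = b0 + b1*x1 + ... + bk*xk by OLS.

    Returns (coefficients, t_statistics); index 0 is the intercept.
    """
    X = [[1.0, *map(float, r)] for r in rows]  # leading 1.0 = intercept column
    n, p = len(X), len(X[0])
    if n <= p:
        raise ValueError("need more observations than coefficients")
    XtX = [[sum(x[i] * x[j] for x in X) for j in range(p)] for i in range(p)]
    Xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(p)]
    inv = invert(XtX)
    beta = [sum(inv[i][j] * Xty[j] for j in range(p)) for i in range(p)]
    # Residual variance estimate: residual sum of squares / (n - p).
    rss = sum((yi - sum(b * v for b, v in zip(beta, x))) ** 2 for x, yi in zip(X, y))
    sigma2 = rss / (n - p)
    se = [math.sqrt(sigma2 * inv[i][i]) for i in range(p)]
    t = [b / s if s > 0 else math.inf for b, s in zip(beta, se)]
    return beta, t


def label(values: Sequence[float]) -> dict[str, float]:
    """Pair a coefficient or t-statistic list with the hypothetical column names."""
    return dict(zip(["intercept", *PREDICTORS], values))
```

Each row of `rows` would hold one player's six predictor values in `PREDICTORS` order, and `y` his home run total. With a reasonable number of players, a predictor whose |t| is well above about 2 is commonly treated as significant at roughly the 5% level; an exact p-value needs the t distribution with n - p degrees of freedom, which this sketch does not compute.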