Stats in 2015: A sabermetrician’s view

Brandon Phillips hits a two-run single in the first inning against the Atlanta Braves at Turner Field in Atlanta, Georgia, on Friday, July 12, 2013. Phillips recently criticized the use of sabermetrics in baseball. Jason Getz / Atlanta Journal-Constitution / MCT

Brandon Phillips turned heads this past week with his thoughts on analytics departments.

“I feel like all of these stats and all of these geeks upstairs, they’re messing up baseball, they’re just changing the game,” he said. “It’s all about on-base percentage. If you don’t get on base, then you suck. That’s basically what they’re saying. People don’t care about RBIs or scoring runs, it’s all about getting on base.”

Phillips isn’t alone. Charles Barkley made headlines recently by similarly bashing analytics in basketball. Harold Reynolds has openly dismissed analytics for a while now. With ESPN recently publishing “The Great Analytics Rankings,” in which it ranked all 122 front offices across baseball, basketball, football and hockey, there is a question worth examining: What relevance does analytics actually have in sports?

Bill James, often considered the father of sabermetrics, first defined sabermetrics in 1980 as “the search for objective knowledge about baseball.” Notice that this definition does not mention statistics or data. The key word is “objective.” We use statistics and data today to search for that objective knowledge, building advanced statistics through modeling processes designed to be unbiased. We do this to remove as much subjectivity from our analysis as possible.

With this in mind, let’s analyze Phillips’ comments. The first thing to note is that, in a broad sense, Phillips is right. Baseball is about winning, and the way to win is to score more runs than the opponent. We can think of runs as the currency that buys wins: the more runs a team scores relative to the runs it allows, the more games it tends to win.
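One well-known way to put numbers on the idea that runs buy wins is Bill James’ Pythagorean expectation, which estimates a team’s winning percentage from its runs scored and runs allowed. The short Python sketch below is only an illustration: the exponent of 2 is the classic choice rather than a fitted value, and the function name and the season totals are made up for the example.

```python
def pythagorean_win_pct(runs_scored, runs_allowed, exponent=2.0):
    """Estimate winning percentage from runs scored and runs allowed.

    Uses the classic Pythagorean expectation:
        RS^x / (RS^x + RA^x), with x = 2 by default.
    """
    if runs_scored < 0 or runs_allowed < 0:
        raise ValueError("run totals cannot be negative")
    if runs_scored == 0 and runs_allowed == 0:
        raise ValueError("at least one run total must be positive")
    scored = runs_scored ** exponent
    allowed = runs_allowed ** exponent
    return scored / (scored + allowed)


# Hypothetical season totals, not real team data.
win_pct = pythagorean_win_pct(runs_scored=750, runs_allowed=650)
expected_wins = round(win_pct * 162)  # 162-game regular season
print(f"Estimated winning percentage: {win_pct:.3f}")
print(f"Estimated wins over 162 games: {expected_wins}")
```

A team that scores exactly as many runs as it allows comes out at .500 under this estimate, and every additional run scored pushes the estimate up, which is the sense in which runs act as currency.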

By looking solely at runs and RBIs, however, we lose sight of how those runs were produced. Runs and RBIs credit the players most directly involved in the scoring play, but they ignore the inputs that created the opportunity in the first place. Judging players on the end product alone ignores that context, which leaves more subjectivity in the analysis. For this reason, sabermetricians like to look at how each individual event, such as a single, double, triple, home run, walk or out, contributes to run scoring. From there, we can determine which events are most valuable and how much each is worth in runs. It does not stop there, though. Sabermetricians often adjust for circumstance, such as the ballpark in which the events occurred or the pitcher who was on the mound. These adjustments account for factors outside the batter’s control, ones that do not reflect his true skill. On-base percentage happens to be a statistic that correlates strongly with team run scoring and is easy to adjust for circumstances like different ballparks. That is why many prefer on-base percentage to RBIs and runs: it paints a more complete picture.
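For readers who want to see the statistic itself, on-base percentage is times on base (hits, walks and hit-by-pitches) divided by at-bats plus walks, hit-by-pitches and sacrifice flies. The Python sketch below computes it for two invented batting lines; the class name, field names and all of the numbers are hypothetical and chosen only to show that a hitter with more RBIs can still reach base far less often.

```python
from dataclasses import dataclass


@dataclass
class BattingLine:
    """A season batting line (all values here are hypothetical)."""
    at_bats: int
    hits: int
    walks: int
    hit_by_pitch: int
    sacrifice_flies: int
    rbi: int


def on_base_percentage(line):
    """Standard OBP: (H + BB + HBP) / (AB + BB + HBP + SF)."""
    times_on_base = line.hits + line.walks + line.hit_by_pitch
    opportunities = (line.at_bats + line.walks
                     + line.hit_by_pitch + line.sacrifice_flies)
    if opportunities == 0:
        return 0.0  # no plate appearances that count toward OBP
    return times_on_base / opportunities


# Two made-up hitters: one patient, one who drives in more runs.
patient = BattingLine(at_bats=500, hits=140, walks=90,
                      hit_by_pitch=5, sacrifice_flies=5, rbi=70)
free_swinger = BattingLine(at_bats=600, hits=165, walks=25,
                           hit_by_pitch=3, sacrifice_flies=7, rbi=100)

for name, line in [("patient", patient), ("free swinger", free_swinger)]:
    print(f"{name}: OBP {on_base_percentage(line):.3f}, RBI {line.rbi}")
```

The RBI column alone would favor the second hitter, while on-base percentage shows the first one making outs much less often. Neither number is the whole story, but the comparison shows why the two can disagree.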

We should not mistake “more complete” for “complete,” however. In general, a model built on a single statistic is less accurate than one that includes several descriptive statistics. Think about buying a house. No one considers only the number of bedrooms when comparing houses. Other variables include the size of the bedrooms, the number of bathrooms, the quality of the neighborhood, how recently the place was renovated and, perhaps most importantly, the price. In our everyday lives, we take in as much information as possible to make the most informed decisions possible, to extract the most objective knowledge possible. Why should this differ for sports analytics?

Statistics in sports generally have two purposes. The first, as mentioned above, is descriptive: How does on-base percentage relate to scoring runs? The second is predictive: What can we expect next season from a player who posted a .380 OBP this year at age 27 while playing in Fenway Park? A statistic is valuable when it more accurately represents the process of winning or better predicts future success. The problem is that it is often hard for one statistic both to describe run scoring accurately and to be a reliable predictor of the future. That is why we like to use multiple statistics in our models to analyze the game, because both aspects are important for gaining objective knowledge.

Events such as the Society for American Baseball Research Analytics Conference in Phoenix and the MIT Sloan Sports Analytics Conference in Boston are designed to get us to continue questioning how we can improve our existing models of the game and what new models we can create to further unlock objective knowledge. We will never completely understand the game, no matter how much data we use, what kinds of methods we use and how often we watch the game. Our goal is to continue closing the gap of subjectivity, and we often employ analytical methods to do so.

So, what relevance does analytics actually have in sports? At its most basic, analytics helps us understand the game better. Any player looking at video is just as much a “geek” as any “geek” with a computer open. They are both searching for objective knowledge of the game, and we can learn a lot from putting the two together. Front offices today are already integrating scouting and statistics to paint the more complete picture. We need to as well.