The Tufts Daily
Where you read it first | Tuesday, April 23, 2024

Excited for baseball? It doesn’t matter (yet)

The King's Court cheers for Seattle Mariners pitcher Felix Hernandez as he strikes out ten and the Seattle Mariners defeat the Los Angeles Angels 4-1 for Opening Day on Monday, April 6, 2015, at Safeco Field in Seattle.

After the Cardinals and Cubs kicked things off Sunday night, baseball is officially back. After one of the crazier off-seasons in recent memory, many are closely following how the regular season will play out. People are already monitoring the young Cubs team that has the potential to be one of the best in baseball and the completely rebuilt Padres. Who wouldn’t be excited for the season?

It is important, however, to remember that the baseball season is 162 games. It is very hard to draw any meaningful conclusions from the first few games of the season. Everyone is still in the hunt, and very little can be said about changes in reasonable expectations for individual season performances, given that baseball results do not tend to stabilize quickly. At the beginning of the season, what is it possible to conclude?

Small datasets can still carry meaning. Obviously, naively extrapolating a player's performance for the rest of the season from the first 10 games’ results would produce far more incorrect predictions than correct ones. In reality, people expect each player to have a true talent level, and given enough time, his observed performance will converge to that level. There is an expectation of each player’s and team’s true talent levels before the season starts, and that expectation changes as the 2015 season progresses. This sort of thinking can be modelled mathematically, specifically with Bayesian statistics.

At its most basic, Bayesian statistics splits a problem into two parts: a prior probability distribution and a posterior probability distribution. A prior probability distribution for baseball is what is expected to happen before any games are actually played. Different people could have different prior distributions. Some may think Jose Abreu has over a 50 percent chance of hitting 40 home runs going into this season; others may not. The posterior distribution is the updated probability distribution after data has been collected. For instance, if Jose Abreu hit 30 home runs halfway through the season, most posterior probability distributions would be more favorable towards Abreu, since he outperformed most prior probability distributions in the first set of games. The degree to which posterior distributions change depends on how much weight is given to each observed event.

The problem can be simplified significantly by assuming that all players and teams are expected to be league average in talent, and that all deviations from league average in the past were due to randomness. This would be a very simple, though likely incorrect, prior probability distribution, which could give some insight into what can be expected from teams.

Sabermetricians like Phil Birnbaum and Tom Tango have done analyses for team winning percentages on this prior distribution. Using Bayes' theorem (a probability rule), one can find that a team's expected true-talent winning percentage, given its results from the beginning of the season, is approximately equal to (35 + Wins)/(70 + Wins + Losses).
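To make the formula concrete, here is a minimal Python sketch of it. The function name `regressed_win_pct` is made up for illustration; the constants 35 and 70 come straight from the formula above.

```python
def regressed_win_pct(wins, losses):
    """Estimate true-talent winning percentage from a team's record.

    Implements (35 + Wins) / (70 + Wins + Losses): the record is padded
    with 70 imaginary games of exactly .500 baseball (35 wins, 35 losses).
    """
    return (35 + wins) / (70 + wins + losses)


if __name__ == "__main__":
    # A few illustrative records (made up, not real 2015 results).
    for wins, losses in [(7, 3), (3, 7), (50, 20)]:
        estimate = regressed_win_pct(wins, losses)
        print(f"{wins}-{losses} start -> estimated talent {estimate:.3f}")
```

A team that opens 7-3 is playing .700 baseball, but the formula puts its estimated talent at 42/80, or .525.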

Basically, each team’s new expected talent level is a weighted average of its performance so far and a league average winning percentage (.500). Seventy games into the season, these two portions are weighted equally, meaning that after 70 games, a team is just about as likely to perform at the league average rate as it is to perform at its rate up to that point in the season. With fewer than 70 games played, teams are more likely to perform at league average for the rest of the season, and with more than 70 games played, teams are more likely to perform at their current winning percentage rate. These are estimates, so when the total number of games is close to 70, the weights on current performance and league average are nearly equal.
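The weighted-average reading can be checked with a short Python sketch. The names `observed_weight` and `blended_estimate` are made up for illustration; the only inputs taken from the article are the 70-game constant and the .500 league average.

```python
PRIOR_GAMES = 70      # constant from the formula in the article
LEAGUE_AVERAGE = 0.5  # league average winning percentage


def observed_weight(games_played):
    """Share of the estimate that comes from the team's actual record."""
    return games_played / (PRIOR_GAMES + games_played)


def blended_estimate(wins, losses):
    """Weighted average of the observed winning percentage and .500."""
    games = wins + losses
    if games == 0:
        return LEAGUE_AVERAGE  # no games played: the estimate is just the prior
    weight = observed_weight(games)
    return weight * (wins / games) + (1 - weight) * LEAGUE_AVERAGE


if __name__ == "__main__":
    for games in (10, 70, 162):
        print(f"after {games} games, record weight = {observed_weight(games):.3f}")
```

After 10 games the record carries a weight of 10/80 = 0.125, after 70 games exactly 0.5, and even after a full 162 games it stays below 0.7. Algebraically, `blended_estimate(wins, losses)` equals (35 + Wins)/(70 + Wins + Losses).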

However, this is an unlikely prior distribution. Most expect that teams like the Dodgers and the Nationals will be dominant this year, and that teams like the Braves and the Phillies will be weaker. Using the most advanced projection systems (like PECOTA or Steamer) for our prior probabilities, it is possible to better model the prior distribution. It is still the case that it takes around 70 games for the season performance to be a better indicator of future success than the prior distributions, but with improved models the two probabilities are likely closer to each other, assuming the changed prior distribution is better.
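One simple way to picture a projection-based prior is to swap the 35 phantom wins for 70 games at the team's projected winning percentage. The Python sketch below rests on that assumption and is not how PECOTA or Steamer work. The function `projected_talent` and its `preseason_pct` input are hypothetical, and keeping the weight at 70 games simply follows the article's statement that it still takes around 70 games.

```python
def projected_talent(wins, losses, preseason_pct, prior_games=70):
    """Blend a preseason projection with the current record.

    Assumption for illustration: the projection counts as `prior_games`
    imaginary games played at `preseason_pct`. With preseason_pct = 0.5
    this reduces to (35 + Wins) / (70 + Wins + Losses).
    """
    prior_wins = prior_games * preseason_pct
    return (prior_wins + wins) / (prior_games + wins + losses)


if __name__ == "__main__":
    # Hypothetical team projected at .580 that opens the season 3-7.
    print(f"projection prior:     {projected_talent(3, 7, 0.580):.3f}")
    print(f"league-average prior: {projected_talent(3, 7, 0.500):.3f}")
```

Under these assumptions, the 3-7 start moves a .580 projection down to about .545, whereas the league-average prior would put the same team at .475.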

So, while it is certainly exciting that baseball is back, predictions should not be changing until about two months from now. In the meantime, just relax and watch some baseball.