Juice Squeezed: The 2019 Postseason Baseball is Dead
Note: My colleagues and I at DataRobot originally published these findings on the DataRobot AI Blog.
This past week, much has been written about the less-lively baseball in the 2019 Postseason. The observations have ranged from anecdotal, to hearsay, to analytical. However, I feel all of these perspectives are limited by subjective bias, incomplete analysis, or small samples. To test and quantify what is actually happening, I wanted to develop an approach that relied on regular season data, accounted for all variables that might affect ball flight and play outcomes, and did so with a high degree of precision. Controlling for as many of those variables as possible would help isolate any effects attributable to the baseball itself.
To make a long story short, I used AI tools* that ‘learned’ from all 2019 Regular Season pitches and balls hit into play to make predictions on 2019 Postseason plays. This carries 2019 Regular Season characteristics over into Postseason predictions of how the baseball should have behaved, assuming the same baseball was used. If the baseball changed from the Regular Season to the Postseason, then we’d see variances between the AI predictions and the actual results. Specifically, I configured these AI models to predict three variables that may be evidence of a different baseball:
- Pitch movement on the X-axis (left-to-right)**
- Launch speed for balls hit into play
- Hit distance for balls hit into play
Given all other input variables (e.g. pitch speed, spin rate, stadium, pitcher, batter, pitch location, etc.) that may help us predict these variables, the models were trained to give me highly accurate predictions of what should have happened, assuming the Regular Season baseball. We could then compare those predictions to what actually happened and get a signal of possible unaccounted-for changes in the playing environment (e.g. the baseball). To help illustrate the approach, the models would work like this:
- Based on data from every ball hit into play in the 2019 Regular Season, and…
- Given a four-seam fastball thrown by Justin Verlander to Aaron Judge at Yankee Stadium in the low-and-away sector of the strike zone, and…
- Given Aaron Judge hits the ball with a launch angle of 24 degrees and a launch speed of 112 MPH, then…
- The ball should fly X feet
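A minimal sketch of the train-on-Regular-Season, predict-on-Postseason idea follows. It is not the DataRobot workflow: it swaps in a one-feature ordinary least squares fit (launch speed only) for the AutoML models, the function name `fit_linear` is hypothetical, and the numbers are made-up toy values rather than real pitch or batted-ball data.

```python
from statistics import fmean
from typing import Callable, Sequence


def fit_linear(xs: Sequence[float], ys: Sequence[float]) -> Callable[[float], float]:
    """Fit y = a + b*x by ordinary least squares and return a predictor.

    Hypothetical stand-in for the AutoML models: it uses a single input
    (launch speed) where the real models used many (pitcher, batter, stadium, ...).
    """
    mx, my = fmean(xs), fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    return lambda x: intercept + slope * x


# Toy "Regular Season" training data (illustrative only):
# launch speed in MPH -> hit distance in feet.
regular_speed = [90.0, 95.0, 100.0, 105.0, 110.0]
regular_distance = [300.0, 320.0, 340.0, 360.0, 380.0]

predict_distance = fit_linear(regular_speed, regular_distance)

# Expected distance for a new Postseason ball hit at 112 MPH,
# assuming the Regular Season baseball.
expected = predict_distance(112.0)
print(f"The ball should fly {expected:.0f} feet")  # prints: The ball should fly 388 feet
```

Comparing each prediction like this with the distance the ball actually traveled, across every Postseason ball in play, yields the variances the analysis looks for.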
If everything were the same in the Regular Season and Postseason, then the predicted ball flight should be extremely accurate relative to actual flight (+/- 0.1%), as should the other predictions (launch speed and lateral movement).
However, that is not what we found.
The models trained on 2019 Regular Season data actually generated predictions that were significantly different from actual Postseason results.
- Launch speed after contact was 0.5% higher than expected, indicating a ball with a higher Coefficient of Restitution. However,
- Batted-ball flight was 1.1% shorter than expected, and pitch movement on the lateral (X) axis was 0.7% greater than expected, both indicating a ball with greater drag and more air resistance than expected.
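Each percentage above compares actual Postseason values with what the Regular Season model predicted. One plausible way to express that comparison is sketched below as the percent difference between summed actual and summed predicted values. The aggregation choice, the helper names `pct_gap` and `describe`, the 0.1% tolerance band (borrowed from the accuracy expected if nothing had changed), and the toy numbers are all assumptions for illustration, not the article's data or exact method.

```python
from typing import Sequence


def pct_gap(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Percent by which total actual values exceed (+) or fall short of (-) total predictions."""
    total_predicted = sum(predicted)
    return 100.0 * (sum(actual) - total_predicted) / total_predicted


def describe(gap_pct: float, tolerance_pct: float = 0.1) -> str:
    """Label a gap; tolerance_pct is an assumed noise band, not a published threshold."""
    if abs(gap_pct) <= tolerance_pct:
        return "as expected"
    return "higher than expected" if gap_pct > 0 else "lower than expected"


# Toy values only (not the article's data).
launch_gap = pct_gap(actual=[100.5, 100.5], predicted=[100.0, 100.0])
distance_gap = pct_gap(actual=[316.0, 336.0, 356.0], predicted=[320.0, 340.0, 360.0])

print(f"Launch speed: {launch_gap:+.1f}% ({describe(launch_gap)})")
# prints: Launch speed: +0.5% (higher than expected)
print(f"Hit distance: {distance_gap:+.1f}% ({describe(distance_gap)})")
# prints: Hit distance: -1.2% (lower than expected)
```

For lateral pitch movement, the inputs would be movement magnitudes, so a positive gap reads as more break than expected. The 2018 control check amounts to running the same comparison with a model trained and evaluated on 2018 data.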
These two effects are supported by thousands of Postseason predictions based on Regular Season trends: the ball is livelier off the bat, but that benefit to the hitter is more than cancelled out by increased air resistance and drag, resulting in shorter ball flight. The increased drag also produces pitches that move more, since the same effects that ‘grip’ the air and reduce ball flight also help pitches ‘grip’ the air and create break.
We also wanted to ensure this Regular-to-Postseason change wasn’t just something that commonly happens as weaker teams go home and better teams keep playing. To test this, we ran the same batted-ball experiments on 2018 data. The exact same procedures showed no meaningful change between the 2018 Regular Season and Postseason. Thus, the changes are truly unique to 2019!
So, what we can conclude is that something changed in the 2019 Postseason: the ball is springier (higher launch speed and CoR), but it also creates more drag (more lateral pitch movement and shorter ball flight). The practical effect we can expect to see is more dominant pitching, less offense, and more competitive games. And in fact, this is what is playing out: average runs scored per game in the 2019 Postseason are 1.8 runs lower than in the Regular Season for the teams that qualified.
* The method I used was Automated Machine Learning. Using Automated Machine Learning (Auto-ML) from DataRobot, I trained models that predict launch speed off the bat and ball flight distance for balls hit into play, as well as lateral pitch movement. To dramatically simplify a complex process, a Machine Learning model looks at massive amounts of data (e.g. every ball hit into play in the 2019 regular season) and learns how the variables relate to each other so it can make predictions on future hits. Auto-ML builds dozens of models on the same data using different approaches, and figures out which approach is most appropriate and accurate for the given situation.
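The model-selection idea in the footnote above can be illustrated with a toy version: fit several candidate models on the same training data, score each on held-out data, and keep the most accurate. This is only a sketch under simple assumptions (three hand-written candidates, mean absolute error on a single holdout split, made-up numbers, and hypothetical helper names such as `select_model`); it is not DataRobot's API or its actual search procedure.

```python
from statistics import fmean
from typing import Callable, Dict, Sequence, Tuple

Model = Callable[[float], float]
Data = Tuple[Sequence[float], Sequence[float]]  # (inputs, targets)


def fit_mean(xs: Sequence[float], ys: Sequence[float]) -> Model:
    """Baseline: always predict the training mean."""
    m = fmean(ys)
    return lambda x: m


def fit_linear(xs: Sequence[float], ys: Sequence[float]) -> Model:
    """One-feature ordinary least squares."""
    mx, my = fmean(xs), fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    return lambda x: intercept + slope * x


def fit_nearest(xs: Sequence[float], ys: Sequence[float]) -> Model:
    """1-nearest-neighbor: copy the target of the closest training input."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]


def mae(model: Model, xs: Sequence[float], ys: Sequence[float]) -> float:
    """Mean absolute error of a model on (xs, ys)."""
    return fmean(abs(model(x) - y) for x, y in zip(xs, ys))


def select_model(
    train: Data, holdout: Data, candidates: Dict[str, Callable[..., Model]]
) -> Tuple[str, Dict[str, float]]:
    """Fit every candidate on train, score it on holdout, return the lowest-error name."""
    scores = {name: mae(fit(*train), *holdout) for name, fit in candidates.items()}
    best = min(scores, key=scores.get)
    return best, scores


# Toy data (illustrative only): launch speed (MPH) -> hit distance (feet).
train = ([90.0, 95.0, 100.0, 105.0, 110.0], [300.0, 320.0, 340.0, 360.0, 380.0])
holdout = ([92.0, 108.0], [308.0, 372.0])

best, scores = select_model(
    train, holdout, {"mean": fit_mean, "linear": fit_linear, "nearest": fit_nearest}
)
print(best, scores)
# prints: linear {'mean': 32.0, 'linear': 0.0, 'nearest': 8.0}
```

Scoring on data the models never saw during fitting is what keeps the comparison honest; a model that merely memorizes its training data would look perfect on that data but lose on the holdout.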
** I have excluded vertical pitch movement from this analysis because vertical movement, drag, gravity, and pitch type all interact in extremely complex ways that require a physicist to truly understand. I don’t want to oversimplify an extremely complex subject like that here, so I’ve left it out. However, for what it’s worth, I found Postseason pitches had 0.5% less vertical movement than Regular Season pitches.