Article written by Rob Engel, VP of Software Engineering, MLB. 

A batter will dig into the box, stare the pitcher down, wait for a pitch that they can groove and then, twitch-whack-crack, hit the ball a long, long way.

Knocking a line drive or a home run isn't just a matter of a player's power, their otherworldly hand-eye coordination, or maybe some luck. In today's game of baseball, it takes thorough preparation and studying reams of data in addition to natural skill. That way, when the opportunity comes, the player is ready to pounce.

As an organization, Major League Baseball operates in much the same spirit as a batter at the plate. We've spent years building the right infrastructure, then adding layers of capability and innovation over time, so that when the next big opportunity comes, the league has the ability to seize it for a game-winning play.

Our next at-bat is AI — and we're ready for it.

For the past decade, MLB has been building a robust apparatus for capturing every detail of the game on the field: Statcast. Introduced in 2015 and migrated to Google Cloud in 2020, Statcast is an umbrella term for how we gather all the data from the game: capturing it from sensors all over the ballpark, pulling it into our databases, and then delivering that burst of information in near real time via APIs.

It's through these APIs that we can pull those stats almost instantly into our own digital products, such as the MLB app, MLB.com, MLB.TV, and Gameday. The data is also sent to the 30 clubs for data analysis and roster improvements; to scoreboards in the stadium; and to our broadcast partners, who create great visualizations that tell different stories about everything that happens on the field.

The next step for Statcast is to integrate new innovations in AI so that we can present new and engaging experiences to fans watching the game, whether that's in the stadium, on television, or on their phones.

New Way Now: MLB knocks the fan experience out of the park

Applying AI to the national pastime

AI provides a massive opportunity for uncovering data that was difficult to access in the past. AI can now go through our entire data set and highlight what's relevant or important.

We have so many interesting opportunities in how we could use AI to uncover some diamonds in the rough within our dataset. Like a prospect in the minor leagues waiting to burst into stardom, we have a reservoir of untapped potential here, which can power how the league and our partners are able to get stories to the fans that are more entertaining and interesting in real time.

Google Cloud's suite of AI tools empowers us to take troves of ball-, player-, and pose-tracking data and build predictive models on top of it. This allows us to show things such as how difficult it was for a player to make a catch based on the probability of it being caught, in how many ballparks a batted ball would have cleared the fence for a home run, or even the likelihood that a baserunner would successfully steal a base.

These same tools also allow us to build models to describe data after the fact. We are able to leverage many data points about the pitched ball to train custom models and neural networks for each pitcher that learn their pitch repertoire and classify each of their pitch types.
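
To make the pitch-classification idea concrete, here is a minimal Python sketch. It is not MLB's model: the article describes custom models and neural networks, and this swaps in a much simpler nearest-centroid classifier purely for illustration. The two features (velocity in mph, spin in hundreds of rpm), the pitch labels, the sample numbers, and the function names `fit_centroids` and `classify` are all assumptions invented for the example.

```python
from collections import defaultdict
from math import dist


def fit_centroids(labeled_pitches):
    """Average the features of each pitch type for ONE pitcher.

    labeled_pitches: iterable of (pitch_type, (velocity_mph, spin_hundreds_rpm)).
    Returns {pitch_type: (mean_velocity, mean_spin)}.
    """
    totals = defaultdict(lambda: [0.0, 0.0, 0])
    for pitch_type, (velocity, spin) in labeled_pitches:
        entry = totals[pitch_type]
        entry[0] += velocity
        entry[1] += spin
        entry[2] += 1
    return {
        pitch_type: (v_sum / n, s_sum / n)
        for pitch_type, (v_sum, s_sum, n) in totals.items()
    }


def classify(centroids, features):
    """Return the pitch type whose centroid is closest to the new pitch."""
    return min(centroids, key=lambda pitch_type: dist(centroids[pitch_type], features))


# Hypothetical training data for a single pitcher (made-up numbers).
training = [
    ("FF", (97.0, 24.0)), ("FF", (96.0, 23.0)),  # four-seam fastball
    ("SL", (87.0, 26.0)), ("SL", (86.0, 25.0)),  # slider
    ("CH", (88.0, 17.0)), ("CH", (89.0, 18.0)),  # changeup
]
centroids = fit_centroids(training)
print(classify(centroids, (87.0, 18.0)))
```

Fitting one set of centroids per pitcher mirrors the per-pitcher framing above: the same velocity and spin can mean a different pitch type depending on who threw it.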

AI and the fan experience

While untapped potential and opportunity are great in theory, the practical application of technology is what really defines what we're trying to do with the combination of AI and Statcast. The driving motivation is to enhance the game, improve how our organization and all 30 of our clubs perform, and give fans new and novel experiences.

There's so much data within our archives and in Statcast that it's almost impossible for a human to go through and find out what is really meaningful. Leveraging AI allows us to understand what's actually interesting.


For instance, take a home run that is a statistical outlier, coming off the bat at a scorching 120 miles per hour. Or maybe it was the hardest-hit home run by a rookie in their debut, or the first time somebody has hit three home runs of 450 feet or more. It used to take a human many hours to sift through the data and find these stories. Now we can use AI to understand when this happens nearly instantaneously, and give this information to our own content team and broadcasters so they can highlight these stories to the fans as they happen.
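
A simple way to picture this kind of check is the Python sketch below. It is only an illustration, not how MLB or Statcast detects these moments: it uses a plain z-score and a running maximum in place of AI, and the function name `assess_exit_velocity`, the threshold of 3 standard deviations, and the sample exit velocities are all assumptions made up for the example.

```python
from statistics import mean, stdev


def assess_exit_velocity(history_mph, new_mph, z_threshold=3.0):
    """Compare a new home run's exit velocity with earlier ones.

    history_mph: earlier exit velocities (needs at least two values).
    Returns the z-score plus two flags a content team could act on.
    """
    mu = mean(history_mph)
    sigma = stdev(history_mph)  # sample standard deviation
    z_score = (new_mph - mu) / sigma
    return {
        "z_score": z_score,
        "is_outlier": z_score >= z_threshold,    # unusually hard-hit
        "is_record": new_mph > max(history_mph),  # hardest in this sample
    }


# Hypothetical exit velocities (mph) of earlier home runs.
history = [101.0, 103.5, 99.0, 105.0, 102.5, 104.0, 100.0, 106.0]

scorcher = assess_exit_velocity(history, 120.0)
routine = assess_exit_velocity(history, 103.0)
print(scorcher["is_outlier"], scorcher["is_record"])
print(routine["is_outlier"], routine["is_record"])
```

The same pattern extends to the other stories mentioned above by changing which slice of history is passed in, for example only home runs hit by rookies in their debut.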

Overall, we want to give our fans the big picture. We want to say, here's how the game has evolved over the last 10 years. We're seeing higher launch angles for batted balls, we're seeing home runs hit farther, and we're seeing that this data correlates with bat speed. We're seeing more spin on the ball and faster pitch velocities. These are all great things to inspire our fans, whether they follow directly through our media channels or tune into other feeds.

-Rob Engel

It's really about unlocking new ways to study the game, not only for the fans, but also for our clubs, since Statcast and all the data that MLB captures are also available to all 30 clubs. Getting all of the Statcast data into our database and, through our APIs, into a BigQuery environment where clubs can run their analysis gives all 30 clubs equal footing. From there, each club can build proprietary algorithms of its own to try to separate itself from the others.

The MLB technological journey has been fascinating to watch unfold over the last decade: from being one of the first organizations to bring streaming video to the masses, to being pioneers in using on-field sensors to capture data, to what we've been able to do with that data through Statcast, the cloud, and AI. I don't think any of these things would be possible without this great partnership and the technology that Google Cloud really unlocks for us.
