MLB pitchers and pitch types

Can MLB pitching statistics be used to visualize differences in pitch types?

Grades 6-12

Major League Baseball (MLB) teams can have up to 13 pitchers on their roster during the regular season, and four to six of those are usually starting pitchers. Our goal is to use MLB data to see if there are numerical differences between different types of pitches and pitchers.

Specifically, we wondered if we could look at MLB pitching statistics to visualize the differences in the types of pitches thrown by two successful starting pitchers, Yu Darvish and Jon Lester.

As background, MLB pitchers specialize in different types of pitches (e.g., fastball, slider, curveball). Pitch types differ in part by the speed at which the baseball is thrown and by the rate of spin (“spin rate”) of the baseball.

To answer our question we:

  • Used data from a GitHub user who gathered pitching data for the 2012-20 baseball seasons from the website baseball-reference.com.
  • Created scatter plots, a type of data visualization that shows the relationship between two variables (or sets of data) by plotting each data point on Cartesian coordinates.
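The scatter plot step above can be sketched in Python roughly as follows. This is a minimal illustration, not the code from the walk-through: the sample rows are made up, the names `SAMPLE_PITCHES`, `group_by_pitch_type`, and `plot_pitches` are hypothetical, and using matplotlib is an assumption (the interactive plots on this page may have been built with a different library).

```python
from collections import defaultdict

# Hypothetical sample rows: (pitch type, speed in MPH, spin rate in RPM).
# These values are made up for illustration; they are not from the real data set.
SAMPLE_PITCHES = [
    ("four-seam fastball", 95.2, 2450),
    ("four-seam fastball", 96.0, 2510),
    ("slider", 84.1, 2680),
    ("slider", 85.3, 2725),
    ("curveball", 78.4, 2790),
]


def group_by_pitch_type(pitches):
    """Return {pitch_type: ([speeds], [spin_rates])} so each type can be drawn separately."""
    groups = defaultdict(lambda: ([], []))
    for pitch_type, speed_mph, spin_rpm in pitches:
        speeds, spins = groups[pitch_type]
        speeds.append(speed_mph)
        spins.append(spin_rpm)
    return dict(groups)


def plot_pitches(pitches, title):
    """Draw a speed-versus-spin scatter plot with one colour per pitch type."""
    # Imported here so the grouping step still works if matplotlib is not installed.
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    for pitch_type, (speeds, spins) in group_by_pitch_type(pitches).items():
        ax.scatter(speeds, spins, label=pitch_type)
    ax.set_xlabel("Speed (MPH)")
    ax.set_ylabel("Spin rate (RPM)")
    ax.set_title(title)
    ax.legend()
    plt.show()
```

Calling `plot_pitches(SAMPLE_PITCHES, "Sample pitches")` would draw one cluster of points per pitch type, which is what makes differences in speed and spin rate visible at a glance.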

In the first plot, we can see that Yu Darvish has thrown 10 different pitch types. These vary in both speed and spin rate, with the fastest pitches (the two-seam and four-seam fastball types) reaching 98.8 miles per hour (MPH), and the highest spin rate for a pitch (change-up type) reaching 3,042 revolutions per minute (RPM).

In the second plot, we can see that Jon Lester has thrown five different pitch types. He doesn’t throw quite as hard as Yu Darvish, with his four-seam fastball pitch type reaching 96.1 MPH. The maximum spin rate achieved by a Jon Lester pitch is 3,482 RPM. This is higher than any of Yu Darvish’s pitches, but it looks abnormal and may be an outlier.
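One way to check whether a value such as the 3,482 RPM reading is unusual is the 1.5 × IQR rule, a common convention for flagging outliers. The sketch below is only an illustration of that rule: it is an assumption that this is a suitable test here (the analysis above judged the point by eye), and the spin rates in `SAMPLE_SPIN_RATES` are made up, apart from the 3482 value quoted in the text. The function name `find_outliers` is hypothetical.

```python
import statistics

# Hypothetical spin rates in RPM, made up for illustration.
# Only the 3482 value echoes the maximum quoted in the text.
SAMPLE_SPIN_RATES = [2200, 2250, 2300, 2350, 2400, 2450, 2500, 3482]


def find_outliers(values):
    """Return values lying more than 1.5 * IQR below Q1 or above Q3."""
    # quantiles(n=4) returns the three cut points: Q1, the median, and Q3.
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    return [v for v in values if v < lower_fence or v > upper_fence]


print(find_outliers(SAMPLE_SPIN_RATES))
```

With real data, a flagged point is a prompt to look more closely (for example, at a possible measurement error), not proof that the value is wrong.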

Reflect on what you see

Look at and interact with the scatter plots above. When you hover over the points, you’ll notice that more information appears.

Think about the following questions:

  • What do you notice about the scatter plot graphs?
  • What do you wonder about the data?

Use the fill-in-the-blank prompts to summarize your thoughts:

  • “I used to think_______”
  • “Now I think_______”
  • “I wish I knew more about_______”
  • “These data visualizations remind me of _______”

Learn how we visualized the data

Go to our walk-through (in Jupyter notebook format) to see how we used the data science process to create the scatter plots. (This process included formulating the question, gathering the data, analyzing the data with code, and creating the visualizations.)