Baseball Statistics and Tableau

Example Tableau visualizations using baseball statistical data


Project maintained by dgwartney Hosted on GitHub Pages — Theme by mattgraham

Welcome to Baseball Statistics and Tableau.

This page showcases calculations and visualizations in Tableau using a subset of baseball statistical data obtained from SeanLahman.com. Tableau provides a public site for sharing visualizations created with Tableau Public, along with the ability to embed these visualizations in your own pages. The following sections show visualizations from Tableau Public, each with a brief description of the data used. All of the example Tableau workbooks, data extracts, and raw data are publicly available on GitHub.

Preparing the Data

The baseball statistical data mentioned above comes in three formats: an Access database, SQL DDL and DML, and CSV files. I chose the last of these. Usually I would load data sets of this size directly into a relational database such as MySQL or PostgreSQL, use the open source ETL tool Pentaho Data Integration (also known by its open source name, Kettle) to clean and normalize the data, and then load the normalized data into a relational cube consisting of a star schema with appropriate dimensions and measures.

Using CSV files as the data source was a chance to show how Tableau can be used to prepare data directly. For the example visualizations that follow, only three files from the data set were used. Specifically, a single data source was defined in Tableau that joins the following files: batting.csv, master.csv, and teams.csv. The fields to join on in each file are described in a text file that is included in the baseball data distribution.
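
For readers who want to see what such a join looks like outside Tableau, here is a minimal sketch in Python using only the standard library. The column names (playerID, yearID, teamID) and the sample rows are assumptions made for illustration; the actual join fields are the ones listed in the text file that ships with the data, and the actual join is the one defined in the Tableau data source.

```python
import csv
import io

# Tiny inline stand-ins for batting.csv, master.csv, and teams.csv.
# Column names and all values are hypothetical, chosen only for illustration.
BATTING = """playerID,yearID,teamID,H,AB
playerA,1998,TM1,150,500
playerB,1998,TM2,120,480
"""
MASTER = """playerID,nameFirst,nameLast
playerA,Alex,Example
playerB,Blake,Sample
"""
TEAMS = """yearID,teamID,name
1998,TM1,Team One
1998,TM2,Team Two
"""


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def join_batting(batting, master, teams):
    """Inner-join batting rows to players (by playerID) and teams (by yearID, teamID)."""
    players = {row["playerID"]: row for row in master}
    clubs = {(row["yearID"], row["teamID"]): row for row in teams}
    joined = []
    for row in batting:
        player = players.get(row["playerID"])
        club = clubs.get((row["yearID"], row["teamID"]))
        if player is None or club is None:
            continue  # inner join: drop batting rows without a match
        joined.append({**row, **player, **club})
    return joined


rows = join_batting(read_rows(BATTING), read_rows(MASTER), read_rows(TEAMS))
for r in rows:
    print(r["nameLast"], r["name"], r["yearID"], r["H"], r["AB"])
```

The result has one row per player and season, carrying the player's name and the team's name alongside the batting numbers.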

Sections below give a quick overview of the dashboards created in Tableau using the baseball statistical data.

Player Batting Comparison

The Tableau dashboard below compares the batting statistics of different Major League Baseball (MLB) players over a contiguous or non-contiguous set of seasons.

At the top of the dashboard, a table displays the raw batting statistics (Hits, At Bats, etc.) for each player and season, which are used to compute statistics such as On-Base Percentage and Slugging Percentage. Statistical analysis has demonstrated that On-Base Percentage and Slugging Percentage are far better indicators of offensive success than their traditional 19th-century counterparts, which include Stolen Bases, Runs Batted In (RBI), and Batting Average. Moneyball: The Art of Winning an Unfair Game, a book by Michael Lewis, chronicles general manager Billy Beane's sabermetric approach to building a competitive team on an uncompetitive budget during the Oakland Athletics' 2002 season, an approach that relies heavily on on-base percentage and slugging percentage.
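
As a rough illustration of how these two metrics are derived from the raw counts, the sketch below uses the standard definitions: on-base percentage is (H + BB + HBP) / (AB + BB + HBP + SF), and slugging percentage is total bases divided by at bats. The function names and the sample numbers are made up for this example and are not necessarily the calculated fields used in the workbook.

```python
def on_base_percentage(h, bb, hbp, ab, sf):
    """OBP = (H + BB + HBP) / (AB + BB + HBP + SF); 0.0 when there are no chances."""
    denominator = ab + bb + hbp + sf
    return (h + bb + hbp) / denominator if denominator else 0.0


def slugging_percentage(h, doubles, triples, hr, ab):
    """SLG = total bases / AB, where singles are hits that were not extra-base hits."""
    singles = h - doubles - triples - hr
    total_bases = singles + 2 * doubles + 3 * triples + 4 * hr
    return total_bases / ab if ab else 0.0


# Hypothetical season line, not taken from the data set.
obp = on_base_percentage(h=150, bb=50, hbp=5, ab=500, sf=5)
slg = slugging_percentage(h=150, doubles=30, triples=2, hr=25, ab=500)
print(f"OBP {obp:.3f}  SLG {slg:.3f}")
```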

To compare batting metrics between players, first use the Season and Team drop-down menus to narrow down the players of interest. Second, use the Players drop-down menu to select the set of players that you want to compare. The graph in the lower half of the dashboard then displays the batting metrics (y-axis) of the selected players (lines) over the selected seasons (x-axis).

Note: The default values shown when you first visit the page are intentional: they show the two players Mark McGwire and Sammy Sosa during the 1998 Major League Baseball home run record chase.

Team Batting Comparison

The Team Batting Comparison dashboard compares the traditional offensive metrics (Hits, Runs, RBIs) of MLB teams. These metrics were chosen for comparison because they are additive: a team's total is simply the sum of its players' values. Team metrics for On-Base Percentage and Slugging Percentage cannot be calculated from each player's respective values. Tableau can compute such metrics using a Table Calculation, but this was out of scope for this exercise.
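
To make the additive versus non-additive distinction concrete, here is a small Python sketch with two made-up players. Team hits can be summed directly, but averaging the players' on-base percentages gives a different answer from computing the ratio on the summed components, because a player with few plate appearances is weighted the same as a regular. The field names and numbers are hypothetical.

```python
# Two hypothetical players on the same team: a regular and a part-timer.
players = [
    {"H": 180, "BB": 60, "HBP": 5, "AB": 600, "SF": 5},
    {"H": 10, "BB": 2, "HBP": 0, "AB": 50, "SF": 0},
]


def obp(row):
    """On-base percentage from raw counts."""
    denominator = row["AB"] + row["BB"] + row["HBP"] + row["SF"]
    return (row["H"] + row["BB"] + row["HBP"]) / denominator if denominator else 0.0


# Additive metric: the team value is the sum of the player values.
team_hits = sum(p["H"] for p in players)

# Non-additive metric, done the tempting way: average the player ratios.
mean_of_player_obp = sum(obp(p) for p in players) / len(players)

# Non-additive metric, done from components: sum the counts, then take the ratio.
totals = {key: sum(p[key] for p in players) for key in players[0]}
team_obp = obp(totals)

print(f"team hits          {team_hits}")
print(f"mean of player OBP {mean_of_player_obp:.3f}")
print(f"OBP from totals    {team_obp:.3f}")
```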

Metrics are shown rolled up by League (American, National), Team, and the side from which the player bats. Using the Season drop-down menu, you can select a single year for which to compare the teams' aggregated values.

Player's Birth Country

Tableau has built-in mapping capability. If a field in a data source has a geographic role, such as city or country, Tableau looks up the latitude and longitude from its online databases and plots the values on a map. Tableau can also pull in census data from various sources for display.

Details on what map data is available can be found on a page at the Tableau website that describes their map data partners. Tableau also supports Web Map Services, if Tableau's online map data is insufficient.

This last dashboard shows, as a filled map, the number of players born in each country for players who were active in MLB from the 1985 through 2012 seasons. You can drill down to the details of the specific players from each country by hovering over or selecting a country so that the tooltip is displayed. Once the tooltip is displayed, you can click the icon that resembles a table to see the underlying rows that make up the count of players from the selected country.
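
The aggregation behind a map like this can be sketched in a few lines of Python. The sketch assumes a birthCountry column and assumes each player should be counted once even when they appear in several season rows; both are assumptions for illustration, and the rows are made up. The per-country lists play the role of the underlying rows reached from the tooltip.

```python
from collections import defaultdict

# Hypothetical player-season rows; playerA appears in two seasons.
season_rows = [
    {"playerID": "playerA", "yearID": "1998", "birthCountry": "USA"},
    {"playerID": "playerA", "yearID": "1999", "birthCountry": "USA"},
    {"playerID": "playerB", "yearID": "1998", "birthCountry": "Japan"},
    {"playerID": "playerC", "yearID": "2001", "birthCountry": "USA"},
]


def players_by_country(rows):
    """Group distinct player IDs by birth country, keeping first-seen order."""
    groups = defaultdict(list)
    seen = set()
    for row in rows:
        player = row["playerID"]
        if player in seen:
            continue  # count each player once, not once per season
        seen.add(player)
        groups[row["birthCountry"]].append(player)
    return dict(groups)


groups = players_by_country(season_rows)
counts = {country: len(ids) for country, ids in groups.items()}
print(counts)         # value shown per country on the map
print(groups["USA"])  # drill-down: the players behind one country's count
```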

Controls in the upper left of the map allow zooming in/out, zooming by selecting an area, and returning to the home view.
