Create stem and leaf plots from your data. Visualize distributions, identify patterns, and find statistics like median, mode, and range.
A stem and leaf plot (also called a stemplot or stem-and-leaf diagram) is a method for organizing and displaying numerical data. It works by splitting each data value into two parts: a "stem" (the leading digit or digits) and a "leaf" (the trailing digit). The stems are listed vertically in ascending order, and the leaves are written horizontally next to their corresponding stems.
This visualization technique was popularized by statistician John Tukey in the 1970s as part of exploratory data analysis. Unlike histograms, which group data into bins and lose individual values, stem and leaf plots retain the original data while still showing the shape of the distribution.
For example, the number 73 would be split into a stem of 7 and a leaf of 3. When you have multiple values sharing the same stem, their leaves appear together on the same row, making it easy to see how data clusters around certain values.
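The split described above can be sketched in a couple of lines of Python. This is a minimal illustration that assumes non-negative whole numbers and a single-digit leaf; the function name `split_value` is a hypothetical one chosen for this example.

```python
def split_value(value: int) -> tuple[int, int]:
    """Return (stem, leaf) for a non-negative whole number.

    The leaf is the last digit; the stem is everything before it.
    """
    return divmod(value, 10)


print(split_value(73))   # (7, 3)
print(split_value(105))  # (10, 5): a three-digit value gets a two-digit stem
```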
Creating a stem and leaf plot involves several straightforward steps:
Before building the plot, arrange your data in ascending order. This makes it easier to place leaves correctly and ensures the final plot is properly organized.
Determine how to split your numbers into stems and leaves. For two-digit numbers, the tens digit typically becomes the stem. For three-digit numbers, you might use the hundreds digit or the first two digits as the stem.
List every stem from the smallest to the largest in your data, including stems that have no leaves. Write them in a single column and draw a vertical line to their right.
For each data point, write the leaf digit next to its corresponding stem. Keep leaves in order from smallest to largest within each row.
Always include a key that shows how to interpret the plot. For example, "7 | 3 = 73" tells readers that a stem of 7 and a leaf of 3 represents the value 73.
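The steps above can be sketched as a short Python function. It is one possible implementation, not a reference one: it assumes non-negative whole numbers with the last digit as the leaf, and the name `stem_and_leaf` and the sample scores are made up for illustration.

```python
from collections import defaultdict


def stem_and_leaf(values: list[int]) -> str:
    """Render a basic stem and leaf plot for non-negative whole numbers."""
    data = sorted(values)                       # order the data first
    rows: dict[int, list[int]] = defaultdict(list)
    for value in data:
        stem, leaf = divmod(value, 10)          # split into stem and leaf
        rows[stem].append(leaf)                 # leaves arrive already sorted

    lowest, highest = data[0] // 10, data[-1] // 10
    width = len(str(highest))                   # right-align multi-digit stems
    lines = []
    for stem in range(lowest, highest + 1):     # list every stem, even empty ones
        leaves = " ".join(str(leaf) for leaf in rows.get(stem, []))
        lines.append(f"{stem:>{width}} | {leaves}".rstrip())

    key_stem, key_leaf = divmod(data[0], 10)    # finish with a key
    lines.append(f"Key: {key_stem} | {key_leaf} = {data[0]}")
    return "\n".join(lines)


scores = [73, 68, 85, 92, 77, 81, 45, 85, 70]   # hypothetical test scores
print(stem_and_leaf(scores))
# 4 | 5
# 5 |
# 6 | 8
# 7 | 0 3 7
# 8 | 1 5 5
# 9 | 2
# Key: 4 | 5 = 45
```

Note that the empty stem 5 is still printed, which keeps the gap between 45 and 68 visible.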
Once you've created a stem and leaf plot, you can extract valuable information about your data:
Every leaf represents an actual data point. To read a value, combine the stem with the leaf. A row showing "4 | 2 5 7 9" represents the values 42, 45, 47, and 49.
Look at the overall pattern of leaves. In a symmetric distribution, the rows shorten in a similar way above and below the center. In a skewed distribution, most leaves are concentrated at one end, with a tail of shorter rows stretching toward the other.
The range is the difference between the largest and smallest values. The minimum is the first leaf on the first stem that has leaves, and the maximum is the last leaf on the last stem that has leaves.
Since the data is already sorted, count to the middle value(s) to find the median. For an odd number of data points, it's the middle value. For an even number, average the two middle values.
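The mode is the most frequently occurring value. Look for a leaf digit that repeats within a single stem row; the value repeated most often across the whole plot is the mode. The longest row shows where the data is densest, but it does not by itself identify the mode.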
Outliers appear as isolated stems with few leaves, separated from the main cluster of data. These extreme values stand out visually from the rest of the distribution.
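Reading a plot is the reverse of building one, and it can be sketched in Python as follows. The example assumes rows written as `stem | leaves` with single-digit leaves separated by spaces; `read_row` and `read_plot` are hypothetical helper names.

```python
def read_row(row: str) -> list[int]:
    """Turn one row such as '4 | 2 5 7 9' back into its data values."""
    stem_text, _, leaf_text = row.partition("|")
    stem = int(stem_text)
    return [stem * 10 + int(leaf) for leaf in leaf_text.split()]


def read_plot(rows: list[str]) -> list[int]:
    """Recover every value from a plot, in ascending order."""
    return [value for row in rows for value in read_row(row)]


plot = ["4 | 2 5 7 9", "5 |", "6 | 1"]
values = read_plot(plot)
print(values)                    # [42, 45, 47, 49, 61]
print(values[-1] - values[0])    # range: 19
```

The empty stem 5 contributes no values, and the lone value 61 sitting beyond that gap is the kind of isolated point worth checking as a possible outlier.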
Stem and leaf plots offer several benefits over other visualization methods:
Unlike histograms or box plots, stem and leaf plots retain every individual data value. You can reconstruct the entire dataset from the plot, which is impossible with many other summaries.
For small to medium datasets, you can create a stem and leaf plot by hand in just a few minutes. No special software or precise measurements are needed.
The visual pattern of leaves immediately shows whether data is symmetric, skewed left, skewed right, or multimodal. You can see clusters, gaps, and spread at a glance.
Back-to-back stem and leaf plots allow direct comparison of two datasets. The shared stem column sits in the middle, with leaves extending left for one group and right for the other.
You can read the minimum, maximum, median, mode, range, and quartiles from the plot with little or no additional calculation.
While useful, stem and leaf plots have some drawbacks:
When you have hundreds or thousands of data points, stem and leaf plots become unwieldy. The rows become too long to read easily, and the plot loses its visual clarity.
Stem and leaf plots work best with whole numbers or numbers with one decimal place. Data with many decimal places requires rounding, which loses precision.
For numbers that span multiple orders of magnitude, deciding on the stem unit isn't always obvious. Different choices lead to different visual presentations of the same data.
Very wide ranges of data create many stem rows, some of which may be empty. This can make the plot take up more space than a histogram covering the same data.
Several variations of the basic stem and leaf plot exist:
When too many leaves cluster on one stem, you can split it into two rows. For example, a stem of 5 might have two rows: one for leaves 0-4 and another for leaves 5-9. This creates more detail in the plot.
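A split-stem plot can be sketched in Python like this. The sketch assumes non-negative whole numbers and simply repeats the stem label for the low (0-4) and high (5-9) rows; the function name `split_stem_plot` and the sample data are illustrative only.

```python
def split_stem_plot(values: list[int]) -> list[str]:
    """Build a plot in which every stem has a 0-4 row and a 5-9 row."""
    rows: dict[tuple[int, int], list[int]] = {}
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        half = 0 if leaf <= 4 else 1            # 0 = low row, 1 = high row
        rows.setdefault((stem, half), []).append(leaf)

    lines = []
    for stem in range(min(values) // 10, max(values) // 10 + 1):
        for half in (0, 1):                     # apply the same split to every stem
            leaves = " ".join(str(leaf) for leaf in rows.get((stem, half), []))
            lines.append(f"{stem} | {leaves}".rstrip())
    return lines


for line in split_stem_plot([50, 51, 53, 54, 55, 57, 58, 59, 61, 62, 66]):
    print(line)
# 5 | 0 1 3 4
# 5 | 5 7 8 9
# 6 | 1 2
# 6 | 6
```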
To compare two related datasets, create a back-to-back plot. The stems appear in a central column, with leaves for one dataset extending left and leaves for the other extending right.
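The back-to-back layout can be sketched in Python as shown here. The example assumes two lists of non-negative whole numbers with single-digit stems, so the columns line up; `back_to_back` and the two sample classes are hypothetical.

```python
def back_to_back(left: list[int], right: list[int]) -> list[str]:
    """Build a back-to-back plot with a shared central stem column."""

    def leaves_by_stem(values: list[int]) -> dict[int, list[str]]:
        rows: dict[int, list[str]] = {}
        for value in sorted(values):
            stem, leaf = divmod(value, 10)
            rows.setdefault(stem, []).append(str(leaf))
        return rows

    left_rows, right_rows = leaves_by_stem(left), leaves_by_stem(right)
    combined = left + right
    stems = range(min(combined) // 10, max(combined) // 10 + 1)

    # Left-hand leaves are reversed so the smallest leaf sits next to the stem.
    left_text = {s: " ".join(reversed(left_rows.get(s, []))) for s in stems}
    pad = max(len(text) for text in left_text.values())

    lines = []
    for s in stems:
        right_text = " ".join(right_rows.get(s, []))
        lines.append(f"{left_text[s]:>{pad}} | {s} | {right_text}".rstrip())
    return lines


class_a = [62, 67, 71, 75, 78, 83]
class_b = [58, 64, 72, 72, 85, 91]
for line in back_to_back(class_a, class_b):
    print(line)
```

Reading outward from the center, the row for stem 7 shows 71, 75, and 78 for the first group on the left and 72 and 72 for the second group on the right.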
For data with decimals, you can truncate rather than round. A value of 3.78 would have a stem of 3 and a leaf of 7, ignoring the 8. This is faster but less precise than rounding.
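The difference between truncating and rounding can be illustrated with Python's `decimal` module. This sketch assumes non-negative values passed as strings (to avoid binary floating-point surprises) and a leaf unit of one tenth; `decimal_stem_leaf` is a hypothetical helper name.

```python
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_UP


def decimal_stem_leaf(value: str, rounding: str = ROUND_DOWN) -> tuple[int, int]:
    """Return (stem, leaf) with the tenths digit as the leaf.

    ROUND_DOWN truncates extra digits; ROUND_HALF_UP rounds them.
    """
    tenths = Decimal(value).quantize(Decimal("0.1"), rounding=rounding)
    return divmod(int(tenths * 10), 10)


print(decimal_stem_leaf("3.78"))                  # truncated: (3, 7)
print(decimal_stem_leaf("3.78", ROUND_HALF_UP))   # rounded:   (3, 8)
print(decimal_stem_leaf("3.96"))                  # truncated: (3, 9)
print(decimal_stem_leaf("3.96", ROUND_HALF_UP))   # rounded:   (4, 0)
```

The last pair shows one practical reason to truncate: rounding 3.96 moves the value onto a different stem, while truncating keeps it on the stem its leading digit suggests.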
Some practitioners use two-digit leaves for greater precision. The value 347 might have a stem of 3 and a leaf of 47. The key must clearly explain this format.
Stem and leaf plots appear in many real-world contexts:
Teachers use stem and leaf plots to display test scores and grades. Students can quickly see how the class performed and where their own score falls in the distribution.
Manufacturers analyze measurements from production processes. Stem and leaf plots reveal whether output is centered on target values and how much variation exists.
Researchers visualize experimental results before conducting formal analysis. The plots help identify unusual observations that might indicate measurement errors.
Analysts display player performance metrics like batting averages, completion percentages, or race times. Comparisons between players or seasons are straightforward.
Companies examine sales figures, customer counts, or response times. The plots quickly show typical values and unusual outliers that might warrant investigation.
Understanding when to use stem and leaf plots versus alternatives helps you choose the right tool:
| Display | Best for | Limitations |
|---|---|---|
| Stem and leaf | Small datasets (10-50 values), preserving exact values | Unwieldy for large data |
| Histogram | Large datasets, continuous data | Loses individual values |
| Box plot | Comparing groups, showing quartiles | Hides distribution shape details |
| Dot plot | Very small datasets, discrete data | Cluttered with many points |
Stem and leaf plots fill a niche between showing raw data and summarizing it. They're ideal when you want both the distribution shape and access to individual values.
Follow these guidelines for the best results:
The stem unit should produce between 5 and 20 rows for most datasets. Too few rows hide patterns; too many create sparse, hard-to-read plots.
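One way to check a candidate stem unit is to count the rows it would produce. The sketch below assumes non-negative whole numbers and a stem unit that is ten times the leaf unit; `count_rows`, `choose_leaf_unit`, the candidate units, and the sample data are all illustrative choices rather than a standard procedure.

```python
def count_rows(values: list[int], leaf_unit: int) -> int:
    """Number of stem rows (including empty ones) for a given leaf unit."""
    stem_unit = leaf_unit * 10
    return max(values) // stem_unit - min(values) // stem_unit + 1


def choose_leaf_unit(values: list[int], candidates=(1, 10, 100, 1000)):
    """Return the first candidate leaf unit that gives 5 to 20 rows, else None."""
    for unit in candidates:
        if 5 <= count_rows(values, unit) <= 20:
            return unit
    return None


data = [112, 148, 203, 267, 315, 390, 452, 518]
print(count_rows(data, 1))       # 41 rows: too sparse
print(count_rows(data, 10))      # 5 rows
print(choose_leaf_unit(data))    # 10
```

With a leaf unit of 10, the hundreds digit becomes the stem and the tens digit becomes the leaf, so the ones digit is dropped and the key needs to say so (for example, "1 | 4 = 140 to 149").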
In a standard plot, leaves should be single digits (0-9). If your data seems to require two-digit leaves, first reconsider your stem unit or use split stems.
Always arrange leaves in ascending order within each row. This makes finding specific values and calculating statistics much easier.
If some stems have no leaves, include them anyway. Gaps in the stem sequence can hide important information about the distribution.
Every stem and leaf plot needs a key showing how to read values. Also include a title describing what the data represents.
You can derive many descriptive statistics directly from a stem and leaf plot:
Simply count all the leaves. Each leaf represents one data point, so the total number of leaves equals the sample size.
The minimum is the first leaf on the lowest stem that has leaves. The maximum is the last leaf on the highest stem that has leaves.
Subtract the minimum from the maximum. You can read both values directly from the plot.
Count to the middle position. For n data points, when n is odd the median is the value at position (n+1)/2. When n is even, average the values at positions n/2 and n/2 + 1.
The first quartile (Q1) is the median of the lower half of data. The third quartile (Q3) is the median of the upper half. These positions are straightforward to find in the sorted leaves.
Look for the most frequently occurring leaf value on each stem. The mode is the value that appears most often across the entire plot.
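The statistics described above can be computed from a plot with Python's standard `statistics` module. This is a sketch under stated assumptions: the plot is given as a dictionary mapping each stem to its single-digit leaves, and for an odd number of values the median is left out of both halves when finding the quartiles (one common convention among several). The name `plot_statistics` is hypothetical.

```python
from statistics import median, multimode


def plot_statistics(rows: dict[int, list[int]]) -> dict:
    """Summarize a stem and leaf plot given as {stem: [leaves]}."""
    # Rebuild the sorted data: each leaf is one data point.
    values = [stem * 10 + leaf for stem in sorted(rows) for leaf in sorted(rows[stem])]
    n = len(values)
    half = n // 2
    lower, upper = values[:half], values[n - half:]   # median excluded when n is odd
    return {
        "count": n,
        "min": values[0],
        "max": values[-1],
        "range": values[-1] - values[0],
        "median": median(values),
        "q1": median(lower),
        "q3": median(upper),
        "modes": multimode(values),
    }


plot = {4: [5], 5: [], 6: [8], 7: [0, 3, 7], 8: [1, 5, 5], 9: [2]}
for name, value in plot_statistics(plot).items():
    print(name, value)
```

For this plot there are 9 values, so the median is the 5th value (77), Q1 is the average of 68 and 70, and the only repeated value, 85, is the mode.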
Watch out for these errors when creating stem and leaf plots:
Without a key, readers can't interpret the plot correctly. Always specify what the stems and leaves represent.
If you split stems, apply the same split to all stems. Mixing split and unsplit stems creates confusion.
Leaves should align vertically so you can compare counts across rows. Use spaces to keep leaves evenly spaced.
Including gaps where no data exists is important for showing the true distribution. Empty stems reveal low-density regions.
Stems must go from smallest to largest, and leaves within each row must also be ordered. Random ordering defeats the purpose of the plot.
Stem and leaf plots provide a simple yet powerful way to organize and visualize numerical data. They preserve individual values while revealing the overall distribution, making them invaluable for exploratory data analysis. Though they work best with smaller datasets, their ease of construction and interpretation ensures they remain a fundamental tool in statistics education and practical data analysis.
When working with quantitative data, consider whether a stem and leaf plot might offer insights that other visualizations would miss. The combination of exact values and distributional overview makes this technique uniquely valuable for understanding your data.