What is a stem and leaf plot?

A stem and leaf plot (also called a stemplot or stem-and-leaf diagram) is a method for organizing and displaying numerical data. It works by splitting each data value into two parts: a "stem" (the leading digit or digits) and a "leaf" (the trailing digit). The stems are listed vertically in ascending order, and the leaves are written horizontally next to their corresponding stems.

This visualization technique was popularized by statistician John Tukey in the 1970s as part of exploratory data analysis. Unlike histograms, which group data into bins and lose individual values, stem and leaf plots retain the original data while still showing the shape of the distribution.

For example, the number 73 would be split into a stem of 7 and a leaf of 3. When you have multiple values sharing the same stem, their leaves appear together on the same row, making it easy to see how data clusters around certain values.
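
The split described above can be sketched in a couple of lines of Python. This is only an illustration for non-negative whole numbers with a one-digit leaf; the function name split_value is made up for this example.

```python
def split_value(value):
    """Split a non-negative integer into (stem, leaf) with a one-digit leaf."""
    # divmod(value, 10) returns (value // 10, value % 10):
    # everything except the last digit, and the last digit itself.
    return divmod(value, 10)

print(split_value(73))   # (7, 3)
print(split_value(108))  # (10, 8)
print(split_value(5))    # (0, 5)
```

Note that a single-digit value such as 5 gets a stem of 0.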

How to create a stem and leaf plot

Creating a stem and leaf plot involves several straightforward steps:

Step 1: Sort your data

Before building the plot, arrange your data in ascending order. This makes it easier to place leaves correctly and ensures the final plot is properly organized.

Step 2: Identify the stems

Determine how to split your numbers into stems and leaves. For two-digit numbers, the tens digit typically becomes the stem. For three-digit numbers, you might use the hundreds digit or the first two digits as the stem.

Step 3: Draw the stem column

List all possible stems from the minimum to maximum in your data, even if some stems have no corresponding leaves. Write them vertically with a vertical line to the right.

Step 4: Add the leaves

For each data point, write the leaf digit next to its corresponding stem. Keep leaves in order from smallest to largest within each row.

Step 5: Include a key

Always include a key that shows how to interpret the plot. For example, "7 | 3 = 73" tells readers that a stem of 7 and a leaf of 3 represents the value 73.
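
As a rough sketch, the five steps above might look like this in Python. It assumes a non-empty list of non-negative whole numbers with the last digit as the leaf; the function name stem_and_leaf and the sample scores are invented for illustration.

```python
from collections import defaultdict

def stem_and_leaf(data):
    """Return the text lines of a stem and leaf plot (non-negative integers)."""
    rows = defaultdict(list)
    for value in sorted(data):             # Step 1: sort the data
        stem, leaf = divmod(value, 10)     # Step 2: split into stem and leaf
        rows[stem].append(leaf)            # Step 4: leaves arrive already ordered
    low, high = min(rows), max(rows)
    width = len(str(high))                 # right-align stems of different lengths
    lines = []
    for stem in range(low, high + 1):      # Step 3: list every stem, even empty ones
        leaves = " ".join(str(leaf) for leaf in rows.get(stem, []))
        lines.append(f"{stem:>{width}} | {leaves}".rstrip())
    smallest = min(data)
    key_stem, key_leaf = divmod(smallest, 10)
    lines.append(f"Key: {key_stem} | {key_leaf} = {smallest}")  # Step 5: the key
    return lines

scores = [73, 45, 58, 42, 76, 49, 51, 73, 47]   # hypothetical sample data
print("\n".join(stem_and_leaf(scores)))
# 4 | 2 5 7 9
# 5 | 1 8
# 6 |
# 7 | 3 3 6
# Key: 4 | 2 = 42
```

The empty row for stem 6 is kept on purpose so the gap in the data stays visible.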

Reading and interpreting stem and leaf plots

Once you've created a stem and leaf plot, you can extract valuable information about your data:

Finding individual values

Every leaf represents an actual data point. To read a value, combine the stem with the leaf. A row showing "4 | 2 5 7 9" represents the values 42, 45, 47, and 49.
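
Reading a row is the reverse of building it. The sketch below assumes the "stem | leaves" text layout used in this article, a key where the leaf is the ones digit, and a hypothetical helper name read_row.

```python
def read_row(row):
    """Turn a row such as '4 | 2 5 7 9' back into the values it encodes."""
    stem_text, _, leaf_text = row.partition("|")
    stem = int(stem_text)
    # Each leaf is the ones digit; the stem supplies the tens (and above).
    return [stem * 10 + int(leaf) for leaf in leaf_text.split()]

print(read_row("4 | 2 5 7 9"))  # [42, 45, 47, 49]
print(read_row("6 |"))          # []
```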

Identifying the shape

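Look at the overall pattern of leaves. Turned on its side, the plot resembles a histogram, with row length standing in for bar height. A symmetric distribution has rows of similar length on either side of the longest rows. A skewed distribution has most of its leaves bunched at one end of the stem column, with a thin tail of short rows trailing toward the other end.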

Finding the range

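The range is the difference between the largest and smallest values. The minimum is the first leaf on the first stem that has leaves, and the maximum is the last leaf on the last stem that has leaves. Subtract the minimum from the maximum to get the range.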

Locating the median

Since the data is already sorted, count to the middle value(s) to find the median. For an odd number of data points, it's the middle value. For an even number, average the two middle values.

Spotting the mode

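The mode is the most frequently occurring value. Look for a leaf digit that repeats within a single stem row, since each repeat is the same value occurring again. The row with the most leaves shows where the data is densest, but that row does not necessarily contain the mode.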

Detecting outliers

Outliers appear as isolated stems with few leaves, separated from the main cluster of data. These extreme values stand out visually from the rest of the distribution.

Advantages of stem and leaf plots

Stem and leaf plots offer several benefits over other visualization methods:

Preserves original data

Unlike histograms or box plots, stem and leaf plots retain every individual data value. As long as no digits were rounded or truncated away, you can reconstruct the entire dataset from the plot, which is impossible with many other summaries.

Quick construction

For small to medium datasets, you can create a stem and leaf plot by hand in just a few minutes. No special software or precise measurements are needed.

Reveals distribution shape

The visual pattern of leaves immediately shows whether data is symmetric, skewed left, skewed right, or multimodal. You can see clusters, gaps, and spread at a glance.

Easy comparison

Back-to-back stem and leaf plots allow direct comparison of two datasets. The shared stem column sits in the middle, with leaves extending left for one group and right for the other.

Supports summary statistics

You can read the minimum, maximum, and mode directly from the plot, and find the median, range, and quartiles by counting leaves and doing a little arithmetic.

Limitations of stem and leaf plots

While useful, stem and leaf plots have some drawbacks:

Not ideal for large datasets

When you have hundreds or thousands of data points, stem and leaf plots become unwieldy. The rows become too long to read easily, and the plot loses its visual clarity.

Limited to certain data types

Stem and leaf plots work best with whole numbers or numbers with one decimal place. Data with many decimal places requires rounding, which loses precision.

Choosing the stem can be tricky

For numbers that span multiple orders of magnitude, deciding on the stem unit isn't always obvious. Different choices lead to different visual presentations of the same data.

Space requirements

Very wide ranges of data create many stem rows, some of which may be empty. This can make the plot take up more space than a histogram covering the same data.

Variations and extensions

Several variations of the basic stem and leaf plot exist:

Split stems

When too many leaves cluster on one stem, you can split it into two rows. For example, a stem of 5 might have two rows: one for leaves 0-4 and another for leaves 5-9. This creates more detail in the plot.
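
A minimal sketch of split stems in Python, assuming non-negative whole numbers and the 0-4 / 5-9 split described above. The function name split_stem_rows and the sample data are made up for this example.

```python
def split_stem_rows(data):
    """Rows of a stem and leaf plot with every stem split into 0-4 and 5-9."""
    low, high = min(data) // 10, max(data) // 10
    # Two rows per stem: half 0 holds leaves 0-4, half 1 holds leaves 5-9.
    rows = {(stem, half): [] for stem in range(low, high + 1) for half in (0, 1)}
    for value in sorted(data):
        stem, leaf = divmod(value, 10)
        rows[(stem, leaf // 5)].append(leaf)
    return [f"{stem} | {' '.join(map(str, leaves))}".rstrip()
            for (stem, _), leaves in rows.items()]

data = [50, 51, 53, 54, 55, 57, 58, 59, 62, 66]   # hypothetical sample data
print("\n".join(split_stem_rows(data)))
# 5 | 0 1 3 4
# 5 | 5 7 8 9
# 6 | 2
# 6 | 6
```

Every stem is split the same way here, including stems with few leaves, so the rows stay comparable.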

Back-to-back plots

To compare two related datasets, create a back-to-back plot. The stems appear in a central column, with leaves for one dataset extending left and leaves for the other extending right.
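
One possible way to lay out a back-to-back plot in Python is sketched below. It assumes two non-empty lists of non-negative whole numbers; the function name back_to_back and the two sample groups are invented for illustration. Leaves on the left are written in reverse so that, on both sides, they increase moving away from the central stem column.

```python
def back_to_back(left, right):
    """Text rows of a back-to-back stem and leaf plot for two integer lists."""
    combined = left + right
    stems = range(min(combined) // 10, max(combined) // 10 + 1)

    def leaves(data, stem):
        return [value % 10 for value in sorted(data) if value // 10 == stem]

    # Left-hand leaves are reversed so the smallest leaf sits next to the stem.
    left_rows = {s: " ".join(map(str, reversed(leaves(left, s)))) for s in stems}
    pad = max(len(text) for text in left_rows.values())
    lines = []
    for s in stems:
        right_text = " ".join(map(str, leaves(right, s)))
        lines.append(f"{left_rows[s]:>{pad}} | {s} | {right_text}".rstrip())
    return lines

group_a = [52, 57, 61, 68, 74]   # hypothetical sample data
group_b = [55, 63, 64, 71, 79]   # hypothetical sample data
print("\n".join(back_to_back(group_a, group_b)))
# 7 2 | 5 | 5
# 8 1 | 6 | 3 4
#   4 | 7 | 1 9
```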

Truncated leaves

For data with decimals, you can truncate rather than round. A value of 3.78 would have a stem of 3 and a leaf of 7, ignoring the 8. This is faster but less precise than rounding.
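
The difference between truncating and rounding can be illustrated with Python's decimal module, which avoids binary floating-point surprises. This is a sketch for non-negative values passed as strings; the function name decimal_leaf is an assumption of this example.

```python
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_UP

def decimal_leaf(text, rounding):
    """Split a decimal string into (stem, leaf), keeping one decimal place."""
    tenths = Decimal(text).quantize(Decimal("0.1"), rounding=rounding)
    # Work in whole tenths so the leaf is the single digit after the point.
    return divmod(int(tenths * 10), 10)

print(decimal_leaf("3.78", ROUND_DOWN))     # (3, 7)  truncated
print(decimal_leaf("3.78", ROUND_HALF_UP))  # (3, 8)  rounded
print(decimal_leaf("3.96", ROUND_HALF_UP))  # (4, 0)  rounding can change the stem
```

The last line shows one practical difference: rounding can move a value onto the next stem, while truncation never does.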

Multiple leaf digits

Some practitioners use two-digit leaves for greater precision. The value 347 might have a stem of 3 and a leaf of 47. The key must clearly explain this format.
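
A small sketch of the two-digit-leaf split, assuming non-negative whole numbers; the function name two_digit_leaf is made up for this example. Zero-padding the leaf keeps a value like 305 from being misread as 35.

```python
def two_digit_leaf(value):
    """Split a non-negative integer into a stem and a zero-padded two-digit leaf."""
    stem, leaf = divmod(value, 100)
    return stem, f"{leaf:02d}"

print(two_digit_leaf(347))  # (3, '47')
print(two_digit_leaf(305))  # (3, '05')
```

A matching key for this format would read "3 | 47 = 347".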

Practical applications

Stem and leaf plots appear in many real-world contexts:

Education

Teachers use stem and leaf plots to display test scores and grades. Students can quickly see how the class performed and where their own score falls in the distribution.

Quality control

Manufacturers analyze measurements from production processes. Stem and leaf plots reveal whether output is centered on target values and how much variation exists.

Scientific research

Researchers visualize experimental results before conducting formal analysis. The plots help identify unusual observations that might indicate measurement errors.

Sports statistics

Analysts display player performance metrics like batting averages, completion percentages, or race times. Comparisons between players or seasons are straightforward.

Business analytics

Companies examine sales figures, customer counts, or response times. The plots quickly show typical values and unusual outliers that might warrant investigation.

Comparison with other displays

Understanding when to use stem and leaf plots versus alternatives helps you choose the right tool:

Display	Best for	Limitations
Stem and leaf	Small datasets (10-50 values), preserving exact values	Unwieldy for large data
Histogram	Large datasets, continuous data	Loses individual values
Box plot	Comparing groups, showing quartiles	Hides distribution shape details
Dot plot	Very small datasets, discrete data	Cluttered with many points

Stem and leaf plots fill a niche between showing raw data and summarizing it. They're ideal when you want both the distribution shape and access to individual values.

Tips for creating effective plots

Follow these guidelines for the best results:

Choose appropriate stem units

The stem unit should produce between 5 and 20 rows for most datasets. Too few rows hide patterns; too many create sparse, hard-to-read plots.
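
One way to apply this guideline is to count the rows each candidate stem unit would produce and pick one that lands in the 5 to 20 range. The sketch below assumes non-negative whole numbers; the function names, the candidate units, and the sample data are all invented for illustration.

```python
def row_count(data, stem_unit):
    """Number of stem rows, empty ones included, for a given stem unit."""
    return max(data) // stem_unit - min(data) // stem_unit + 1

def pick_stem_unit(data, candidates=(1, 10, 100, 1000), low=5, high=20):
    """Return the first candidate unit giving low..high rows, or None."""
    for unit in candidates:
        if low <= row_count(data, unit) <= high:
            return unit
    return None

data = [112, 145, 230, 267, 310, 455, 580, 612, 790, 845]  # hypothetical data
print(row_count(data, 10))    # 74 rows: far too sparse
print(row_count(data, 100))   # 8 rows: within the guideline
print(pick_stem_unit(data))   # 100
```

With a stem unit of 100, the leaf would be the tens digit, so the ones digit has to be truncated or rounded away.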

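Keep leaves to single digits where possible

Single-digit leaves (0-9) are the default and keep rows easy to compare. If your data seems to require two-digit leaves, first consider changing the stem unit or truncating the extra digit. If you do use two-digit leaves, make that clear in the key.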

Order leaves consistently

Always arrange leaves in ascending order within each row. This makes finding specific values and calculating statistics much easier.

Include empty stems

If some stems have no leaves, include them anyway. Omitting them closes up gaps in the stem sequence and hides important information about the distribution.

Add clear labels

Every stem and leaf plot needs a key showing how to read values. Also include a title describing what the data represents.

Calculating statistics from the plot

You can derive many descriptive statistics directly from a stem and leaf plot:

Count

Simply count all the leaves. Each leaf represents one data point, so the total number of leaves equals the sample size.

Minimum and maximum

The minimum is the first leaf on the lowest stem. The maximum is the last leaf on the highest stem.

Range

Subtract the minimum from the maximum. You can read both values directly from the plot.

Median

Count to the middle position. For n data points with n odd, the median is the value at position (n+1)/2. If n is even, (n+1)/2 is not a whole number, so average the values at positions n/2 and n/2 + 1.

Quartiles

The first quartile (Q1) is the median of the lower half of data. The third quartile (Q3) is the median of the upper half. These positions are straightforward to find in the sorted leaves.

Mode

Look for a leaf digit that repeats within a single stem row, since each repeat is the same data value occurring again. The mode is the value that appears most often across the entire plot.
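
The statistics in this section can be checked with a short Python sketch. It assumes the plot is stored as a dictionary mapping each stem to its list of one-digit leaves, which is a representation chosen for this example; the function names and sample plot are likewise made up. Quartile conventions vary between textbooks and software, so this follows the median-of-halves definition given above and leaves the middle value out of both halves when the count is odd.

```python
from collections import Counter
from statistics import median

def plot_values(plot):
    """Flatten {stem: [leaves]} into the sorted list of original values."""
    return [stem * 10 + leaf for stem in sorted(plot) for leaf in sorted(plot[stem])]

def summarize(plot):
    values = plot_values(plot)
    n = len(values)
    half = n // 2
    lower, upper = values[:half], values[n - half:]  # middle value excluded if n is odd
    counts = Counter(values)
    top = max(counts.values())
    return {
        "count": n,
        "min": values[0],
        "max": values[-1],
        "range": values[-1] - values[0],
        "median": median(values),
        "q1": median(lower),
        "q3": median(upper),
        "modes": sorted(v for v, c in counts.items() if c == top),
    }

plot = {4: [2, 5, 7, 9], 5: [1, 8], 6: [], 7: [3, 3, 6]}   # hypothetical plot
print(summarize(plot))
# {'count': 9, 'min': 42, 'max': 76, 'range': 34, 'median': 51,
#  'q1': 46.0, 'q3': 73.0, 'modes': [73]}
```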

Common mistakes to avoid

Watch out for these errors when creating stem and leaf plots:

Forgetting the key

Without a key, readers can't interpret the plot correctly. Always specify what the stems and leaves represent.

Using inconsistent splits

If you split stems, apply the same split to all stems. Mixing split and unsplit stems creates confusion.

Misaligning leaves

Leaves should align vertically so you can compare counts across rows. Use spaces to keep leaves evenly spaced.

Skipping empty stems

Including gaps where no data exists is important for showing the true distribution. Empty stems reveal low-density regions.

Wrong ordering

Stems must go from smallest to largest, and leaves within each row must also be ordered. Random ordering defeats the purpose of the plot.

Conclusion

Stem and leaf plots provide a simple yet powerful way to organize and visualize numerical data. They preserve individual values while revealing the overall distribution, making them invaluable for exploratory data analysis. Though they work best with smaller datasets, their ease of construction and interpretation ensures they remain a fundamental tool in statistics education and practical data analysis.

When working with quantitative data, consider whether a stem and leaf plot might offer insights that other visualizations would miss. The combination of exact values and distributional overview makes this technique uniquely valuable for understanding your data.
