Lately, I’ve been using Python’s matplotlib plotting library to generate a lot of figures, such as the bar charts I showed in this talk. To improve readability, I like to put a number label at the top of each bar that gives the quantity that bar represents.

When I realized I wanted to add these labels to my charts, the first thing I did was look at this example from the matplotlib documentation, which seemed to be doing something a lot like what I wanted.

In the code that generates this figure, a little labeling function (`autolabel`, in that example) is responsible for attaching the labels to the bars.
But what if we try to use the same code with some different data?
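As a sketch of that experiment, the block below reconstructs the documentation example’s labeling helper (named `autolabel` here) and runs it on data with widely varying bar heights. The helper is written from memory of that example, so treat its details as approximate. The data list is made up: it uses values this post mentions, in an arbitrary order. Returning the `Text` objects is an addition, so that the label positions can be inspected.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, so the sketch runs without a display
import matplotlib.pyplot as plt


def autolabel(rects):
    """Attach a text label above each bar, in the style of the docs example.

    Relies on the module-level `ax` defined below. Returning the Text
    objects is an addition for inspection, not part of the docs example.
    """
    labels = []
    for rect in rects:
        height = rect.get_height()
        labels.append(ax.text(rect.get_x() + rect.get_width() / 2.0,
                              1.05 * height,
                              "%d" % int(height),
                              ha="center", va="bottom"))
    return labels


# Made-up data: the values are ones the post mentions, the order is arbitrary.
heights = [860, 1300, 1250, 1145, 15, 10]

fig, ax = plt.subplots()
rects = ax.bar(range(len(heights)), heights, width=0.6)
labels = autolabel(rects)

# Vertical distance between each bar top and its label, in data units.
gaps = [label.get_position()[1] - h for label, h in zip(labels, heights)]

# fig.savefig("multiplied_labels.png")  # uncomment to write the figure to disk
```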
Here’s what the plot looks like now:

Oh, dear. Now we’ve got ‘1300’, and to a lesser extent ‘1250’ and ‘1145’, just hanging out up there in space. Meanwhile, ‘15’ and ‘10’ are crowding the columns that they’re supposed to be above. How did that happen?

Looking again at the labeling function, we see that it uses `1.05*height` to determine where to put the text label that goes with a given rectangle of height `height`. That makes the gap between a bar and its label 5% of that bar’s own height. So, if `height` varies more than a little from bar to bar, the gaps will be of awkwardly varying size: a bar of height 10 gets a gap of 0.5, while a bar of height 1300 gets a gap of 65. It’s only the fact that the bar heights in the original example vary only from 20 to 35 that stops it from looking terrible. In fact, now that we’ve realized that the gap sizes depend on the data, we can see it there, too: the gap between ‘20’ and the top of its column is noticeably smaller than the gap just below ‘35’. That’s no good.

First attempt at a fix: add, don’t multiply

One way to fix this would be to add a suitable number to the column height, instead of multiplying, and use the result to determine where to put the label text. That is, instead of writing
`1.05*height`, we can write `height + 10`, or something like that. Indeed, people who answer questions about such things on Stack Overflow have already arrived at this solution. Using `height + 10` in our own code, we get:

Alas, this approach isn’t robust, either. This is what it looks like when we try to go back and plot our original data:

Oh, no! Now our gaps, although all the same size, are way too big. Most of the labels are actually off the chart. In order to get this right, we’d have to change the constant to something smaller that would look nice with this data:

That’s better. But having to pick a different number for every figure we plan to generate sounds about as fun as cleaning up my cat’s puke. Is there an approach to label placement that will work regardless of what the data looks like?

A more robust fix: scale according to the height of the axis

Why does adding a constant like 10 to
`height` not work the way we want it to? The problem is that `height` is not in units of centimeters, or furlongs, or any unit of distance that would be consistent from one figure to the next; it’s in “axis points” (data coordinates, in matplotlib’s terms), the same units as the actual data being plotted! For instance, the bar furthest to the left in the plot of our first set of data is 20 axis points tall, while the leftmost bar in the plot of our second set of data is 860 axis points tall. A gap of height 10 next to a column of height 20 is different from a gap of height 10 next to a column of height 860.

What we really want is to scale the height of the label gaps to whatever is reasonable for our figure. The trick to doing this is to look at the height, given in axis points, of the y-axis of the plot. For instance, with our first set of data, the range of the y-axis is `(0, 40)`, so it has a height of 40 axis points, while in the second set, the y-axis range is `(0, 1400)`, so it has a height of 1400 axis points. If we can find out what the height of the y-axis is in axis points, we can have the label gaps be a fixed fraction of that height. We still have to decide what that fraction will be, but we only have to do that once, and then we’ll have proportionally-sized gaps in every figure we generate.

To do this, we can call matplotlib’s `get_ylim` method on an `Axes` object to get the y-axis range. In the case of our example code, we even already have an `Axes` object, called `ax`, which we can pass to the labeling function.

Here’s what the revised code looks like, where I’ve chosen 0.01 as the number to multiply the axis height by.
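A minimal sketch of that revision, assuming a helper named `autolabel` that now takes the `Axes` object as a second argument. The `gap_fraction` parameter name, the returned list of `Text` objects, and the data list are illustrative additions rather than the original code. The y-axis range is pinned to 0 through 1400 to match the numbers in this post, since the range matplotlib picks automatically can differ between versions.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, so the sketch runs without a display
import matplotlib.pyplot as plt


def autolabel(rects, ax, gap_fraction=0.01):
    """Place each label a fixed fraction of the y-axis height above its bar."""
    # Height of the y-axis in data units ("axis points" in the post).
    y_bottom, y_top = ax.get_ylim()
    y_height = y_top - y_bottom
    gap = gap_fraction * y_height

    labels = []
    for rect in rects:
        height = rect.get_height()
        labels.append(ax.text(rect.get_x() + rect.get_width() / 2.0,
                              height + gap,
                              "%d" % int(height),
                              ha="center", va="bottom"))
    return labels


# Made-up data: the values are ones the post mentions, the order is arbitrary.
heights = [860, 1300, 1250, 1145, 15, 10]

fig, ax = plt.subplots()
rects = ax.bar(range(len(heights)), heights, width=0.6)
ax.set_ylim(0, 1400)  # assumption: pinned to the range reported in the post
labels = autolabel(rects, ax)

# Every gap is now the same size, regardless of the bar's own height.
gaps = [label.get_position()[1] - h for label, h in zip(labels, heights)]
```

The helper has to be called after the bars are drawn (and after any `set_ylim` call), because `get_ylim` reports whatever range the axis has at that moment.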
Choosing 0.01 will give us gaps of 0.4 axis points for our first set of data, and 14 axis points for our second set. Here’s what it looks like when we plot the first data set:

And the second:

Much better!

One more thing

There’s one last refinement that I made for my own plotting. As we saw above, sometimes bar labels run over the top edge of the figure, and it can happen even if we’re using our axis-height-based approach. For example, if we change ‘1300’ to ‘1350’ in the data, the above plot turns into this:

Not so nice. But we can have the labeling function handle this case by putting the label inside the bar whenever there isn’t room for it above.
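One way this could look, as a sketch rather than the original code: treat any bar that fills more than some fraction of the y-axis as too tall to label from above, and drop its label just below the bar top instead. The helper name `autolabel`, the parameter names, and the 0.95 and 0.05 values are all assumptions chosen for illustration, and the check is a heuristic that does not measure the rendered text.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, so the sketch runs without a display
import matplotlib.pyplot as plt


def autolabel(rects, ax, gap_fraction=0.01,
              inside_threshold=0.95, inside_offset=0.05):
    """Label bars above their tops, or inside them when they nearly fill the axis."""
    y_bottom, y_top = ax.get_ylim()
    y_height = y_top - y_bottom

    labels = []
    for rect in rects:
        height = rect.get_height()
        # Fraction of the y-axis height reached by the top of this bar.
        p_height = (height - y_bottom) / y_height

        if p_height > inside_threshold:
            # Not enough room above: start the label a little below the bar top.
            label_y = height - inside_offset * y_height
        else:
            label_y = height + gap_fraction * y_height

        labels.append(ax.text(rect.get_x() + rect.get_width() / 2.0,
                              label_y,
                              "%d" % int(height),
                              ha="center", va="bottom"))
    return labels


# Made-up data again, with the tallest bar raised to 1350 as in the text.
heights = [860, 1350, 1250, 1145, 15, 10]

fig, ax = plt.subplots()
rects = ax.bar(range(len(heights)), heights, width=0.6)
ax.set_ylim(0, 1400)  # assumption: pinned to the range reported in the post
labels = autolabel(rects, ax)
label_ys = [label.get_position()[1] for label in labels]
```

Because the text is anchored at its bottom edge (`va="bottom"`), `inside_offset` has to be large enough for the whole label to fit between that anchor and the top of the bar. A good value depends on font size and figure height.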
Now our plot looks like this:

And that’s it! It’s also possible to have the labels go inside the bars by default, except in cases where the bars are too short to accommodate them, and that’s an easy change to the above code, left as an exercise to the reader. Another, more challenging exercise for the reader is dealing with the situation where the y-axis uses a log scale. I have a hacky solution for this, but it’s not very robust.