It was drawn to my attention that the expected images in baseline_images can gobble up a lot of repository space, so it is best to squash commits to avoid submitting multiple versions of the same baseline image.
Looking at all the tests that use image_comparison, almost all of them go with the default value of "tol = 0"; in other words, anything but an exact match generates an error.
If we only cared about spotting changes, then for these tests we could just store some sort of checksum of the image data and use that to spot when the image has changed. Updating a checksum only adds a few bytes to the repository.
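To make the idea concrete, here is a rough sketch of what such a check could look like. It is not how image_comparison works today; the function names are made up for illustration, and I am assuming we hash the raw bytes of the image file with SHA-256 (hashing the decoded pixel data instead would avoid false alarms from metadata differences, but needs an image library).

```python
import hashlib


def image_checksum(image_bytes: bytes) -> str:
    """Return the SHA-256 hex digest of the raw image bytes (hypothetical helper)."""
    return hashlib.sha256(image_bytes).hexdigest()


def matches_expected(image_bytes: bytes, expected_checksum: str) -> bool:
    """True only on an exact match, which mirrors what tol = 0 asks for."""
    return image_checksum(image_bytes) == expected_checksum


# Stand-in bytes for a baseline image; a real test would read the generated file.
baseline = b"\x89PNG stand-in image data"
expected = image_checksum(baseline)  # the only thing stored in the repository

print(matches_expected(baseline, expected))
print(matches_expected(baseline + b"\x00", expected))
```

The stored value is a 64-character hex string per image, so changing a baseline costs one short line in the diff rather than a new binary blob.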
The downside of this approach (and it is a big downside) is that when an image changes, you would no longer have the expected image and the diff available to look at.
The only solution I can think of is to store the expected images somewhere else, with the ability to retrieve an image given its checksum, thus keeping them out of the main repository.
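As a rough sketch of that "somewhere else", the store could simply be keyed by checksum. Everything here is an assumption for illustration: the ImageStore class, the use of a local directory (a real setup would more likely be a separate repository or a file server), SHA-256 as the checksum, and the .png suffix.

```python
import hashlib
import tempfile
from pathlib import Path


class ImageStore:
    """Hypothetical store that keeps each expected image under its checksum."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def put(self, image_bytes: bytes) -> str:
        """Save the image and return the checksum to record in the main repository."""
        checksum = hashlib.sha256(image_bytes).hexdigest()
        (self.root / f"{checksum}.png").write_bytes(image_bytes)
        return checksum

    def get(self, checksum: str) -> bytes:
        """Fetch the expected image for a checksum, e.g. to build a diff on failure."""
        return (self.root / f"{checksum}.png").read_bytes()


# Usage: a temporary directory stands in for the external storage location.
with tempfile.TemporaryDirectory() as tmp:
    store = ImageStore(tmp)
    key = store.put(b"\x89PNG stand-in image data")
    print(store.get(key) == b"\x89PNG stand-in image data")
```

Because the file name is derived from the content, old and new versions of a baseline never overwrite each other, so the expected image for any past checksum stays retrievable.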