[Matplotlib-devel] On testing with other FreeType versions

[Matplotlib-devel] On testing with other FreeType versions

Elliott Sales de Andrade
Hi all,

Downstream in Fedora (and maybe Debian), they are running into issues with testing and text. Fedora 26 has FreeType 2.7.1, and Fedora 27 & Rawhide have FreeType 2.8. Fedora 25 uses 2.6.5, but it will be EOL within the next week. Many other distros are also transitioning to these newer FreeType versions [1], and I think Anaconda recently added 2.8 too.

With 2.7.1, a few tests fail (rms < 1) and it is straightforward to patch that [2]. With 2.8 though, over 800 tests fail [3] ranging up to ~80 rms [4]. This is a bit harder to paper over.
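
For readers unfamiliar with the metric, the "rms" figures above are root-mean-square differences between a rendered image and its reference image. The following is only a minimal sketch of that idea, assuming 8-bit grayscale images flattened to lists; the function names and tolerance values are illustrative assumptions, and this is not Matplotlib's actual comparison code:

```python
import math


def rms_difference(expected, actual):
    """Root-mean-square difference between two equally sized pixel sequences."""
    if len(expected) != len(actual):
        raise ValueError("images must have the same number of pixels")
    squared = sum((e - a) ** 2 for e, a in zip(expected, actual))
    return math.sqrt(squared / len(expected))


def images_match(expected, actual, tol=0.0):
    """True if the RMS difference does not exceed the tolerance `tol`."""
    return rms_difference(expected, actual) <= tol


# A tiny 2x2 "image" in which one pixel differs by 2 grey levels.
reference = [0, 128, 255, 64]
rendered = [0, 128, 253, 64]
print(rms_difference(reference, rendered))  # sqrt(4 / 4) = 1.0
```

Raising `tol` for a handful of tests is what "patching" amounts to in the 2.7.1 case; with differences of up to ~80, any tolerance large enough to pass would also hide most real regressions.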

I see a few ways to mitigate the problem, with varying advantages/disadvantages:

1. Bundle the older version in the Matplotlib package, as we already do for tests. I don't really believe this to be a viable option for downstream, but I'm just mentioning it to be thorough. There are already a few (minor) security issues in the one we test against.
2. Inject an older FreeType just to run tests on the package. Again, I don't like this idea. The point of running tests is to be sure that the version in a distro works *in that distro*. Testing with something a user could never install seems useless.
3. Re-create all our current and future test images with 2.8. While this is the most future-proof, adding over 800 images is going to bloat the repo quite a bit.
4. Create some sort of side repo with test images for other FreeType releases. This would reduce bloat in the main repo but be somewhat more work, so I'd only suggest doing so for tags.

I dislike the first two options as they would be repetitive across distros (unless they just stopped testing altogether), but the last two are not without work for us.

Opinions? Alternative ideas?


--
Elliott

_______________________________________________
Matplotlib-devel mailing list
[hidden email]
https://mail.python.org/mailman/listinfo/matplotlib-devel

Re: On testing with other FreeType versions

Antony Lee
Hi,

1. Sticking to testing with the old FreeType.

Injecting an older FreeType version is "relatively" easy to do (the question is whether you want to do it...).  Naively, one could just set LD_PRELOAD to /path/to/libfreetype.so, but that will also affect subprocesses such as ImageMagick, which (IIRC) don't like that.  Instead, the correct way is to ensure that the Python process calls `dlopen('/path/to/libfreetype.so', RTLD_GLOBAL)`, which forces symbol resolution *in this process* to first check the given path but does not affect subprocesses.  (Alternatively, one could remove LD_PRELOAD from the environment before calling the subprocess, but that seems messier.)  Fortunately, dlopen is "effectively" available under the name of `ctypes.CDLL` in Python.

I have a proof of principle somewhere that patches the testing framework to 1) ensure that an old FreeType is built (basically moving the local_freetype implementation from setupext to the main lib), and 2) load it as above.
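
To make the `ctypes.CDLL` approach concrete, here is a rough sketch (not the proof of principle itself). The helper names and the library path are hypothetical; the only real APIs used are `ctypes.CDLL` with its `mode` argument and `ctypes.RTLD_GLOBAL`:

```python
import ctypes
import os


def preload_library(path):
    """Load a shared library into *this* process with RTLD_GLOBAL.

    On POSIX systems ctypes.CDLL calls dlopen(); mode=RTLD_GLOBAL makes the
    library's symbols available to libraries that are loaded afterwards.
    """
    return ctypes.CDLL(path, mode=ctypes.RTLD_GLOBAL)


def env_without_preload(env=None):
    """Return a copy of `env` (default: os.environ) with LD_PRELOAD removed.

    This is the "messier" alternative: scrub LD_PRELOAD before spawning a
    subprocess so that the child does not inherit the injected library.
    """
    cleaned = dict(os.environ if env is None else env)
    cleaned.pop("LD_PRELOAD", None)
    return cleaned


# Usage sketch (hypothetical path). The call has to happen before the
# extension module that links against FreeType is imported:
#     preload_library("/path/to/libfreetype.so")
```

Because the library is opened from inside the Python process rather than through the environment, child processes started later resolve FreeType in the usual way.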

Another relevant issue is the manylinux wheels, which must somehow embed a libfreetype.  Currently I believe this is done via static linking.  This is not so great if you also want to load freetype for other reasons; for example mplcairo (which loads freetype via cairo) currently cannot work with local_freetype builds due to symbol conflicts.  I believe that switching to the standard manylinux approach (which is to include the shared object in a hidden folder and set RPATH appropriately) would work better (and allow us to strip out the static linking code).

2. Switching to newer FreeTypes.

I don't think committing all test images to the main repo is really a viable option: FreeType is also making new releases every once in a while and different Linuxes have different versions (https://pkgs.org/download/freetype gives 2.8.1 (Arch, Debian Sid), 2.8 (Fedora 27, Ubuntu 17.10), 2.6.3 (Debian 9, OpenSUSE 42.3), 2.6.1 (Ubuntu 16.04 LTS) and that's only a few).

I do believe that adding tooling that generates the test images to a side repo for each tag + FreeType version (say, using the FT versions of the major distros at the time of the tag) may be reasonable.

3. Side note.

If #9763 (or #5414) gets accepted (new FT wrappers), they will also require a new generation of the test images: ft2font currently generates "wiggly baselines" in certain cases (see example in #5414), and try as I might (i.e. not so much) I could not reproduce them in the new wrapper :-)

Antony


Re: On testing with other FreeType versions

tcaswell
There may also be an interesting machine learning problem here: using a more intelligent criterion for determining whether an image has failed.

By changing the FreeType version, we get a bunch of images that fail pixel comparison but should not, and by slightly modifying tests or shuffling the test <-> result image mapping, we can generate as many "do fail and should fail" cases as we need.
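
As a rough illustration of the shuffling idea, a labelled data set could be assembled along these lines. This is only a sketch: the function name, the dictionary layout and the file names are assumptions made for the example, not existing tooling:

```python
import random


def make_labelled_pairs(baselines, seed=0):
    """Build (test, baseline, should_match) triples.

    `baselines` maps a test name to the file name of its reference image.
    Each test contributes one correct pairing (an image rendered with a
    different FreeType should still be accepted against it) and one
    deliberately wrong pairing that a classifier should reject.
    """
    rng = random.Random(seed)
    names = sorted(baselines)
    pairs = [(name, baselines[name], True) for name in names]
    for name in names:
        wrong = [baselines[m] for m in names if baselines[m] != baselines[name]]
        if wrong:
            pairs.append((name, rng.choice(wrong), False))
    return pairs


example = {"test_axes": "axes.png", "test_legend": "legend.png", "test_polar": "polar.png"}
for triple in make_labelled_pairs(example):
    print(triple)
```

The "slightly modified test" cases would need real rendering and are not covered here.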

Tom


Re: On testing with other FreeType versions

Sandro Tosi-4
Debian is also interested in this discussion. We have a local patch to
increase the RMS tolerance of several tests, but I agree with Elliott:
the number of failing tests is getting bigger and bigger, and just
bumping the threshold is not always the best way (we risk making the
test suite pass just to not feel bad, without actually spotting real
issues).

I remember a similar conversation in the past, where the idea was to
provide multiple sets of reference images, built with different
FreeType versions, and to release them as an additional tarball that
downstream distributions can download and bundle with the Python
code release (remember that Debian doesn't want to download anything
during the build, which is where we run the test suite).
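
If such a tarball shipped one directory of reference images per FreeType
version, the test suite would need some rule for choosing a set. A
possible sketch follows; the function names and the fallback policy
(closest version that is not newer than the installed one) are
assumptions for illustration only:

```python
def parse_version(text):
    """Turn '2.8.1' into (2, 8, 1) so that versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def pick_reference_set(available, installed):
    """Choose which reference-image set to compare against.

    `available` lists the FreeType versions for which image sets exist and
    `installed` is the FreeType version in use. Returns the newest available
    version that is not newer than `installed`, or None if there is none.
    """
    target = parse_version(installed)
    candidates = [v for v in available if parse_version(v) <= target]
    return max(candidates, key=parse_version) if candidates else None


sets = ["2.6.1", "2.6.5", "2.7.1", "2.8"]
print(pick_reference_set(sets, "2.8.1"))  # 2.8
print(pick_reference_set(sets, "2.6.3"))  # 2.6.1
```

Whether falling back to an older set is acceptable depends on how much
rendering changes between releases, so a stricter exact-match policy may
be preferable.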

I also like Thomas' idea a lot: having AI/ML actually inspect the
images and say whether they are alike "enough" for the test to pass,
instead of a pixel-by-pixel comparison. But it may be a long-term
effort (GSoC maybe?), and we should also keep an eye on how long the
test suite takes to run (mpl already takes long enough to build as
is, lol).

>> 2017-12-10 21:44 GMT-08:00 Elliott Sales de Andrade
>> <[hidden email]>:
>>> [1] https://repology.org/metapackage/freetype/versions
>>> [2] https://github.com/QuLogic/matplotlib/commit/cfdc835923407810bd087f60332cdc8cdcb23f05
>>> [3] https://kojipkgs.fedoraproject.org//work/tasks/3137/23623137/build.log
>>> [4] https://gist.github.com/QuLogic/477055a847a44cd444a0932432acffd1



--
Sandro "morph" Tosi
My website: http://sandrotosi.me/
Me at Debian: http://wiki.debian.org/SandroTosi
G+: https://plus.google.com/u/0/+SandroTosi

Re: On testing with other FreeType versions

Jody Klymak
In reply to this post by Elliott Sales de Andrade

I guess something like option 4 makes sense to me.  You guys have more experience than I do, but… it seems testing *most* of the repo with a fixed FreeType would be fine.  The point of most of the tests is to catch Matplotlib bugs, and the exact font being rendered doesn’t matter too much.  Then there could be a much smaller separate repo for the tests that depend on the font being rendered, and to test that downstream distributions work.

Cheers,   Jody




Re: On testing with other FreeType versions

Elliott Sales de Andrade
In reply to this post by Elliott Sales de Andrade

One more thing I forgot to mention is ImageHash [5], which is used by Iris for image tests with the following strategy [6]:
  • using a perceptual 'image hash' of the outputs as the basis for checking test results.
  • storing the hashes of 'known accepted results' for each test in a database in the repo
  • storing associated reference images for each hash value in a separate public repository, allowing human-eye judgement of 'valid equivalent' results.
  • a new version of the 'iris/tests/idiff.py' assists in comparing proposed new 'correct' result images with the existing accepted ones.
While this does reduce load in the main repo itself, it does increase the cognitive load for developers. Iris has a small core group of developers and far fewer drive-by contributions than Matplotlib, so I'm not sure we want to adopt the full scheme. (Note also that their repo is LGPL3, so please don't copy anything from there.) Using ImageHash instead of RMS might still be useful, though it may generalize things too much.
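
To show what comparing by hash rather than by RMS means in practice, here is a deliberately simplified, from-scratch "average hash". It is not ImageHash's or Iris's code; the function names are made up for the example, and the input is assumed to be a small grayscale image that has already been downscaled:

```python
def average_hash(pixels):
    """Hash a small grayscale image (a list of rows) to a tuple of bits.

    Each bit is 1 where the pixel is brighter than the image mean, else 0.
    """
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    return tuple(1 if value > mean else 0 for value in flat)


def hamming_distance(hash_a, hash_b):
    """Number of bit positions in which two equally long hashes differ."""
    if len(hash_a) != len(hash_b):
        raise ValueError("hashes must have the same length")
    return sum(a != b for a, b in zip(hash_a, hash_b))


reference = [[10, 10, 200, 200], [10, 10, 200, 200]]
antialiased = [[12, 9, 198, 203], [10, 14, 201, 199]]  # small rendering noise
mirrored = [[200, 200, 10, 10], [200, 200, 10, 10]]    # genuinely different

print(hamming_distance(average_hash(reference), average_hash(antialiased)))  # 0
print(hamming_distance(average_hash(reference), average_hash(mirrored)))     # 8
```

The same coarseness that absorbs small rendering noise is what may "generalize things too much": a real but subtle regression can also leave the hash unchanged.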


--
Elliott


Re: On testing with other FreeType versions

Jody Klymak
Another idea might be to render most text in a unique color that could then be excluded from some image diffs.  Sure, there may be the occasional image that uses that color, but a few missed pixels won’t matter too much.
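
A quick sketch of what that could look like, assuming RGB pixels stored as tuples and a reserved text color chosen arbitrarily here (magenta). The names are hypothetical and nothing like this exists in the test suite as far as I know:

```python
import math

TEXT_COLOR = (255, 0, 255)  # hypothetical color reserved for text in tests


def masked_rms(expected, actual, mask_color=TEXT_COLOR):
    """RMS difference over RGB pixels, ignoring text-colored pixels.

    A pixel is skipped if it has the reserved color in either image.
    """
    total = 0
    count = 0
    for exp_px, act_px in zip(expected, actual):
        if exp_px == mask_color or act_px == mask_color:
            continue
        total += sum((e - a) ** 2 for e, a in zip(exp_px, act_px))
        count += 3  # three channels per compared pixel
    return math.sqrt(total / count) if count else 0.0


expected = [(0, 0, 0), (255, 0, 255), (10, 10, 10)]
actual = [(0, 0, 0), (0, 0, 0), (10, 10, 10)]  # the text pixel moved away
print(masked_rms(expected, actual))  # 0.0
```

One caveat: antialiased glyph edges blend with the background, so they would not match the reserved color exactly and would still show up in the diff unless text is drawn without antialiasing or the mask is widened.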
