Running matplotlib on massively parallel compute resources

Kevin Buckley
We've recently seen an issue where someone has been running multiple instances
of a job on our supercomputer, each of which has a matplotlib component
that therefore runs on the compute nodes, rather than as part of any
post-processing on our ancillary services.

Some of these jobs ended up hanging and, in a number of cases, we have
observed that the hanging process is what we believe to be the
matplotlib-spawned

   fc-list --format=%{file}\n

Is there anything in the way that matplotlib is written that might
lead to race conditions around the creation of, or access to, the
per-user font cache or other matplotlib data?

Furthermore, is there a way that our users could define a per-job font
cache directory, using the job ID, and thereby explicitly avoid
any inter-job interference resulting from their "massively parallel"
matplotlib invocations?

Here's hoping that matplotlib is the cause and, if so, that there's an
easy solution once you know how to use matplotlib.
_______________________________________________
Matplotlib-users mailing list
[hidden email]
https://mail.python.org/mailman/listinfo/matplotlib-users

Re: Running matplotlib on massively parallel compute resources

Antony Lee
I think setting the MPLCONFIGDIR environment variable to a user-writable directory should be good enough.  There's an open PR (https://github.com/matplotlib/matplotlib/pull/15933) which adds a warning in the case where that's needed.
Antony
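Following Antony's suggestion, and the per-job idea from the original question, here is a minimal sketch of setting MPLCONFIGDIR from the scheduler's job ID at the top of a job's Python script. The helper name `use_per_job_mplconfigdir` is hypothetical, and reading the job ID from `SLURM_JOB_ID` is an assumption: use whatever variable your scheduler exports (PBS/Torque, for instance, sets `PBS_JOBID`). matplotlib looks up its config and cache directories (the font cache lives in the latter) once and caches the result, so the variable should be set before the first `import matplotlib`.

```python
import os
import tempfile


def use_per_job_mplconfigdir(job_id_var="SLURM_JOB_ID", base_dir=None):
    """Point MPLCONFIGDIR at a private, per-job directory and return its path.

    Hypothetical helper: call it before the first ``import matplotlib``.
    """
    # SLURM_JOB_ID is an assumption; PBS/Torque, for example, export PBS_JOBID.
    # Fall back to the process ID when the variable is not set.
    job_id = os.environ.get(job_id_var) or str(os.getpid())
    # tempfile.gettempdir() honours TMPDIR, which sites often point at
    # node-local or job-specific scratch space.
    base_dir = base_dir or tempfile.gettempdir()
    user = os.environ.get("USER", "user")
    config_dir = os.path.join(base_dir, f"mplconfig-{user}-{job_id}")
    # Request a private (0700) directory; exist_ok lets a restarted job reuse it.
    os.makedirs(config_dir, mode=0o700, exist_ok=True)
    os.environ["MPLCONFIGDIR"] = config_dir
    return config_dir


if __name__ == "__main__":
    print("MPLCONFIGDIR =", use_per_job_mplconfigdir())
    # import matplotlib.pyplot as plt  # only after MPLCONFIGDIR has been set
```

Processes in different jobs therefore never touch each other's font cache; processes within the same job still share one directory.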

On Thu, Jan 16, 2020 at 4:41 AM Kevin Buckley wrote:
> Furthermore, is there a way that our users could define a per-job font
> cache directory, using the job ID, and thereby explicitly avoid
> any inter-job interference resulting from their "massively parallel"
> matplotlib invocations?

Re: Running matplotlib on massively parallel compute resources

Kevin Buckley
On 2020/01/17 17:39, Antony Lee wrote:
> I think setting the MPLCONFIGDIR environment variable to a user-writable
> directory should be good enough.  There's an open PR
> (https://github.com/matplotlib/matplotlib/pull/15933)
> which adds a warning in the case where that's needed.
> Antony

Cheers for that, Antony: seems to do what's needed.

I had seen references to MPLCONFIGDIR in a number of threads relating
to matplotlib but hadn't quite worked out whether it was what we needed.

Kevin
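The replies above settle on one MPLCONFIGDIR per job. Where a single job instead launches many matplotlib processes at the same moment from a Python driver, those processes would still share that one directory. A minimal sketch, assuming the workers are separate Python scripts (the helper `run_with_private_mplconfigdirs` and any worker script such as `plot_worker.py` are hypothetical), gives each process its own directory through its environment. The trade-off is that each worker then builds its own font cache on first use instead of reusing a shared one.

```python
import os
import subprocess
import sys
import tempfile


def run_with_private_mplconfigdirs(worker_args, n_workers, base_dir=None):
    """Start n_workers Python processes, each with its own MPLCONFIGDIR.

    Hypothetical helper: worker_args are the arguments passed to the Python
    interpreter (for example ["plot_worker.py"]); the worker index is
    appended as a final argument.
    """
    procs = []
    for rank in range(n_workers):
        # mkdtemp creates a fresh, private (0700) directory with a unique
        # name, so no two workers ever share a config/cache directory.
        config_dir = tempfile.mkdtemp(prefix=f"mplconfig-{rank}-", dir=base_dir)
        # Copy the parent environment and override only MPLCONFIGDIR.
        env = dict(os.environ, MPLCONFIGDIR=config_dir)
        cmd = [sys.executable, *worker_args, str(rank)]
        procs.append(subprocess.Popen(cmd, env=env))
    # Wait for every worker and return their exit codes in rank order.
    return [p.wait() for p in procs]
```

The directories are left in place after the workers exit; removing them afterwards is worthwhile if scratch space is tight.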
