Plotting Lists of Strings has high CPU

Douglas Clowes
> Strings are now treated as “categories” rather than cast to floats,  and plotted in the order received.

> https://matplotlib.org/gallery/lines_bars_and_markers/categorical_variables.html

> Cheers,   Jody

Thanks for that Jody, I did just "get lucky".

Some assessment of this shows the high CPU associated with this operation is at least partially avoidable.

The majority of the CPU time, according to:
  python3 -m cProfile -s time plotit.py -s|head -n20
is in or under StrCategoryFormatter._text, which seems to be getting called far more times than I would expect: of the order of the number of categories squared in my samples, with 40K calls for 100 categories, and for 1000 categories 4M calls on mpl 2.2 and 6M on mpl 3.0. Seems high.
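
As a rough check of the "categories squared" reading, the two mpl 2.2 call counts quoted above can be fitted to calls ~ c * n**k. This is only a sketch: the function name growth_exponent is made up for illustration, and the inputs are the rounded figures from the profile, not fresh measurements.

```python
import math


def growth_exponent(n_small, calls_small, n_large, calls_large):
    """Estimate k in calls ~ c * n**k from two (n, calls) samples."""
    return math.log(calls_large / calls_small) / math.log(n_large / n_small)


# Rounded call counts reported above for mpl 2.2: categories -> _text calls.
samples = {100: 40_000, 1000: 4_000_000}

k = growth_exponent(100, samples[100], 1000, samples[1000])
print(f"estimated exponent: {k:.2f}")

# Calls divided by n**2; a roughly constant ratio points at quadratic growth.
ratios = {n: calls / n**2 for n, calls in samples.items()}
for n, ratio in ratios.items():
    print(f"{n} categories: {ratio:.1f} calls per category squared")
```

An exponent near 2 with a constant ratio is what quadratic growth looks like; exponential growth would make the ratio blow up between the two samples.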

Within the _text function in 2.2, the most expensive operation is the test of the numpy version, which is repeated on every call even though its result never changes. This cost can be significantly reduced by moving the constant expression to module level with a simple change like:

diff --git a/lib/matplotlib/category.py b/lib/matplotlib/category.py
index b135bff1c..89b1c5bd9 100644
--- a/lib/matplotlib/category.py
+++ b/lib/matplotlib/category.py
@@ -28,6 +28,8 @@ import matplotlib.ticker as ticker
 # np 1.6/1.7 support
 from distutils.version import LooseVersion
 
+NP_PRE_1_7_0 = LooseVersion(np.__version__) < LooseVersion('1.7.0')
+
 VALID_TYPES = tuple(set(six.string_types +
                         (bytes, six.text_type, np.str_, np.bytes_)))
 
@@ -158,7 +160,7 @@ class StrCategoryFormatter(ticker.Formatter):
     def _text(value):
         """Converts text values into `utf-8` or `ascii` strings
         """
-        if LooseVersion(np.__version__) < LooseVersion('1.7.0'):
+        if NP_PRE_1_7_0:
             if (isinstance(value, (six.text_type, np.unicode))):
                 value = value.encode('utf-8', 'ignore').decode('utf-8')
         if isinstance(value, (np.bytes_, six.binary_type)):
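
The effect of hoisting the version test can be illustrated in isolation. The following is a minimal sketch, not the matplotlib code: FAKE_NP_VERSION, parse_version, text_per_call and text_hoisted are hypothetical names, and parse_version is a simplified stand-in for LooseVersion so that the example needs neither NumPy nor distutils.

```python
import timeit

# Hypothetical stand-in for np.__version__, so the sketch needs no NumPy.
FAKE_NP_VERSION = "1.15.4"


def parse_version(text):
    """Simplified stand-in for LooseVersion: '1.7.0' -> (1, 7, 0)."""
    return tuple(int(part) for part in text.split("."))


def text_per_call(value):
    # Pre-patch shape: the constant comparison is re-evaluated on every call.
    if parse_version(FAKE_NP_VERSION) < parse_version("1.7.0"):
        value = value.encode("utf-8", "ignore").decode("utf-8")
    return value


# Post-patch shape: the comparison is evaluated once, at import time.
PRE_1_7_0 = parse_version(FAKE_NP_VERSION) < parse_version("1.7.0")


def text_hoisted(value):
    if PRE_1_7_0:
        value = value.encode("utf-8", "ignore").decode("utf-8")
    return value


if __name__ == "__main__":
    calls = 200_000
    per_call = timeit.timeit(lambda: text_per_call("abc"), number=calls)
    hoisted = timeit.timeit(lambda: text_hoisted("abc"), number=calls)
    print(f"per call: {per_call:.3f}s  hoisted: {hoisted:.3f}s")
```

Both functions return the same value for the same input; only the per-call overhead differs. Timings vary by machine, so none are quoted here. Note that hoisting reduces the cost of each call but does not change how many times _text is called.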



_______________________________________________
Matplotlib-users mailing list
[hidden email]
https://mail.python.org/mailman/listinfo/matplotlib-users
Re: Plotting Lists of Strings has high CPU

Jody Klymak
Perhaps it's not surprising that this hasn’t been optimized, because most folks don’t have that many categories. If you have an actual use case for that many categories, submitting a bug report on GitHub would be great.

Cheers,   Jody

On Oct 25, 2018, at  16:47 PM, Douglas Clowes <[hidden email]> wrote:

