[Matplotlib-devel] 2017-11-20 notes

tcaswell
Folks,

Notes from today's phone call.  There are 3 things left for 2.1.1: categorical changes, 'fuzzy' images when mostly invalid, and the AppVeyor cleanup.

Tom

Ryan May, Eric Firing, Thomas Caswell

** categorical

 - everyone on board with not sorting categorical values
 - everyone on board with only accepting strings as categories
 - some concern about supporting `np.nan`
   - if the first entry is `nan`, will miss units
   - do not want a python loop that checks until it finds a not-nan
   - defer nan handling
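
The concern about a leading `nan` can be illustrated without Matplotlib. This is only a sketch: `looks_categorical_first` and `looks_categorical_scan` are hypothetical helpers, not the actual units code. The first inspects only the first entry; the second is the kind of Python loop the notes want to avoid.

```python
import math


def is_nan(value):
    """Return True only for a float nan; strings are never nan."""
    return isinstance(value, float) and math.isnan(value)


def looks_categorical_first(values):
    """Hypothetical check that inspects only the first entry."""
    return isinstance(values[0], str)


def looks_categorical_scan(values):
    """Hypothetical check that skips leading nans with a Python loop."""
    for value in values:
        if not is_nan(value):
            return isinstance(value, str)
    return False


data = [float("nan"), "a", "b"]
first_only = looks_categorical_first(data)  # the leading nan hides the strings
full_scan = looks_categorical_scan(data)    # finds "a" after skipping the nan
```

With a leading `nan` the first-entry check reports "not categorical", while the scan gets the right answer at the cost of a per-element loop.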

Tom's job to get all of the PRs collected into one

Plan going forward to support mixed types, missing data, and explicit
ordering between categories:
 - write a category class
 - write a handler for pandas categorical

** Tom's pre-meeting notes 

*** do not sort values

On one hand, sorting the values makes sense, as

#+BEGIN_SRC python
  fig, (ax1, ax2) = plt.subplots(2, 1)
  ax1.scatter([1, 2], [1, 2])
  ax2.scatter([2, 1], [2, 1])

#+END_SRC

should produce visually identical plots so Tom thinks


#+BEGIN_SRC python
  fig, (ax1, ax2) = plt.subplots(2, 1)
  ax1.scatter(['a', 'b'], [1, 2])
  ax2.scatter(['b', 'a'], [2, 1])

#+END_SRC

should as well (but Tom is wrong).

On the other hand, users may expect that there are some semantics in the
order in which they pass the data in

#+BEGIN_SRC python
  plt.bar(['first', 'second', 'third'], [1, 2, 3])

#+END_SRC

and blindly sorting alphabetically gives them no escape hatch.

practicality over purity, drop the sorting.
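
The two policies can be compared in a few lines of plain Python. This is a minimal sketch, not the category code itself: `first_seen_positions` and `sorted_positions` are hypothetical names, and the labels are taken from the `['b', 'a']` example above.

```python
def first_seen_positions(labels):
    """Map each distinct label to an integer position in first-seen order."""
    mapping = {}
    for label in labels:
        # len(mapping) is evaluated before insertion, so it is the next free slot
        mapping.setdefault(label, len(mapping))
    return mapping


def sorted_positions(labels):
    """Map each distinct label to its position in alphabetical order."""
    return {label: i for i, label in enumerate(sorted(set(labels)))}


labels = ["b", "a"]
unsorted_map = first_seen_positions(labels)  # keeps the order the user passed
sorted_map = sorted_positions(labels)        # imposes alphabetical order
```

Without sorting, the user controls the axis order by controlling the input order; with sorting there is no way to ask for `b` before `a`.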

*** supporting non-string values

In the original implementation a cast through numpy was used which
converted all non-string values to strings so things like

#+BEGIN_SRC python
  plt.bar([1, 2, 'apple'], [1, 2, 3])

#+END_SRC

would work.  However, this led to =2= and ='2'= being treated as
the same (which seems less than great).  Supporting them as different
is possible, but is a fair bit of work because in a number of places the
unit framework assumes that 'plain' numbers will pass through
unchanged.
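
The collision is easy to reproduce with a plain string cast, used here as a stand-in for the numpy cast described above (the helper name `cast_to_labels` is hypothetical):

```python
def cast_to_labels(values):
    """Stand-in for the string cast: every value becomes its str() form."""
    return [str(value) for value in values]


labels = cast_to_labels([2, "2", "apple"])
distinct = set(labels)  # the int 2 and the string '2' collapse into one label
```

Three inputs go in, but only two distinct categories come out.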

A more worrying concern is that

#+BEGIN_SRC python
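  import matplotlib.pyplot as plt

  x = [52, 32, 'a', 'b']
  y = [0, 1, 2, 3]

  # two stacked axes; a bare plt.subplots() returns a single Axes
  fig, (ax1, ax2) = plt.subplots(2, 1)
  ax1.plot(x, y, 'o')
  ax2.plot(x[:2], y[:2], 'x')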

#+END_SRC

in the first case the ints are treated as categoricals and in the
second they are not.  If we want to support mixed types like this then
we need to make a special class (or use pandas categorical) which does
not have to guess the type on the fly.

Requiring that, once the categorical unit handling is triggered, all of the
values be string-like seems like the safest approach.
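
A small sketch of why guessing on the fly is fragile, and of what the strict rule looks like. Both `guess_is_categorical` and `require_all_strings` are hypothetical helpers written for illustration; they are not the unit framework's real dispatch.

```python
def guess_is_categorical(values):
    """Hypothetical on-the-fly guess: categorical if any value is a string."""
    return any(isinstance(value, str) for value in values)


def require_all_strings(values):
    """Strict rule: once categorical handling triggers, reject non-strings."""
    categorical = guess_is_categorical(values)
    if categorical and not all(isinstance(v, str) for v in values):
        raise TypeError("categorical data must contain only strings")
    return categorical


x = [52, 32, "a", "b"]
full_guess = guess_is_categorical(x)       # the ints ride along as categories
slice_guess = guess_is_categorical(x[:2])  # the same ints are now plain numbers
```

Under the strict rule, `require_all_strings(x)` raises a `TypeError` instead of silently giving `52` two different meanings.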

*** support for nan

Most of Matplotlib accepts `np.nan` as 'missing' data which is then
dropped as part of the draw process.  This makes less sense with `bar` but makes
lots of sense with `scatter`.

We should special-case allowing `np.nan` in as a 'string' and map it
to itself.
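
As a rough sketch of this pre-meeting proposal (the function `convert` and its `mapping` argument are hypothetical, and `float('nan')` stands in for `np.nan`), the special case could look like:

```python
import math


def convert(values, mapping):
    """Hypothetical converter: strings -> positions, nan -> nan."""
    out = []
    for value in values:
        if isinstance(value, float) and math.isnan(value):
            out.append(value)  # nan passes through as 'missing' data
        elif isinstance(value, str):
            out.append(float(mapping.setdefault(value, len(mapping))))
        else:
            raise TypeError(f"unsupported category: {value!r}")
    return out


mapping = {}
converted = convert(["a", float("nan"), "b", "a"], mapping)
```

The `nan` keeps its slot in the output, so downstream drawing code can drop it the same way it does for numeric data.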

*** special containers 

It was proposed to look for object arrays as a marker for categorical
instead of the type of the data.  I do not think we should do this as we try to
be as agnostic about the container as possible everywhere else.


** appveyor
 - drop building conda package
 - remove conda recipe from the repo

Ryan is taking care of this

** set_theta_grid(frac)
 - merged, improvement over current behavior, raising seems too
   aggressive

** #8947
 - ringing with lots of nans

this is Tom's job to investigate

** talked about traits / traitlets and friends

** major funding
 - get mplot3D 'right'
   - same interface
   - uses real 3D tools

_______________________________________________
Matplotlib-devel mailing list
[hidden email]
https://mail.python.org/mailman/listinfo/matplotlib-devel

Re: 2017-11-20 notes

Adrien VINCENT
Thank you Tom for the summary (as well as Ryan and Eric for the meeting
of course).

Just being curious, does the following point (at the very end of your
email) mean that there is some plan to perform kind of a big overhaul of
mplot3d?

 > ** major funding
 >   - get mplot3D 'right'
 >     - same interface
 >     - uses real 3D tools

I am asking mainly because I cannot find a MEP about that in the
documentation, even though it seems like a rather major goal/news to me
:). But I may have misunderstood something about what is stated above...

Best,
Adrien



Re: 2017-11-20 notes

tcaswell
Improved 3D support has been an ask for as long as I have been involved with Matplotlib (and I suspect for as long as the 3D support has existed) and was discussed at SciPy 2 years ago, but I don't think there is a specific MEP for it.  That note was more meant as "Eric suggested that a thing to ask for money to do would be..." rather than any sort of formal plan or commitment!

Tom


Re: 2017-11-20 notes

Adrien VINCENT
Thanks Tom for clarifying this.

Fair enough :).

Adrien


Re: 2017-11-20 notes

Jody Klymak
In reply to this post by tcaswell
First, thanks so much for the careful notes and the useful examples!

> *** supporting non-string values
>
> In the original implementation a cast through numpy was used which
> converted all non-string values to strings so things like
>
> #+BEGIN_SRC python
>   plt.bar([1, 2, 'apple'], [1, 2, 3])
>
> #+END_SRC
>
> would work.  However, this led to =2= and ='2'= being treated as
> the same (which seems less than great).  Supporting them as different
> is possible, but is a fair bit of work because in a number of places the
> unit framework assumes that 'plain' numbers will pass through
> unchanged.

I don’t think there are *many* places in the unit framework where plain numbers are assumed to be OK to pass through.  

locks out changing converters, even if the first converter is for floats (I defined a DefaultConverter for numbers).  So you can’t plot a date object on a float axis, or vice versa.  

It only fails 14 tests, in which the test tried to pass a hard-coded, already-converted value to the axis, i.e. in some `dates` tests and some `jpl` tests:

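```python
import matplotlib.pyplot as plt


def test_single_date():
    # hard-coded, already-converted date numbers rather than date objects
    time1 = [721964.0]
    data1 = [-65.54]

    fig = plt.figure()
    plt.subplot(211)
    plt.plot_date(time1, data1, 'o', color='r')
```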

But it otherwise passes everything.  I’d need to look at the JPL tests more carefully, but the date tests could easily be rewritten to actually pass date objects if that’s the way we want to go.

The point is that being able to pass plain numbers after the converter is set is *not* widely *needed*.  The only place I had to change the code was `cla` where the x/y-limits are reset to (0,1) which we didn’t want triggering the converter lock.    However, I can appreciate the argument that it may be *desired* to pass floats.

> A more worrying concern is that
>
> #+BEGIN_SRC python
>   x = [52, 32, 'a', 'b']
>   y = [0, 1, 2, 3]
>
>   fig, (ax1, ax2) = plt.subplots(2, 1)
>   ax1.plot(x, y, 'o')
>   ax2.plot(x[:2], y[:2], 'x')
>
> #+END_SRC
>
> in the first case the ints are treated as categoricals and in the
> second they are not.  If we want to support mixed types like this then
> we need to make a special class (or use pandas categorical) which does
> not have to guess the type on the fly.
>
> Requiring that, once the categorical unit handling is triggered, all of the
> values be string-like seems like the safest approach.

An alternative to locking the converter is just letting the first real call to the axis *set* the converter, and then letting the converter decide if it will deal with arbitrary values.  i.e. the first call above would set the converter to “CategoricalConverter” and the second call would be sent to CategoricalConverter.  Of course calling in the opposite order would send to the DefaultConverter, and it would raise a TypeError on the second call.

If we went this route, then the DateConverter could allow plain floats and those would be treated as converted datenums. 
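
A minimal sketch of that alternative, in plain Python. The class names `DefaultConverter` and `CategoricalConverter` come from the message above, but their bodies, the `SketchAxis` class, and its `update` method are all assumptions made for illustration; none of this is Matplotlib's actual units machinery.

```python
class DefaultConverter:
    """Hypothetical converter for plain numbers; rejects strings."""

    def convert(self, values):
        if any(isinstance(v, str) for v in values):
            raise TypeError("DefaultConverter cannot handle strings")
        return [float(v) for v in values]


class CategoricalConverter:
    """Hypothetical converter that decides for itself what it accepts."""

    def __init__(self):
        self.mapping = {}

    def convert(self, values):
        # here it chooses to treat every value, numbers included, as a label
        return [float(self.mapping.setdefault(v, len(self.mapping)))
                for v in values]


class SketchAxis:
    """The first real call sets the converter; later calls reuse it."""

    def __init__(self):
        self.converter = None

    def update(self, values):
        if self.converter is None:
            has_str = any(isinstance(v, str) for v in values)
            self.converter = (CategoricalConverter() if has_str
                              else DefaultConverter())
        return self.converter.convert(values)


x = [52, 32, "a", "b"]
axis = SketchAxis()
first = axis.update(x)       # sets CategoricalConverter
second = axis.update(x[:2])  # also sent to CategoricalConverter
```

Calling in the opposite order on a fresh `SketchAxis` picks `DefaultConverter` first, and the later call with strings raises `TypeError`, which matches the behaviour described above.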

Cheers,   Jody

--
Jody Klymak

Re: 2017-11-20 notes

Antony Lee
In reply to this post by Adrien VINCENT
A bit late to the party, but here are some (not very deep) thoughts regarding 3D support.

1. I believe that writing a new 3D projection class from scratch, properly integrated with the transforms system (so with support for log-scale, etc.), should be a "relatively easy" (if somewhat tedious and lengthy...) task, for someone who knows Matplotlib's internals well (probably easier than trying to forcefully modernize mplot3d).  All the machinery is already there.  Given the way projections work, this can be started as a third-party project (requiring explicit registration by import, just like mplot3d does now) and later be integrated into matplotlib's core.

2. More difficult to resolve is z-buffering (i.e., support for overlapping polygons that cannot be sorted unambiguously using a single z-value).  As far as I can tell, this would require adding a new method on the renderer classes (because there is simply no way to pass z-information to the renderer right now -- z-sorting occurs before data reaches the renderer), and, well, implementing it.

Antony



Re: 2017-11-20 notes

Ryan May-3
Doing #2 properly requires passing depth information fully down into the rasterizer (e.g. Agg), and having it compute z for each pixel (fragment). That’s a ton of work, and GPUs already do this.

Ryan
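
For readers unfamiliar with the per-pixel depth test mentioned above, here is a toy sketch in plain Python. The `rasterize` function and its one-row "image" are assumptions made for illustration and have nothing to do with Agg's real interface; smaller `z` is taken to mean closer to the viewer.

```python
def rasterize(width, primitives):
    """Per-pixel depth test on a one-row 'image'.

    Each primitive is (name, z_left, z_right): it covers every pixel and its
    depth varies linearly across the row.
    """
    depth = [float("inf")] * width
    color = [None] * width
    for name, z_left, z_right in primitives:
        for px in range(width):
            t = px / (width - 1)
            z = z_left + t * (z_right - z_left)
            if z < depth[px]:  # keep the closest fragment at this pixel
                depth[px] = z
                color[px] = name
    return color


# Two crossing primitives: neither is in front everywhere, so no single
# per-primitive z value gives a correct back-to-front drawing order.
row = rasterize(4, [("red", 0.0, 1.0), ("blue", 1.0, 0.0)])
```

The result is the same whichever primitive is drawn first, which is exactly what sorting whole polygons by one z value cannot guarantee.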

On Tue, Nov 28, 2017 at 3:02 AM Antony Lee <[hidden email]> wrote:
A bit late to the party, but here are some (not very deep) thoughts regarding 3D support.

1. I believe that writing a new 3D projection class from scratch, properly integrated with the transforms system (so with support for log-scale, etc.), should be a "relatively easy" (if somewhat tedious and lengthy...) task, for someone who knows Matplotlib's internals well (probably easier than trying to forcefully modernize mplot3d).  All the machinery is already there.  Given the way projections work, this can be started as a third-party project (requiring explicit registration by import, just like mplot3d does now) and later be integrated into matplotlib's core.

2. More difficult to resolve is z-buffering (i.e., support for overlapping polygons that cannot be sorted unambiguously using a single z-value).  As far as I can tell, this would require adding a new method on the renderer classes (because there is simply no way to pass z-information to the renderer right now -- z-sorting occurs before data reaches the renderer), and, well, implementing it.
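
To make the z-buffering point concrete, here is a minimal pure-Python sketch. It is not Matplotlib's renderer API; `rasterize_with_zbuffer` and the span format are hypothetical, and it works on a single row of pixels. Two surfaces cross, so no single z-value per surface can order them, but a per-pixel depth test still gives the right answer regardless of draw order.

```python
def rasterize_with_zbuffer(width, spans):
    """Paint 1-D spans into a row of pixels, keeping the nearest surface per pixel.

    Each span is (start, stop, color, depth_fn); depth_fn(x) returns the depth
    at pixel x, and a smaller depth means closer to the viewer.
    """
    colors = [None] * width
    zbuf = [float("inf")] * width  # nothing drawn yet: infinitely far away
    for start, stop, color, depth_fn in spans:
        for x in range(start, stop):
            z = depth_fn(x)
            if z < zbuf[x]:  # per-pixel depth test
                zbuf[x] = z
                colors[x] = color
    return colors


# Two surfaces that cross in depth across the same four pixels.
a = (0, 4, "A", lambda x: x)      # gets farther away to the right
b = (0, 4, "B", lambda x: 3 - x)  # gets closer to the right
row = rasterize_with_zbuffer(4, [a, b])
```

Sorting whole surfaces by one z-value would paint the row entirely "A" or entirely "B"; the depth test splits it where the surfaces intersect.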

Antony


2017-11-20 12:25 GMT-08:00 [hidden email] <[hidden email]>:
Thank you Tom for the summary (as well as Ryan and Eric for the meeting of course).

Just out of curiosity, does the following point (at the very end of your email) mean that there is a plan for a big overhaul of mplot3d?

> ** major funding
>   - get mplot3D 'right'
>     - same interface
>     - uses real 3D tools

I am asking mainly because I cannot find a MEP about that in the documentation, even though it seems like a rather major goal/news to me :). But I may have misunderstood something about what is stated above...

Best,
Adrien



On 11/20/2017 12:06 PM, Thomas Caswell wrote:
Folks,

Notes from today's phone call.  There are 3 things left for 2.1.1: categorical changes, 'fuzzy' images when mostly invalid, and the appveyor cleanup

Tom

Ryan May, Eric Firing, Thomas Caswell

** categorical

  - everyone on board with not sorting categorical values
  - everyone on board with only accepting strings as categories
  - some concern about supporting `np.nan`
    - if the first entry is `nan`, will miss units
    - do not want a python loop that checks until it finds a not-nan
    - defer nan handling

Tom's job to get all of the PRs collected into one

Plan going forward to support mixed types, missing data, and explicit
ordering between categories:
  - write a category class
  - write a handler for pandas categorical

** Tom's pre-meeting notes

*** do not sort values

On one hand, sorting the values makes sense, as

#+BEGIN_SRC python
   fig, (ax1, ax2) = plt.subplots(2, 1)
   ax1.scatter([1, 2], [1, 2])
   ax2.scatter([2, 1], [2, 1])

#+END_SRC

should produce visually identical plots so Tom thinks


#+BEGIN_SRC python
   fig, (ax1, ax2) = plt.subplots(2, 1)
   ax1.scatter(['a', 'b'], [1, 2])
   ax2.scatter(['b', 'a'], [2, 1])

#+END_SRC

should as well (but Tom is wrong).

On the other hand, users may expect that there are semantics in the
order in which they pass the data, as in

#+BEGIN_SRC python
   plt.bar(['first', 'second', 'third'], [1, 2, 3])

#+END_SRC

and blindly sorting alphabetically gives them no escape hatch.

practicality over purity, drop the sorting.
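
A minimal sketch of the difference, in plain Python rather than the actual unit-handling code (`category_positions` is a hypothetical helper, not a Matplotlib function): each distinct string gets an integer axis position, either in first-seen order or sorted.

```python
def category_positions(values, sort=False):
    """Map each distinct string to an integer axis position (hypothetical helper)."""
    unique = list(dict.fromkeys(values))  # distinct values in first-seen order
    if sort:
        unique = sorted(unique)
    return {name: i for i, name in enumerate(unique)}


labels = ['low', 'medium', 'high']
as_passed = category_positions(labels)                 # keeps the user's order
alphabetical = category_positions(labels, sort=True)   # high, low, medium
```

With sorting, 'high' lands to the left of 'low' and the user has no way to get the intended order back.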

*** supporting non-string values

In the original implementation a cast through numpy was used which
converted all non-string values to strings so things like

#+BEGIN_SRC python
   plt.bar([1, 2, 'apple'], [1, 2, 3])

#+END_SRC

would work.  However this led to =2= and ='2'= being treated as
the same (which seems less than great).  Supporting them as different
is possible, but is a fair bit of work because in a number of places the
unit framework assumes that 'plain' numbers will pass through
unchanged.

A more worrying concern is that

#+BEGIN_SRC python
   x = [52, 32, 'a', 'b']
   y = [0, 1, 2, 3]

   fig, (ax1, ax2) = plt.subplots(2, 1)
   ax1.plot(x, y, 'o')
   ax2.plot(x[:2], y[:2], 'x')

#+END_SRC

in the first case the ints are treated as categoricals and in the
second they are not.  If we want to support mixed types like this then
we need to make a special class (or use pandas categorical) which does
not have to guess the type on the fly.

Requiring that, if the categorical unit handling is triggered, all of the
values be string-like seems like the safest approach.
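
As a rough illustration of that rule (a sketch only; `check_all_strings` is a hypothetical name and not the function used in the PRs), the check could look like:

```python
def check_all_strings(values):
    """Raise TypeError unless every value is str or bytes (hypothetical check)."""
    for v in values:
        if not isinstance(v, (str, bytes)):
            raise TypeError(
                f"{v!r} is not a string; categorical data must be all strings"
            )
    return True


check_all_strings(['a', 'b'])  # passes

try:
    check_all_strings([52, 32, 'a', 'b'])  # mixed types are rejected up front
    mixed_ok = True
except TypeError:
    mixed_ok = False
```

Rejecting mixed input up front means the ints are never silently treated as categories in one call and as numbers in another.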

*** support for nan

Most of matplotlib accepts `np.nan` as 'missing' data which is then
dropped as part of the draw process.  This makes less sense with `bar` but makes
lots of sense with `scatter`.

We should special-case allowing `np.nan` in as a 'string' and
map it to itself.
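
A sketch of what this pre-meeting proposal would mean, using only the standard library (`convert_categories` and the `mapping` argument are hypothetical; the meeting itself deferred nan handling): strings are looked up in the category mapping and nan passes through unchanged, so the draw step can drop it as usual.

```python
import math


def convert_categories(values, mapping):
    """Convert strings to float positions; let nan through unchanged (sketch)."""
    out = []
    for v in values:
        if isinstance(v, float) and math.isnan(v):
            out.append(v)  # nan maps to itself and stays 'missing'
        else:
            out.append(float(mapping[v]))
    return out


positions = convert_categories(['a', float('nan'), 'b'], {'a': 0, 'b': 1})
```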

*** special containers

It was proposed to look for object arrays as a marker for categorical data
instead of the type of the data.  Do not think we should do this as we try to
be as agnostic about the container as possible everywhere else.


** appveyor
  - drop building conda package
  - remove conda recipe from the repo

Ryan is taking care of this

** set_theta_grid(frac)
  - merged, improvement over current behavior, raising seems too
    aggressive

** #8947
  - ringing with lots of nans

this is Tom's job to investigate

** talked about traits / traitlets and friends

** major funding
  - get mplot3D 'right'
    - same interface
    - uses real 3D tools


--
Ryan May

