plotting distributions, direct input of histogram

Neal Becker
I frequently plot distributions using, e.g., boxplot and violinplot.  But
I've already binned my data using my own histogram class, so I already have
an array of bins and an array of counts for each bin.

I don't see any way to pass this data directly to plotting routines such as
boxplot or violinplot.  What I've been doing is using collections.Counter to
convert it into a single array: for example, if the value '10' occurs
'1000' times, I produce an array with [10]*1000.  Obviously, this doesn't
scale to tens of millions of samples.

Is there any way to input my data that already has been binned and counted?

Thanks,
Neal

(Also, I really wish the same for seaborn)

_______________________________________________
Matplotlib-users mailing list
[hidden email]
https://mail.python.org/mailman/listinfo/matplotlib-users

Re: plotting distributions, direct input of histogram

Elan Ernest
For boxplots with predefined statistics consider the `ax.bxp` function,

https://matplotlib.org/3.1.1/api/_as_gen/matplotlib.axes.Axes.bxp.html

For violinplots, one can use `ax.violin`,

https://matplotlib.org/3.1.1/api/_as_gen/matplotlib.axes.Axes.violin.html

however, you would need to have calculated the kernel density estimate
yourself, which is in general impossible with already aggregated statistics.
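
A minimal sketch of the `ax.violin` route, assuming the normalized bin counts are an acceptable stand-in for a kernel density estimate (the result is a histogram outline, not a true KDE). The `centers` and `counts` arrays and the helper name `violin_stats_from_hist` are hypothetical; the dictionary keys are the ones `Axes.violin` reads from its `vpstats` argument.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt


def violin_stats_from_hist(centers, counts):
    """Build one vpstats dict for Axes.violin from bin centers and counts.

    Assumption: the normalized counts stand in for the density estimate.
    """
    centers = np.asarray(centers, dtype=float)
    counts = np.asarray(counts, dtype=float)
    total = counts.sum()
    cdf = np.cumsum(counts) / total
    occupied = centers[counts > 0]
    return {
        "coords": centers,          # where the density stand-in is evaluated
        "vals": counts / total,     # density stand-in at each coordinate
        "mean": float(np.sum(centers * counts) / total),
        # Median taken as the first bin center whose cumulative share
        # reaches 50%.
        "median": float(centers[np.searchsorted(cdf, 0.5)]),
        "min": float(occupied.min()),
        "max": float(occupied.max()),
    }


# Hypothetical, already-binned data: 21 bins with centers 0..20.
centers = np.arange(0, 21)
counts = np.array([0, 1, 3, 8, 20, 45, 80, 120, 150, 160, 150,
                   120, 80, 45, 20, 8, 3, 1, 0, 0, 0])
stats = violin_stats_from_hist(centers, counts)

fig, ax = plt.subplots()
parts = ax.violin([stats], positions=[1], showmedians=True)
# Call plt.show() or fig.savefig(...) to view the result.
```

With narrow bins the outline looks jagged compared to a KDE; smoothing `vals` before plotting would be a further assumption about the data.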


On 02.08.2019 at 13:32, Neal Becker wrote:

> Is there any way to input my data that already has been binned and counted?

Re: plotting distributions, direct input of histogram

Paul Hobson-2
I don't see how binned histogram results are compatible with a boxplot, which computes the quartiles and fences directly from raw data.

I don't understand how we'd begin to infer what those values are.
-paul

On Fri, Aug 2, 2019 at 1:36 PM Elan Ernest <[hidden email]> wrote:
> For boxplots with predefined statistics consider the `ax.bxp` function, [...]
> For violinplots, one can use `ax.violin`, [...]



Re: plotting distributions, direct input of histogram

Neal Becker
Binning the data will of course result in some quantization error, but if
the bins are small enough, that would be acceptable in my application.
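
The size of that quantization error can be checked by estimating quantiles straight from the counts. A minimal sketch, assuming `edges` holds the bin edges and that samples are spread uniformly within each bin (an assumption of this sketch, not something the histogram guarantees); `hist_quantile` is a hypothetical helper name, and the normal sample only stands in for real data.

```python
import numpy as np


def hist_quantile(edges, counts, q):
    """Estimate quantile(s) q in [0, 1] from bin edges and per-bin counts.

    Assumes samples are uniformly spread inside each bin, so the CDF is
    piecewise linear between bin edges.
    """
    edges = np.asarray(edges, dtype=float)
    counts = np.asarray(counts, dtype=float)
    # CDF evaluated at every bin edge: 0 at the first edge, 1 at the last.
    cdf = np.concatenate(([0.0], np.cumsum(counts))) / counts.sum()
    # Invert the piecewise-linear CDF by interpolation.
    return np.interp(q, cdf, edges)


# Stand-in raw data, binned the way an existing histogram class might do it.
rng = np.random.default_rng(0)
raw = rng.normal(loc=10.0, scale=2.0, size=100_000)
counts, edges = np.histogram(raw, bins=200)

q = [0.25, 0.5, 0.75]
approx = hist_quantile(edges, counts, q)   # from bins and counts only
exact = np.quantile(raw, q)                # from the raw samples
print(approx, exact)
```

The estimate and the exact quantile fall in the same bin (or adjacent ones), so the error is bounded by roughly one bin width, and no expanded `[10]*1000` style array is ever built.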

Paul Hobson wrote:

> I don't see how binned histogram results are compatible with a boxplot,
> which computes the quartiles and fences directly from raw data.
>
> I don't understand how we'd begin to infer what those values are.
> -paul

Re: plotting distributions, direct input of histogram

Paul Hobson-2
In that case, I think you should take Elan's advice: compute the box stats from your histogram data however you feel is appropriate, and then feed them to Axes.bxp, which expects a list of dictionaries.

We split boxplot into cbook.boxplot_stats and Axes.bxp to cover use cases that we couldn't anticipate.
-paul
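
A minimal sketch of that suggestion, assuming quartiles interpolated from the cumulative counts are good enough for the application. The helper name `box_stats_from_hist` and the sample histogram are hypothetical; the keys `med`, `q1`, `q3`, `whislo`, `whishi`, `fliers`, and `label` are the ones `Axes.bxp` reads. Whiskers are placed at the outermost occupied bin edges inside the 1.5 * IQR fences, which only approximates what `boxplot` does with raw data.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt


def box_stats_from_hist(edges, counts, whis=1.5, label=None):
    """Build one stats dict for Axes.bxp from bin edges and counts.

    Assumption: samples are uniform within each bin (piecewise-linear CDF).
    """
    edges = np.asarray(edges, dtype=float)
    counts = np.asarray(counts, dtype=float)
    cdf = np.concatenate(([0.0], np.cumsum(counts))) / counts.sum()
    q1, med, q3 = np.interp([0.25, 0.5, 0.75], cdf, edges)
    iqr = q3 - q1

    lower = edges[:-1][counts > 0]   # lower edges of occupied bins
    upper = edges[1:][counts > 0]    # upper edges of occupied bins
    in_lo = lower[lower >= q1 - whis * iqr]
    in_hi = upper[upper <= q3 + whis * iqr]
    # Fall back to the box itself if no occupied bin lies inside the fence.
    whislo = min(in_lo.min(), q1) if in_lo.size else q1
    whishi = max(in_hi.max(), q3) if in_hi.size else q3

    stats = {
        "med": med,
        "q1": q1,
        "q3": q3,
        "whislo": whislo,
        "whishi": whishi,
        "fliers": [],  # individual outliers are not recoverable from counts
    }
    if label is not None:
        stats["label"] = label
    return stats


# Stand-in for an existing histogram (bin edges plus per-bin counts).
rng = np.random.default_rng(1)
counts, edges = np.histogram(rng.normal(0.0, 1.0, 50_000), bins=100)
stats = box_stats_from_hist(edges, counts, label="binned")

fig, ax = plt.subplots()
artists = ax.bxp([stats], showfliers=False)
# Call plt.show() or fig.savefig(...) to view the result.
```

Passing several dictionaries in the list draws several boxes side by side, one per histogram.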

On Mon, Aug 5, 2019 at 12:00 PM Neal Becker <[hidden email]> wrote:
> Binning the data will of course result in some quantization error, but if
> the bins are small enough, that would be acceptable in my application.
