Reshaping Tabular Data to Column Data

classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|

Reshaping Tabular Data to Column Data

Benson, Randall

Hi –

I seem to be stuck here trying to convert downloaded data that is in tabular form since I need to analyze it in single column format. The data is from here and I’m using read_csv as shown below –

 

pnadata1= pd.read_csv('https://www.esrl.noaa.gov/psd/data/correlation/pna.data')

 

The downloaded data looks like this with the values in months 1-12 across the columns below for each year 1948-present –

 

                                    1948        2020

0   1948 -99.90 -99.90 -99.90 -99.90 -99.90 -99.9...

1   1949 -99.90 -99.90 -99.90 -99.90 -99.90 -99.9...

2   1950  -3.65  -1.69  -0.06  -0.23  -0.40  -1.9...

3   1951  -1.18  -2.11  -1.09   0.47   1.19  -0.5...

4   1952  -3.19   0.08  -0.69   0.89  -0.79   0.5...

 

And, I need it to be reshaped to look like this in yyyymm values format so I can compare it easily to other downloaded data in the column format –

 

194801 -99.90

194802 -99.90

194803 -99.90

...

202001 -3.45

202002  0.87

 

I’ve tried working with “melt” and “group” in different combinations but cannot figure it out given the online references. Thank you,

Randall

 

 

Description: cid:image001.jpg@01D34737.23261CB0

Randall P. Benson, PhD
Wind Asset Meteorologist, Energy Resource - Onshore

1125 NW Couch Street, Suite 700

Portland, Oregon, USA 97209
Telephone  503-796-7129
Cell  971-227-2477
[hidden email]


Description: cid:image002.png@01D34737.23261CB0


In the interest of the environment,
please print only if necessary and recycle.

 

 

 

==============================================================

Please consider the environment before printing this email.

If you have received this message in error, please notify the sender and immediately delete this message and any attachment hereto and/or copy hereof, as such message contains confidential information intended solely for the individual or entity to whom it is addressed. The use or disclosure of such information to third parties is prohibited by law and may give rise to civil or criminal liability.

The views presented in this message are solely those of the author(s) and do not necessarily represent the opinion of Avangrid Renewables, LLC. or any company of its group. Neither Avangrid Renewables, LLC. nor any company of its group guarantees the integrity, security or proper receipt of this message. Likewise, neither Avangrid Renewables, LLC. nor any company of its group accepts any liability whatsoever for any possible damages arising from, or in connection with, data interception, software viruses or manipulation by third parties.

 ==============================================================

_______________________________________________
Matplotlib-users mailing list
[hidden email]
https://mail.python.org/mailman/listinfo/matplotlib-users
Reply | Threaded
Open this post in threaded view
|

Re: Reshaping Tabular Data to Column Data

matplotlib - users mailing list
Does this do what you want?

# read data from noaa, set index to first column and column names to '00' - '12'
df = pd.read_csv('https://www.esrl.noaa.gov/psd/data/correlation/pna.data',
                 skiprows=1,
                 skipfooter=3,
                 index_col=[0],
                 header=None,
                 names=['{:02d}'.format(i) for i in range(13)],
                 delim_whitespace=True)
# unstack the dataframe to get a Series with  and a multiindex
# consisting of the date and column name
df = df.unstack()
# sort index by [date, column name] to "group" the dates
df = df.sort_index(level=[1,0])
# create a new index using the existing multiindex levels
ndx = [int(''.join([y,c]))
       for y in df.index.levels[1].astype(str)
       for c in df.index.levels[0]]
df.index = ndx
df.head(28)

194801   -99.90
194802   -99.90
194803   -99.90
...
194911   -99.90
194912   -99.90
195001    -3.65
195002    -1.69
195003    -0.06
195004    -0.23

df.tail(12)

202001    -0.95
202002    -0.07
202003   -99.90
202004   -99.90
202005   -99.90
202006   -99.90
202007   -99.90
202008   -99.90
202009   -99.90
202010   -99.90
202011   -99.90
202012   -99.90

Note that the 202001 and 202002 values do not agree with the values in your email message but do agree with the values on the website.

Hth,
Scott


> On Mar 24, 2020, at 6:12 PM, Benson, Randall <[hidden email]> wrote:
>
> Hi –
> I seem to be stuck here trying to convert downloaded data that is in tabular form since I need to analyze it in single column format. The data is from here and I’m using read_csv as shown below –
>  
> pnadata1= pd.read_csv('https://www.esrl.noaa.gov/psd/data/correlation/pna.data')
>  
> The downloaded data looks like this with the values in months 1-12 across the columns below for each year 1948-present –
>  
>                                     1948        2020
> 0   1948 -99.90 -99.90 -99.90 -99.90 -99.90 -99.9...
> 1   1949 -99.90 -99.90 -99.90 -99.90 -99.90 -99.9...
> 2   1950  -3.65  -1.69  -0.06  -0.23  -0.40  -1.9...
> 3   1951  -1.18  -2.11  -1.09   0.47   1.19  -0.5...
> 4   1952  -3.19   0.08  -0.69   0.89  -0.79   0.5...
>  
> And, I need it to be reshaped to look like this in yyyymm values format so I can compare it easily to other downloaded data in the column format –
>  
> 194801 -99.90
> 194802 -99.90
> 194803 -99.90
> ...
> 202001 -3.45
> 202002  0.87
>  
> I’ve tried working with “melt” and “group” in different combinations but cannot figure it out given the online references. Thank you,
> Randall
>  
>  
> <image001.jpg>
> Randall P. Benson, PhD
> Wind Asset Meteorologist, Energy Resource - Onshore
>
> 1125 NW Couch Street, Suite 700
> Portland, Oregon, USA 97209
> Telephone  503-796-7129
> Cell  971-227-2477
> [hidden email]
>
> <image002.png>
>
> In the interest of the environment,
> please print only if necessary and recycle.
>  
>  
>  
> ==============================================================
>
> Please consider the environment before printing this email.
>
> If you have received this message in error, please notify the sender and immediately delete this message and any attachment hereto and/or copy hereof, as such message contains confidential information intended solely for the individual or entity to whom it is addressed. The use or disclosure of such information to third parties is prohibited by law and may give rise to civil or criminal liability.
>
> The views presented in this message are solely those of the author(s) and do not necessarily represent the opinion of Avangrid Renewables, LLC. or any company of its group. Neither Avangrid Renewables, LLC. nor any company of its group guarantees the integrity, security or proper receipt of this message. Likewise, neither Avangrid Renewables, LLC. nor any company of its group accepts any liability whatsoever for any possible damages arising from, or in connection with, data interception, software viruses or manipulation by third parties.
>
>  ==============================================================
>
> _______________________________________________
> Matplotlib-users mailing list
> [hidden email]
> https://mail.python.org/mailman/listinfo/matplotlib-users

_______________________________________________
Matplotlib-users mailing list
[hidden email]
https://mail.python.org/mailman/listinfo/matplotlib-users
Reply | Threaded
Open this post in threaded view
|

Re: Reshaping Tabular Data to Column Data

matplotlib - users mailing list
Here's a version without the unnecessary string manipulations

# read data from noaa, set the index to the first column
df = pd.read_csv('https://www.esrl.noaa.gov/psd/data/correlation/pna.data',
                 skiprows=1,
                 skipfooter=3,
                 index_col=[0],
                 header=None,
                 engine='python', # c engine doesn't have skipfooter
                 delim_whitespace=True)
# unstack the dataframe to get a Series with a multiindex
# consisting of the column name and date
df = df.unstack()
# create a new index using the existing multiindex levels
ndx = [y * 100 + c
       for (c, y) in df.index.to_flat_index()]
df.index = ndx
df = df.sort_index()



> On Mar 24, 2020, at 9:28 PM, Scott Lasley via Matplotlib-users <[hidden email]> wrote:
>
> Does this do what you want?
>
> # read data from noaa, set index to first column and column names to '00' - '12'
> df = pd.read_csv('https://www.esrl.noaa.gov/psd/data/correlation/pna.data',
>                 skiprows=1,
>                 skipfooter=3,
>                 index_col=[0],
>                 header=None,
>                 names=['{:02d}'.format(i) for i in range(13)],
>                 delim_whitespace=True)
> # unstack the dataframe to get a Series with  and a multiindex
> # consisting of the date and column name
> df = df.unstack()
> # sort index by [date, column name] to "group" the dates
> df = df.sort_index(level=[1,0])
> # create a new index using the existing multiindex levels
> ndx = [int(''.join([y,c]))
>       for y in df.index.levels[1].astype(str)
>       for c in df.index.levels[0]]
> df.index = ndx
> df.head(28)
>
> 194801   -99.90
> 194802   -99.90
> 194803   -99.90
> ...
> 194911   -99.90
> 194912   -99.90
> 195001    -3.65
> 195002    -1.69
> 195003    -0.06
> 195004    -0.23
>
> df.tail(12)
>
> 202001    -0.95
> 202002    -0.07
> 202003   -99.90
> 202004   -99.90
> 202005   -99.90
> 202006   -99.90
> 202007   -99.90
> 202008   -99.90
> 202009   -99.90
> 202010   -99.90
> 202011   -99.90
> 202012   -99.90
>
> Note that the 202001 and 202002 values do not agree with the values in your email message but do agree with the values on the website.
>
> Hth,
> Scott
>
>
>> On Mar 24, 2020, at 6:12 PM, Benson, Randall <[hidden email]> wrote:
>>
>> Hi –
>> I seem to be stuck here trying to convert downloaded data that is in tabular form since I need to analyze it in single column format. The data is from here and I’m using read_csv as shown below –
>>
>> pnadata1= pd.read_csv('https://www.esrl.noaa.gov/psd/data/correlation/pna.data')
>>
>> The downloaded data looks like this with the values in months 1-12 across the columns below for each year 1948-present –
>>
>>                                    1948        2020
>> 0   1948 -99.90 -99.90 -99.90 -99.90 -99.90 -99.9...
>> 1   1949 -99.90 -99.90 -99.90 -99.90 -99.90 -99.9...
>> 2   1950  -3.65  -1.69  -0.06  -0.23  -0.40  -1.9...
>> 3   1951  -1.18  -2.11  -1.09   0.47   1.19  -0.5...
>> 4   1952  -3.19   0.08  -0.69   0.89  -0.79   0.5...
>>
>> And, I need it to be reshaped to look like this in yyyymm values format so I can compare it easily to other downloaded data in the column format –
>>
>> 194801 -99.90
>> 194802 -99.90
>> 194803 -99.90
>> ...
>> 202001 -3.45
>> 202002  0.87
>>
>> I’ve tried working with “melt” and “group” in different combinations but cannot figure it out given the online references. Thank you,
>> Randall
>>
>>
>> <image001.jpg>
>> Randall P. Benson, PhD
>> Wind Asset Meteorologist, Energy Resource - Onshore
>>
>> 1125 NW Couch Street, Suite 700
>> Portland, Oregon, USA 97209
>> Telephone  503-796-7129
>> Cell  971-227-2477
>> [hidden email]
>>
>> <image002.png>
>>
>> In the interest of the environment,
>> please print only if necessary and recycle.
>>
>>
>>
>> ==============================================================
>>
>> Please consider the environment before printing this email.
>>
>> If you have received this message in error, please notify the sender and immediately delete this message and any attachment hereto and/or copy hereof, as such message contains confidential information intended solely for the individual or entity to whom it is addressed. The use or disclosure of such information to third parties is prohibited by law and may give rise to civil or criminal liability.
>>
>> The views presented in this message are solely those of the author(s) and do not necessarily represent the opinion of Avangrid Renewables, LLC. or any company of its group. Neither Avangrid Renewables, LLC. nor any company of its group guarantees the integrity, security or proper receipt of this message. Likewise, neither Avangrid Renewables, LLC. nor any company of its group accepts any liability whatsoever for any possible damages arising from, or in connection with, data interception, software viruses or manipulation by third parties.
>>
>> ==============================================================
>>
>> _______________________________________________
>> Matplotlib-users mailing list
>> [hidden email]
>> https://mail.python.org/mailman/listinfo/matplotlib-users
>
> _______________________________________________
> Matplotlib-users mailing list
> [hidden email]
> https://mail.python.org/mailman/listinfo/matplotlib-users

_______________________________________________
Matplotlib-users mailing list
[hidden email]
https://mail.python.org/mailman/listinfo/matplotlib-users