15 MultiIndex

MultiIndex

  • A MultiIndex is defined as "a multi-level, or hierarchical, index object" (MultiIndex Docs).
  • A MultiIndex has three key components:
    • levels : the unique labels for each level
    • labels : integers for each level designating which label appears at each location (pandas 0.24 and later call this attribute codes)
    • names : names for each of the index levels
  • The following methods create a MultiIndex:
    • from_arrays(arrays[, sortorder, names])
    • from_tuples(tuples[, sortorder, names])
    • from_product(iterables[, sortorder, names])
  • unstack() can also produce a MultiIndex by moving the columns into the index.
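The three components listed above can be read straight off an index object. The following is a minimal sketch with made-up team and season labels. It assumes pandas 0.24 or later, where the labels attribute is exposed as codes, so it will not run unchanged on the older release whose output appears in this post.

```python
import pandas as pd

# Hypothetical example data: two parallel arrays, one per level.
arrays = [["Arizona", "Arizona", "Boston", "Boston"],
          ["First", "Second", "First", "Second"]]
index = pd.MultiIndex.from_arrays(arrays, names=["Team", "Season"])

# levels: the unique labels of each level
levels = [list(level) for level in index.levels]
# codes (named `labels` before pandas 0.24): integer positions into levels
codes = [code.tolist() for code in index.codes]
# names: one name per level
names = list(index.names)

print(levels)
print(codes)
print(names)
```

Each entry of codes has one integer per row, and looking that integer up in the matching entry of levels recovers the original label.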
In [1]:
import pandas as pd
import numpy as np

arrays = [['Arizona','Boston','Chicago','Detroit', 'Arizona','Boston','Chicago','Detroit']
         ,['First','Second','First','Second','First','Second','First','Second']]
In [2]:
# Create a MultiIndex with from_arrays
index = pd.MultiIndex.from_arrays(arrays, names=('Team','Season'))
# The three components (levels, labels, names) are generated
index
Out[2]:
MultiIndex(levels=[['Arizona', 'Boston', 'Chicago', 'Detroit'], ['First', 'Second']],
           labels=[[0, 1, 2, 3, 0, 1, 2, 3], [0, 1, 0, 1, 0, 1, 0, 1]],
           names=['Team', 'Season'])
In [3]:
# Create a Series indexed by the MultiIndex
srz = pd.Series(np.random.randn(8), index=index)
srz
Out[3]:
Team     Season
Arizona  First     0.621056
Boston   Second    0.109952
Chicago  First     0.205089
Detroit  Second    0.000748
Arizona  First     1.037440
Boston   Second   -1.348015
Chicago  First     1.217670
Detroit  Second   -0.118575
dtype: float64
In [4]:
# Create a DataFrame indexed by the MultiIndex
df = pd.DataFrame(np.random.randn(8, 2), index=index)
df
Out[4]:
                       0         1
Team    Season
Arizona First   1.304540  0.906655
Boston  Second  0.848561 -1.176608
Chicago First  -1.193541  0.129032
Detroit Second  0.909402 -1.267693
Arizona First   0.612744 -1.042755
Boston  Second  0.105090 -0.947009
Chicago First   0.906220 -0.760342
Detroit Second  0.586193  0.372422
In [5]:
# Create a MultiIndex object with from_tuples
tuples = list(zip(*arrays))  # * is the unpacking operator
print(tuples)
index = pd.MultiIndex.from_tuples(tuples, names=['Team','Season'])
index
[('Arizona', 'First'), ('Boston', 'Second'), ('Chicago', 'First'), ('Detroit', 'Second'), ('Arizona', 'First'), ('Boston', 'Second'), ('Chicago', 'First'), ('Detroit', 'Second')]
Out[5]:
MultiIndex(levels=[['Arizona', 'Boston', 'Chicago', 'Detroit'], ['First', 'Second']],
           labels=[[0, 1, 2, 3, 0, 1, 2, 3], [0, 1, 0, 1, 0, 1, 0, 1]],
           names=['Team', 'Season'])
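The zip(*arrays) step transposes the two parallel arrays into one tuple per row, which is why from_tuples ends up with the same index as from_arrays. A small sketch of that equivalence, using a shortened, hypothetical version of the arrays:

```python
import pandas as pd

# Hypothetical, shortened version of the arrays used in this post.
arrays = [["Arizona", "Boston", "Chicago", "Detroit"],
          ["First", "Second", "First", "Second"]]

# zip(*arrays) pairs the i-th entry of every array: one tuple per row.
tuples = list(zip(*arrays))

from_arrays = pd.MultiIndex.from_arrays(arrays, names=["Team", "Season"])
from_tuples = pd.MultiIndex.from_tuples(tuples, names=["Team", "Season"])

# Both constructors should describe the same index.
same = from_arrays.equals(from_tuples)
print(tuples)
print(same)
```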
In [11]:
# Create a MultiIndex object with from_product
Team = ['Arizona', 'Boston', 'Chicago', 'Detroit']
Season = ['First', 'Second']
index = pd.MultiIndex.from_product([Team, Season], names=['Team','Season'])
index
Out[11]:
MultiIndex(levels=[['Arizona', 'Boston', 'Chicago', 'Detroit'], ['First', 'Second']],
           labels=[[0, 0, 1, 1, 2, 2, 3, 3], [0, 1, 0, 1, 0, 1, 0, 1]],
           names=['Team', 'Season'])
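Unlike from_arrays and from_tuples, which take the rows as given, from_product builds every combination of its inputs. That is why the labels above read [0, 0, 1, 1, ...] rather than [0, 1, 2, 3, ...]. As a rough check, the result can be compared with itertools.product from the standard library (the variable names here are illustrative):

```python
from itertools import product

import pandas as pd

teams = ["Arizona", "Boston", "Chicago", "Detroit"]
seasons = ["First", "Second"]

index = pd.MultiIndex.from_product([teams, seasons], names=["Team", "Season"])

# Iterating a MultiIndex yields one tuple per row.
pairs = list(index)
# Cartesian product with the first iterable varying slowest.
expected = list(product(teams, seasons))

print(len(index))
print(pairs == expected)
print(index.is_unique)
```

Every (team, season) pair appears exactly once, so this index is unique, whereas the from_arrays example in this post repeats each pair twice.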
In [7]:
import pandas as pd
import numpy as np
data = np.random.randn(3, 4)
index = [2002, 2003, 2004]
columns = ['Apple', 'Lemon', 'Orange', 'Tomato']
df = pd.DataFrame(data=data, index=index, columns=columns)
df
Out[7]:
         Apple     Lemon    Orange    Tomato
2002  1.395025  0.486376 -0.647918 -0.831599
2003  0.561223  2.093570  0.605734 -0.329335
2004  0.744365 -1.185949  1.210367 -0.706483
In [8]:
# unstack() moves the DataFrame's columns into the index
df.unstack()
Out[8]:
Apple   2002    1.395025
        2003    0.561223
        2004    0.744365
Lemon   2002    0.486376
        2003    2.093570
        2004   -1.185949
Orange  2002   -0.647918
        2003    0.605734
        2004    1.210367
Tomato  2002   -0.831599
        2003   -0.329335
        2004   -0.706483
dtype: float64
In [9]:
df = df.unstack()
In [10]:
print(type(df))
df.index
<class 'pandas.core.series.Series'>
Out[10]:
MultiIndex(levels=[['Apple', 'Lemon', 'Orange', 'Tomato'], [2002, 2003, 2004]],
           labels=[[0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3], [0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2]])
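Once the columns have moved into the index, each value is addressed by a (column label, row label) pair, and calling unstack() again on the Series pivots the innermost level back out into columns. A minimal sketch of both steps, using deterministic placeholder values instead of random ones:

```python
import numpy as np
import pandas as pd

# Placeholder values 0.0 .. 11.0; any 3x4 table would do.
df = pd.DataFrame(np.arange(12.0).reshape(3, 4),
                  index=[2002, 2003, 2004],
                  columns=["Apple", "Lemon", "Orange", "Tomato"])

# Series whose MultiIndex is (column label, row label)
stacked = df.unstack()

# Select a single cell with a full key tuple.
value = stacked.loc[("Lemon", 2003)]

# The innermost level (the year) becomes the columns again.
wide = stacked.unstack()

print(type(stacked.index).__name__)
print(value)
print(np.array_equal(wide.values, df.T.values))
```

The second unstack() returns the transpose of the original table: the fruit names are now the rows and the years are the columns.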


© 2017. All rights reserved.
