15 MultiIndex

11 Oct 2017 | 6 Minute Read on Pandas

MultiIndex

A MultiIndex is defined as "a multi-level, or hierarchical, index object" (MultiIndex Docs).
A MultiIndex has three key components:
- levels : the unique labels for each level
- labels : integers for each level designating which label sits at each location (renamed to codes in pandas 0.24 and later)
- names : names for each of the index levels
A MultiIndex can be created with the following methods:
- from_arrays(arrays[, sortorder, names])
- from_tuples(tuples[, sortorder, names])
- from_product(iterables[, sortorder, names])
A MultiIndex can also be produced with unstack(), which moves the columns of a DataFrame into the index.

In [1]:

import pandas as pd
import numpy as np

arrays = [['Arizona','Boston','Chicago','Detroit', 'Arizona','Boston','Chicago','Detroit']
         ,['First','Second','First','Second','First','Second','First','Second']]

In [2]:

# Create a MultiIndex with from_arrays
index = pd.MultiIndex.from_arrays(arrays, names=('Team','Season'))
# The three components (levels, labels, names) are created
index

Out[2]:

MultiIndex(levels=[['Arizona', 'Boston', 'Chicago', 'Detroit'], ['First', 'Second']],
           labels=[[0, 1, 2, 3, 0, 1, 2, 3], [0, 1, 0, 1, 0, 1, 0, 1]],
           names=['Team', 'Season'])
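The relationship between levels and the integer positions shown above can be made concrete by decoding the index by hand. The following is a minimal sketch, not part of the original notebook; it assumes a pandas version where the positions are exposed as `codes`, and falls back to `labels` on older releases. The variable name `decoded` is only illustrative.

```python
import pandas as pd

arrays = [
    ["Arizona", "Boston", "Chicago", "Detroit"] * 2,
    ["First", "Second", "First", "Second"] * 2,
]
index = pd.MultiIndex.from_arrays(arrays, names=("Team", "Season"))

# pandas >= 0.24 calls the integer positions `codes`; older versions call them `labels`
codes = index.codes if hasattr(index, "codes") else index.labels

# For every row, look up each level's label through its integer code
decoded = [
    tuple(level[code] for level, code in zip(index.levels, row))
    for row in zip(*codes)
]

print(decoded[:2])
print(decoded == list(index))
```

Each entry of the index is therefore stored once per level as a small integer, and the label text lives only in `levels`.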

In [3]:

# Create a Series with the MultiIndex
srz = pd.Series(np.random.randn(8), index=index)
srz

Out[3]:

Team     Season
Arizona  First     0.621056
Boston   Second    0.109952
Chicago  First     0.205089
Detroit  Second    0.000748
Arizona  First     1.037440
Boston   Second   -1.348015
Chicago  First     1.217670
Detroit  Second   -0.118575
dtype: float64
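Once a Series carries a MultiIndex, each level can be pulled out separately and used to filter rows. The sketch below is an illustration only: it uses `np.arange` in place of the random values above so the result is reproducible, and the names `teams` and `arizona` are hypothetical.

```python
import numpy as np
import pandas as pd

arrays = [
    ["Arizona", "Boston", "Chicago", "Detroit"] * 2,
    ["First", "Second", "First", "Second"] * 2,
]
index = pd.MultiIndex.from_arrays(arrays, names=("Team", "Season"))

# Deterministic values 0.0 .. 7.0 instead of np.random.randn(8)
srz = pd.Series(np.arange(8.0), index=index)

# get_level_values returns one label per row for the requested level
teams = srz.index.get_level_values("Team")

# A boolean mask on that level selects both Arizona rows
arizona = srz[teams == "Arizona"]
print(arizona)
```

Filtering through `get_level_values` works even though this index contains repeated, unsorted entries.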

In [4]:

# Create a DataFrame with the MultiIndex
df = pd.DataFrame(np.random.randn(8, 2), index=index)
df

Out[4]:

		0	1
Team	Season
Arizona	First	1.304540	0.906655
Boston	Second	0.848561	-1.176608
Chicago	First	-1.193541	0.129032
Detroit	Second	0.909402	-1.267693
Arizona	First	0.612744	-1.042755
Boston	Second	0.105090	-0.947009
Chicago	First	0.906220	-0.760342
Detroit	Second	0.586193	0.372422

In [5]:

# Create a MultiIndex object with from_tuples
tuples = list(zip(*arrays))  # * is the unpacking operator
print(tuples)
index = pd.MultiIndex.from_tuples(tuples, names=['Team','Season'])
index

[('Arizona', 'First'), ('Boston', 'Second'), ('Chicago', 'First'), ('Detroit', 'Second'), ('Arizona', 'First'), ('Boston', 'Second'), ('Chicago', 'First'), ('Detroit', 'Second')]

Out[5]:

MultiIndex(levels=[['Arizona', 'Boston', 'Chicago', 'Detroit'], ['First', 'Second']],
           labels=[[0, 1, 2, 3, 0, 1, 2, 3], [0, 1, 0, 1, 0, 1, 0, 1]],
           names=['Team', 'Season'])

In [11]:

# Create a MultiIndex object with from_product
Team = ['Arizona', 'Boston', 'Chicago', 'Detroit']
Season = ['First', 'Second']
index = pd.MultiIndex.from_product([Team, Season], names=['Team','Season'])
index

Out[11]:

MultiIndex(levels=[['Arizona', 'Boston', 'Chicago', 'Detroit'], ['First', 'Second']],
           labels=[[0, 0, 1, 1, 2, 2, 3, 3], [0, 1, 0, 1, 0, 1, 0, 1]],
           names=['Team', 'Season'])
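The codes in this output differ from the from_arrays result because from_product builds every Team/Season combination, while the arrays used earlier pair each team with a single season and repeat those four pairs. A small sketch of the difference, with illustrative variable names that do not appear in the original notebook:

```python
import pandas as pd

teams = ["Arizona", "Boston", "Chicago", "Detroit"]
seasons = ["First", "Second"]

# Element-wise pairing: row i is (teams*2)[i] paired with (seasons*4)[i]
paired = pd.MultiIndex.from_arrays(
    [teams * 2, seasons * 4], names=["Team", "Season"]
)

# Cartesian product: every team combined with every season
product = pd.MultiIndex.from_product([teams, seasons], names=["Team", "Season"])

print(paired.is_unique, len(paired.unique()))
print(product.is_unique, len(product.unique()))

# Combinations that exist only in the Cartesian product
missing = sorted(set(product) - set(paired))
print(missing)
```

Both indexes have eight entries and identical levels, but only the product index covers all eight distinct combinations.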

In [7]:

import pandas as pd
import numpy as np
data = np.random.randn(3, 4)
index = [2002, 2003, 2004]
columns = ['Apple', 'Lemon', 'Orange', 'Tomato']
df = pd.DataFrame(data=data, index=index, columns=columns)
df

Out[7]:

	Apple	Lemon	Orange	Tomato
2002	1.395025	0.486376	-0.647918	-0.831599
2003	0.561223	2.093570	0.605734	-0.329335
2004	0.744365	-1.185949	1.210367	-0.706483

In [8]:

# unstack() moves the DataFrame's columns into the index
df.unstack()

Out[8]:

Apple   2002    1.395025
        2003    0.561223
        2004    0.744365
Lemon   2002    0.486376
        2003    2.093570
        2004   -1.185949
Orange  2002   -0.647918
        2003    0.605734
        2004    1.210367
Tomato  2002   -0.831599
        2003   -0.329335
        2004   -0.706483
dtype: float64

In [9]:

# Rebind df to the unstacked result (now a Series, not a DataFrame)
df = df.unstack()

In [10]:

print(type(df))
df.index

<class 'pandas.core.series.Series'>

Out[10]:

MultiIndex(levels=[['Apple', 'Lemon', 'Orange', 'Tomato'], [2002, 2003, 2004]],
           labels=[[0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3], [0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2]])
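The MultiIndex produced by unstack() has no names, which is why the output above lists only levels and labels. The sketch below assumes the same 3x4 layout with deterministic values instead of random ones; the level names "Fruit" and "Year" are hypothetical choices. It names the levels and then unstacks one of them to get back to the original table shape.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.arange(12.0).reshape(3, 4),
    index=[2002, 2003, 2004],
    columns=["Apple", "Lemon", "Orange", "Tomato"],
)

# Columns become the outer index level, the old index the inner level
stacked = df.unstack()
print(list(stacked.index.names))

# Give the two levels explicit names (illustrative names, not from the source)
stacked.index = stacked.index.set_names(["Fruit", "Year"])

# A single value is addressed by a (Fruit, Year) tuple
print(stacked.loc[("Lemon", 2003)])

# Moving the Fruit level back to the columns restores the original layout
restored = stacked.unstack(level="Fruit")
print(restored)
```

Naming the levels is optional, but it lets later calls refer to a level by name rather than by position.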