13 Apply Method

11 Oct 2017 | 6 Minute Read on Pandas

Apply Function

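The apply method passes the rows or columns of a DataFrame (i.e., one-dimensional arrays) to a function, so you can process them with any function, including aggregate (group) functions. See the apply docs.
- The axis of the DataFrame determines what is passed to the function: by default (axis=0) each column is passed in, and with axis=1 each row is passed in. (Applies function along input axis of DataFrame.)
- The return type depends on the applied function, in particular on whether it aggregates: a function that reduces each column or row to one value produces a Series. (Return type depends on whether passed function aggregates.)
- For comparison, R's apply() function applies a given function to an array or matrix and returns the result as a vector, array, or list.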
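To see which slices the function actually receives, the sketch below uses a hypothetical helper, `describe_input`, that reports the label (`.name`) and index of each Series it is given. It builds the same 3x4 frame used in the examples that follow and assumes a recent pandas version.

```python
import numpy as np
import pandas as pd

# Same 3x4 frame as in the examples that follow
df = pd.DataFrame(np.arange(12).reshape(-1, 4),
                  columns=['Arizona', 'Boston', 'Chicago', 'Detroit'])

def describe_input(s):
    # Hypothetical helper: each call receives a pandas Series whose .name is
    # the column label (axis=0) or the row label (axis=1)
    return f"{s.name}: {list(s.index)}"

# axis=0 (default): one call per column; the Series index holds the row labels
by_column = df.apply(describe_input)
# axis=1: one call per row; the Series index holds the column labels
by_row = df.apply(describe_input, axis=1)

print(by_column)
print(by_row)
```

Because `describe_input` returns one value per call, both results are Series: indexed by column labels for axis=0 and by row labels for axis=1.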

In [1]:

import pandas as pd
import numpy as np
#data  = np.random.randn(12).reshape((3, 4))
data  = np.arange(12).reshape((-1, 4))
columns = ['Arizona','Boston', 'Chicago','Detroit']
df = pd.DataFrame(data = data, columns = columns)
print(df)

   Arizona  Boston  Chicago  Detroit
0        0       1        2        3
1        4       5        6        7
2        8       9       10       11

In [2]:

# By default (axis=0), the function is applied to each column of the DataFrame
df.apply(lambda x: max(x) - min(x))

Out[2]:

Arizona    8
Boston     8
Chicago    8
Detroit    8
dtype: int64

In [3]:

# With axis=1, the function is applied to each row of the DataFrame
df.apply(lambda x: max(x) - min(x), axis = 1)

Out[3]:

0    3
1    3
2    3
dtype: int64
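For a simple range calculation like this, apply is not strictly necessary. The sketch below, assuming a recent pandas version, compares the apply form with the built-in `max`/`min` reductions, which take the same `axis` argument and are usually faster.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape(-1, 4),
                  columns=['Arizona', 'Boston', 'Chicago', 'Detroit'])

# Column-wise range with apply (default axis=0) ...
range_by_col = df.apply(lambda s: s.max() - s.min())
# ... and the same result with vectorized reductions
range_by_col_vec = df.max() - df.min()

# Row-wise range: axis=1 in both forms
range_by_row = df.apply(lambda s: s.max() - s.min(), axis=1)
range_by_row_vec = df.max(axis=1) - df.min(axis=1)

print(range_by_col_vec.tolist())  # [8, 8, 8, 8]
print(range_by_row_vec.tolist())  # [3, 3, 3]
```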

In [4]:

print(df.apply(lambda col: col + 1))                   # applied to every column (element-wise)
print(df.apply(lambda col: col.iloc[0] + 1))           # uses only the first row of each column
print(df.apply(lambda row: row.iloc[-1] + 1, axis=1))  # uses only the last column of each row

   Arizona  Boston  Chicago  Detroit
0        1       2        3        4
1        5       6        7        8
2        9      10       11       12
Arizona    1
Boston     2
Chicago    3
Detroit    4
dtype: int64
0     4
1     8
2    12
dtype: int64
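Each of these three lambdas reads either the whole Series or a single position, so the same results can also be obtained by direct selection. A sketch, assuming a recent pandas version, using arithmetic broadcasting and `.iloc`:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape(-1, 4),
                  columns=['Arizona', 'Boston', 'Chicago', 'Detroit'])

# Element-wise +1: arithmetic broadcasts over the whole frame
plus_one = df + 1

# First row + 1 (same values as df.apply(lambda col: col.iloc[0] + 1))
first_row_plus_one = df.iloc[0] + 1

# Last column + 1 (same values as df.apply(lambda row: row.iloc[-1] + 1, axis=1))
last_col_plus_one = df.iloc[:, -1] + 1

print(first_row_plus_one.tolist())  # [1, 2, 3, 4]
print(last_col_plus_one.tolist())   # [4, 8, 12]
```

`.iloc` makes the positional intent explicit. Plain `series[0]` on a Series with string labels relies on a positional fallback that recent pandas versions deprecate.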

In [5]:

# Compute the total per column (default) or per row (axis=1)
def add_all(row):
    total = 0
    for cell in row:
        total += cell
    return total

print(df.apply(add_all))
df.apply(add_all, axis = 1)

Arizona    12
Boston     15
Chicago    18
Detroit    21
dtype: int64

Out[5]:

0     6
1    22
2    38
dtype: int64
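The hand-written loop in `add_all` reproduces what the built-in `sum` reduction already does. A sketch, assuming a recent pandas version, checking both directions against `DataFrame.sum`:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape(-1, 4),
                  columns=['Arizona', 'Boston', 'Chicago', 'Detroit'])

def add_all(values):
    # Manual loop over the values of one column or one row
    total = 0
    for cell in values:
        total += cell
    return total

col_sums = df.apply(add_all)           # one total per column
row_sums = df.apply(add_all, axis=1)   # one total per row

# The built-in vectorized reduction produces the same totals
print(df.sum().tolist())         # [12, 15, 18, 21]
print(df.sum(axis=1).tolist())   # [6, 22, 38]
```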

In [6]:

# You can limit the scope by selecting a subset of columns first
df[['Boston', 'Chicago']].apply(add_all, axis = 1)

Out[6]:

0     3
1    11
2    19
dtype: int64
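Besides a list of names, a contiguous range of columns can be selected with a label slice in `.loc`, which includes both endpoints. A sketch, assuming a recent pandas version:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape(-1, 4),
                  columns=['Arizona', 'Boston', 'Chicago', 'Detroit'])

# Label-based slice: 'Boston' through 'Chicago', both included
subset = df.loc[:, 'Boston':'Chicago']
row_sums = subset.apply(lambda row: row.sum(), axis=1)

# Vectorized equivalent of the same row-wise sum
row_sums_vec = subset.sum(axis=1)

print(list(subset.columns))   # ['Boston', 'Chicago']
print(row_sums.tolist())      # [3, 11, 19]
```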

In [7]:

# The function returns a list (array) for each column, so the result is a DataFrame
def add_plus_one(row):
    return [cell + 1 for cell in row]
df.apply(add_plus_one)

Out[7]:

   Arizona  Boston  Chicago  Detroit
0        1       2        3        4
1        5       6        7        8
2        9      10       11       12
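With axis=1, how a returned list is treated is controlled by the `result_type` parameter of `DataFrame.apply`. The sketch below, assuming a recent pandas version, compares the default with `'expand'` and `'broadcast'`.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape(-1, 4),
                  columns=['Arizona', 'Boston', 'Chicago', 'Detroit'])

def add_plus_one(values):
    return [cell + 1 for cell in values]

# Default result_type with axis=1: each returned list becomes one Series element
as_series = df.apply(add_plus_one, axis=1)

# 'expand': list elements are spread into new columns labelled 0..3
expanded = df.apply(add_plus_one, axis=1, result_type='expand')

# 'broadcast': same shape as df, original index and column labels kept
broadcast = df.apply(add_plus_one, axis=1, result_type='broadcast')

print(type(as_series).__name__)   # Series
print(list(expanded.columns))     # [0, 1, 2, 3]
print(list(broadcast.columns))    # ['Arizona', 'Boston', 'Chicago', 'Detroit']
```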

In [8]:

# Pass each row (axis=1) to a user-defined function that uses only part of it
def add_two(x):
    return x.iloc[0] + x.iloc[1]  # first two values, by position
df.apply(add_two, axis = 1)

Out[8]:

0     1
1     9
2    17
dtype: int64
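When a function only needs positions, `raw=True` hands it a plain one-dimensional NumPy array instead of a Series, so `x[0]` is unambiguously positional and no Series is built for each row. A sketch, assuming a recent pandas version:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape(-1, 4),
                  columns=['Arizona', 'Boston', 'Chicago', 'Detroit'])

def add_two(x):
    # With raw=True, x is a 1-D NumPy array, so indexing is positional
    return x[0] + x[1]

first_two = df.apply(add_two, axis=1, raw=True)
print(first_two.tolist())  # [1, 9, 17]
```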

In [9]:

# Use a lambda to pass selected fields of each row to a function
def add_xy(x, y):
    return x + y
df.apply(lambda row: add_xy(row['Arizona'], row['Chicago']), axis=1)

Out[9]:

0     2
1    10
2    18
dtype: int64
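Instead of wrapping the function in a lambda, extra positional arguments can be forwarded through the `args` parameter of `DataFrame.apply`. The function `add_columns` below is a hypothetical helper for illustration, and the sketch assumes a recent pandas version.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape(-1, 4),
                  columns=['Arizona', 'Boston', 'Chicago', 'Detroit'])

def add_columns(row, first, second):
    # Hypothetical helper: row is one row of df (axis=1);
    # first and second are column labels supplied through args
    return row[first] + row[second]

# Extra positional arguments are passed after the row via args
via_args = df.apply(add_columns, axis=1, args=('Arizona', 'Chicago'))

# For a plain sum of two columns, column arithmetic gives the same values
vectorized = df['Arizona'] + df['Chicago']

print(via_args.tolist())  # [2, 10, 18]
```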