Dummy Classifier


sarah0518 2020. 12. 23. 10:47


A dummy classifier is mainly used to establish a baseline when comparing model performance, so you can judge how much a real model actually improves on it.

https://goleansixsigma.com/baseline-measures/
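As a minimal illustration of why a baseline matters, here is a sketch on hypothetical toy labels (not the data used in this post). A DummyClassifier that always predicts the majority class already reaches 90% accuracy on a 90/10 split without learning anything from the features:

```python
import numpy as np
from sklearn.dummy import DummyClassifier

# Hypothetical imbalanced labels: 90 negatives, 10 positives
y = np.array([0] * 90 + [1] * 10)
X = np.zeros((len(y), 1))  # DummyClassifier ignores the feature values

# 'most_frequent' always predicts the majority class (0 here)
baseline = DummyClassifier(strategy="most_frequent").fit(X, y)
acc = baseline.score(X, y)  # mean accuracy on the given data
print("Baseline accuracy: {:.2f}".format(acc))  # → Baseline accuracy: 0.90
```

On data like this, a real model has to beat 0.90 accuracy before its accuracy tells you anything.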

Before importing DummyClassifier, let's first define a few helper functions that make it easy to print each performance score.


from sklearn.model_selection import cross_val_score

# Each helper runs 5-fold cross-validation with a single scoring metric
# and prints the mean score across the folds.
# (For more detail, scores.std(), scores.min() and scores.max() can be printed as well.)

def cv_recall(model, x, y):
    scores = cross_val_score(model, x, y, cv=5, scoring='recall')
    print("Recall: {:.3f}".format(scores.mean()))

def cv_f1(model, x, y):
    scores = cross_val_score(model, x, y, cv=5, scoring='f1')
    print("f1: {:.3f}".format(scores.mean()))

def cv_precision(model, x, y):
    scores = cross_val_score(model, x, y, cv=5, scoring='precision')
    print("Precision: {:.3f}".format(scores.mean()))

def cv_accuracy(model, x, y):
    scores = cross_val_score(model, x, y, cv=5, scoring='accuracy')
    print("Accuracy: {:.3f}".format(scores.mean()))

def cv_roc(model, x, y):
    # 'roc_auc' needs predict_proba or decision_function on the model
    scores = cross_val_score(model, x, y, cv=5, scoring='roc_auc')
    print("ROC: {:.3f}".format(scores.mean()))
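Each helper above runs its own 5-fold cross-validation, so scoring all five metrics fits the model 5 × 5 = 25 times. A possible alternative is scikit-learn's `cross_validate`, which accepts a list of scorers and evaluates them all from the same 5 fits. The function name `cv_report` and the synthetic data from `make_classification` (standing in for the post's `x`, `y`) are hypothetical:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import cross_validate

METRICS = ["recall", "f1", "precision", "accuracy", "roc_auc"]

def cv_report(model, x, y, cv=5):
    """Hypothetical helper: one cross-validation run, mean of every metric in METRICS."""
    results = cross_validate(model, x, y, cv=cv, scoring=METRICS)
    # cross_validate stores per-fold scores under keys like 'test_recall'
    means = {m: float(np.mean(results["test_" + m])) for m in METRICS}
    for m, v in means.items():
        print("{}: {:.3f}".format(m, v))
    return means

# Hypothetical imbalanced data (roughly 80% class 0, 20% class 1)
x, y = make_classification(n_samples=500, weights=[0.8], random_state=0)
report = cv_report(DummyClassifier(strategy="stratified", random_state=99), x, y)
```

Besides saving fits, this keeps every metric tied to the same fold results.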

Creating a Dummy Classifier takes very little code.

However, the data I'm working with is an imbalanced dataset, so I set strategy to "stratified".

I also fixed random_state to an arbitrary value, 99, so that later analyses reproduce the same results.
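A small sketch of what these settings do, again on hypothetical toy labels. One plausible reason to prefer "stratified" on imbalanced data (my reading, not stated in the post) is that "most_frequent" never predicts the minority class, so its recall, precision and F1 are all zero and tell you nothing. "stratified" instead draws random labels in proportion to the training classes, and a fixed random_state makes those draws repeat exactly:

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import recall_score

# Hypothetical imbalanced labels: 90 negatives, 10 positives
y = np.array([0] * 90 + [1] * 10)
X = np.zeros((len(y), 1))  # features are ignored by DummyClassifier

# 'most_frequent' always predicts class 0, so it never recovers a positive
mf_pred = DummyClassifier(strategy="most_frequent").fit(X, y).predict(X)
print("most_frequent recall:", recall_score(y, mf_pred))  # → most_frequent recall: 0.0

# 'stratified' samples predictions from the training class distribution (~10% positives)
pred_a = DummyClassifier(strategy="stratified", random_state=99).fit(X, y).predict(X)
pred_b = DummyClassifier(strategy="stratified", random_state=99).fit(X, y).predict(X)
print("stratified recall:", recall_score(y, pred_a))
# Same random_state → identical random draws
print("same predictions:", np.array_equal(pred_a, pred_b))  # → same predictions: True
```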


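from sklearn.dummy import DummyClassifier

# x: feature matrix, y: binary target (assumed to be loaded already)
# strategy='stratified': random predictions that follow the class ratio of y
# random_state=99: makes those random predictions reproducible
dummy = DummyClassifier(strategy='stratified', random_state=99)
dummy.fit(x, y)  # optional: cross_val_score clones and refits the model on each fold

cv_recall(dummy, x, y)
cv_f1(dummy, x, y)
cv_precision(dummy, x, y)
cv_accuracy(dummy, x, y)
cv_roc(dummy, x, y)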

[output]

Recall: 0.377
f1: 0.418
Precision: 0.480
Accuracy: 0.671
ROC: 0.605
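When reading numbers like these, it may help to know what a "stratified" dummy gives in expectation. Its predictions are drawn independently of the true labels, with the positive class predicted at the training share $p$, so (assuming the test folds have the same class share):

```latex
\begin{aligned}
\text{Recall}    &= P(\hat{y}=1 \mid y=1) = P(\hat{y}=1) = p \\
\text{Precision} &= P(y=1 \mid \hat{y}=1) = P(y=1) = p \\
\text{F1}        &= \frac{2 \cdot p \cdot p}{p + p} = p \\
\text{Accuracy}  &= P(\hat{y}=1)\,P(y=1) + P(\hat{y}=0)\,P(y=0) = p^2 + (1-p)^2 \\
\text{ROC AUC}   &= p(1-p) + \tfrac{1}{2}\left[p^2 + (1-p)^2\right] = \tfrac{1}{2}
\end{aligned}
```

These are expectations over the random draws; a single run with a fixed random_state and a particular fold split can land some distance away from them.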

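A dummy classifier's scores are usually low, since it ignores the features entirely. Accuracy is the exception: on imbalanced data it can still look fairly high, as the 0.671 above shows.

If the algorithms you actually build score above these baseline values, you can take that as a sign that they have at least a basic level of reliability.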
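The comparison described above can be sketched as follows. The synthetic data (`make_classification`), the `LogisticRegression` candidate, the helper name `mean_f1`, and the choice of F1 are all hypothetical stand-ins. The key idea is to score the real model and the dummy on the same folds, so the two numbers are directly comparable:

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Hypothetical imbalanced data (roughly 80% class 0, 20% class 1)
x, y = make_classification(n_samples=500, weights=[0.8], random_state=0)

# A fixed, shuffled split so both models see exactly the same folds
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

def mean_f1(model):
    """Hypothetical helper: mean F1 over the shared folds."""
    return cross_val_score(model, x, y, cv=cv, scoring="f1").mean()

baseline = mean_f1(DummyClassifier(strategy="stratified", random_state=99))
candidate = mean_f1(LogisticRegression(max_iter=1000))
print("baseline f1: {:.3f}, model f1: {:.3f}".format(baseline, candidate))
print("beats baseline:", candidate > baseline)
```

Passing the same `StratifiedKFold` object to both calls keeps the fold assignment identical, so the gap reflects the models rather than the split.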
