
Handling Imbalanced Data in sklearn Random Forests

Posted: 2019-02-27 10:24:53


Handle Imbalanced Classes In Random Forest

 

Preliminaries

# Load libraries
from sklearn.ensemble import RandomForestClassifier
import numpy as np
from sklearn import datasets

Load Iris Flower Dataset

# Load data
iris = datasets.load_iris()
X = iris.data
y = iris.target

Adjust Iris Dataset To Make Classes Imbalanced

# Make the classes highly imbalanced by removing the first 40 observations (all class 0)
X = X[40:,:]
y = y[40:]

# Binary target vector: 0 for class 0, 1 for every other class (10 vs. 100 samples)
y = np.where((y == 0), 0, 1)
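To see how imbalanced the result is, the class counts can be checked with `np.bincount`. This is a minimal sketch that repeats the loading steps so it runs on its own. It relies on the fact that the iris data is stored sorted by class (50 samples each), so dropping the first 40 rows leaves 10 samples of class 0 against 100 samples of the merged class 1.

```python
# Sketch: rebuild the imbalanced binary target and count samples per class
import numpy as np
from sklearn import datasets

iris = datasets.load_iris()
X = iris.data[40:, :]   # drop the first 40 rows (all of them class 0)
y = iris.target[40:]

# Class 0 stays 0; classes 1 and 2 are merged into 1
y = np.where(y == 0, 0, 1)

counts = np.bincount(y)       # number of samples in each class
print(counts)                 # 10 samples of class 0, 100 of class 1
print(counts / counts.sum())  # fraction of the data in each class
```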

Train Random Forest While Balancing Classes

When using RandomForestClassifier, a useful setting is class_weight="balanced", under which each class is automatically weighted in inverse proportion to how frequently it appears in the data. Specifically:

$$w_j = \frac{n}{k\,n_j}$$

where $w_j$ is the weight of class $j$, $n$ is the number of observations, $n_j$ is the number of observations in class $j$, and $k$ is the total number of classes.
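As a worked example of the formula, the weights for the 10-vs-100 dataset built above are $w_0 = 110/(2 \cdot 10) = 5.5$ and $w_1 = 110/(2 \cdot 100) = 0.55$, so each minority sample counts ten times as much as a majority sample. The sketch below computes them by hand and compares the result with `sklearn.utils.class_weight.compute_class_weight`, which implements the "balanced" heuristic. The target array is written out directly to mirror the dataset above.

```python
# Sketch: compute w_j = n / (k * n_j) by hand and compare with scikit-learn
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Mirrors the imbalanced target: 10 samples of class 0, 100 of class 1
y = np.array([0] * 10 + [1] * 100)

classes = np.unique(y)
n = len(y)            # total number of observations
k = len(classes)      # number of classes
n_j = np.bincount(y)  # observations per class

manual = n / (k * n_j)  # class 0 -> 5.5, class 1 -> 0.55

# scikit-learn's "balanced" mode uses the same formula
sk = compute_class_weight(class_weight="balanced", classes=classes, y=y)
print(manual, sk)
```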

# Create random forest classifier object with balanced class weights
clf = RandomForestClassifier(random_state=0, n_jobs=-1, class_weight="balanced")

# Train model
model = clf.fit(X, y)
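RandomForestClassifier also accepts class_weight="balanced_subsample", which applies the same formula but recomputes the weights on each tree's bootstrap sample instead of the full training set. The minimal sketch below compares the two settings with stratified 5-fold cross-validation scored by balanced accuracy (the mean of per-class recall, so the minority class counts as much as the majority class). The choice of metric and fold count is an assumption for illustration, and the sketch makes no claim about which setting wins on this data.

```python
# Sketch: compare class_weight="balanced" with "balanced_subsample"
import numpy as np
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Rebuild the imbalanced binary dataset (10 vs. 100 samples)
iris = datasets.load_iris()
X = iris.data[40:, :]
y = np.where(iris.target[40:] == 0, 0, 1)

scores = {}
for mode in ("balanced", "balanced_subsample"):
    clf = RandomForestClassifier(random_state=0, n_jobs=-1, class_weight=mode)
    # An integer cv with a classifier uses stratified folds,
    # so each of the 5 folds keeps 2 of the 10 class-0 samples
    scores[mode] = cross_val_score(clf, X, y, cv=5, scoring="balanced_accuracy")
    print(mode, scores[mode].mean())
```

Neither setting resamples the data; both only change how much each sample counts while the trees are grown.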

https://chrisalbon.com/machine_learning/trees_and_forests/handle_imbalanced_classes_in_random_forests/



Methods for handling class imbalance:
https://segmentfault.com/a/1190000015248984


Original post: https://www.cnblogs.com/Allen-rg/p/10441792.html
