Tags: comm rac standards training machine learning algorithms for spl implementation details scale
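Characteristics of kNN:
1. The idea is extremely simple.
2. It requires very little mathematics.
3. It works well in practice.
4. It can illustrate many of the detail issues that come up when using machine learning algorithms.
5. It gives a fairly complete picture of the machine learning application workflow.
6. kNN can be regarded as an algorithm without a model.
7. Equivalently, the training set itself can be regarded as the model.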
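Drawbacks of kNN:
1. Low efficiency: with m training samples and n features, predicting each new data point costs O(m*n).
2. Highly data-dependent: with a small k, a couple of wrong or outlying neighbors are enough to flip a prediction.
3. Predictions are not interpretable.
4. Curse of dimensionality: as the number of dimensions grows, two points that look close end up farther and farther apart.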
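The curse of dimensionality can be made concrete with a small sketch. It measures the Euclidean distance between the origin and the point whose coordinates are all 1; the helper name `corner_distance` is made up for this illustration. Each coordinate differs by only 1, yet the distance equals sqrt(dim) and keeps growing with the number of dimensions.

```python
import math


def corner_distance(dim):
    """Euclidean distance from (0, ..., 0) to (1, ..., 1) in `dim` dimensions."""
    origin = [0.0] * dim
    corner = [1.0] * dim
    return math.sqrt(sum((c - o) ** 2 for o, c in zip(origin, corner)))


# The per-coordinate gap stays at 1, but the overall distance keeps increasing.
for dim in (1, 2, 3, 64, 10000):
    print(dim, corner_distance(dim))
```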
kNN with scikit-learn:

from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn import datasets

# Build the training and test data
iris = datasets.load_iris()
x = iris.data
y = iris.target
trainX, testX, trainY, testY = train_test_split(x, y, test_size=0.2)

# Workflow: create the classifier, fit it, predict
Knn = KNeighborsClassifier(n_neighbors=3)
Knn.fit(trainX, trainY)
Knn.predict(testX)
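A hand-written kNN classifier:

import numpy as np
from collections import Counter


class KNeighborsClassifier:
    def __init__(self, k):
        """Initialize the classifier."""
        assert k >= 1, "k must be valid"
        self._k = k
        self._trainX = None
        self._trainY = None

    def __repr__(self):
        return "Knn=%s" % (self._k)

    def fit(self, trainX, trainY):
        """Fit on the training set; kNN simply stores it."""
        self._trainX = trainX
        self._trainY = trainY
        return self

    def _predict(self, x):
        """Given a single sample x, return its predicted label."""
        # Euclidean distance from x to every training sample
        distances = [np.sqrt(np.sum(np.power((train_x - x), 2))) for train_x in self._trainX]
        # Indices of the k nearest training samples
        index = np.argsort(distances)[:self._k]
        pred = [self._trainY[i] for i in index]
        # Majority vote among the k nearest labels
        cls = Counter(pred).most_common(1)
        return cls[0][0]

    def predict(self, testX):
        """Given a set of samples to classify, return the vector of predictions."""
        assert self._trainX is not None and self._trainY is not None, "must fit before predict"
        predictY = [self._predict(i) for i in testX]
        return np.array(predictY)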
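Knn = KNeighborsClassifier(3)
Knn.fit(trainX, trainY)
y_predict = Knn.predict(testX)

# Measure accuracy by hand
acc = np.sum(testY == y_predict) / len(testY)

# The same metric with scikit-learn; the signature is accuracy_score(y_true, y_pred)
from sklearn.metrics import accuracy_score
accuracy_score(testY, y_predict)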
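Hyperparameter: a parameter that must be chosen before the algorithm runs.
Model parameter: a parameter learned while the algorithm runs.
kNN has no model parameters.
The k in kNN is a typical hyperparameter.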
How to find good hyperparameters:
1. Domain knowledge
2. Rule-of-thumb values
3. Experimental search
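Experimental search can be sketched as a loop that tries several values of k and keeps the one with the best accuracy on held-out data. The example below is a minimal illustration, not the original author's code: the data set is tiny and made up (it includes one deliberately mislabeled training point), and the helpers `knn_predict` and `accuracy` are hypothetical names written in plain Python so the sketch runs without extra packages. With scikit-learn, the same loop would wrap `KNeighborsClassifier(n_neighbors=k)`.

```python
import math
from collections import Counter

# Made-up 2-D training data: two clusters plus one mislabeled point near cluster 0.
train = [((1.0, 1.0), 0), ((1.2, 0.8), 0), ((0.8, 1.1), 0), ((1.1, 1.3), 0),
         ((1.0, 0.9), 1),  # noisy label
         ((3.0, 3.0), 1), ((3.2, 2.9), 1), ((2.8, 3.1), 1), ((3.1, 3.3), 1)]
# Held-out data used only to compare candidate values of k.
valid = [((0.9, 0.9), 0), ((1.3, 1.1), 0), ((2.9, 3.2), 1), ((3.3, 3.0), 1)]


def knn_predict(train, x, k):
    """Majority vote among the k training points nearest to x."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def accuracy(train, valid, k):
    """Fraction of held-out points classified correctly with this k."""
    hits = sum(knn_predict(train, x, k) == label for x, label in valid)
    return hits / len(valid)


best_k, best_score = None, -1.0
for k in (1, 3, 5, 7):  # odd values avoid tied votes with two classes
    score = accuracy(train, valid, k)
    if score > best_score:  # strict ">" keeps the smallest k on ties
        best_k, best_score = k, score

print("best k =", best_k, "accuracy =", best_score)
```

On this data k = 1 is fooled by the mislabeled neighbor, while larger k values outvote it, which is the kind of effect the search is meant to reveal.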

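When p = 1, the Minkowski distance (sum_i |a_i - b_i|^p)^(1/p) is the Manhattan distance; when p = 2, it is the Euclidean distance.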
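The role of p can be checked with a few lines of code. This is a minimal sketch of the Minkowski distance in plain Python; the function name `minkowski_distance` is chosen for this illustration. scikit-learn's `KNeighborsClassifier` exposes the same exponent through its `p` parameter, which makes p one more hyperparameter of kNN.

```python
def minkowski_distance(a, b, p):
    """Minkowski distance between equal-length sequences a and b, for p >= 1."""
    if len(a) != len(b):
        raise ValueError("a and b must have the same length")
    if p < 1:
        raise ValueError("p must be at least 1")
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


a, b = (0.0, 0.0), (3.0, 4.0)
print(minkowski_distance(a, b, 1))  # Manhattan distance: |3| + |4|
print(minkowski_distance(a, b, 2))  # Euclidean distance: sqrt(3^2 + 4^2)
```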
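Min-max normalization: maps all data into the range 0 to 1; suitable when the data has clear bounds.

import numpy as np

x = np.random.randint(0, 100, size=100)
# (x - min) / (max - min)
s = np.divide((x - np.min(x)), np.subtract(np.max(x), np.min(x)))
print(s)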
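Standardization (mean-variance normalization): rescales all data to a distribution with mean 0 and variance 1; suitable when the data has no clear bounds and may contain extreme values.

import numpy as np

x = np.random.randint(0, 100, size=100)
# (x - mean) / std
s = np.divide(np.subtract(x, np.mean(x)), np.std(x))
print(s)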

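Standardization with scikit-learn's StandardScaler:

from sklearn.preprocessing import StandardScaler

standardScaler = StandardScaler()
# Fit on the training set only, then apply the same transform to both sets
standardScaler.fit(trainX)
trainX = standardScaler.transform(trainX)
testX = standardScaler.transform(testX)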
A hand-written StandardScaler:

import numpy as np


class StandardScaler:
    def __init__(self):
        self.mean_ = None
        self.scale_ = None  # per-feature standard deviation

    def fit(self, X):
        """Compute the per-feature mean and standard deviation of the training set."""
        assert X.ndim == 2, "The dimension of X must be 2"
        self.mean_ = np.mean(X, axis=0)
        self.scale_ = np.std(X, axis=0)
        return self

    def transform(self, X):
        """Standardize X using the fitted mean and standard deviation."""
        assert X.ndim == 2, "The dimension of X must be 2"
        assert self.mean_ is not None and self.scale_ is not None, "must fit before transform"
        assert X.shape[1] == len(self.mean_), "the number of features must match the fitted data"
        reX = np.empty(X.shape, dtype=float)
        for col in range(X.shape[1]):
            reX[:, col] = (X[:, col] - self.mean_[col]) / self.scale_[col]
        return reX
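Keeping fit and transform separate means the test data is scaled with the statistics of the training data rather than its own. The following is a minimal plain-Python sketch of that idea on a single made-up feature column; the variable names are illustrative, and `statistics.pstdev` is used because it is the population standard deviation, the same quantity `np.std` computes by default.

```python
import statistics

# One made-up feature column, split into training and test values.
train_col = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
test_col = [5.0, 7.0, 1.0]

# "fit": statistics come from the training data only.
mean = statistics.mean(train_col)
std = statistics.pstdev(train_col)

# "transform": the same mean and std are applied to both sets.
train_scaled = [(v - mean) / std for v in train_col]
test_scaled = [(v - mean) / std for v in test_col]

print("mean =", mean, "std =", std)
print("scaled test values:", test_scaled)
```

After scaling, the training column has mean 0 and standard deviation 1; the test column generally does not, which is expected because it is measured on the training set's scale.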
Original article: https://www.cnblogs.com/zenan/p/9253702.html