import torch
import numpy as np
import sys
sys.path.append('..')
import d2lzh_pytorch as d2l
We again use the Fashion-MNIST dataset and classify its images with a multilayer perceptron.
batch_size = 256
train_iter, test_iter = d2l.load_data_fashion_mnist(batch_size)
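Images in the Fashion-MNIST dataset have shape 28x28 and fall into 10 classes. As before, each image is represented as a vector of length 28x28=784, so the number of inputs is 784 and the number of outputs is 10. The number of hidden units, a hyperparameter, is set to 256.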
num_inputs, num_outputs, num_hiddens = 784, 10, 256
W1 = torch.tensor(np.random.normal(0, 0.01, (num_inputs, num_hiddens)), dtype=torch.float32)
b1 = torch.zeros(num_hiddens, dtype=torch.float32)
W2 = torch.tensor(np.random.normal(0, 0.01, (num_hiddens, num_outputs)), dtype=torch.float32)
b2 = torch.zeros(num_outputs, dtype=torch.float32)
params = [W1, b1, W2, b2]
for param in params:
    param.requires_grad_(requires_grad=True)
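The same initialization can also be written without NumPy. The following is a minimal sketch, assuming `torch.randn` scaled by 0.01 as a stand-in for `np.random.normal(0, 0.01, ...)`; the seed and the `total` variable are illustrative additions, not part of the original code.

```python
import torch

num_inputs, num_outputs, num_hiddens = 784, 10, 256

torch.manual_seed(0)  # illustrative seed, only for reproducibility

# Weights drawn from N(0, 0.01^2); biases start at zero.
W1 = (torch.randn(num_inputs, num_hiddens) * 0.01).requires_grad_()
b1 = torch.zeros(num_hiddens, requires_grad=True)
W2 = (torch.randn(num_hiddens, num_outputs) * 0.01).requires_grad_()
b2 = torch.zeros(num_outputs, requires_grad=True)
params = [W1, b1, W2, b2]

# Total number of trainable scalars in the two-layer network.
total = sum(p.numel() for p in params)
print(total)
```

The count is 784*256 + 256 + 256*10 + 10 parameters, which gives a sense of how small this model is.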
We implement ReLU with the max function rather than calling a relu function directly.
def relu(X):
    return torch.max(input=X, other=torch.tensor(0.0))
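As a quick sanity check, the max-based definition can be compared against PyTorch's built-in `torch.relu` and against `clamp(min=0)`. The input tensor below is a made-up example.

```python
import torch

def relu(X):
    # Element-wise maximum of X and a scalar zero tensor (broadcast over X).
    return torch.max(input=X, other=torch.tensor(0.0))

X = torch.tensor([[-2.0, 0.0, 3.5],
                  [1.0, -0.5, 0.0]])  # illustrative input
out = relu(X)

# Both comparisons should hold if the hand-written version matches.
same_as_builtin = torch.equal(out, torch.relu(X))
same_as_clamp = torch.equal(out, X.clamp(min=0))
print(out, same_as_builtin, same_as_clamp)
```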
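As in softmax regression, we use the view function to reshape each original image into a vector of length num_inputs. We then implement the multilayer perceptron expression from the previous section.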
def net(X):
    X = X.view((-1, num_inputs))
    H = relu(torch.matmul(X, W1) + b1)
    return torch.matmul(H, W2) + b2
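To see how shapes flow through this forward pass, here is a small self-contained sketch. It assumes a random mini-batch of 4 images in place of real Fashion-MNIST data and uses `torch.relu` in place of the hand-written `relu`.

```python
import torch

num_inputs, num_outputs, num_hiddens = 784, 10, 256
W1 = torch.randn(num_inputs, num_hiddens) * 0.01
b1 = torch.zeros(num_hiddens)
W2 = torch.randn(num_hiddens, num_outputs) * 0.01
b2 = torch.zeros(num_outputs)

def net(X):
    X = X.view((-1, num_inputs))              # (batch, 1, 28, 28) -> (batch, 784)
    H = torch.relu(torch.matmul(X, W1) + b1)  # hidden layer: (batch, 256)
    return torch.matmul(H, W2) + b2           # output logits: (batch, 10)

X = torch.rand(4, 1, 28, 28)  # hypothetical mini-batch of 4 grayscale images
flat = X.view((-1, num_inputs))
logits = net(X)
print(flat.shape, logits.shape)
```

The network returns unnormalized logits; the softmax is applied inside `torch.nn.CrossEntropyLoss`.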
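def sgd(params, lr, batch_size):
    for param in params:
        # param.data -= lr * param.grad / batch_size
        # The loss is PyTorch's cross-entropy, which already averages over the
        # batch, so the gradient does not need to be divided by batch_size again.
        param.data -= lr * param.grad
    '''
    MXNet's SoftmaxCrossEntropyLoss effectively sums along the batch dimension
    in the backward pass, whereas PyTorch averages by default. The loss computed
    by PyTorch is therefore much smaller, roughly 1/batch_size of the MXNet
    value, and the gradients from backpropagation are smaller by the same factor.
    To get results close to the original book while still dividing by
    batch_size, the learning rate should be scaled up by about batch_size
    (the book uses 0.5; the setting is 100).
    Because PyTorch already divides once when computing the loss, this sgd
    skips the division instead.
    '''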
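The scaling described above can be checked directly in PyTorch by comparing `reduction='mean'` (the default) with `reduction='sum'`. This is a minimal sketch on random logits and labels; the batch size of 8 and all variable names are illustrative, and it compares two PyTorch settings rather than running MXNet.

```python
import torch

torch.manual_seed(0)
batch_size, num_classes = 8, 10  # illustrative sizes
logits = torch.randn(batch_size, num_classes, requires_grad=True)
labels = torch.randint(0, num_classes, (batch_size,))

# Default reduction: mean over the batch.
loss_mean = torch.nn.CrossEntropyLoss()(logits, labels)
grad_mean, = torch.autograd.grad(loss_mean, logits)

# Summed reduction: no division by the batch size.
loss_sum = torch.nn.CrossEntropyLoss(reduction='sum')(logits, labels)
grad_sum, = torch.autograd.grad(loss_sum, logits)

# Both the loss and its gradient differ by a factor of batch_size.
loss_scaled = torch.allclose(loss_sum, loss_mean * batch_size, rtol=1e-4)
grad_scaled = torch.allclose(grad_sum, grad_mean * batch_size, rtol=1e-4)
print(loss_scaled, grad_scaled)
```

With the mean reduction, dividing the gradient by batch_size a second time inside sgd would shrink each update by that factor again.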
loss = torch.nn.CrossEntropyLoss()
def evaluate_accuracy(data_iter, net):
    acc_sum, n = 0.0, 0
    for X, y in data_iter:
        acc_sum += (net(X).argmax(dim=1) == y).float().sum().item()
        n += y.shape[0]
    return acc_sum / n
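Evaluation does not need gradients, so one optional variation is to wrap the loop in `torch.no_grad()`, which avoids building the autograd graph. The sketch below is a variant rather than the original function; `evaluate_accuracy_no_grad`, `toy_net`, and `toy_iter` are hypothetical names used only for illustration.

```python
import torch

def evaluate_accuracy_no_grad(data_iter, net):
    acc_sum, n = 0.0, 0
    with torch.no_grad():  # no graph is recorded inside this block
        for X, y in data_iter:
            acc_sum += (net(X).argmax(dim=1) == y).float().sum().item()
            n += y.shape[0]
    return acc_sum / n

# Toy setup: an identity "network" over 3 classes and two small batches.
W = torch.eye(3, requires_grad=True)

def toy_net(X):
    return torch.matmul(X, W)

toy_iter = [
    (torch.tensor([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]), torch.tensor([0, 1])),
    (torch.tensor([[0.0, 0.0, 1.0], [0.0, 1.0, 0.0]]), torch.tensor([2, 0])),
]
acc = evaluate_accuracy_no_grad(toy_iter, toy_net)
print(acc)  # 3 of the 4 toy samples are predicted correctly
```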
def train_ch3(net, train_iter, test_iter, loss, num_epochs, batch_size,
              params=None, lr=None, optimizer=None):
    for epoch in range(num_epochs):
        train_l_sum, train_acc_sum, n = 0.0, 0.0, 0
        for X, y in train_iter:
            y_hat = net(X)
            l = loss(y_hat, y).sum()
            # Zero the gradients
            if optimizer is not None:
                optimizer.zero_grad()
            elif params is not None and params[0].grad is not None:
                for param in params:
                    param.grad.data.zero_()
            l.backward()
            if optimizer is None:
                sgd(params, lr, batch_size)
            else:
                optimizer.step()  # used in the "Concise Implementation of Softmax Regression" section
            # l is already a batch mean, so train_l_sum / n below is roughly
            # the per-sample loss divided by batch_size.
            train_l_sum += l.item()
            train_acc_sum += (y_hat.argmax(dim=1) == y).sum().item()
            n += y.shape[0]
        test_acc = evaluate_accuracy(test_iter, net)
        print('epoch %d, loss %.4f, train acc %.3f, test acc %.3f'
              % (epoch + 1, train_l_sum / n, train_acc_sum / n, test_acc))
num_epochs, lr = 5, 0.5
train_ch3(net, train_iter, test_iter, loss, num_epochs, batch_size, params, lr)
epoch 1, loss 0.0031, train acc 0.702, test acc 0.775
epoch 2, loss 0.0019, train acc 0.821, test acc 0.807
epoch 3, loss 0.0016, train acc 0.843, test acc 0.831
epoch 4, loss 0.0015, train acc 0.855, test acc 0.818
epoch 5, loss 0.0014, train acc 0.863, test acc 0.816
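The loss values in this log look very small because each batch contributes its mean loss to the running sum, which is then divided by the number of samples. Multiplying by the batch size recovers an approximate per-sample value, for example about 0.0031 x 256 ≈ 0.79 for epoch 1, ignoring the smaller final batch. The sketch below illustrates the two ways of averaging on random data; the batches and variable names are made up for the illustration.

```python
import torch

torch.manual_seed(0)
loss = torch.nn.CrossEntropyLoss()  # mean reduction by default

# Three hypothetical batches of 4 samples each, with random logits and labels.
batches = [(torch.randn(4, 10), torch.randint(0, 10, (4,))) for _ in range(3)]

as_in_loop, per_sample, n = 0.0, 0.0, 0
for y_hat, y in batches:
    l = loss(y_hat, y)
    as_in_loop += l.item()               # what the training loop accumulates
    per_sample += l.item() * y.shape[0]  # undo the batch mean before summing
    n += y.shape[0]

reported = as_in_loop / n    # value printed by the training loop
mean_loss = per_sample / n   # average loss per sample
print(reported, mean_loss)
```

With equal batch sizes, `mean_loss` equals `reported` times the batch size.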
Original article: https://www.cnblogs.com/onemorepoint/p/11811339.html