Tags: int layer learning parameter mod group logging time() sel
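# Set up logzero: DEBUG level, plus a timestamped log file under args.output_dir.
import logging
import os
import time

import logzero
from logzero import logger

logzero.loglevel(logging.DEBUG)
logdir = os.path.join(args.output_dir, "logs")
os.makedirs(logdir, exist_ok=True)
logzero.logfile(os.path.join(logdir, f"bert_{int(time.time())}.log"))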
# Exclude biases and LayerNorm parameters from weight decay.
param_optimizer = list(self.model.named_parameters())
no_decay = ['bias', 'LayerNorm.bias', 'LayerNorm.weight']
optimizer_grouped_parameters = [
    # All other parameters: weight decay of 0.01.
    {'params': [p for n, p in param_optimizer if not any(nd in n for nd in no_decay)], 'weight_decay': 0.01},
    # Biases and LayerNorm weights: no weight decay.
    {'params': [p for n, p in param_optimizer if any(nd in n for nd in no_decay)], 'weight_decay': 0.0},
]
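As a quick check on the name matching above, the following sketch applies the same substring filter to a handful of made-up parameter names and shows which group each one falls into. The names and the helper `split_by_decay` are hypothetical, chosen only for illustration; plain strings are used so the example runs without PyTorch.

```python
# Hypothetical parameter names, chosen only to illustrate the filter.
names = [
    "encoder.layer.0.attention.self.query.weight",
    "encoder.layer.0.attention.self.query.bias",
    "encoder.layer.0.output.LayerNorm.weight",
    "encoder.layer.0.output.LayerNorm.bias",
    "classifier.weight",
]
no_decay = ['bias', 'LayerNorm.bias', 'LayerNorm.weight']


def split_by_decay(names, no_decay):
    """Return (decayed, not_decayed) using the same substring test as above."""
    decayed = [n for n in names if not any(nd in n for nd in no_decay)]
    not_decayed = [n for n in names if any(nd in n for nd in no_decay)]
    return decayed, not_decayed


decayed, not_decayed = split_by_decay(names, no_decay)
print("weight_decay=0.01:", decayed)
print("weight_decay=0.0: ", not_decayed)
```

Because the test is a substring match, `'bias'` already covers `'LayerNorm.bias'`; the extra entry is redundant but harmless.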
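# One training step: clear gradients, forward, backward, clip, update.
optimizer.zero_grad()
loss, hidden = model(data, hidden, targets)
loss.backward()
# clip_grad_norm is deprecated; clip_grad_norm_ clips the gradients in place.
torch.nn.utils.clip_grad_norm_(model.parameters(), args.clip)
optimizer.step()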
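To make the clipping step above more concrete, here is a plain-Python sketch of clipping by global norm: compute one L2 norm over all gradients together, then scale every gradient by the same factor if that norm exceeds the limit. This is an illustration of the idea, not PyTorch's implementation; the helper name `clip_by_global_norm`, the list-of-lists gradient layout, and the `eps` value are assumptions.

```python
import math


def clip_by_global_norm(grads, max_norm, eps=1e-6):
    """Scale all gradients so their combined L2 norm is at most max_norm.

    grads is a list of gradient vectors (lists of floats), one per parameter.
    Returns the scaled gradients and the norm measured before clipping.
    """
    total_norm = math.sqrt(sum(g * g for vec in grads for g in vec))
    # Never scale up: the factor is capped at 1.0.
    scale = min(1.0, max_norm / (total_norm + eps))
    clipped = [[g * scale for g in vec] for vec in grads]
    return clipped, total_norm


# Two hypothetical gradient vectors with a combined norm of 13.
grads = [[3.0, 4.0], [0.0, 12.0]]
clipped, total_norm = clip_by_global_norm(grads, max_norm=1.0)
clipped_norm = math.sqrt(sum(g * g for vec in clipped for g in vec))
print(total_norm, clipped_norm)
```

A single shared scale factor keeps the direction of the overall update unchanged; only its length is reduced.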
Original source: https://www.cnblogs.com/JohnRain/p/11615395.html