Cause: once again, master=server.target was not passed to MonitoredTrainingSession. After adding it, the program runs normally:
with tf.train.MonitoredTrainingSession(
        master=server.target,  # the missing argument: connect to this task's tf.train.Server
        is_chief=is_chief,
        checkpoint_dir=checkpoint_dir,
        save_checkpoint_secs=FLAGS.save_interval_secs,
        save_summaries_steps=100,
        save_summaries_secs=None,
        config=sess_config,
        hooks=hooks) as sess:
    ...  # training loop
However, the following message is still reported once: tensorflow:Waiting for model to be ready. Ready_for_local_init_op: Variables not initialized
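To avoid it, have every worker other than worker 0 (the chief) sleep for 5 seconds, which gives the chief time to initialize the variables:
if not is_chief:  # every worker except worker 0
    time.sleep(5)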
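A minimal single-process sketch combining both fixes, assuming TensorFlow 1.15 or 2.x with the graph-mode APIs reached through tf.compat.v1. Instead of a real cluster with ps and worker jobs, it uses tf.compat.v1.train.Server.create_local_server() as a stand-in, so the single task is the chief. The function name train, the step count, and the toy training op (which only increments the global step) are illustrative assumptions, not the original author's code.

```python
import time

import tensorflow as tf

# Graph-mode (TF 1.x style) APIs; on TensorFlow 2.x they live under tf.compat.v1.
tf1 = tf.compat.v1


def train(server_target, is_chief, num_steps=5, wait_secs=5):
    """Run a toy training loop on the server at server_target; return the last global step."""
    if not is_chief:
        # Workaround from the post: non-chief workers give worker 0 (the chief)
        # time to initialize the variables before creating their session.
        time.sleep(wait_secs)

    with tf.Graph().as_default():
        global_step = tf1.train.get_or_create_global_step()
        # Toy "training op" (assumption): it only increments the global step.
        train_op = global_step.assign_add(1)
        hooks = [tf1.train.StopAtStepHook(last_step=num_steps)]

        last_step = None
        # master=server_target is the key fix: without it the session runs
        # in-process instead of through this task's tf.train.Server, so ops
        # placed on other cluster tasks (e.g. /job:ps) cannot run.
        with tf1.train.MonitoredTrainingSession(
            master=server_target,
            is_chief=is_chief,
            checkpoint_dir=None,  # no checkpoints or summaries in this sketch
            hooks=hooks,
        ) as sess:
            while not sess.should_stop():
                last_step = sess.run(train_op)
    return last_step


if __name__ == "__main__":
    # Stand-in for tf1.train.Server(cluster, job_name="worker", task_index=i):
    # a one-task, in-process gRPC server, so this task is the chief.
    server = tf1.train.Server.create_local_server()
    print(train(server.target, is_chief=True))
```

In a real cluster the server would instead be built from a tf.train.ClusterSpec with tf.train.Server(cluster, job_name=..., task_index=...), and the non-chief workers would call train(server.target, is_chief=False) while worker 0 runs with is_chief=True. Calling it with is_chief=False against the one-task local server above would wait indefinitely, since no chief would ever initialize the variables.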
Reference: https://stackoverflow.com/questions/42397370/distributed-tensorflow-save-fails-no-device
Original article: https://www.cnblogs.com/lixiaolun/p/9787063.html