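While the NodeManager (NM) starts a container, one of the steps writes the current tokens to a local directory. By default this happens in the startLocalizer method of the DefaultContainerExecutor class: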
public synchronized void startLocalizer(Path nmPrivateContainerTokensPath,
    InetSocketAddress nmAddr, String user, String appId, String locId,
    List<String> localDirs, List<String> logDirs)
    throws IOException, InterruptedException {
  ContainerLocalizer localizer =
      new ContainerLocalizer(lfs, user, appId, locId, getPaths(localDirs),
          RecordFactoryProvider.getRecordFactory(getConf()));
  // Initialize the local directories for a particular user:
  // create $local.dir/usercache/$user and its immediate parent
  createUserLocalDirs(localDirs, user);
  // Initialize the local cache directories for a particular user:
  // $local.dir/usercache/$user, $local.dir/usercache/$user/appcache,
  // $local.dir/usercache/$user/filecache
  createUserCacheDirs(localDirs, user);
  // Initialize the local directories for a particular user:
  // $local.dir/usercache/$user/appcache/$appi
  createAppDirs(localDirs, user, appId);
  // Create application log directories on all disks: create $log.dir/$appid
  createAppLogDirs(appId, logDirs);
  // TODO : Why pick first app dir. The same in LCE why not random?
  Path appStorageDir = getFirstApplicationDir(localDirs, user, appId);
  String tokenFn = String.format(ContainerLocalizer.TOKEN_FILE_NAME_FMT, locId);
  Path tokenDst = new Path(appStorageDir, tokenFn);
  lfs.util().copy(nmPrivateContainerTokensPath, tokenDst);
  LOG.info("Copying from " + nmPrivateContainerTokensPath + " to " + tokenDst);
  lfs.setWorkingDirectory(appStorageDir);
  LOG.info("CWD set to " + appStorageDir + " = " + lfs.getWorkingDirectory());
  // TODO : DO it over RPC for maintaining similarity?
  localizer.runLocalization(nmAddr);
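}
The part to focus on starts at getFirstApplicationDir(localDirs, user, appId): the method picks the application directory, builds the token file name, and then calls copy to copy the token file into YARN's local working directory.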
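The naming-and-copy step can be sketched outside Hadoop with plain java.nio. This is only an illustration: the class TokenCopySketch and its methods are hypothetical, the "%s.tokens" format is an assumed value for ContainerLocalizer.TOKEN_FILE_NAME_FMT, and java.nio stands in for the lfs FileContext used in the real method.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class TokenCopySketch {
    // Assumption: mirrors ContainerLocalizer.TOKEN_FILE_NAME_FMT; the literal is not taken from the article.
    static final String TOKEN_FILE_NAME_FMT = "%s.tokens";

    // Build the token file name from the localizer ID, as startLocalizer does.
    static String tokenFileName(String locId) {
        return String.format(TOKEN_FILE_NAME_FMT, locId);
    }

    // Copy the NM-private token file into the chosen application directory.
    static Path copyToken(Path nmPrivateTokens, Path appStorageDir, String locId)
            throws IOException {
        Files.createDirectories(appStorageDir);
        Path tokenDst = appStorageDir.resolve(tokenFileName(locId));
        return Files.copy(nmPrivateTokens, tokenDst, StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        // Temporary directory and dummy token bytes, purely for demonstration.
        Path tmp = Files.createTempDirectory("nm-sketch");
        Path src = Files.write(tmp.resolve("private.tokens"), new byte[] {1, 2, 3});
        Path dst = copyToken(src, tmp.resolve("appdir"), "container_01");
        System.out.println("Copying from " + src + " to " + dst);
    }
}
```

Whatever directory is passed as appStorageDir receives the token file, so the choice of that directory decides which disk takes the write.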
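The first argument passed to getFirstApplicationDir is the list of directories where YARN writes its temporary data. It corresponds to the configuration item
yarn.nodemanager.local-dirs (List of directories to store localized files in.)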
private Path getFirstApplicationDir(List<String> localDirs, String user,
    String appId) {
  return getApplicationDir(new Path(localDirs.get(0)), user, appId);
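}
Note that this uses localDirs.get(0), so the result depends entirely on the first entry of localDirs. The next step is to see where localDirs comes from.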
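Before tracing localDirs, here is a minimal sketch of what get(0) implies. It assumes the $local.dir/usercache/$user/appcache/<appId> layout from the comments in startLocalizer; the class FirstDirSketch, the directory names, and the application IDs are made up for illustration and are not Hadoop code.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

public class FirstDirSketch {
    // Assumed layout: <localDir>/usercache/<user>/appcache/<appId>.
    static Path applicationDir(String localDir, String user, String appId) {
        return Paths.get(localDir, "usercache", user, "appcache", appId);
    }

    // Same selection rule as getFirstApplicationDir: always index 0.
    static Path firstApplicationDir(List<String> localDirs, String user, String appId) {
        return applicationDir(localDirs.get(0), user, appId);
    }

    public static void main(String[] args) {
        // Hypothetical disks and application IDs.
        List<String> localDirs = Arrays.asList("/data1/nm", "/data2/nm", "/data3/nm");
        for (String appId : Arrays.asList("app_1", "app_2", "app_3")) {
            System.out.println(firstApplicationDir(localDirs, "alice", appId));
        }
    }
}
```

Every path produced this way sits under the first list entry, no matter how many directories are configured.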
localDirs is obtained in the run method of LocalizerRunner, an inner class of ResourceLocalizationService:
private LocalDirsHandlerService dirsHandler;
....
List<String> localDirs = dirsHandler.getLocalDirs();
List<String> logDirs = dirsHandler.getLogDirs();
This calls into the LocalDirsHandlerService class:
/** Local dirs to store localized files in */
private DirectoryCollection localDirs = null;
/** storage for container logs*/
private DirectoryCollection logDirs = null;

localDirs = new DirectoryCollection(
    validatePaths(conf.getTrimmedStrings(YarnConfiguration.NM_LOCAL_DIRS)));
logDirs = new DirectoryCollection(
    validatePaths(conf.getTrimmedStrings(YarnConfiguration.NM_LOG_DIRS)));
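Here localDirs is obtained by parsing the value of the yarn.nodemanager.local-dirs configuration item. Because that value is fixed, localDirs always resolves to the same List, so localDirs.get(0) always returns the same entry and the token is always written under the same directory. This is in fact a bug: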
https://issues.apache.org/jira/browse/YARN-2566
As a result, the token files of all containers are written to the same directory. The fix selects the directory using a random number; see the patch for details.
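To contrast the two selection rules, here is a small sketch. It is not the YARN-2566 patch: parseDirs is a hypothetical stand-in for reading yarn.nodemanager.local-dirs, and pickRandom assumes a plain uniform choice, which may differ from what the actual patch does. The directory names are invented.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

public class DirChoiceSketch {
    // Hypothetical stand-in for parsing the comma-separated config value;
    // not Hadoop's Configuration.getTrimmedStrings implementation.
    static List<String> parseDirs(String value) {
        List<String> dirs = new ArrayList<>();
        for (String part : value.split(",")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                dirs.add(trimmed);
            }
        }
        return dirs;
    }

    // Behavior described in the article: always the first entry.
    static String pickFirst(List<String> dirs) {
        return dirs.get(0);
    }

    // Assumed alternative: uniform random choice among the entries.
    static String pickRandom(List<String> dirs, Random rng) {
        return dirs.get(rng.nextInt(dirs.size()));
    }

    // Count how often each directory is chosen over n random picks.
    static Map<String, Integer> countRandomPicks(List<String> dirs, int n, long seed) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String dir : dirs) {
            counts.put(dir, 0);
        }
        Random rng = new Random(seed);
        for (int i = 0; i < n; i++) {
            counts.merge(pickRandom(dirs, rng), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> dirs = parseDirs("/data1/nm, /data2/nm, /data3/nm");
        System.out.println("first-entry choice: " + pickFirst(dirs));
        System.out.println("random choices: " + countRandomPicks(dirs, 3000, 42L));
    }
}
```

With the first-entry rule every localizer lands on one disk, while a random rule spreads the token writes across all configured directories.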
This article comes from the “菜光光的博客” blog. Please keep this source attribution: http://caiguangguang.blog.51cto.com/1652935/1585277