srun: fatal: ../../../src/api/step_launch.c:1038 step_launch_state_destroy: pthread_mutex_destroy(): Device or resource busy
slurm.pl: 1 / 40 failed, log is in projects/seame/exp/mono0a/log/acc.9.*.log
$ apt-get update
$ apt-get install git gcc make ruby ruby-dev libpam0g-dev libmariadb-client-lgpl-dev libmysqlclient-dev
$ gem install fpm
$ cd /storage
$ git clone https://github.com/mknoxnv/ubuntu-slurm.git
$ cd /storage
$ wget https://download.schedmd.com/slurm/slurm-19.05.1-2.tar.bz2
$ tar xvjf slurm-19.05.1-2.tar.bz2
$ cd slurm-19.05.1-2
$ ./configure --prefix=/tmp/slurm-build --sysconfdir=/etc/slurm --enable-pam --with-pam_dir=/lib/x86_64-linux-gnu/security/ --without-shared-libslurm
$ make
$ make contrib
$ make install
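At this point Slurm is not actually installed on the system: `make install` only populated /tmp/slurm-build. That directory still has to be packaged with fpm and the resulting package installed with dpkg.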
$ cd ..
$ fpm -s dir -t deb -v 1.0 -n slurm-19.05.1 --prefix=/usr -C /tmp/slurm-build .
$ dpkg -i slurm-19.05.1_1.0_amd64.deb
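Because fpm packages whatever it finds under /tmp/slurm-build, it can be worth confirming that the staging tree really contains the daemons before building the .deb. The following is a minimal sketch, not part of the original procedure: `check_build_tree` is a hypothetical helper name, and the three files it looks for are an assumption about what a normal `make install` of Slurm places under the prefix.

```shell
# Hypothetical helper: verify that the staging prefix contains the main
# Slurm binaries before it is packaged with fpm.
check_build_tree() {
    prefix="${1:-/tmp/slurm-build}"
    missing=0
    # Assumed layout: daemons in sbin/, user commands in bin/.
    for f in sbin/slurmctld sbin/slurmd bin/srun; do
        if [ ! -x "$prefix/$f" ]; then
            echo "missing: $prefix/$f"
            missing=1
        fi
    done
    # 0 when everything was found, 1 otherwise.
    return "$missing"
}
```

Calling `check_build_tree /tmp/slurm-build` prints nothing and returns 0 when all three files are present and executable. After `dpkg -i`, the same files should appear under /usr, since the package was built with `--prefix=/usr`.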
$ useradd slurm
$ mkdir -p /etc/slurm /etc/slurm/prolog.d /etc/slurm/epilog.d /var/spool/slurm/ctld /var/spool/slurm/d /var/log/slurm
$ chown slurm /var/spool/slurm/ctld /var/spool/slurm/d /var/log/slurm
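A daemon that cannot write its state or log directory is a common reason for a failed start, so the result of the mkdir and chown steps above can be double-checked. This is only a sketch: `check_owned_dirs` is a hypothetical helper, and it relies on `find -maxdepth 0 -user`, which is assumed to be available (it is in GNU findutils on Ubuntu).

```shell
# Hypothetical helper: report directories that are missing or not owned
# by the given user (name or numeric UID).
check_owned_dirs() {
    owner="$1"
    shift
    status=0
    for d in "$@"; do
        if [ ! -d "$d" ]; then
            echo "missing: $d"
            status=1
        elif [ -z "$(find "$d" -maxdepth 0 -user "$owner" 2>/dev/null)" ]; then
            # find prints the directory only if its owner matches.
            echo "not owned by $owner: $d"
            status=1
        fi
    done
    return "$status"
}
```

For the layout used here, the call would be `check_owned_dirs slurm /var/spool/slurm/ctld /var/spool/slurm/d /var/log/slurm`.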
$ cd /storage
$ cp ubuntu-slurm/slurmctld.service /etc/systemd/system/
$ cp ubuntu-slurm/slurmd.service /etc/systemd/system/
$ cp ubuntu-slurm/slurm.conf /etc/slurm
Enable and start the services.
$ systemctl daemon-reload
$ systemctl enable slurmctld
$ systemctl start slurmctld
$ systemctl enable slurmd
$ systemctl start slurmd
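The services usually fail to start at this point, most often because the configuration file is incomplete or a parameter is set incorrectly. The next part shows the relevant settings and how they are written in the configuration file; the parameters listed below are not exhaustive.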
ClusterName=compute-cluster
ControlMachine=dell-PowerEdge-T630
SlurmUser=slurm
SlurmctldPort=6817
SlurmdPort=6818
AuthType=auth/munge
StateSaveLocation=/var/spool/slurm/ctld
SlurmdSpoolDir=/var/spool/slurm/d
SwitchType=switch/none
MpiDefault=none
SlurmctldPidFile=/var/run/slurmctld.pid
SlurmdPidFile=/var/run/slurmd.pid
PluginDir=/usr/lib/slurm
Proctracktype=proctrack/linuxproc
CacheGroups=0
ReturnToService=2
SelectType=select/cons_res
SelectTypeParameters=CR_Core,CR_ONE_TASK_PER_CORE,CR_CORE_DEFAULT_DIST_BLOCK
TaskPlugin=task/affinity
TaskPluginParam=Sched
KillOnBadExit=1
# TIMERS
SlurmctldTimeout=300
SlurmdTimeout=300
InactiveLimit=0
MinJobAge=300
KillWait=30
Waittime=0
# SCHEDULING
SchedulerType=sched/backfill
FastSchedule=1
# LOGGING
SlurmctldDebug=3
SlurmctldLogFile=/var/log/slurmctld.log
SlurmdDebug=3
SlurmdLogFile=/var/log/slurmd.log
JobCompType=jobcomp/none
# ACCOUNTING
JobAcctGatherType=jobacct_gather/linux
# COMPUTE NODES
GresTypes=gpu
NodeName=dell-PowerEdge-T630 CPUs=56 Sockets=2 CoresPerSocket=14 ThreadsPerCore=2 State=UNKNOWN Gres=gpu:4
PartitionName=mipitalk Nodes=ALL Default=YES MaxTime=INFINITE State=UP
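Two mistakes are easy to make in a hand-edited slurm.conf: setting the same key twice with different values, and writing a CPUs count that does not match Sockets × CoresPerSocket × ThreadsPerCore (for the node above, 2 × 14 × 2 = 56). The sketch below checks for both. `find_duplicate_keys` and `check_node_cpus` are hypothetical helpers, not Slurm tools, and the list of keys allowed to repeat (NodeName, PartitionName) is an assumption that may need extending for other configurations.

```shell
# Hypothetical helper: list keys that are assigned more than once.
find_duplicate_keys() {
    conf="${1:-/etc/slurm/slurm.conf}"
    awk -F= '
        /^[ \t]*#/ || !/=/ { next }          # skip comments and non-assignments
        {
            name = $1
            gsub(/[ \t]/, "", name)
            key = tolower(name)              # compare keys without regard to case
            # Assumed list of keys that may legitimately repeat.
            if (key == "nodename" || key == "partitionname") next
            if (key in seen) {
                print name " (lines " seen[key] " and " NR ")"
            } else {
                seen[key] = NR
            }
        }
    ' "$conf"
}

# Hypothetical helper: compare CPUs with the product of the topology fields
# on every NodeName line that states all four values.
check_node_cpus() {
    conf="${1:-/etc/slurm/slurm.conf}"
    awk '
        tolower($0) ~ /^nodename=/ {
            name = cpus = sockets = cores = threads = ""
            for (i = 1; i <= NF; i++) {
                split($i, kv, "=")
                k = tolower(kv[1])
                if (k == "nodename") name = kv[2]
                else if (k == "cpus") cpus = kv[2]
                else if (k == "sockets") sockets = kv[2]
                else if (k == "corespersocket") cores = kv[2]
                else if (k == "threadspercore") threads = kv[2]
            }
            if (cpus != "" && sockets != "" && cores != "" && threads != "") {
                expected = sockets * cores * threads
                if (cpus + 0 != expected) {
                    print name ": CPUs=" cpus ", but Sockets*CoresPerSocket*ThreadsPerCore=" expected
                }
            }
        }
    ' "$conf"
}
```

Both functions print nothing when they find no problem. If the file looks clean and a daemon still refuses to start, running it in the foreground with verbose output (`slurmctld -D -vvv` or `slurmd -D -vvv`) usually shows the offending parameter, and `slurmd -C` prints the hardware values that slurmd itself detects on the node, which can be compared with the NodeName line.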
Original article: https://www.cnblogs.com/hallboo/p/11203391.html