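I recently ran into a problem at work: jobs running on our cluster sometimes fail to finish, or fail to start at all. The affected batch never completes and stays in the pending state, which in turn blocks the jobs queued behind it.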
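This problem had been bothering our project for about half a year, and we never came up with a good fix. The main reason is that a job's status is only visible in the browser; it cannot be obtained from backend logs or from a database query. In the browser, if a job's run time and status have not changed for a long time, we can treat it as a "zombie" job and kill it manually.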
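The "zombie" rule above (a job whose run time and status stop changing) can be sketched in a few lines. This is only an illustration under assumptions: the snapshot layout (a dict mapping job name to a `(duration, status)` tuple) and the name `find_zombie_jobs` are hypothetical and do not come from the project.

```python
def find_zombie_jobs(snapshots):
    """Return the names of jobs whose (duration, status) never changed.

    `snapshots` is a list of dicts, one per polling round, each mapping a
    job name to a (duration_string, status_string) tuple. This layout is
    an assumption made for the example.
    """
    if len(snapshots) < 2:
        return set()  # a single sample cannot show a lack of progress

    first = snapshots[0]
    zombies = set()
    for name, state in first.items():
        # A job is suspicious only if it appears in every later snapshot
        # with exactly the same state it had in the first one.
        if all(snap.get(name) == state for snap in snapshots[1:]):
            zombies.add(name)
    return zombies


samples = [
    {"job_a": ("00:10:00", "RUNNING"), "job_b": ("", "PENDING")},
    {"job_a": ("00:11:00", "RUNNING"), "job_b": ("", "PENDING")},
    {"job_a": ("00:12:00", "RUNNING"), "job_b": ("", "PENDING")},
]
print(sorted(find_zombie_jobs(samples)))
```

Here `job_a` keeps advancing, so only `job_b` is reported. How many samples to take and how far apart is a tuning decision that depends on how long a healthy job can legitimately sit idle.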
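After the Spring Festival I read some articles online about web crawlers. They mentioned a kind of crawler that simulates browser behavior (logging in, clicking, and so on) to obtain page data, and then scrapes the page and extracts the useful information. That got me thinking: our project interacts with the browser in only a handful of ways, so the "zombie" job problem could be solved the same way. After about a week of research and another week of on-and-off coding, I finally solved it. I am writing down the main ideas and the key technical difficulties here, partly to reinforce my own memory and partly to help anyone who needs them. Because the task is narrow and the implementation was rushed, the code simply gets the job done: it is not optimized and pays little attention to coding conventions or design patterns. I will think about those when a bigger problem comes along.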
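Key technical points:
(1) The Python package selenium. With this package you can interact with a browser: open a browser (Chrome, Firefox, etc.), log in to a site that requires authentication (enter a user name and password), click a particular icon, and so on. Here are two links about selenium: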
https://www.baidu.com/link?url=tTeJRPOMKX8noXyTa2YPgpaD6vVlGQ2-RVAfwRg4Yvm&wd=&eqid=acd0879a0043c2e9000000045741cd39
http://www.cnblogs.com/fnng/archive/2013/05/29/3106515.html
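(2) PhantomJS, driven through selenium. This is a headless browser: think of it as a browser running in the background. The user never sees a browser window, but its other features are basically the same as a normal browser's, for example taking screenshots, clicking an icon, or scraping page content. I used it to stand in for a browser because a regular browser cannot be installed on our server, which can only run programs that work in terminal mode.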
http://phantomjs.org/
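(3) XPath. This was the most time-consuming part of the coding, for two main reasons. First, locating elements is unreliable: the site refreshes once a second, so an element obtained one second may be gone the next. Second, there are too many similar elements and the hierarchy is too complex, so searching with an ordinary relative path can match elements you do not want. Together these made finding elements slow and laborious. Below are two introductions to XPath that I found practical, especially for web crawling (I plan to write about crawlers separately later):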
http://www.cnblogs.com/fdszlzl/archive/2009/06/02/1494836.html
http://www.ruanyifeng.com/blog/2009/07/xpath_path_expressions.html
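One common way to cope with a page that refreshes every second is to retry a lookup a few times before giving up. The sketch below is a generic helper and not what the script in this post does (the script simply catches the exception and returns an empty string). The name `retry` and its parameters are made up for the example, and the flaky lookup is a stand-in so the block runs without a browser.

```python
import time


def retry(action, attempts=3, delay=0.5, exceptions=(Exception,)):
    """Call `action()` until it succeeds or `attempts` runs out.

    Returns the result of the first successful call and re-raises the
    last exception if every attempt fails.
    """
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except exceptions:
            if attempt == attempts:
                raise
            time.sleep(delay)  # give the page time to finish refreshing


# Stand-in for a flaky element lookup: fails twice, then succeeds.
calls = {"count": 0}


def flaky_lookup():
    calls["count"] += 1
    if calls["count"] < 3:
        raise LookupError("element not found")
    return "job_name_cell"


result = retry(flaky_lookup, attempts=5, delay=0.01, exceptions=(LookupError,))
print(result, calls["count"])
```

With selenium, the action could be something like `lambda: browser.find_element_by_xpath(pattern).text`, and the exceptions to retry on would most likely be `NoSuchElementException` and `StaleElementReferenceException` from `selenium.common.exceptions`.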
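Below is the core code. For project-privacy reasons, sensitive content has been replaced with *******. If you have any questions, feel free to leave me a comment.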
'''
command: python tools/webScrap/KillJobs.py -url=******** -screenShotPath=******

Written against the Selenium 2/3 API: the find_element(s)_by_* helpers used
below were removed in later Selenium 4 releases.
'''
import argparse
import os
import re
import time

from selenium import webdriver

# Redacted recipient addresses.
mailReceiver = ["*********", "********"]
# Job names seen with zero duration in each of the three sampling rounds.
ZOMBIE_JOB_LIST = {"list1": [], "list2": [], "list3": []}

# Common XPath prefix of the job table; a single row is addressed as tr[Order].
JOB_TABLE = ("//body/div[2]/div[2]/div/div[4]/div/div[3]/div/div[4]/div/div[2]"
             "/div/div[2]/div/div/div/div[3]/table[2]/tbody/tr[1]/td/fieldset"
             "/table/tbody/tr/td/table/tbody/tr/td/table/tbody/")


def getMailReceiver():
    # The leading space separates the recipients from the mail subject.
    receiver = ' '
    for recv in mailReceiver:
        receiver = receiver + recv + ' '
    return receiver


def kill_zombie_jobs(screenShotPath, url):
    # browser = webdriver.PhantomJS()  # Get local session of PhantomJS
    browser = webdriver.Firefox()  # Get local session of Firefox
    browser.set_window_size(1600, 1000)
    targetUrl = "http://%s/#JOBS" % url
    print("url: %s" % targetUrl)
    job_to_be_kill_indicate = 0
    browser.get(targetUrl)  # Load page

    # Log in through the login form.
    userName = browser.find_elements_by_class_name("gwt-TextBox")
    password = browser.find_elements_by_class_name("gwt-PasswordTextBox")
    submitButton = browser.find_elements_by_class_name("gwt-Button")
    if len(userName) == 0 or len(password) == 0 or len(submitButton) == 0:
        print("error in open url: %s" % targetUrl)
        browser.quit()
        return job_to_be_kill_indicate
    userName[0].send_keys("******")
    password[0].send_keys("*******")
    time.sleep(1)
    submitButton[0].click()
    time.sleep(4)
    screen_shot_name = screenShotPath + "/Before_kill_jobs_screen_shot.png"
    browser.save_screenshot(screen_shot_name)

    # XPath patterns for the job table: all name cells, then per-row cells.
    # Alternative that selects the name cells by class instead of position:
    # JOB_TABLE + "tr/td[@class='GJOFO-MDOC GJOFO-MDAD GJOFO-MDBD' or @class='GJOFO-MDOC GJOFO-MDAE GJOFO-MDBD']"
    jobs_name_pattern_0 = JOB_TABLE + "tr/td[1]"
    jobs_name_pattern = JOB_TABLE + "tr[Order]/td[1]"
    jobs_kill_pattern = JOB_TABLE + "tr[Order]/td[3]"
    jobs_duration_pattern = JOB_TABLE + "tr[Order]/td[5]"

    # Sample the table three times, one minute apart, and record every job
    # whose duration reads as zero in that round.
    for i in range(1, 4):
        tmp_list = "list" + str(i)
        job_name_elements_list = browser.find_elements_by_xpath(jobs_name_pattern_0)
        job_length = len(job_name_elements_list)
        for index in range(1, job_length + 1):
            job_name_pattern = jobs_name_pattern.replace("Order", str(index))
            job_duration_pattern = jobs_duration_pattern.replace("Order", str(index))
            job_name = get_element_name(browser, job_name_pattern)
            job_duration_time = durationTime(get_element_name(browser, job_duration_pattern))
            # Fixed: the original compared the function durationTime to 0,
            # which is never true, so no job was ever recorded.
            if len(job_name) > 10 and job_duration_time == 0:
                ZOMBIE_JOB_LIST[tmp_list].append(job_name)
        time.sleep(60)
    print("zombie job list: %s" % ZOMBIE_JOB_LIST)

    # Kill every job that showed up in all three rounds.
    job_name_elements_list = browser.find_elements_by_xpath(jobs_name_pattern_0)
    job_length = len(job_name_elements_list)
    for index in range(1, job_length + 1):
        job_name_pattern = jobs_name_pattern.replace("Order", str(index))
        job_kill_pattern = jobs_kill_pattern.replace("Order", str(index))
        job_name = get_element_name(browser, job_name_pattern)
        if (job_name in ZOMBIE_JOB_LIST["list1"]
                and job_name in ZOMBIE_JOB_LIST["list2"]
                and job_name in ZOMBIE_JOB_LIST["list3"]):
            job_to_be_kill_indicate = 1
            print("this job should be killed: %s" % job_name)
            kill_button_element = browser.find_element_by_xpath(job_kill_pattern)
            kill_button_element.click()
            time.sleep(1)
            confirm_kill_button_pattern = "//tbody/tr/td[1]/button[@class='gwt-Button']"
            confirm_kill_button_element = browser.find_element_by_xpath(confirm_kill_button_pattern)
            confirm_kill_button_element.click()
            time.sleep(2)
    time.sleep(2)
    screen_shot_name = screenShotPath + "/After_kill_jobs_screen_shot.png"
    browser.save_screenshot(screen_shot_name)
    browser.quit()
    return job_to_be_kill_indicate


def get_element_name(browser, element_pattern):
    # The page refreshes constantly, so the element may vanish between
    # lookups; treat any failure as "no text".
    element_name = ""
    try:
        element = browser.find_element_by_xpath(element_pattern)
        element_name = element.text
    except Exception:
        print("element not exist any more!!!!!")
        element_name = ""
    return element_name


def durationTime(timeStr):
    # Convert "HH:MM:SS" to seconds; anything else counts as 0.
    if timeStr is None or timeStr == "":
        return 0
    if re.match(r"\d{2}:\d{2}:\d{2}", timeStr) is None:
        return 0
    timeSec = int(timeStr[0:2]) * 3600 + int(timeStr[3:5]) * 60 + int(timeStr[6:8])
    return timeSec


def send_kill_jobs_mail(mailer, screenShotPath, url, indicator):
    # Mail the screenshots taken before and after the kill.
    mailTitle = "Jobs_on_%s_Hanging" % url
    screenShotFile1 = screenShotPath + "/Before_kill_jobs_screen_shot.png"
    screenShotFile2 = screenShotPath + "/After_kill_jobs_screen_shot.png"
    logFile = screenShotPath + "/nodes_hanging.log"
    command = 'mail -a ' + screenShotFile1 + ' -a ' + screenShotFile2 + ' -s ' + mailTitle + mailer + ' < ' + logFile
    print("command: %s" % command)
    os.system(command)
    return 0


def monitor():
    # Kill any PhantomJS processes left over from a previous run.
    command = "killall phantomjs"
    print("kill all existing phantomjs: %s" % command)
    os.system(command)

    parser = argparse.ArgumentParser()
    parser.add_argument('-url', action='store', dest='url', help='data url', required=True)
    parser.add_argument('-screenShotPath', action='store', dest='screenShotPath', help='the screen shot path', required=True)
    results = parser.parse_args()
    print('DataRush URL = %s' % results.url)
    url = results.url
    print('Screen Shot Path = %s' % results.screenShotPath)
    screenShotPath = results.screenShotPath
    mailer = getMailReceiver()

    print("START: Monitor DataRush Starting..............................")
    job_killed_indicate = kill_zombie_jobs(screenShotPath, url)
    if job_killed_indicate == 1:
        print("zombie jobs have been killed!!!!!!!")
        send_kill_jobs_mail(mailer, screenShotPath, url, 1)
    print("End: Monitor Finished.....................................")


if __name__ == '__main__':
    monitor()
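The script builds its `mail` command by string concatenation and runs it through the shell, which breaks if a path or subject contains spaces. A possible alternative, sketched here with hypothetical helper names (`build_mail_command`, `send_mail`) and example addresses and paths, is to pass an argument list to `subprocess` and feed the log file on standard input. The `-a` and `-s` flags are the ones the script already uses.

```python
import subprocess


def build_mail_command(subject, recipients, attachments):
    """Build the argument list for `mail` with the same -a/-s flags as the script."""
    command = ["mail"]
    for path in attachments:
        command += ["-a", path]  # one -a per attached file
    command += ["-s", subject]
    command += list(recipients)
    return command


def send_mail(subject, recipients, attachments, body_file):
    """Run `mail`, using the contents of `body_file` as the message body."""
    with open(body_file, "rb") as body:
        return subprocess.call(
            build_mail_command(subject, recipients, attachments), stdin=body
        )


# Only build and show the command here; nothing is sent.
print(build_mail_command(
    "Jobs_on_example_host_Hanging",
    ["ops@example.com"],
    ["/tmp/Before_kill_jobs_screen_shot.png", "/tmp/After_kill_jobs_screen_shot.png"],
))
```

Because no shell is involved, each argument reaches `mail` exactly as given, and no quoting is needed.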
Original article: http://www.cnblogs.com/zhangchao3322218/p/5518379.html