Tags: table selenium nbsp Chinese nes .exe with basic ant
https://developer.mozilla.org/en-US/docs/Web/API/Window/getComputedStyle
getComputedStyle can pull style info from pseudo-elements (for example, ::after, ::before, ::marker, ::line-marker; see the spec).
<style>
h3::after {
  content: ' rocks!';
}
</style>

<h3>generated content</h3>

<script>
var h3 = document.querySelector('h3');
var result = getComputedStyle(h3, ':after').content;

console.log('the generated content is: ', result); // returns ' rocks!'
</script>
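The page studied below hides text in `::before` content, so the same `getComputedStyle` call can also be run from Selenium. A minimal sketch, assuming `driver` is a Selenium WebDriver that has already loaded the page; `PSEUDO_JS`, `read_before_content` and `unquote_css_content` are hypothetical names, and the quote stripping assumes the browser serializes `content` as a quoted CSS string (for `content: "商"` it typically returns `"商"` including the quotes) and that no CSS escape sequences need decoding.

```python
# JavaScript executed in the page by driver.execute_script().
# arguments[0] is the CSS selector passed from Python; the script returns
# a list of [className, computed ::before content] pairs.
PSEUDO_JS = """
var out = [];
document.querySelectorAll(arguments[0]).forEach(function (el) {
  out.push([el.className, getComputedStyle(el, '::before').content]);
});
return out;
"""


def unquote_css_content(value):
    """Turn a computed `content` value such as '"商"' into '商'.

    'none' and 'normal' mean there is no generated text. CSS escape
    sequences are NOT decoded (assumption: the texts are plain characters).
    """
    if value in ("", "none", "normal"):
        return ""
    if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
        return value[1:-1]
    return value


def read_before_content(driver, selector='span[class*="hs_kw"]'):
    """Return {class name: ::before text} for every element matching selector.

    `driver` is assumed to be a Selenium WebDriver; any object with an
    execute_script(script, *args) method works.
    """
    pairs = driver.execute_script(PSEUDO_JS, selector)
    return {cls: unquote_css_content(content) for cls, content in pairs}
```

Usage would be `read_before_content(dr)` once the page is loaded in Selenium. The computed value only reflects rules that are already applied, so it has to run after the page's own scripts have inserted them.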
Fetching the page with requests returns only one table; the configuration tables are built by JavaScript in the browser:

In [1]: import requests
   ...: from scrapy import Selector
   ...:
   ...:
   ...: url = 'https://car.autohome.com.cn/config/series/3170.html'
   ...: s = requests.Session()
   ...: s.verify = False
   ...: s.headers = {
   ...:     'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.139 Safari/537.36'
   ...: }
   ...:
   ...: r = s.get(url)
   ...: sel = Selector(text=r.text)
   ...:
   ...: sel.css('table')
   ...:
Out[1]: [<Selector xpath='descendant-or-self::table' data='<table class="t1 txtCen" width="100%">\n '>]
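Whether a page needs a browser at all can be checked by counting `<table>` start tags in each HTML source. A small standard-library sketch; `TableCounter` and `count_tables` are hypothetical helpers, and since `html.parser` runs no JavaScript, it only sees what the server actually sent.

```python
from html.parser import HTMLParser


class TableCounter(HTMLParser):
    """Count <table> start tags in an HTML string (no JavaScript is run)."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lower case, so <TABLE> counts too.
        if tag == "table":
            self.count += 1


def count_tables(html):
    parser = TableCounter()
    parser.feed(html)
    parser.close()
    return parser.count
```

Calling `count_tables(r.text)` and, after loading the page in Selenium, `count_tables(dr.page_source)` should show the same gap as the two sessions here: one table versus many.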
Loading the same page in Chrome through Selenium returns the JavaScript-generated tables as well:

In [3]: from scrapy import Selector
   ...: from selenium import webdriver
   ...:
   ...:
   ...: url = 'https://car.autohome.com.cn/config/series/3170.html'
   ...: dr = webdriver.Chrome()
   ...: dr.get(url)
   ...: sel = Selector(text=dr.page_source)
   ...: sel.css('table')
   ...:
DevTools listening on ws://127.0.0.1:12968/devtools/browser/2e9adb31-7510-421f-ac13-835350af144e
Out[3]:
[<Selector xpath='descendant-or-self::table' data='<table class="t1 txtCen" width="100%">\n '>,
 <Selector xpath='descendant-or-self::table' data='<table cellspacing="0" cellpadding="0" c'>,
 <Selector xpath='descendant-or-self::table' data='<table id="tab_side" cellspacing="0" cel'>,
 <Selector xpath='descendant-or-self::table' data='<table cellspacing="0" cellpadding="0" c'>,
 <Selector xpath='descendant-or-self::table' data='<table id="tab_0" cellspacing="0" cellpa'>,
 <Selector xpath='descendant-or-self::table' data='<table id="tab_1" cellspacing="0" cellpa'>,
 <Selector xpath='descendant-or-self::table' data='<table id="tab_2" cellspacing="0" cellpa'>,
 <Selector xpath='descendant-or-self::table' data='<table id="tab_3" cellspacing="0" cellpa'>,
 <Selector xpath='descendant-or-self::table' data='<table id="tab_4" cellspacing="0" cellpa'>,
 <Selector xpath='descendant-or-self::table' data='<table id="tab_5" cellspacing="0" cellpa'>,
 <Selector xpath='descendant-or-self::table' data='<table id="tab_100" cellspacing="0" cell'>,
 <Selector xpath='descendant-or-self::table' data='<table id="tab_101" cellspacing="0" cell'>,
 <Selector xpath='descendant-or-self::table' data='<table id="tab_102" cellspacing="0" cell'>,
 <Selector xpath='descendant-or-self::table' data='<table id="tab_103" cellspacing="0" cell'>,
 <Selector xpath='descendant-or-self::table' data='<table id="tab_104" cellspacing="0" cell'>,
 <Selector xpath='descendant-or-self::table' data='<table id="tab_105" cellspacing="0" cell'>,
 <Selector xpath='descendant-or-self::table' data='<table id="tab_106" cellspacing="0" cell'>,
 <Selector xpath='descendant-or-self::table' data='<table id="tab_107" cellspacing="0" cell'>,
 <Selector xpath='descendant-or-self::table' data='<table id="tab_108" cellspacing="0" cell'>]
Printing every cell of table#tab_0 row by row:

In [4]: for t in sel.css('table#tab_0'):
   ...:     print('@'*10)
   ...:     for row in t.xpath('.//tr'):
   ...:         print('-'*100)
   ...:         for col in row.xpath('.//th|.//td'):
   ...:             print(''.join(col.xpath('.//node()/text()').extract()), end='\t')
   ...:         print()
   ...:
@@@@@@@@@@
----------------------------------------------------------------------------------------------------
基本参数
----------------------------------------------------------------------------------------------------
厂 - - - - - - - - - - - -
----------------------------------------------------------------------------------------------------
级别
----------------------------------------------------------------------------------------------------
能源类型 汽油 汽油 汽油 汽油 汽油 汽油 汽油 汽油 汽油 汽油 汽油 汽油
----------------------------------------------------------------------------------------------------
上市 2017.10 2017.10 2017.10 2017.10 2017.10 2017.10 2017.10 2017.10 2017.10 2017.10 2017.10 2017.10
----------------------------------------------------------------------------------------------------
(kW) 110 110 110 110 140 140 110 110 110 110 140 140
----------------------------------------------------------------------------------------------------
(N·m) 250 250 250 250 320 320 250 250 250 250 320 320
----------------------------------------------------------------------------------------------------
发动机 1.4T 150马力 L4 1.4T 150马力 L4 1.4T 150马力 L4 1.4T 150马力 L4 2.0T 190马力 L4 2.0T 190马力 L4 1.4T 150马力 L4 1.4T 150马力 L4 1.4T 150马力 L4 1.4T 150马力 L4 2.0T 190马力 L4 2.0T 190马力 L4
----------------------------------------------------------------------------------------------------
变速箱 7挡双离合 7挡双离合 7挡双离合 7挡双离合 7挡双离合 7挡双离合 7挡双离合 7挡双离合 7挡双离合 7挡双离合 7挡双离合 7挡双离合
----------------------------------------------------------------------------------------------------
长*宽*高(mm) 4312*1785*1426 4321*1785*1426 4312*1785*1426 4321*1785*1426 4312*1785*1426 4321*1785*1426 4457*1796*1417 4462*1796*1417 4457*1796*1417 4462*1796*1417 4457*1796*1417 4462*1796*1417
----------------------------------------------------------------------------------------------------
车身结构 5门5座两厢车 5门5座两厢车 5门5座两厢车 5门5座两厢车 5门5座两厢车 5门5座两厢车 4门5座三厢车 4门5座三厢车 4门5座三厢车 4门5座三厢车 4门5座三厢车 4门5座三厢车
----------------------------------------------------------------------------------------------------
最高车速(km/h) 215 215 215 215 230 230 215 215 215 215 230 230
----------------------------------------------------------------------------------------------------
官方0-100km/h加速(s) 8.4 8.4 8.4 8.4 7.4 7.4 8.4 8.4 8.4 8.4 7.4 7.4
----------------------------------------------------------------------------------------------------
0-100km/h加速(s) - - - - - - - - - - - -
----------------------------------------------------------------------------------------------------
100-0km/h制动(m) - - - - - - - - - - - -
----------------------------------------------------------------------------------------------------
工信部(L/100km) 5.7 5.7 5.7 5.7 6.2 6.2 5.6 5.6 5.6 5.6 6.1 6.1
----------------------------------------------------------------------------------------------------
(L/100km) - - - - - - - - - - - -
----------------------------------------------------------------------------------------------------
整车 三10公里 三10公里 三10公里 三10公里 三10公里 三10公里 三10公里 三10公里 三10公里 三10公里 三10公里 三10公里
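Instead of printing, the rows can be collected into a mapping from the first cell (the parameter name) to the remaining cells. A standard-library sketch that swaps Scrapy's `Selector` for `html.parser`; `TableRows` and `table_to_dict` are hypothetical names. It assumes every cell is explicitly closed, which holds for the DOM serialization returned by Selenium's `page_source`, and it does not handle nested tables.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every <th>/<td> cell, row by row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # one list of cell texts per <tr>
        self._cell = None   # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th") and self.rows:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join text from nested tags (span, div, ...) and trim whitespace/&nbsp;
            self.rows[-1].append("".join(self._cell).strip())
            self._cell = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def table_to_dict(html):
    """Map the first cell of each row (the parameter name) to the other cells."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    return {row[0]: row[1:] for row in parser.rows if row}
```

Rows with identical labels overwrite each other in the dict, so this is most useful after the missing characters have been restored; while labels are still truncated, different parameters can collapse to the same key.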
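Some labels come out incomplete (厂 instead of 厂商, "manufacturer"; (kW) and (N·m) with no parameter name) and the 级别 row has no values. The missing characters are not in the HTML at all: inspecting the page shows they are supplied by CSS ::before rules such as:

.hs_kw22_configyk::before {
content: "商";
}
.hs_kw44_configyk::before {
content: "一汽";
}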
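If the text of these rules can be obtained (copied from DevTools, or collected in the browser from the `cssText` of the rules in `document.styleSheets`; cross-origin sheets refuse that access), a regular expression is enough to map each class to its text. A sketch assuming rules shaped exactly like the two above, with a single `.hs_kw...::before` selector and a double-quoted `content` string; `RULE_RE` and `parse_before_rules` are hypothetical names.

```python
import re

# Matches rules like:  .hs_kw22_configyk::before { content: "商"; }
# Other CSS forms (grouped selectors, single quotes, escapes) are ignored.
RULE_RE = re.compile(
    r'\.(?P<cls>hs_kw\w+)::before\s*\{\s*content:\s*"(?P<text>[^"]*)"\s*;?\s*\}'
)


def parse_before_rules(css_text):
    """Return {class_name: generated_text} for every matching ::before rule."""
    return {m.group("cls"): m.group("text") for m in RULE_RE.finditer(css_text)}
```

A dict keyed by the full class name keeps the lookup independent of how the page splits names into index and suffix.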
In the rendered page source, the spans that carry these classes are empty:

In [19]: sel = Selector(text=dr.page_source)

In [20]: sel.css('span[class*="hs_kw"]').extract()
Out[20]:
['<span class="hs_kw34_configyk"></span>',
 '<span class="hs_kw66_configyk"></span>',
 ...
 '<span class="hs_kw42_optionLA"></span>',
 ]
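Given a mapping from class name to text (for instance one built from `::before` rules like those listed earlier), the empty spans can also be filled in the saved HTML without touching the live page. A sketch assuming spans are serialized exactly as in `Out[20]`, with a single `hs_kw` class and no content; `EMPTY_SPAN_RE` and `fill_empty_spans` are hypothetical names.

```python
import re
from html import escape

# Matches an empty span whose class attribute is exactly one hs_kw class,
# e.g. <span class="hs_kw34_configyk"></span>
EMPTY_SPAN_RE = re.compile(r'<span class="(hs_kw\w+)"></span>')


def fill_empty_spans(html, mapping):
    """Put mapping[class_name] inside each matching empty span.

    Spans whose class is not in `mapping` are left unchanged.
    """
    def repl(m):
        cls = m.group(1)
        if cls not in mapping:
            return m.group(0)
        return '<span class="%s">%s</span>' % (cls, escape(mapping[cls]))

    return EMPTY_SPAN_RE.sub(repl, html)
```

The filled HTML can then be fed to `Selector(text=...)` as before. This keeps the browser only for rendering, at the cost of depending on the exact serialization of the spans.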
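Searching further, the rules turn out to be inserted by inline <script> blocks (see the commented sample at the top of the script). The script extracts each block, patches it so that it also returns every (index, text) pair it inserts, runs it in the browser with Selenium, writes the texts into the matching spans, and then parses the tables again: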
import re
from html import unescape

from scrapy import Selector
from selenium import webdriver

# <script>
# (function(qZ_) {
#     var Dp_ = function(Dp__) {
#         ...
#         return '.hs_kw' + $index$ + '_baikeYK';
#         ...
#     }
# )(document);
# </script>

# Extract the target scripts
pattern_script = re.compile(r"""
    <script>
    (                       # (): all js code
    \(function.*?
    '\.hs_kw'.*?'(.*?)'     # (.*?): postfix _baikeYK in '.hs_kw' + $index$ + '_baikeYK'
    .*?
    )
    </script>
    """, re.X)  # | re.S

# Rewrite the target script:
# insert a push statement right before the insert call
#     myarr.push([$index$, $temp$]);
#     $InsertRule$($index$, $temp$);
pattern_code = re.compile(r"""
    ^\(function\((.*?)\)    # (.*?): function argument
    \{
    (.*?)                   # (.*?): top half of the js code inside {}
    (\$InsertRule\$\(\$index\$,\s*\$temp\$\);)  # (...): the insert call
    (.*?)                   # (.*?): bottom half of the js code inside {}
    \}
    \)\(document\);$        # discarded
    """, re.X)

repl_code = r"""
return myfunc(document);
function myfunc(\1){
    myarr = Array();
    \2
    myarr.push([$index$, $temp$]);
    \3
    \4
    return myarr;
}
"""

# Fill the HTML from the returned dict
myjs = """
var arr = document.querySelectorAll('.hs_kw%(key)s%(postfix)s');
if (arr.length !== 0) {
    for (var i = 0; i < arr.length; i++) {
        console.log('.hs_kw%(key)s%(postfix)s %(value)s');
        arr[i].innerHTML = '%(value)s' + arr[i].innerHTML;
    }
}
return null;
"""

url = 'https://car.autohome.com.cn/config/series/3170.html'
dr = webdriver.Chrome()
dr.get(url)
# dr.refresh()

# Extract every matching <script>...</script> block and the class-name suffix it uses
script_list = pattern_script.findall(dr.page_source)
for (js, postfix) in script_list:
    print(postfix)
    js = unescape(js)  # decode HTML entities, e.g. &amp; -> &
    js = pattern_code.sub(repl_code, js)
    ret = dr.execute_script(js)
    mydict = dict(ret or [])  # ret is None if pattern_code did not match
    print(mydict)
    # Write each text from the returned myarr (as a dict) into the spans
    # whose class name matches
    for (key, value) in mydict.items():
        js = myjs % {'postfix': postfix, 'key': key, 'value': value}
        # print(js)
        dr.execute_script(js)

sel = Selector(text=dr.page_source)
# sel.css('span[class*="hs_kw"]').extract()
# for t in sel.css('table#tab_0'):
for t in sel.css('table'):
    print('@'*10)
    for row in t.xpath('.//tr'):
        print('-'*100)
        for col in row.xpath('.//th|.//td'):
            print(''.join(col.xpath('.//node()/text()').extract()), end='\t')
        print()
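The core of the script is the `pattern_code.sub(repl_code, js)` step. To see what it produces without a browser, the same regex and replacement can be applied to a made-up one-line script shaped like the commented sample at the top of the code. The toy script `toy_js` is an assumption for illustration only; the real scripts on the page are obfuscated and much longer.

```python
import re

# Same rewrite as in the script above: capture the IIFE argument and the
# code around the $InsertRule$($index$, $temp$); call.
pattern_code = re.compile(r"""
    ^\(function\((.*?)\)                          # function argument
    \{
    (.*?)                                         # code before the call
    (\$InsertRule\$\(\$index\$,\s*\$temp\$\);)    # the insert call itself
    (.*?)                                         # code after the call
    \}
    \)\(document\);$
    """, re.X)

# Turn the IIFE into a named function that also records every
# [index, text] pair in myarr and returns it to execute_script().
repl_code = r"""
return myfunc(document);
function myfunc(\1){
    myarr = Array();
    \2
    myarr.push([$index$, $temp$]);
    \3
    \4
    return myarr;
}
"""

# Hypothetical stand-in for one obfuscated <script> body (single line).
toy_js = ("(function(qZ_){var $index$=22,$temp$='商';"
          "$InsertRule$($index$, $temp$);qZ_.done=1;})(document);")

patched = pattern_code.sub(repl_code, toy_js)
print(patched)
```

The leading `return myfunc(document);` works because Selenium runs the string as a function body: the function declaration after it is hoisted, so the patched script returns `myarr`, which arrives in Python as a list of `[index, text]` pairs.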
JS analysis: 汽_车_之_家 (Autohome) generates CSS pseudo-elements with JS, e.g. hs_kw44_configUS::before
Original post: https://www.cnblogs.com/my8100/p/js_qichezhijia.html