
IBM z13 Processor Primer: Reading Notes 1

Date: 2017-10-29 20:26:53


BTB Unit

Branch prediction in z processors is performed ‘asynchronously’ to instruction processing

–  The branch prediction logic can find/locate/predict future occurrences of branch-type instructions (including calls and returns) and their corresponding directions (taken or not taken) and targets (where to go next) on its own, without requiring / waiting for the downstream pipeline to actually decode / detect a branch instruction. (Note: calls and returns are covered, but whether a return address stack (RAS) is included is not stated.)

–  The branch prediction logic tries its best in predicting the program path much further into program code than where the instruction fetching unit is currently delivering instructions (and should be way ahead of where the execution engines are executing). (Note: this is the basic operating mechanism of the BTB, but how many cycles ahead of instruction fetch the unit runs is not stated.)

The branch prediction logic adapts many advanced algorithms / structures in maintaining and predicting branching behaviors in program code, as seen in Figure 3, including

–  First level branch target buffer (BTB1) and branch (direction) history table (BHT1). (Note: the BHT stores a PC tag and the corresponding 2-bit saturating counter.)

–  Second level target and history buffers (BTB2 and BHT2) (introduced since zEC12) with a pre-buffer (BTBP) used as a transient buffer to filter out unnecessary histories. (Note: this is the second-level BHT.)

   •  Note: BHT2 is only used in zEC12

–  Accelerators for improving prediction throughput (ACC) by “predicting the prediction” (since zEC12) so it can make a prediction every cycle (for a limited subset of branches)

–  Pattern based direction and target predictors (PHT and CTB) to predict based on “how the program gets here” branch history (that represents the program flow), e.g. for predicting an ending of a branch on count loop, or a subroutine return that has multiple callers. (Note: what exactly does ACC stand for? The original text describes it as a Column Predictor.)
•  The branch prediction logic communicates its prediction results to the instruction fetching logic through an overflow queue (BPOQ), such that it can always search ahead of where instructions are being fetched. (Note: this is the essence of how the BTB unit reduces the branch penalty.)
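
The note on BHT1 above mentions PC tags and 2-bit saturating counters. As a rough illustration only, here is a minimal Python sketch of such a table. The table size, the direct-mapped indexing, the allocation policy, and all names (`BranchHistoryTable`, `count_mispredictions`) are assumptions made for this example; the primer excerpt does not describe how the real z13 structures are organized.

```python
class BranchHistoryTable:
    """Toy direct-mapped BHT: each entry holds a PC tag and a 2-bit counter.

    Counter states: 0 and 1 predict not taken, 2 and 3 predict taken.
    Size and indexing are assumptions, not the z13 design.
    """

    def __init__(self, num_entries=1024):
        self.num_entries = num_entries
        self.entries = {}  # index -> (tag, counter)

    def _index_and_tag(self, pc):
        # Instructions start on halfword boundaries, so drop the low bit.
        halfword = pc >> 1
        return halfword % self.num_entries, halfword // self.num_entries

    def predict(self, pc):
        """Return True (taken), False (not taken), or None on a table miss."""
        index, tag = self._index_and_tag(pc)
        entry = self.entries.get(index)
        if entry is None or entry[0] != tag:
            return None
        return entry[1] >= 2

    def update(self, pc, taken):
        """Train the entry for `pc` with the resolved branch direction."""
        index, tag = self._index_and_tag(pc)
        entry = self.entries.get(index)
        if entry is None or entry[0] != tag:
            # Assumed policy: allocate in the weak state matching the outcome.
            counter = 2 if taken else 1
        elif taken:
            counter = min(entry[1] + 1, 3)  # saturate at strongly taken
        else:
            counter = max(entry[1] - 1, 0)  # saturate at strongly not taken
        self.entries[index] = (tag, counter)


def count_mispredictions(outcomes, pc=0x1000):
    """Replay resolved directions for one branch; a miss counts as 'not taken'."""
    bht = BranchHistoryTable()
    wrong = 0
    for taken in outcomes:
        if bool(bht.predict(pc)) != taken:
            wrong += 1
        bht.update(pc, taken)
    return wrong


# A loop-closing branch: taken 9 times, then falls through, repeated 10 times.
loop_outcomes = ([True] * 9 + [False]) * 10
print(count_mispredictions(loop_outcomes))  # 11 of 100 under these assumptions
```

In this toy model the counter mispredicts every loop exit, which is the kind of case the pattern based predictors (PHT) in the list above are said to target.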
Instruction Delivery 

•  Since z/Architecture instructions are of variable lengths of 2, 4 and 6 bytes, an instruction can start at any halfword (integral 2-byte) granularity

•  Instruction fetching fetches “chunks” of storage aligned data from the instruction cache, starting at a disruption point; e.g. after a taken branch (including subroutine calls and returns), or a pipeline flush
   –  Up to 2 16-byte chunks for z196 and zEC12; up to 4 8-byte chunks for z13

•  These “chunks” of data are then written into an instruction buffer (as a “clump”), where instructions are extracted (or parsed) into individual z-instructions in program order. (Note: each fetch brings in up to 32 bytes of instruction data, i.e. 2 × 16-byte or 4 × 8-byte chunks, so sequential instructions are placed in the instruction buffer. Because the instruction set is variable length, subsequent instructions have to be obtained from this buffer with unaligned accesses. The text uses both “extracted” and “parsed”.)
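
To make the “extracted (or parsed)” step more concrete, here is a small Python sketch that walks a byte buffer and splits it into instructions. It relies on one z/Architecture fact that the excerpt does not spell out: the two leftmost bits of the first opcode byte give the instruction length (00 → 2 bytes, 01 or 10 → 4 bytes, 11 → 6 bytes). The buffer model and the function names are assumptions for illustration, not a description of the z13 hardware.

```python
def instruction_length(first_byte):
    """Instruction length in bytes, from the top two bits of the first byte."""
    top_bits = first_byte >> 6
    if top_bits == 0b00:
        return 2
    if top_bits == 0b11:
        return 6
    return 4  # 0b01 and 0b10


def parse_instructions(buffer, start=0):
    """Split `buffer` into (offset, raw_bytes) pairs in program order.

    Parsing begins at `start` (the disruption point) and stops at the first
    instruction that is not fully present in the buffer. Returns the parsed
    list and the offset where parsing would resume.
    """
    if start % 2:
        raise ValueError("instructions start on halfword boundaries")
    parsed = []
    offset = start
    while offset < len(buffer):
        length = instruction_length(buffer[offset])
        if offset + length > len(buffer):
            break  # the rest of this instruction is in a chunk not yet fetched
        parsed.append((offset, bytes(buffer[offset:offset + length])))
        offset += length
    return parsed, offset


# One 16-byte chunk: a 2-, 4-, 6- and 2-byte instruction, then the first
# half of a 4-byte instruction that straddles the chunk boundary.
chunk = bytes.fromhex("1812" "5810f000" "d20710002000" "07fe" "5810")
instructions, resume_at = parse_instructions(chunk)
for offset, raw in instructions:
    print(offset, raw.hex())
```

Note that after the first instruction nothing is aligned to more than a halfword, which matches the remark above about unaligned extraction from the buffer.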

Instruction Grouping 

•  As instructions (and μop’s) are grouped, they are subject to various grouping rules, which prevent certain instructions from being grouped with others

•  In z196 and zEC12, one group of up to 3 instructions can be grouped at a time, while z13 allows two groups of up to 3 instructions at a time

•  Once instructions are dispatched (or written) into the issue queue as a group, they are tracked in the global completion table (GCT) until every instruction in the group has finished processing; then the group is completed and retired

Note: this resembles the grouping mechanism of a VLIW processor.
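
The grouping and completion tracking described above can be sketched roughly as follows. Only the group size of 3 and the "retire when every instruction in the group has finished" rule come from the text. The `ends_group` rule, the oldest-first retirement order, and all class and function names are hypothetical choices made for this example.

```python
from collections import deque

MAX_GROUP_SIZE = 3  # from the text: up to 3 instructions per group


def form_groups(instructions, ends_group=lambda ins: False):
    """Pack instructions into groups of at most MAX_GROUP_SIZE, in order.

    `ends_group` is a hypothetical stand-in for the real grouping rules:
    when it returns True, that instruction closes its group.
    """
    groups, current = [], []
    for ins in instructions:
        current.append(ins)
        if len(current) == MAX_GROUP_SIZE or ends_group(ins):
            groups.append(current)
            current = []
    if current:
        groups.append(current)
    return groups


class GlobalCompletionTable:
    """Toy GCT: tracks dispatched groups until all their instructions finish."""

    def __init__(self):
        self.pending = deque()  # oldest first: (group, set of unfinished)

    def dispatch(self, group):
        self.pending.append((tuple(group), set(group)))

    def finish(self, ins):
        """Mark one instruction (a unique identifier) as finished."""
        for _, unfinished in self.pending:
            if ins in unfinished:
                unfinished.discard(ins)
                return
        raise KeyError(ins)

    def retire(self):
        """Retire fully finished groups, oldest first (assumed in-order)."""
        retired = []
        while self.pending and not self.pending[0][1]:
            retired.append(self.pending.popleft()[0])
        return retired


program = ["i0", "br", "i2", "i3", "i4", "i5"]
gct = GlobalCompletionTable()
for group in form_groups(program, ends_group=lambda ins: ins == "br"):
    gct.dispatch(group)

for ins in ["i2", "i3", "i4"]:
    gct.finish(ins)          # the younger group finishes first
print(gct.retire())          # nothing retires while the older group is pending
gct.finish("i0")
gct.finish("br")
print(gct.retire())          # now both finished groups retire, oldest first
```

Unlike a VLIW bundle, the groups here are formed at run time from an ordinary instruction stream, so the comparison in the note is about the tracking granularity rather than the encoding.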

 


Original article: http://www.cnblogs.com/smile-at-desert/p/7750772.html
