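sim-outorder.c
The main simulation loop
Fetch -> dispatch -> issue -> writeback -> commit
code text -> fetch queue -> RUU/LSQ (-> ready queue) -> event queue -> event removed -> RUU/LSQ
The instructions to be skipped are first fast-forwarded with functional simulation. After that, a for loop simulates the machine one cycle at a time, and within each cycle the stages are called in reverse pipeline order: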
/* commit entries from RUU/LSQ to architected register file */
ruu_commit();
/* service function unit release events */
ruu_release_fu();
/* ==> may have ready queue entries carried over from previous cycles */
/* service result completions, also readies dependent operations */
/* ==> inserts operations into ready queue --> register deps resolved */
ruu_writeback();
/* try to locate memory operations that are ready to execute */
/* ==> inserts operations into ready queue --> mem deps resolved */
lsq_refresh();
/* issue operations ready to execute from a previous cycle */
/* <== drains ready queue <-- ready operations commence execution */
ruu_issue();
/* decode and dispatch new operations */
/* ==> insert ops w/ no deps or all regs ready --> reg deps resolved */
ruu_dispatch();
/* call instruction fetch unit if it is not blocked */
if (!ruu_fetch_issue_delay)
  ruu_fetch();
else
  ruu_fetch_issue_delay--;
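The stages run in reverse order because sequential code is being used to model hardware whose stages operate concurrently. If they ran in pipeline order, an instruction fetched in a cycle could be dispatched in that same cycle: an earlier stage would overwrite shared state before the next stage read the values it should have seen from the previous cycle. Calling the stages from commit back to fetch ensures that every stage consumes only what its predecessor produced in the previous cycle.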
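A minimal sketch, not taken from SimpleScalar, makes the ordering effect concrete. It assumes a hypothetical two-stage pipeline (`fetch_stage`, `dispatch_stage`) joined by a one-entry latch; `pipe_t`, `run` and `N_INST` are illustrative names only. Evaluated in reverse order, an instruction waits one cycle in the latch; evaluated in pipeline order, it is fetched and dispatched in the same cycle.

```c
#include <assert.h>
#include <stdbool.h>

#define N_INST 8   /* hypothetical program length */

typedef struct {
    int latch;                   /* instruction waiting between stages, -1 = empty */
    int next_inst;               /* next instruction number to fetch */
    int dispatch_cycle[N_INST];  /* cycle in which each instruction dispatched */
} pipe_t;

static void fetch_stage(pipe_t *p)
{
    if (p->latch == -1 && p->next_inst < N_INST)
        p->latch = p->next_inst++;     /* fill the latch */
}

static void dispatch_stage(pipe_t *p, int cycle)
{
    if (p->latch != -1) {
        p->dispatch_cycle[p->latch] = cycle;
        p->latch = -1;                 /* drain the latch */
    }
}

/* Simulate `cycles` cycles; reverse == true calls the later stage first. */
static void run(pipe_t *p, int cycles, bool reverse)
{
    assert(cycles >= 0);
    p->latch = -1;
    p->next_inst = 0;
    for (int i = 0; i < N_INST; i++)
        p->dispatch_cycle[i] = -1;

    for (int cycle = 0; cycle < cycles; cycle++) {
        if (reverse) {
            dispatch_stage(p, cycle);  /* sees only what fetch left last cycle */
            fetch_stage(p);
        } else {
            fetch_stage(p);            /* its output is visible at once ... */
            dispatch_stage(p, cycle);  /* ... so the same inst dispatches now */
        }
    }
}
```

For example, after `run(&p, 3, true)` instruction 0 has `dispatch_cycle[0] == 1`, one cycle after it was fetched; after `run(&p, 3, false)` the value is 0, the same cycle as its fetch.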
Important data structures:
/* a reservation station link: this structure links elements of a RUU
   reservation station list; used for ready instruction queue, event queue, and
   output dependency lists; each RS_LINK node contains a pointer to the RUU
   entry it references along with an instance tag, the RS_LINK is only valid if
   the instruction instance tag matches the instruction RUU entry instance tag;
   this strategy allows entries in the RUU to be squashed and reused without
   updating the lists that point to it, which significantly improves the
   performance of (all too frequent) squash events */
struct RS_link {
  struct RS_link *next;        /* next entry in list */
  struct RUU_station *rs;      /* referenced RUU resv station */
  INST_TAG_TYPE tag;           /* inst instance sequence number */
  union {
    tick_t when;               /* time stamp of entry (for eventq) */
    INST_SEQ_TYPE seq;         /* inst sequence */
    int opnum;                 /* input/output operand number */
  } x;
};
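RS_link is used in three places: the ready queue (ready_queue), the event queue (event_queue), and the output dependency list of every instruction in the reservation stations (that is, the list of all other reservation-station entries whose inputs depend on this instruction's output).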
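The tag trick described in the struct comment can be illustrated with a simplified sketch. The types and functions here (`station_t`, `link_t`, `make_link`, `link_valid`, `squash`) are hypothetical stand-ins for RUU_station, RS_link and the validity check sim-outorder.c performs; only the tag comparison is modeled. Squashing an entry just bumps its tag, and every link created before the squash becomes stale without being unlinked from any list.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified RUU entry: only the instance tag matters here. */
typedef struct {
    unsigned tag;   /* bumped every time the entry is squashed and reused */
    int seq;        /* dynamic instruction currently owning the entry */
} station_t;

/* Hypothetical, simplified RS_link: remembers the tag seen at creation. */
typedef struct {
    station_t *rs;
    unsigned tag;
} link_t;

static link_t make_link(station_t *rs)
{
    link_t l = { rs, rs->tag };
    return l;
}

/* A link is live only while the entry still holds the same instance. */
static bool link_valid(const link_t *l)
{
    assert(l->rs != 0);
    return l->tag == l->rs->tag;
}

/* Squash in O(1): no list that points at the entry has to be walked. */
static void squash(station_t *rs)
{
    rs->tag++;
}
```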
/* a register update unit (RUU) station, this record is contained in the
   processor's RUU, which serves as a collection of ordered reservation
   stations.  The reservation stations capture register results and await
   the time when all operands are ready, at which time the instruction is
   issued to the functional units; the RUU is an ordered circular queue, in which
   instructions are inserted in fetch (program) order, results are stored in
   the RUU buffers, and later when an RUU entry is the oldest entry in the
   machine, it and its instruction's value is retired to the architectural
   register file in program order, NOTE: the RUU and LSQ share the same
   structure, this is useful because loads and stores are split into two
   operations: an effective address add and a load/store, the add is inserted
   into the RUU and the load/store inserted into the LSQ, allowing the add
   to wake up the load/store when effective address computation has finished */
struct RUU_station {
  /* inst info */
  md_inst_t IR;                     /* instruction bits */
  enum md_opcode op;                /* decoded instruction opcode */
  md_addr_t PC, next_PC, pred_PC;   /* inst PC, next PC, predicted PC */
  int in_LSQ;                       /* non-zero if op is in LSQ */
  int ea_comp;                      /* non-zero if op is an addr comp */
  int recover_inst;                 /* start of mis-speculation? */
  int stack_recover_idx;            /* non-speculative TOS for RSB pred */
  struct bpred_update_t dir_update; /* bpred direction update info */
  int spec_mode;                    /* non-zero if issued in spec_mode */
  md_addr_t addr;                   /* effective address for ld/st's */
  INST_TAG_TYPE tag;                /* RUU slot tag, increment to
                                       squash operation */
  INST_SEQ_TYPE seq;                /* instruction sequence, used to
                                       sort the ready list and tag inst */
  unsigned int ptrace_seq;          /* pipetrace sequence number */
  int slip;
  /* instruction status */
  int queued;                       /* operands ready and queued */
  int issued;                       /* operation is/was executing */
  int completed;                    /* operation has completed execution */
  /* output operand dependency list, these lists are used to
     limit the number of associative searches into the RUU when
     instructions complete and need to wake up dependent insts */
  int onames[MAX_ODEPS];                /* output logical names (NA=unused) */
  struct RS_link *odep_list[MAX_ODEPS]; /* chains to consuming operations */
  /* input dependent links, the output chains rooted above use these
     fields to mark input operands as ready, when all these fields have
     been set non-zero, the RUU operation has all of its register
     operands, it may commence execution as soon as all of its memory
     operands are known to be read (see lsq_refresh() for details on
     enforcing memory dependencies) */
  int idep_ready[MAX_IDEPS];        /* input operand ready? */
};
/*
 * the create vector maps a logical register to a creator in the RUU (and
 * specific output operand) or the architected register file (if RS_link
 * is NULL)
 */
/* an entry in the create vector */
struct CV_link {
  struct RUU_station *rs;   /* creator's reservation station */
  int odep_num;             /* specific output operand */
};
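Each register has zero or one creator, recorded in its CV_link (CV stands for create vector): the entry points to the youngest reservation station that will produce the register's value, and if rs is NULL the value is already in the architected register file.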
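A minimal sketch of how a create vector resolves register dependencies at dispatch, assuming a hypothetical 8-register machine. `creator`, `input_ready`, `install_output` and `writeback_clear` are illustrative names (in sim-outorder.c the corresponding work is done by ruu_link_idep(), ruu_install_odep() and ruu_writeback()); the odep chains that actually wake consumers are left out. The key detail is that writeback clears an entry only if the finishing instruction is still the youngest creator.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NREGS 8                      /* hypothetical register count */

typedef struct station {
    int seq;                         /* program-order sequence number */
} station_t;

/* Hypothetical create vector: creator[r] == NULL means the current value
   of register r is in the architected register file. */
static station_t *creator[NREGS];

/* Consumer at dispatch: can input register r be read right away? */
static bool input_ready(int r)
{
    assert(r >= 0 && r < NREGS);
    return creator[r] == NULL;
}

/* Producer at dispatch: it becomes the youngest creator of r. */
static void install_output(station_t *rs, int r)
{
    assert(r >= 0 && r < NREGS);
    creator[r] = rs;
}

/* Producer at writeback: clear the entry only if rs still owns it; a
   younger instruction may already have become the creator of r. */
static void writeback_clear(station_t *rs, int r)
{
    assert(r >= 0 && r < NREGS);
    if (creator[r] == rs)
        creator[r] = NULL;
}
```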
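There are two sets of registers: regs and spec_regs_R (together with spec_regs_F and spec_regs_C). The former is the real architected (logical) register file; the latter holds the registers written during speculative execution.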
struct regs_t {
  md_gpr_t regs_R;      /* (signed) integer register file */
  md_fpr_t regs_F;      /* floating point register file */
  md_ctrl_t regs_C;     /* control register file */
  md_addr_t regs_PC;    /* program counter */
  md_addr_t regs_NPC;   /* next-cycle program counter */
};
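Outside speculative mode, instructions are actually executed during the decode step of dispatch, including their register reads and writes, and those accesses go to regs. In speculative mode (spec_mode = TRUE), the accesses are redirected by the following macros: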
#define GPR(N)          (BITMAP_SET_P(use_spec_R, R_BMAP_SZ, (N))  \
                         ? spec_regs_R[N]                          \
                         : regs.regs_R[N])
#define SET_GPR(N,EXPR) (spec_mode                                 \
                         ? ((spec_regs_R[N] = (EXPR)),             \
                            BITMAP_SET(use_spec_R, R_BMAP_SZ, (N)),\
                            spec_regs_R[N])                        \
                         : (regs.regs_R[N] = (EXPR)))
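In other words, a register write in speculative mode goes to spec_regs_R and sets the register's bit in the use_spec_R bitmap; a register read returns spec_regs_R if the register's bit is set, and the architected value in regs otherwise. Because the wrong path never writes regs, recovering the register state only requires clearing the bitmap.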
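The macros can be mirrored with plain functions to show the copy-on-write idea. This sketch assumes 32 integer registers and a single 32-bit word as the bitmap in place of BITMAP_SET/BITMAP_SET_P; `gpr`, `set_gpr`, `spec_recover` and `NUM_GPR` are hypothetical names. Recovery here simply forgets the speculative copies and leaves speculative mode, which covers only the register-file part of a real recovery.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define NUM_GPR 32                     /* assumed: 32 integer registers */

static int64_t regs_R[NUM_GPR];        /* architected register file */
static int64_t spec_regs_R[NUM_GPR];   /* values written on the wrong path */
static uint32_t use_spec_R;            /* bit n set => spec_regs_R[n] is current */
static bool spec_mode;

static int64_t gpr(int n)              /* plays the role of GPR(N) */
{
    assert(n >= 0 && n < NUM_GPR);
    return (use_spec_R & (UINT32_C(1) << n)) ? spec_regs_R[n] : regs_R[n];
}

static void set_gpr(int n, int64_t v)  /* plays the role of SET_GPR(N,EXPR) */
{
    assert(n >= 0 && n < NUM_GPR);
    if (spec_mode) {
        spec_regs_R[n] = v;            /* regs_R is left untouched */
        use_spec_R |= UINT32_C(1) << n;
    } else {
        regs_R[n] = v;
    }
}

/* Leave speculative mode: every speculative register write is dropped. */
static void spec_recover(void)
{
    use_spec_R = 0;
    memset(spec_regs_R, 0, sizeof spec_regs_R);
    spec_mode = false;
}
```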
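Data structures used by each stage:
ruu_fetch: code text -> fetch_data, i.e., instructions are read from the text segment into the fetch queue.
ruu_dispatch: fetch_data -> RUU (-> ready_queue, status queued), i.e., instructions move from the fetch queue into the reservation stations. An ordinary arithmetic instruction is issued immediately if its operands are ready; a store is likewise issued as soon as its operands are ready ("issued" here means entering ready_queue). Loads, stores and long-latency instructions are placed at the front of the ready queue; all other instructions are inserted in program order.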
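The insertion policy can be sketched as an ordered insert into a singly linked list. `node_t`, its `priority` flag (standing for loads/stores and long-latency operations) and `enqueue` are hypothetical; this is a simplified sketch of the described policy, not sim-outorder.c's readyq_enqueue().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical ready-queue node: seq is program order, priority marks
   loads/stores and long-latency operations. */
typedef struct node {
    int seq;
    bool priority;
    struct node *next;
} node_t;

static void enqueue(node_t **head, node_t *n)
{
    node_t *prev = NULL, *cur = *head;

    assert(n != NULL);
    if (!n->priority) {
        /* ordinary op: walk past every entry that is older in program order */
        while (cur && cur->seq < n->seq) {
            prev = cur;
            cur = cur->next;
        }
    }
    /* priority ops stop at the head; others at their program-order slot */
    n->next = cur;
    if (prev)
        prev->next = n;
    else
        *head = n;
}
```

Enqueuing seq 5, then seq 3, then a priority operation with seq 9 yields the order 9, 3, 5.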
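ruu_issue: ready_queue -> event_queue (RS_link nodes pointing to reservation stations), status issued. The ready instructions in ready_queue are executed. A store completes immediately (the actual memory access happens at commit). A load checks the addresses of earlier stores: on a match its latency is 1, otherwise it accesses the cache. The latency is recorded and the completion event is scheduled that many cycles later. For an ordinary arithmetic instruction the latency is its execution latency, and its event is likewise scheduled after that latency.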
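A minimal sketch of a time-ordered event queue: issue schedules a completion at `now + lat`, and writeback pops only the events whose time has arrived. `event_t`, `schedule` and `next_event` are hypothetical names; in sim-outorder.c the time stamp is the `x.when` field of RS_link and the pop side is eventq_next_event().

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned long long tick_t;

/* Hypothetical event node; id stands in for the RUU station pointer. */
typedef struct event {
    tick_t when;               /* cycle in which the result is available */
    int id;
    struct event *next;
} event_t;

/* Issue side: insert sorted by completion time, earliest first. */
static void schedule(event_t **q, event_t *e, tick_t now, tick_t lat)
{
    assert(lat >= 1);
    e->when = now + lat;
    while (*q && (*q)->when <= e->when)   /* equal times keep FIFO order */
        q = &(*q)->next;
    e->next = *q;
    *q = e;
}

/* Writeback side: pop the next event due at or before `now`, else NULL. */
static event_t *next_event(event_t **q, tick_t now)
{
    event_t *e = *q;
    if (e == NULL || e->when > now)
        return NULL;
    *q = e->next;
    e->next = NULL;
    return e;
}
```

A load issued in cycle 10 with latency 3 and an add issued in the same cycle with latency 1 come out in cycles 13 and 11 respectively.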
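lsq_refresh: LSQ -> ready_queue. Scans the LSQ and enforces the memory dependencies of loads on earlier stores: a later store to the same address supersedes an earlier one, and a load is issued only if no earlier store has an unknown address and no earlier store to the same address still has an unknown data value.
ruu_writeback: event_queue -> the event is removed and the rs status is set to completed. If the instruction is a mispredicted branch, every instruction that entered the RUU after it is squashed, the memory and register writes made during speculative execution are discarded, the return address stack's top-of-stack pointer is restored and fetch is redirected to the branch's correct target, and the branch misprediction penalty is applied. If the branch predictor is configured to update at writeback, it is updated here. Finally, the instruction's output dependency list is emptied and every operand that depends on its output is marked ready; an instruction whose operands are now all ready enters ready_queue.
ruu_release_fu: decrements by 1 the busy count of every busy resource in the functional-unit pool.
ruu_commit: retires completed instructions from the RUU and LSQ in program order, starting from the oldest entry. For a store, this is where memory is actually accessed and the cache written, but no event is generated (every instruction that uses data from that address has already received it). If the branch predictor is configured to update at commit, it is updated here. (Register values were already written at dispatch.)
Branch predictor updates can be performed at dispatch, writeback, or commit.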
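ruu_fetch();
while (the IFQ is not full and fewer than ruu_decode_width * fetch_speed instructions have been fetched this cycle)
  /* fetch an instruction at the next predicted fetch address */
  fetch_regs_PC = fetch_pred_PC;
  if (PC is valid)
    read the instruction at PC from memory into inst (the caches model only access timing; they hold no data)
    if (an I-cache and I-TLB are configured)
      simulate the I-cache and I-TLB accesses to obtain the fetch latency lat
      if (lat != cache_il1_lat)
        /* miss: stall fetch until it is resolved */
        ruu_fetch_issue_delay += lat - 1;
        stop fetching for this cycle
  else
    inst = NOP
  if (a branch predictor pred is configured)
    decode the opcode op
    if (op is a control instruction)
      fetch_pred_PC = target predicted by pred (the lookup also returns stack_recover_idx)
    else
      fetch_pred_PC = address of the next sequential instruction
  else
    fetch_pred_PC = address of the next sequential instruction
  insert the current instruction into the fetch queue:
  fetch_data[fetch_tail].IR = inst;
  fetch_data[fetch_tail].regs_PC = fetch_regs_PC;
  fetch_data[fetch_tail].pred_PC = fetch_pred_PC;
  fetch_data[fetch_tail].stack_recover_idx = stack_recover_idx;
  fetch_data[fetch_tail].ptrace_seq = ptrace_seq++;
  fetch_tail = (fetch_tail + 1) & (ruu_ifq_size - 1);
  fetch_num++;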
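The fetch queue is a circular buffer indexed by head and tail. This sketch assumes a power-of-two size so that indices wrap with a bit mask; `IFQ_SIZE`, `ifq_entry_t`, `ifq_t`, `ifq_push` and `ifq_pop` are hypothetical names, and only a few of fetch_data's fields are kept.

```c
#include <assert.h>
#include <stdbool.h>

#define IFQ_SIZE 4                 /* assumed size; must be a power of two */
_Static_assert((IFQ_SIZE & (IFQ_SIZE - 1)) == 0, "IFQ_SIZE must be a power of two");

/* Hypothetical fetch-queue entry: a subset of fetch_data[]'s fields. */
typedef struct {
    unsigned inst;
    unsigned regs_PC, pred_PC;
} ifq_entry_t;

typedef struct {
    ifq_entry_t data[IFQ_SIZE];
    int head, tail;                /* like fetch_head / fetch_tail */
    int num;                       /* number of occupied entries */
} ifq_t;

/* Fetch side: append at the tail; false means the IFQ is full. */
static bool ifq_push(ifq_t *q, ifq_entry_t e)
{
    assert(q->num >= 0 && q->num <= IFQ_SIZE);
    if (q->num == IFQ_SIZE)
        return false;
    q->data[q->tail] = e;
    q->tail = (q->tail + 1) & (IFQ_SIZE - 1);   /* wrap with a mask */
    q->num++;
    return true;
}

/* Dispatch side: remove from the head; false means the IFQ is empty. */
static bool ifq_pop(ifq_t *q, ifq_entry_t *out)
{
    if (q->num == 0)
        return false;
    *out = q->data[q->head];
    q->head = (q->head + 1) & (IFQ_SIZE - 1);
    q->num--;
    return true;
}
```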
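ruu_dispatch();
while (the fetch queue is not empty, the RUU and LSQ are not full, and fewer than ruu_decode_width instructions have been dispatched this cycle)
  if (in in-order issue mode and the last dispatched instruction's operands are not ready) break;
  // read the entry at the head of the fetch queue:
  inst = fetch_data[fetch_head].IR;
  regs.regs_PC = fetch_data[fetch_head].regs_PC;
  pred_PC = fetch_data[fetch_head].pred_PC;
  dir_update_ptr = &(fetch_data[fetch_head].dir_update);
  stack_recover_idx = fetch_data[fetch_head].stack_recover_idx;
  pseq = fetch_data[fetch_head].ptrace_seq;
  regs.regs_NPC = regs.regs_PC + sizeof(md_inst_t);
  // decode and actually execute the instruction
  switch (op)
  // next PC != PC + sizeof(md_inst_t) (PC+4 on Alpha): the branch was taken
  br_taken = (regs.regs_NPC != (regs.regs_PC + sizeof(md_inst_t)));
  // predicted PC != PC + sizeof(md_inst_t): the branch was predicted taken
  br_pred_taken = (pred_PC != (regs.regs_PC + sizeof(md_inst_t)));
  if (perfect prediction is enabled and the prediction was wrong, or a direct jump was predicted taken but to the wrong target, an obvious misprediction)
  | correct the next PC, flush the fetch queue, and set the fetch delay to the branch misprediction penalty
  | fetch_redirected = TRUE; // record that fetch has been redirected
  if (op is not a NOP)
  | fill in the fields of reservation station rs
  | if (the operation accesses memory)
  | | rs->op = MD_AGEN_OP; // rs becomes an add that computes the effective address
  | | rs->ea_comp = TRUE;
  | | fill in the fields of the LSQ entry lsq
  | | set up the input/output dependencies of rs and lsq:
  | | ruu_link_idep(rs, 0, NA); // link each input of rs to the creator (CV_link) of that register
  | | ruu_link_idep(rs, 1, in2);
  | | ruu_link_idep(rs, 2, in3);
  | | /* install output after inputs to prevent self reference */
  | | ruu_install_odep(rs, 0, DTMP); // make rs the creator of DTMP in the create vector
  | | if (rs's operands are ready)
  | | | readyq_enqueue(rs);
  | | if (lsq's operands are ready) // only stores are enqueued here; loads are issued by lsq_refresh()
  | | | readyq_enqueue(lsq);
  | else
  | | set up the input/output dependencies of rs
  | | if (rs's operands are ready)
  | | | readyq_enqueue(rs);
  else // NOP
  | rs = NULL;
  if (not in speculative mode)
  | if (the instruction is a branch and the predictor is configured to update at dispatch)
  | | update the branch predictor tables
  | if (pred_PC != regs.regs_NPC && !fetch_redirected)
  | | // the prediction was wrong and fetch has not been redirected
  | | spec_mode = TRUE; // start speculative (wrong-path) execution
  | | rs->recover_inst = TRUE;
  | | recover_PC = regs.regs_NPC;
end while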
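ruu_issue:
node = ready_queue; // take over the whole ready list
ready_queue = NULL; // and start a new, empty one
for (while fewer instructions than the issue width (default 4) have been issued)
| if (node is still valid, i.e., its tag matches the RUU entry)
| | take rs from node and set rs->queued = FALSE;
| | if (it is a store)
| | | it completes immediately: mark rs completed
| | else // not a store
| | | if (the instruction needs a functional unit)
| | | | if (a matching functional unit was obtained)
| | | | | rs->issued = TRUE; and set the unit's busy count
| | | | | if (it is a load)
| | | | | | if an earlier store writes the same address, the latency is 1
| | | | | | otherwise access the TLB and cache, and schedule the completion event according to the result
| | | | | else
| | | | | | schedule the completion event after the unit's operation latency
| | | | | end if
| | | | else put rs back into the ready queue
| | | else (no functional unit needed) schedule the completion event with latency 1
| | end if
| end if (node valid)
| RSLINK_FREE(node); // free the processed node
end for
for (each node still left in the list)
| put the remaining ready instructions back into ready_queue
end for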
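lsq_refresh:
std_unknowns[MAX_STD_UNKNOWNS]; // addresses whose stored value is not yet known (pending stores)
for (each operation in the LSQ, oldest first)
| if (store)
| | if (its address is not ready)
| | | stop: a store with an unknown address blocks every later load and store
| | else if (its data operand is not ready)
| | | add the store's address to std_unknowns
| | else (address and data both ready): remove matching addresses from std_unknowns
| end if
| if (a load that is not queued, not issued, not completed, and whose operands are ready)
| | if no entry in std_unknowns matches the load's address, the load enters the ready queue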
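A simplified sketch of this scan, assuming exact address matches and treating a load as ready once its address is known (the real code also checks the queued, issued and completed flags). `lsq_entry_t`, `lsq_refresh_sketch` and `MAX_UNKNOWN` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_UNKNOWN 16   /* hypothetical cap on tracked store addresses */

/* Hypothetical, simplified LSQ entry; the array is in program order. */
typedef struct {
    bool is_store;
    bool addr_ready;     /* effective address is known */
    bool data_ready;     /* store value is known (stores only) */
    unsigned addr;
    bool can_issue;      /* output: this load may enter the ready queue */
} lsq_entry_t;

static void lsq_refresh_sketch(lsq_entry_t *lsq, int n)
{
    unsigned unknown[MAX_UNKNOWN];  /* addresses of stores with unknown data */
    int n_unknown = 0;

    for (int i = 0; i < n; i++)
        lsq[i].can_issue = false;

    for (int i = 0; i < n; i++) {
        lsq_entry_t *e = &lsq[i];
        if (e->is_store) {
            if (!e->addr_ready)
                break;              /* unknown address: block all later ops */
            if (!e->data_ready) {
                assert(n_unknown < MAX_UNKNOWN);
                unknown[n_unknown++] = e->addr;
            } else {
                /* a younger store with known data hides older unknowns */
                for (int j = 0; j < n_unknown; ) {
                    if (unknown[j] == e->addr)
                        unknown[j] = unknown[--n_unknown]; /* swap-remove */
                    else
                        j++;
                }
            }
        } else if (e->addr_ready) {
            bool blocked = false;
            for (int j = 0; j < n_unknown; j++)
                if (unknown[j] == e->addr)
                    blocked = true;
            e->can_issue = !blocked;
        }
    }
}
```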
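ruu_writeback:
while (rs = eventq_next_event()) // an event is due in the current sim_cycle
| mark rs completed
| if (rs->recover_inst) // rs is a mispredicted branch
| | ruu_recover // squash every RUU and LSQ entry younger than rs
| | tracer_recover // discard speculative register values and speculative memory writes, flush the fetch queue, leave speculative mode, and set fetch_pred_PC = fetch_regs_PC = recover_PC;
| | bpred_recover // pred->retstack.tos = stack_recover_idx;
| | apply the branch misprediction delay (3 cycles by default)
| end if
| if (the predictor is configured to update at writeback) update it
| for (each output of rs)
| | if (the create vector entry of that output register still points to rs)
| | | clear that create vector entry
| | for (each instruction on rs's output dependency list) mark the corresponding operand ready
| | | if (that instruction now has all operands ready and it is not an LSQ entry, or it is a store)
| | | | add it to the ready queue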
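A minimal sketch of the wakeup step, using fixed arrays in place of RS_link chains and ignoring tags. `station_t`, `add_consumer`, `writeback_wakeup` and `MAX_CONSUMERS` are hypothetical; setting `queued` stands in for readyq_enqueue(), and the special case that leaves loads to lsq_refresh() is omitted.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_IDEPS 3
#define MAX_CONSUMERS 4   /* hypothetical fixed-size stand-in for odep_list */

typedef struct station {
    bool idep_ready[MAX_IDEPS];
    bool completed, queued;
    int n_consumers;
    struct station *consumer[MAX_CONSUMERS];  /* who reads my result */
    int opnum[MAX_CONSUMERS];                 /* ... through which input */
} station_t;

static bool operands_ready(const station_t *rs)
{
    for (int i = 0; i < MAX_IDEPS; i++)
        if (!rs->idep_ready[i])
            return false;
    return true;
}

/* Dispatch: `consumer` will read rs's result through input slot `op`. */
static void add_consumer(station_t *rs, station_t *consumer, int op)
{
    assert(rs->n_consumers < MAX_CONSUMERS && op >= 0 && op < MAX_IDEPS);
    consumer->idep_ready[op] = false;
    rs->consumer[rs->n_consumers] = consumer;
    rs->opnum[rs->n_consumers] = op;
    rs->n_consumers++;
}

/* Writeback of rs: wake consumers; return how many became ready. */
static int writeback_wakeup(station_t *rs)
{
    int woken = 0;
    rs->completed = true;
    for (int i = 0; i < rs->n_consumers; i++) {
        station_t *c = rs->consumer[i];
        c->idep_ready[rs->opnum[i]] = true;
        if (!c->queued && operands_ready(c)) {
            c->queued = true;          /* stands in for readyq_enqueue(c) */
            woken++;
        }
    }
    rs->n_consumers = 0;               /* the output chain is emptied */
    return woken;
}
```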
ruu_release_fu: for every resource in the pool whose busy count is nonzero, decrement the count by 1.
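Functional-unit occupancy reduces to one busy counter per unit. This sketch assumes two identical units; `busy`, `fu_acquire`, `fu_release` and `N_UNITS` are hypothetical names. Issue claims a free unit and sets its counter to the issue latency, and the per-cycle release step counts every busy unit down by one.

```c
#include <assert.h>

#define N_UNITS 2            /* hypothetical: two identical ALUs */

static int busy[N_UNITS];    /* cycles until each unit accepts a new op */

/* Issue side: claim a free unit for an op with the given issue latency. */
static int fu_acquire(int issuelat)
{
    assert(issuelat >= 1);
    for (int i = 0; i < N_UNITS; i++)
        if (busy[i] == 0) {
            busy[i] = issuelat;
            return i;
        }
    return -1;               /* none free: the op returns to the ready queue */
}

/* Once per cycle: count down every busy unit. */
static void fu_release(void)
{
    for (int i = 0; i < N_UNITS; i++)
        if (busy[i] > 0)
            busy[i]--;
}
```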
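ruu_commit:
while (the RUU is not empty and the commit width has not been reached)
| take rs from the head of the RUU
| if (rs has not completed) break
| if (rs is an effective-address computation, i.e., it has a paired load/store in the LSQ)
| | if (the LSQ load/store has not completed) break
| | if (it is a store)
| | | request a store port (a functional unit)
| | | if (a port was obtained)
| | | | set the port's busy count to its issue latency (issuelat)
| | | | access the TLB and cache to write the data
| | | else // no store port available
| | | | break
| | end if
| | remove the head entry of the LSQ (a load has already completed)
| if (the predictor is configured to update at commit) update it
| remove the head entry of the RUU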
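In-order retirement from a circular RUU can be sketched as follows, assuming a power-of-two RUU size and ignoring the LSQ, store ports and predictor updates. `rob_entry_t`, `ruu_t`, `commit` and `RUU_SIZE` are hypothetical names. The loop stops at the first entry that has not completed, so a slow old instruction holds back younger finished ones.

```c
#include <assert.h>
#include <stdbool.h>

#define RUU_SIZE 8            /* hypothetical size; power of two */

typedef struct {
    bool completed;
    int seq;
} rob_entry_t;

typedef struct {
    rob_entry_t e[RUU_SIZE];
    int head;                 /* index of the oldest entry */
    int num;                  /* number of occupied entries */
} ruu_t;

/* Retire up to `width` completed entries from the head, in program order.
   Returns the number committed this cycle. */
static int commit(ruu_t *r, int width)
{
    int committed = 0;
    assert(width > 0);
    while (r->num > 0 && committed < width) {
        if (!r->e[r->head].completed)
            break;            /* oldest not done: younger ones must wait */
        r->head = (r->head + 1) & (RUU_SIZE - 1);
        r->num--;
        committed++;
    }
    return committed;
}
```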
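1. Every jump needs a functional unit to be resolved. However, if a direct jump is predicted taken but the predicted target differs from the target encoded in the instruction, the misprediction is obvious, so the fetch queue can be flushed and the branch penalty applied already at decode.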
2. Macro expansion in the decode stage:
switch (op)
{
(1) #define DEFINST(OP,MSK,NAME,OPFORM,RES,CLASS,O1,O2,I1,I2,I3) \
case OP: \
  /* compute output/input dependencies to out1-2 and in1-3 */ \
  out1 = O1; out2 = O2; \
  in1 = I1; in2 = I2; in3 = I3; \
  /* execute the instruction */ \
  SYMCAT(OP,_IMPL); \
  break;
... (omitted) ...
(2) #include "machine.def"
default:
}
machine.def contains:
(4) #define LDA_IMPL \
{ \
  SET_GPR(RA, GPR(RB) + SEXT(OFS)); \
}
(3) DEFINST(LDA, 0x08,
  "lda", "a,o(b)",
  IntALU, F_ICOMP,
  DGPR(RA), DNA, DNA, DGPR(RB), DNA)
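Here, (1) is only a macro definition and produces no code at that point. (2) includes machine.def, which is equivalent to pasting the contents of machine.def into the switch statement; the #define at (4) likewise produces no code by itself. The DEFINST invocation at (3) matches the macro defined at (1), so it expands into a case OP: clause. Inside that expansion, SYMCAT(OP,_IMPL) pastes the tokens into LDA_IMPL, which matches the macro at (4) and is replaced by its body. The switch statement therefore becomes: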
switch (op)
{
case LDA:
  out1 = DGPR(RA); out2 = DNA;
  in1 = DNA; in2 = DGPR(RB); in3 = DNA;
  { SET_GPR(RA, GPR(RB) + SEXT(OFS)); };
  break;
/* ... one case per instruction in machine.def ... */
}
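The same pattern, usually called an X-macro, can be reproduced in a self-contained file. Here a list macro `INST_LIST` plays the role of machine.def so everything fits in one file; the toy register file, the decoded fields `RA`, `RB`, `OFS` and the opcodes `LDA`/`MOV` are hypothetical, and the out1..in3 assignments and SEXT() are left out. The list is expanded twice: once into an enum and once into the case labels of the switch, with SYMCAT pasting each opcode name onto `_IMPL`.

```c
#include <assert.h>

/* Toy machine state and decoded fields (all hypothetical). */
static long regs[8];
static int RA, RB, OFS;

#define GPR(N)         (regs[(N)])
#define SET_GPR(N, E)  (regs[(N)] = (E))

/* Per-opcode semantics, named <OP>_IMPL as in machine.def. */
#define LDA_IMPL { SET_GPR(RA, GPR(RB) + OFS); }
#define MOV_IMPL { SET_GPR(RA, GPR(RB)); }

/* Stand-in for machine.def: each instruction is listed exactly once. */
#define INST_LIST(X) \
    X(LDA, 0x08)     \
    X(MOV, 0x11)

#define SYMCAT(A, B) A##B

/* Expansion 1: the opcode enum. */
#define DEFINST_ENUM(OP, CODE) OP = CODE,
enum opcode { INST_LIST(DEFINST_ENUM) };

/* Expansion 2: one case per instruction, body taken from <OP>_IMPL. */
#define DEFINST_CASE(OP, CODE) case OP: SYMCAT(OP, _IMPL); break;

static void execute(enum opcode op)
{
    switch (op) {
    INST_LIST(DEFINST_CASE)
    default:
        assert(!"unknown opcode");
        break;
    }
}
```

With `regs[2] = 100`, `RA = 1`, `RB = 2` and `OFS = -4`, `execute(LDA)` leaves 96 in `regs[1]`.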
Copyright notice: this is an original article by the blogger and may not be reproduced without permission.
Original article: http://www.cnblogs.com/gcczhongduan/p/4907663.html