
Sum-addressed decoder

From Wikipedia, the free encyclopedia

In CPU design, the use of a sum-addressed decoder (SAD) or sum-addressed memory (SAM) decoder is a method of reducing the latency of the CPU cache access and address calculation (base + offset). This is achieved by fusing the address generation sum operation with the decode operation in the cache SRAM.

Overview

The L1 data cache is usually the most critical CPU resource: few things improve instructions per cycle (IPC) as directly as a larger data cache, yet a larger data cache takes longer to access, and pipelining the data cache makes IPC worse. One way to reduce the latency of an L1 data cache access is to fuse the address generation sum with the decode operation in the cache SRAM.

The address generation sum operation still must be performed, because other units in the memory pipe will use the resulting virtual address. That sum will be performed in parallel with the fused add/decode described here.

The most profitable recurrence to accelerate is a load, followed by a use of that load in a chain of integer operations leading to another load. Assuming that load results are bypassed with the same priority as integer results, this recurrence can be summarized as a load followed by another load, as if the program were following a linked list.

The rest of this page assumes an instruction set architecture (ISA) with a single addressing mode (register+offset), a virtually indexed data cache, and sign-extending loads that may be variable-width. Most RISC ISAs fit this description. In ISAs such as the Intel x86, three or four inputs are summed to generate the virtual address. Multiple-input additions can be reduced to a two-input addition with carry save adders, and the remaining problem is as described below. The critical recurrence, then, is an adder, a decoder, the SRAM word line, the SRAM bit line(s), the sense amp(s), the byte steering muxes, and the bypass muxes.
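
The carry-save reduction mentioned above can be sketched in Python. The sketch assumes 64-bit wraparound arithmetic and a hypothetical x86-style address of the form base + index*scale + displacement, with made-up operand values. One layer of independent full adders turns three addends into a sum vector and a shifted carry vector, and the ordinary sum of those two vectors is unchanged. A fourth input, such as a segment base, would need one more such layer.

```python
WIDTH = 64
MASK = (1 << WIDTH) - 1


def carry_save(a: int, b: int, c: int) -> tuple[int, int]:
    """Reduce three addends to a (sum, carry) pair with one full adder per bit.

    No carry ripples between bit positions: the carry (majority) bits are
    just shifted up one place. Results wrap modulo 2**WIDTH.
    """
    s = a ^ b ^ c
    carry = ((a & b) | (a & c) | (b & c)) << 1
    return s & MASK, carry & MASK


# Hypothetical x86-style operands: base + index * scale + displacement.
base, index, scale, disp = 0x7FFF_1234_0000, 0x40, 8, 0x1F8
s, c = carry_save(base, index * scale, disp)
assert (s + c) & MASK == (base + index * scale + disp) & MASK
print(hex(s), hex(c))  # a two-input add (or the fused add/decode) takes it from here
```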

For this example, a direct-mapped 16 KB data cache which returns doubleword (8-byte) aligned values is assumed. Each line of the SRAM is 8 bytes, and there are 2048 lines, addressed by Addr[13:3]. The sum-addressed SRAM idea applies equally well to set associative caches.

Sum-addressed cache: collapse the adder and decoder

The SRAM decoder for this example has an 11-bit input, Addr[13:3], and 2048 outputs, the decoded word lines. One word line is driven high in response to each unique Addr[13:3] value.

In the simplest form of decoder, each of the 2048 lines is logically an AND gate. The 11 bits (call them A[13:3]) and their complements (call them B[13:3]) are driven up the decoder. For each line, 11 bits or complements are fed into an 11-input AND gate. For instance, 1026 decimal is 10000000010 binary, so the function for line 1026 is:

wordline[1026] = A[13] & B[12] & B[11] & B[10] & B[9] & B[8] & B[7] & B[6] & B[5] & A[4] & B[3]
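
The AND-gate decoder can be modeled behaviorally in a few lines of Python. This is a sketch, not a circuit, and the function names are hypothetical. `wordline_terms` lists which rail feeds each input of a line's AND gate. `decode` evaluates every gate for one index value to confirm that exactly one word line goes high.

```python
INDEX_BITS = 11  # Addr[13:3] for the 16 KB cache with 8-byte lines
LOW_BIT = 3      # the index starts at address bit 3


def wordline_terms(line: int) -> list[str]:
    """Name the true (A) or complement (B) rail feeding each AND input of `line`."""
    terms = []
    for bit in range(INDEX_BITS - 1, -1, -1):  # Addr[13] down to Addr[3]
        rail = "A" if (line >> bit) & 1 else "B"
        terms.append(f"{rail}[{bit + LOW_BIT}]")
    return terms


def decode(index: int) -> list[int]:
    """Evaluate all 2048 eleven-input AND gates for one index value."""
    a = [(index >> bit) & 1 for bit in range(INDEX_BITS)]  # true rails A
    b = [1 - x for x in a]                                # complement rails B
    return [
        int(all(a[bit] if (line >> bit) & 1 else b[bit] for bit in range(INDEX_BITS)))
        for line in range(1 << INDEX_BITS)
    ]


print(" & ".join(wordline_terms(1026)))
# A[13] & B[12] & B[11] & B[10] & B[9] & B[8] & B[7] & B[6] & B[5] & A[4] & B[3]
```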

Both the carry chain of the adder and the decoder combine information from the entire width of the index portion of the address. Combining information across the entire width twice is redundant. A sum-addressed SRAM combines the information just once by implementing the adder and decoder together in one structure.

Recall that the SRAM is indexed with the result of an add. Call the summands R (for register) and O (for the offset to that register). The sum-addressed decoder is going to decode R+O. For each decoder line, call the line number L.

Suppose that our decoder drove both R and O over each decoder line, and each decoder line implemented:

wordline[L] = (R+O)==L
(R+O)==L <=> R+O-L==0
         <=> R+O+~L+1==0
         <=> R+O+~L==-1==11..1.
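
The rewrite can be checked numerically. The Python sketch below uses hypothetical function names. It compares the direct test (R+O)==L with the rewritten test R+O+~L == 11..1, with all arithmetic done modulo 2**width. It checks every 4-bit combination exhaustively, since the argument does not depend on the width.

```python
def sum_equals(r: int, o: int, line: int, width: int) -> bool:
    """Direct form: (R + O) == L, modulo 2**width."""
    mask = (1 << width) - 1
    return ((r + o) & mask) == line


def sum_plus_not_l_is_all_ones(r: int, o: int, line: int, width: int) -> bool:
    """Rewritten form: R + O + ~L == 11..1 (that is, -1), modulo 2**width."""
    mask = (1 << width) - 1
    return ((r + o + (~line & mask)) & mask) == mask


W = 4
for r in range(1 << W):
    for o in range(1 << W):
        for line in range(1 << W):
            assert sum_equals(r, o, line, W) == sum_plus_not_l_is_all_ones(r, o, line, W)
```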

A set of full adders can reduce R+O+~L to S+C (this is carry-save addition). S+C==11..1 <=> S==~C, so there are no carries in the final add, and the comparison needs no carry chain. Note that because C is a row of carries, it is shifted up one bit, so that R[13:3]+O[13:3]+~L[13:3] == {0,S[13:3]} + {C[14:4],0}.
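
Here is a behavioral sketch of one decoder row in this formulation, written in Python with hypothetical helper names. A single layer of full adders reduces R, O and ~L to S and C, and the row then compares S with ~C bit by bit. The top carry is dropped because the arithmetic is modulo 2**width, which corresponds to discarding C[14].

```python
def carry_save(a: int, b: int, c: int, width: int) -> tuple[int, int]:
    """One layer of full adders: a + b + c == s + carry (mod 2**width)."""
    mask = (1 << width) - 1
    s = (a ^ b ^ c) & mask
    carry = (((a & b) | (a & c) | (b & c)) << 1) & mask  # shifted up; top carry dropped
    return s, carry


def row_matches(r: int, o: int, line: int, width: int = 11) -> bool:
    """Decoder row `line` fires when S == ~C: a bitwise test, no carry chain."""
    mask = (1 << width) - 1
    s, carry = carry_save(r, o, ~line & mask, width)
    return s == (~carry & mask)


# Exhaustive check against a real add at a small width.
W = 4
for r in range(1 << W):
    for o in range(1 << W):
        for line in range(1 << W):
            assert row_matches(r, o, line, W) == (((r + o) & ((1 << W) - 1)) == line)
```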

With this formulation, each row in the decoder is a set of full adders which reduce the base register, the offset, and the row number to a carry-save format, and a comparator. Most of this hardware will be proven redundant below, but for now it's simpler to think of it all existing in each row.

Ignoring the LSBs: late select on carry

The formulation above checks the entire result of an add. In a CPU cache decoder, however, the entire result of the add is a byte address, while the cache is usually indexed with a coarser address; in our example, that of an 8-byte block. It is therefore preferable to ignore a few of the LSBs of the address. The LSBs of the two summands cannot simply be ignored, though, because they may produce a carry-out that would change the doubleword addressed.

If R[13:3] and O[13:3] are added to get some index I[13:3], then the actual address Addr[13:3] equals either I[13:3] or I[13:3] + 1, depending on whether R[2:0]+O[2:0] generates a carry-out. Both I and I+1 can be fetched if there are two banks of SRAM, one with even addresses and one with odd. The even bank holds addresses 000xxx, 010xxx, 100xxx, 110xxx, etc., and the odd bank holds addresses 001xxx, 011xxx, 101xxx, 111xxx, etc. After the SRAM access, the carry-out from R[2:0]+O[2:0] selects between the even and odd doublewords fetched.
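
The late select can be sketched in Python. The sketch assumes that r and o are full byte addresses (register value and offset) and uses the example's 11-bit index at address bits 13..3; the function name is hypothetical. Both banks produce a candidate line from I alone. The carry out of the low three bits, combined with the parity of I, then picks the bank. Because I and I+1 always have opposite parity (even across the wrap from 2047 to 0), each bank holds exactly one of the two.

```python
INDEX_BITS = 11
INDEX_MASK = (1 << INDEX_BITS) - 1


def late_select_index(r: int, o: int) -> int:
    """Doubleword index Addr[13:3] of r + o, chosen by a late select on the low carry."""
    i = ((r >> 3) + (o >> 3)) & INDEX_MASK              # I[13:3]; bits [2:0] ignored
    even_line = i if i % 2 == 0 else (i + 1) & INDEX_MASK  # row the even bank fetches
    odd_line = i if i % 2 == 1 else (i + 1) & INDEX_MASK   # row the odd bank fetches
    carry = ((r & 7) + (o & 7)) >> 3                    # carry-out of R[2:0] + O[2:0]
    # No carry: we want I. Carry: we want I + 1, which sits in the other bank.
    use_odd = (i & 1) ^ carry
    return odd_line if use_odd else even_line


assert late_select_index(0x3FFF, 0x1) == 0  # I = 2047 plus a low carry wraps to 0
```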

Note that fetching from two half-size banks of SRAM will dissipate more power than fetching from one full-size bank, as it causes more switching in the sense amps and data steering logic.

Match generation

I[13:3]   even bank fetches line   odd bank fetches line
100       100                      101
101       110                      101
110       110                      111

Referring to the table above, the even bank will fetch line 110 when I[13:3]==101 or I[13:3]==110. The odd bank will fetch line 101 when I[13:3]==100 or I[13:3]==101.

In general, the odd SRAM bank should fetch line Lo==2N+1 when either I[13:3]==2N or I[13:3]==2N+1. The two conditions can be written as:

I[13:3] = Lo-1 =>  R[13:3] + O[13:3] + ~Lo+1 = 11..11
               =>  R[13:3] + O[13:3] + ~Lo   = 11..10
I[13:3] = Lo   =>  R[13:3] + O[13:3] + ~Lo   = 11..11

Ignore the last digit of the compare: (S+C)[13:4]==11..1

Similarly, the even SRAM bank fetches line Le==2N when either I[13:3]==2N or I[13:3]==2N-1. The conditions are written as follows; once again, the last digit of the compare is ignored.

I[13:3] = Le-1 =>  R[13:3] + O[13:3] + ~Le = 11..10
I[13:3] = Le   =>  R[13:3] + O[13:3] + ~Le = 11..11
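
The matching rule is the same for both banks and can be sketched in Python. For brevity, the sketch treats r and o as the already-extracted 11-bit index fields R[13:3] and O[13:3]; that is an assumption, and the helper name is hypothetical. Calling it with the I values from the table reproduces the table's rows.

```python
INDEX_BITS = 11
MASK = (1 << INDEX_BITS) - 1


def bank_row_fetches(r: int, o: int, line: int) -> bool:
    """True when row `line` of its bank (even or odd) should fetch.

    r and o are the 11-bit index fields R[13:3] and O[13:3]. The row forms
    R + O + ~L and ignores the last digit, so 11..10 (I == L-1) and
    11..11 (I == L) both match.
    """
    total = (r + o + (~line & MASK)) & MASK
    return (total >> 1) == (MASK >> 1)


# Rebuild the even/odd table: for each I, the even row and the odd row that fire.
for i in (0b100, 0b101, 0b110):
    even = [row for row in range(0, 1 << INDEX_BITS, 2) if bank_row_fetches(i, 0, row)]
    odd = [row for row in range(1, 1 << INDEX_BITS, 2) if bank_row_fetches(i, 0, row)]
    print(f"{i:03b} {even[0]:03b} {odd[0]:03b}")
# 100 100 101
# 101 110 101
# 110 110 111
```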

Gate-level implementation

      R13 ...   R6   R5   R4   R3
      O13 ...   O6   O5   O4   O3
     ~L13 ...  ~L6  ~L5  ~L4  ~L3
---------------------------------
      S13 ...   S6   S5   S4   S3
 C14  C13 ...   C6   C5   C4

Before collapsing the redundancy between rows, a review:

Each row of each decoder for each of the two banks implements a set of full adders that reduce the three numbers to be added (R[13:3], O[13:3], and ~L[13:3]) to two numbers (S[13:3] and C[14:4]). The LSB (S[3]) is discarded, and so is the carry out (C[14]). The row matches if S[13:4] == ~C[13:4], which is &(xor(S[13:4], C[13:4])).
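
The row test above reads directly as code. In this Python sketch (hypothetical names), r, o and line are the 11-bit index fields, and index bit 0 stands for address bit 3. The sketch computes S and C with one layer of full adders, drops S[3] and C[14], and ANDs the ten XORs. Across all rows of both banks, exactly the two rows for I and I+1 should fire.

```python
INDEX_BITS = 11
MASK = (1 << INDEX_BITS) - 1   # index bit 0 is address bit 3, bit 10 is address bit 13


def csa_row_match(r: int, o: int, line: int) -> bool:
    """Carry-save form of one bank row; r, o and line are 11-bit index fields."""
    k = ~line & MASK                                  # third addend ~L[13:3]
    s = (r ^ o ^ k) & MASK                            # S[13:3]
    c = (((r & o) | (r & k) | (o & k)) << 1) & MASK   # C[13:4]; C[14] dropped, C[3] = 0
    x = (s ^ c) >> 1                                  # xor(S, C) for bits 13..4; S[3] dropped
    return x == (MASK >> 1)                           # the 10-input AND


for r, o in ((4, 0), (2047, 1), (1000, 1047), (123, 456)):
    i = (r + o) & MASK
    fired = {row for row in range(1 << INDEX_BITS) if csa_row_match(r, o, row)}
    assert fired == {i, (i + 1) & MASK}   # one even row and one odd row
```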

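It is possible to partially specialize the full adders to 2-input AND, OR, XOR, and XNOR because the third input, ~L, is constant for each row. The resulting expressions are common to all lines of the decoder and can be collected at the bottom. In the expressions below, the subscript 0 or 1 is the value of that constant third input: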

S0;i   = S(Ri, Oi, 0) = Ri xor Oi
S1;i   = S(Ri, Oi, 1) = Ri xnor Oi
C0;i+1 = C(Ri, Oi, 0) = Ri and Oi
C1;i+1 = C(Ri, Oi, 1) = Ri or Oi.
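
Here is a truth-table check of these four specializations as a short Python sketch. Tying the full adder's third input to 0 or 1 reduces its sum and carry outputs to the two-input gates listed above.

```python
from itertools import product


def full_adder(a: int, b: int, k: int) -> tuple[int, int]:
    """Return (sum, carry) of three one-bit inputs."""
    return a ^ b ^ k, (a & b) | (a & k) | (b & k)


# With the third input tied to a constant, each output collapses to a 2-input gate.
for r, o in product((0, 1), repeat=2):
    assert full_adder(r, o, 0) == (r ^ o, r & o)        # S0 = xor,  C0 = and
    assert full_adder(r, o, 1) == (1 - (r ^ o), r | o)  # S1 = xnor, C1 = or
```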

At each digit position, there are only two possible Si, two possible Ci, and four possible XORs between them. They are indexed here by the third-input bits ~Li and ~Li-1, since ~L is what the full adders actually add:

~Li=0 and ~Li-1=0: X0;0;i = S0;i xor C0;i = Ri xor Oi xor (Ri-1 and Oi-1)
~Li=0 and ~Li-1=1: X0;1;i = S0;i xor C1;i = Ri xor Oi xor (Ri-1 or Oi-1)
~Li=1 and ~Li-1=0: X1;0;i = S1;i xor C0;i = Ri xnor Oi xor (Ri-1 and Oi-1) = !X0;0;i
~Li=1 and ~Li-1=1: X1;1;i = S1;i xor C1;i = Ri xnor Oi xor (Ri-1 or Oi-1)  = !X0;1;i
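
The four expressions can be checked against a full adder in Python. This is a sketch with a hypothetical function name. The dictionary keys are the row's third-addend bits (bits i and i-1 of ~L). The check evaluates the full adder with those constant inputs for every combination of operand bits.

```python
from itertools import product


def x_wires(r_i: int, o_i: int, r_prev: int, o_prev: int) -> dict[tuple[int, int], int]:
    """The four candidate xor(S_i, C_i) values for one bit position.

    Keys are (k_i, k_prev): bit i and bit i-1 of the row's third addend ~L.
    """
    x00 = r_i ^ o_i ^ (r_prev & o_prev)
    x01 = r_i ^ o_i ^ (r_prev | o_prev)
    return {(0, 0): x00, (0, 1): x01, (1, 0): 1 - x00, (1, 1): 1 - x01}


# Compare against a full adder evaluated with the row's actual constant inputs.
for r_i, o_i, r_prev, o_prev, k_i, k_prev in product((0, 1), repeat=6):
    s_i = r_i ^ o_i ^ k_i
    c_i = (r_prev & o_prev) | (r_prev & k_prev) | (o_prev & k_prev)  # carry into bit i
    assert x_wires(r_i, o_i, r_prev, o_prev)[(k_i, k_prev)] == s_i ^ c_i
```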

One possible decoder for the example might calculate these four expressions for each of the bits 4..13, and drive all 40 wires up the decoder. Each line of the decoder would select one of the four wires for each bit, and consist of a 10-input AND.
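
Putting the pieces together gives a behavioral Python model of the decoder just described. All names are hypothetical, and the wire numbering is an assumption. The bottom of the array computes the four expressions for index bits 1..10 (address bits 4..13). Each row picks one wire per bit according to bits i and i-1 of ~L and ANDs the ten picks. Both banks decode in parallel, and the carry out of the low three bits chooses between them at the end. The model is checked against an ordinary add.

```python
INDEX_BITS = 11
MASK = (1 << INDEX_BITS) - 1   # index bit n stands for address bit n + 3


def bit(v: int, n: int) -> int:
    return (v >> n) & 1


def drive_wires(r: int, o: int) -> dict[tuple[int, int, int], int]:
    """The 40 wires at the bottom of the array: four per index bit 1..10."""
    wires = {}
    for i in range(1, INDEX_BITS):
        half = bit(r, i) ^ bit(o, i)
        x00 = half ^ (bit(r, i - 1) & bit(o, i - 1))
        x01 = half ^ (bit(r, i - 1) | bit(o, i - 1))
        # Key: (bit position, ~L bit i, ~L bit i-1).
        wires[(i, 0, 0)], wires[(i, 0, 1)] = x00, x01
        wires[(i, 1, 0)], wires[(i, 1, 1)] = 1 - x00, 1 - x01
    return wires


def row_fires(wires: dict[tuple[int, int, int], int], line: int) -> bool:
    """One decoder row: select a wire per bit using ~L, then a 10-input AND."""
    k = ~line & MASK
    return all(wires[(i, bit(k, i), bit(k, i - 1))] for i in range(1, INDEX_BITS))


def sum_addressed_index(r_addr: int, o_addr: int) -> int:
    """Both banks decode in parallel; the low-order carry picks one at the end."""
    r, o = (r_addr >> 3) & MASK, (o_addr >> 3) & MASK
    wires = drive_wires(r, o)
    even = [ln for ln in range(0, 1 << INDEX_BITS, 2) if row_fires(wires, ln)]
    odd = [ln for ln in range(1, 1 << INDEX_BITS, 2) if row_fires(wires, ln)]
    assert len(even) == 1 and len(odd) == 1       # one word line per bank
    carry = ((r_addr & 7) + (o_addr & 7)) >> 3    # carry-out of R[2:0] + O[2:0]
    use_odd = bit(r, 0) ^ bit(o, 0) ^ carry       # I[3] = R[3] xor O[3], then the carry
    return odd[0] if use_odd else even[0]


for r_addr, o_addr in ((0x1000, 0x18), (0x3FFF, 0x1), (0x12345678, 0x9ABC)):
    assert sum_addressed_index(r_addr, o_addr) == ((r_addr + o_addr) >> 3) & MASK
```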

What has been saved?

A simpler data cache path would have an adder followed by a traditional decoder. For our example cache subsystem, the critical path would be a 14-bit adder producing true and complement values, followed by an 11-input AND gate for each row of the decoder.

In the sum-addressed design, the final AND gate in the decoder remains, although it is 10 inputs wide instead of 11. The adder has been replaced by a four-input logical expression at each bit. The latency savings come from the speed difference between the adder and that four-input expression, perhaps three simple CMOS gates.

If the reader feels that this was an inordinate amount of brain-twisting work for a three gate improvement in a multi-cycle critical path, then the reader has a better appreciation for the level to which modern CPUs are optimized.

Further optimizations: predecode

Many decoder designs avoid high-fan-in AND gates in the decode line itself by employing a predecode stage. For instance, an 11-bit decoder might be predecoded into three groups of 4, 4, and 3 bits each. Each 3-bit group would drive 8 wires up the main decode array, each 4-bit group would drive 16 wires. The decoder line then becomes a 3-input AND gate. This reorganization can save significant implementation area and some power.
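
Here is a Python sketch of the 4/4/3 split described above. The exact group boundaries are an assumption, and the names are hypothetical. Each group is decoded once into one-hot wires, and each word line ANDs one wire from each group.

```python
INDEX_BITS = 11
# (lowest bit, width) of each predecode group: index bits [10:7], [6:3], [2:0].
GROUPS = ((7, 4), (3, 4), (0, 3))


def field(v: int, lo: int, width: int) -> int:
    return (v >> lo) & ((1 << width) - 1)


def predecode(index: int) -> list[list[int]]:
    """One-hot wires per group: 16 + 16 + 8 = 40 wires up the array."""
    return [[int(field(index, lo, w) == v) for v in range(1 << w)] for lo, w in GROUPS]


def row_fires(wires: list[list[int]], line: int) -> bool:
    """Each word line is now a 3-input AND, taking one wire from each group."""
    return all(g[field(line, lo, w)] for g, (lo, w) in zip(wires, GROUPS))


wires = predecode(1026)
assert sum(len(g) for g in wires) == 40
assert [ln for ln in range(1 << INDEX_BITS) if row_fires(wires, ln)] == [1026]
```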

This same reorganization can be applied to the sum-addressed decoder. Each bit in the non-predecoded formulation above can be viewed as a local two-bit add. With predecoding, each predecode group is a local three, four, or even five-bit add, with the predecode groups overlapping by one bit.
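
One way to realize a predecoded sum-addressed decoder is sketched below in Python. The group boundaries are assumed: address bits 13..10, 9..7 and 6..4, written as index bits [10:7], [6:4] and [3:1]. All names are hypothetical. For every pattern of ~L bits it spans, each group evaluates its local carry-save compare. Each group's pattern includes one overlapping lower bit that supplies the carry into the group, so the local adds are five, four and four bits wide. Each row then ANDs one wire per group.

```python
INDEX_BITS = 11
MASK = (1 << INDEX_BITS) - 1   # index bit n stands for address bit n + 3
# Assumed grouping of the ten compared index bits 1..10 (address bits 4..13).
GROUPS = ((7, 10), (4, 6), (1, 3))   # (lo, hi), inclusive


def bit(v: int, n: int) -> int:
    return (v >> n) & 1


def x(r: int, o: int, i: int, k_i: int, k_prev: int) -> int:
    """xor(S_i, C_i) for bit i, given bits i and i-1 of the row's ~L."""
    s = bit(r, i) ^ bit(o, i) ^ k_i
    rp, op = bit(r, i - 1), bit(o, i - 1)
    c = (rp & op) | (rp & k_prev) | (op & k_prev)
    return s ^ c


def predecode(r: int, o: int) -> list[list[int]]:
    """Per group, one wire per pattern of ~L bits [hi:lo-1]: a local add and compare."""
    groups = []
    for lo, hi in GROUPS:
        n = hi - lo + 2   # the group's bits plus one overlapping bit below it
        wires = []
        for p in range(1 << n):   # bit j of p stands for ~L bit lo - 1 + j
            ok = all(x(r, o, i, bit(p, i - lo + 1), bit(p, i - lo)) for i in range(lo, hi + 1))
            wires.append(int(ok))
        groups.append(wires)
    return groups


def row_fires(groups: list[list[int]], line: int) -> bool:
    """Each row ANDs one predecoded wire per group (a 3-input AND here)."""
    k = ~line & MASK
    return all(
        wires[(k >> (lo - 1)) & ((1 << (hi - lo + 2)) - 1)]
        for wires, (lo, hi) in zip(groups, GROUPS)
    )


for r, o in ((4, 0), (2047, 1), (1000, 1047)):
    i = (r + o) & MASK
    g = predecode(r, o)
    assert {ln for ln in range(1 << INDEX_BITS) if row_fires(g, ln)} == {i, (i + 1) & MASK}
```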

Predecoding generally increases the number of wires traversing the decoder, and sum-addressed decoders generally have about twice as many wires as the equivalent simple decoder. These wires can be the limiting factor on the amount of feasible predecoding.

References

  • Paul Demone has an explanation of sum-addressed caches in a realworldtech article.
  • Heald et al.[1] have a paper in ISSCC 1998 that explains what may be the original sum-addressed cache in the UltraSPARC III.
  • Sum-addressed memory is described in

United States patent 5,754,819, May 19, 1998, Low-latency memory indexing method and structure. Inventors: Lynch; William L. (Palo Alto, CA), Lauterbach; Gary R. (Los Altos, CA); Assignee: Sun Microsystems, Inc. (Mountain View, CA), Filed: July 28, 1994

  • At least one of the inventors named on a patent related to carry-free address decoding credits the following publication:

Cortadella, Jordi; Llaberia, Jose M. (1992). "Evaluation of A + B = K Conditions without Carry Propagation". IEEE Transactions on Computers.

  • The following patent extends this work, to use redundant form arithmetic throughout the processor, and so avoid carry propagation overhead even in ALU operations, or when an ALU operation is bypassed into a memory address:

United States Patent 5,619,664, Processor with architecture for improved pipelining of arithmetic instructions by forwarding redundant intermediate data forms, awarded April 18, 1997, Inventor: Glew; Andrew F. (Hillsboro, OR); Assignee: Intel Corporation (Santa Clara, CA), Appl. No.: 08/402,322, Filed: March 10, 1995

  1. Heald, R.; et al. (1998). "64 kB Sum-Addressed-Memory Cache with 1.6 ns Cycle and 2.6 ns Latency". ISSCC Digest of Technical Papers. pp. 350–351. doi:10.1109/ISSCC.1998.672519.