IntSim: A CAD tool for Optimization of Multilevel Interconnect Networks
Download IntSim: http://www.monolithic3d.com/simulators.html
Download: IntSim Algorithm and Equations
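This paper describes the working principles, methodology and case studies of IntSim, a computer-aided design (CAD) tool used mainly to optimize the multilevel interconnect networks of semiconductor chips.
At a high level, the paper presents 2 + 4 models. Equivalently, following the order of data processing and data flow in the IntSim algorithm, it divides into 2 + 4 + 1 parts. Three case studies of tool applications also demonstrate its accuracy, reliability and practicality: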
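- Derive a new stochastic wire length distribution model. Its novelty is the assumption that it is uncertain whether a given location on the chip (a gate socket, in the paper's terms) holds a logic gate. In other words, logic gates are randomly rather than uniformly distributed over the chip, and the probability that a socket holds a gate equals the ratio of logic gate area to die area (details below). An interesting consequence is that, unlike earlier wire length distribution models, the new one is a function of logic gate size: for a fixed die area, smaller logic gates can be placed closer together, so the average wire length decreases. In the actual algorithm, the logic gate size is first chosen from the chip's operating frequency, and the stochastic wire length distribution is then derived from that gate size. The interconnects are then optimized bottom-up, in the same order as the fabrication process: 1. local interconnects, 2. intermediate/semi-global interconnects (wires without repeaters), 3. global interconnects, 4. interconnects with repeaters inserted. Finally, power is computed and these pieces are combined by the algorithm.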
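- Describe how to choose the logic gate size from the chip's operating frequency, timing or other requirements; see Eq. (12).
- Optimize local interconnects;
- Intermediate/semi-global interconnects (wires without repeaters);
- Global interconnects;
- Interconnects with repeaters inserted;
- Combine the preceding models (including the power computation) in one algorithm.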
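At a finer level, every physical quantity or semiconductor industry metric used in the main models comes with a derivation or a literature reference.
This article analyzes the theory and algorithms underlying IntSim's models. It roughly follows the paper's order: first each model, then the combining algorithm, based on the paper and on MonolithIC 3D's online documentation.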
Models used in IntSim v2.0
http://www.monolithic3d.com/intsims-models.html
The Models from the Paper
- Logic Gate Model
- Signal Wire Length Distribution for 2D and 3D-ICs
- Local Interconnect Model
- Intermediate and Semi-Global Interconnect Model
- Global Interconnect Model
- Repeater Insertion Model
- Algorithm to combine all the models and design a 2D or 3D-IC
Notation for the Paper
Nsockets : the number of gate sockets.
Ngates : the number of gates.
pgates : the percentage of die area that is occupied by logic gates.
l : interconnect length, measured in gate socket lengths. For example, if two gate sockets have 3 sockets between them, l = 4.
M(l) : the number of gate socket pairs separated by a distance l.
Iexp(l) : the average number of interconnects between a gate socket pair separated by l.
iexp(l) : the expected number of interconnects of length l.
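P(gate in block A) = pgates : the probability that gate socket A holds a logic gate, which equals the fraction of die area occupied by logic gates.
f.o. : fan-out, the average number of gates (or flip-flops and registers) driven by one gate.
k : Rent's constant (average number of terminals per gate)
p : Rent's constant (Rent exponent)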
NA : the number of gates in block A
NB : the number of gates in block B
NC : the number of gates in block C
Lavg : average wire length
td : the delay of a logic path of 2-input NAND gates, each driving a fan-out of f.o.
Ld : logic depth
: a factor that converts point-to-point net length to wiring net length
RNAND : average drive resistance of a minimum-size 2-input NAND gate, obtained from equations given in [18]
CNAND : input capacitance of the NAND gate, computed assuming nMOS and pMOS devices are sized equally in a 2-input NAND gate
Cint : capacitance of an average wire
c: capacitance per unit length of a wire
A: die area
F: feature size
W: width
P : global wire pitch
Npower_pads : the number of power pads
ρ : wire resistivity
: current distributed per pad
dpad_to_pad : distance between adjacent power pads
erouter : routing efficiency factor
VIR : user-specified IR drop limit for global power wiring
lpad : pad length
D : distance between the drive and load of a H tree
Ro : output resistance of a minimum size inverter
Co : input capacitance of a minimum size inverter
: ratio of the maximum rise time allowed for the clock tree to the clock period
f : clock frequency
cclock : clock wire capacitance per unit length
Plocal : local interconnect pitch, 2F
F : feature size
lmax : the length of the longest wire routed in local interconnect levels
eW : wiring efficiency
erouter : the efficiency of the wire routing tool (typically around 0.5)
epower/gnd : fraction of area used by power and ground wires, obtained from the model for local power distribution networks derived in [20].
evias : fraction of area used by vias
Nwires_higher : the number of wires routed in higher metal levels
Nrepeaters : the number of repeater for higher metal levels
: design rule unit
s : via covering factor which is typically 3 [12]
Nwires_higher : found from the stochastic wiring distribution by finding the number of wires whose length is greater than lmax
ρ : wire resistivity (Ω·m)
C : wire capacitance per unit length (F/m)
b : percentage of time circuit is not sleep gated
f : frequency (Hz)
Ro : resistance of minimum sized repeater (Ω)
Co : capacitance of minimum sized repeater (F)
Ileak : leakage of minimum sized repeater (A)
Vdd : supply voltage (V)
: fraction of the clock period allowed for the delay of the longest wire in a pair of levels: 0.25 (short wires), 0.9 (longer wires)
a : activity
ar : Wire aspect ratio
[18] Models in BACPAC: www.eecs.umich.edu/~dennis/bacpac
[20] R. Sarvari, A. Naeemi, P. Zarkesh-Ha, J. Meindl, “Design and optimization for nanoscale power distribution networks in gigascale systems”, Proc. Intl. Interconnect Technology conference, 2007
[12] Q. Chen, J. Davis, P. Zarkesh-Ha, J. Meindl, “A compact physical via blockage model”, Trans. VLSI Systems, Dec. 2000
1. Derivation of Stochastic Wiring Distribution
– A New Stochastic Wire Length Distribution Model
The new wire length distribution considers random arrangement of gates in a circuit block rather than assuming gates are uniformly distributed all over the chip and then finds a distribution of wire lengths using Rent’s rule.
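Before this paper, studies of wire length distribution based on Rent's rule assumed that gates are uniformly distributed over the chip; the new distribution instead treats the gates within a circuit block as randomly placed. A loose analogy: each radish grows in a hole, but not every hole holds a radish. If a vegetable plot has 10 × 10 = 100 holes and 30 radishes grow in randomly chosen holes, each hole has a 30% chance of holding a radish. Note that a gate socket may hold more than one logic gate. (Equation (7) sets NA = 1: does this mean block = socket, so that a socket holds at most one logic gate, or can a block hold several gates?)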
Define a new quantity called a gate socket, and assume the chip is square.
A gate socket can be pictured as a slot at a physical location on the chip: it may or may not hold a logic gate, and the probability that it does equals the fraction of die area occupied by logic gates, as Fig. 1 illustrates. The number of gates, the number of gate sockets and the gate area fraction are related by:
Ngates = Nsockets • pgates (1)
Number of gates = number of gate sockets × fraction of die area occupied by logic gates
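For example:
10M gates = number of gate sockets × 50%
so there are 20M gate sockets, and the 10M gates are randomly distributed among them. If the computed socket count is not an integer, it is rounded to the nearest integer.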
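Eq. (1) and the radish picture can be checked with a few lines of Python. `sockets_for` simply solves Eq. (1) for the socket count; `occupancy_frequency` is a hypothetical helper that drops gates into randomly chosen, distinct sockets and measures how often one particular socket ends up occupied. It assumes at most one gate per socket, which is one reading of Eq. (7) (NA = 1) rather than something the paper states here.

```python
import random


def sockets_for(n_gates: int, p_gates: float) -> int:
    """Eq. (1) solved for N_sockets: N_sockets = N_gates / p_gates, rounded."""
    return round(n_gates / p_gates)


def occupancy_frequency(side: int, n_gates: int, trials: int, seed: int = 0) -> float:
    """Fraction of random placements in which socket (0, 0) receives a gate.

    Toy assumption: each placement puts n_gates gates into distinct sockets of a
    side x side array, chosen uniformly at random (at most one gate per socket).
    """
    rng = random.Random(seed)
    sockets = [(x, y) for x in range(side) for y in range(side)]
    hits = 0
    for _ in range(trials):
        occupied = rng.sample(sockets, n_gates)  # one random placement
        if (0, 0) in occupied:
            hits += 1
    return hits / trials


print(sockets_for(10_000_000, 0.5))         # 20000000 sockets for 10M gates at 50%
print(occupancy_frequency(10, 30, 20_000))  # close to 30 / 100 = 0.3
```

Every socket is equally likely to be occupied, so the estimated frequency approaches Ngates / Nsockets = pgates as the number of trials grows.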
The expected number of interconnects of a certain length l:
$i_{\mathrm{exp}}(l) = M(l)\, I_{\mathrm{exp}}(l)$ (2)
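In words: the expected number of interconnects of a given length equals the number of gate socket pairs separated by that length times the average number of interconnects between one such pair.
The interconnect length l ranges from 1 socket length up to about twice the die side, and each length has its own expected number of interconnects. For example, to find the expected number of interconnects of length 10 μm, multiply the number of socket pairs separated by 10 μm (say, 5) by the average number of interconnects between one such pair (say, 6): 5 × 6 = 30 interconnects of length 10 μm are expected.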
M(l) = f (l, Nsockets) (3)
[16] J. Davis, V. De, J. Meindl, “Apriori wiring estimations and optimal multilevel wiring networks for portable ULSI systems”, Proc. Electronic Components and Technology Conference, 1996
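Following the approach Davis [16] used to derive the number of gate pairs separated by wire length l, the number of gate socket pairs is a piecewise function of the interconnect length and the number of gate sockets. The two intervals are 1 ≤ l < √Nsockets (shorter than the die side, so both sockets can lie anywhere in the interior) and √Nsockets ≤ l < 2√Nsockets (between one and two side lengths, so the pair lies toward the die edges, for example at the two ends of one side or at opposite corners). The exact formula is omitted here. Intuitively, M(l) is determined by the interconnect length and the number of sockets: with all else fixed, more sockets (a larger die) give more socket pairs at a given length. As a function of l, M(l) is not monotone: it first grows, because more combinations of horizontal and vertical offsets add up to a longer distance, then peaks and falls to zero at l = 2√Nsockets.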
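M(l) can be counted exactly on a small socket array, which makes the two intervals concrete. The hypothetical helper below enumerates horizontal and vertical offsets (dx, dy) with dx + dy = l (Manhattan distance, matching the socket-length definition of l) and counts how many unordered socket pairs each offset admits on a side × side array.

```python
from collections import Counter


def socket_pairs_by_distance(side: int) -> Counter:
    """M(l): number of unordered gate-socket pairs at Manhattan distance l
    (in socket lengths) on a side x side array, counted exactly."""
    m = Counter()
    for dx in range(side):
        for dy in range(side):
            if dx == 0 and dy == 0:
                continue
            # Number of positions for a pair with this horizontal/vertical offset.
            count = (side - dx) * (side - dy)
            if dx > 0 and dy > 0:
                count *= 2  # offsets (dx, dy) and (dx, -dy) are different pairs
            m[dx + dy] += count
    return m


m = socket_pairs_by_distance(3)
print(sorted(m.items()))  # [(1, 12), (2, 14), (3, 8), (4, 2)]
```

For 1 ≤ l < side the exact count works out to 2·side²·l − 2·side·l² + (l³ − l)/3, which for large arrays approaches the continuous form 2N·l − 2√N·l² + l³/3 with N = side². It rises with l, peaks near l ≈ (2 − √2)√N ≈ 0.59√N, and is zero beyond the corner-to-corner distance 2(side − 1).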
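A gate socket length is defined as the distance between two adjacent gate sockets and equals $\sqrt{A/N_{\mathrm{sockets}}}$. Davis defines the gate pitch as $\sqrt{A/N_{\mathrm{gates}}}$, so by Eq. (1) one socket length equals $\sqrt{p_{\mathrm{gates}}}$ gate pitches, the square root of the probability that a socket holds a gate.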
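Here is the socket-length conversion in numbers, using the Fig. 4 example (a 36 mm² block with 12M gates and pgates = 0.5). Lengths are in metres, and the helper names are hypothetical.

```python
import math


def gate_pitch(area_m2: float, n_gates: float) -> float:
    """Davis's gate pitch sqrt(A / N_gates), in metres."""
    return math.sqrt(area_m2 / n_gates)


def socket_length(area_m2: float, n_sockets: float) -> float:
    """Gate socket length sqrt(A / N_sockets), in metres."""
    return math.sqrt(area_m2 / n_sockets)


area = 36e-6                   # 36 mm^2 expressed in m^2
n_gates = 12e6
p_gates = 0.5
n_sockets = n_gates / p_gates  # Eq. (1): 24M sockets

pitch = gate_pitch(area, n_gates)        # about 1.732e-6 m
sock = socket_length(area, n_sockets)    # about 1.225e-6 m
print(sock / pitch, math.sqrt(p_gates))  # both about 0.7071
```

A socket is shorter than a gate pitch because the same die area is divided into more sockets than gates.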
Taking Fig. 1 as an example, the average number of interconnects between the socket pair A–C, separated by a distance of 4, is:
(4)
(5)
The clever step is expressing wire length as a number of sockets. This unifies wire length and socket count, which keeps the model intuitive and greatly reduces computation. In the radish analogy, the distance between radishes A and C is measured in units of hole size: if A's hole is 5 holes away from C's, 4 holes lie between them.
(6)
(7)
Combining (2)–(7) and normalizing, we get the average number of interconnects of length l, where l is measured in gate socket lengths (that is, as a count of sockets), to be:
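The average wire length for this interconnect distribution is the total wire length divided by the number of wires, where iexp(l) is the expected number of interconnects of length l:

$$\bar{L} = \frac{\int_{1}^{2\sqrt{N_{\mathrm{sockets}}}} l\, i_{\mathrm{exp}}(l)\, dl}{\int_{1}^{2\sqrt{N_{\mathrm{sockets}}}} i_{\mathrm{exp}}(l)\, dl}$$

Because the number of wires is a function of wire length, the integrals run from the shortest wire, between two adjacent sockets (length 1, one socket length), to the longest, between opposite corners of the die (twice the die side, 2√Nsockets socket lengths).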
For a large number of gates and p>0.5, this expression can be simplified to
(9)
When gates were uniformly distributed over the die area, Davis derived the expression for average wire length to be
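Comparing the two expressions shows that the average length under the new distribution is approximately Davis's average length multiplied by a factor that depends on Rent's constant p and on the fraction of die area occupied by logic gates. In most typical circuit blocks, logic gates occupy 50–75% of the die area.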
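Here is a back-of-envelope consistency check. It is a scaling argument, not the paper's Eq. (9). For p > 0.5 and a large number of sites N, Davis's average wire length grows roughly as N^(p−1/2) site pitches. Treat the new model as Davis's expression evaluated on Nsockets = Ngates/pgates sites, each √pgates gate pitches long (Eq. (1) and the socket-length definition). Under that assumption, the factor follows directly:

```latex
\bar{L}_{\mathrm{new}} \approx \sqrt{p_{\mathrm{gates}}}\,\bar{L}_{\mathrm{Davis}}\!\left(\frac{N_{\mathrm{gates}}}{p_{\mathrm{gates}}}\right)
\;\propto\; p_{\mathrm{gates}}^{1/2}\left(\frac{N_{\mathrm{gates}}}{p_{\mathrm{gates}}}\right)^{p-\frac{1}{2}}
= p_{\mathrm{gates}}^{\,1-p}\,N_{\mathrm{gates}}^{\,p-\frac{1}{2}}
\quad\Longrightarrow\quad
\frac{\bar{L}_{\mathrm{new}}}{\bar{L}_{\mathrm{Davis}}(N_{\mathrm{gates}})} \approx p_{\mathrm{gates}}^{\,1-p}
```

For pgates = 0.5 and p = 0.55 this gives 0.5^0.45 ≈ 0.73, about 27% shorter, which agrees with the Fig. 4 comparison. The argument ignores how pgates also changes Iexp(l) through Eqs. (4)–(7), so treat it only as a sanity check.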
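Fig. 3 compares the measured average wire lengths of 22 ISCAS'89 circuit blocks with the predictions of the Donath distribution [18], the Davis distribution, and the new distribution. The Donath distribution has an average error of 75% and the Davis distribution 38%, while the new model's error is only 8%–24% for pgates in the range 0.5–0.75.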
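The new model therefore predicts average wire length more accurately than both the Donath and the Davis distributions.
Table 2 above compares the average wire lengths from the Davis distribution and the new model: on the benchmark circuits, the Davis distribution's average error is 26%, while the new model's is only 2%–12%, again favoring the new model.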
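Fig. 4 compares the new and Davis distributions for a 36 mm² circuit block with 12M logic gates, pgates = 0.5, average fan-out f.o. = 3, and Rent's constants k = 4 and p = 0.55. Equations (8) and (9) show that the average wire length of the new model is 27% shorter than that of the Davis distribution. Fig. 4b, plotted on a linear scale, shows that the two distributions differ most for short wires.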
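The average-length definition can also be evaluated numerically for the Fig. 4 parameters (Ngates = 12M, pgates = 0.5, p = 0.55). This sketch uses Davis's continuous distribution shape, (l³/3 − 2√N·l² + 2N·l)·l^(2p−4) below one die side and (2√N − l)³/3·l^(2p−4) above it. It approximates the new model as that same shape on Nsockets sites, rescaled by √pgates. That is an assumption which ignores the pgates dependence of Iexp(l) in Eqs. (4)–(7), so this is not the paper's Eq. (8). The function names are hypothetical.

```python
import math


def davis_density(l: float, n: float, p: float) -> float:
    """Unnormalized Davis-style wire length density on n sites, l in site pitches.

    The constant prefactor is omitted because it cancels in the average length.
    """
    s = math.sqrt(n)  # die side, in site pitches
    if 1.0 <= l < s:
        shape = l**3 / 3 - 2 * s * l**2 + 2 * n * l
    elif s <= l <= 2 * s:
        shape = (2 * s - l) ** 3 / 3
    else:
        return 0.0
    return shape * l ** (2 * p - 4)


def average_length(n: float, p: float, steps: int = 20_000) -> float:
    """L_avg = int l*i(l) dl / int i(l) dl over [1, 2*sqrt(n)], in site pitches.

    Trapezoid rule on a log-spaced grid (l = e^t, dl = l dt), because the density
    falls steeply near l = 1; the common step size cancels in the ratio.
    """
    t_max = math.log(2 * math.sqrt(n))
    num = den = 0.0
    for j in range(steps + 1):
        l = math.exp(t_max * j / steps)
        weight = 0.5 if j in (0, steps) else 1.0  # trapezoid end points
        d = davis_density(l, n, p) * l            # extra l from dl = l dt
        num += weight * l * d
        den += weight * d
    return num / den


n_gates, p_gates, p = 12e6, 0.5, 0.55
davis = average_length(n_gates, p)                        # in gate pitches
n_sockets = n_gates / p_gates                             # Eq. (1)
new = average_length(n_sockets, p) * math.sqrt(p_gates)   # socket lengths -> gate pitches
print(davis, new, new / davis, p_gates ** (1 - p))
```

At this finite N the ratio comes out a little above the asymptotic pgates^(1−p) ≈ 0.73, because the constant terms in the integrals are not yet negligible.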
2. Logic Gate Modeling
Logic gate paths are modeled as multiple stages of 2-input NAND gates between flip-flops, as shown in Fig. M4: Logic path model.
(10)
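An interesting point: unlike earlier wire length distributions, the new distribution is a function of logic gate size. For a fixed die area, smaller logic gates can be placed closer together, so the average wire length decreases.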
(10)
becomes
(12)
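The gate-sizing idea can be sketched as follows. Eq. (12) picks the gate size so that a logic path of Ld NAND stages fits in the clock period, where each stage drives f.o. gates plus one average wire. The paper's Eqs. (10)–(12) are not reproduced here. The single-lump RC stage delay below is an assumption (no wire resistance, no 0.69 factor), the function names are hypothetical, and the numbers are made up. In IntSim the average wire capacitance comes from the wire length distribution, which itself depends on gate size; this sketch holds it fixed.

```python
from typing import Optional


def stage_delay(w: float, r_nand: float, c_nand: float, fan_out: float, c_int: float) -> float:
    """Assumed lumped RC delay of one NAND stage of relative size w: a driver of
    resistance r_nand / w charging fan_out inputs of w * c_nand plus a wire c_int."""
    return (r_nand / w) * (fan_out * w * c_nand + c_int)


def size_gates(f_clk: float, margin: float, logic_depth: int, r_nand: float,
               c_nand: float, fan_out: float, c_int: float) -> Optional[float]:
    """Smallest w for which logic_depth stages fit in the usable part of the cycle,
    or None if even arbitrarily large gates are too slow."""
    t_stage = (1 - margin) / (f_clk * logic_depth)  # time budget per stage
    intrinsic = r_nand * fan_out * c_nand            # delay floor as w grows
    if t_stage <= intrinsic:
        return None
    # Solve intrinsic + r_nand * c_int / w = t_stage for w.
    return r_nand * c_int / (t_stage - intrinsic)


w = size_gates(f_clk=2e9, margin=0.2, logic_depth=12, r_nand=10e3,
               c_nand=0.5e-15, fan_out=3, c_int=2e-15)
print(w)  # about 1.09 times the minimum size
```

Because the stage delay falls as w grows but never below r_nand·f.o.·c_nand, a target frequency can be infeasible, which the function reports as None.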
3. Global Interconnect Modeling
Refer to the Notation section and Fig. 5 for the parameters in Equation (13).
4. Local Interconnect Modeling
IntSim has two wire levels for routing local signal, power and clock wiring.
(14)(15)
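In essence, the left-hand side of Eq. (14) is the routing area available in the two local interconnect levels, and the right-hand side is the area needed to route all wires with lengths from 1 to lmax gate socket lengths.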
(16)
See the Notation section for the definitions of the parameters in these equations.
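The local-level balance can be sketched as a cumulative sum: add wire lengths from l = 1 upward until the two local levels run out of routing capacity. This sketch rests on several assumptions. Eq. (14) is written here as total wire length χ·Σ l·(socket length)·count ≤ 2·eW·A / Plocal, that is, area divided by pitch. `counts` is an input list of expected wires per length, from whatever distribution is used. eW is taken as given, although in IntSim it comes from Eqs. (15)–(16) via erouter, epower/gnd and evias. The function name is hypothetical.

```python
def local_lmax(counts, area, feature, e_w, chi, socket_len):
    """Longest wire length (in socket lengths) that fits in the two local levels.

    counts[l] is the expected number of wires of length l (counts[0] unused).
    Assumed balance: chi * sum_{l=1}^{lmax} l * socket_len * counts[l]
                     <= 2 * e_w * area / p_local, with p_local = 2F.
    """
    p_local = 2 * feature
    capacity = 2 * e_w * area / p_local  # total wire length the two levels can hold
    used = 0.0
    lmax = 0
    for l in range(1, len(counts)):
        used += chi * l * socket_len * counts[l]
        if used > capacity:
            break
        lmax = l
    return lmax


# Toy numbers in arbitrary but consistent units: capacity = 2*0.5*200/1 = 200,
# cumulative demand = 100 (l=1), 200 (l=2), 260 (l=3).
print(local_lmax([0, 100, 50, 20], area=200, feature=0.5, e_w=0.5, chi=1, socket_len=1))  # 2
```

The returned lmax is what the next step uses as lmin for the first pair of intermediate levels.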
5. Intermediate and Semi-global Interconnect Modeling
Intermediate and semi-global interconnect modeling is based on Eqs. (17) and (18).
(17)
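The left-hand side of Eq. (17) is the routing area available in a pair of wiring levels; the right-hand side is the area needed to route the wires assigned to that pair, whose lengths run from lmin to lmax gate socket lengths.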
P : the pitch of the pair of wiring levels.
Equation (18) states that the delay of the longest wire in a pair of metal levels should be a certain fraction of the clock period; see [8].
(18a)
Eq. (18a) applies when no repeaters are inserted.
(18b)
Eq. (18b) applies once repeaters are inserted, following the Energy-Delay Product minimization strategy [4]. The wire width equals the spacing between wires.
The wiring efficiency factor for intermediate and semi-global levels has three sources: (i) Repeater via blockage due to repeaters in higher metal levels (ii) Via blockage to signal wires routed in higher levels that is modeled based on [12] (iii) Power/ground via blockage that is got from equations in [20]. Wire resistivity increases due to size effects are modeled as shown in [21].
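Equations (17) and (18a) together fix both the pitch and lmax of a pair: a finer pitch fits more wire (area side) but raises wire resistance (delay side). The sketch below finds where the two limits meet and then chains pairs bottom-up, each pair's lmax becoming the next pair's lmin. Every piece is an assumption or a substitute. The area side is written as χ·Σ l·(socket length)·count ≤ 2·eW·A / P rather than the exact Eq. (17). The delay side swaps in the common distributed-RC estimate 0.4·r·c·L², with width = spacing = P/2 and driver and load ignored, in place of Eq. (18a). The wire counts are a toy power law, the function names are hypothetical, and all numbers are invented.

```python
import math


def area_lmax(counts, lmin, pitch, area, e_w, chi, socket_len):
    """Largest l such that wires of length lmin+1..l (socket lengths) fit in a
    pair of levels at this pitch; returns lmin if nothing fits."""
    capacity = 2 * e_w * area / pitch  # routable wire length of the pair, m
    used, l = 0.0, lmin
    while l + 1 < len(counts):
        need = chi * (l + 1) * socket_len * counts[l + 1]
        if used + need > capacity:
            break
        used, l = used + need, l + 1
    return l


def delay_lmax(pitch, rho, c, ar, f_clk, xi, socket_len):
    """Longest wire (socket lengths) with 0.4 * r * c * L^2 <= xi / f_clk, where
    width = spacing = pitch / 2 and thickness = ar * width."""
    w = pitch / 2
    r = rho / (ar * w * w)  # resistance per unit length, ohm/m
    return math.sqrt(xi / (f_clk * 0.4 * r * c)) / socket_len


def select_pitch(area_limit, delay_limit, lo=1e-9, hi=1e-5, iters=100):
    """Geometric bisection for the pitch where the limits meet (area_limit falls,
    delay_limit rises with pitch; assumes [lo, hi] brackets the crossing).
    Returns (pitch, lmax) on the side where delay_limit >= area_limit."""
    for _ in range(iters):
        mid = math.sqrt(lo * hi)
        if delay_limit(mid) >= area_limit(mid):
            hi = mid
        else:
            lo = mid
    return hi, area_limit(hi)


def assign_pairs(counts, lmin, area_kw, delay_kw, max_pairs=8):
    """Bottom-up chaining: each pair's lmax becomes the next pair's lmin."""
    pairs = []
    while lmin < len(counts) - 1 and len(pairs) < max_pairs:
        pitch, lmax = select_pitch(
            lambda p, start=lmin: area_lmax(counts, start, p, **area_kw),
            lambda p: delay_lmax(p, **delay_kw),
        )
        if lmax <= lmin:  # nothing more fits: stop
            break
        pairs.append((pitch, lmin, lmax))
        lmin = lmax
    return pairs


# Toy inputs (all hypothetical): power-law wire counts, 1 cm^2 die, 1 um sockets.
counts = [0.0] + [1e8 * l ** -1.9 for l in range(1, 2000)]
area_kw = dict(area=1e-4, e_w=0.4, chi=1.0, socket_len=1e-6)
delay_kw = dict(rho=2.2e-8, c=2e-10, ar=2.0, f_clk=1e9, xi=0.25, socket_len=1e-6)
for pitch, lo, hi in assign_pairs(counts, 0, area_kw, delay_kw):
    print(f"pitch {pitch * 1e9:.0f} nm: wires of {lo + 1} to {hi} socket lengths")
```

With these toy inputs the upper pair comes out at a coarser pitch than the lower one, as expected for levels that carry longer wires.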
6. Algorithm to Combine All the Models and Design a 2D or 3D-IC
The key challenge with combining all the models shown thus far is:
- The design of power interconnects and their area allocation depends on the chip power. However, chip power is not known until repeaters are designed in the multilevel wiring network, especially in sub-90nm chips where repeaters consume a significant fraction of total power.
- Design of the interconnect stack needs some knowledge of via blockage caused by repeaters.
Thus, an iterative process is followed for assigning wires in a multilevel wiring network. Simulating a 2D or 3D-IC, including selecting the wire pitches of the different interconnect levels, proceeds in the following steps:
- Input all parameters: The user inputs various details of the system that is being modeled.
- Logic gate sizing: Logic gates are sized using Equation (12) so that the clock frequency target is met. Here the margin is the fraction of the clock cycle lost to skew and process variations.
When computing the logic gate size W, note the trade-off: a smaller W packs gates more densely, so wires become shorter, but the weaker drive lowers the achievable frequency. Logic gate paths are modeled as multiple stages of 2-input NAND gates between flip-flops, as shown in Fig. M4: Logic path model. An average-length wire, whose length is determined from the wire length distribution model, lies between two NAND gate stages. NAND gate sizes are determined from these conditions using the equations shown in Fig. M5: Equations for determining average gate size. While this is an approximate method of obtaining logic gate sizes, area and power, it is simple and widely used, and is therefore adopted for IntSim v2.0. See [15][18] for more details of this model.
- Generation of stochastic wiring distribution: Based on the logic gate size chosen in the logic gate sizing step, the fraction of die area occupied by logic gates, pgates, is found. This is used to generate the stochastic wiring distribution given in Equation (8). The IntSim CAD tool relies on predicting the distribution of wire lengths on a chip and using it to determine gate sizes and multilevel interconnect networks. The values of Rent's constants k and p are determined from previous generations of a given chip. For example, various generations of Intel Pentium chips, such as the Pentium, Pentium Pro and Pentium 4, all had similar Rent's constants [17][18]. It is generally accepted that custom chips such as microprocessors have Rent's constant p = 0.55 while standard-cell ASICs have p = 0.65 [31], although values vary case by case. Rent's constant k is typically around 4.
- Set baseline parameters for iterations: The power wiring depends on chip power, which is unknown until repeaters are designed, and the interconnect stack depends on via blockage caused by repeaters (the challenges listed above), so IntSim iterates. An initial chip power estimate is set (say, 100 W) and the number of repeaters is set to 0.
- Local interconnect modeling: The local wire pitch is set to 2F, where F is the minimum feature size. Equations (14), (15) and (16) give the length of the longest signal wire routed in M1 and M2 (see also Fig. M11). IntSim v2.0 has two metal levels for each device layer to route local signal, power and clock interconnect networks; these levels are also designed to remove heat from transistors in stacked device layers. Power wire efficiency is obtained from the local power grid model derived in [26] and from the considerations on the thermal models page. Via blockage to higher metal levels and via blockage due to repeaters are modeled using equations derived in [27].
- Arrangement of wires without repeaters: Once the longest wire routed in M1/M2 is determined in the local interconnect step, it is set as lmin, the lower limit of the integral in Equation (17). Equations (17) and (18a) are then used to find the pitch of M3/M4 and the maximum wire length routed in them. This in turn is set as lmin for the next pair of metal levels, and the process continues until the longest interconnect of the wiring distribution is assigned a pitch.
Intermediate and Semi-Global Interconnect Model (this step is also called the wire arrangement stage without repeaters)
A 2D-IC, or each device layer of a 3D-IC, in IntSim v2.0 has its own set of intermediate and semi-global interconnects. The wire pitches of these interconnect levels are obtained using the equations shown in Fig. M12, whose Eqs. (1), (2) and (3) correspond to Eqs. (17), (18a) and (18b) of the paper. The right-hand side of Eq. (1) denotes the area required for routing wires in a pair of wire levels, and the left-hand side denotes the area available for routing; P is the pitch of the pair of wiring levels. Eq. (2) and Eq. (3) represent the condition that the delay of the longest wire in a pair of metal levels should be a certain fraction of the clock period, as discussed in [24]. For short wires, whose delay is typically dominated by the logic gates, this fraction is set to 0.25; for long wires, whose delay is dominated by the wire, it is typically around 0.8–0.9. Eq. (2) represents this criterion when no repeaters are inserted, while Eq. (3) represents the case when repeaters are inserted with the Energy-Delay Product minimization strategy discussed in the repeater models section. The wire width equals the spacing between wires. The wiring efficiency factor for intermediate and semi-global levels has two sources. The first is via blockage due to vias to higher metal levels and due to repeaters, modeled based on [27]. The second is power via blockage, modeled based on [26]. Wire resistivity increases due to size effects are modeled as shown in [26].
- Global interconnect modeling: A top-down process of global interconnect pitch selection and repeater insertion then begins. Global wire pitch is constrained to be the value found from Equation (13). The area needed for routing power wires is then found from equations given in [13], and this helps calculate the area available for signal wires in global wire levels. Clock wire area is neglected in IntSim because previous work has shown it is small [22]. Repeaters are inserted into these global signal wires, and the shortest signal wire routed in global wire levels is found based on a formula similar to Equation (17).
- Assignment of wires with repeaters: Based on the length of shortest global signal wire, wires with repeaters are assigned to the pair of metal levels below the global wire levels based on Equations (17) and (18b). The pitch and shortest wire lmin are found for this pair of wiring levels and this lmin is set as lmax for the pair of wiring layers below it. Repeater insertion is performed for the pair of wiring layers below it and this keeps continuing till one runs out of die area for placing more repeaters or till the addition of repeaters does not improve wire delay.
- Power computation and iteration: Once repeaters are assigned, the total chip power is calculated. Logic gate power is found using the device widths calculated in the logic gate sizing step and formulae given in [18]. Local clock power is computed by extending the models in [23]. Wire power is calculated from the stochastic wiring distribution [8], and repeater power from the repeater assignment step and the repeater power models given in [24]. Leakage power variability is modeled as discussed in [25]. If the calculated total power differs from the power estimate used to design the power distribution wiring, IntSim sets the estimate to the calculated power and returns to the local interconnect modeling step. For the next iteration, the number of repeaters is set to the value found in the assignment of wires with repeaters. The loop is needed because which wires carry repeaters is not known at the local interconnect stage (repeaters cause via blockage, among other effects), and repeaters are only determined at the global stage; iterating makes the repeater count and power estimate consistent.
- Data output: When the simulation converges, the total number of wire levels, pitches of each wire level and a power estimate are output.
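The power and repeater iteration described in these steps is a fixed-point loop. In this sketch `design_wiring` and `total_power` are hypothetical stand-ins: the first for the local, no-repeater, global and repeater-assignment steps, the second for the power computation step. The toy lambdas in the usage line are invented purely to show convergence from the 100 W / 0 repeaters baseline.

```python
def iterate_power(design_wiring, total_power, p_init=100.0, rel_tol=1e-3, max_iter=50):
    """Fixed-point iteration between the chip power estimate and the repeater count.

    design_wiring(p_est, n_rep) -> repeater count from designing the wiring for
        power estimate p_est, given the previous repeater count (via blockage)
    total_power(n_rep) -> total chip power in W
    Returns (power, repeaters, iterations).
    """
    p_est, n_rep = p_init, 0  # baseline: initial power guess, no repeaters
    for it in range(1, max_iter + 1):
        n_rep = design_wiring(p_est, n_rep)
        p_total = total_power(n_rep)
        if abs(p_total - p_est) <= rel_tol * p_est:  # estimate matches result
            return p_total, n_rep, it
        p_est = p_total  # redesign power wiring for the newly computed power
    raise RuntimeError("power estimate did not converge")


# Toy models: more power -> more power wiring -> more repeaters -> more power.
p, n, it = iterate_power(lambda p_est, n_rep: 1000 + 10 * p_est,
                         lambda n_rep: 50 + 0.001 * n_rep)
print(p, n, it)  # converges in a few iterations to about 51.5 W
```

Convergence here relies on the toy coupling being weak; a real design flow would also cap the number of iterations, as `max_iter` does.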