Topic 4: Performance Measurement
Introduction to Computer Systems Engineering (CPEG 323)
ELEG323-05F\Topic4.ppt

Reading List
- Slides: Topic 4
- Hennessy & Patterson: Chapter 4
- Other papers as assigned in class or homework

Performance
- An attempt to quantify how well a particular computer can perform a user's applications
- Problems:
  - Essentially a software + hardware issue
  - Different machines have different strengths and weaknesses
  - There is an enormous amount of hype and outright deception in the market, so be wary

Conflicting Goals
- User: find the most suitable machine to get the job done at the lowest cost
  - Application-oriented metrics
- Vendor: persuade you to buy their machine regardless of your needs
  - Hardware-oriented metrics

Why Study Performance?
- Know the vocabulary and understand the issues, so that:
  - As a user/buyer, you can make better purchasing decisions
  - As an engineer, you can make better hardware/software design decisions

Summary of Metrics
- Latency and throughput
- CPU time, CPI, clock rate and instruction count
- MIPS, relative MIPS
- SPEC ratio and rate
- Benchmarks

Latency vs. Throughput
- These are two very different metrics!
- Latency: How long does it take to get a particular task done?
  - Also called execution time or running time
  - Usually measured in time (e.g., microseconds)
- Throughput: How many tasks can you perform in a unit of time?
  - Also related to bandwidth (communication channels, storage)
  - Usually measured in units per time (e.g., megabytes/second)
- Relationship between them

Performance Expressed as Time
- Absolute time measures
  - Difference between the start and finish of an operation
  - Synonyms: running time, elapsed time, completion time, execution time, response time, latency
- Relative (normalized) time measures
  - Running time normalized to some reference time

Choosing a Time-Based Performance Metric
- Guiding principle: choose performance measures that track running time
- $\text{Performance} = 1 / \text{Execution time}$
  - Higher performance means it takes less time to run the application, so bigger is better

The Nature of Execution Time
- Execution time on a computer is typically divided into:
  - User time: time spent executing instructions in the user code
  - System time: time spent executing instructions in the kernel on behalf of the user code (e.g., opening files)
  - Other: time when the system is idle or executing other programs
- Use the "time" and "top" commands in Unix to see these

Illustration of Execution Time
- Real or "wall clock" time is the sum of all three

CPU Time vs. Latency
- CPU time: the time the CPU spends computing the given task, not including time spent waiting for I/O or running other programs
  - Also known as CPU execution time
- Consists of user CPU time and system CPU time
  - User CPU time: total time the CPU spends in the task
  - System CPU time: total time the CPU spends in the operating system on behalf of the task

Application Metrics vs. Hardware Metrics
- How do you relate the application-oriented performance measurements to what is going on inside the machine?
- Most processors are synchronous, so we can use the clock as a basis.

Clock Cycles
- Clock "ticks" refer to clock edges (rising or falling)
- Cycle time (period) = time between ticks = seconds per cycle
- Clock rate (frequency) = cycles per second (1 Hz = 1 cycle/sec)
- What is the cycle time of a 2 GHz clock?

Measuring Time
- If you're lucky, you can count clock cycles directly: some CPUs have a built-in counter that increments every clock cycle.
- If you're not, you have to use a slower clock. Most systems have extra hardware that generates a regular tick, and many operating systems count these ticks for you.
- Timing accuracy is limited by the resolution of the clock: you get less accurate readings from a 1 Hz clock than from a 1 MHz clock!

Cycles and Instructions
- In almost all processors, a single instruction (executing one line of assembly code) requires more than one clock cycle. Either:
  - One instruction must finish before the next can begin, or
  - Consecutive instructions may overlap ("pipelining")
- In most processors, different types of instructions may take different numbers of cycles (e.g., integer vs. floating point)

Relating Cycles and Instructions
- So we can add the following to our vocabulary:
  - Cycles per instruction (CPI): smaller is better
  - Instructions per cycle (IPC): bigger is better
- If the number of cycles to execute one instruction varies with the instruction, then the average CPI or IPC of a program depends on how many instructions of each type are executed.

Clock, CPI and Instruction Count
- Clock rate: hardware technology and organization
- CPI: instruction set architecture
- Instruction count: instruction set architecture and compiler technology
- CPI should be measured rather than looked up in manuals. Why? It is affected by many factors (e.g., cache/memory).
- The most important measure is time: a lower instruction count may increase the CPI or the clock cycle time.

Example
- A program requires executing 100 million instructions on a processor that typically takes 2 CPI with a 2 GHz clock. How much time will the program take?

Answer
- Or you can work backwards from a known execution time and clock rate to calculate the CPI for a given program.

How to Improve the Performance?
- Reduce the number of instructions to execute
- Increase the number of instructions per cycle
  - Concurrent execution of instructions
- Increase the clock rate

Weighted CPI
- Sometimes it is useful in designing the CPU to calculate the total number of CPU clock cycles as
$$\text{CPU clock cycles} = \sum_{i=1}^{n} \left(\mathrm{CPI}_i \times I_i\right)$$

Weighted CPI (cont'd)
- where $I_i$ is the number of times instructions of type $i$ are executed in a program and $\mathrm{CPI}_i$ is the average number of clock cycles for instructions of type $i$.
- This form can be used to express CPU time as
$$\text{CPU time} = \frac{\sum_{i=1}^{n} \left(\mathrm{CPI}_i \times I_i\right)}{\text{Clock rate}}$$

CPI Should Be Measured
- CPI should be measured, not just calculated from a table in the back of a reference manual.
- Always bear in mind that the real measure of computer performance is time.

Hardware-Oriented Metrics
- Clock rate and IPC are often combined into various figures of merit:
  - MIPS (Millions of Instructions Per Second), pronounced "mips"
  - MOPS (Millions of Operations Per Second), pronounced "mops"
  - MFLOPS (Millions of Floating-point Operations Per Second), pronounced "megaflops" and sometimes written "megaFLOPS"
- Replace the first letter with K (kilo), G (giga), T (tera), P (peta), etc., as appropriate

Problems with Hardware-Oriented Metrics
- Processors with different ISAs may require a different number of instructions to perform the same task, so MIPS ratings are hard to compare
  - MOPS and MFLOPS are a somewhat better measure
  - How do you count floating-point divides?
- Vendors usually report "peak" rates

MIPS Calculation
- One alternative to time as the metric is MIPS, or million instructions per second.
- For a given program, MIPS is simply the number of instructions executed, in millions, divided by the execution time in seconds.

Limitations of MIPS
- Meaningful only for comparing machines with the same ISA, the same program, and the same input
  - Instruction capability is not considered
- May vary inversely with performance!
  - Instruction count is an absolute number that does not consider the frequency of each instruction class

MIPS - What May Go Wrong with It?
- A number of popular measures have been adopted in the quest for a standard measure of computer performance, with the result that a few innocent terms have been twisted from their well-defined environment and forced into a service for which they were never intended.

Misleading Performance Measurement
$$\text{MIPS} = \frac{\text{Instruction count}}{\text{Execution time} \times 10^{6}}$$
- MIPS1 = ?
- MIPS2 = ?

Key: Execution Time of Real Programs
- The authors' position is that the only consistent and reliable measure of performance is the execution time of real programs, and that all proposed alternatives to time as the metric or to real programs as the items measured have eventually led to misleading claims or even mistakes in computer design.

What is MIPS?
- "Meaningless Indication of Processor Speed" (Bob Estall, Computer, 1987)

MIPS Is Not A Multidimensional Measure
- A computer system is multidimensional and therefore should be measured by some "vector"
- MIPS is a scalar: it measures only one dimension
- MIPS is a very useful measure within its dimension
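
The advice to judge machines by the execution time of real programs, and the split of execution time into user, system and "other" time from The Nature of Execution Time, can be tried out with a minimal Python sketch. It assumes a single-threaded, CPU-bound workload; `measure` and `busy_sum` are hypothetical names, and `busy_sum` only stands in for a real program. `os.times()` reports process CPU time that most systems count in operating-system ticks, so short runs may show 0 for user or system time, which is the clock-resolution problem from Measuring Time.

```python
import os
import time


def measure(task, *args):
    """Run task(*args) once; return its result and real/user/system times in seconds."""
    wall_start = time.perf_counter()  # high-resolution wall-clock ("real") time
    cpu_start = os.times()            # CPU time consumed by this process so far
    result = task(*args)
    cpu_end = os.times()
    wall_end = time.perf_counter()
    times = {
        "real": wall_end - wall_start,
        "user": cpu_end.user - cpu_start.user,
        "system": cpu_end.system - cpu_start.system,
    }
    return result, times


def busy_sum(n):
    """Hypothetical CPU-bound workload standing in for a real program."""
    return sum(i * i for i in range(n))


if __name__ == "__main__":
    _, t = measure(busy_sum, 2_000_000)
    # "Other" time (idle, I/O waits, other programs); clamped at zero because
    # the CPU-time counters tick more coarsely than perf_counter.
    other = max(0.0, t["real"] - t["user"] - t["system"])
    print(f"real {t['real']:.3f} s  user {t['user']:.3f} s  "
          f"sys {t['system']:.3f} s  other {other:.3f} s")
```

Comparing the real time with user + system time is the same distinction the slides draw between latency and CPU time; the Unix `time` command reports the same three numbers for a whole process.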
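
The Latency vs. Throughput and Cycles and Instructions slides can be tied together with a small model. This sketch assumes an idealized k-stage pipeline in which every stage takes one cycle and nothing stalls; those simplifications are for illustration only, and the function names are hypothetical. Under them, the latency of each instruction stays at k cycles, but overlapping consecutive instructions pushes throughput toward one instruction per cycle.

```python
def unpipelined_cycles(n: int, stages: int) -> int:
    """One instruction must finish before the next begins: n * k cycles."""
    return n * stages


def pipelined_cycles(n: int, stages: int) -> int:
    """Ideal k-stage pipeline (one cycle per stage, no stalls): k + (n - 1) cycles."""
    return 0 if n == 0 else stages + (n - 1)


def report(label: str, n: int, stages: int, total_cycles: int) -> str:
    latency = stages               # cycles from start to finish of one instruction
    throughput = n / total_cycles  # instructions completed per cycle
    return f"{label}: latency {latency} cycles, throughput {throughput:.3f} instr/cycle"


if __name__ == "__main__":
    n, k = 1000, 5
    print(report("unpipelined", n, k, unpipelined_cycles(n, k)))
    print(report("pipelined", n, k, pipelined_cycles(n, k)))
    # unpipelined: latency 5 cycles, throughput 0.200 instr/cycle
    # pipelined: latency 5 cycles, throughput 0.996 instr/cycle
```

The two runs have identical latency per instruction but very different throughput, which is why the slides insist these are two very different metrics.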

Clock Cycles (answer)
- Cycle time of a 2 GHz clock:
$$X = \frac{1\ \text{sec}}{2 \times 10^{9}\ \text{cycles}} \times \frac{10^{9}\ \text{nsec}}{1\ \text{sec}} = 0.5\ \text{nsec}$$
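
The cycle-time arithmetic can be checked in a few lines of Python. This is a minimal sketch with illustrative helper names; `cycles_to_seconds` shows how a reading from a cycle counter (as mentioned under Measuring Time) would be converted to time, assuming the counter advances once per cycle at a fixed clock rate.

```python
def cycle_time_ns(clock_rate_hz: float) -> float:
    """Cycle time (period) in nanoseconds: 1 / clock rate, scaled from seconds to ns."""
    return 1e9 / clock_rate_hz


def cycles_to_seconds(cycles: int, clock_rate_hz: float) -> float:
    """Convert a cycle count to seconds, assuming one count per cycle at a fixed rate."""
    return cycles / clock_rate_hz


if __name__ == "__main__":
    print(cycle_time_ns(2e9))                     # 0.5
    print(cycles_to_seconds(3_000_000_000, 2e9))  # 1.5
```

The same reciprocal gives the resolution of a tick-based timer: `cycle_time_ns(1e6)` is 1000 ns (1 µs) for a 1 MHz tick, versus a full second for a 1 Hz tick.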

Hardware-Oriented Metrics (prefixes)
- ... or even E (exa), Z (zetta), ...

Answer (Example)
$$\text{CPU time} = \frac{\text{Instruction count} \times \text{CPI}}{\text{Clock rate}}$$
$$1 \times 10^{8}\ \text{instructions} \times \frac{2\ \text{cycles}}{\text{instruction}} \times \frac{1\ \text{second}}{2 \times 10^{9}\ \text{cycles}} = 0.1\ \text{seconds}$$
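
The CPU-time equation in the answer above, and the "work backwards" variant from the Answer slide, can be sketched as two small Python functions. The function names are hypothetical; the inputs are the example's 10^8 instructions, CPI of 2 and 2 GHz clock. The last three lines apply each lever from How to Improve the Performance? in isolation.

```python
def cpu_time_s(instruction_count: float, cpi: float, clock_rate_hz: float) -> float:
    """CPU time = instruction count * CPI / clock rate (seconds)."""
    return instruction_count * cpi / clock_rate_hz


def cpi_from_time(exec_time_s: float, clock_rate_hz: float, instruction_count: float) -> float:
    """Work backwards: CPI = (execution time * clock rate) / instruction count."""
    return exec_time_s * clock_rate_hz / instruction_count


if __name__ == "__main__":
    t = cpu_time_s(100e6, 2, 2e9)
    print(f"CPU time: {t:.2f} s")                      # CPU time: 0.10 s
    print(f"CPI: {cpi_from_time(t, 2e9, 100e6):.2f}")  # CPI: 2.00
    # Each improvement lever on its own halves the CPU time:
    print(f"{cpu_time_s(50e6, 2, 2e9):.3f} s")   # half the instructions -> 0.050 s
    print(f"{cpu_time_s(100e6, 1, 2e9):.3f} s")  # half the CPI (double IPC) -> 0.050 s
    print(f"{cpu_time_s(100e6, 2, 4e9):.3f} s")  # double the clock rate -> 0.050 s
```

In practice the three factors interact (a faster clock or fewer instructions can raise CPI), which is why CPI should be measured rather than assumed.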

MIPS Calculation (formula)
$$\text{MIPS} = \frac{\text{Instruction count}}{\text{Execution time} \times 10^{6}} = \frac{\text{Clock rate}}{\text{CPI} \times 10^{6}}$$

Misleading Performance Measurement (worked example)
Clock rate: 500 MHz

  Instruction counts (in billions) for each instruction class
  Code from       A     B     C
  Compiler 1      5     1     1
  Compiler 2     10     1     1

  CPI             1     2     3

$$\text{Execution time} = \frac{\sum_{i} \left(\mathrm{CPI}_i \times I_i\right)}{\text{Clock rate}}$$
$$\text{Execution time}_1 = \frac{(1 \times 5 + 2 \times 1 + 3 \times 1) \times 10^{9}}{500 \times 10^{6}} = 20\ \text{s}$$
$$\text{Execution time}_2 = \frac{(1 \times 10 + 2 \times 1 + 3 \times 1) \times 10^{9}}{500 \times 10^{6}} = 30\ \text{s}$$
$$\text{MIPS}_1 = \frac{(5 + 1 + 1) \times 10^{9}}{20 \times 10^{6}} = 350$$
$$\text{MIPS}_2 = \frac{(10 + 1 + 1) \times 10^{9}}{30 \times 10^{6}} = 400$$
- Compiler 2's code has the higher MIPS rating, yet it takes longer to run.
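
The compiler comparison can be reproduced in Python to confirm that the weighted-CPI execution time and the MIPS rating rank the two compilers in opposite orders. This is a minimal sketch: the numbers are the example's table values, while the dictionary layout and the function names are illustrative.

```python
CLOCK_RATE_HZ = 500e6  # 500 MHz, as in the example

# Average CPI of each instruction class (the CPI row of the table)
CPI = {"A": 1, "B": 2, "C": 3}

# Instruction counts, in billions, per class for the code each compiler emits
COUNTS_BILLIONS = {
    "Compiler 1": {"A": 5, "B": 1, "C": 1},
    "Compiler 2": {"A": 10, "B": 1, "C": 1},
}


def execution_time_s(counts_billions, cpi, clock_rate_hz):
    """Weighted CPI: execution time = sum_i(CPI_i * I_i) / clock rate."""
    cycles = sum(cpi[cls] * n * 1e9 for cls, n in counts_billions.items())
    return cycles / clock_rate_hz


def mips(counts_billions, exec_time_s):
    """MIPS = instruction count / (execution time * 10^6)."""
    instructions = sum(counts_billions.values()) * 1e9
    return instructions / (exec_time_s * 1e6)


if __name__ == "__main__":
    for name, counts in COUNTS_BILLIONS.items():
        t = execution_time_s(counts, CPI, CLOCK_RATE_HZ)
        print(f"{name}: {t:.0f} s, {mips(counts, t):.0f} MIPS")
    # Compiler 1: 20 s, 350 MIPS
    # Compiler 2: 30 s, 400 MIPS
```

The extra class-A instructions are cheap (CPI 1), so they inflate the instruction count, and with it MIPS, faster than they add to the total cycle count.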

Guang R. Gao, 2005/10/23
# $%   !81/Xr$߿ߝXoifr$g\Oy Ekii <AA`MMM(@MMM8ʚ;ʚ;g4BdBdp 04ppp@ <4!d!d k 0T4<4dddd k 0T4 <4BdBd l 0T00___PPT10 ___PPT9nu=!BWk@~PNG  IHDRF} PLTE3:tRNS@f cmPPJCmp0712Om9IDATc``b $<&40(Zжj˂AtM iIENDB`. 225 !?B+ .\ELEG323-05F\Topic4.pptO =S !Topics 4: Performance Measurement""c0 "  Reading List c(  VSlides: Topic4 Henn & Patt: Chapter 4 Other papers as assigned in class or homework  "  "  "  " . " xVccV   Performance c(  0 An attempt to quantify how well a particular computer can perform a user s applications Problems: Essentially a software+hardware issue Different machines have different strengths and weaknesses There is an enormous amount of hype and outright deception in the market  be waryYZ2 Z -Z[cccc  Conflicting Goalsc(   User: Find the most suitable machine to get the job done at the lowest cost Application-oriented metrics Vendor: Persuade you to buy their machine regardless of your needs hardware-oriented metricslx2_xccHccc=cc  Why Study Performance?c(   Know the vocabulary and understand the issues, so that: As a user/buyer, you can make better purchasing decisions As an engineer, you can make better hardware/software design decisionn92: - -F -:cc  71Summary of Metricsc(  yLatency and throughput CPU time, CPI, clock rate and instruction count MIPS, relative MIPS SPEC ratio and rate Benchmarkszza z  Latency vs. Throughputc(  These are two very different metrics! Latency: How long does it take to get a particular task done? - Also called execution time or running time - Usually measured in time (e.g., microseconds) Throughput: How many tasks can you perform in a unit of time? 
- Also related to bandwidth (communication channels, storage) - Usually measured in units per time (e.g., megabytes/ second) Relationship between them &>_ - - -.c7c`c cccgg   Performance Expressed as TimePc(  Absolute time measures Difference between start and finish of an operation Synonyms: running time, elapsed time, completion time, execution time, response time, latency Relative (normalized) time measures Running time normalized to some reference time  "  -$ " 0 - -cc$c1c   (Choosing a Time-Based Performance Metric)P)c( ) dGuiding principle: choose performance measures that track running time Performance Higher performance means it takes less time to run the applicatio  !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~ fBj  !"#$%&'()*+,-./0123456789:;<=CDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abh  hted CPICPI Should Be MeasuredHardware-Oriented Metrics(Problems with Hardware-Oriented MetricsMIPS CalculationLimitations of MIPS#MIPS - What May Go Wrong with It ?#Misleading Performance Measurement%Key: Execution Time of Real ProgramsWhat is MIPS?'MIPS Is Not A Multidimensional Measure  Fonts UsedDesign Template Slide Titles!4 _fLongLongGuang R. Gew RomanTTXܖ 0ܖ DWingdingsRomanTTXܖ 0ܖ0De0}fԚingsRomanTTXܖ 0ܖ@DArialngsRomanTTXܖ 0ܖPDSymbolgsRomanTTXܖ 0ܖ A.  @n?" dd@  @@`` >%-DH -   . # $%   !81/Xr$߿ߝXoifr$g\Oy Ekii <AA`MMM(@MMM8ʚ;ʚ;g4BdBdp 04ppp@ <4!d!d k 0T4<4dddd k 0T4 <4BdBd l 0T00___PPT10 ___PPT9nu=!BWk@~PNG  IHDRF} PLTE3:tRNS@f cmPPJCmp0712Om9IDATc``b $<&40(Zжj˂AtM iIENDB`. 225 !?B+ .\ELEG323-05F\Topic4.pptO =S !Topics 4: Performance Measurement""c0 "  Reading List c(  VSlides: Topic4 Henn & Patt: Chapter 4 Other papers as assigned in class or homework  "  "  "  " . 
" xVccV   Performance c(  0 An attempt to quantify how well a particular computer can perform a user s applications Problems: Essentially a software+hardware issue Different machines have different strengths and weaknesses There is an enormous amount of hype and outright deception in the market  be waryYZ2 Z -Z[cccc  Conflicting Goalsc(   User: Find the most suitable machine to get the job done at the lowest cost Application-oriented metrics Vendor: Persuade you to buy their machine regardless of your needs hardware-oriented metricslx2_xccHccc=cc  Why Study Performance?c(   Know the vocabulary and understand the issues, so that: As a user/buyer, you can make better purchasing decisions As an engineer, you can make better hardware/software design decisionn92: - -F -:cc  71Summary of Metricsc(  yLatency and throughput CPU time, CPI, clock rate and instruction count MIPS, relative MIPS SPEC ratio and rate Benchmarkszza z  Latency vs. Throughputc(  These are two very different metrics! Latency: How long does it take to get a particular task done? - Also called execution time or running time - Usually measured in time (e.g., microseconds) Throughput: How many tasks can you perform in a unit of time? 
- Also related to bandwidth (communication channels, storage) - Usually measured in units per time (e.g., megabytes/ second) Relationship between them &>_ - - -.c7c`c cccgg   Performance Expressed as TimePc(  Absolute time measures Difference between start and finish of an operation Synonyms: running time, elapsed time, completion time, execution time, response time, latency Relative (normalized) time measures Running time normalized to some reference time  "  -$ " 0 - -cc$c1c   (Choosing a Time-Based Performance Metric)P)c( ) dGuiding principle: choose performance measures that track running time Performance Higher performance means it takes less time to run the application, so bigger is betterB " XcZc   The Nature of Execution Timec(  Execution time on a computer is typically divided into: User time: Time spent executing instructions in the user code System time: Time spent executing instructions in the kernel on behalf of the user code (e.g., opening files) Other: Time when the system is idle or executing other programs Use  time and  top commands in Unix to see these8>n@3Bc6c cdccAcccc c [  Illustration of Execution Timec(  f Real or  wall clock time is the sum of all three44c 3 82CPU Time vs. Latencyc(  M- The time CPU spends for computing the given task, not including the time waiting for I/O or running other programs. Also known as CPU execution time -Consists of user CPU time and system CPU time. User CPU time: Total time CPU spends in the task System CPU time: Total time CPU spends in operating system for the sake of the task.vv"0 -vc"c0c c  N  (Application Metrics vs. Hardware Metrics)P)c( )  How do you relate the application-oriented performance measurements to what is going on inside the machine? 
Most processors are synchronous, so we can use the clock as a basis.@xccc   Clock Cycles c(  Clock  ticks refer to clock edges (rising or falling) Cycle time (period) = time between ticks = seconds per cycle Clock rate (frequency) = cycles per second (1 Hz = 1 cycle/sec) A 2GHz clock has a cycle time ofh " ;cg(*cg(Kc  Measuring Timec(  >If you re lucky, you can count clock cycles directly; some CPUs have a built-in counter which increments every clock cycle. If you re not, you have to use a slower clock. Most systems have extra hardware which generates a regular tick; many operating systems will count these ticks for you. Timing accuracy limited by the resolution of the clock  you get less accurate readings off a 1Hz clock than a 1MHz clock!j| " x " ( " x " ({ " xc  Cycles and Instructionsc$  In almost all processors, a single instruction (executing one line of assembly code) requires more than one clock cycle. Either: - One instruction must finish before the next can begin - Consecutive instructions may overlap ( pipelining ) In most processors, different types of instructions may take different numbers of cycles (e.g., integer vs. floating point)  "  " 9 - -8 -| "  " qcp   Relating cycles and Instructions!P!c( ! So we can add the following to our vocabulary: Cycles per instruction (CPI)  smaller is better Instruction per cycle (IPC) bigger is better If the cycles to execute one instruction vary depending on the instruction, then the average CPI or IPC of a program will depend on how many of each type of instruction is executed.01 "  "  " 0ccg(.cg(c F  Clock, CPI and Instruction Count!P!c( !  Clock rate - Hardware technology and organization CPI - Instruction set architecture Instruction - Instruction set architecture and count compiler technology - CPI should be measured, instead of check  Manuals Why? ( affected by many factors, e.g Cache/memory, etc.) - The most important is time : lower inst. 
count may increase instruction clock cycle time Fc g+c g"c gcgc z Examplec(   A program requires executing 100 million instructions on a processor which typically takes 2 CPI with a 2GHz clock. How much time will the program take?xc  Answerc(  p Or you can work backwards from a known execution times and clock rate to calculate the CPI for a given program.qZqc p ;5How to Improve the Performance? P c(  Reduce the number of instructions to execute Increase the number of instructions per cycle Concurrent execution of instructions Increase clock rate- "  " (. " % "  " .c.c%cc  4. Weighted CPI P c(   Sometimes it is useful in designing the CPU to calculate the number of total CPU clock cycles as CPU clock cycles = (CPIi * Ii)fxckckc  5/ Weighted CPI P c(   Where Ii represents number of times instruction of type i is executed in a program and CPIi represents the average number of clock cycles for instruction of type i. This form can be used to express CPU time as CPU time =( (CPIi * Ii))/clock ratexck0cc!ckccckckcc c  60CPI Should Be Measured g(   CPI should be measured and not just calculated from a table in the back of a reference manual Always bear in mind that the real measure of computer performance is time. ^_x2Mxcgc  Hardware-Oriented Metricsc(  "Clock rate and IPC are often combined into various figures of merit: MIPS (Millions of Instructions Per Second)  pronounced  mips MOPS (Millions of Operations Per Second)  pronounced  mops MFLOPS (Millions of Floating-point Operations Per Second)  pronounced  megaflops and sometimes written  megaFLOPS Replace first letter with K (kilo), G (giga), T (tera), P (peta), etc., as appropriate. xE( " x " 2Z " Fcg(;cg(9cg(ccc cccccc c  'Problems with Hardware-Oriented Metrics(P(c( ( Processors with different ISAs may require a different number of instructions to perform the same task, so MIPS hard to compare - MOPS and MFLOPS are a somewhat better measure - How do you count floating-point divides? 
Vendors usually report  peak ratesf "  " 2] " x " 2$ " c  "MIPS CalculationPc(  | One alternative to time as the metric is MIPS, or million instructions per second. For a given program, MIPS is simply nx(<*cgNck | *$Limitations of MIPSc(  -Meaningful only for comparing machines with same ISA, same program, and same input Instruction capability not considered -May vary inversely with performance! Instruction count is an absolute number without considering the frequency of each instruction classzTx&x&xdxTc&c&cd c    "MIPS - What May Go Wrong with It ?##c(&      A number of popular measures have been adopted in the quest for a standard measure of computer performance, with the result that a few innocent terms have been twisted from their well-defined environment and forced into a service for which they were never intended.B xcguc   93"Misleading Performance Measurement##c( # :-MIPS=instruction count/(execution time*106) MIPS1= MIPS2=-*ckcckckc ; !$Key: Execution Time of Real Programs%P%c( % ` The authors position is that the only consistent and reliable measure of performance is the execution time of real programs, and that all proposed alternatives to time as the metric or to real programs as the items measured have eventually led to misleading claims or even mistakes in computer design. d1^cgcgc 0 &  What is MIPS?c(   Meaningless Indication of Processor Speed - Bob Estall Computer, 1987N/;P<cc c i '!&MIPS Is Not A Multidimensional Measure'P'c( ' |A computer system is multidimensional - therefore should be measured by some  vector ; MIPS is a scalar - measures only one dimension; MIPS is a very useful measure within it s dimension.dW "  " (1 "  " (5 " c    0 <4(    3 rī gֳgֳ ?? W     3 rū gֳgֳ ?@ (    H  0޽h ? ̙33y___PPT10Y+D=' = @B +rD [ x;1_(` / 0DTimes New RomanTTXܖ 0ܖDTahomaew RomanTTXܖ 0ܖ DWingdingsRomanTTXܖ 0ܖ0De0} ՜.+,0l    On-screen ShowUDelf! 
(Times New RomanTahoma Wingdings ¼wArialSymbol BlueprintSlide 1 Reading List PerformanceConflicting GoalsWhy Study Performance?Summary of MetricsLatency vs. ThroughputPerformance Expressed as Time)Choosing a Time-Based Performance MetricThe Nature of Execution TimeIllustration of Execution TimeCPU Time vs. Latency)Application Metrics vs. Hardware Metrics Clock CyclesMeasuring TimeCycles and Instructions!Relating cycles and Instructions!Clock, CPI and Instruction CountExampleAnswer How to Improve the Performance? Weighted CPI Weighted CPICPI Should Be MeasuredHardware-Oriented Metrics(Problems with Hardware-Oriented MetricsMIPS CalculationLimitations of MIPS#MIPS - What May Go Wrong with It ?#Misleading Performance Measurement%Key: Execution Time of Real ProgramsWhat is MIPS?'MIPS Is Not A Multidimensional Measure  Fonts UsedDesign Template Slide Titles!$_f 0Guang R. GaoGuang R. Gn, so bigger is betterB " XcZc   The Nature of Execution Timec(  Execution time on a computer is typically divided into: User time: Time spent executing instructions in the user code System time: Time spent executing instructions in the kernel on behalf of the user code (e.g., opening files) Other: Time when the system is idle or executing other programs Use  time and  top commands in Unix to see these8>n@3Bc6c cdccAcccc c [  Illustration of Execution Timec(  f Real or  wall clock time is the sum of all three44c 3 82CPU Time vs. Latencyc(  M- The time CPU spends for computing the given task, not including the time waiting for I/O or running other programs. Also known as CPU execution time -Consists of user CPU time and system CPU time. User CPU time: Total time CPU spends in the task System CPU time: Total time CPU spends in operating system for the sake of the task.vv"0 -vc"c0c c  N  (Application Metrics vs. Hardware Metrics)P)c( )  How do you relate the application-oriented performance measurements to what is going on inside the machine? 
- Most processors are synchronous, so we can use the clock as a basis.

Clock Cycles
- Clock "ticks" refer to clock edges (rising or falling)
- Cycle time (period) = time between ticks = seconds per cycle
- Clock rate (frequency) = cycles per second (1 Hz = 1 cycle/sec)
- A 2 GHz clock has a cycle time of $\frac{1}{2 \times 10^{9}\,\text{Hz}} = 0.5 \times 10^{-9}\,\text{s} = 0.5\,\text{ns}$

Measuring Time
- If you're lucky, you can count clock cycles directly; some CPUs have a built-in counter that increments every clock cycle.
- If you're not, you have to use a slower clock. Most systems have extra hardware that generates a regular tick; many operating systems will count these ticks for you.
- Timing accuracy is limited by the resolution of the clock: you get less accurate readings from a 1 Hz clock than from a 1 MHz clock!

Cycles and Instructions
- In almost all processors, a single instruction (one line of assembly code) requires more than one clock cycle to execute. Either:
  - one instruction must finish before the next can begin, or
  - consecutive instructions may overlap ("pipelining").
- In most processors, different types of instructions may take different numbers of cycles (e.g., integer vs. floating point).

Relating Cycles and Instructions
- So we can add the following to our vocabulary:
  - Cycles per instruction (CPI): smaller is better
  - Instructions per cycle (IPC): bigger is better
- If the number of cycles needed to execute an instruction varies by instruction type, then the average CPI or IPC of a program depends on how many instructions of each type are executed.

Clock, CPI and Instruction Count
- Clock rate: hardware technology and organization
- CPI: instruction set architecture
- Instruction count: instruction set architecture and compiler technology
- CPI should be measured rather than looked up in the manuals. Why? It is affected by many factors (e.g., cache/memory behavior).
- The most important metric is time: a lower instruction count may come with a higher CPI or a longer clock cycle time.

Example
- A program requires executing 100 million instructions on a processor that typically takes 2 cycles per instruction (CPI = 2) with a 2 GHz clock. How much time will the program take?

Answer
- $\text{CPU time} = \frac{\text{Instruction count} \times \text{CPI}}{\text{Clock rate}} = \frac{(100 \times 10^{6}) \times 2}{2 \times 10^{9}\,\text{Hz}} = 0.1\,\text{s}$
- Or you can work backwards from a known execution time, instruction count, and clock rate to calculate the CPI for a given program.

How to Improve the Performance?
- Reduce the number of instructions to execute
- Increase the number of instructions per cycle
  - Concurrent execution of instructions
- Increase the clock rate

Weighted CPI
- Sometimes it is useful in designing the CPU to calculate the total number of CPU clock cycles as
  $\text{CPU clock cycles} = \sum_{i=1}^{n} \left(\text{CPI}_i \times I_i\right)$

Weighted CPI
- Where $I_i$ represents the number of times instructions of type $i$ are executed in a program, $\text{CPI}_i$ represents the average number of clock cycles for an instruction of type $i$, and $i$ ranges over the $n$ instruction types.
- This form can be used to express CPU time as
  $\text{CPU time} = \frac{\sum_{i=1}^{n} \left(\text{CPI}_i \times I_i\right)}{\text{Clock rate}}$

CPI Should Be Measured
- CPI should be measured, not just calculated from a table in the back of a reference manual.
- Always bear in mind that the real measure of computer performance is time.

Hardware-Oriented Metrics
- Clock rate and IPC are often combined into various figures of merit:
  - MIPS (Millions of Instructions Per Second), pronounced "mips"
  - MOPS (Millions of Operations Per Second), pronounced "mops"
  - MFLOPS (Millions of Floating-point Operations Per Second), pronounced "megaflops" and sometimes written "megaFLOPS"
- Replace the first letter with K (kilo), G (giga), T (tera), P (peta), etc., as appropriate.

Problems with Hardware-Oriented Metrics
- Processors with different ISAs may require a different number of instructions to perform the same task, so MIPS ratings are hard to compare
  - MOPS and MFLOPS are a somewhat better measure
  - How do you count floating-point divides?
- Vendors usually report "peak" rates

MIPS Calculation
- One alternative to time as the metric is MIPS, or millions of instructions per second. For a given program, MIPS is simply
  $\text{MIPS} = \frac{\text{Instruction count}}{\text{Execution time} \times 10^{6}} = \frac{\text{Clock rate}}{\text{CPI} \times 10^{6}}$

Limitations of MIPS
- Meaningful only for comparing machines with the same ISA, same program, and same input
- Instruction capability not considered
- May vary inversely with performance!
- Instruction count is an absolute number that does not take into account the frequency of each instruction class

MIPS - What May Go Wrong with It?
- A number of popular measures have been adopted in the quest for a standard measure of computer performance, with the result that a few innocent terms have been twisted from their well-defined environment and forced into a service for which they were never intended.

Misleading Performance Measurement
- $\text{MIPS} = \frac{\text{Instruction count}}{\text{Execution time} \times 10^{6}}$
- MIPS1 =
- MIPS2 =

Key: Execution Time of Real Programs
- The authors' position is that the only consistent and reliable measure of performance is the execution time of real programs, and that all proposed alternatives to time as the metric or to real programs as the items measured have eventually led to misleading claims or even mistakes in computer design.

What is MIPS?
- "Meaningless Indication of Processor Speed" (Bob Estall, Computer, 1987)

MIPS Is Not A Multidimensional Measure
- A computer system is multidimensional and therefore should be measured by some "vector"; MIPS is a scalar that measures only one dimension.
- MIPS is a very useful measure within its dimension.
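The Measuring Time slide notes that timing accuracy is limited by the resolution of the clock you read. As a minimal sketch, assuming Python 3 and its standard `time` module (neither is mentioned in the slides), the code below asks Python for the resolution it reports for a few of its clocks, then times one call with an elapsed-time clock (`perf_counter`) and a CPU-time clock (`process_time`, which counts user plus system CPU time of the process). It does not read a hardware cycle counter; `clock_resolution_s` and `measure` are hypothetical helper names, and the reported resolutions depend on the platform.

```python
import time


def clock_resolution_s(clock_name: str) -> float:
    """Resolution, in seconds, that Python reports for one of its clocks
    (e.g. "time", "monotonic", "perf_counter", "process_time")."""
    return time.get_clock_info(clock_name).resolution


def measure(work, *args):
    """Time a single call with two standard clocks.

    perf_counter() approximates elapsed (wall-clock) time, while
    process_time() counts only the CPU time (user + system) of this process.
    """
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    work(*args)
    wall_elapsed = time.perf_counter() - wall_start
    cpu_elapsed = time.process_time() - cpu_start
    return wall_elapsed, cpu_elapsed


if __name__ == "__main__":
    for name in ("time", "perf_counter", "process_time"):
        print(f"{name:>13}: resolution {clock_resolution_s(name):.1e} s")
    wall, cpu = measure(sum, range(1_000_000))
    print(f"wall {wall:.6f} s, cpu {cpu:.6f} s")
```

A coarse clock (large reported resolution) cannot meaningfully time a call that finishes in less than a few of its ticks, which is the slide's 1 Hz vs. 1 MHz point.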
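The Clock Cycles, Clock, CPI and Instruction Count, Example, and Answer slides all rest on one relation: CPU time = instruction count × CPI × cycle time = instruction count × CPI / clock rate. The sketch below is a small, hedged illustration of that arithmetic with plain floats (no units library) and hypothetical helper names, applied to the Example's numbers: 100 million instructions, a CPI of 2, and a 2 GHz clock.

```python
def cycle_time_s(clock_rate_hz: float) -> float:
    """Cycle time (period) in seconds is the reciprocal of the clock rate."""
    return 1.0 / clock_rate_hz


def cpu_time_s(instruction_count: float, cpi: float, clock_rate_hz: float) -> float:
    """CPU time = instruction count * CPI / clock rate
    (equivalently, instruction count * CPI * cycle time)."""
    return instruction_count * cpi / clock_rate_hz


def cpi_from_measurement(cpu_time: float, instruction_count: float,
                         clock_rate_hz: float) -> float:
    """Work backwards: CPI = CPU time * clock rate / instruction count."""
    return cpu_time * clock_rate_hz / instruction_count


if __name__ == "__main__":
    clock_rate = 2e9        # 2 GHz clock, as in the Example slide
    instructions = 100e6    # 100 million instructions
    print(cycle_time_s(clock_rate))                  # 5e-10 (0.5 ns)
    elapsed = cpu_time_s(instructions, 2, clock_rate)
    print(elapsed)                                   # 0.1 (seconds)
    print(cpi_from_measurement(elapsed, instructions, clock_rate))  # 2.0
```

Working backwards, as the Answer slide suggests, only means solving the same relation for CPI, which is all `cpi_from_measurement` does.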
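The Weighted CPI slides compute total CPU clock cycles as the sum of CPI_i × I_i over instruction classes, and divide by the clock rate to get CPU time. The sketch below implements those two formulas, plus the average CPI (and its reciprocal, IPC) from the Relating Cycles and Instructions slide. The function names are hypothetical, and the three-class instruction mix in the usage lines is invented purely to exercise the formulas; it is not data from the slides.

```python
from typing import Sequence, Tuple

# One entry per instruction class i: (CPI_i, I_i), i.e. the average clock
# cycles for that class and how many times instructions of that class execute.
InstructionMix = Sequence[Tuple[float, int]]


def cpu_clock_cycles(mix: InstructionMix) -> float:
    """CPU clock cycles = sum over i of (CPI_i * I_i)."""
    return sum(cpi_i * count_i for cpi_i, count_i in mix)


def weighted_cpu_time_s(mix: InstructionMix, clock_rate_hz: float) -> float:
    """CPU time = (sum over i of CPI_i * I_i) / clock rate."""
    return cpu_clock_cycles(mix) / clock_rate_hz


def average_cpi(mix: InstructionMix) -> float:
    """Average CPI = total cycles / total instruction count."""
    total_instructions = sum(count_i for _, count_i in mix)
    return cpu_clock_cycles(mix) / total_instructions


def average_ipc(mix: InstructionMix) -> float:
    """IPC is the reciprocal of the average CPI."""
    return 1.0 / average_cpi(mix)


if __name__ == "__main__":
    # Hypothetical mix (not from the slides): ALU, load/store, branch classes.
    mix = [(1, 50_000_000), (2, 30_000_000), (3, 20_000_000)]
    print(cpu_clock_cycles(mix))          # 170000000
    print(average_cpi(mix))               # 1.7
    print(weighted_cpu_time_s(mix, 2e9))  # 0.085 (seconds at 2 GHz)
```

Because the average CPI is weighted by instruction counts, changing the mix changes the average even when every per-class CPI_i stays the same.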
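The Limitations of MIPS slide warns that MIPS may vary inversely with performance. The sketch below applies the slides' formula, MIPS = instruction count / (execution time × 10^6), to its own hypothetical numbers rather than the slide's: three instruction classes with assumed CPIs of 1, 2 and 3, an assumed 100 MHz clock, and two assumed compilations of the same program. All names and values here are illustrative assumptions.

```python
# Hypothetical machine, NOT taken from the slides: assumed CPIs per instruction
# class and an assumed 100 MHz clock.
CPI_BY_CLASS = {"A": 1, "B": 2, "C": 3}
CLOCK_RATE_HZ = 100e6


def mips(instruction_count: float, execution_time_s: float) -> float:
    """MIPS = instruction count / (execution time * 10^6)."""
    return instruction_count / (execution_time_s * 1e6)


def mips_from_cpi(clock_rate_hz: float, cpi: float) -> float:
    """Equivalent form: MIPS = clock rate / (CPI * 10^6)."""
    return clock_rate_hz / (cpi * 1e6)


def evaluate(counts: dict) -> tuple:
    """Return (execution time in seconds, MIPS) for per-class instruction counts."""
    cycles = sum(CPI_BY_CLASS[cls] * n for cls, n in counts.items())
    instruction_count = sum(counts.values())
    execution_time_s = cycles / CLOCK_RATE_HZ
    return execution_time_s, mips(instruction_count, execution_time_s)


# Two hypothetical compilations of the same program for the same machine.
CODE_1 = {"A": 5_000_000, "B": 1_000_000, "C": 1_000_000}
CODE_2 = {"A": 10_000_000, "B": 1_000_000, "C": 1_000_000}

if __name__ == "__main__":
    for label, counts in (("code 1", CODE_1), ("code 2", CODE_2)):
        seconds, rating = evaluate(counts)
        print(f"{label}: {seconds:.3f} s, {rating:.1f} MIPS")
    # code 1: 0.100 s, 70.0 MIPS
    # code 2: 0.150 s, 80.0 MIPS
```

The second compilation executes more of the cheap class-A instructions, so it earns the higher MIPS rating even though it takes 50% longer to run, which is the inversion the slides warn about and why execution time remains the reliable measure.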
Author: Guang R. Gao