

|       |                                                             |                   |                | Internal Assess   | smen      | t Test 1       |                |       |       |      |     |    |
|-------|-------------------------------------------------------------|-------------------|----------------|-------------------|-----------|----------------|----------------|-------|-------|------|-----|----|
|       |                                                             |                   |                | Septembe          | er 20     | 19             |                |       |       |      |     |    |
| Sub:  | Advanced                                                    | Computer Arc      | hitecture      |                   |           | Sub Code       | 15CS72         | Bra   | anch: | CSE  |     |    |
| Date: | 23/09/20                                                    | 19 Duration:      | 90 mins        | Max Marks:        | 50        | Sem / Sec:     |                | VII   |       | l    | OF  | BE |
|       |                                                             |                   |                |                   |           |                | A              | A,B,C | 1     |      |     |    |
|       |                                                             |                   | Answer an      | y FIVE FULL       |           | I              |                |       | MAF   | RK S | СО  | RB |
|       |                                                             |                   | <u>Qu</u>      | estions           |           |                |                |       |       |      |     | Т  |
| 1 (a) | Explain Fly                                                 | nn's Classific    | ation of Co    | mputer archite    | ctur      | e along with   | neat diagram   | n.    | 08    |      | CO1 | L2 |
|       | Michael Fly                                                 | nn introduced     | a classificati | on of various co  | ompu      | ter architectu | ures           |       |       |      |     |    |
|       | based on no                                                 | tions of instruc  | tion and dat   | a streams.        |           |                |                |       |       |      |     |    |
|       | Single Instr                                                | ction Stream      | Single Data S  | Stream(SISD)      |           |                |                |       |       |      |     |    |
|       | • It is                                                     | uniprocessor s    | system         |                   |           |                |                |       |       |      |     |    |
|       | • Sing                                                      | le instruction    | is executed b  | by CPU in one c   | lock      | cycle.         |                |       |       |      |     |    |
|       | • Inst                                                      | ructions are ex   | ecuted seque   | entially          |           |                |                |       |       |      |     |    |
|       | • Wo                                                        | kstations of D    | EC, MP & I     | BM, IBM 701, 1    | BM        | 1620, IBM 7    | '090 etc.      |       |       |      |     |    |
|       | Single Instr                                                | ction Stream      | Multiple Dat   | a stream( SIMD    | <u>))</u> |                |                |       |       |      |     |    |
|       | • A s                                                       | ngle instructio   | on is execute  | ed by multiple p  | oroce     | ssing elemen   | nts or process | ors.  |       |      |     |    |
|       | Eac                                                         | n processing e    | lement opera   | tes on different  | data      |                |                |       |       |      |     |    |
|       | • Dat                                                       | a level parallel  | ism is achiev  | ved.              |           |                |                |       |       |      |     |    |
|       | • Exa                                                       | mple: Vector      | supercomput    | er in early 1970  | 0 like    | e CDC star -   | 100, Connec    | tion  |       |      |     |    |
|       | Mae                                                         | hine CM2, M       | aspar MP-1,    | IBM 9000, Cray    | y C90     | ), Fujitsu VP  | ).             |       |       |      |     |    |
|       | Multiple Ins                                                | truction Stream   | n Single Dat   | a Stream(MISD     | <u>))</u> |                |                |       |       |      |     |    |
|       | • The                                                       | same data s       | tream flows    | through a line    | ear a     | rray of pro    | cessor execu   | ting  |       |      |     |    |
|       | diff                                                        | erent instruction | on streams. '  | This architectur  | e is      | known as sy    | stolic arrays  | for   |       |      |     |    |
|       | pipe                                                        | lined executio    | n of specific  | instructions.     |           |                |                |       |       |      |     |    |
|       | • Few                                                       | actual examp      | les of this cl | ass of parallel c | omp       | uter have eve  | er existed. On | e is  |       |      |     |    |
|       | the                                                         | experimental C    | Carnegie Mel   | lon C.MPP con     | npute     | r (1971).      |                |       |       |      |     |    |
|       | • Least popular model to be applied in commercial machines. |                   |                |                   |           |                |                |       |       |      |     |    |
|       | Multiple Ins                                                | truction Stream   | n and Multip   | ole Data Stream   | (MIN      | <u>1D)</u>     |                |       |       |      |     |    |
|       | • Mo                                                        | t popular com     | puter model    |                   |           |                |                |       |       |      |     |    |
|       | • Eve                                                       | ry processor      | may be ex      | ecuting differe   | nt ir     | struction st   | ream and ev    | very  |       |      |     |    |
|       | pro                                                         | essor uses diff   | ferent data st | ream.             |           |                |                |       |       |      |     |    |
|       | • The                                                       | y are also calle  | ed as parallel | computers.        |           |                |                |       |       |      |     |    |
|       | • Exa                                                       | mple: IBM 37      | 0/168MP, U     | nivac 1100/80     |           |                |                |       |       |      |     |    |
|       | • Para                                                      | llel computers    | operate in N   | AIMD mode.        |           |                |                |       |       |      |     |    |
|       | • The                                                       | re are two t      | ypes of M      | IMD i.e. shar     | ed r      | nemory mu      | ltiprocessor   | and   |       |      |     |    |
|       | dist                                                        | ibuted memor      | y multicomp    | outer.            |           |                |                |       |       |      |     |    |
|       | • In s                                                      | hared memory      | multiproce     | ssor system all   | proc      | cessors have   | common sha     | ared  |       |      |     |    |





| 2 (b) | Describe AT2 model for VLSI.                                                                                                                                                                                              | (02) | CO1 | L2 |
|-------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------|-----|----|
|       | The parallel computers use VLSI chips for fabricating processor arrays, memory arrays etc.<br>$\underline{AT^2 \text{ Model}}$<br>Let A be the chip area. The latency T is time required from when the inputs are applied |      |     |    |
|       | until all outputs are produced for a single problem instance. Let s be the size of the problem. Then there exists a lower bound $f(s)$ such that $A * T^2 \ge O(f(s))$ .                                                  |      |     |    |
|       | The chip is represented by the base area in the two horizontal dimensions. The vertical                                                                                                                                   |      |     |    |
|       | dimension corresponds to time. Therefore, the three-dimensional solid represents the                                                                                                                                      |      |     |    |
|       | history of the computation performed by the chip.                                                                                                                                                                         |      |     |    |
| 3 (a) | Explain UMA Model and COMA Model for shared memory multiprocessor<br>systems with neat diagram.                                                                                                                           | (06) | CO1 | L3 |
|       | UMA Model(Uniform Memory Access)                                                                                                                                                                                          |      |     |    |
|       | • Here the physical memory is uniformly shared by all the processors.                                                                                                                                                     |      |     |    |
|       | • All processors have equal access time to all memory words, which is why it is                                                                                                                                           |      |     |    |
|       | called uniform memory access. Refer the diagram given below.                                                                                                                                                              |      |     |    |
|       | Processors                                                                                                                                                                                                                |      |     |    |
|       | P <sub>1</sub> P <sub>2</sub> •••• P <sub>n</sub>                                                                                                                                                                         |      |     |    |
|       |                                                                                                                                                                                                                           |      |     |    |
|       | System Interconnect                                                                                                                                                                                                       |      |     |    |
|       | (Bus, Crossbar, Multistage network)                                                                                                                                                                                       |      |     |    |
|       |                                                                                                                                                                                                                           |      |     |    |
|       | VO Shared Memory SM <sub>m</sub>                                                                                                                                                                                          |      |     |    |
|       | The UMA multiprocessor model                                                                                                                                                                                              |      |     |    |
|       |                                                                                                                                                                                                                           |      |     |    |
|       | • The system interconnect is through bus, crossbar or multistage network. When                                                                                                                                            |      |     |    |
|       | all processors have equal access to all peripheral devices, the system is called a                                                                                                                                        |      |     |    |
|       | symmetric multiprocessor. In this case, all the processors are equally capable of running the executive programs, such as OS kernel and I/O service routines.                                                             |      |     |    |
|       | <ul> <li>In asymmetric multiprocessor only master processor can execute the operating</li> </ul>                                                                                                                          |      |     |    |
|       | system and handle I/O. The remaining processors have no I/O capability and thus                                                                                                                                           |      |     |    |
|       | are called Attached Processors. Attached processors execute user codes under the                                                                                                                                          |      |     |    |
|       | supervision of the master processor.                                                                                                                                                                                      |      |     |    |
|       | COMA( Cache Only Memory Architecture) Model                                                                                                                                                                               |      |     |    |
|       | <ul> <li>Here each processor has a cache memory.</li> </ul>                                                                                                                                                               |      |     |    |
|       | <ul> <li>There is no memory hierarchy at each processor node. All the caches form global</li> </ul>                                                                                                                       |      |     |    |
|       | address space.                                                                                                                                                                                                            |      |     |    |
|       | <ul> <li>Remote access to any cache is assisted by distributed cache directory.</li> </ul>                                                                                                                                |      |     |    |
|       | • Whenever data is accessed in remote cache it gets migrated to where it will be                                                                                                                                          |      |     |    |
|       |                                                                                                                                                                                                                           |      |     |    |

| r     | used This reduces the number of reductories and allows a more officient                                                            |      |     |    |
|-------|------------------------------------------------------------------------------------------------------------------------------------|------|-----|----|
|       | used. This reduces the number of redundant copies and allows a more efficient                                                      |      |     |    |
|       | use of memory resources.                                                                                                           |      |     |    |
|       | • Examples: Kendall Square Research's KSR-1 machine.                                                                               |      |     |    |
|       | Interconnection Network                                                                                                            |      |     |    |
|       | The COMA model of a multiprocessor (P: Processor, C: Cache, D: Directory; e.g. the KSR-1)                                          |      |     |    |
| 3 (b) | Describe the Bernstein's conditions of Parallelism.                                                                                | (04) | CO4 | L2 |
| - (-) | Bernstein's Conditions                                                                                                             |      |     |    |
|       | • There are certain conditions when two processes are executed in parallel. I <sub>i</sub>                                         |      |     |    |
|       | (Input set) is the set of input variables for process $P_{i}$ and $O_i$ (Output set)                                               |      |     |    |
|       | consists of all output variables generated after the execution of process $P_i$ . Now,                                             |      |     |    |
|       | consider two processes $P_1$ and $P_2$ with their input sets $I_1$ and $I_2$ and output sets                                       |      |     |    |
|       | $O_1$ and $O_2$ , respectively. These two processes can execute in parallel and are                                                |      |     |    |
|       |                                                                                                                                    |      |     |    |
|       | denoted as P1   P2 if they are independent and follow these three Bernstein's                                                      |      |     |    |
|       | conditions as given below                                                                                                          |      |     |    |
|       | • $I_1 \cap O_2 = \phi$                                                                                                            |      |     |    |
|       | • $I_2 \cap O_1 = \phi$                                                                                                            |      |     |    |
|       | • $O_1 \cap O_2 = \phi$                                                                                                            |      |     |    |
|       | • Bernstein's conditions simply imply that two processes can execute in parallel if                                                |      |     |    |
|       | they are flow-independent, anti-independent, and output-independent.                                                               |      |     |    |
|       | • In general, a set of processes, $P_1$ , $P_2$ ,, $P_K$ can execute in parallel if Bernstein's                                    |      |     |    |
|       | conditions are satisfied on a pair wise basis; that is, $P_1 \parallel P_2 \parallel P_3 \parallel P_4 \parallel \parallel P_K$ if |      |     |    |
|       | and only if $P_i \parallel P_j$ for all $i \neq j$                                                                                 |      |     |    |
| 4     | Explain different types of Dependences in program. Analyze the dependences for                                                     | (10) | CO1 | L4 |
|       | following code segment and draw dependence graph and assume there is only one                                                      |      |     |    |
|       | functional unit for Load and Store. Note M (10) contains value 64.                                                                 |      |     |    |
|       | S1: Load R1,1024                                                                                                                   |      |     |    |
|       | S2: Load R2,M(10)                                                                                                                  |      |     |    |
|       | S3: Add R1,R2                                                                                                                      |      |     |    |
|       | S4: Store M(1024),R1                                                                                                               |      |     |    |
|       | S5: Store M((R2)),1024                                                                                                             |      |     |    |
|       | Data Dependence: There are five types of Data Dependence as shown below                                                            |      |     |    |
|       | • Flow Dependence: A statement S2 is flow dependent on S1 if at least one output                                                   |      |     |    |
|       | of S1 feeds in as input to S2. Flow Dependence is denoted as $S1 \rightarrow S2$                                                   |      |     |    |
|       | <ul> <li>Anti Dependence: Statement S2 is anti-dependent on statement S1 if S2 follows</li> </ul>                                  |      |     |    |
|       | S1 in program order and if the output of S2 overlaps the input to S1. It is denoted                                                |      |     |    |
|       | or in program order and if the output of 52 overlaps the input to 51. It is denoted                                                |      |     |    |

|        | as follows                                                                     |     |
|--------|--------------------------------------------------------------------------------|-----|
|        | S1 +> S2                                                                       |     |
| •      | Output Dependence: Two statements are output-dependent if they produce the     |     |
|        | same output variable. It is denoted as follows                                 |     |
|        | S1 ↔ S2                                                                        |     |
| ٠      | I/O Dependence: The read and write statements are I/O statements. The I/O      |     |
|        | dependence occurs when the same file is referenced by both I/O statements.     |     |
| •      | Unknown dependence: The dependence relation between two statements car         | not |
|        | be determined in the following situations:                                     |     |
|        | > The subscript of a variable is itself subscribed(indirect addressing mode)   |     |
|        | LOAD R1, @100                                                                  |     |
|        | The subscript does not contain the loop index variable.                        |     |
|        | ➢ A variable appears more than once with subscripts having different           |     |
|        | coefficients of the loop variable.                                             |     |
|        | The subscript is nonlinear in the loop index variable.                         |     |
| Contro | 1 Dependence                                                                   |     |
| •      | The conditional statements are evaluated at run time and hence the execution   |     |
|        | path followed would be different.                                              |     |
| ٠      | Different paths taken after a conditional branch may introduce or eliminate da | a   |
|        | dependencies among instructions.                                               |     |
| •      | Dependence may also exist between operations performed in successive           |     |
|        | iterations of a looping procedure. In the following, we show one loop example  | ;   |
|        | with and another without control-dependent iterations.                         |     |
| •      | The following loop has independent iterations                                  |     |
|        | <b>Do</b> $20I = 1, N$                                                         |     |
|        | A(I) = C(I)                                                                    |     |
|        | IF (A(I) .LT. 0) $A(I) = 1$<br>20 Continue                                     |     |
| •      | The following loop has dependent iterations                                    |     |
|        | $Do \qquad 10I = 1, N$                                                         |     |
|        | IF(A(I-1).EQ. 0) A(I) = 0                                                      |     |
|        | 10 Continue                                                                    |     |
| lesou  | ce Dependence                                                                  |     |
| ٠      | The Resource Dependence occurs due to conflicts in using shared resources li   | ĸe  |
|        | integer units, floating point units, register or memory areas etc.             |     |
| •      | When the conflict is due to ALU unit it is called as ALU dependence and whe    | n   |
|        | the conflict is due to storage it is called as storage dependence.             |     |
|        |                                                                                |     |
|        |                                                                                |     |
|        |                                                                                |     |



|       | instructions executed b                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  | y two processors A and B.                                                           | S1 and S2 and L5 and L6 are                                                       |      |     |    |
|-------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------|------|-----|----|
|       | added instructions for I                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 |                                                                                     |                                                                                   |      |     |    |
|       | (G)<br>(G)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               | Cyrde 1<br>Cyrde 2                                                                  |                                                                                   |      |     |    |
|       |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          | Cyde 3                                                                              | L/S: Load/Store operation<br>X: Multiply operation<br>+/-: Add/Subtract operation |      |     |    |
|       | (G-                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      | Cycle 4<br>Cycle 4<br>Added<br>instructions<br>for iPC                              |                                                                                   |      |     |    |
|       |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          | Cycle 6                                                                             |                                                                                   |      |     |    |
|       | To solve the mismatch problem<br>one approach is to develop com<br>redesign for more efficient expl                                                                                                                                                                                                                                                                                                                                                                                                      | pilation support, and the o                                                         | -                                                                                 |      |     |    |
| 6 (a) | Compare RISC and CISC wit<br>distinctions.                                                                                                                                                                                                                                                                                                                                                                                                                                                               | th respect to its character                                                         | istics and its architectural                                                      | (06) | CO2 | L2 |
|       | Architectural<br>Consideration                                                                                                                                                                                                                                                                                                                                                                                                                                                                           | CISC                                                                                | RISC                                                                              |      |     |    |
|       | Instruction set size and format                                                                                                                                                                                                                                                                                                                                                                                                                                                                          | Large set of instructions<br>with variable<br>format(16-64 bits per<br>instruction) | Small set of instructions<br>with fixed format(32 bit per<br>instruction)         |      |     |    |
|       | Addressing modes                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         | 12-24                                                                               | 3-5                                                                               |      |     |    |
|       | General purpose register and cache design                                                                                                                                                                                                                                                                                                                                                                                                                                                                | 8-24 GPR and unified cache                                                          | 32-192 GPR and split cache                                                        |      |     |    |
|       | СРІ                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      | Between 2 and 16                                                                    | Average CPI<1.5                                                                   |      |     |    |
|       | CPU control                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              | Micro programmed                                                                    | Hardwired                                                                         |      |     |    |
| 6 (b) | <ul> <li>Differentiate VLIW and Superscalar Processor.</li> <li>Differences between VLIW and superscalar <ul> <li>The decoding of VLIW instruction is much easier than superscalar instructions.</li> <li>The code density of superscalar machine is better than VLIW when the instruction level parallelism is less.</li> <li>The CPI of VLIW processor can be even lower than that of superscalar processor. Example: Multiflow trace computer allows up to seven operations to</li> </ul> </li> </ul> |                                                                                     |                                                                                   | (04) | CO3 | L2 |







referenced again in the near future. This is often caused by special program constructs

| such as iterative loops, process stacks, temporary variables, or subroutines. Once a loop  |     |
|--------------------------------------------------------------------------------------------|-----|
| is entered or a subroutine is called, a small code segment will be referenced repeatedly   |     |
| many times.                                                                                |     |
| Spatial Locality: This refers to the tendency for a process to access items whose          |     |
| addresses are near one another. For example, operations on tables or arrays involve        |     |
| accesses of a certain clustered area in the address space.                                 |     |
| Sequential Locality: In typical programs, the execution of instructions follows a          |     |
| sequential order unless branch instructions create out-of-order executions. The ratio of i | n-  |
| order execution to out-of-order execution is roughly 5 to 1 in ordinary programs. Beside   | ès, |
| the access of large data array also follows a sequential order.                            |     |
|                                                                                            |     |
|                                                                                            |     |