# 1. Internal Organization of Memory Chips

Memory cells are usually organized in the form of an array, in which each cell is capable of storing one bit of information. The figure is an example of a very small memory chip consisting of 16 words of 8 bits each. Each row of cells constitutes a memory word, and all cells of a row are connected to a common line, referred to as the word line, which is driven by the address decoder on the chip. The cells in each column are connected to a Sense/Write circuit by two bit lines. The Sense/Write circuits are connected to the data input/output lines of the chip.

During a Read operation, these circuits sense, or read, the information stored in the cells selected by a word line and transmit it to the output data lines. During a Write operation, the Sense/Write circuits receive the input information and store it in the cells of the selected word.

The data input and the data output of each Sense/Write circuit are connected to a single bidirectional data line that can be connected to the data bus of a computer. Two control lines, R/W and CS, are provided in addition to the address and data lines. The R/W (Read/Write) input specifies the required operation, and the CS (Chip Select) input selects a given chip in a multichip memory system.

# 2. Internal Organization of a 1K × 1 Memory Chip

Consider now a slightly larger memory circuit, one that has 1K (1024) memory cells. This circuit can be organized as a $128 \times 8$ memory, requiring a total of 19 external connections (7 address, 8 data, 2 control, and 2 for power supply and ground). Alternatively, the same number of cells can be organized into a $1K \times 1$ format. In this case, a 10-bit address is needed, but there is only one data line, resulting in 15 external connections. Figure 5.3 shows such an organization. The required 10-bit address is divided into two groups of 5 bits each to form the row and column addresses for the cell array.
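As a minimal sketch of this address split, the 10-bit address of the $1K \times 1$ chip can be divided into a 5-bit row address and a 5-bit column address. The function name `split_address` is hypothetical, and treating the high-order 5 bits as the row address is an assumption made only for illustration.

```python
ROW_BITS = 5  # selects one of 32 rows
COL_BITS = 5  # selects one of the 32 cells in the chosen row


def split_address(addr):
    """Split a 10-bit cell address into (row, column).

    Assumption: the high-order 5 bits form the row address and the
    low-order 5 bits form the column address.
    """
    if not 0 <= addr < (1 << (ROW_BITS + COL_BITS)):
        raise ValueError("address must fit in 10 bits")
    row = addr >> COL_BITS               # upper 5 bits
    col = addr & ((1 << COL_BITS) - 1)   # lower 5 bits
    return row, col


row, col = split_address(0b1011000111)
print(row, col)
```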
A row address selects a row of 32 cells, all of which are accessed in parallel. However, according to the column address, only one of these cells is connected to the external data line by the output multiplexer and input demultiplexer.

Commercially available memory chips contain a much larger number of memory cells than the examples shown in Figures 5.2 and 5.3. We use small examples to make the figures easy to understand. Large chips have essentially the same organization as Figure 5.3 but use a larger memory cell array and have more external connections. For example, a 4M-bit chip may have a 512K × 8 organization, in which case 19 address and 8 data input/output pins are needed. Chips with a capacity of hundreds of megabits are now available.

Figure 5.3 Organization of a $1K \times 1$ memory chip.

# 3. Asynchronous Memory Chip

Figure 5.7 Internal organization of a 2M × 8 dynamic memory chip.

The cells are organized in the form of a 4K × 4K array. The 4096 cells in each row are divided into 512 groups of 8, so that each row can store 512 bytes of data. Twelve address bits are needed to select a row, and 9 bits are needed to specify a group of 8 bits in the selected row. Thus, a 21-bit address is needed to access a byte in this memory. Twelve bits of the address form the row address and the remaining nine bits form the column address of a byte.

During a Read or a Write operation, the row address is applied first. It is loaded into the row address latch in response to a signal pulse on the Row Address Strobe (RAS) input of the chip. Then a Read operation is initiated, in which all cells on the selected row are read and refreshed. The column address is then applied to the address pins and loaded into the column address latch under the control of the Column Address Strobe (CAS) signal. The information in this latch is decoded and the appropriate group of 8 Sense/Write circuits is selected. If the R/W control signal indicates a Read operation, the output values of the selected circuits are transferred to the data lines D7-0.
For a Write operation, the information on the D7-0 lines is transferred to the selected circuits. This information is then used to overwrite the contents of the selected cells in the corresponding 8 columns.

Applying a row address causes all cells on the corresponding row to be read and refreshed during both Read and Write operations. To ensure that the contents of a DRAM are maintained, each row of cells must be accessed periodically. A refresh counter performs this function automatically.

The timing of the memory device is controlled asynchronously. A specialized memory controller circuit provides the necessary control signals, RAS and CAS. The processor must take into account the delay in the response of the memory. Such memories are referred to as asynchronous DRAMs.

### Advantages

1) DRAMs have high density and low cost, so they are widely used.
2) Available chips range in size from 1M to 256M bits, and larger chips are being developed.
3) A DRAM chip is organized to read or write a number of bits in parallel.
4) This provides flexibility in designing memory systems.

### Fast Page Mode

- Transferring bytes in sequential order is achieved by applying a consecutive sequence of column addresses under the control of successive CAS signals.
- This scheme allows a block of data to be transferred at a faster rate. The block transfer capability is called fast page mode.

# 4. Synchronous DRAM

When the operation of a DRAM is directly synchronized with a clock signal, the memory is known as a synchronous DRAM (SDRAM).

The address and data connections are buffered by means of registers. The output of each sense amplifier is connected to a latch. A Read operation causes the contents of all cells in the selected row to be loaded into these latches. If an access is made for refreshing purposes only, it does not change the contents of these latches; it merely refreshes the contents of the cells. Data held in the latches that correspond to the selected columns are transferred into the data output register.
SDRAMs have several different modes of operation, which can be selected by writing control information into a mode register. For example, burst operations of different lengths can be specified. The burst operations use the same block transfer capability as the fast page mode feature. In SDRAMs, it is not necessary to provide externally generated pulses on the CAS line to select successive columns. The necessary control signals are provided internally using a column counter and the clock signal. New data can be placed on the data lines in each clock cycle. All actions are triggered by the rising edge of the clock.

The figure below shows the timing diagram of a typical burst read. First, the row address is latched under control of the RAS signal. The memory typically takes 2 or 3 clock cycles to activate the selected row. Then, the column address is latched under control of the CAS signal. After a delay of one clock cycle, the first set of data bits is placed on the data lines. The SDRAM automatically increments the column address to access the next three sets of bits in the selected row, which are placed on the data lines in the next 3 clock cycles.

SDRAMs have built-in refresh circuitry. A part of this circuitry is a refresh counter, which provides the addresses of the rows that are selected for refreshing. Commercial SDRAMs can be used with clock speeds above 100 MHz.

### Latency

- Latency refers to the amount of time it takes to transfer a word of data to or from the memory.
- For a transfer of a single word, the latency provides a complete indication of memory performance.
- For a block transfer, the latency denotes the time it takes to transfer the first word of data.

### Bandwidth

- Bandwidth is defined as the number of bits or bytes that can be transferred in one second.
- Bandwidth mainly depends on the speed of access to the stored data and on the number of bits that can be accessed in parallel.
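The burst read timing and the latency and bandwidth definitions above can be tied together with a small calculation. This is only a sketch under a simplified model: the function `burst_read_timing`, the 100 MHz clock, and the one-byte transfer width are assumptions chosen for illustration, and latency is taken here as the time until the first data item appears on the data lines.

```python
def burst_read_timing(clock_mhz, row_activate_cycles=2, cas_delay_cycles=1,
                      burst_length=4, bytes_per_transfer=1):
    """Return (latency_ns, total_ns, bandwidth_mb_per_s) for one burst read.

    Simplified model (an assumption, not a device specification):
    the first data item appears row_activate_cycles + cas_delay_cycles
    after the row address is latched, and each item then occupies the
    data lines for one clock cycle.
    """
    period_ns = 1000.0 / clock_mhz
    latency_ns = (row_activate_cycles + cas_delay_cycles) * period_ns
    total_ns = latency_ns + burst_length * period_ns
    bytes_moved = burst_length * bytes_per_transfer
    # 1 byte per ns equals 1000 MB/s (with 1 MB = 10**6 bytes)
    bandwidth_mb_per_s = bytes_moved / total_ns * 1000.0
    return latency_ns, total_ns, bandwidth_mb_per_s


latency, total, bandwidth = burst_read_timing(clock_mhz=100)
print(latency, total, round(bandwidth, 2))
```

Under these assumptions, a longer burst spreads the fixed row-activation delay over more transfers, which is why bandwidth for block transfers is higher than the single-word latency alone would suggest.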
### Double Data Rate SDRAM (DDR-SDRAM)

- The standard SDRAM performs all actions on the rising edge of the clock signal.
- The double data rate SDRAM transfers data on both edges of the clock (leading edge and trailing edge).
- The bandwidth of DDR-SDRAM is essentially doubled for long burst transfers.
- To make it possible to access the data at a high rate, the cell array is organized into two banks.

# 5. Memory Hierarchy

Fig: Memory Hierarchy

### Types of Cache Memory

- Cache memory is of 2 types:
  - Primary / processor cache (Level 1 or L1 cache)
  - Secondary cache (Level 2 or L2 cache)

Primary cache → It is always located on the processor chip.

Secondary cache → It is placed between the primary cache and the rest of the memory.

- The main memory is implemented using dynamic components (SIMM, RIMM, DIMM).
- The access time for main memory is about 10 times longer than the access time for the L1 cache.

### Speed, Size, Cost

| Characteristics | SRAM      | DRAM           | Magnetic Disk         |
|-----------------|-----------|----------------|-----------------------|
| Speed           | Very fast | Slower         | Much slower than DRAM |
| Size            | Large     | Small          | Small                 |
| Cost            | Expensive | Less expensive | Low price             |

# 6. Cache Memory

The effectiveness of the cache mechanism is based on the property of locality of reference.

### Locality of Reference

- Many instructions in localized areas of the program are executed repeatedly during some time period, and the remainder of the program is accessed relatively infrequently.
- It manifests itself in 2 ways:
  - Temporal (recently executed instructions are likely to be executed again very soon).
  - Spatial (instructions in close proximity to a recently executed instruction are also likely to be executed soon).
- The cache memory stores a reasonable number of blocks at a given time, but this number is small compared to the total number of blocks available in main memory.
- The correspondence between the main memory blocks and the blocks in the cache is specified by a mapping function.
- The cache control hardware decides which block should be removed to create space for the new block that contains the referenced word.
- The collection of rules for making this decision is called the replacement algorithm.
- The cache control circuit determines whether the requested word currently exists in the cache.
- If it exists, the Read/Write operation takes place on the appropriate cache location. In this case a Read/Write hit is said to occur.
- In a Read operation, the main memory is not involved.
- A Write operation can proceed in 2 ways:
  - Write-through protocol
  - Write-back protocol

### Write-through protocol

Here the cache location and the main memory location are updated simultaneously.

### Write-back protocol

- This technique updates only the cache location and marks it as updated with an associated flag bit, called the dirty or modified bit.
- The word in the main memory is updated later, when the block containing this marked word is to be removed from the cache to make room for a new block.

### Read Miss

- If the requested word does not exist in the cache during a Read operation, a read miss occurs.
- The block of words that contains the requested word is copied from the main memory into the cache.
- After the entire block is loaded into the cache, the particular word requested is forwarded to the processor.

### Load-through

- Alternatively, the requested word may be sent to the processor as soon as it is read from the main memory, without waiting for the entire block to be loaded. This approach is called load-through or early restart. It reduces the processor's waiting period at the expense of more complex circuitry.

### Write Miss

- If the requested word does not exist in the cache during a Write operation, a write miss occurs.
- If the write-through protocol is used, the information is written directly into the main memory.
- If the write-back protocol is used, the block containing the addressed word is first brought into the cache, and then the desired word in the cache is overwritten with the new information.

# 7. Performance Considerations and Interleaving

### Performance Considerations

Two key factors of a computer are performance and cost. Performance depends on how fast machine instructions can be brought into the processor for execution and how fast they can be executed. An effective way to introduce parallelism is to use an interleaved organization.

### Interleaving

If the main memory of a computer is structured as a collection of physically separate modules, each with its own address buffer register (ABR) and data buffer register (DBR), memory access operations may proceed in more than one module at the same time. Thus, the aggregate rate of transmission of words to and from the main memory system can be increased.

Two methods of address layout are shown below.

a) Consecutive words in a module.
b) Consecutive words in consecutive modules.

In Fig. (a), the memory address generated by the processor is decoded. The high-order k bits name one of n modules, and the low-order m bits name a particular word in that module. When consecutive locations are accessed, as happens when a block of data is transferred to a cache, only one module is involved. At the same time, however, devices with DMA ability may be accessing information in other memory modules.

Fig. (b) shows a more effective way to address the modules, called memory interleaving. The low-order k bits of the memory address select a module, and the high-order m bits name a location within that module. Consecutive addresses are located in successive modules. Any component of the system that generates requests for access to consecutive memory locations can keep several modules busy at any one time. This results in both faster access to a block of data and higher average utilization of the memory system as a whole.

To implement the interleaved structure, there must be 2<sup>k</sup> modules. Otherwise, there will be gaps of nonexistent locations in the memory address space.

# 8. Replacement Algorithms

- In direct mapping, the position of each block is predetermined, so there is no need for a replacement strategy.
- In the associative and set-associative methods, the block position is not predetermined. When the cache is full and a new block is brought into the cache, the cache controller must decide which of the old blocks is to be replaced.
- When a block is to be overwritten, it is sensible to overwrite the one that has gone the longest time without being referenced. This block is called the least recently used (LRU) block, and the technique is called the LRU algorithm.
- The cache controller tracks the references to all blocks with the help of block counters.

### Example

Consider a set-associative cache with 4 blocks per set.

- A 2-bit counter can be used for each block.
- When a hit occurs, the counter of the referenced block is set to 0. Counters with values originally lower than that of the referenced block are incremented by 1, and all others remain unchanged.
- When a miss occurs and the set is full, the block with counter value 3 is removed, the new block is put in its place with its counter set to 0, and the other block counters are incremented by 1.

### Merit

The performance of the LRU algorithm can be improved by introducing a small amount of randomness in deciding which block is to be overwritten.

(Also explain MRU.)

# 9. Mapping Functions

### Direct Mapping

- It is the simplest technique, in which block j of the main memory maps onto block j modulo 128 of the cache.
- Thus, whenever one of the main memory blocks 0, 128, 256, ... is loaded in the cache, it is stored in cache block 0.
- Blocks 1, 129, 257, ... are stored in cache block 1, and so on.
- Contention may arise:
  - when the cache is full;
  - when more than one memory block is mapped onto a given cache block position.
- The contention is resolved by allowing the new block to overwrite the currently resident block.
- Placement of a block in the cache is determined from the memory address.
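The modulo placement rule above can be sketched in a few lines. The names `cache_block_for`, `load`, and the dictionary `cache` are hypothetical helpers introduced only for illustration; the sketch models block placement and overwriting, not the stored data.

```python
CACHE_BLOCKS = 128  # number of block positions in the cache

cache = {}  # cache block position -> main memory block currently resident


def cache_block_for(memory_block):
    """Direct mapping: main memory block j goes to cache block j mod 128."""
    return memory_block % CACHE_BLOCKS


def load(memory_block):
    """Load a block, overwriting whatever occupies its position.

    Returns the memory block that was overwritten, or None if the
    position was empty.
    """
    position = cache_block_for(memory_block)
    overwritten = cache.get(position)
    cache[position] = memory_block
    return overwritten


for j in (0, 128, 256, 1, 129, 257):
    print(j, "->", cache_block_for(j))
```

Calling `load(0)` followed by `load(128)` shows the contention case: both blocks map to position 0, so the second load overwrites the first even though the rest of the cache is empty.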
Fig: Direct Mapped Cache

The memory address is divided into 3 fields:

- Low-order 4-bit field (word) → Selects one of 16 words in a block.
- 7-bit cache block field → When a new block enters the cache, these 7 bits determine the cache position in which the block must be stored.
- 5-bit tag field → The high-order 5 bits of the memory address of the block are stored in the 5 tag bits associated with its location in the cache.

- As execution proceeds, the high-order 5 bits of the address are compared with the tag bits associated with that cache location.
- If they match, the desired word is in that block of the cache.
- If there is no match, the block containing the required word must first be read from the main memory and loaded into the cache.

### Merit

- It is easy to implement.

### Demerit

- It is not very flexible.

### Associative Mapping

In this method, a main memory block can be placed into any cache block position.

Fig: Associative Mapped Cache

- 12 tag bits identify a memory block when it is resident in the cache.
- The tag bits of an address received from the processor are compared to the tag bits of each block of the cache to see if the desired block is present. This is called associative mapping.
- It gives complete freedom in choosing the cache location.
- A new block that has to be brought into the cache has to replace (eject) an existing block if the cache is full.
- In this method, the tags of all cache blocks must be searched to determine whether a given block is in the cache.
- A search of this kind is called an associative search.

### Merit

It is more flexible than the direct mapping technique.

### Demerit

Its cost is high.

1 : 4096 = 1 : 2<sup>12</sup> (any of the 4096 main memory blocks can occupy a given cache block)

### Set-Associative Mapping

- It is a combination of direct and associative mapping.
- The blocks of the cache are grouped into sets, and the mapping allows a block of the main memory to reside in any block of the specified set.
- In this case, the cache has two blocks per set, so memory blocks 0, 64, 128, ..., 4032 map into cache set 0, and they can occupy either of the two block positions within the set.

6-bit set field → Determines which set of the cache might contain the desired block.

6-bit tag field → The tag field of the address is compared to the tags of the two blocks of the set to check if the desired block is present.

Fig: Set-Associative Mapping

- A cache that has 1 block per set is a direct-mapped cache.
- A cache that has k blocks per set is called a k-way set-associative cache.
- Each block contains a control bit called the valid bit.
- The valid bit indicates whether the block contains valid data.
- The dirty bit indicates whether the block has been modified during its cache residency.

Valid bit = 0 → When power is initially applied to the system.

Valid bit = 1 → When the block is loaded from the main memory for the first time.

- If a main memory block is updated by a source that bypasses the cache and a copy of that block already exists in the cache, the valid bit of the cached block is cleared to 0.
- The need to ensure that the processor and DMA use the same copies of data is called the cache coherence problem.

### Merit

- The contention problem of direct mapping is eased by having a few choices for block placement.
- The hardware cost is decreased by reducing the size of the associative search.

1 : 2<sup>6</sup> (each set is mapped to by 64 blocks of main memory)

# 10. Virtual Memory

- Techniques that automatically move program and data blocks into the physical main memory when they are required for execution are called virtual-memory techniques.
- The binary addresses that the processor issues, for either instructions or data, are called virtual or logical addresses.
- The virtual address is translated into a physical address by a combination of hardware and software components. This address translation is done by the MMU (Memory Management Unit).
- When the desired data are in the main memory, they are fetched immediately.
- If the data are not in the main memory, the MMU causes the operating system to bring the data into memory from the disk.
- Transfer of data between the disk and the main memory is performed using the DMA scheme.

Fig: Virtual Memory Organisation

# 11. Virtual Address Translation

- In address translation, all programs and data are composed of fixed-length units called pages.
- A page consists of a block of words that occupy contiguous locations in the main memory.
- Pages commonly range from 2K to 16K bytes in length.
- The virtual-memory mechanism bridges the size and speed gaps between the main memory and secondary storage, and it is implemented in part by software techniques. (The cache, by contrast, bridges the speed gap between the processor and the main memory and is implemented in hardware.)
- Each virtual address generated by the processor contains a virtual page number (high-order bits) and an offset (low-order bits).

Virtual page number + offset → The offset specifies the location of a particular byte (or word) within a page.

### Page Table

It contains information about the main memory address where the page is stored and the current status of the page.

### Page Frame

- An area in the main memory that holds one page is called a page frame.

### Page Table Base Register

It contains the starting address of the page table.

### Control Bits in Page Table

The control bits specify the status of the page while it is in the main memory.

### Function

- A control bit indicates the validity of the page, that is, whether the page is actually loaded in the main memory.
- Another bit indicates whether the page has been modified during its residency in the memory. This information is needed to determine whether the page should be written back to the disk before it is removed from the main memory to make room for another page.

Fig: Virtual Memory Address Translation

- The page table information is used by the MMU for every read and write access.
- The page table is placed in the main memory, but a copy of a small portion of the page table is located within the MMU.
- This small portion, or small cache, is called the Translation Lookaside Buffer (TLB).
- It consists of the page table entries that correspond to the most recently accessed pages, and it also contains the virtual address of each entry.

# 12. Virtual Memory: Associative-Mapped TLB

Fig: Use of Associative Mapped TLB

- When the operating system changes the contents of the page table, a control bit in the TLB is used to invalidate the corresponding entry in the TLB.
- Given a virtual address, the MMU looks in the TLB for the referenced page.
- If the page table entry for this page is found in the TLB, the physical address is obtained immediately.
- If there is a miss in the TLB, the required entry is obtained from the page table in the main memory and the TLB is updated.
- When a program generates an access request to a page that is not in the main memory, a page fault occurs.
- The whole page must be brought from the disk into the memory before the access can proceed.
- When it detects a page fault, the MMU asks the operating system to intervene by raising an exception (interrupt).
- The operating system suspends the execution of the task that caused the page fault and begins execution of another task whose pages are in the main memory, because a long delay occurs while the page transfer takes place.
- When the task resumes, either the interrupted instruction must continue from the point of interruption or the instruction must be restarted.
- If a new page is brought from the disk when the main memory is full, it must replace one of the resident pages. In that case, the LRU algorithm is used, which removes the least recently used page.
- A modified page has to be written back to the disk before it is removed from the main memory. The write-through protocol is not suitable here, because the access time of the disk is too long to justify frequent writes of small amounts of data; a modified page is written back only when it is replaced.

# 13. Hit Rate & Miss Penalty

The number of hits stated as a fraction of all attempted accesses is called the hit rate, and the miss rate is the number of misses stated as a fraction of attempted accesses. High hit rates, well over 0.9, are essential for high-performance computers.

The extra time needed to bring the desired information into the cache is called the miss penalty. In general, the miss penalty is the time needed to bring a block of data from a slower unit in the memory hierarchy to a faster unit. The miss penalty is reduced if efficient mechanisms for transferring data between the various units of the hierarchy are implemented.

# 14. Working of Disk Controller

- The disk system has 3 parts:
  - Disk platter (usually called the disk)
  - Disk drive (spins the disk and moves the read/write heads)
  - Disk controller (controls the operation of the system)

Figure 5.30 Organization of one surface of a disk.

- Each surface is divided into concentric tracks.
- Each track is divided into sectors.
- The set of corresponding tracks on all surfaces of a stack of disks forms a logical cylinder.
- The data are accessed by specifying the surface number, the track number, and the sector number.
- Read/Write operations start at sector boundaries.
- Data bits are stored serially on each track.
- Each sector usually contains 512 bytes.

Sector header → Contains identification information. It helps to find the desired sector on the selected track.

ECC (error-correcting code) → Used to detect and correct errors.

- An unformatted disk has no information on its tracks.
- The formatting process divides the disk physically into tracks and sectors. This process may discover some defective sectors or even whole tracks.
- The disk controller keeps a record of such defects.
- The disk is divided into logical partitions:
  - Primary partition
  - Secondary partition
- In the diagram, each track has the same number of sectors.
- So all tracks have the same storage capacity.
- Thus the stored information is packed more densely on the inner tracks than on the outer tracks.

### Access time

- There are 2 components involved in the time delay between receiving an address and the beginning of the actual data transfer:
  - Seek time
  - Rotational delay / latency

Seek time → The time required to move the read/write head to the proper track.

Latency → The amount of time that elapses after the head is positioned over the correct track until the starting position of the addressed sector passes under the read/write head.

Seek time + Latency = Disk access time

### Typical disk

One-inch disk: weight = 1 ounce, size comparable to a matchbook, capacity = 1 GB.

A larger disk has the following representative parameters:

- Recording surfaces = 20
- Tracks = 15000 tracks/surface
- Sectors = 400 per track, each storing 512 bytes of data
- Capacity of formatted disk = 20 × 15000 × 400 × 512 ≈ 60 × 10<sup>9</sup> bytes = 60 GB
- Seek time = 3 ms
- Platter rotation = 10000 rev/min
- Latency = 3 ms (the time for half a rotation)
- Internal transfer rate = 34 MB/s

# 15. Structure of General Purpose Multiprocessor

**Multiprocessing** is the use of two or more central processing units (CPUs) within a single computer system. The term also refers to the ability of a system to support more than one processor and/or the ability to allocate tasks between them.

There are three ways to implement it:

1. UMA
2. NUMA
3. Distributed systems

Fig. 8.6: UMA SMP model

### (a) The UMA Model

In a UMA (Uniform Memory Access) SMP model, all processors have equal access time to all memory words, independent of their location. Hence the name. Each processor may use a private cache. The UMA model schematic is shown in Fig. 8.6. The UMA model is suitable for general-purpose and time-sharing applications by multiple users. To coordinate parallel events, synchronization and communication among processors are done using shared variables in the common memory.

### (b) The NUMA Model

A NUMA (Non-Uniform Memory Access) system is an SMP system in which the access time varies with the location of the memory word. Fig. 8.7 shows the NUMA model. The shared memory is physically distributed among all processors as 'local memories'. The collection of all local memories forms a global address space accessible by all the processors.

Fig. 8.7: NUMA SMP Model

### (c) Distributed Systems

A distributed-memory multicomputer system is shown in Fig. 8.9. The system consists of multiple computers (called 'nodes'), interconnected by a message-passing network. Each node is an autonomous computer consisting of a processor, local memory, and possibly peripherals.

The message-passing network provides point-to-point static connections among the nodes. All memories are "private" and are accessible only by local processors. This restriction is being removed gradually with the introduction of distributed shared memories. Internode communication is carried out by passing "messages" through the interconnection network.

A processor cannot access or share a remote memory without the cooperation of another processor. This cooperation takes place in the form of messages exchanged by the processors using a 'message passing' protocol. The sharing is explicit, with the system having routines to 'send' and 'receive' messages between processors. If the sender requires it, the receiving processor may send an ACK (acknowledgement) message back to the sender. Applications that need little communication, such as mail servers and Web search, may use this scheme.

The practical implementation of such message-passing systems is called a 'cluster'. A cluster is a set of computers connected over a local area network (LAN) that functions as a single large multiprocessor. The computers are connected to each other via standard network switches and cables, and each computer runs a copy of the operating system.

A hierarchically structured multiprocessor can also be modeled. The processors are divided into "clusters". The clusters are connected to global, shared-memory modules. All clusters have equal access time to the global memory.

# 16. Working of Digital Camera: Embedded System

### 7.1.2 Digital Camera

### 7.1.2.1 Operating Principle

In traditional cameras, film is used to capture images. Digital cameras use an array of optical sensors to capture images. These sensors are devices such as photodiodes, which convert light into electric charge. Two types of sensors are generally used in commercial products: charge-coupled devices (CCDs) and CMOS-based sensors. CCDs give higher quality images than CMOS-based sensors, but they are more expensive.

### 7.1.2.2 Block Diagram

Fig. 7.2 shows a block diagram of a digital camera.

Fig. 7.2 Block diagram of a digital camera

**Lens assembly:** The lens assembly consists of zoom and focusing lenses.

**Optical sensors:** These are sensing elements which generate charges. Each sensing element generates a charge that corresponds to one pixel. A pixel is one point of a pictorial image. The number of pixels determines the quality of pictures.

**Analog to Digital Converter (ADC):** The charge is an analog quantity. It is converted into a digital representation using the ADC. The digital representation of the image is the color and intensity of each pixel, represented by a number of bits.

**System Controller:** This block consists of a processor, memory (RAM, EEPROM), and the interface circuits required to connect to other parts of the system. The processor controls the operation of the camera. The image data obtained from the ADC is processed by the processor to obtain an image representation in standard formats suitable for use in computers, printers, display devices, and so on. TIFF (Tagged Image File Format) is widely encountered for uncompressed images and JPEG (Joint Photographic Experts Group) for compressed images.

**Image Storage:** The processed images are stored in storage devices such as flash memory cards, floppy disks, and miniature hard disk drives.
**Liquid Crystal Display (LCD):** The captured and processed image can be displayed on an LCD screen.

**Interface:** A standard interface provides a simple mechanism for transferring the images to a computer or a printer. A simple serial or parallel interface, or a connector for a standard bus such as PCI or USB, can be used. If flash memory cards are used, images can also be transferred by physically moving the card.

**Input switches, motor and the flash unit:** The motor is used for focusing purposes, and the system controller generates the signals needed to control the operation of the motor and the flash unit. Some of the inputs come from switches which are activated by the user.

### 7.1.2.3 Design

The processor used in a digital camera has to perform quite complex signal processing functions. Therefore, a digital camera requires a considerably more powerful processor than is needed for a microwave oven. The camera is a battery-operated device, so the processor should not consume much power. The processor consumes less power than the display and flash units of a camera.