Cell Architecture Explained - Part 2: Again Inside The Cell - Memory
In the Cell design there are 8 banks of 8MB each, and if the minimum read is 1024 bits the resolution is 13 bits. An additional 3 bits are used to select the bank, but this is done on-chip so it will have little impact. Each extra address bit doubles the number of memory look-ups, so the PC will be doing a thousand times more memory look-ups per second than the Cell does. The Cell's memory buses will have more time free to transfer data and thus will work closer to their maximum theoretical transfer rate. I'm not sure my theory is correct, but CPU caches use a similar trick.
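The addressing arithmetic above can be checked with a short sketch. The function names are my own and purely illustrative. One caveat: 13 bits is what you get for banks of 8 megabits; if "8MB" is read as 8 megabytes, the same arithmetic gives 16 bits. Which reading is intended is not stated, so the sketch computes both.

```python
def block_address_bits(bank_bits: int, min_read_bits: int) -> int:
    """Bits needed to pick one minimum-read block inside a single bank."""
    blocks = bank_bits // min_read_bits
    # (blocks - 1).bit_length() is the exact ceiling of log2(blocks)
    return (blocks - 1).bit_length()


def lookup_ratio(extra_bits: int) -> int:
    """Each extra address bit doubles the number of addressable look-ups."""
    return 2 ** extra_bits


MEGA = 1024 * 1024
MIN_READ_BITS = 1024

# Assumption: bank size read as 8 megabits
bits_if_megabits = block_address_bits(8 * MEGA, MIN_READ_BITS)
# Assumption: bank size read as 8 megabytes (8 bits per byte)
bits_if_megabytes = block_address_bits(8 * MEGA * 8, MIN_READ_BITS)

if __name__ == "__main__":
    print("8 megabit bank: ", bits_if_megabits, "bits")
    print("8 megabyte bank:", bits_if_megabytes, "bits")
    print("10 extra bits ->", lookup_ratio(10), "times as many look-ups")
```

Ten extra address bits is where the "thousand times" figure comes from, since 2 to the power 10 is 1024.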
What is not theoretical is the fact that the Cell will use very high speed memory connections - Sony and Toshiba licensed 3.2GHz memory technology from Rambus in 2003 [Rambus]. If each Cell has a total bandwidth of 25.6 Gigabytes per second, each bank transfers data at 3.2 Gigabytes per second. Even so, the buses are not large (64 data pins for all 8 banks); this is important as it keeps chip manufacturing costs down.
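The bandwidth figures can be cross-checked with a few lines of arithmetic. This is only a sketch: the function names are made up, and it assumes the "3.2GHz" figure means each data pin carries 3.2 Gigabits per second, which is my reading rather than something stated above.

```python
def per_bank_gbytes(total_gbytes_per_s: float, banks: int) -> float:
    """Split the total memory bandwidth evenly across the banks."""
    return total_gbytes_per_s / banks


def pins_to_gbytes(data_pins: int, gbits_per_pin: float) -> float:
    """Total bandwidth in Gigabytes/s for a set of data pins (8 bits per byte)."""
    return data_pins * gbits_per_pin / 8


# Figures quoted in the text
TOTAL_GBYTES = 25.6
BANKS = 8
DATA_PINS = 64
# Assumption: 3.2 Gigabits per second per data pin
GBITS_PER_PIN = 3.2

if __name__ == "__main__":
    print("per bank:", per_bank_gbytes(TOTAL_GBYTES, BANKS), "GB/s")
    print("64 pins: ", pins_to_gbytes(DATA_PINS, GBITS_PER_PIN), "GB/s")
```

Under that assumption, 64 pins account for the whole 25.6 Gigabytes per second, which is consistent with the small pin count mentioned above.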
100 Gigabytes per second sounds huge until you consider that top end graphics cards are already in the region of 50 Gigabytes per second; doubling over a couple of years sounds fairly reasonable. But these are just theoretical figures and are never reached. Assuming the system I described above is used, the bandwidth on the Cell should be much closer to its theoretical figure than on competing systems, and thus it will perform better.
APUs may need to access memory from different Cells, especially if a long stream is set up, so the Cells include a high speed interconnect. Details of this are not known other than that it transfers data at 6.4 Gigabits per second per wire. I expect there will be buses of these wires between Cells to allow high speed transfer of data from one to another. This technology sounds not entirely unlike HyperTransport, though the implementation may be very different.
In addition to this, a switching system has been devised so that if more than 4 Cells are present they too can have fast access to memory. This system may be used in Cell based workstations. It's not clear how more than 8 Cells will communicate, but I imagine the system could be extended to handle more. IBM have announced that a single rack based workstation will be capable of up to 16 TeraFlops; they'll need 64 Cells for this sort of performance, so they have obviously found some way of connecting them.
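As a quick sanity check on the rack figure, the per-Cell performance implied by the two numbers above can be worked out directly. The function name is illustrative only.

```python
def per_cell_gigaflops(total_teraflops: float, cells: int) -> float:
    """Performance each Cell must deliver, in GigaFlops (1 TeraFlop = 1000 GigaFlops)."""
    return total_teraflops * 1000 / cells


if __name__ == "__main__":
    # 16 TeraFlops across 64 Cells, as quoted in the text
    print(per_cell_gigaflops(16, 64), "GigaFlops per Cell")
```

So the announcement implies roughly a quarter of a TeraFlop from each Cell.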
Memory Protection
The memory system also has a memory protection scheme implemented in the DMAC. Memory is divided into "sandboxes" and a mask is used to determine which APU or APUs can access each one. This check is performed in the DMAC before any access takes place; if an APU attempts to read or write the wrong sandbox, the memory access is forbidden.
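A minimal sketch of how such a mask check might look in software is given below. Everything here is hypothetical: the class names, the one-bit-per-APU mask layout and the rule that addresses outside any sandbox are refused are my assumptions, not details of the actual hardware.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sandbox:
    start: int      # first address covered by the sandbox
    size: int       # number of addresses covered
    apu_mask: int   # assumption: bit i set means APU i may access it


class Dmac:
    """Toy model of a DMAC that checks a sandbox mask before every access."""

    def __init__(self, sandboxes):
        self._sandboxes = list(sandboxes)

    def allowed(self, apu_id: int, address: int) -> bool:
        for box in self._sandboxes:
            if box.start <= address < box.start + box.size:
                return bool((box.apu_mask >> apu_id) & 1)
        # Assumption: an address outside every sandbox is refused
        return False

    def read(self, apu_id: int, address: int, memory):
        if not self.allowed(apu_id, address):
            raise PermissionError(f"APU {apu_id} may not access {address:#x}")
        return memory[address]


# Two sandboxes: APUs 0 and 1 share the first, APU 2 owns the second
dmac = Dmac([
    Sandbox(start=0, size=1024, apu_mask=0b011),
    Sandbox(start=1024, size=1024, apu_mask=0b100),
])
```

The point of the sketch is that the check is a single table scan and a bit test, with no fallback path that could stall the access.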
Existing CPUs include hardware memory protection systems, but they are a lot more complex than this. They use page tables which indicate the use of blocks of RAM and also whether the data is in RAM or on disc. These tables can become large and don't fit on the CPU all at once, so in order to read a memory location the CPU may first have to read a page table from memory and read data in from disc - all before the required data is read.
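To make the extra steps concrete, here is a heavily simplified sketch of a page table look-up. It assumes a single-level table, a 4096-byte page size and a dictionary layout of my own invention; real CPUs use multi-level tables and cache recent entries, so treat this as an illustration of the sequence of steps only.

```python
PAGE_SIZE = 4096  # assumption: a typical page size, not taken from the text


def translate(page_table, virtual_address, log):
    """Return the physical address, recording each step taken in `log`."""
    page, offset = divmod(virtual_address, PAGE_SIZE)

    # Step 1: the page table entry itself may have to be read from memory
    log.append("read page table entry")
    entry = page_table[page]

    # Step 2: if the page is on disc it must be loaded before it can be used
    if not entry["in_ram"]:
        log.append("load page from disc")
        entry["in_ram"] = True

    # Step 3: only now can the required data be read
    log.append("read data")
    return entry["frame"] * PAGE_SIZE + offset


# Hypothetical table: page 0 is resident in RAM, page 1 is on disc
table = {
    0: {"frame": 5, "in_ram": True},
    1: {"frame": 9, "in_ram": False},
}
```

A read from page 1 goes through all three steps, while a read from page 0 skips the disc load.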
In the Cell an APU either can issue a memory access or it cannot; the table is held in a special SRAM in the DMAC and is never flushed. This system may lack flexibility, but it is very simple and consistently very fast.
This simple system most likely applies only to the APUs; I expect the PU will have a conventional memory protection system.
6 Comments
James - Wednesday, February 9, 2005 - link
In case anyone wants to read all the gory details, here is the patent: http://www.freepatentsonline.com/6779049.html
Peter Vojzola - Tuesday, February 8, 2005 - link
I think you should do an article on the new Cell Processor technology. A detailed interpretation of the technology in the filed patents can be found here (done in a very detailed but easy to understand style).
It's literally jaw-dropping!
http://www.blachford.info/computer/Cells/Cell0.htm...
It would be quite an eye opener for a "things to come" article for AnandTech fans.
avijay - Tuesday, February 8, 2005 - link
Anand, Can you please explain the Cell architecture and do hypothetical analysis of strengths and weaknesses. I'd really appreciate if you can publish an article on this. It'll be great to understand the working and differences compared to x86 and how this may have an effect on the present offerings from Intel and AMD. Thanks.
crtfanboy - Monday, February 7, 2005 - link
man, 8ghz... if the cell processor is running at 4ghz is that going to be enough?
these are some incredibly high clock speeds, I hope to god that the simplicity of the design cuts down on heat output and electricity consumption... I don't want to have to set up phase change cooling and another powerline to my living room just to play final fantasy 15
sheik124 - Monday, February 7, 2005 - link
so what does this mean for rambus? 90% royalties or something :P