Posted by Alexei A. Frounze on June 9, 2006, 12:52 am
2 (off-CPU memory cache) helps just like the other cache. It's basically a hierarchy of caches, each working at its own speed, and the closer the cache is to the CPU, the faster the data retrieval. But if the cache does not contain the information the CPU needs, the dirty work will have to be done anyway, i.e. a read from memory.

3 (interrupts and multithreading in general): suppose you're waiting for a key in your application, and all your system and application software is single-threaded, i.e. no multiprocessing of any kind, no parallelism. The easiest and the least effective approach is a loop like this:

    while (!kbhit()) ; // <conio.h> used

This simply wastes CPU time, which could have been used for something more useful, like parallel calculations in some background activity, whatever. This is where interrupts help: instead of waiting in an infinite loop and doing nothing, you set up a keyboard interrupt routine that is called once per key hit/release, as opposed to some millions of calls to kbhit() in a loop. You advance your state machine upon the keyboard event, using as little CPU time as needed, with no excessive overhead.

4 (DMA): this tiny bit of circuitry does memory-to-device I/O transparently to the CPU. It costs little CPU time because the CPU is interrupted only when there's some data ready for it or data can be taken from it. Just that, no loops like the one above. Also, DMA usually works with blocks of data, which again helps to minimize the overhead (you get one interrupt per block of bytes as opposed to one per byte).

Read some computer architecture book, like Tanenbaum's...

Alex
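To make the cache-hierarchy point concrete, here is a toy C model (a sketch, not how real hardware is organized): each level is probed in order starting from the one closest to the CPU, and a miss at every level falls through to main memory. The `CacheLevel` type, the `access_cost()` function, and all latency numbers are hypothetical illustrations.

```c
#include <stdbool.h>
#include <stddef.h>

/* One level of a toy cache hierarchy. Latencies are hypothetical round
   numbers chosen for illustration, not measurements of any real CPU. */
typedef struct {
    const char *name;
    int latency_cycles;        /* cost of probing this level */
    const unsigned *lines;     /* addresses currently held (toy model) */
    size_t count;
} CacheLevel;

static bool holds(const CacheLevel *lvl, unsigned addr) {
    for (size_t i = 0; i < lvl->count; i++)
        if (lvl->lines[i] == addr) return true;
    return false;
}

/* Probe each level from the closest to the farthest; a miss everywhere
   means the "dirty work" of reading main memory has to be done anyway. */
int access_cost(const CacheLevel *levels, size_t nlevels,
                int memory_latency, unsigned addr) {
    int cycles = 0;
    for (size_t i = 0; i < nlevels; i++) {
        cycles += levels[i].latency_cycles;
        if (holds(&levels[i], addr)) return cycles;   /* hit: stop here */
    }
    return cycles + memory_latency;                   /* missed every level */
}
```

With hypothetical latencies of 4 cycles for L1, 12 for L2, and 200 for memory, an L1 hit costs 4 cycles, an L2 hit 16, and a miss at both levels 216.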
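The difference between polling and interrupts can be sketched as a small simulation in C. It does not install a real interrupt handler (that is OS- and hardware-specific); instead `key_ready()` stands in for `kbhit()`, `keyboard_isr()` is a hypothetical handler that the simulated hardware calls once, and `KEY_ARRIVAL_TICK` is an assumed moment at which the key is pressed.

```c
/* Simulated clock and keyboard; KEY_ARRIVAL_TICK is a hypothetical
   moment at which the user presses a key. */
#define KEY_ARRIVAL_TICK 1000000L

static long now;                        /* simulated time in ticks */

/* Stand-in for kbhit(): true once the key has "arrived". */
static int key_ready(void) { return now >= KEY_ARRIVAL_TICK; }

/* Busy-wait: the CPU does nothing useful except re-check the keyboard.
   Returns how many times it checked. */
long poll_for_key(void) {
    long checks = 0;
    for (now = 0; !key_ready(); now++)
        checks++;
    return checks;
}

/* Interrupt style: a state machine advanced by a handler that the
   "hardware" invokes once per key event. */
typedef enum { WAITING, KEY_RECEIVED } State;
static State state = WAITING;
static long handler_calls = 0;

static void keyboard_isr(void) {
    handler_calls++;
    state = KEY_RECEIVED;               /* advance the state machine */
}

/* Same interval, but the CPU spends it on background work and the handler
   runs only when the key arrives. Returns the amount of background work. */
long run_with_interrupts(void) {
    long background_work = 0;
    for (now = 0; now < KEY_ARRIVAL_TICK; now++)
        background_work++;              /* useful work instead of polling */
    keyboard_isr();                     /* simulated interrupt from hardware */
    return background_work;
}
```

Both runs cover the same 1,000,000 simulated ticks: the polling version spends all of them re-checking the keyboard, while the interrupt version spends them on background work and calls the handler exactly once.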
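A toy model of the DMA point, counting how often the CPU is interrupted while a buffer is moved to a device. The `Device` struct, its 4096-byte buffer, and the block size passed to `transfer_dma()` are hypothetical; in real hardware the copy is performed by the DMA controller, not by `memcpy` running on the CPU.

```c
#include <stddef.h>
#include <string.h>

/* Toy device: a receive buffer plus a count of interrupts raised.
   Callers must keep the total transferred length <= sizeof data. */
typedef struct {
    unsigned char data[4096];
    size_t len;
    long interrupts;
} Device;

/* Byte-at-a-time I/O: one interrupt for every byte moved. */
void transfer_per_byte(Device *dev, const unsigned char *src, size_t n) {
    for (size_t i = 0; i < n; i++) {
        dev->data[dev->len++] = src[i];
        dev->interrupts++;
    }
}

/* DMA-style: the controller moves a whole block, then interrupts once. */
void transfer_dma(Device *dev, const unsigned char *src, size_t n, size_t block) {
    for (size_t off = 0; off < n; off += block) {
        size_t chunk = (n - off < block) ? n - off : block;
        memcpy(dev->data + dev->len, src + off, chunk); /* done by the DMA controller */
        dev->len += chunk;
        dev->interrupts++;                              /* one "block done" interrupt */
    }
}
```

Moving 1000 bytes costs 1000 interrupts in the per-byte version but only 4 with 256-byte blocks (three full blocks plus a 232-byte tail).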
Posted by on June 9, 2006, 12:52 am
> 2 (off-CPU memory cache) helps just like the other cache. It's basically a
> hierarchy of caches, each working at its own speed, and the closer the cache
> is to the CPU, the faster the data retrieval. But if the cache does not
> contain the information the CPU needs, the dirty work will have to be done
> anyway, i.e. a read from memory.
>
> 3 (interrupts and multithreading in general): suppose you're waiting for a
> key in your application, and all your system and application software is
> single-threaded, i.e. no multiprocessing of any kind, no parallelism. The
> easiest and the least effective approach is a loop like this:
>
>     while (!kbhit()) ; // <conio.h> used
>
> This simply wastes CPU time, which could have been used for something
> more useful, like parallel calculations in some background activity,
> whatever. This is where interrupts help: instead of waiting in an infinite
> loop and doing nothing, you set up a keyboard interrupt routine that is
> called once per key hit/release, as opposed to some millions of calls to
> kbhit() in a loop. You advance your state machine upon the keyboard event,
> using as little CPU time as needed, with no excessive overhead.
>
> 4 (DMA): this tiny bit of circuitry does memory-to-device I/O transparently
> to the CPU. It costs little CPU time because the CPU is interrupted only
> when there's some data ready for it or data can be taken from it. Just
> that, no loops like the one above. Also, DMA usually works with blocks of
> data, which again helps to minimize the overhead (you get one interrupt
> per block of bytes as opposed to one per byte).
>
> Read some computer architecture book, like Tanenbaum's...
>
> Alex

Thanks Alexei, I know about all this, trust me, but I fail to see how the external cache, the use of interrupts, or DMA can cause the CPU to execute more than 1 instruction in a hardware clock cycle. What I'm trying to say is that I fail to see how external cache, interrupts, or DMA constitute an internal interface of the CPU. (I really like Maxim's answer ;-) . Are you there, Maxim?)

- Olumide
Posted by Alexei A. Frounze on June 9, 2006, 12:52 am
> > cache, or the use of interrupts, or DMA can cause the CPU to execute
> > more than 1 instruction in a hardware clock cycle. What I'm trying to
>
> Several execution units can execute several instructions per cycle, if
> they are not dependent on one another.

Right, and now you may have CPUs with several cores or that hyperthreading feature, so you can effectively have more than 1 instruction per clock due to the parallelism. Intel x86 CPUs probably don't have many useful instructions that take just 1 clock :)

What I was trying to say in my previous posts is that even though the circuitry connected to the CPU can be rather slow (effectively running at slower clocks than the CPU), that doesn't mean the CPU itself starts running as slowly as that circuitry.

Alex
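The quoted point about independent instructions can be illustrated with a toy dual-issue model in C. It is a sketch under simplifying assumptions: `Instr` is a hypothetical three-register instruction, and `can_pair()` checks only register dependencies, whereas real pairing rules in superscalar CPUs include many more conditions.

```c
#include <stdbool.h>

/* Toy three-register instruction: dst = src1 op src2.
   Register numbers are arbitrary illustrative values. */
typedef struct { int dst, src1, src2; } Instr;

/* The second instruction may issue in the same cycle as the first only if
   it neither reads nor overwrites the register the first one writes. */
bool can_pair(Instr first, Instr second) {
    return second.src1 != first.dst
        && second.src2 != first.dst
        && second.dst != first.dst;
}

/* In-order, two-wide machine: issue neighbouring instructions together
   when they are independent, otherwise one at a time. Returns cycles used. */
int cycles_dual_issue(const Instr *prog, int n) {
    int cycles = 0;
    for (int i = 0; i < n; ) {
        if (i + 1 < n && can_pair(prog[i], prog[i + 1]))
            i += 2;                  /* two instructions this cycle */
        else
            i += 1;                  /* dependency: issue alone */
        cycles++;
    }
    return cycles;
}
```

Four mutually independent instructions finish in 2 cycles; a chain in which each instruction reads the previous result needs 4, no matter how many execution units are available.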
Posted by on June 9, 2006, 12:52 am
Alexei A. Frounze wrote:
> > > cache, or the use of interrupts, or DMA can cause the CPU to execute
> > > more than 1 instruction in a hardware clock cycle. What I'm trying to
> >
> > Several execution units can execute several instructions per cycle, if
> > they are not dependent on one another.
>
> Right, and now you may have CPUs with several cores or that hyperthreading
> feature, so you can effectively have more than 1 instruction per clock due
> to the parallelism. Intel x86 CPUs probably don't have many useful
> instructions that take just 1 clock :)
>
> What I was trying to say in my previous posts is that even though the
> circuitry connected to the CPU can be rather slow (effectively running at
> slower clocks than the CPU), that doesn't mean the CPU itself starts
> running as slowly as that circuitry.

All modern CPUs (since about 1980) are pipelined in some form, meaning that the work of an individual instruction is broken up into several stages, each taking a clock cycle. A common analogy is doing laundry: there is a washer and a dryer. When the first load A finishes washing, we can put it in the dryer, but while A is drying, we can start the next load B in the washer. Then A finishes drying and B finishes washing. A is now done, and B moves to drying while the next load C starts washing. If washing and drying each take time T, we achieve a throughput of one load per T, while each individual load takes 2T to complete.

In modern CPUs like the Athlon or Pentium 4, the pipeline can be 10 or 20 stages long. Therefore, even though each instruction takes 10 or 20 cycles, instructions are pipelined so that we can achieve a throughput of 1 instruction per cycle.

For more information, see: Computer Architecture: A Quantitative Approach, John Hennessy and David Patterson.
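The laundry analogy can be checked with a cycle-by-cycle simulation in C. This is a sketch assuming every stage takes exactly one cycle and the pipeline never stalls; `simulate_pipeline()`, `sequential_cycles()`, and `MAX_STAGES` are illustrative names, not from the original post.

```c
#define MAX_STAGES 32   /* upper bound on pipeline depth for this sketch */

/* Simulate a k-stage pipeline fed n items. stage[s] holds the index of the
   item currently in stage s, or -1 if empty. Returns the cycle in which the
   last item leaves the final stage, or -1 for an unsupported depth. */
long simulate_pipeline(int n, int k) {
    int stage[MAX_STAGES];
    if (k < 1 || k > MAX_STAGES) return -1;
    for (int s = 0; s < k; s++) stage[s] = -1;

    int next = 0, done = 0;
    long cycle = 0;
    while (done < n) {
        /* every item moves one stage forward; a new item enters stage 0 */
        for (int s = k - 1; s > 0; s--) stage[s] = stage[s - 1];
        stage[0] = (next < n) ? next++ : -1;
        cycle++;                          /* all occupied stages work once */
        if (stage[k - 1] != -1) {         /* item in the last stage completes */
            done++;
            stage[k - 1] = -1;
        }
    }
    return cycle;
}

/* Without pipelining, each item occupies the whole machine for k cycles. */
long sequential_cycles(int n, int k) { return (long)n * k; }
```

Three loads through washer and dryer (2 stages) take 4 cycles pipelined versus 6 one at a time; 1000 instructions through a 20-stage pipeline take 1019 cycles, close to one instruction per cycle, even though any single instruction still needs 20.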
Posted by Maxim S. Shatskih on June 9, 2006, 12:52 am
> In modern CPUs like the Athlon or Pentium 4, the pipeline can be as
> long as 10 or 20 stages. Therefore even though each instruction takes
> 10 or 20 cycles, they are pipelined so that we can achieve 1
> instr/cycle throughput.

More so. Even the P5 Pentium was capable of running 2 instructions from the flow in parallel, provided they do not depend on one another (the operands of the second are not altered by the first). This feature is called "superscalar". The SPARC CPU is even better at this.

The weak point of superscalar designs is that the decision on parallelizing is made at run time by the CPU hardware, which cannot keep a large context. A Very Long Instruction Word (VLIW) CPU like IA-64 shifts this burden to the compiler. The compiler (which can keep a huge context) decides how to distribute the operations across the CPU's several execution units. The downsides are the huge complexity of the compiler and of the assembly language (it is nearly impossible to write assembly by hand; there is too much context to keep in your head).

Yet another approach to fast CPUs: throw away any complexity, use the saved silicon space for cache, and raise the frequency as high as possible. Pentium 4 and Alpha go this way (Alpha even stripped complexity out of the instruction set: it only has 64-bit arithmetic, so if you want byte operations, you write a subroutine).

--
Maxim Shatskih, Windows DDK MVP
StorageCraft Corporation
maxim@storagecraft.com
http://www.storagecraft.com
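The VLIW idea of moving scheduling into the compiler can be sketched as a greedy bundling pass in C. This is a toy loosely inspired by IA-64's three-instruction bundles: `Op`, `WIDTH`, `MAX_INSTR`, and `schedule_bundles()` are hypothetical, every register conflict (including write-after-read) is treated conservatively as forcing a later bundle, and real compilers use far more elaborate scheduling.

```c
#include <stdbool.h>

#define WIDTH 3         /* hypothetical slots per bundle */
#define MAX_INSTR 64    /* largest program this sketch accepts */

/* Toy operation: dst = src1 op src2. */
typedef struct { int dst, src1, src2; } Op;

/* Does the later op b conflict with the earlier op a? (conservative) */
static bool conflicts(Op a, Op b) {
    return b.src1 == a.dst || b.src2 == a.dst    /* read after write */
        || b.dst == a.dst                        /* write after write */
        || b.dst == a.src1 || b.dst == a.src2;   /* write after read */
}

/* Greedy static scheduling: put each op in the earliest bundle that comes
   after every earlier op it conflicts with and still has a free slot.
   Stores each op's bundle in bundle_of[] and returns the bundle count,
   or -1 if the program is too large for this sketch. */
int schedule_bundles(const Op *ops, int n, int bundle_of[]) {
    int fill[MAX_INSTR] = {0};   /* op i always lands in a bundle <= i */
    int used = 0;
    if (n > MAX_INSTR) return -1;
    for (int i = 0; i < n; i++) {
        int b = 0;
        for (int j = 0; j < i; j++)
            if (conflicts(ops[j], ops[i]) && bundle_of[j] + 1 > b)
                b = bundle_of[j] + 1;
        while (fill[b] == WIDTH) b++;   /* skip full bundles */
        bundle_of[i] = b;
        fill[b]++;
        if (b + 1 > used) used = b + 1;
    }
    return used;
}
```

Because the whole program is visible to the pass, an independent operation that appears late in the program can be placed in an earlier bundle, which is the "huge context" advantage described above; the price is that all of this logic lives in the compiler rather than in the CPU.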
| Similar Threads | Posted |
| Re: Simple Hardware Clock question | June 9, 2006, 12:52 am |
| Simple RAM question | January 1, 2006, 11:48 am |
| simple but slightly OT question ... | October 27, 2005, 12:30 pm |
| Old hardware: question | October 21, 2005, 2:10 pm |
| Hardware question | July 2, 2008, 12:43 pm |
| TV Tuner / Laptop Hardware Question... | November 21, 2005, 5:12 pm |
| Memory/Drive Question MemTest86 plus some other hardware stuff | January 5, 2006, 6:37 pm |
| Simple Networking | December 1, 2006, 8:01 pm |
| DVD+/-RW simple query | January 3, 2007, 4:22 pm |
| Dual Boot: Is it as simple as this? | July 1, 2005, 4:15 am |
| Want a simple keyboard with a big backspace key | December 2, 2005, 2:25 pm |
| Does anybody make this simple little item? | January 8, 2006, 12:58 pm |
| Looking for very simple usb graphic card | October 31, 2007, 3:05 am |
| 5 drives in 3 bays... Simple bracket? | March 2, 2008, 2:41 am |
| Replacing controller on RAID5: simple swap or other? | July 4, 2005, 4:37 pm |

Re: Simple Hardware Clock question
>
> I understand how numbers (1) and (5) can help, but not the others.
> Putting your answer together with Maxim's, is it correct to say all
> these techniques do NOT require the external interface?