Linked by Roger Finger on Tue 4th Mar 2003 18:51 UTC
Intel Digital media applications are unique in that they can generally consume all the performance they can get. Unlike other tasks that execute in a few seconds, the rendering of stills, audio and video can take several minutes or even hours. Applications in the digital media space can translate increases in performance to increases in end-user productivity, and it is therefore beneficial for them to take advantage of the latest platform technologies.
REAL desktops anyone!?
by Elver Loho on Tue 4th Mar 2003 19:36 UTC

How about aiming for quieter and cooler systems rather than going for the speed and paying the price of having to live in the same room with a machine that makes as much noise as a jet engine?

Plus, I still can't see how having one CPU emulate two CPUs can make much difference. Sure, latency would drop a bit, but for the same money, or even less, you could get a real dual-CPU system.

How about comparing the new Pentium with HT against one of those real dual-CPU systems?

Oh and no offence but is this a review or an advertisement?

Hmm... I reached the end of the article. And yes, it is indeed an advertisement ;)

smp kernel
by pnut on Tue 4th Mar 2003 19:37 UTC

I read something a while back about the performance improvements in the 2.5 SMP kernel when Hyper-Threading was turned on. Is it possible to use the SMP kernel on a single-processor machine with HT, fooling the kernel into thinking there are more processors available? Are there benefits to this setup over the regular uniprocessor kernel?

RE: REAL desktops anyone!?
by Eugenia on Tue 4th Mar 2003 19:40 UTC

>Oh and no offence but is this a review or an advertisement?

Neither. It is a "paper" about HT, explaining in generic terms what's up with HT and multimedia. Advertisements are paid, and we certainly weren't paid for this article (nor did we pay for it ;) ). In fact, we have one more HT article to publish next week. And we are looking for articles about P4 optimizations, which are of interest to our developer readers.

RE: REAL desktops anyone!?
by Elver Loho on Tue 4th Mar 2003 19:43 UTC

How about some counter-criticism to bring out the weaker sides of Intel's new puppy? I heard it lost to a 1.6 GHz AMD CPU in a Tom's Hardware test once.

re: smp kernel
by xlnx-x on Tue 4th Mar 2003 20:16 UTC

pnut, Hyper-Threading shows up to the OS as twice as many processors as are physically present. To take advantage of it you need an SMP kernel, and it helps if the scheduler is HT-aware, mostly for cache reasons. I'm not sure how much has been integrated back into 2.4, but a lot has been going on with HT scheduling in Linux 2.5, and FreeBSD is working on it too. I've heard of speedups of up to 30% from HT, but I haven't done any benchmarks myself, since I'm still running a P3 800 and an Athlon 1 GHz just fine.
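To see what the OS sees, here is a minimal Python sketch (not from the original post) that reads Linux's /proc/cpuinfo text and compares logical CPUs with physical packages. It assumes the x86 field names "physical id", "siblings" and "cpu cores" as printed by recent kernels; 2.4-era kernels may omit some of them, in which case the sketch treats each package as a single core. The helper names are hypothetical.

```python
def parse_cpuinfo(text):
    """Split /proc/cpuinfo-style text into one dict per logical CPU."""
    cpus = []
    for block in text.strip().split("\n\n"):
        entry = {}
        for line in block.splitlines():
            if ":" in line:
                key, _, value = line.partition(":")
                entry[key.strip()] = value.strip()
        if entry:
            cpus.append(entry)
    return cpus


def ht_summary(text):
    """Return (logical CPUs, physical packages, HT active?)."""
    cpus = parse_cpuinfo(text)
    # Without a "physical id" field, count each logical CPU as its own package.
    packages = {c.get("physical id", str(i)) for i, c in enumerate(cpus)}
    ht = False
    for c in cpus:
        siblings = int(c.get("siblings", "1"))  # logical CPUs per package
        # Assumption: a missing "cpu cores" field means one core per package.
        cores = int(c.get("cpu cores", "1"))
        if siblings > cores:  # more logical CPUs than cores => SMT/HT
            ht = True
    return len(cpus), len(packages), ht


# Hypothetical excerpt: one HT-capable package exposing two logical CPUs.
SAMPLE = (
    "processor\t: 0\nphysical id\t: 0\nsiblings\t: 2\ncpu cores\t: 1\n\n"
    "processor\t: 1\nphysical id\t: 0\nsiblings\t: 2\ncpu cores\t: 1\n"
)
print(ht_summary(SAMPLE))  # -> (2, 1, True)
```

On a Linux box you could pass it the contents of /proc/cpuinfo instead of the sample. In Python, `os.cpu_count()` reports the logical count, HT siblings included.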

Don't be so eager to criticize
by Casey Winans on Tue 4th Mar 2003 20:20 UTC

True, I read the same review (or part of it), but you forgot to mention that the review said the AMD processor did outperform it on some tests but was totally smoked in (most) other areas.

Re: REAL desktops anyone!?
by renZYX@hotmail.com on Tue 4th Mar 2003 20:25 UTC

>How about aiming for quieter and cooler systems [...]

Well, Intel also works on low-power CPUs: nothing prevents you from buying one, or from buying a liquid-cooled computer.

It's about choice, YOUR choice: computer makers provide both options.

>Plus, I still can't see how having one CPU emulate two CPUs can make much difference.

Think of it as the next step beyond a normal superscalar CPU: you want to keep all those rarely used execution units busy by executing several threads at the same time.

Simple economics tells me that an SMT CPU will be much, much cheaper than an SMP setup: much less silicon used and a much simpler motherboard. And most of all, far more single-CPU computers are sold than SMP computers.

great article
by jodie on Tue 4th Mar 2003 20:56 UTC

Great article. Will code written explicitly for Hyper-Threading processors work on older processors?

Re: HT/DM
by Anonymous on Tue 4th Mar 2003 21:23 UTC

The whole article looks like it was made by copying and pasting some marketing department's press releases, technical errors included.

If a few choice words were replaced in about every other paragraph, the article would be fine.

Nice to see some of the results with pretty graphs though.

What about a dual system with these?
by David Stidolph on Tue 4th Mar 2003 21:28 UTC

Given that 4-processor systems are not generally available without a bank loan, what about a pair of these in a dual system? Do any boards support it yet?

Just a thought,

David Stidolph,
Austin, TX

Hyperthreading on older processors
by Roger Finger on Tue 4th Mar 2003 21:47 UTC

Multi-threaded code runs just fine on older processors, though a stall in one thread will block other threads that are attempting to run. Many applications are already multi-threaded and can take advantage of HT technology without modification. For developers, the existing Microsoft threading APIs are all you need to take advantage of this new feature.
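As a rough illustration of splitting one job across however many logical processors the OS reports, here is a minimal Python sketch. It uses the standard `concurrent.futures` module rather than the Win32 threading API mentioned above, and the gain-scaling workload and function names are hypothetical. One caveat: in CPython the global interpreter lock keeps pure-Python, CPU-bound threads from running simultaneously, so this shows the structure of the split rather than an HT speedup; the same pattern with native threads in C would let each logical processor take a chunk.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def apply_gain(chunk, gain):
    """Hypothetical per-chunk work: scale a block of audio samples."""
    return [s * gain for s in chunk]


def split(data, parts):
    """Split a list into `parts` contiguous, nearly equal chunks."""
    size, extra = divmod(len(data), parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def process_threaded(samples, gain, workers=None):
    # One worker per logical CPU the OS reports (HT siblings included).
    workers = workers or os.cpu_count() or 1
    chunks = split(samples, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in submission order, so output order is kept.
        results = list(pool.map(apply_gain, chunks, [gain] * len(chunks)))
    return [s for chunk in results for s in chunk]


print(process_threaded([1, 2, 3, 4, 5], 2, workers=2))  # -> [2, 4, 6, 8, 10]
```

The same code runs unchanged on a single non-HT processor; the threads simply take turns.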

Typo? He isn't 21
by Taras on Tue 4th Mar 2003 22:00 UTC

Alright, the guy in the picture looks anything but 21... maybe 31... but probably 41. Especially considering his experience.

editorial policy
by Rick on Tue 4th Mar 2003 22:04 UTC

Although this is not an advertisement and there is no attempt to conceal the fact that it's by an Intel employee, I find including this paper on a supposedly neutral magazine-style site a bit of a shame. It's like reading an article about the benefits of quattro four wheel drive in a car magazine that's written by an employee of Audi. That wouldn't happen: a journalist would write it. Editorial content here should likewise be written by independent people. Anything else just undermines the credibility of the site.

HT use
by Ed Page on Tue 4th Mar 2003 22:23 UTC

Elver Loho:
While your computer is doing one thing, most of your CPU is going unused. The point of HT is to try and use as much of the processor as possible.
If you want a good technical article on HT, I suggest Ars Technica; I just don't feel like looking it up, sorry.

pnut:
www.kerneltrap.com has some good summaries of Linux discussions on HT. Like someone said, it shows up as 2 processors. Windows is a good example of this: XP shows 2 logical processors, but 2000 shows 2 physical processors. I can't remember the exact reason, maybe cache or programs migrating between processors, but an OS with a special SMP scheduler for HT can be faster than one without. Linus has not merged the HT-specific optimizations. Instead he merged some NUMA (*drool*) work, and he believes the same principles apply to HT, so it should be an even better solution with less chance of bugs.

So it looks like hardware is taking over the concepts of software to a new degree...

Why don't we just make better software from the beginning? Threading is almost completely a software issue (and a smart one), so why reinvent a dual chip as one instead of making the best use of cycles in the first place? I think this greatly overcomplicates the CPU internals (and opens oh-so-many cans of code, er, worms...). Athlon chips already run with up to 9 internal RISC processors behind an x86 face, no special rewrites needed. I'm sorry, but if your code can't make the best of that (running at N GHz), you might rather invest in some programming lessons than in a newly hatched schizophrenic CPU!

Sorry, could you tell? I'm a Be user. Latency? I guess somebody must be having some latency issues... ;)

If all it takes is an SMP kernel, an OpenBeOS implementation would blow every other OS out of the water.

HT with Linux 2.4, I tried it...
by Emmanuel on Tue 4th Mar 2003 22:49 UTC

On Friday I installed Red Hat 8.0 on a brand-new Dell PowerEdge 2650 server: dual Xeon 2.4 GHz (HT-capable), 2 GB RAM. After booting, top actually reported 4 CPUs... but... 16 MB free! Just a plain RH8 with no applications running was using 2 GB of RAM! 1.6 GB was used in buffers, and there were tons of strange errors in syslog, apparently some context-switching problems. I switched HT off and it works fine now; I mean, with no applications running I have around 1700 MB free ;)

wait n see
by JJ on Tue 4th Mar 2003 23:22 UTC

I still haven't encoded any MP3s, let alone a video, so Intel still has a way to go to persuade me to get up and buy one, HT or not. I am still inclined to go dual MP: at least I can count on a real 2x speedup for some threaded apps, or at least better responsiveness. But I can also count on 2x the heat and noise and a very limited choice of mobo/case without the latest and greatest built-in features (USB2, FW, SATA, etc.). The Tom's HW article tells me it will only double my 1 GHz Athlon's speed most of the time without HT, sometimes 3x.

Anyway, I think Intel understands this and is concentrating on the lower end with more and more integrated systems, which can more than satisfy most people's needs.

RE: Typo? He isn't 21
by KAMiKAZOW on Tue 4th Mar 2003 23:38 UTC

At first I thought this too, but I'm sure it means that the guy has worked for Intel for 21 years. ;)

Re: What about a dual system with these?
by Ronald on Wed 5th Mar 2003 00:31 UTC

My brother has a cheap Dell server with 1 P4 Xeon HT. He just needs to add another P4 Xeon HT and voilà!

And he also needs Win2K Server, but that's another story ;)

How about dedicated processors instead?
by Anonymous on Wed 5th Mar 2003 00:36 UTC

Why don't desktop manufacturers provide fast dedicated multimedia processors instead? Real-time video is possible on a reasonably modest CPU (e.g. 1.8 GHz) if you have a hardware MPEG encoder.

A 3 GHz processor is still too slow (and too hot and noisy) for real-time video editing in software.

Current real-time encoding solutions are still too expensive for most home users.


re: Typo? He isn't 21
by Kevin on Wed 5th Mar 2003 01:04 UTC

Notice the wording of the article: it says he is a 21-year employee, not a 21-year-old employee. The term "21-year employee" implies that he's been there for 21 years.

Re: Re: HT with Linux 2.4, I tried it...
by Anonymous on Wed 5th Mar 2003 01:20 UTC

>Eheh @ RH8 taking all but 16 MB of 2 GB of RAM: you seem to be reading the results slightly wrong. Linux (and Windows as well, I'd suppose) uses RAM to cache parts of your hard drive for faster access. So instead of giving all of the RAM to applications right off the bat and having slower hard drive access, it buffers up a bunch, and when applications allocate memory they are favoured and less is cached. If you want to know how much RAM the applications are actually taking, read the "used" column of the "-/+ buffers/cache" line in the output of "free". If you only had 16 MB free you'd swap in a hurry, but that's not the case, or at least I hope it's not, or else I'll have another reason to be biased against Red Hat ;)

Used + free memory equals total available memory, so no, he WAS reading the output of 'top' correctly. He never mentions which kernel he was using... not having SMP could be the problem.
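Both readings can be reconciled with a little arithmetic. The minimal sketch below (hypothetical helper names; the field names are those of Linux's /proc/meminfo) computes the figures that older procps `free` prints: "used" is total minus free, while the "-/+ buffers/cache" line also subtracts buffers and page cache to approximate what applications actually hold. The sample numbers are invented for illustration and are not Emmanuel's actual readings.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style lines ("Key:   value kB") into kB integers."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if sep and rest.split():
            info[key.strip()] = int(rest.split()[0])
    return info


def memory_report(text):
    m = parse_meminfo(text)
    total, free = m["MemTotal"], m["MemFree"]
    cache = m.get("Buffers", 0) + m.get("Cached", 0)
    used = total - free   # what top and free report as "used"
    apps = used - cache   # "used" column of the "-/+ buffers/cache" line
    return {"used": used, "free": free, "apps": apps,
            "free_incl_cache": free + cache}


# Hypothetical numbers (kB): 2 GB box, almost all "used" by buffers/cache.
SAMPLE_MEMINFO = (
    "MemTotal:      2097152 kB\n"
    "MemFree:         16384 kB\n"
    "Buffers:       1572864 kB\n"
    "Cached:         262144 kB\n"
)
print(memory_report(SAMPLE_MEMINFO))
```

Here used + free still equals the total, yet applications hold only about 240 MB; the rest is cache the kernel can give back on demand.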

Jacob

Actually, HW should take over the SW role in fine-grained processing. The more CPU resources there are on a chip, the more likely they are to sit idle and warm your house without HT. With HT, at least they can do more work more often to justify the heat.

Think how much sand can flow through an hourglass. Smaller grains flow through the narrow hole faster; bigger grains block it or even stick. A bigger grain is like a memory op that isn't in cache.
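The hourglass picture can be turned into a toy simulation. The model below is deliberately crude and hypothetical, not a description of the P4: a core that issues at most one op per cycle, where a cache miss ('m') blocks its thread for a fixed number of cycles. Running two threads back to back leaves the issue slot idle during each miss; interleaving them lets one thread's ops fill the other's stall cycles.

```python
def simulate(threads, miss_latency, smt=True):
    """Return (total cycles, issued ops) for a toy single-issue core.
    Each thread is a string of ops: 'c' = 1-cycle compute op,
    'm' = cache miss that issues in 1 cycle, then blocks its thread
    for `miss_latency` extra cycles."""
    if not smt:
        # One thread at a time: total time is the sum of each run alone.
        results = [simulate([t], miss_latency) for t in threads]
        return sum(r[0] for r in results), sum(r[1] for r in results)
    n = len(threads)
    pc = [0] * n        # index of each thread's next op
    ready_at = [0] * n  # first cycle at which each thread may issue again
    cycle = finish = issued = 0
    last = n - 1        # so the first round-robin pick is thread 0
    while any(pc[i] < len(threads[i]) for i in range(n)):
        for k in range(1, n + 1):
            i = (last + k) % n
            if pc[i] < len(threads[i]) and ready_at[i] <= cycle:
                op = threads[i][pc[i]]
                pc[i] += 1
                ready_at[i] = cycle + 1 + (miss_latency if op == "m" else 0)
                finish = max(finish, ready_at[i])
                issued += 1
                last = i
                break  # single-issue: at most one op per cycle
        cycle += 1
    return finish, issued


print(simulate(["cmcc", "cmcc"], 3, smt=False))  # -> (14, 8)
print(simulate(["cmcc", "cmcc"], 3, smt=True))   # -> (10, 8)
```

Same 8 ops, but the interleaved run keeps the issue slot busy 80% of the time instead of about 57%.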

You say it complicates the design of the CPU chip; how would you know? It can actually simplify the design dramatically, since a whole slew of other complexities can be thrown away: betting everything on HT will clean up the design. In my project I won't be including much of that junk prediction, speculation and out-of-order logic that was previously the tech of the day. The only reason HT didn't take off earlier is that SW folks have been avoiding parallel programming and forcing Intel to push up the clock. Well, that doesn't work so well; threading will eventually lead to more, simpler, real CPUs on a chip instead of one uber-fast monster design.

HT has been around for at least 20 years; only now are most folks getting exposed to it. If HT were done really well, it wouldn't be limited to 2 or 4 threads but would allow any number. In addition, the threads should be able to communicate, synchronize and pass messages with each other; then it would look like a modern Transputer.
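Thread-to-thread message passing of the kind described above can be sketched in software with Python's standard `queue` and `threading` modules. This is only an analogy: Transputer/occam channels are synchronous and unbuffered, whereas a bounded `queue.Queue` is a small buffer. The two-stage pipeline and its names are hypothetical.

```python
import queue
import threading

STOP = object()  # sentinel marking the end of the message stream


def producer(out_q, items):
    for x in items:
        out_q.put(x)  # blocks while the bounded "channel" is full
    out_q.put(STOP)


def squarer(in_q, out_q):
    while True:
        x = in_q.get()  # blocks until a message arrives
        if x is STOP:
            out_q.put(STOP)
            return
        out_q.put(x * x)


def run_pipeline(items):
    a, b = queue.Queue(maxsize=4), queue.Queue(maxsize=4)
    workers = [threading.Thread(target=producer, args=(a, items)),
               threading.Thread(target=squarer, args=(a, b))]
    for t in workers:
        t.start()
    results = []
    while True:
        y = b.get()
        if y is STOP:
            break
        results.append(y)
    for t in workers:
        t.join()
    return results


print(run_pipeline([1, 2, 3]))  # -> [1, 4, 9]
```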

Eventually the OS will fine-tune the scheduling so that cooperating threads of each program share the HT threads. If HT is used to timeshare a CPU over many single-threaded apps, then the cache will also be shared between them, and that will slow things down.

Also, the Athlon may have umpteen internal ALUs or units, but it certainly doesn't have 9 internal processors. Try writing some C code for a benchmark, optimize it, look at the asm output, and work out the number of opcodes that must have been executed per second. Guess what: it's close to your clock speed, i.e. you only get about one op per cycle on random, memory-intensive apps. Keep everything in cache and it can get a bit better.
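The measurement suggested above boils down to one division. A minimal sketch, assuming you have counted the instructions per loop iteration by hand from the compiler's assembly output and timed the loop; the figures in the example are hypothetical.

```python
def ops_per_cycle(ops_per_iteration, iterations, seconds, clock_hz):
    """Average instructions retired per clock cycle for a timed loop."""
    ops_per_second = ops_per_iteration * iterations / seconds
    return ops_per_second / clock_hz


# Hypothetical run: 6 instructions per iteration, 400 million iterations,
# 1.5 s on a 1.6 GHz CPU.
print(ops_per_cycle(6, 400_000_000, 1.5, 1.6e9))  # -> 1.0
```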

Even a BeOS user should see some benefit, but I would expect it to be well below the 30% claimed by Intel.

JJ

Why HT
by Joe P on Wed 5th Mar 2003 02:40 UTC

The P4 has a 20-stage pipeline and only 8 registers; thus it gets a lot of stalls, either because it guessed wrong on a branch or because it couldn't process an instruction whose register hadn't been updated by the previous instruction yet. To get around this, they just added a second set of registers and a second front end to issue instructions. Now, when the front end detects a register conflict or doesn't want to mispredict the branch, it just turns control over to the other front end.

The bad part is that they can show a 30% speed increase! That means that 30% of the time in a non-HT processor is wasted because of processor stalls (can you say bad design?).
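The inference from speedup to wasted time can be made a little more precise. Assuming the second logical processor adds no execution resources and only fills idle ones, a throughput gain of s means a lone thread used at most 1/s of the core's capacity, so at least 1 - 1/s of it was idle. For s = 1.3 that is roughly 23%, and it is a lower bound rather than an exact figure. The function name is hypothetical.

```python
def min_idle_fraction(speedup):
    """Lower bound on the share of execution capacity a single thread left
    unused, given the throughput gain measured with a second hardware thread.
    Assumes the second thread only fills idle resources and adds none."""
    if speedup < 1:
        raise ValueError("speedup must be >= 1")
    return 1 - 1 / speedup


print(round(min_idle_fraction(1.3), 3))  # -> 0.231
```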

The reason you want the kernel to know about HT, instead of treating it as 2 independent processors, is simple. Both front ends share the same L2 cache and memory-mapping registers; this means it's better to run two threads of the same program on one physical processor, because they share memory and will get better cache hits.
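That argument can be made concrete with a toy placement rule. This is a hypothetical sketch, not the Linux 2.5 HT scheduler (which has its own, more involved balancing rules): it only shows why knowing which logical CPUs are siblings on one package changes the decision. The topology and process names are made up.

```python
def pick_cpu(process, topology, running):
    """topology: dict logical CPU id -> physical package id.
    running: dict logical CPU id -> name of the process on it (None = idle).
    Returns a logical CPU id for a new thread of `process`, or None."""
    idle = [c for c in sorted(topology) if running.get(c) is None]

    def siblings(cpu):
        return [c for c in topology if c != cpu and topology[c] == topology[cpu]]

    # 1) Share a package with a thread of the same process (shared L2 cache).
    for c in idle:
        if any(running.get(s) == process for s in siblings(c)):
            return c
    # 2) Otherwise prefer a completely idle package, so the new thread does not
    #    compete for execution units with an unrelated program.
    for c in idle:
        if all(running.get(s) is None for s in siblings(c)):
            return c
    # 3) Fall back to any idle logical CPU.
    return idle[0] if idle else None


# Two HT packages: logical CPUs 0 and 1 on package 0, 2 and 3 on package 1.
TOPO = {0: 0, 1: 0, 2: 1, 3: 1}
busy = {0: "encoder", 1: None, 2: None, 3: None}
print(pick_cpu("encoder", TOPO, busy))  # -> 1 (sibling of its own thread)
print(pick_cpu("browser", TOPO, busy))  # -> 2 (fully idle package)
```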

Nice technical writing but...
by Nacs on Wed 5th Mar 2003 04:07 UTC

I personally prefer Intel over AMD, but this 'technical paper' read like a 4-page advertisement/OSNews endorsement.

A little more 'balanced' writing about the functions of HT would have made me actually stick around to read pages 2, 3 and 4 instead of feeling like I was watching an infomercial disguised as a documentary.

question
by gumby on Wed 5th Mar 2003 04:21 UTC

If you have, say, two Xeons with HT, making the computer appear to have four CPUs, can Windows XP Pro or 2000 Pro use all four? If not, are there any plans for a patch or something along those lines?

Re: Joe P, Elver Loho
by Bascule on Wed 5th Mar 2003 05:10 UTC

Joe P: The bad part is that they can show a 30% speed increase! That means that 30% of the time in a non-HT processor is wasted because of processor stalls (can you say bad design?).

It's excellent design from a marketing standpoint. While the P4's pipeline may be too deep and its branch predictor too inaccurate for the combination of both to make for an efficient processor, Intel knows one thing: clock speed sells.

I actually did the calculations, and it turns out that the percentage of CPU cycles wasted by the Pentium 4 is approximately equal to the percentage of branch instructions in the code being executed (that's just how the figures worked out for the P4; it's not a general rule or anything).
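One simple model shows how a result like this can come out. It assumes a base CPI of 1, a misprediction penalty of about 20 cycles (matching the 20-stage pipeline mentioned earlier) and a 5% mispredict rate. These numbers are illustrative assumptions, not the figures used in the calculation above. With them, mispredictions add roughly one stall cycle per branch, so the stall overhead relative to useful work equals the branch fraction, and the share of all cycles wasted is close to it.

```python
def wasted_fraction(branch_fraction, mispredict_rate, penalty_cycles, base_cpi=1.0):
    """Share of cycles lost to branch mispredictions in a simple CPI model:
    CPI = base_cpi + branch_fraction * mispredict_rate * penalty_cycles."""
    stall_cpi = branch_fraction * mispredict_rate * penalty_cycles
    return stall_cpi / (base_cpi + stall_cpi)


# Hypothetical: 20% branches, 5% of them mispredicted, 20-cycle penalty.
print(round(wasted_fraction(0.2, 0.05, 20), 3))  # -> 0.167
```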

However, Intel has won the clock speed competition hands down. That's all that matters to them.

Elver Loho: How about aiming for quieter and cooler systems rather than going for the speed and paying the price of having to live in the same room with a machine that makes as much noise as a jet engine?

I'm writing this from a Dell Precision 4550 workstation. It runs completely silent.

HT yet again
by Ed Page on Wed 5th Mar 2003 05:30 UTC

To clear up some confusion, please go read this:
http://arstechnica.com/paedia/h/hyperthreading/hyperthreading-1.htm...

If I remember right, HT only required something like 10% more transistors.


Jacob Munoz: Even if Athlons had 9 independent internal processors, the chip could not use them transparently. You would need multi-threaded code, or else all the other processors would starve. Also, saying HT is building hardware around software is like saying SMP is.

I would imagine BeOS could do some nice stuff with HT due to all of its threading. Does anybody know if there are even appropriate patches to get it to run?

RE: REAL desktops anyone!?
by Anonymous on Wed 5th Mar 2003 07:39 UTC

@ Elver: well, your question is telling. Just because the cheapest systems available may be loud due to el cheapo fans doesn't at all mean you can't have quiet systems, even with the strongest CPU. If your system is loud, that only means _you_ personally opted for the cheap fan and are now complaining about it. Even those little Shuttle barebones are almost silent...

HT yet again
by Earl Colby Pottinger on Wed 5th Mar 2003 15:26 UTC

As far as I know, HT already works with BeOS. I am looking for a good dual-processor board with HT; to BeOS that would look like a four-way system.

Does anyone know if Intel plans to take HT or MP further, i.e. a single chip that looks like, or is, four processors?

HT or MP up to 4 on die
by Ed Page on Wed 5th Mar 2003 17:27 UTC

I would guess 2 logical processors per physical one would be the sweet spot for HT; otherwise too much time would be wasted with different threads waiting on one another.

Now, I know IBM's POWER4 has 2 processors per die, which I guess has some advantages. I'm not sure if Intel will do that.

I would really like to see IBM add HT soon, especially shortly after they release the 970. My next computer will be in two years; I would love a dual HT IBM 970-like chip, but if I had to, I could drop the dual or the HT from my dream. IBM said the 970 is meant for desktop/workstation use, so whether Apple even uses it or not, IBM might have someone else in mind. If I can't get that dream, then just a dual HT Pentium.