One thing that Intel has learned through successive years of iterating on the Skylake microarchitecture – same process, more cores – is optimization: the ability to squeeze as many drops out of a given manufacturing node and architecture as is physically possible, and still come out with a high-performing product when the main competitor is offering similar performance at much lower power.
Intel has pushed Comet Lake and its 14nm process to new heights, in many cases achieving top results in a lot of our benchmarks, at the expense of power. There’s something to be said for having the best gaming CPU on the market, something Intel seems to have readily achieved here when considering gaming in isolation, though now Intel has to deal with the messaging around power consumption, similar to what AMD had to do in the Vishera days.
Intel has been able to eke some good performance out of these processors, but all at the expense of power consumption.
According to ars: 3% faster than ryzen in single threaded benchmarks and a whopping 37% SLOWER than ryzen in multithreaded. And the intel part is almost 300usd more. And it requires investment in a new motherboard… again. Intel changes sockets too often.
NaGERST,
For future reference, please provide links when you are citing data. Here is the link you seem to be referring to, though:
arstechnica.com/gadgets/2020/05/intels-new-i9-10900k-fast-yes-competitive-not-so-much/
Anyways, they only show two benchmarks: cinebench and passmark. The passmark scores are incomplete, as they only show the multithreaded score. I tried searching passmark for this CPU and they don’t have any results for these new CPUs yet, so we can’t confirm those numbers or get any single threaded scores there at the moment.
For cinebench you are right.
When you read the article, they say the i9-10900k was not running to intel’s specs.
It could easily be the case that they need a BIOS update to unlock the full CPU, which is something we’ve seen before. More data is clearly needed, which brings up a rather severe deficiency in arstechnica’s review: normally they test the system with a barrage of benchmarks, so why didn’t they perform and publish those scores like usual? I’m really baffled why they only published cinebench scores in full, and I really hope they weren’t cherry picking.
Anyways, anandtech’s cinebench scores aren’t far off – they have even higher scores for ryzen…
http://www.anandtech.com/show/15785/the-intel-comet-lake-review-skylake-we-go-again/11
Wow, you’d need to analyze the photos to see who won that race, haha.
I looked into it more, and I believe the reason amd and intel cpus are so close in cinebench is its heavy use of AVX512 instructions, which intel actually clocks below the CPU’s normal clock rate, nullifying all of its clock speed advantages. So I’d say if your software makes heavy use of AVX512, intel’s advantages are pretty much zilch.
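To put some rough numbers on that (purely hypothetical figures for illustration, not measured clocks): if a chip normally turbos at 5.0GHz but drops to around 4.2GHz under heavy 512-bit vector load, that’s 0.8 / 5.0 = 16% of the clock gone – enough to wipe out a ~15% single threaded clock advantage all by itself.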
However I don’t think this is representative of typical single threaded software. The same users who rely on heavy CPU vectorization also tend to rely on highly parallel multithreading. In other words, in the real world it just doesn’t make sense to run heavy vector applications on a CPU optimized for single threaded performance. Vector software scales very well with more cores running in parallel.
Even if intel CPUs had scored 10-30% faster for cinebench single threaded, I think everyone would agree that it’s still the multithreaded performance that matters for this type of workload anyways. Single threaded performance becomes more significant for software that is not embarrassingly parallel, and this is where intel still has a lead. For example:
This also translates to a lead in many games, where the CPU is basically feeding the GPU and the GPU is performing the parallel computations.
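To put it in back-of-the-envelope terms (just Amdahl’s law, nothing exotic): speedup(n) = 1 / ((1 – p) + p / n), where p is the parallel fraction of the work and n is the core count. If only half of a workload parallelizes (p = 0.5), even infinite cores cap out at 2x, and 16 cores only buy you 1 / (0.5 + 0.5 / 16) ≈ 1.88x. At that point, per-core speed is what moves the needle.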
I’ll repeat what I said last time, I’d choose AMD for a new build today. For my server work with VMs, the high number of cores and multithreaded performance is an easy win for AMD, and that’s just performance before we even look at costs & power. On the other hand I’m working with some high perf GPU projects where intel has a comfortable lead, which factored into my choosing intel last time. I’m willing to bet that most intel fans aren’t too happy that intel is still stuck on the skylake microarchitecture and personally I was hoping 2020 would bring a new process bump on intel’s side. Still, regardless of which product you prefer, we’re all benefiting from increased CPU competition… now we need the same thing to happen with GPUs 🙂
AMD doesn’t even support AVX512, and 256-bit AVX is implemented as two 128-bit wide steps. So, if anything, AVX512 should give a big advantage to Intel CPUs. Unless this extension is simply not worth the cost. AVX512 requires a lot of die area and power – two resources Intel is short on. Maybe without this extension they could afford more cache or CPU cores?
The speed difference between Intel’s 14nm and TSMC’s 7nm processes is nil. If anything, Intel’s process, being heavily skewed toward high performance digital circuits, could perform a bit better. TSMC’s main advantage is higher transistor density (~50%) and, to a lesser degree, higher power efficiency (~30%). Transistor density is not very important in this market (high performance CPUs are not sensitive to wafer cost). Power efficiency is important, especially with larger numbers of cores, but the gap between Intel and AMD is too big to be explained by the process alone.
Intel needs to lower the pricing, improve micro-architecture, catch up with the process development (14nm will not be able to compete with 5nm EUV) and deal with customers moving to other CPU architectures. Not a good position to be in, especially for an organization not used to fierce competition.
As for the single/multithreaded performance, 10 years ago my company had several critical use cases that still required high single-threaded performance. Now all these applications have moved to multi-threading and in some cases to GPUs, and the only applications that are left single-threaded are those where higher performance is not required, let alone ~10% higher performance. One place where Intel CPUs still have a meaningful advantage over AMD is floating point performance, but this is also going away soon.
ndrw,
Wow, that’s a good observation. I didn’t know that, and I’m not actually sure how cinebench handles this. What can I say, you’ve brought up an excellent point! For an apples to apples test, all CPUs would need to run the same code path, but I don’t really know if that’s the case with cinebench. Without access to the source I’m not sure I can easily find out.
AVX512 instructions are known for poor performance on intel CPUs, something I’ve confirmed running benchmarks myself, although I’m not really sure if it’s because the CPU is throttling or if it just lacks the execution units needed to run everything in parallel. Does anyone know? It’s entirely possible that AVX512 offers no advantages whatsoever, at least on today’s CPUs.
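For anyone who wants to poke at this themselves, here’s the sort of minimal harness I’ve used (my own sketch, not code from any review – it assumes an AVX512-capable CPU, and the file name and flags are just my choices):

/* Rough micro-benchmark sketch comparing a scalar loop against the same
 * work done with AVX512 intrinsics.
 * Build: cc -O2 -mavx512f avxbench.c
 * (You may want -fno-tree-vectorize so the compiler doesn't auto-vectorize
 * the scalar baseline behind your back.)
 */
#include <immintrin.h>
#include <stdio.h>
#include <time.h>

#define N (1 << 22)   /* 4M doubles per array */
static double a[N], b[N], c[N];

static double now(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec * 1e-9;
}

int main(void) {
    for (int i = 0; i < N; i++) { a[i] = i; b[i] = N - i; }

    double t0 = now();
    for (int i = 0; i < N; i++)          /* scalar: one add per iteration */
        c[i] = a[i] + b[i];

    double t1 = now();
    for (int i = 0; i < N; i += 8) {     /* AVX512: eight doubles per add */
        __m512d va = _mm512_loadu_pd(&a[i]);
        __m512d vb = _mm512_loadu_pd(&b[i]);
        _mm512_storeu_pd(&c[i], _mm512_add_pd(va, vb));
    }
    double t2 = now();

    /* print c[N-1] so the stores can't be optimized away */
    printf("scalar: %.4fs  avx512: %.4fs (%f)\n", t1 - t0, t2 - t1, c[N - 1]);
    return 0;
}

Keep in mind a streaming loop like this is mostly memory bound, so you won’t see anywhere near 8x even in the best case; tighter compute kernels expose the clock throttling effects more clearly.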
In the past, conventional wisdom was that increased density helps performance, because the closer together transistors are, the faster we expect a signal to propagate down the line. Although with these circuits getting so small on a molecular level, it’s possible that other physical effects beyond propagation delay are taking over. AMD shows that 7nm is an obvious win for power and core counts, but so far neither intel nor AMD have been able to demonstrate it’s better for raw core performance. This has been predicted for a while; maybe we’re finally reaching the performance limits of silicon transistors, and maybe 7nm is just not suited for reaching higher speeds? I don’t know, I’d be interested in seeing more research on this!
Yeah I’d like to see more from intel too.
It depends on what you do. There are a lot of tasks where adding more cores won’t help even one bit, whereas increasing core speed will. For example, a long running SQL query I was working on this week pegged a single CPU at 100% for a sustained period while the other cores were mostly idle. GIMP is the same way. There’s no one size fits all rule, but on my desktop system I tend to hit single threaded caps quite a bit more frequently than SMP caps. That’s not to say there aren’t good uses for having more cores. I love having cores to reduce build times!
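A contrived example of the difference (my own toy sketch, purely illustrative): in the first loop below, iteration i needs the result of iteration i-1, much like that running SQL aggregate, so only clock speed and IPC help; the second loop’s iterations are independent, so OpenMP can spread them across every core.

/* Build: cc -O2 -fopenmp chain.c */
#include <stdio.h>

#define N 10000000
static double x[N];

int main(void) {
    x[0] = 1.0;

    /* Loop-carried dependency: iteration i needs i-1, so this part only
     * goes faster with higher clocks/IPC, never with more cores. */
    for (long i = 1; i < N; i++)
        x[i] = x[i - 1] * 0.9999999 + 1.0;

    /* Independent iterations: OpenMP happily splits these across cores. */
    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < N; i++)
        sum += x[i] * x[i];

    printf("%f %f\n", x[N - 1], sum);
    return 0;
}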
All in all, I’d lean toward AMD today as the more competitive option, but hypothetically, if intel could be more competitive on price and power, I still think a lot of us would benefit more from better core speeds than from more cores. But again, it obviously depends on what you do.
I try not to make assumptions because both companies have the potential to improve. It’s future amd versus future intel, not future amd versus present intel or vice versa. Either way I’m glad they are competing.
AVX has significant performance benefits (theoretical 8x throughput for 64 bit operations) but this is very much application specific. SIMD is much more constrained than multithreading, so gains vary a lot. As for the slower clocking, I think Intel had to sacrifice clock speed because of the length of the critical paths, but that should still result in a good net performance gain.
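For reference, the arithmetic behind that 8x: a 512-bit register holds 512 / 64 = 8 double precision values, so one vector add retires eight scalar adds’ worth of work. On cores with two 512-bit FMA units, that is a theoretical peak of 2 units x 8 lanes x 2 ops per FMA = 32 double precision FLOPs per cycle – assuming the code can actually keep the units fed, which is exactly the application specific part.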
The problem with AVX is the overhead it brings to the rest of the core when it is not in use. The whole data path has to be 512 bits wide. Even with aggressive clock gating, these gates and wires are still sitting in the middle of the core, increasing load and delays for signals that are used more often. That might have been a good idea when Intel was a leader in process development, but now it exacerbates the problems of their technology.
Recently, routing has been scaling more slowly than transistor gate size (lithography constraints, electromigration) and is getting much more complicated. For example, all signal tracks are implemented as uniform parallel buses and cut to size with cut layers. The result is that digital circuit size hasn’t really gone down as much as transistor or digital gate size. Meanwhile, parasitics (RC constants) per unit length of wire have increased. This is one reason people are hopeful about EUV – it has the potential to simplify routing and make it more efficient again, but even then we are not talking about big gains, as electromigration is still a concern.
Maximum speed (assuming power consumption is not a concern) has slowly been degrading since ~28nm. Here the issue is mostly reliability (self-heating, electromigration, HCI) – new devices are much faster at the same bias current, but they can’t handle anywhere near as much current as older bulk devices.
To really improve things we would need better devices that could switch at lower gate voltage swing. If not for that limit, today we would have CMOS devices supplied from a 70mV voltage, consuming 100x less power per gate, and routing density would only be limited by lithography and material properties. There are some improvements in the pipeline but nothing dramatic (think: less than transition from bulk to Fin-FET).
So am I. I am really glad AMD has come up with good CPUs for desktops at reasonable retail prices, and I welcome the lower prices of 10th-gen Intel CPUs. But it is not just about desktops. X86 has already lost mobile and is quickly losing datacenters to custom/non-x86 CPUs, and compute markets to GPUs. So the market is now in enterprise, laptops and desktops – nothing wrong with that, but it simply won’t be large enough to support a company like Intel.
ndrw,
The whole data path would need to be 512 bits wide to execute the instructions completely in parallel, but I’m not sure if current generation intel CPUs have enough execution units in silicon to do that. The performance profile suggests that perhaps they do not. Are you able to find any definitive documentation?
Yes, I’m aware of the existence of such constraints, but I don’t really know how significant they are at any given node size. I couldn’t say how close we are to the point of diminishing returns…
The increased power and heat dissipation are obvious, but do you really think that 28nm could be faster if we didn’t care about power? Did you factor in the propagation delay imposed by the larger die size? I don’t have expertise in this area; do you know of any reference material that breaks down the maximum theoretical performance for a given process size? Maybe as a function?
I’ve heard that some substrates might replace silicon, and I’ve even heard that the vacuum tube could make a comeback to replace transistors.
https://www.extremetech.com/extreme/185027-the-vacuum-tube-strikes-back-nasas-tiny-460ghz-vacuum-transistor-that-could-one-day-replace-silicon-fets
http://jacobsschool.ucsd.edu/news/news_releases/release.sfe?id=2060
It all sounds interesting, but when it comes to CPU engineering I’m merely an armchair observer not qualified to judge the merits of these approaches. I think it’d be cool if the old technology could be redeemed, haha.
I’ve been saying this as well.
I shouldn’t speak too loudly, but I think there’s been a long decline in american engineering, in part due to a lack of focus on competing and the egotistical assumption that we are better than everyone else.
Uhhh, hate to break the news to ya, but go check out Gamersnexus, Linus Tech Tips, Hardware Unboxed, etc, because they are all showing the exact same results, which is… Intel AT BEST gets between 5-12% on some single threaded workloads (really, it’s 2020, who is only running one task these days?) while getting frankly curbstomped in multi by as high as 37%.
Let’s face it man, they are “pulling a Netburst” and just cranking up the voltage to get higher clocks on an arch that is well past its sell-by date. I mean, how many years have they been pushing the 14nm Lake arches now? And their “gamer’s chip” marketing tag is about to get kicked in the nads thanks to the PS5 and Xbox Series X both being 8c/16t chips, which ironically are made by AMD, so the ports will run better on Ryzen.
At this point Intel really needs to come up with a new arch, and fast, as there really is no compelling reason to buy Team Blue anymore. From the security patches that rob performance, to the constant switching of motherboards, to Intel’s boards being more expensive, to the lack of PCIe 4.0, to getting more cores/threads for the same money with Ryzen – all they have left to hang their hat on is a measly 15% single thread performance. So they are either gonna have to be aggressive as hell and try to undercut AMD on prices like AMD tried with Bulldozer, or, as Gamersnexus reported with their last flagship, there is gonna be a lot of OEMs sitting on unsold stock.
bassbeast,
I don’t know why so many people want to disagree with me, even though nothing I’m saying should be controversial when looking at the empirical data, which I’ve been totally transparent about and have always cited clearly. Some of the reviewers have been very biased against intel, probably because they’re tired of intel’s price gouging and lack of improvements in micro-architecture, but nevertheless that’s not a good excuse to not give intel credit where it’s due.
I found ndrw’s post refreshing because it was insightful and there are things I can learn from him. That’s a good line of discussion. There’s nothing informative in your post, however; it just seems like you’re not happy with what I said, but that doesn’t change the facts.
Did you just fabricate this on the spot? I wouldn’t be pointing this out if you had said “ON AVERAGE”, since I think ~10-20% is typical, but since you specifically said “AT BEST”, well, that’s just not true.
http://www.anandtech.com/show/15785/the-intel-comet-lake-review-skylake-we-go-again/10
I’ll also include the multithreaded score too, so you don’t respond accusing me of bias, although it doesn’t contradict anything I’ve been saying at all.
To be clear, I’m not suggesting 30%+ is typical, and I don’t want to be accused of promoting intel using cherry picked data. However, it’s clearly wrong to state 12% as an upper limit “AT BEST”. Honestly, I don’t know why you would say that.
Here’s a benchmark from an application I actually do use frequently:
http://www.anandtech.com/show/15785/the-intel-comet-lake-review-skylake-we-go-again/6
I included Ryzen 7 because something interesting happens here: the Ryzen 9 3950x is not always the fastest in the ryzen lineup. It’s not a total anomaly, because I’ve seen this happen in other benchmarks as well. I’m open to hearing anyone’s explanation for it 🙂
Again, I hear a lot of people including you saying intel is losing the game and isn’t competitive, but it doesn’t look like we actually disagree on much beyond whether intel is winning any races at all. Just because I point out that AMD has not won across the board doesn’t mean I don’t think AMD has the more competitive offering, which I’ve said many times already. Will intel lose their ST lead in the future? Maybe, but it hasn’t happened yet. I have nothing against AMD, and I genuinely hope they keep improving. This competition is good for consumers!
Alfman,
Mind you, that’s not for digital circuits, at least not ones constructed out of standard cells (which is most of them). You are right that overall circuit size affects performance (it adds routing parasitics), so 28nm would not be able to match the performance of 7nm regardless of how much current we dump into the devices. But in some applications (the high speed front end of SerDes, PLLs, RF transceivers) it is common to have some devices pushed to their limit.