Arivia
Mar 17, 2011

WARNING: I MAY HAVE A MELTDOWN IF I READ THE WORD DUDE

magimix posted:

I have a n00b question. What is 'the ring', in this context? It is talked about much in that video. I'm inferring from the commentary that it refers to some common 'line' or circuit that delivers power to multiple cores generally. Is that roughly on the mark?

If so, what does it mean for 'the ring' to 'fall apart'? The 'electromigration' that other commentators have speculated upon?

https://skatterbencher.com/intel-ring-downbin/

pyrotek
May 21, 2004



Dr. Video Games 0031 posted:

https://x.com/IanCutress/status/1812809082148892806

Ian Cutress speculates that the Intel degradation issue may actually be caused partially by the ILM design. We know that Intel's ILM for LGA 1700 causes the chip to warp, and that it gets worse over time. It's not implausible that pin to pad contact could be getting worse over time as a result, leading to weird stability issues. This is pretty easily testable: if users who have used contact frames since day 1 are encountering stability issues at a lower rate than people using the stock ILM, then this would be proof that the ILM is contributing to the problem.

If that was the case, wouldn't 12th gen parts be suffering the most (instead of not at all) since they've been around the longest?

Mental Hospitality
Jan 5, 2011

AMD should bring back the tri-core for their budget Athlon line.

AirRaid
Dec 21, 2004

Nose Manual + Super Sonic Spin Attack

Mental Hospitality posted:

AMD should bring back the tri-core for their budget Athlon line.

With how intel is doing, AMD doesn’t even need to tri.

Tuna-Fish
Sep 13, 2017

magimix posted:

I have a n00b question. What is 'the ring', in this context? It is talked about much in that video. I'm inferring from the commentary that it refers to some common 'line' or circuit that delivers power to multiple cores generally. Is that roughly on the mark?

If so, what does it mean for 'the ring' to 'fall apart'? The 'electromigration' that other commentators have speculated upon?

Intel desktop CPUs are organized around a ring bus. It's not for power, but for data. When a core needs something that's not in its own cache, it sends a request one way on the ring bus to the L3 slice, and eventually gets a response coming back the other way. If the requested line is not in L3, the L3 slice forwards the request to the memory controller, which is also on the ring bus.

The ring is fingered as the suspect because the failures happen in many different ways, including PCIe and memory failures, and the ring is something that's common to all the failure modes. There's nothing that conclusively proves it's the problem, but unless there are multiple different kinds of faults getting mixed up together, there are few other candidates that would explain all the different problems.
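
To make the data path concrete, here's a toy sketch of a load that misses the core's own caches. Everything in it is an assumption for illustration: the stop names, the four-slice layout, the modulo "hash" that picks a slice, and the response going straight from the memory controller back to the core. It is not Intel's actual topology or hashing, just the shape of the idea.

```python
# Toy model of the ring bus traffic described above.
# Stop names, slice count and the address-to-slice mapping are hypothetical.
RING = ["core0", "l3_0", "core1", "l3_1", "core2", "l3_2", "core3", "l3_3", "memctl"]
LINE_BYTES = 64  # assumed cache line size


def slice_for(address):
    """Pick the L3 slice that owns a cache line (made-up modulo hash)."""
    slices = [stop for stop in RING if stop.startswith("l3_")]
    return slices[(address // LINE_BYTES) % len(slices)]


def load(core, address, l3_lines):
    """Return the ring stops visited by a load that missed the core's own caches.

    l3_lines is a set of cache line numbers currently held in L3.
    """
    line = address // LINE_BYTES
    path = [core, slice_for(address)]  # request goes to the owning L3 slice
    if line not in l3_lines:
        path.append("memctl")          # L3 miss: slice forwards to the memory controller
        l3_lines.add(line)             # the line gets filled into L3
    path.append(core)                  # response comes back to the requesting core
    return path


if __name__ == "__main__":
    l3 = set()
    print(load("core0", 0x1000, l3))  # first touch goes via memctl
    print(load("core2", 0x1000, l3))  # second touch is served from L3
```

Either way the request has to cross the ring, which is why it comes up as a common factor.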

Comfy Fleece Sweater
Apr 2, 2013

You see, but you do not observe.

Maybe disabling some P-Cores or E-Cores would save my 14th gen cpu?

with a huge performance hit, I imagine

Probably better to wait and see if they can actually fix it in BIOS

lol Intel sucks so bad

Number19
May 14, 2003

HOCKEY OWNS
FUCK YEAH


Comfy Fleece Sweater posted:

Maybe disabling some P-Cores or E-Cores would save my 14th gen cpu?

Intel says: https://www.youtube.com/watch?v=7JYJhWIwGUw&t=4s

HalloKitty
Sep 30, 2005

Adjust the bass and let the Alpine blast

SpaceDrake posted:

And buried within that thread: one of the most reliable leakers on the planet casually confirms that the Raptor Lake-based enterprise parts (the Emerald Rapids Xeons) suffer from the exact same defect.

https://x.com/kopite7kimi/status/1813399866397315272

That is... that's A Lot to take in.

Yeah this is exactly what I feared. Holy poo poo.

SpaceDrake
Dec 22, 2006

I can't avoid filling a game with awful memes, even if I want to. It's in my bones...!

Tuna-Fish posted:

Intel desktop CPUs are organized around a ring bus. It's not for power, but for data. When a core needs something that's not in its own cache, it sends a request one way on the ring bus to the L3 slice, and eventually gets a response coming back the other way. If the requested line is not in L3, the L3 slice forwards the request to the memory controller, which is also on the ring bus.

The ring is fingered as the suspect because the failures happen in many different ways, including PCIe and memory failures, and the ring is something that's common to all the failure modes. There's nothing that conclusively proves it's the problem, but unless there are multiple different kinds of faults getting mixed up together, there are few other candidates that would explain all the different problems.

And more to the point, despite the name, the ringbus is situated slap in the middle of the monolithic processor die. (The name, as I understand it, comes from it operating similarly to a "token ring" network from the 80s and 90s, albeit on a monumentally faster scale than a computer network.) This is a pretty good picture of the CPU die with parts labeled.

As you can see, magimix, if the ringbus manages to get warped or degraded from heat or something, since it runs spinally across the entire length of the die it has the potential to cause all sorts of other problems.

Alder Lake (the 12000s) used an essentially similar design, but those chips didn't run into problems (and notably still seem to be doing okay), so for Raptor Lake Intel plowed ahead with what you see here.

Gee, I wonder if shoving sixteen additional Pentium 4Ms into our already hot-running eight core CPU and right along one of the ends of our spinal ringbus might introduce problems. Nah, go for it.

SpaceDrake fucked around with this message at 22:13 on Jul 17, 2024

BobHoward
Feb 13, 2012




FuturePastNow posted:

But then why are fewer i7s showing up in the failure charts than i9s, despite there being far more i7s out in the world? Why are i5s almost completely missing from the failures? Don't they all use the same ILM design?

That's exactly what bothers me about this theory. Also, it's just speculation not suggested by evidence.

Tuna-Fish posted:

The ring is fingered as the suspect because the failures happen in many different ways, including PCIe and memory failures, and the ring is something that's common to all the failure modes.

Such failures don't necessarily implicate the ring bus. Failures inside a processor core can easily cause a bunch of weird symptoms.

movax
Aug 30, 2008

CPUs aren't supposed to fail. You pay $100s for them, and then after 10 years of running / being thinking sand it sits in a cardboard box, or you sell it for $5 or something like that, and it will still fire back up if you don't ESD it or crack it.

Intel innovating as always.

Dr. Video Games 0031
Jul 17, 2004

SpaceDrake posted:

And buried within that thread: one of the most reliable leakers on the planet casually confirms that the Raptor Lake-based enterprise parts (the Emerald Rapids Xeons) suffer from the exact same defect.

https://x.com/kopite7kimi/status/1813399866397315272

That is... that's A Lot to take in.

He's primarily an Nvidia leaker. He's probably just repeating hearsay, but he is definitely in a position to hear the hearsay. This could be a pretty catastrophic problem for Intel if it turns out to be affecting Xeons too.

priznat
Jul 7, 2009

Let's get drunk and kiss each other all night.

movax posted:

CPUs aren't supposed to fail. You pay $100s for them, and then after 10 years of running / being thinking sand it sits in a cardboard box, or you sell it for $5 or something like that, and it will still fire back up if you don't ESD it or crack it.

Intel innovating as always.

CPUs cannot fail, they can only BE failed

WhyteRyce
Dec 30, 2001

time for the Intel sales group to earn that free cruise they got for team building

8-bit Miniboss
May 24, 2005

CORPO COPS CAME FOR MY :filez:
Should have never taught sand to think.

Comfy Fleece Sweater
Apr 2, 2013

You see, but you do not observe.

WhyteRyce posted:

time for the Intel sales group to earn that free cruise they got for team building

alright I've heard references to this mythical cruise, but seems to have happened while the pandemic was raging or something? I missed it, what's the Cliff's notes?

WhyteRyce
Dec 30, 2001

Comfy Fleece Sweater posted:

alright I've heard references to this mythical cruise, but seems to have happened while the pandemic was raging or something? I missed it, what's the Cliff's notes?

Intel has, understandably, been working under austerity conditions. Things like temporary paycuts that were restored a year later via RSUs that vest over 4 years and have already lost value, firings, blocking promos, team outings, reductions of workplace benefits, firings, etc., firings

The sales and marketing group last year booked out two cruise ships in the Bahamas to hold face to face meetings, training, education, etc. The justification for this was that it was actually cheaper to rent the cruise ships than to book out a Holiday Inn or whatever and fly everyone in.

I heard that Intel didn't bother to check/research if people's work visas would allow them to go and employees assumed Intel was managing the whole thing, so when some people got there they immediately had to go home

Subjunctive
Sep 12, 2006

ask me about nix or tailscale

WhyteRyce posted:

The sales and marketing group last year booked out two cruise ships in the Bahamas to hold face to face meetings, training, education, etc. The justification for this was that it was actually cheaper to rent the cruise ships than to book out a Holiday Inn or whatever and fly everyone in.

I was pricing out a 10-person F2F event for the all-remote company I was working for last year, and someone found that it was materially cheaper to take everyone to some Caribbean island than Toronto or NYC.

Rinkles
Oct 24, 2010

What I'm getting at is...
Do you feel the same way?
Any chance anyone has a Zen 3 CPU lying about, that they could part with on the cheap? I was hoping for deeper Prime discounts this year to upgrade my brother's pc.

Shipon
Nov 7, 2005

8-bit Miniboss posted:

Should have never taught sand to think.

the failure to do so seems to be the problem here if anything

hobbesmaster
Jan 28, 2008

Subjunctive posted:

I was pricing out a 10-person F2F event for the all-remote company I was working for last year, and someone found that it was materially cheaper to take everyone to some Caribbean island than Toronto or NYC.

Aren’t Toronto or NYC worst case for something like that? My first impression would be that Montreal or Boston would even be significantly better.

Arrath
Apr 14, 2011


Subjunctive posted:

I was pricing out a 10-person F2F event for the all-remote company I was working for last year, and someone found that it was materially cheaper to take everyone to some Caribbean island than Toronto or NYC.

That's why you book the conference center of a 3rd tier casino in Reno

buffbus
Nov 19, 2012
Probably but they aren't going to justify a trip to the Bahamas by comparing the cost to a Best Western in Toledo Ohio.

Subjunctive
Sep 12, 2006

ask me about nix or tailscale

hobbesmaster posted:

Aren’t Toronto or NYC worst case for something like that? My first impression would be that Montreal or Boston would even be significantly better.

Yeah, but the budget was approved for Toronto, and we had facilities there to use as well as some people already local, so there was some discussion ensuing.

Comfy Fleece Sweater
Apr 2, 2013

You see, but you do not observe.

Seems like Epic/Fortnite players must be affected by the Intel crashes more and more, because they've changed the recommendations in their Support page to a plain link to the Oodle post about the Intel failures ( https://www.radgametools.com/oodleintel.htm )

It used to be a bunch of random recommendations to throttle your CPU and such, but I'm guessing they gave up

Frequent crashes in Fortnite on i9-13900K/KF/KS or i9-14900K/KF/KS CPUs: https://www.epicgames.com/help/en-U...cpus-a000086852


edit: I'm reading the Oodle doc, and their main recommendation seems to be "return that piece of poo poo to the manufacturer"

Comfy Fleece Sweater fucked around with this message at 03:55 on Jul 18, 2024

priznat
Jul 7, 2009

Let's get drunk and kiss each other all night.
Oh man GamersNexus Steve is gonna keep on raising hell with this one if it affects his beloved Fortnite!!

gradenko_2000
Oct 5, 2010

HELL SERPENT
Lipstick Apathy
looking forward to the 60 Minutes-style expose on INTEL'S BOEING MOMENT

movax
Aug 30, 2008

Comfy Fleece Sweater posted:

Seems like Epic/Fortnite players must be affected by the Intel crashes more and more, because they've changed the recommendations in their Support page to a plain link to the Oodle post about the Intel failures ( https://www.radgametools.com/oodleintel.htm )

It used to be a bunch of random recommendations to throttle your CPU and such, but I'm guessing they gave up

Frequent crashes in Fortnite on i9-13900K/KF/KS or i9-14900K/KF/KS CPUs: https://www.epicgames.com/help/en-U...cpus-a000086852


edit: I'm reading the Oodle doc, and their main recommendation seems to be "return that piece of poo poo to the manufacturer"

I gotta say I admire the commitment of RAD to their website design and content from the 90s. It just works(TM)

8-bit Miniboss
May 24, 2005

CORPO COPS CAME FOR MY :filez:

gradenko_2000 posted:

looking forward to the 60 Minutes-style expose on INTEL'S BOEING MOMENT

Oh no, Steve and Wendell… :ohdear:

BurritoJustice
Oct 9, 2012

Comfy Fleece Sweater posted:

Seems like Epic/Fortnite players must be affected by the Intel crashes more and more, because they've changed the recommendations in their Support page to a plain link to the Oodle post about the Intel failures ( https://www.radgametools.com/oodleintel.htm )

It used to be a bunch of random recommendations to throttle your CPU and such, but I'm guessing they gave up

Frequent crashes in Fortnite on i9-13900K/KF/KS or i9-14900K/KF/KS CPUs: https://www.epicgames.com/help/en-U...cpus-a000086852


edit: I'm reading the Oodle doc, and their main recommendation seems to be "return that piece of poo poo to the manufacturer"

The guide suggests turning off TVB, which, counterintuitively, doesn't stop the CPU from ever using the TVB clocks; it makes the CPU ignore its temperature and always use them. One of the points of Intel's recent default-settings BIOSes is to stop vendors from disabling TVB by default.

At least they're no longer recommending that users enable SVID Fail-Safe; that's a surefire way to explode your CPU in record time. It sets AC_LL to maximum, leading to huge overvoltage under load. It's a debugging feature meant to ensure stability with a failing or insufficient VRM that is drooping excessively; on a functioning VRM you will get +300mV or more of overshoot.
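
Rough arithmetic for where an overshoot like that comes from, as a simplified model: the CPU pads its voltage request by AC_LL times the load current to pre-compensate for the droop it expects, and the VRM then only droops by its real loadline times the same current. The function names and every number below (1.7 mOhm for a maxed AC_LL, 0.5 mOhm for the board, 250 A of load, 1300 mV target) are illustrative assumptions, not values from Intel's datasheets or any particular motherboard.

```python
def die_voltage_mv(target_mv, current_a, ac_ll_mohm, actual_ll_mohm):
    """Voltage at the die under load, in millivolts (simplified model).

    mOhm * A gives mV, so the units line up without conversion.
    """
    requested_mv = target_mv + ac_ll_mohm * current_a  # CPU pre-compensates for expected droop
    return requested_mv - actual_ll_mohm * current_a   # VRM droops by its real loadline


def overshoot_mv(current_a, ac_ll_mohm, actual_ll_mohm):
    """How far the die voltage lands above the intended target, in millivolts."""
    return (ac_ll_mohm - actual_ll_mohm) * current_a


if __name__ == "__main__":
    # Hypothetical numbers: AC_LL forced high on a board with a healthy VRM.
    print(overshoot_mv(current_a=250, ac_ll_mohm=1.7, actual_ll_mohm=0.5))
    # Matched AC_LL: the padding and the droop cancel out.
    print(overshoot_mv(current_a=250, ac_ll_mohm=0.5, actual_ll_mohm=0.5))
```

In this model the overshoot scales with load current, so it's worst exactly when the chip is pulling the most power.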

Comfy Fleece Sweater
Apr 2, 2013

You see, but you do not observe.

BurritoJustice posted:

The guide suggests turning off TVB, which, counterintuitively, doesn't stop the CPU from ever using the TVB clocks; it makes the CPU ignore its temperature and always use them. One of the points of Intel's recent default-settings BIOSes is to stop vendors from disabling TVB by default.

At least they're no longer recommending that users enable SVID Fail-Safe; that's a surefire way to explode your CPU in record time. It sets AC_LL to maximum, leading to huge overvoltage under load. It's a debugging feature meant to ensure stability with a failing or insufficient VRM that is drooping excessively; on a functioning VRM you will get +300mV or more of overshoot.

I tried their first recommendation (lowered the Performance Core multiplier from x57 to x53), with the associated kick in the nuts to performance (the XTU app reported a benchmark of 11,000 points). No crashes so far but I’ve only played a mildly demanding game (outer worlds, but it used to crash a lot).

I’ll take the lower speed if it means more stability, hopefully the chip lasts long enough for intel to find a solution…

BurritoJustice
Oct 9, 2012

Comfy Fleece Sweater posted:

I tried their first recommendation (lowered the Performance Core multiplier from x57 to x53), with the associated kick in the nuts to performance (the XTU app reported a benchmark of 11,000 points). No crashes so far but I’ve only played a mildly demanding game (outer worlds, but it used to crash a lot).

I’ll take the lower speed if it means more stability, hopefully the chip lasts long enough for intel to find a solution…

Are you on the latest BIOS?

FuturePastNow
May 19, 2014


BobHoward posted:

That's exactly what bothers me about this theory. Also, it's just speculation not suggested by evidence.

Such failures don't necessarily implicate the ring bus. Failures inside a processor core can easily cause a bunch of weird symptoms.

Yeah. Electromigration (if that's happening) will cause all kinds of strange failures

repiv
Aug 13, 2009

Comfy Fleece Sweater posted:

Seems like Epic/Fortnite players must be affected by the Intel crashes more and more, because they've changed the recommendations in their Support page to a plain link to the Oodle post about the Intel failures ( https://www.radgametools.com/oodleintel.htm )

all unreal engine games use oodle by default since epic acquired RAD

FlapYoJacks
Feb 12, 2009
I’m not a business expert, so I need to ask: Is a near 100% failure rate on two generations of flagship SKUs a bad thing?

in a well actually
Jan 26, 2011

dude, you gotta end it on the rhyme

It's that 100% of the failures reported are on the flagships, not that 100% of the flagships have failed, I think.

Not great, regardless.

WhyteRyce
Dec 30, 2001

quote:

Looks like a big name Intel executive is leaving the company. Any guesses who this SemiAccurate exclusive is about?

Can’t say much above the fold for obvious reasons, and we do like the person in question so no snark this time, sorry.

Comfy Fleece Sweater
Apr 2, 2013

You see, but you do not observe.

BurritoJustice posted:

Are you on the latest BIOS?

Yes, says it addresses some eTVB issues

FlapYoJacks posted:

I’m not a business expert, so I need to ask: Is a near 100% failure rate on two generations of flagship SKUs a bad thing?

it’s good for the business execs - they get to put “managed to achieve historic milestones during my tenure” in their CV

Comfy Fleece Sweater fucked around with this message at 16:12 on Jul 18, 2024

CommieGIR
Aug 22, 2006

The blue glow is a feature, not a bug
Pillbug
Yeah just heard about the Intel failures, I have mostly AMD so I haven't noticed although my servers are Intel, but older models so not affected.

Is it just gaming or is it affecting Xeons too?

BlankSystemDaemon
Mar 13, 2009





CommieGIR posted:

Yeah just heard about the Intel failures, I have mostly AMD so I haven't noticed although my servers are Intel, but older models so not affected.

Is it just gaming or is it affecting Xeons too?
According to a leaker (so very unconfirmed, though they're usually surprisingly accurate), it also affects the chips that're going into Xeons.
