820-3662 - Sudden power off

smiba · Nov 22, 2017

I've not been able to make it crash anymore; it's been running a 480p video at about 3% average CPU load for hours.

But I've noticed that plugging the charger into AC while the MagSafe is already in the laptop will make it crash, but only if the probe is connected...

I've moved the probes into 10x mode to hopefully isolate them even more; however, this is bringing in a lot of inconsistencies, which makes it hard to tell actual crashes apart from crashes introduced by the exposed wires or the scope.

EDIT: It seems I didn't have J9510 plugged in all the way! Figured this out because the USB power/data and SD slot wouldn't work.
Would this be the reason it was no longer crashing? Let's find out!

EDIT2: I've been able to make it crash again! But with more than one hour between plugging J9510 back in and the crash, I'm unsure whether that could've been it. More likely it's the fact that I had just drained the battery to 70%, while all the other tests were with the battery above 95% or connected to AC. Due to a bad trigger configuration the data was unusable, so I'm hoping I can catch it again, but this time properly.

MOberdick Still working? Also, if possible, can you test it when it's around 70% battery capacity? (PPBUS around 10.7V) - Crashing it at >90% is way harder for me than it was at 60-70%

smiba · Nov 23, 2017

The crashes have actually changed in behaviour...

Before:
8/10 of the time it would directly fully power off
2/10 of the time the screen video would disappear (blank/black screen), audio stopping but the backlight and fans still rolling for ~5 seconds before it fully powers off

Now:
I've been able to make it crash about 4 times... But they all show the last issue where it turns into a blank screen and after a few seconds powers off.

Are we still looking at the same issue? Any tips on what to diagnose next?
It definitely happens more often with a lower PPBUS. With PPBUS at 11.0V it's running alright; when I stress test it on battery to drain the battery to ~10.4V, it will crash within minutes. (But only after stopping the test, of course..)

dukefawks · Nov 23, 2017

Resolder U7200 with the iron so as not to introduce too much heat to the board.

smiba · Nov 23, 2017

Check this latest capture from the oscilloscope! This time I'm not checking the PGOOD signal but the actual VCore voltage.
I'm really starting to think U7200 is at fault as well; going to resolder it in a moment.

EDIT: Resoldered it with just the iron but no luck; it's not posting anymore even though VCore is at ~1.68V. Maybe it has unstable fluctuations that my multimeter isn't catching.
Tried to be bad and do a full reflow of the chip with hot air in case I just sucked at soldering, but still no luck.

Ordered a new IC and hopefully I have some luck with that... Annoying and definitely a bummer. Even if this will solve the issue I probably won't stop wondering if it was fixed by the heat or U7200

smiba · Nov 28, 2017

Soldered on a new U7200 and it's posting again.

Hopefully this will be the end of it

smiba · Nov 28, 2017

The device is still crashing sadly.

No idea what it could be, since PPVCC_S0_CPU is obviously unstable, as the last screenshot I posted shows.

CPUVR_PGOOD stays high, so I doubt PP5V_S0_CPUVR_VDD or PPVIN_S0_CPUVR_VIN are unstable, as the chip resetting would definitely pull CPUVR_PGOOD low.

Could it possibly be one of the switching ICs? (U7310/20/30)

dukefawks · Nov 29, 2017

Can you catch some more of those Vcore glitches and set them against CPUVR_PGOOD? Once we have a clearer picture of what is happening to Vcore, we then need to examine whether it is caused by the regulator or by a loss of the communication from the CPU that sets the actual Vcore.

dukefawks · Nov 29, 2017

Also resolder U7330; it is close to the mounting hole, so it may have cracked traces. Only the side with the control pins is important, and that is also the side closest to the screw hole.

smiba · Nov 29, 2017

I've not yet resoldered U7330, I want to do that after catching two crashes.

I've only been able to crash it once today and I've attached the oscilloscope output. I have one zoomed-in capture, which clearly shows PGOOD going down before VCore does this time, and one big overview.
Due to me being a dumbass I do not know the time per division, but it should be 40ms or less on the zoomed-out view.

U7330 looks acceptable, still want me to resolder it even after this graph?

dukefawks · Nov 29, 2017

So let's assume the PGOOD signal drops before Vcore drops below the threshold that would trigger PGOOD to go low. This could mean 2 things: either it detects an over-current, or the communication with the CPU that sets the output voltage is lost. Over-current would be a good suspect if any of the output FETs went rogue through bad connections and caused current spikes.
You could monitor the current feedback line and set that against the PGOOD signal. Maybe you can use the IMON signal, but that has already passed through the regulator so there may be some delay in there, though I suspect it will simply be the output of a differential amplifier, so pretty direct. You should see a current spike before shutdown in this setup.
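As a rough illustration of that comparison, here is a minimal sketch that assumes the scope capture has been exported as equal-length lists of samples. The function names and all three thresholds (`pgood_low`, `vcore_min`, `imon_spike`) are hypothetical placeholders and not values taken from the regulator's datasheet; they would need to be set from the actual part and probe setup.

```python
def first_below(times, samples, threshold):
    """Time of the first sample below threshold, or None if it never happens."""
    for t, v in zip(times, samples):
        if v < threshold:
            return t
    return None


def first_above(times, samples, threshold):
    """Time of the first sample above threshold, or None if it never happens."""
    for t, v in zip(times, samples):
        if v > threshold:
            return t
    return None


def order_of_events(times, pgood, vcore, imon,
                    pgood_low=1.0, vcore_min=0.5, imon_spike=2.0):
    """Report which event came first in one capture.

    All thresholds are assumed placeholder values in volts.
    """
    t_pgood = first_below(times, pgood, pgood_low)
    t_vcore = first_below(times, vcore, vcore_min)
    t_imon = first_above(times, imon, imon_spike)
    return {
        "pgood_fall": t_pgood,
        "vcore_fall": t_vcore,
        "imon_spike": t_imon,
        # PGOOD dropped first (or Vcore never dropped at all)
        "pgood_before_vcore": t_pgood is not None
        and (t_vcore is None or t_pgood < t_vcore),
        # A spike on IMON ahead of PGOOD would fit the over-current theory
        "spike_before_pgood": t_imon is not None
        and t_pgood is not None
        and t_imon < t_pgood,
    }
```

If `pgood_before_vcore` is true and `spike_before_pgood` is false across several captures, that would lean away from over-current and toward the lost-communication explanation, though a handful of captures is not proof either way.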

smiba · Nov 29, 2017

By looking at the block diagram I think using the IMON signal is a good call. I'll try to see if there is a peak on there while also keeping an eye on CPUVR_PGOOD.
I have a small backlog so I'll probably have the results in a couple of days after I've finished some of the other repairs waiting

MOberdick · Dec 1, 2017

Checking back in: I have not yet heard anything back from my customer about any issues, so I'd assume it's still working fine. It's for a doctor's office, so I know they are using it daily.

Sykulski · Jan 10, 2018

Is there any further investigation on this issue?

smiba · Jan 10, 2018

I'm almost done with my current queue and hope to investigate the IMON signal a bit more to see if at some point a large amount of current goes to ground (indicating a possible failure of one of the MOSFETs).

No other updates so far though

dukefawks · Jan 11, 2018

I have a decent logic analyzer on the way too. Now if I can buy some extra hours in a day I can also dig into these.

smiba · Jan 16, 2018

So far I've been able to detect two types of crashes:

1. The system fully goes from S0 into S5 within <10ms because CPUVR_PGOOD goes down.
2. The system shows a blank screen but ALL_SYS_PWRGD and CPUVR_PGOOD stay high, the fans start spinning up, and after 10 seconds the system powers off into S5
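
To keep a log of captures consistent, the two types above can be written down as a small classifier. This is only a sketch: the function name, its arguments, and the 1-second cut-off used to separate a slow shutdown from a fast one are assumptions for illustration; only the <10ms figure and the rail names come from the observations above.

```python
def classify_crash(cpuvr_pgood_dropped, all_sys_pwrgd_dropped, seconds_to_s5):
    """Label one observed crash as "type 1", "type 2" or "unclassified".

    cpuvr_pgood_dropped / all_sys_pwrgd_dropped: whether the signal went low
    at the moment of the crash (before the final power-off).
    seconds_to_s5: time from the first symptom until the board reached S5.
    """
    # Type 1: CPUVR_PGOOD goes down and the board is in S5 within <10ms
    if cpuvr_pgood_dropped and seconds_to_s5 < 0.010:
        return "type 1"
    # Type 2: blank screen, both power-good signals stay high, and the
    # power-off only comes seconds later (1.0 s is an assumed cut-off)
    if (not cpuvr_pgood_dropped and not all_sys_pwrgd_dropped
            and seconds_to_s5 >= 1.0):
        return "type 2"
    return "unclassified"
```

Anything that lands in "unclassified" would be worth a closer look, since it would not match either pattern seen so far.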

From what I can see, the system does not have obvious issues with CPU VCore; the core voltage is stable.
However, the system DOES show signs of CPU issues. The black screen without power-off especially concerns me, as it sounds like the CPU freezes and eventually the system just shuts down because it detects something's up. It's fairly similar to when a device freezes because you're undervolting the CPU too much.

Playing an mkv file in VLC is my #1 way of making it crash; nothing will crash it any faster. Usually the system will crash before even 10 minutes of the file have played.

The image I've attached is of a type #2 crash with the centre aligned with the point where the screen turned black.
I've tried to get a type #1 crash, but after 10 hours of collecting samples I've only been able to make it crash by playing an mkv file, resulting in a type #2 crash.

EDIT: Actually if the CPU freezes the screen should not turn black but stay in position. Let me see if this board has a separate power supply for the internal GPU

EDIT2: AFAIK there is no separate GPU power supply so the signs still point to the CPU itself being damaged.

Any other ideas?

dukefawks · Jan 17, 2018

I really have no idea either. I do know that even Apple's so-called "new" boards still present this issue as well. I doubt this will be figured out without a massive effort, and I'm not sure Apple will even bother with it.
It is a large CPU with a large die, so there will be lots of thermal stress going on in that package, but who knows......

Revive · Jan 17, 2018

Not sure if this is helpful, but has anyone noticed that the lower right heatsink binding post on this model of logic board is frequently broken, i.e. popped off? I have a few that exhibit both the random shutdown and the broken post; I wonder if it's an indication of something not being right in that area during manufacturing.

dukefawks · Jan 17, 2018

Yeah the screw holes popping off is also common but not related to this problem I think.

PPVCC_S0_CPU · Oct 17, 2018

Has anyone found any solutions yet?
