820-3662 - Sudden power off

smiba

New member
I've not been able to make it crash anymore; it's been running a 480p video at ~3% average CPU load for hours.

I have noticed, though, that plugging the charger into AC while the MagSafe is already in the laptop will make it crash, but only if the probe is connected...

I've moved the probes into 10x mode to hopefully isolate them even more. However, this is bringing in a lot of inconsistencies, which make it hard to distinguish between actual crashes and crashes introduced by the exposed wires or the scope.

EDIT: It seems I didn't have J9510 plugged in all the way! Figured this out because the USB power/data and SD slot wouldn't work.
Would this be the reason it was no longer crashing? Let's find out!

EDIT2: I've been able to make it crash again! But it took more than an hour after plugging J9510 back in, which makes me unsure whether that could've been it. More likely is the fact that I had just drained the battery to 70%, while all the other tests were with the battery above 95% or connected to AC. Due to a bad trigger configuration the data was unusable, so I'm hoping I can catch it again, but this time right :)

MOberdick Still working :)? Also, if possible, can you test it when it's at around 70% battery capacity? (PPBUS around 10.7V) - Crashing it at >90% is way harder for me than it was at 60-70%
 

smiba

New member
The crashes have actually changed in behaviour...

Before:
8/10 of the time it would power off fully right away
2/10 of the time the video on screen would disappear (blank/black screen) and the audio would stop, but the backlight and fans kept running for ~5 seconds before it fully powered off

Now:
I've been able to make it crash about 4 times... But they all show the latter behaviour, where it turns into a blank screen and powers off after a few seconds.

Are we still looking at the same issue? Any tips on what to diagnose next?
It definitely happens more often with a lower PPBUS. With PPBUS at 11.0V it's running alright; if I stress test it on battery to drain the battery to ~10.4V, it will crash within minutes. (But only after stopping the test of course..)
 

smiba

New member

Check this latest capture from the oscilloscope! This time I'm not checking the PGOOD signal but the actual VCore voltage
I'm really starting to think U7200 is at fault as well, going to resolder it in a moment :)


EDIT: Resoldered it with just the iron but no luck, it's not POSTing anymore even though VCore is at ~1.68V. Maybe it has unstable fluctuations that my multimeter isn't catching.
Tried to be bad and do a full reflow of the chip with hot air in case I just sucked at soldering, but still no luck.

Ordered a new IC and hopefully I have some luck with that... Annoying and definitely a bummer. Even if this will solve the issue I probably won't stop wondering if it was fixed by the heat or U7200
 

Attachments

  • photo1492.png

smiba

New member
The device is still crashing sadly.

No idea what it could be, since PPVCC_S0_CPU is obviously unstable, as the last screenshot I posted shows.

CPUVR_PGOOD stays high, so I doubt PP5V_S0_CPUVR_VDD or PPVIN_S0_CPUVR_VIN are unstable, as the chip resetting would definitely show up on CPUVR_PGOOD.

Could it possibly be one of the switching ICs? (U7310/20/30)
 

dukefawks

Administrator
Can you catch some more of those Vcore glitches and set them against CPUVR_PGOOD? Once we have a clearer picture of what is happening to Vcore, we need to examine whether it is caused by the regulator or by losing the communication from the CPU that sets the actual Vcore.
 

dukefawks

Administrator
Also resolder U7330; it is close to the mounting hole, so it may have cracked traces. Only the side with the control pins is important, and that is also the side closest to the screw hole.
 

smiba

New member

I've not yet resoldered U7330, I want to do that after catching two crashes.

I've only been able to crash it once today and I've attached the oscilloscope output. I have one zoomed-in capture, which clearly shows PGOOD going down before VCore does this time, and one big overview.
Due to me being a dumbass I do not know the time per division, but it should be 40ms or less on the zoomed-out view

U7330 looks acceptable, still want me to resolder it even after this graph?
 

Attachments

  • photo1500.png
  • photo1501.png

dukefawks

Administrator
So let's assume the PGOOD signal drops before Vcore drops below the threshold that would trigger PGOOD to go low. This could mean two things: either it detects an over-current, or the communication with the CPU that sets the output voltage is lost. Over-current would be a good suspect if any of the output FETs went rogue due to bad connections, causing current spikes.
You could monitor the current feedback line and set that against the PGOOD signal. Maybe you can use the IMON signal, but that has already passed through the regulator so there may be some delay in there. I suspect it will simply be the output of a differential amplifier though, so pretty direct. You should see a current spike before shutdown in this setup.
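
If the scope can export its capture, the IMON-versus-PGOOD comparison could also be checked offline. The snippet below is only a rough sketch: it assumes the two channels are available as equal-length lists of voltage samples, and the function names, the `window` size and the `spike_factor` of 2.0 are made-up illustration values, not anything taken from the schematic or the regulator datasheet.

```python
from statistics import median

def first_falling_edge(samples, threshold):
    """Index of the first sample below threshold after the signal
    has been at or above it; None if the signal never falls."""
    was_high = False
    for i, v in enumerate(samples):
        if v >= threshold:
            was_high = True
        elif was_high:
            return i
    return None

def imon_spike_before_pgood_drop(pgood, imon, pgood_threshold,
                                 window, spike_factor=2.0):
    """Look for an IMON peak in the `window` samples just before PGOOD falls.

    Returns (edge_index, spiked). The baseline is the median of all IMON
    samples before the edge, so a short spike does not shift it much.
    spike_factor is an assumed value and needs tuning on real captures.
    """
    edge = first_falling_edge(pgood, pgood_threshold)
    if edge is None:
        return None, False
    start = max(0, edge - window)
    baseline = median(imon[:edge])
    peak = max(imon[start:edge])
    return edge, peak > spike_factor * baseline

# Synthetic example: PGOOD falls at sample 10, IMON spikes at sample 8.
pgood = [3.3] * 10 + [0.0] * 5
imon = [1.0] * 8 + [3.0, 1.0] + [0.0] * 5
print(imon_spike_before_pgood_drop(pgood, imon, 1.65, window=4))
```

A `True` result would point towards the over-current explanation, while a PGOOD drop with a flat IMON would leave lost communication with the CPU as the more likely candidate.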
 

smiba

New member
By looking at the block diagram I think using the IMON signal is a good call. I'll try to see if there is a peak on there while also keeping an eye on CPUVR_PGOOD.
I have a small backlog so I'll probably have the results in a couple of days after I've finished some of the other repairs waiting :)
 

Attachments

  • 27809_504980191d2ba92b9508c58476b76a40[1].png

MOberdick

New member
Checking back in: I have not yet heard anything back from my customer about any issues, so I'd assume it's still working fine? It's for a doctor's office, so I know they are using it daily.
 

smiba

New member
I'm almost done with my current queue and hope to investigate the IMON signal a bit more, to see if at some point a large amount of current goes to ground. (Indicating a possible failure of one of the MOSFETs)

No other updates so far though
 

dukefawks

Administrator
I have a decent logic analyzer on the way too. Now if I can buy some extra hours in a day I can also dig into these.
 

smiba

New member
So far I've been able to detect two types of crashes:

1. The system goes fully from S0 into S5 in <10ms because CPUVR_PGOOD goes down.
2. The system shows a blank screen but ALL_SYS_PWRGD and CPUVR_PGOOD stay high, the fans start spinning up, and after 10 seconds the system powers off into S5
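
To sort captures into these two types consistently, something like the sketch below might help. It is only an illustration: the sample format `(time_ms, cpuvr_pgood_volts, all_sys_pwrgd_volts)`, the function name `classify_crash` and the 1.65V logic threshold are assumptions for the example, not values taken from the schematic.

```python
def classify_crash(samples, event_ms, logic_threshold=1.65, fast_window_ms=10.0):
    """Classify a crash capture as 'type 1', 'type 2' or 'unknown'.

    samples:  list of (time_ms, cpuvr_pgood_volts, all_sys_pwrgd_volts)
    event_ms: time at which the symptom (power off / blank screen) was seen
    The threshold and the 10 ms window are assumed values for illustration.
    """
    window = [s for s in samples
              if event_ms <= s[0] <= event_ms + fast_window_ms]
    if not window:
        return "unknown"
    # Type 1: CPUVR_PGOOD drops within the fast window after the event.
    if any(pgood < logic_threshold for _, pgood, _ in window):
        return "type 1"
    # Type 2: both power-good signals are still high during the window.
    if all(pgood >= logic_threshold and pwrgd >= logic_threshold
           for _, pgood, pwrgd in window):
        return "type 2"
    return "unknown"

# Synthetic captures, not real measurements.
fast_off = [(0.0, 3.3, 3.3), (5.0, 0.1, 3.3), (10.0, 0.0, 0.0)]
blank_screen = [(0.0, 3.3, 3.3), (5.0, 3.3, 3.3), (10.0, 3.3, 3.3)]
print(classify_crash(fast_off, 0.0), classify_crash(blank_screen, 0.0))
```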

From what I can see, the system does not have obvious issues with CPU VCore; the core voltage is stable.
However, the system DOES show signs of CPU issues. The black screen without power off especially concerns me, as it sounds like the CPU freezes and eventually the system shuts itself down because it detects something's up. It's fairly similar to when a device freezes because you're undervolting the CPU too much.

Playing an mkv file in VLC is my #1 way of making it crash; nothing crashes it faster. The system usually crashes before even 10 minutes of the file have played.

The image I've attached is of a type #2 crash, with the centre aligned to the point where the screen turned black.
I've tried to get a type #1 crash, but after 10 hours of collecting samples I've only been able to make it crash by playing an mkv file, resulting in a type #2 crash.

EDIT: Actually, if the CPU freezes, the screen should not turn black but stay frozen on the last image. Let me see if this board has a separate power supply for the internal GPU

EDIT2: AFAIK there is no separate GPU power supply so the signs still point to the CPU itself being damaged.

Any other ideas?
 

Attachments

  • 820-3662 IMON scope.png

dukefawks

Administrator
I really have no idea either. I do know that even Apple's so-called "new" boards still show this issue as well. I doubt this will be figured out without a massive effort, and I'm not sure Apple will even bother with it.
It is a large CPU with a large die, so there will be lots of thermal stress going on in that package, but who knows...
 

Revive

Member
Not sure if this is helpful, but has anyone noticed that the lower right heatsink binding posts on this model of logic board are frequently broken, i.e. popped off? I have a few that exhibit both the random shutdown and a broken post; I wonder if it's an indication of something not being right in that area during manufacturing.
 