820-3435 - Stuck on EFI part of boot with OSX - Boots Linux and Windows just fine

smiba

New member
Sadly I don't have the Medusa 2 tool. I've thought about purchasing one, but I'm too busy with other work and life events to actively accept new repairs at the moment, which makes it an expensive investment when I might only use it once. (If you have one, could you possibly clean the firmware.bin for me? Although honestly I doubt ME is the issue at the moment.)

Intel's Image flash tool extracts the whole BIOS area, which is 6.5MB. This file contains the Fsys string and two $SVS strings.
Where would I need to edit the checksum values? Although honestly I might not care too much about the checksum, because the OS X 10.12 and later installers will need to update the BIOS anyway so that it supports APFS. (Which would presumably correct the checksum value?)
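To see where those strings sit in the dump before touching anything, a minimal Python sketch like the one below could list their byte offsets. It only locates the `Fsys` and `$SVS` strings mentioned above; it does not compute or patch any checksum. The function names and the `firmware.bin` path in the usage comment are illustrative assumptions, not part of any existing tool.

```python
from pathlib import Path

# Strings mentioned above; treated here as plain ASCII byte patterns.
SIGNATURES = (b"Fsys", b"$SVS")


def find_signatures(data, signatures=SIGNATURES):
    """Return a dict mapping each signature to every offset where it occurs."""
    hits = {}
    for sig in signatures:
        offsets = []
        pos = data.find(sig)
        while pos != -1:
            offsets.append(pos)
            pos = data.find(sig, pos + 1)
        hits[sig.decode("ascii")] = offsets
    return hits


def scan_dump(path):
    """Read a firmware dump from disk and print the offsets in hex."""
    hits = find_signatures(Path(path).read_bytes())
    for name, offsets in hits.items():
        print(name, [hex(o) for o in offsets])
    return hits


# Usage (hypothetical file name):
# scan_dump("firmware.bin")
```

Comparing the offsets from the original dump against a downloaded one is a quick way to check that both files use the same layout before swapping regions between them.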

EDIT: Replaced the BIOS with one from an online dump, which took it from B21 firmware to B22, a few months newer. Same issue, however.
 

2informaticos

Administrator
Staff member
I have no idea what happens.
Had something similar with an 820-00165 last year.
There was a problem with the ME region, reported as a chipset error even by ASD, which finishes without any other error.
I tried a lot of dumps and cleaning the ME, but no way.
None of the many Internet tutorials referring to such an error helped me.
Machine always stuck on progress bar, 70-80%.
However, it correctly booted into Linux WiFiSlax...
Probably some internal issue of PCH.
 

smiba

New member
The weirdest part is that it doesn't even show a loading bar; it never progresses out of boot.efi into the actual mach_kernel booting OS X.

All Linux distros and Windows work 100% OK, but the OS X bootloader just does some weird stuff.
I wonder if it tries to access a device or read out a value that ends up not being what it's expecting, causing it to freeze. Debugging actual CPUs, however, is harder than debugging them virtualised :)

Do you have any idea if the PCH also has an updatable firmware which might mismatch with the BIOS? I'll keep on digging for now, but by the looks of it it's a dead end.
 

smiba

New member
Well, I gave up. I tried bootroms as far down as B03 and as high as B22, and none of them resolve this issue.
(B03 actually breaks booting from my USB CD drive as well, but that's probably unrelated as it's just really old (2013).)

Tried different ways of importing the ME area as well and all the same results (except the one I tried that was known not clean and not from this machine and that /actually/ did show ME issues. But that was just to see what would happen if we had a broken ME area)

Currently have Windows installed and everything is working as expected: no crashes (it has been through 200 updates so far) and sleep / hibernate both work. This is with the B21 bootrom that was originally in this BIOS (so the checksum matches) and with a clean ME area I created.

If anyone ever figures it out, let me know. But since this device works just fine as a Windows machine, I don't think it's worth replacing its motherboard.
 

smiba

New member
Actually, I do have something new.

I decided to boot into OS ASD for the heck of it after coming home and to my surprise it actually booted!
It passes ASD, except that all thunderbolt tests fail.
(66004 Thunderbolt driver returned error & 12114 Cannot find the PCIe capabilities register)

Is this a known issue with the 3S156 ASD?

[Image: image_2001.jpg]

I also had a logged issue about the NVRAM, not sure if it's interesting.
[Image: IMG_20181009_173408.jpg]
 


smiba

New member
Alright, so after installing OS X 10.12 alongside Windows (because I knew for sure it was going to fail at some point again), it didn't want to boot into OS X anymore and got stuck.
I let the device cool down for one hour and booted it back up again, which to my surprise allowed it to boot into OS X!

It is starting to look like a heat / time related issue: if I leave the device alone for a while it will actually go past the OS X bootloader. Any idea what might be causing this, or where to start looking?
It's so weird because even though it doesn't boot OS X, it's more than fine booting Windows.....
 

smiba

New member
I'm just fixing this because it's a device I own, and honestly I've yet to hit a complete dead end.

Thunderbolt is broken on this device, it does not even detect the port.
I took a screenshot on a locally booted version of OS X 10.12. Plugging in any devices does not make them appear, but they do seem to receive power.

The motherboard still looks pristine and I can't find anything visually that could cause Thunderbolt issues.
Is non-working Thunderbolt a sign of ME or PCH issues? Would removing the T29 power IC disable Thunderbolt in a way that could resolve this, or would it just be a risky move that only leaves the motherboard in a worse condition?

image_2007.png

Anyone who is still with me, cheers :)
 

smiba

New member
Remove Q3080 first.
If that doesn't help, remove U3210/20 and finally U2800.

Thanks, I'll give it a try.
Would just removing R3081 be sufficient? That would stop the MOSFET from activating, iirc. (I prefer removing small SMD components; components where I have to use hot air take a bit more time and effort.)

Also possibly interesting finding: https://www.youtube.com/watch?v=1dpfWzgz6LY (Recording from macbook)
Disconnecting the power prevents most sensors from working; only the sensors directly pulled from the CPU keep working. Sensors that afaik come from the SMC won't read, or will only read for a split second before dying again.

I don't understand what could cause all these sensors to die out without DC IN being available. Maybe it's just a software bug? I've never seen this happen before.

EDIT:

Also, to disable Thunderbolt, wouldn't it make more sense to disable PP1V05_TBTLC & PP1V05_TBTCIO & PP3V3_TBTLC? Because that would actually run the Thunderbolt host without power.
(I should probably check these rails by the way, no idea if they actually have power. That might be the issue, because the whole Thunderbolt host is not being detected by the OS / system.)
 

2informaticos

Administrator
Staff member
The Thunderbolt chip is still connected to the PCH/CPU.
Even if there is no power, a strange value on one data line could block it...
 

smiba

New member
Alright, I did some verification of the Thunderbolt circuit.

It seems I might have been mistaken about Thunderbolt power working: I assumed that if my Thunderbolt Ethernet adapter makes the lights on my router blink, it's working.

PP15V_TBT is missing because TBT_A_HV_EN stays low. Not sure if it's supposed to activate for an Ethernet adapter?

PP1V05_TBTLC = 1.033V
PP3V3_TBTLC = 3.32V
PP1V05_TBTIO = 1.034V

PP3V3_S4_TBTAPWR = 3.32V
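As a quick sanity check on those readings, here is a small Python sketch that compares each measurement with a nominal value. Both inputs to the comparison are assumptions: the nominal voltages are simply read off the rail names (1V05 as 1.05 V, 3V3 as 3.3 V), and the ±5% window is an arbitrary illustrative tolerance, not a figure from the schematic or a datasheet.

```python
def rail_deviation(nominal_v, measured_v):
    """Deviation of the measurement from nominal, in percent."""
    return (measured_v - nominal_v) / nominal_v * 100.0


def check_rails(readings, tolerance_pct=5.0):
    """Map rail name -> (deviation in percent, within-tolerance flag).

    tolerance_pct is an assumed window, not a documented spec.
    """
    report = {}
    for name, (nominal, measured) in readings.items():
        dev = rail_deviation(nominal, measured)
        report[name] = (round(dev, 2), abs(dev) <= tolerance_pct)
    return report


# (nominal guessed from the rail name, measured value from the post)
READINGS = {
    "PP1V05_TBTLC": (1.05, 1.033),
    "PP3V3_TBTLC": (3.30, 3.32),
    "PP1V05_TBTIO": (1.05, 1.034),
    "PP3V3_S4_TBTAPWR": (3.30, 3.32),
}

for rail, (dev, ok) in check_rails(READINGS).items():
    print(f"{rail}: {dev:+.2f}% {'ok' if ok else 'OUT OF RANGE'}")
```

Under those assumptions all four rails land within about 2% of nominal, so the static voltages by themselves do not point at a supply problem; a multimeter reading says nothing about ripple or sequencing, though.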

Where should I start looking next? Maybe the TBT controller is having issues reading out its firmware from U2890? One or more PCI-e caps damaged? The shield around the TBT controller, does it use regular solder or an easier-to-remove low-melt?

Still an OK idea to remove the power to all TBTLC and TBTIO? I'm deciding on removing U3210 since that should effectively disable all thunderbolt power. (No PP3V3_S4_TBTAPWR also means the TBTLC and TBTIO will never be created)

Display port does work by the way
 

smiba

New member
Actually, I have managed to get the thunderbolt to work again a couple of times.

If you keep the device powered off for 2-4 days and boot it up it works straight away and thunderbolt hardware is actually being found: image_2052.png


I've tested its functionality by using a thunderbolt ethernet adapter and it works without issues, as long as the controller itself can be found.

Putting the device into sleep mode for a few minutes and then powering it on back from sleep again kills thunderbolt.


Another finding is that if you wait for ±10 seconds in the drive selection menu (keeping Alt pressed) and then boot into the system, there is a good 50% chance it actually boots into the OS.
Why this is I don't know, but I found it to be repeatable, whereas booting straight into the OS (not through the drive selection menu) gives me a poor chance of success, <10%.

What I think is that the Thunderbolt controller is having issues starting up, but can in fact work. If it's initialised properly it will keep on working until S0 power is dropped.

-----

Any idea where to look? I would like for thunderbolt to work but I also would like a macbook that doesn't give me a 50/50 gamble of not booting up.

Again, the board looks pristine (and not like "used" pristine, but fresh-from-the-factory pristine), so the only thing I could think of is reflowing the Thunderbolt IC. But then I would expect the issue to change depending on the runtime of the board (since with temperature, metal/solder ever so slightly expands / shrinks). The reason why I think the trick of waiting at the drive selection works is that we reach some kind of hardware timeout that gives up on initialising TB?

I don't think it's any of the PCIe caps or connections, as that would result in the device cutting out at some point between attempts.

Short list of what changes behaviour:

Waiting ±10 seconds before booting up the system (drive selection menu): Increases chance of successful boot
Leaving the device powered off for a few days: Extremely increases chance of successful boot + working thunderbolt

Running off the battery or charger: Does not affect behaviour in any way
Having something connected to the TB port: Does not affect behaviour in any way
Board temperature: Does not seem to affect behaviour, but haven't managed to be 100% sure. Unlikely however as it sometimes does boot both hot and cold
Putting slight pressure / flex on the board: Does not affect behaviour in any way
Putting clean BIOS / ME Area on the ROM: Does not affect behaviour in any way
Running in SMC Bypass: Does not affect behaviour in any way
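The percentages above are rough impressions. To firm them up, each boot attempt could be logged per condition and tallied, for example with a minimal Python sketch like this one. The `BootLog` class and the condition labels are hypothetical names made up for illustration.

```python
from collections import defaultdict


class BootLog:
    """Tally boot attempts per test condition."""

    def __init__(self):
        # condition -> [successful boots, total attempts]
        self._counts = defaultdict(lambda: [0, 0])

    def record(self, condition, booted):
        entry = self._counts[condition]
        entry[1] += 1
        if booted:
            entry[0] += 1

    def success_rate(self, condition):
        """Fraction of successful boots, or None if nothing was recorded."""
        if condition not in self._counts:
            return None
        successes, attempts = self._counts[condition]
        return successes / attempts

    def summary(self):
        return {c: (s, n) for c, (s, n) in self._counts.items()}


# Example with made-up entries:
log = BootLog()
log.record("alt-menu, wait 10 s", True)
log.record("alt-menu, wait 10 s", False)
log.record("direct boot", False)
print(log.summary())
```

With only a handful of attempts per condition the rates stay noisy, so a few dozen entries per row would be needed before reading much into a difference like 50% versus 10%.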
 

2informaticos

Administrator
Staff member
I suppose you can only rule out TBT if you remove from the board all of its chips which have a direct connection with the PCH.
That means desoldering U2800 + U3210/20...

First try to force the U3030 output TBT_PWR_ON_POC_RST_L to 0V.
This should disable U2800.
You may need to force TBTPOCRST_MR_L too...
 