36GB of VRAM for $98: Building a Dual-GPU LLM Rig
- CPU: Intel Core i9-14900K
- Board: Gigabyte Z790 AORUS MASTER X
- GPU 1: NVIDIA RTX 4090 (24 GB)
- GPU 2: NVIDIA RTX 4070 (12 GB)
- Adapter: K43SG (ADT-Link K3G V1.4)
The Problem
I've been running 70B parameter models locally - Llama 3, DeepSeek, Qwen - and the ceiling is always the same: 24 GB of VRAM on the 4090. You can quantize down to Q4_K_M and squeeze into 24 GB, but the quality drop is noticeable compared to Q6 or Q8. I needed more VRAM.
The obvious answer: add a second GPU and split the model across both. The 4070 sitting in my parts drawer has 12 GB. Together, 36 GB - enough for 70B at Q6_K with room to breathe. The less obvious part was how to physically connect it.
The Z790 AORUS MASTER X has a second physical x16 slot (labeled PCIEX4, wired x4 through the chipset). Perfect, except mine was defective: it negotiated Gen3 x1 regardless of what I put in it. The 4070 would enumerate but crawl at ~1 GB/s - useless for model splitting, where the GPUs need to exchange activations wherever the split crosses from one card to the other, for every token generated.
So the motherboard's second slot was out. I started looking at alternatives.
The Adapter: K43SG
The K43SG is a product from ADT-Link (sold under various brand names on Amazon/AliExpress) that converts an M.2 M-key NVMe slot into a PCIe x4 slot via a flexible ribbon cable. The idea: sacrifice one NVMe slot to get a working PCIe x4 connection routed somewhere outside the case where a GPU can physically fit.
The board has three M.2 slots. M2Q_SB connects through the Z790 chipset at PCIe 4.0 x4 - that's 8 GB/s of bandwidth, enough for a secondary inference GPU. I was already using an NVMe in M2A_CPU (direct to CPU, PCIe 5.0) for my boot drive, and M2B_CPU for a second NVMe. M2Q_SB was free.
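The bandwidth figures here and for the defective slot can be sanity-checked with a few lines of arithmetic. This is a minimal sketch, not a measurement: it uses the published per-lane signalling rates and 128b/130b encoding for Gen3 and later, and ignores packet and protocol overhead, so real-world throughput will be somewhat lower. The function name is mine.

```python
# Per-lane raw signalling rate in GT/s for each PCIe generation.
GT_PER_SEC = {3: 8.0, 4: 16.0, 5: 32.0}

# Gen3 through Gen5 use 128b/130b line encoding:
# 128 payload bits for every 130 bits on the wire.
ENCODING_EFFICIENCY = 128 / 130


def pcie_bandwidth_gb_s(gen: int, lanes: int) -> float:
    """Theoretical one-direction link bandwidth in GB/s, before protocol overhead."""
    bits_per_sec = GT_PER_SEC[gen] * 1e9 * ENCODING_EFFICIENCY * lanes
    return bits_per_sec / 8 / 1e9


if __name__ == "__main__":
    for label, gen, lanes in [
        ("defective slot (Gen3 x1)", 3, 1),
        ("M2Q_SB via K43SG (Gen4 x4)", 4, 4),
    ]:
        print(f"{label}: {pcie_bandwidth_gb_s(gen, lanes):.2f} GB/s")
```

The Gen3 x1 case comes out just under 1 GB/s and Gen4 x4 just under 8 GB/s, which matches the rounded numbers in the text.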
The BOM
| Part | Notes | Price |
|---|---|---|
| JMT K43SG Gen4 50cm | ADT-Link K3G V1.4 ribbon cable adapter | $79.98 |
| GELRHONR 24-pin ATX Y-splitter | Powers K43SG from PSU (pin 9 snipped) | $9.98 |
| OPSFALCON 24-pin extension | Cable management | $7.99 |
| Total | | $97.95 |
The 4070 was already on hand. The Y-splitter is necessary because the K43SG needs ATX power to provide the 12V rails to the GPU's slot. The extension cable was just for routing - my PSU's ATX cable was too short to reach the splitter comfortably.
The Build: Three Days of POST Code 11
I thought this would take an afternoon. It took three days.
Day 1: Installation and Silence
Physical install was straightforward. The K43SG M.2 end went into M2Q_SB. The 50cm ribbon cable threaded through a rear expansion slot bracket. The PCIe end sat on a piece of cardboard next to the case with the 4070 plugged in. Y-splitter connected, 8-pin PCIe power connected to the 4070. Clean, if inelegant.
Hit the power button. Fans spin, LEDs light up - and the system hangs at POST code 11.
POST code 11 on Gigabyte Z790 means "pre-memory, early PCIe initialization." The board was hanging during PCIe bus enumeration. Pull the K43SG ribbon cable - boots fine. Plug it back in - code 11, hang.
I spent the rest of day 1 trying BIOS settings. Above 4G Decoding, Re-Size BAR, CSM, PCIe Gen settings - every permutation. Nothing worked.
Day 2: The Forum Crawl
Search results for "K43SG POST code 11" returned nothing useful. Most K43SG content is about eGPU setups on laptops, where the M.2 slot timing is different. Desktop boards have more aggressive POST sequences, and the K43SG was showing up on the bus before the board was ready to talk to it.
I found one thread on the eGPU.io forums - a user with an ASRock Z690 and the same adapter, same symptom. Buried in page 3 of the thread, someone mentioned "DIP switches on the adapter board." I pulled the K43SG back out and looked at the PCIe-end PCB.
There they were. Three unlabeled DIP switches and a jumper. No documentation in the box, no mention on the product listing, no manual.
Day 3: The DIP Switches
After more digging through Chinese-language forum posts and cross-referencing the ADT-Link K3G V1.4 reference design, I pieced together what the switches do:
The logic: desktop Z790 boards enumerate PCIe devices during early POST (code 10-15). If a device appears on the M.2 bus that doesn't look like an NVMe drive, the firmware doesn't know what to do with it and hangs. The delays let the board complete its initial bus scan with the M.2 slot appearing empty, then the GPU appears later - after the BIOS has moved past the problematic enumeration phase.
Set the switches. Hit power.
Clean POST. Windows booted. Device Manager showed both GPUs. nvidia-smi reported the 4090 on bus 01 and the 4070 on bus 06. PCIe link: x4 Gen4.
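If you'd rather script that check than eyeball it, here is a minimal sketch that asks nvidia-smi for each GPU's bus ID and current link generation and width. It assumes nvidia-smi is on your PATH and that GPU names contain no commas; the function names are mine, not part of any NVIDIA tooling. Note that the reported link generation can read lower than the negotiated maximum while a card is idle, so check it under load.

```python
import shutil
import subprocess

# Query fields supported by `nvidia-smi --query-gpu`.
QUERY_FIELDS = "index,name,pci.bus_id,pcie.link.gen.current,pcie.link.width.current"


def parse_link_report(csv_text: str) -> list[dict]:
    """Parse `--format=csv,noheader` output into one dict per GPU."""
    gpus = []
    for line in csv_text.strip().splitlines():
        index, name, bus_id, gen, width = [f.strip() for f in line.split(",")]
        gpus.append({
            "index": int(index),
            "name": name,
            "bus_id": bus_id,
            "gen": int(gen),
            "width": int(width),
        })
    return gpus


def query_gpus() -> list[dict]:
    """Run nvidia-smi and return the parsed link report."""
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY_FIELDS}", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    )
    return parse_link_report(result.stdout)


if __name__ == "__main__":
    if shutil.which("nvidia-smi") is None:
        print("nvidia-smi not found on PATH")
    else:
        for gpu in query_gpus():
            print(f"GPU {gpu['index']} {gpu['name']} @ {gpu['bus_id']}: "
                  f"Gen{gpu['gen']} x{gpu['width']}")
```

On this build, the line for the 4070 should report a width of 4 if the adapter negotiated the full link.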
Three days of debugging, solved by three switches that nobody documents.
The Standby Power Problem
The dual-GPU setup worked. I ran some inference benchmarks, confirmed the model splitting was functional, and shut down for the night. Then I noticed the 4070's fans were still spinning. The card was drawing 65W at the wall with Windows fully shut down.
The Y-splitter passes all 24 ATX pins to both branches. Pin 9 is +5VSB - the 5-volt standby rail that stays energized whenever the PSU is plugged in. The K43SG uses this to keep its control circuitry alive (and by extension, the GPU) even when the system is off.
I confirmed with a multimeter: pin 9 on the K43SG branch read 4.282V with the system off. That's the standby voltage keeping the card alive.
The fix: snip the pin 9 wire on the Y-splitter's K43SG branch. Just that one wire. I cut it close to the connector and taped both cut ends with electrical tape. The motherboard branch keeps its +5VSB (needed for Wake-on-LAN and USB standby power), but the K43SG branch loses it.
0W standby on the 4070 after shutdown. WoL still works (motherboard still has +5VSB). GPU powers on cleanly when the system starts - the 12V rails on the other pins handle that.
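To put the 65W standby draw in perspective, here is a rough back-of-the-envelope sketch. The wattage is the figure measured above; the hours-off values and the electricity price are hypothetical placeholders, so substitute your own.

```python
def standby_kwh_per_year(watts: float, hours_off_per_day: float = 24.0) -> float:
    """Energy wasted per year by a constant standby draw, in kWh."""
    return watts * hours_off_per_day * 365 / 1000


if __name__ == "__main__":
    STANDBY_WATTS = 65.0   # measured at the wall with the system shut down
    PRICE_PER_KWH = 0.15   # hypothetical price in USD; varies by region

    # 24 h/day is the worst case (machine never on); 16 h/day is an assumed
    # usage pattern with the machine running 8 hours a day.
    for hours_off in (24.0, 16.0):
        kwh = standby_kwh_per_year(STANDBY_WATTS, hours_off)
        print(f"{hours_off:.0f} h/day off: {kwh:.1f} kWh/year, "
              f"about ${kwh * PRICE_PER_KWH:.2f}/year")
```

In the worst case that works out to roughly 570 kWh a year, which is why a single snipped wire is worth the trouble.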
Architecture
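Here's how the PCIe topology shakes out: the 4090 sits in the primary slot on CPU lanes, alongside the boot NVMe in M2A_CPU, while the 4070 hangs off the Z790 chipset through M2Q_SB and the K43SG ribbon cable at Gen4 x4.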
BIOS Settings
The combination that works on Z790 AORUS MASTER X:
- Above 4G Decoding: Enabled (required for dual-GPU BAR mapping)
- Re-Size BAR: Auto
- CSM: Disabled (UEFI boot only)
- Initial Display Output: IGFX (boot display on iGPU, not the 4090 - prevents issues during POST)
- PCIe ASPM: Disabled (power management causes link drops on the ribbon cable)
- Fast Boot: Disabled (needs full POST for the delayed K43SG enumeration to work)
Results
70B parameter models at Q6_K quantization run at 25 tokens per second with the model split across both GPUs. The first ~40 layers sit on the 4090 (24 GB), the remaining layers overflow to the 4070 (12 GB). The x4 Gen4 link handles the inter-GPU activation transfers without becoming a meaningful bottleneck for sequential inference.
For comparison: the same 70B model quantized to Q4_K_M to fit in the 4090's 24 GB alone runs at ~30 tok/s. You lose about 5 tok/s (roughly 17%) to the PCIe transfer overhead, but gain significantly better model quality from the higher-bit quantization. That tradeoff is worth it for tasks where output quality matters.
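The speed side of that tradeoff is easy to quantify. This minimal sketch uses the two throughput figures above; the 1,000-token response length is a hypothetical example, and the helper names are mine.

```python
def seconds_to_generate(tokens: int, tok_per_s: float) -> float:
    """Wall-clock time to generate a response at a steady token rate."""
    return tokens / tok_per_s


def slowdown_pct(baseline_tok_s: float, new_tok_s: float) -> float:
    """Throughput lost relative to the baseline, as a percentage."""
    return (baseline_tok_s - new_tok_s) / baseline_tok_s * 100


if __name__ == "__main__":
    SINGLE_GPU_TOK_S = 30.0   # Q4_K_M on the 4090 alone (from the post)
    DUAL_GPU_TOK_S = 25.0     # Q6_K split across both cards (from the post)
    RESPONSE_TOKENS = 1000    # hypothetical response length

    print(f"single GPU: {seconds_to_generate(RESPONSE_TOKENS, SINGLE_GPU_TOK_S):.1f} s")
    print(f"dual GPU:   {seconds_to_generate(RESPONSE_TOKENS, DUAL_GPU_TOK_S):.1f} s")
    print(f"slowdown:   {slowdown_pct(SINGLE_GPU_TOK_S, DUAL_GPU_TOK_S):.1f}%")
```

For a long response the difference is a handful of seconds, which is the cost being weighed against the quality gain.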
Lessons Learned
- DIP switches on PCIe adapters are undocumented but critical. If you're getting POST hangs with a K43SG or similar M.2-to-PCIe adapter on a desktop board, look for timing delay switches on the adapter PCB. The defaults are set for laptops, not desktops.
- POST code 11 is a timing problem, not a configuration problem. No combination of BIOS settings will fix it. The GPU needs to be held in reset long enough for the board to complete early enumeration. PERST# and CLKREQ# delays are the answer.
- ATX Y-splitters pass +5V standby. If you're powering external hardware through a Y-splitter, check pin 9. Your "off" system might be feeding 65W to a GPU 24/7. A DMM and a wire snip solve it.
- Triple-fan GPUs don't fit two-deep in a case. The 4070 FE with its triple-fan cooler is 300mm long. With a 4090 already in the primary slot, there's no physical room. Plan for external mounting from the start. Cardboard works. No, really.