The $30 Adapter That Unlocked My 4070

I spent weeks tuning ollama configs, trying different quantizations, and blaming model architectures for my dual-GPU inference speeds. Turns out the problem was dumber than that: my RTX 4070 was running on a single PCIe Gen3 lane.

One lane. Gen3 x1. About 1 GB/s of bandwidth to feed a GPU that can do 200 TOPS of INT8 compute. The 4070 was basically breathing through a coffee stirrer.

How I Got Here

My setup is a dual-GPU rig for local LLM inference: an RTX 4090 handles the heavy lifting, and the 4070 acts as VRAM overflow for models too large to fit on a single card. The 4090 sits in the primary x16 slot and works fine. The 4070, though, was stuck in the board's second physical PCIe slot, which I assumed was x4 or x8.

It was not. A quick check with lspci -vv told the real story:

lspci - checking link status

$ lspci -vv -s 01:00.0 | grep -i "lnk"
# RTX 4090 - looking good
LnkCap: MaxSpeed 16GT/s, MaxWidth x16
LnkSta: Speed 16GT/s, Width x16

$ lspci -vv -s 02:00.0 | grep -i "lnk"
# RTX 4070 - oh no
LnkCap: MaxSpeed 16GT/s, MaxWidth x16
LnkSta: Speed 8GT/s, Width x1

The 4070 is capable of Gen4 x16. But the slot it was sitting in only negotiated Gen3 x1. Some quick math: Gen3 x1 is about 985 MB/s. Gen4 x16 is about 32 GB/s. My card was running at 3% of its rated bus bandwidth. I had been debugging the wrong layer of the stack for weeks.
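The quick math above can be reproduced in a few lines. This is a minimal sketch, assuming the usual per-lane signalling rates and line encodings for each PCIe generation and ignoring packet overhead; `pcie_bandwidth_gbs` is a hypothetical helper name, not part of any tool mentioned here.

```python
# Raw per-lane signalling rate in GT/s for each PCIe generation.
GT_PER_S = {1: 2.5, 2: 5.0, 3: 8.0, 4: 16.0, 5: 32.0}


def pcie_bandwidth_gbs(gen: int, lanes: int) -> float:
    """Approximate one-direction bandwidth in GB/s, before packet overhead."""
    # Gen1/Gen2 use 8b/10b line encoding; Gen3 and later use 128b/130b.
    efficiency = 8 / 10 if gen <= 2 else 128 / 130
    return GT_PER_S[gen] * efficiency / 8 * lanes  # divide by 8 bits per byte


negotiated = pcie_bandwidth_gbs(3, 1)   # what the slot actually gave the card
rated = pcie_bandwidth_gbs(4, 16)       # what the card is capable of

print(f"Gen3 x1:  {negotiated:.3f} GB/s")
print(f"Gen4 x16: {rated:.2f} GB/s")
print(f"Ratio:    {negotiated / rated:.4f}")
```

Real-world throughput is lower than these numbers because of protocol overhead, but the ratio between two links is what matters for this comparison.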

Turns out the PCIEX4 slot on this board has a known defect - it's electrically wired as x1 despite the physical slot accepting larger cards. This isn't in any spec sheet. I found it mentioned in a two-year-old forum post buried on page 4 of search results.
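A slot that is physically long but electrically narrow shows up as a gap between LnkCap (what the card supports) and LnkSta (what was negotiated). Here is a rough sketch of checking for that gap automatically, assuming the text comes from lspci -vv and looks like the lines quoted above. `parse_link` and `is_downgraded` are hypothetical helper names, and the regex has only been reasoned through against this output format.

```python
import re

# Captures the speed (GT/s) and lane width from LnkCap / LnkSta lines.
LINK_RE = re.compile(r"(LnkCap|LnkSta):.*?Speed\s+([\d.]+)GT/s.*?Width\s+x(\d+)")


def parse_link(text):
    """Map 'LnkCap' and 'LnkSta' to (speed_gt_s, width) tuples."""
    links = {}
    for kind, speed, width in LINK_RE.findall(text):
        links.setdefault(kind, (float(speed), int(width)))  # keep first match per kind
    return links


def is_downgraded(text):
    """True when the negotiated link is slower or narrower than the card supports."""
    links = parse_link(text)
    cap_speed, cap_width = links["LnkCap"]
    sta_speed, sta_width = links["LnkSta"]
    return sta_speed < cap_speed or sta_width < cap_width


# The two link reports quoted earlier in this post.
rtx_4090 = "LnkCap: MaxSpeed 16GT/s, MaxWidth x16\nLnkSta: Speed 16GT/s, Width x16"
rtx_4070 = "LnkCap: MaxSpeed 16GT/s, MaxWidth x16\nLnkSta: Speed 8GT/s, Width x1"

print("4090 downgraded:", is_downgraded(rtx_4090))
print("4070 downgraded:", is_downgraded(rtx_4070))
```

Running a check like this on every GPU before blaming the software stack would have saved me those weeks.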

The Fix: K43SG M.2-to-PCIe Adapter

The idea is simple: the motherboard has an unused M.2 NVMe slot (M2Q_SB, wired to the chipset) that supports Gen4 x4. The K43SG is a small adapter board that converts that M.2 M-key slot into an open-ended PCIe x4 slot. You plug the M.2 edge into the NVMe slot, screw it down, and slot your GPU into the adapter's PCIe connector.

It costs about $30 on AliExpress. It looks like a riser card had a baby with a breakout board. It has zero documentation. I ordered it anyway.

Installation

Physically straightforward but awkward. The adapter sits flat against the motherboard at NVMe height, so the GPU ends up at an unusual angle. I used a 3D-printed bracket to keep it stable, but zip ties would work too. The key constraint is that you're sacrificing an NVMe slot - in my case, the M2Q_SB chipset slot, which means one fewer SSD option. Worth it.

The Results

After installing the K43SG and rebooting, the link status told a different story:

lspci - after K43SG install

$ lspci -vv -s 02:00.0 | grep -i "lnk"
# RTX 4070 via K43SG adapter
LnkCap: MaxSpeed 16GT/s, MaxWidth x16
LnkSta: Speed 16GT/s, Width x4

# Gen4 x4 = ~7.88 GB/s (vs Gen3 x1 = ~0.98 GB/s)
# 8x bandwidth improvement

Gen4 x4. Not the full x16 the card supports, but 8x what it had before. For LLM inference, where the GPU mostly needs to receive weight data and send back tokens, x4 is more than enough. The bottleneck moved elsewhere.

Bandwidth: ~1 GB/s -> ~8 GB/s
Inference speed: +289% (10.6 -> 41.2 tok/s)
Idle power: up to -57% (25-35W -> ~15W)
Total cost: $30 + 1 NVMe slot

Benchmarks: Before and After

The model that made the difference most obvious was qwen2.5-vl-abliterated:32b, a 32-billion-parameter vision-language model. This one splits across both GPUs during inference, so inter-GPU bandwidth directly determines how fast layers can shuttle activations back and forth.

ollama benchmark - qwen2.5-vl-abliterated:32b

# ── BEFORE: Gen3 x1 ──────────────────────────
$ ollama run qwen2.5-vl-abliterated:32b --verbose
total duration:       14.2s
load duration:        3.1s
eval count:           106 tokens
eval duration:        9.98s
eval rate:            10.6 tok/s

# ── AFTER: Gen4 x4 via K43SG ─────────────────
$ ollama run qwen2.5-vl-abliterated:32b --verbose
total duration:       5.8s
load duration:        2.9s
eval count:           106 tokens
eval duration:        2.57s
eval rate:            41.2 tok/s

# +289% improvement. Same model, same quant, same system.
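For anyone checking the numbers: the eval rate is just eval count divided by eval duration, and the percentage compares the two reported rates. A small sketch of that arithmetic, using hypothetical helper names `eval_rate` and `percent_gain`:

```python
def eval_rate(tokens: int, seconds: float) -> float:
    """Tokens generated per second of eval time."""
    return tokens / seconds


def percent_gain(before: float, after: float) -> float:
    """Relative improvement of `after` over `before`, in percent."""
    return (after / before - 1) * 100


# Figures taken from the two runs above.
before = eval_rate(106, 9.98)
after = eval_rate(106, 2.57)

print(f"before: {before:.1f} tok/s")
print(f"after:  {after:.1f} tok/s")
# The headline percentage uses the rounded rates as reported (10.6 and 41.2).
print(f"gain:   {percent_gain(10.6, 41.2):.0f}%")
```

Computing the gain from the unrounded durations instead gives a value a fraction of a percent lower, which is well inside run-to-run noise.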

But the real unlock was 70B models. Before the adapter, splitting a 70B Q2 model across both GPUs was barely usable - the x1 link created a hard ceiling around 8-10 tok/s and the 4070 would sit idle waiting for data half the time. After:

ollama benchmark - 70B Q2 dual-GPU

# 70B Q2 split: 4090 (21GB VRAM) + 4070 (9GB VRAM)
# GPU offload: 100%
$ ollama run llama3.1:70b-q2 --verbose
eval count:           256 tokens
eval duration:        10.24s
eval rate:            25.0 tok/s
gpu utilization:      94-98%

# 25 tok/s on a 70B model. Both GPUs fully saturated.
# This was not possible before.

25 tok/s on a 70B model. That's fast enough that you stop noticing the generation speed and start reading. The 4070's 12GB of VRAM acts as genuine overflow capacity now, not a decorative heatsink.

Bonus: Idle Power

Something I didn't expect: the 4070's idle power consumption dropped from 25-35W to about 15W. My guess is that with only a Gen3 x1 link, the card couldn't negotiate a proper low-power idle state - the link was already at minimum width, so the GPU stayed in a semi-active power state to maintain the connection. With Gen4 x4, the card has room to drop down through the PCIe link power states properly.

Over a month of 24/7 uptime, that's roughly 7-14 kWh saved (10-20W less for about 720 hours). Not life-changing, but nice.
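The monthly figure follows directly from the idle readings above. A minimal sketch of the arithmetic, assuming a 30-day month of continuous uptime; `kwh` is a hypothetical helper name.

```python
def kwh(watts: float, hours: float) -> float:
    """Energy in kilowatt-hours for a constant draw over a period."""
    return watts * hours / 1000


HOURS_PER_MONTH = 24 * 30   # assumes a 30-day month, always on

idle_before_w = (25, 35)    # observed idle range on the Gen3 x1 link
idle_after_w = 15           # approximate idle draw on Gen4 x4

low, high = (kwh(w - idle_after_w, HOURS_PER_MONTH) for w in idle_before_w)
midpoint = kwh(sum(idle_before_w) / 2 - idle_after_w, HOURS_PER_MONTH)

print(f"monthly saving: {low:.1f} to {high:.1f} kWh (midpoint {midpoint:.1f} kWh)")
```

Multiply by your local electricity rate to turn that into money; at typical residential prices it stays small.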

The Wire Snip That Matters

One thing to know if you're powering a K43SG through an ATX Y-splitter: in my setup, the 4070 kept drawing 65W after shutdown. Fans spinning, card fully powered, with Windows completely off. The Y-splitter passes all 24 ATX pins to both branches, including pin 9 - the +5VSB standby rail. That rail stays live whenever the PSU is plugged in. The K43SG uses it to keep the adapter board and GPU energized 24/7.

The fix is simple and critical: snip the pin 9 wire on the Y-splitter's K43SG branch. Just that one wire. I confirmed with a multimeter - pin 9 read 4.282V in standby, all other pins read 0V. Cut close to the connector, tape both ends with electrical tape. The motherboard branch keeps its +5VSB intact (needed for Wake-on-LAN and USB standby), but the K43SG branch loses standby power entirely.

Result: 0W at the wall after shutdown. The 4070 and K43SG power off completely when Windows shuts down, then come back on normally through the 12V rails when the system starts. Without this snip, you're burning 65W around the clock for nothing.

Current GPU Config

GPU	Link	Role	VRAM
RTX 4090	Gen4 x16	Primary compute	24 GB
RTX 4070	Gen4 x4 (K43SG)	Display + VRAM overflow	12 GB
UHD 770	Integrated	Third monitor	Shared

The iGPU handles a third monitor so the 4070's display outputs don't eat into inference performance. All three GPUs have a job. No wasted silicon.

Things to Know

You lose an NVMe slot. The K43SG physically occupies an M.2 M-key slot. If you're short on storage, this matters. I had a spare chipset slot, so it was a free trade.
Gen4 x4 is not x16. For gaming or CUDA training workloads that push massive textures or gradients, x4 is a real bottleneck. For inference, where data flows are mostly sequential and predictable, it's fine.
Physical mounting is janky. The GPU floats above the motherboard at NVMe height. You need some kind of support bracket or your GPU will slowly lever the M.2 connector out of its socket. I printed one; a GPU support bracket from Amazon would also work.
Not all M.2 slots are equal. CPU-direct NVMe slots often have more bandwidth. Chipset slots may share bandwidth with other peripherals. Check your motherboard manual. My M2Q_SB slot is chipset-routed, but the only other thing on that bus is a SATA controller, so contention is minimal.

Verdict: Worth every penny

$30 for an 8x bandwidth improvement, up to 57% lower idle power, and the ability to run 70B models at 25 tok/s across two GPUs. This is the kind of upgrade where the cost is so low that the only question is why I didn't do it sooner. If you have an unused M.2 slot and a bandwidth-starved GPU, just buy the adapter.

The 4070 went from being a glorified display adapter to a genuine compute partner. Sometimes the fix isn't a better model or a smarter config. Sometimes it's a $30 breakout board from AliExpress and twenty minutes with a screwdriver.