If I use the BIOS to allocate 96GB of my Framework's memory to the GPU, I can get the 120B-parameter GPT-OSS model to respond very quickly, but within two prompts GNOME crashes due to a failed VRAM allocation.
Step 1 is to debug that; step 2 is to debug why dynamic allocation doesn't work.
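For both steps, it may help to watch the GPU's memory counters while reproducing the problem. Below is a minimal sketch. It assumes the machine has an AMD GPU on the amdgpu driver and that the GPU appears as `card0`; neither is stated above. amdgpu exposes the BIOS carve-out (VRAM) and dynamically mapped system memory (GTT) as sysfs files that report sizes in bytes. The helper names `read_gpu_memory` and `format_gib` are hypothetical.

```python
from pathlib import Path

# Assumption: amdgpu driver, GPU exposed as card0 (it may be card1, etc.).
DEFAULT_DEVICE_DIR = Path("/sys/class/drm/card0/device")

# amdgpu sysfs counters, each reported in bytes.
# VRAM = BIOS carve-out; GTT = system RAM the GPU maps dynamically.
COUNTERS = (
    "mem_info_vram_total",
    "mem_info_vram_used",
    "mem_info_gtt_total",
    "mem_info_gtt_used",
)


def read_gpu_memory(device_dir: Path = DEFAULT_DEVICE_DIR) -> dict[str, int]:
    """Return every counter that exists under device_dir, in bytes."""
    usage = {}
    for name in COUNTERS:
        path = device_dir / name
        if path.exists():  # skip counters this driver/kernel doesn't expose
            usage[name] = int(path.read_text().strip())
    return usage


def format_gib(num_bytes: int) -> str:
    """Format a byte count as GiB with one decimal place."""
    return f"{num_bytes / 2**30:.1f} GiB"


if __name__ == "__main__":
    for name, value in read_gpu_memory().items():
        print(f"{name}: {format_gib(value)}")
```

Running it in a loop (for example under `watch`) while sending prompts should show whether `mem_info_vram_used` approaches `mem_info_vram_total` just before GNOME fails. With a small BIOS carve-out, the GTT counters should show whether the model's buffers are actually landing in dynamically allocated memory.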
