[X.com] by @simonw

One of the quickest ways to start playing with a good local LLM on macOS (if you have ~12GB of free disk space and RAM) is to use llama-server and gpt-oss-20b:

brew install llama.cpp

llama-server -hf ggml-org/gpt-oss-20b-GGUF \
  --ctx-size 0 --jinja -ub 2048 -b 2048 -ngl 99 -fa

Image from tweet: https://t.co/xsvruU8hDD
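Roughly, those flags mean: --ctx-size 0 uses the model's full trained context length, --jinja enables the model's chat template, -ub and -b set the micro-batch and batch sizes, -ngl 99 offloads all layers to the GPU, and -fa enables Flash Attention. Once it's running, llama-server exposes an OpenAI-compatible HTTP API (on port 8080 by default). A quick smoke test, assuming that default port and an arbitrary prompt:

curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Write a haiku about local LLMs"}]}'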


Quoted Tweet:

Georgi Gerganov @ggerganov

The ultimate guide for using gpt-oss with llama.cpp
- Runs on any device
- Supports NVIDIA, Apple, AMD and others
- Support for efficient CPU offloading
- The most lightweight inference stack today

https://t.co/a6E3DssXmw

Tue Aug 19 15:06:20 +0000 2025
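The "efficient CPU offloading" point is about not needing the whole model in VRAM: llama.cpp lets you keep only some layers on the GPU and run the rest on the CPU. A minimal sketch using the standard -ngl flag (the layer count here is illustrative, not a tuned value):

llama-server -hf ggml-org/gpt-oss-20b-GGUF \
  --ctx-size 0 --jinja -fa \
  -ngl 10

With -ngl 10, ten layers are kept in VRAM and the remainder run on the CPU; -ngl 99 in the command above offloads everything.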
