Building a 4-agent autonomous coding pipeline on Apple Silicon — MLX backend questions

Hi,

I'm building ANF (Autonomous Native Forge) — a cloud-free, 4-agent autonomous software production pipeline that runs entirely on local hardware with local LLM inference. No middleware; pure native Node.js.

Currently running on NVIDIA Blackwell GB10 with vLLM + DeepSeek-R1-32B. Now porting to Apple Silicon.

Three technical questions:

  1. How production-ready is mlx-lm's OpenAI-compatible API server for long-context generation (32K tokens)?

  2. What's the recommended approach for KV Cache management with Unified Memory architecture — any specific flags or configurations for M4 Ultra?

  3. MLX vs GGUF (llama.cpp) for a multi-agent pipeline where 4 agents call the inference endpoint concurrently — which handles parallel requests better on Apple Silicon?
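For context on question 3, this is roughly the call pattern the pipeline uses — a minimal sketch, not ANF's actual code. The base URL, model name, and agent role names are placeholders; the transport is injectable so the fan-out logic can be exercised without a running server:

```javascript
// Four agent roles fanning out concurrent chat-completion calls to a
// local OpenAI-compatible endpoint. Assumes Node 18+ (global fetch).
const BASE_URL = process.env.LLM_BASE_URL || "http://localhost:8080/v1";

// Default transport: POST a chat-completion payload to the server.
async function httpTransport(payload) {
  const res = await fetch(`${BASE_URL}/chat/completions`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(payload),
  });
  if (!res.ok) throw new Error(`HTTP ${res.status}`);
  return res.json();
}

// One call per agent role.
async function callModel(role, prompt, transport = httpTransport) {
  const data = await transport({
    model: "local-model", // placeholder; single-model servers often ignore it
    messages: [
      { role: "system", content: `You are the ${role} agent.` },
      { role: "user", content: prompt },
    ],
    max_tokens: 1024,
  });
  return { role, text: data.choices[0].message.content };
}

// Promise.allSettled so one slow or failed agent doesn't abort the other three.
async function runPipelineStep(task, transport = httpTransport) {
  const roles = ["planner", "coder", "tester", "reviewer"];
  return Promise.allSettled(roles.map((r) => callModel(r, task, transport)));
}
```

So the backend question boils down to: when four requests like these land at once, does the server batch them, queue them serially, or degrade — and how does that differ between mlx-lm and llama.cpp's server on Apple Silicon?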

GitHub: github.com/trgysvc/AutonomousNativeForge

Any guidance appreciated.
