Hi all,
I've been working on a pure-Swift port of Google's Gemma 4 text decoder
that plugs into mlx-swift-lm as a sidecar model registration. Sharing
it here in case anyone else has hit the same wall, and to get feedback
from the MLX team and the community before I propose anything upstream.
Repo: https://github.com/yejingyang8963-byte/Swift-gemma4-core
Why
As of mlx-swift-lm 2.31.x, Gemma 4 isn't supported out of the box.
The obvious workaround — reusing the Gemma 3 text implementation with
a patched config — fails at weight load because Gemma 4 differs from
Gemma 3 in several structural places. The chat-template path through
swift-jinja 1.x also silently corrupts the prompt, so the model loads
but generates incoherent text.
What's in the package
- A from-scratch Swift implementation of the Gemma 4 decoder (Configuration, Layers, Attention, MLP, RoPE, DecoderLayer)
- Per-Layer Embedding (PLE) support: the shared embedding table that feeds every decoder layer through a gated MLP as a third residual stream
- KV sharing across the back half of the decoder, threaded through the forward pass via a donor table with a single global RoPE offset
- A custom Gemma4ProportionalRoPE class for the partial-rotation rope type that initializeRope doesn't currently recognize
- A chat-template bypass that builds the prompt as a literal string with the correct turn markers and encodes it via tokenizer.encode(text:), matching Python mlx-lm's apply_chat_template byte-for-byte
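Since the chat-template bypass is the part most people will probably want to replicate, here is a minimal sketch of the idea: assemble the turn markers by hand and tokenize the literal string. This assumes Gemma 4 keeps the <start_of_turn>/<end_of_turn> markers used by earlier Gemma releases (verify against the checkpoint's tokenizer config); ChatMessage and buildGemmaPrompt are illustrative names, not the repo's API.

```swift
import Foundation

// Minimal stand-in for a chat message; a real app likely has its own type.
struct ChatMessage {
    let role: String   // "user", "model", or "assistant" (mapped to "model")
    let content: String
}

// Build the prompt as a literal string with Gemma-style turn markers,
// sidestepping the Jinja template path entirely. Whether <bos> belongs here
// or gets added by tokenizer.encode(text:) depends on the tokenizer config,
// so check that before relying on this.
func buildGemmaPrompt(messages: [ChatMessage]) -> String {
    var prompt = "<bos>"
    for message in messages {
        let role = message.role == "assistant" ? "model" : message.role
        prompt += "<start_of_turn>\(role)\n\(message.content)<end_of_turn>\n"
    }
    // Leave the model turn open so generation continues from here.
    prompt += "<start_of_turn>model\n"
    return prompt
}
```

The resulting string is then passed straight to the tokenizer as plain text, which is what lets you diff it byte-for-byte against the Python side.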
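For anyone unfamiliar with the partial-rotation rope type, here is a self-contained sketch of the general idea: rotate only the first rotaryDim dimensions of each head and pass the rest through unchanged. The function name, split-half pairing, and default base are assumptions for illustration; the repo's actual Gemma4ProportionalRoPE may pair and scale differently.

```swift
import Foundation

// Generic partial-rotation RoPE sketch over a single head vector.
// Only dims 0..<rotaryDim are rotated; dims rotaryDim..<count pass through.
func applyPartialRoPE(_ x: [Double],
                      position: Int,
                      rotaryDim: Int,
                      base: Double = 10_000) -> [Double] {
    precondition(rotaryDim % 2 == 0 && rotaryDim <= x.count)
    var out = x
    let half = rotaryDim / 2
    for i in 0..<half {
        // Standard RoPE frequency for pair i within the rotated sub-space.
        let theta = Double(position) / pow(base, Double(2 * i) / Double(rotaryDim))
        let (c, s) = (cos(theta), sin(theta))
        // Split-half pairing (x[i] with x[i + half]), as in most MLX ports;
        // this is an assumption, not taken from the repo.
        let (a, b) = (x[i], x[i + half])
        out[i] = a * c - b * s
        out[i + half] = a * s + b * c
    }
    return out  // dims rotaryDim..<x.count are untouched
}
```

At position 0 the rotation is the identity, which is a cheap sanity check when wiring a custom rope class into an existing attention path.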
Measured on iPhone (A-series, 7.4 GB RAM)
- Model: mlx-community/gemma-4-e2b-it-4bit
- Warm load: ~6 s
- Memory after load: 341–392 MB
- Time to first token (end-to-end, 333-token system prompt): 2.82 s
- Generation throughput: 12–14 tok/s
What I'd love feedback on
Is the sidecar registration pattern the right way to extend
mlx-swift-lm with new model families, or is there a more idiomatic
path I missed?
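To make that first question concrete: by "sidecar registration" I mean an app-side table keyed by the config's model_type that is consulted before falling back to the library's built-ins. Sketched below against a minimal stand-in; the protocol, class, and method names are illustrative, not mlx-swift-lm's actual API.

```swift
import Foundation

// Stand-in for whatever the library's decoder abstraction is.
protocol TextDecoder {
    var modelType: String { get }
}

struct Gemma4Decoder: TextDecoder {
    let modelType = "gemma4"
}

// App-side registry: maps a model_type string to a factory closure.
// Unknown types return nil so the caller can fall back to the
// library's built-in registry.
final class SidecarModelRegistry {
    private var factories: [String: () -> TextDecoder] = [:]

    func register(_ modelType: String, factory: @escaping () -> TextDecoder) {
        factories[modelType] = factory
    }

    func make(_ modelType: String) -> TextDecoder? {
        factories[modelType]?()
    }
}
```

The appeal is that the app owns the table, so a new family ships without forking the library; the question is whether that seam is one the maintainers want to support.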
The chat-template bypass works but feels like a workaround. Is the
right long-term fix in swift-jinja, in the tokenizer, or somewhere
else entirely?
Has anyone run into the same PLE / KV-sharing issues on other
Gemma-family checkpoints? I'd like to make sure the implementation
generalizes beyond E2B before tagging a 0.2.0 release.
Happy to open a PR against mlx-swift-lm if the maintainers think any
of this belongs upstream. Thanks for reading.
Topic: Machine Learning & AI
SubTopic: Core ML