Building Junto: On-Device ML on iOS with MLX-Swift
The Problem with Finance Apps
Most personal finance apps fall into two categories: they're either glorified spreadsheets, or they send all your data to a server and call it "AI-powered." Neither felt right to me.
Financial data is among the most sensitive information people have. I wanted to build something that actually reasons about your spending behavior, without your data ever leaving your device.
That's what Junto is: a finance app where the AI runs entirely on your iPhone.
Why MLX-Swift?
When Apple released MLX-Swift, it opened something genuinely new: running fine-tuned language models on-device with Apple Silicon efficiency. Not just classification models — actual LLMs capable of reasoning in natural language.
The pitch for Junto was simple: analyze your transactions, answer questions about your finances, and generate personalized insights. All locally. No API calls, no server, no data leaving your phone.
The Architecture
The stack ended up being more layered than I initially planned:
- SwiftUI + SwiftData for the interface and local persistence
- MLX-Swift for on-device LLM inference
- QLoRA fine-tuning on a base Qwen model, specialized for Brazilian personal finance reasoning
- RAG pipeline (VectorStore + EmbeddingService) to give the model context about the user's actual transactions and goals
- Pluggy API for Open Finance bank syncing — connecting to 300+ Brazilian banks automatically
- Gemini and Claude as cloud fallback for premium features
- Supabase for auth and subscription tier management
Fine-Tuning: What Actually Happened
I trained a QLoRA adapter on top of Qwen using MLX's training pipeline. The goal was to make the model understand Brazilian financial terminology, transaction categories, and give advice grounded in real user data rather than generic financial tips.
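For context on what the adapter is: LoRA freezes the base weights W and learns a low-rank update, W′ = W + (α/r)·B·A, with B ∈ ℝ^(d×r), A ∈ ℝ^(r×d), and rank r ≪ d; only A and B are trained. QLoRA does the same while keeping the frozen base weights quantized to 4-bit, which is what makes fine-tuning a model this size feasible on modest hardware.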
The adapter training worked. The problem came at inference time.
The base model I started with, Qwen3 4B at 4-bit quantization, came in around 2–3 GB. Even quantized, it created serious memory pressure on most iPhones: loading was slow, memory warnings were constant, and on older devices it simply wasn't viable.
The fix: migrate to a 1.5B parameter model. Smaller, faster, still capable enough for the classification and chat tasks Junto needs. The LoRA adapter needed to be retrained for the new architecture — not ideal, but the right tradeoff.
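The back-of-envelope math behind that decision is simple: quantized weight memory is roughly parameters × bits per weight ÷ 8, before counting KV cache, activations, and quantization scale overhead (so real usage runs noticeably higher). A minimal sketch:

```swift
import Foundation

/// Rough estimate of quantized weight memory in GB.
/// Ignores KV cache, activations, and quantization scale overhead,
/// so actual memory usage is noticeably higher than this.
func weightMemoryGB(params: Double, bitsPerWeight: Double) -> Double {
    params * bitsPerWeight / 8 / 1_000_000_000
}

// 4B parameters at 4-bit ≈ 2.0 GB of weights alone;
// 1.5B parameters at 4-bit ≈ 0.75 GB.
print(weightMemoryGB(params: 4.0e9, bitsPerWeight: 4))
print(weightMemoryGB(params: 1.5e9, bitsPerWeight: 4))
```

With system overhead on top, the 4B model pushes past what most iPhones will tolerate, while the 1.5B model leaves headroom.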
The RAG Layer
One thing that made a real difference in response quality: giving the model access to the user's actual data at inference time.
I built a local RAG pipeline from scratch:
- Transactions, goals, and chat summaries get embedded and stored in a local VectorStore (SwiftData)
- At query time, the top-K most relevant items are retrieved and injected into the model's context
- The embedding model runs on-device via Core ML, with a hash-based fallback
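The retrieval step above can be sketched in a few lines. This is a simplified, in-memory illustration, not Junto's actual VectorStore API: embeddings are plain float arrays, and `retrieveTopK` is a hypothetical name.

```swift
import Foundation

/// Cosine similarity between two embedding vectors of equal length.
func cosineSimilarity(_ a: [Float], _ b: [Float]) -> Float {
    let dot = zip(a, b).reduce(0) { $0 + $1.0 * $1.1 }
    let normA = a.reduce(0) { $0 + $1 * $1 }.squareRoot()
    let normB = b.reduce(0) { $0 + $1 * $1 }.squareRoot()
    guard normA > 0, normB > 0 else { return 0 }
    return dot / (normA * normB)
}

/// Score every stored item against the query embedding and keep the
/// top-K texts for injection into the model's context.
func retrieveTopK(query: [Float],
                  store: [(text: String, embedding: [Float])],
                  k: Int) -> [String] {
    store
        .map { ($0.text, cosineSimilarity(query, $0.embedding)) }
        .sorted { $0.1 > $1.1 }
        .prefix(k)
        .map { $0.0 }
}
```

In the real app the store is SwiftData-backed rather than an in-memory array, but the scoring and top-K selection work the same way.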
This means when you ask "why did I overspend in January?", the model actually has your January transactions in context — not just a generic prompt.
What's Still Hard
On-device ML on iOS is genuinely difficult in ways that aren't obvious from the outside:
Model size vs. device capability is a constant negotiation. What runs fine on an iPhone 15 Pro may crash on an iPhone 13.
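One pragmatic way to handle that negotiation is to gate the model tier on physical RAM at launch. A hedged sketch using Foundation's `ProcessInfo.processInfo.physicalMemory`; the 4 GB threshold and the tier names are illustrative assumptions, not measured cutoffs:

```swift
import Foundation

enum ModelTier: Equatable {
    case onDevice1_5B   // small local model fits comfortably
    case cloudFallback  // device too constrained for local inference
}

/// Illustrative capability gate: pick a model tier from device RAM.
/// The 4 GB threshold is an assumption, not a measured cutoff.
func selectModelTier(
    physicalMemoryBytes: UInt64 = ProcessInfo.processInfo.physicalMemory
) -> ModelTier {
    let gb = Double(physicalMemoryBytes) / 1_073_741_824
    return gb >= 4 ? .onDevice1_5B : .cloudFallback
}
```

A gate like this lets newer devices run locally while older ones degrade to the cloud path instead of crashing.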
The simulator is useless for ML. MLX needs the real Apple Silicon GPU and doesn't run in the iOS simulator, so every inference test requires a physical device.
Memory management is brutal. SwiftUI's memory model and MLX's memory requirements don't always cooperate. Auto-unloading the model on memory warnings, cache limits, and graceful degradation all had to be built manually.
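The auto-unload pattern looks roughly like the sketch below. To keep it platform-neutral and self-contained, it observes a custom notification name; on iOS you would observe `UIApplication.didReceiveMemoryWarningNotification` instead. The `ModelHolder` type is a hypothetical stand-in, not Junto's actual code:

```swift
import Foundation

/// Stand-in for UIApplication.didReceiveMemoryWarningNotification so
/// this sketch runs anywhere; on iOS, observe that notification instead.
extension Notification.Name {
    static let memoryWarning = Notification.Name("memoryWarning")
}

/// Hypothetical holder that lazily loads a model and drops it on
/// memory pressure, reloading transparently on the next use.
final class ModelHolder<Model> {
    private var model: Model?
    private let load: () -> Model
    private var observer: NSObjectProtocol?

    init(load: @escaping () -> Model) {
        self.load = load
        // queue: nil runs the handler synchronously on the posting thread.
        observer = NotificationCenter.default.addObserver(
            forName: .memoryWarning, object: nil, queue: nil
        ) { [weak self] _ in
            self?.model = nil  // graceful degradation: free now, reload later
        }
    }

    deinit {
        if let observer { NotificationCenter.default.removeObserver(observer) }
    }

    var isLoaded: Bool { model != nil }

    /// Returns the model, loading it on demand.
    func get() -> Model {
        if let model { return model }
        let fresh = load()
        model = fresh
        return fresh
    }
}
```

The expensive part this hides is the reload latency after an unload; in practice that cost is the price of staying alive under pressure.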
Current Status
Junto is in final development before App Store beta. The core features — transaction tracking, AI chat, goal management, and the Marketplace (AI-generated financial reports) — are all working. Backend auth via Supabase is integrated. The model migration to 1.5B is in progress.
If you want early access or want to talk about on-device ML on iOS, reach out.
Gustavo Barra Felizardo
CS Student at UFMG · Researcher @ FutureLab · Founder of Solitus & Junto