MLX: On-device machine learning on Apple Silicon

April 22, 2024 · 4 min read · #swift, #macos, #mlx

The Rise of On-Device Machine Learning

In recent years, there has been a growing trend towards on-device machine learning, which involves running machine learning models directly on devices such as smartphones, tablets, and laptops. This approach offers several advantages over cloud-based machine learning, including improved privacy, reduced latency, and the ability to use machine learning even when an internet connection is unavailable.

Both Apple and Google are pushing hard to bring machine learning capabilities to their mobile platforms, iOS and Android respectively.

With the recent wave of open-source Large Language Models such as Llama by Meta, and tools such as llama.cpp or Ollama, people can now easily download and run LLMs on consumer-grade computers.

Apple Silicon: A Game-Changer for On-Device Machine Learning

One of the key enablers of on-device machine learning is the development of powerful and efficient processors. Apple has been at the forefront of this trend with its Apple Silicon chips, which are specifically designed for the company’s devices.

Apple Silicon chips, such as the M1, M2 and M3 series, offer impressive performance and energy efficiency. These chips are built using advanced manufacturing processes and feature a unified memory architecture, which allows for fast and efficient data sharing between the CPU, GPU, and other components. You can learn more about the unified memory architecture and its advantages in Apple's documentation.

The combination of high performance and energy efficiency makes Apple Silicon chips particularly well-suited for on-device machine learning. With these chips, developers can create machine learning models that run quickly and smoothly on Apple devices, without draining the battery or generating excessive heat.

Introducing MLX by Apple

Veteran machine learning practitioners are familiar with popular open-source machine learning frameworks such as TensorFlow by Google or PyTorch by Meta. For iOS developers, using those frameworks is a challenge since their APIs are mostly in Python.

Thankfully, the team at Apple has recently introduced a new machine learning library called MLX, which is a big step toward enabling ML researchers to experiment in Swift. It offers the following features (a small usage sketch follows the list):

  • A comprehensive Swift API for MLX core
  • Higher level neural network and optimizers packages
  • An example of text generation with Mistral 7B
  • An example of MNIST training
  • A C API to MLX which acts as the bridge between Swift and the C++ core
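To get a feel for the Swift API, here is a minimal sketch of working with MLX arrays. It assumes the `MLX` module from the mlx-swift package; the exact initializer and `eval` signatures are my assumptions based on that package's examples, so treat it as illustrative rather than definitive.

```swift
import MLX

// Create two small 2x2 arrays. MLX allocates them in unified memory,
// so the CPU and GPU can both operate on them without copies.
let a = MLXArray([1.0, 2.0, 3.0, 4.0].map(Float.init), [2, 2])
let b = MLXArray([5.0, 6.0, 7.0, 8.0].map(Float.init), [2, 2])

// Operations are lazy: this builds a computation graph...
let c = (a + b) * 2

// ...which is only materialized when evaluated (or printed).
eval(c)
print(c)
```

The lazy evaluation model mirrors the Python version of MLX: nothing is computed until a result is actually needed, which lets the framework fuse and schedule work efficiently.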

In fact, people can already run machine learning models offline on visionOS and iOS devices.
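The higher-level neural network package mentioned above makes it straightforward to compose models as well. Below is a minimal sketch of a single fully connected layer; the module name `MLXNN` and the `Linear` initializer reflect my understanding of the mlx-swift repository, so check the official examples for the exact API.

```swift
import MLX
import MLXNN

// A single fully connected layer: 4 input features -> 2 outputs.
// Layers are Modules and can be called like functions.
let layer = Linear(4, 2)

// A batch of one input vector with 4 features, filled with ones.
let input = MLXArray.ones([1, 4])

let output = layer(input)
print(output.shape)  // expected: [1, 2]
```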

Learning MLX Through Tutorials

As a machine learning enthusiast who uses Swift daily, I'm very excited about the MLX library. I firmly believe that the future of AI will be local and on-device, and this capability will open up new ways of working for developers.

This post is the starting point for my learning journey with the MLX library. Along the way, I'll publish many tutorials on how to use MLX and MLX Swift. I'll also build Llamica, my native, local-first AI buddy app written in Swift, which will support different ways of running LLMs locally and will be available on all Apple platforms.

PS: In the meantime, I have published a few articles in this series:
