MLX - Day 4: Evaluate LLM Models from CLI with MLX on macOS

April 28, 2024 · 4 min read · #swift, #ios, #mlx, #llm

In the previous article, we built and ran the MLX sample app on macOS, iOS, and visionOS to evaluate LLM models locally.

For advanced users, it’s much faster to evaluate LLM models from the command-line interface.

The official mlx-swift-examples repository provides a sample CLI tool called llm-tool to evaluate LLM models from the shell terminal.

These are the steps to build and run llm-tool:

Step 1: Build

Because MLX relies on special Metal features, we need to use xcodebuild to build llm-tool instead of swift build.

In the root folder of mlx-swift-examples repository, run the following command:

xcodebuild build -scheme llm-tool -destination 'platform=OS X' -skipPackagePluginValidation

After the build succeeds, the executable is generated at the path Build/Products/Release/llm-tool inside the default DerivedData folder.
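To locate that binary without digging through DerivedData by hand, you can ask xcodebuild for its build settings: `xcodebuild -showBuildSettings -scheme llm-tool` prints a `BUILT_PRODUCTS_DIR` line among its output. The snippet below is a minimal sketch of parsing that line with awk; the DerivedData path shown is a made-up example, and yours will differ.

```shell
# Sample BUILT_PRODUCTS_DIR line as printed by `xcodebuild -showBuildSettings -scheme llm-tool`.
# The path is illustrative only; in practice pipe the real xcodebuild output into awk.
settings='    BUILT_PRODUCTS_DIR = /Users/me/Library/Developer/Xcode/DerivedData/mlx-swift-examples-abc/Build/Products/Release'

# Split on " = " and keep the right-hand side of the matching line.
dir=$(printf '%s\n' "$settings" | awk -F' = ' '/BUILT_PRODUCTS_DIR/ {print $2}')

# Full path to the built tool.
echo "$dir/llm-tool"
```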

Step 2: Evaluate using the default model

Evaluate the prompt using the default Mistral-7B model. The first time you run the command, it will take a while, as the model needs to be downloaded from the internet first.

After the model is downloaded and the prompt is evaluated, you will be able to see the response directly in the terminal.

~ ./mlx-run llm-tool --prompt " what is swift programming language"
Model loaded -> id("mlx-community/Mistral-7B-v0.1-hf-4bit-mlx")
Starting generation ...
 what is swift programming language?

# Swift is a programming language for iOS, iPadOS, watchOS, tvOS, and macOS apps.

It is free and open-source, compiled at build time and run-time optimized by the Swift compiler.

Swift is designed to work with Apple’s Cocoa and Cocoa Touch frameworks and the large body of existing Objective-C code, but avoids many of the complexities of C, C++
Prompt Tokens per second:     31.151544
Generation tokens per second: 50.391269
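llm-tool ends each run with the two tokens-per-second lines shown above. If you want to script quick benchmarks, you can pull those numbers out with awk. The sketch below operates on the sample output above; in practice you would pipe the real llm-tool output in instead of the hard-coded string.

```shell
# The last two lines llm-tool prints after a run (copied from the output above).
report='Prompt Tokens per second:     31.151544
Generation tokens per second: 50.391269'

# Split each line on the colon (plus any padding spaces) and keep the number
# from the generation-speed line.
gen_tps=$(printf '%s\n' "$report" | awk -F': *' '/Generation tokens per second/ {print $2}')

echo "generation speed: $gen_tps tok/s"
```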

Step 3: Evaluate using a custom model

llm-tool provides several customisable parameters so that you can control how your prompt is evaluated, e.g. running it against a custom model.

In the following command, we evaluate the same prompt, "what is swift programming language", but using the Phi-2 model:

~ ./mlx-run llm-tool --model mlx-community/phi-2-hf-4bit-mlx --prompt "what is swift programming language"
Model loaded -> id("mlx-community/phi-2-hf-4bit-mlx")
Starting generation ...
what is swift programming language
Swift is a general-purpose programming language developed by Apple for iOS, macOS, watchOS, and tvOS development. It is designed to be fast, safe, and intuitive, with a focus on simplicity and productivity. Swift is widely used for developing iOS and macOS applications, as well as for web development and software engineering. It is known for its strong type system, automatic memory management, and garbage collection. Swift supports both Objective-C and C-like syntax, making it easy for
Prompt Tokens per second:     94.887315
Generation tokens per second: 81.189934
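Since the only per-run inputs are --model and --prompt, a small shell loop makes comparing models straightforward. This is a sketch, assuming it is run from the mlx-swift-examples checkout after the build above; the two model IDs are just the examples used in this article, and any model from the mlx-community collection should work the same way.

```shell
# Run one prompt against a list of models, printing a header before each run.
prompt="what is swift programming language"

for model in "mlx-community/Mistral-7B-v0.1-hf-4bit-mlx" "mlx-community/phi-2-hf-4bit-mlx"; do
  echo "=== $model ==="
  if [ -x ./mlx-run ]; then
    # Each model is downloaded on its first use, so the first pass is slow.
    ./mlx-run llm-tool --model "$model" --prompt "$prompt"
  else
    echo "mlx-run not found; run this from the mlx-swift-examples repository root"
  fi
done
```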

Currently, the mlx-community organisation provides over 390 models on Hugging Face. With the llm-tool CLI, you can easily evaluate your prompts against those models without the overhead of building GUI apps.


Personal blog by An Tran. I'm focusing on creating useful apps.
#Swift #Kotlin #Mobile #MachineLearning #Minimalist

© An Tran - 2024