About

What is LocalScore?

LocalScore is an open-source benchmarking tool designed to measure how fast Large Language Models (LLMs) run on your specific hardware. It is also a public database for the benchmark results.

Whether you're wondering if your computer can smoothly run an 8 billion parameter model or trying to decide which GPU to buy for your local AI setup, LocalScore provides the data you need to make informed decisions.

The benchmark results are meant to be directly comparable to each other, and they should give a fairly good indication of the real-world performance you can expect on your hardware. Unfortunately, the benchmark suite cannot cover every possible scenario (speculative decoding, etc.), but it should give a rough idea of how well your hardware will perform.

How It Works

LocalScore measures three key metrics of local LLM performance.

  1. Prompt Processing Speed: How quickly your system processes input text (tokens per second)
  2. Generation Speed: How fast your system generates new text (tokens per second)
  3. Time to First Token: The latency before the first response appears (milliseconds)

These metrics are combined into a single LocalScore, which gives you a straightforward way to compare different hardware configurations.

A score of 1,000 is excellent, 250 is passable, and below 100 will likely be a poor user experience in some regard.
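To make the three metrics concrete, here is a minimal Python sketch that computes them from hypothetical raw timings and folds them into one number. The geometric-mean aggregation and the sample values are assumptions for illustration only; this page does not specify LocalScore's actual formula or scaling, so the output is not comparable to real LocalScore values.

```python
import math

# Hypothetical raw measurements from a single benchmark run (illustrative numbers only).
prompt_tokens = 1024          # tokens in the input prompt
prompt_seconds = 1.6          # time spent processing the prompt
generated_tokens = 256        # tokens produced by the model
generation_seconds = 7.3      # time spent generating after the first token
time_to_first_token_ms = 180  # latency before the first output token appears

# The three metrics described above.
prompt_tps = prompt_tokens / prompt_seconds      # prompt processing speed (tok/s)
gen_tps = generated_tokens / generation_seconds  # generation speed (tok/s)
ttft_ms = time_to_first_token_ms                 # time to first token (ms)

# One plausible aggregation (an assumption, not LocalScore's actual formula):
# a geometric mean, with TTFT inverted so that lower latency scores higher.
combined = math.prod([prompt_tps, gen_tps, 1000.0 / ttft_ms]) ** (1.0 / 3.0)
print(f"prompt: {prompt_tps:.0f} tok/s  gen: {gen_tps:.0f} tok/s  "
      f"ttft: {ttft_ms} ms  combined: {combined:.0f}")
```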

Under the hood, LocalScore leverages Llamafile to ensure portability across different systems, making benchmarking accessible regardless of your setup.

The Tests

LocalScore has a straightforward test suite which is meant to emulate many common LLM tasks. The details of the tests are found in the table below:

  Prompt Tokens   Text Generation   Sample Use Cases
  1024 tokens     16 tokens         Classification, sentiment analysis, keyword extraction.
  4096 tokens     256 tokens        Long document Q&A, RAG, short summary of extensive text.
  2048 tokens     256 tokens        Article summarization, contextual paragraph generation.
  2048 tokens     768 tokens        Drafting detailed replies, multi-paragraph generation, content sections.
  1024 tokens     1024 tokens       Balanced Q&A, content drafting, code generation based on long sample.
  1280 tokens     3072 tokens       Complex reasoning, chain-of-thought, long-form creative writing, code generation.
  384 tokens      1152 tokens       Prompt expansion, explanation generation, creative writing, code generation.
  64 tokens       1024 tokens       Short prompt creative generation (poetry/story), Q&A, code generation.
  16 tokens       1536 tokens       Creative text writing/storytelling, Q&A, code generation.

Getting Started

  1. Download LocalScore
  2. Run the benchmark and view your results
  3. Optionally submit your results to our public database to help the community

When you submit your results, they become part of our growing database of hardware performance profiles, helping others understand what they can expect from similar setups.

We collect the following non-personally-identifiable system information:

  • Operating System Info: Name, Version, Release
  • CPU Info: Name, Architecture
  • RAM Info: Capacity
  • GPU Info: Name, Manufacturer, Total Memory
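As a rough illustration of how such fields can be gathered, here is a minimal Python sketch using the standard platform module and the third-party psutil package. It is not LocalScore's own collection code, and it omits GPU details, which require vendor-specific tooling.

```python
import platform
import psutil  # third-party; used here only to read total RAM capacity

def system_report() -> dict:
    """Gather roughly the same non-identifying fields listed above (illustrative only)."""
    return {
        "os": {
            "name": platform.system(),
            "release": platform.release(),
            "version": platform.version(),
        },
        "cpu": {
            "name": platform.processor(),
            "architecture": platform.machine(),
        },
        "ram_bytes": psutil.virtual_memory().total,
        # GPU name / manufacturer / total memory would come from vendor APIs
        # (e.g. NVML on NVIDIA), which are out of scope for this sketch.
    }

print(system_report())
```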

Supported Hardware

LocalScore currently supports:

  • CPUs (x86 and ARM)
  • NVIDIA GPUs
  • AMD GPUs
  • Apple Silicon (M1/M2/etc)

The benchmark currently supports only single-GPU setups, which we believe is the most practical configuration for most users running LLMs locally, much as gaming has shifted predominantly to single-GPU systems. We may support multi-GPU setups in the future.

Windows Users

Due to a Windows limitation, Llamafiles larger than 4GB cannot be run directly. Instead, you'll need to use LocalScore as a standalone utility and pass your models to the benchmarking application in GGUF format.

Community Project

LocalScore is a Mozilla Builders Project. It is a free and accessible resource for the local AI community. It builds upon the excellent work of llama.cpp and Llamafile.


We welcome contributions, suggestions, and feedback from the community. Whether you're interested in improving the benchmarking methodology, adding support for new hardware/models, or enhancing the user experience, your involvement is appreciated.

Join us in creating a transparent, useful resource that helps everyone make the most of running LLMs on local hardware.

You can find the code for the LocalScore CLI on GitHub, along with detailed documentation, command-line options, and installation instructions. The code for the LocalScore website is available in its own GitHub repository.