Improving performance of LLaMPPL models

If your LLaMPPL model is running slowly, consider exploiting the following features to improve performance:

  • Auto-Batching — to run multiple particles concurrently, with batched LLM calls (see the first sketch after this list)
  • Caching — to cache key and value vectors for long prompts (second sketch below)
  • Immutability hinting — to significantly speed up the bookkeeping performed by SMC inference (third sketch below)
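
Auto-batching needs no special code beyond writing `step` as an async method and awaiting `self.sample` / `self.observe`: the inference engine interleaves particles and gathers their pending next-token queries into single batched LLM calls. The following is a minimal sketch, not a definitive implementation; the model name, the prompt, and the `LLM.batch_size` value are placeholders, and `batch_size` is assumed to be the attribute that caps how many queries are batched per forward pass.

```python
import asyncio
from hfppl import CachedCausalLM, LMContext, Model, smc_standard

LLM = CachedCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")
LLM.batch_size = 40  # assumed knob: max particle queries per batched forward pass

class Generation(Model):
    def __init__(self, lm, prompt):
        super().__init__()
        self.context = LMContext(lm, prompt)        # stateful LLM context, seeded with the prompt
        self.eos_token = lm.tokenizer.eos_token_id

    async def step(self):
        # Awaiting here yields control so other particles can reach their own
        # next-token queries; the engine batches all pending queries into one
        # LLM call instead of issuing them one at a time.
        token = await self.sample(self.context.next_token())
        if token.token_id == self.eos_token:
            self.finish()

# Running SMC with many particles is what makes the batching pay off.
particles = asyncio.run(smc_standard(Generation(LLM, "The weather today is"), 20))
```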
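
When every particle shares a long, fixed prompt, the key and value vectors for the prompt tokens can be computed once up front and reused, so particles only pay for the tokens they add. A hedged sketch, assuming a `CachedCausalLM.cache_kv` method that accepts the tokenized prompt (the method name and signature are assumptions about the library's API), reusing the `Generation` model from the previous sketch:

```python
prompt = "You are a helpful assistant. <long instructions and few-shot examples>"

# Assumed behavior: precompute and store the key/value vectors for these
# prompt tokens, so later queries whose context starts with this prefix
# skip re-running attention over the prompt.
LLM.cache_kv(LLM.tokenizer.encode(prompt))

# Any context created from this prompt now starts from the cached prefix.
particles = asyncio.run(smc_standard(Generation(LLM, prompt), 20))
```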
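
SMC resampling duplicates particles, and by default the whole model object is copied. Attributes that are never mutated after `__init__` can be hinted as immutable so that copies share them instead. The sketch below assumes a `Model.immutable_properties` hook returning a set of attribute names to share rather than copy; the hook name, its semantics, and the `mask_dist` conditioning pattern are assumptions about the hfppl API.

```python
class ConstrainedGeneration(Model):
    def __init__(self, lm, prompt, forbidden_tokens):
        super().__init__()
        self.context = LMContext(lm, prompt)
        self.eos_token = lm.tokenizer.eos_token_id
        self.forbidden_tokens = forbidden_tokens  # built once, never mutated

    def immutable_properties(self):
        # Assumed hook: attributes named here are shared (not deep-copied)
        # when particles are duplicated during resampling.
        return set(["forbidden_tokens"])

    async def step(self):
        # Condition on the next token not being forbidden, then sample it.
        await self.observe(self.context.mask_dist(self.forbidden_tokens), False)
        token = await self.sample(self.context.next_token())
        if token.token_id == self.eos_token:
            self.finish()
```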