Improving performance of LLaMPPL models
If your LLaMPPL model is running slowly, consider exploiting the following features to improve performance (see the sketch after this list):
- Auto-Batching: runs multiple particles concurrently, with batched LLM calls
- Caching: caches key and value vectors for long prompts
- Immutability hinting: significantly speeds up the bookkeeping performed by SMC inference
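Below is a minimal sketch of how these three features might fit together in a LLaMPPL model. It is illustrative rather than taken from the library's documentation: the model class, prompt, and particle count are made up, and method names such as `cache_kv`, `tokenize`, and `immutable_properties` should be checked against the version of the library you are using.

```python
import asyncio

from hfppl import CachedCausalLM, LMContext, Model, smc_standard

# Load a HuggingFace causal LM wrapped for LLaMPPL (model name is just an example).
lm = CachedCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

prompt = "Tell me a story:"

# Caching: pre-compute key/value vectors for the shared prompt once, so that
# particles do not each re-run the LLM over the same prompt. (Check the exact
# method names -- tokenize / cache_kv -- against your library version.)
lm.cache_kv(lm.tokenize(prompt))


class StoryModel(Model):
    def __init__(self, lm, prompt, max_tokens=25):
        super().__init__()
        self.lm = lm
        self.context = LMContext(lm, prompt)
        self.max_tokens = max_tokens
        self.n_generated = 0

    def immutable_properties(self):
        # Immutability hinting (assumed API): declare attributes that are never
        # mutated, so SMC bookkeeping can share them across particle copies
        # instead of deep-copying them at every resampling step.
        return set(["lm"])

    async def step(self):
        # Auto-batching: because `step` is async and awaits the LLM, the token
        # queries issued by all active particles are gathered into a single
        # batched call to the underlying model.
        await self.sample(self.context.next_token())
        self.n_generated += 1
        if self.n_generated >= self.max_tokens:
            self.finish()


async def main():
    # Run SMC with 20 particles; concurrently running particles share batched
    # LLM calls, and the cached prompt KV vectors are reused by every particle.
    return await smc_standard(StoryModel(lm, prompt), 20)


particles = asyncio.run(main())
```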