LLMs are the real CPU stress testers: all 4 cores at 100% is bound to cause heat issues. The Pi5 stock speed is 2.4GHz. The Pi5 16GB I am using now runs at 3GHz with passive cooling (no fan) and it never comes close to throttling. You could likely go faster with a fan.
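For anyone wanting to try it, the 3GHz overclock is just a couple of lines in /boot/firmware/config.txt - something like this (the over_voltage_delta value is what has worked for others, your board may need a different number, so treat it as a starting point):

arm_freq=3000
over_voltage_delta=50000

Then reboot and keep an eye on vcgencmd measure_temp and vcgencmd get_throttled under load to confirm it is stable.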
Not sure how much overclocking/extra cooling will help, as 7B LLMs are not that fast anyway.
A 13B LLM on a Pi5 16GB is bound to be slower still - a few tokens per second, maybe?
The LLMs that run at a useful speed seem to be those below 3B parameters, or under 2GB of RAM.
Some benchmarks here.
https://github.com/b4rtaz/distributed-llama
A 25% clock boost is not going to make much difference on the big models.
To me, a useful home LLM is something that can respond fast enough to keep up with speech-to-text input.
How many tokens/second is useful?
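Rough arithmetic: conversational speech is about 150 words a minute, i.e. ~2.5 words/s, and at roughly 1.3 tokens per word that is ~3-4 tokens/s - so anything that can sustain 4-5 tokens/s should feel conversational for spoken replies.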
dllama on a few Pi5s would be fun if you have the Pi5s/$$$.
It would maybe be cheaper than a PC with a GPU.
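From memory of the distributed-llama README (check the repo for the current flags and converted model files - the model/tokenizer paths and IPs below are just placeholders), the setup is one root node plus a worker process on each extra Pi5:

# on each worker Pi5
./dllama worker --port 9998 --nthreads 4

# on the root Pi5, pointing at the workers
./dllama inference --model dllama_model.m --tokenizer dllama_tokenizer.t --prompt "Hello" --nthreads 4 --workers 10.0.0.2:9998 10.0.0.3:9998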
A Pi5 cluster for a home AI that does video security and LLMs would be just about doable/useful now.
Another year of AI algorithm development should make it very useful, I suspect.
The alternative is Jeff Geerling's GPU plus Pi5.
Or the latest Orin SBC? Or...
Statistics: Posted by Gavinmc42 — Wed Feb 26, 2025 2:48 am