|
|
AIBOX-K3 LLM large model usage
Posted at 3 day before
View38
|
Replies0
Print
Only Author
[Copy Link]
1#
Last edited by 799959745 In 5/26/2026 15:34 Editor
Step 1:
Download the Bianbu firmware and flash it to the AIBOX-K3.
Firmware flashing tutorial link: https://wiki.t-firefly.com/en/AI ... tmode_spacemit.html
Step Two:
2.1 Connect the machine to an HDMI monitor. Open the terminal. Install dependencies. The inference tool llama.cpp-tools-spacemit needs to be installed.
- sudo apt-get update
- # Install the accelerated version of llama.cpp inference tool from Spacemit
- sudo apt install llama.cpp-tools-spacemit
Copy the code 2.2. Downloading the Model
When using llama.cpp, a GGUF format model is required. It is recommended to download the model to the default path ~/.cache/models/llm for easy management. For quick verification, you can download the model from the Spacemit server.
Model source:
Spacemit mirror (recommended): https://archive.spacemit.com/spacemit-ai/model_zoo/llm/
Multiple pre-installed GGUF models (such as Qwen2.5, Qwen3, Deepseek, etc.) can be downloaded directly to the default directory:- mkdir -p ~/.cache/models/llm
- cd ~/.cache/models/llm
- # Example: Download the qwen2.5 0.5b model
- https://archive.spacemit.com/spacemit-ai/model_zoo/llm/qwen2.5-0.5b-instruct-q4_0.gguf
- # Example: Download the qwen2.5 1.5b model
- https://archive.spacemit.com/spacemit-ai/model_zoo/llm/qwen2.5-1.5b-instruct-q4_0.gguf
- # Example: Download the glm-edge 1.5b model
- wget https://archive.spacemit.com/spacemit-ai/model_zoo/llm/glm-edge-1.5b-chat-q4_0.gguf
- # Example: Download the qwen2.5 3b model
- wget https://archive.spacemit.com/spacemit-ai/model_zoo/llm/qwen2.5-3b-instruct-q4_0.gguf
- # Example: Download the qwen3-30B-A3B model
- https://archive.spacemit.com/spacemit-ai/model_zoo/llm/Qwen3-30B-A3B-Q4_0.gguf
Copy the code
Hugging Face: Search for GGUF models on Hugging Face and download the .gguf file to ~/.cache/models/llm or a custom path.
2.3. Testing the Model
Verifying the Dialogue
- llama-cli -m ~/.cache/models/llm/qwen2.5-0.5b-instruct-q4_0.gguf -t 8 -p "Hello, please introduce yourself."
Copy the code
If you want real-time dialogue in the terminal, you can remove the -p parameter. The command is as follows:
- llama-cli -m ~/.cache/models/llm/qwen2.5-0.5b-instruct-q4_0.gguf -t 8
Copy the code
Browser-based web interface dialogue
- llama-server -m ~/.cache/models/llm/qwen2.5-0.5b-instruct-q4_0.gguf -t 8 --port 8080 &
Copy the code
Open the Chromium browser. Enter the URL.
Performance Verification
Performance testing is performed in the SDK root directory using Qwen3-30B-A3B-Instruct-2507-04_0.gguf.
- llama-bench -m ~/.cache/models/llm/Qwen3-30B-A3B-Instruct-2507-04_0.gguf -t 8 -p 128 -n 128 -mmp 0 -fa 1
Copy the code
Note:
Running the 30B large model requires 32GB of RAM. If you encounter insufficient memory, try disabling desktop display to reduce memory usage. Connect to a serial terminal to execute commands.
(The command to close the terminal is generally not needed with a large memory configuration.)
- systemctl stop sddm.service
Copy the code
|
|