happyme531

20 models

Qwen2.5-VL-3B-Instruct-RKLLM

Run the powerful Qwen2.5-VL-3B-Instruct vision-language model on RK3588!

- Inference speed (RK3588): vision encoder 3.4 s (3-core parallel) + LLM prefill 2.3 s (320 tokens / 138 tps) + decode 8.2 tps
- Memory usage (RK3588, context length 1024): 6.1 GB

Usage:

1. Clone or download this repository locally. The model is large, so make sure you have enough disk space.
2. The RKNPU2 kernel driver on your board must be version >= 0.9.6 to run a model this large. Check the driver version with root privileges; if it is too old, update the driver. You may need to update your kernel or consult the official documentation for help.

Parameter descriptions:

- `512`: `maxnewtokens`, the maximum number of tokens to generate.
- `1024`: `maxcontextlen`, the maximum context length.
- `3`: `npucorenum`, the number of NPU cores to use.

If performance is not ideal, set the CPU governor to keep the CPU at its highest frequency and bind the inference program to the big cores (`taskset -c 4-7 python ...`).
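The driver check and big-core pinning described above can be sketched as a small script. The debugfs path below is the usual RKNPU2 location but is an assumption (it may differ by kernel build), and the commented-out inference command keeps the placeholder script name from the text:

```shell
# Check the RKNPU2 kernel driver version (debugfs usually needs root).
# Assumed path: the standard RKNPU2 debugfs node; adjust if your kernel differs.
RKNPU_VER_PATH=/sys/kernel/debug/rknpu/version
if [ -r "$RKNPU_VER_PATH" ]; then
    cat "$RKNPU_VER_PATH"    # should report a driver version >= 0.9.6
else
    echo "rknpu debugfs node not readable (not root, or not an RK3588 board)"
fi

# If performance is poor: pin inference to the RK3588 big cores (4-7).
# The python invocation is left as in the original text:
# taskset -c 4-7 python ...
```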
Model conversion:

1. rkllm-toolkit must be downloaded manually from here: https://github.com/airockchip/rknn-llm/tree/main/rkllm-toolkit
2. Download this repository locally; the model files ending in `.rkllm` and `.rknn` are not needed.
3. Download the Qwen2.5-VL-3B-Instruct Hugging Face model repository: https://huggingface.co/Qwen/Qwen2.5-VL-3B-Instruct
4. Copy `rkllm-convert.py` into the Qwen2.5-VL-3B-Instruct model folder and run it. It uses w8a8 quantization by default; open the script to change the quantization method and other settings.
5. Copy `exportvisiononnx.py` into the root of the model folder and run it there. The vision encoder is exported to `vision/visionencoder.onnx`. The default height and width are 476; you can change them with the `--height` and `--width` parameters.
6. Download `splitmatmulonnxprofile.py` from https://github.com/happyme531/rknn-toolkit2-utils and run it; the optimized model is saved as `visionencoderopt.onnx`.
7. Convert the optimized ONNX model; the result is saved as `visionencoderopt.rknn`. To match the command in the "How to Use" section, rename it.

Known issues:

- Due to limitations of RKLLM's multimodal input, only one image can be loaded per conversation.
- Multi-turn conversation is not implemented.
- RKLLM's w8a8 quantization appears to cause a non-trivial loss of precision.
- Possibly due to RKNPU2 memory-access patterns, the model oddly runs noticeably faster when the input image side lengths are not multiples of 64.
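The conversion steps above can be condensed into one script. Script names and the output path come from the text; the exact command-line arguments of each script are assumptions, so check each script's own usage before running:

```shell
# Assumed layout: the HF repo has been downloaded into this directory
# and the conversion scripts from this repository copied into it.
MODEL_DIR=Qwen2.5-VL-3B-Instruct

if [ -d "$MODEL_DIR" ]; then
    cd "$MODEL_DIR"
    # 1. Convert the LLM part to .rkllm (w8a8 by default; edit the script to change).
    python rkllm-convert.py
    # 2. Export the vision encoder to vision/visionencoder.onnx (default 476x476).
    python exportvisiononnx.py --height 476 --width 476
    # 3. Split oversized MatMuls; produces visionencoderopt.onnx, then .rknn.
    python splitmatmulonnxprofile.py vision/visionencoder.onnx
    # 4. Rename visionencoderopt.rknn to whatever name the run command expects.
else
    echo "Download https://huggingface.co/Qwen/Qwen2.5-VL-3B-Instruct into $MODEL_DIR first"
fi
```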


Qwen3 Embedding RKLLM

This model works fine; you can download it and try running it, but I haven't gotten around to writing a README yet. I also haven't tested its precision.

license:agpl-3.0

Qwen1.5-4B-GGUF


VoxCPM-0.5B-RKNN2

license:agpl-3.0

Qwen3-ASR-1.7B-RKLLM

license:agpl-3.0

whisper-large-v3-turbo-RKNN2

license:agpl-3.0

InternVL3_5-2B-RKLLM

Run the powerful InternVL3.5-2B vision-language model on RK3588!

- Inference speed (RK3588): vision encoder 2.1 s (3-core parallel) + LLM prefill 1 s (265 tokens / 261 tps) + decode 12.1 tps
- Memory usage (RK3588, context length 1024): 3.9 GB

Usage:

1. Clone or download this repository locally. The model is large, so make sure you have enough disk space.
2. The RKNPU2 kernel driver on your board must be version >= 0.9.6 to run this model. Check the driver version with root privileges; if it is too old, update the driver. You may need to update your kernel or consult the official documentation for help.

Model conversion:

1. `rkllm-toolkit` must be downloaded manually from here: https://github.com/airockchip/rknn-llm/tree/main/rkllm-toolkit
2. Download this repository locally; the `.rkllm` and `.rknn` model files are not needed.
3. Download the InternVL3.5-2B Hugging Face model repository: https://huggingface.co/OpenGVLab/InternVL35-2B-HF
4. Copy `rkllm-convert.py` into the InternVL35-2B-HF model folder and run it. The default quantization is w8a8; modify the script to change quantization methods.
5. Copy `exportvisiononnx.py` into the root of the InternVL35-2B-HF model folder and run it there. The vision encoder is exported to `visionencoder.onnx`.
Known issues:

- Due to limitations of RKLLM's multimodal input, only one image can be loaded per conversation.
- Multi-turn conversation is not implemented.
- RKLLM's w8a8 quantization appears to cause significant precision loss.
- The original model's high-resolution image tiling and video input are not implemented; I haven't gotten around to them, and they may be added later.
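As with the Qwen2.5-VL model above, the InternVL3.5-2B conversion steps can be condensed into one script. The directory and script names follow the text; the assumption that both scripts run without extra arguments should be checked against each script's own usage:

```shell
# Assumed layout: the HF repo downloaded into this directory and the
# conversion scripts from this repository copied into it.
MODEL_DIR=InternVL35-2B-HF

if [ -d "$MODEL_DIR" ]; then
    cd "$MODEL_DIR"
    # Convert the LLM part to .rkllm (default quantization: w8a8).
    python rkllm-convert.py
    # Export the vision encoder to visionencoder.onnx.
    python exportvisiononnx.py
else
    echo "Download https://huggingface.co/OpenGVLab/InternVL35-2B-HF into $MODEL_DIR first"
fi
```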


Florence-2-base-ft-ONNX-RKNN2

license:agpl-3.0

MiniCPM-V-2_6-rkllm


Qwen2-Audio-rkllm

license:agpl-3.0

SenseVoiceSmall-RKNN2

license:agpl-3.0

Stable-Diffusion-1.5-LCM-ONNX-RKNN2


Segment-Anything-2.1-RKNN2

license:agpl-3.0

FastVLM-1.5B-RKLLM

license:agpl-3.0

Bert-VITS2-RKNN2

license:agpl-3.0

rwkv-7-world-ONNX-RKNN2

license:agpl-3.0

segment-anything-rknn2


wd-convnext-tagger-v3-RKNN2

license:agpl-3.0

TangoFlux-ONNX-RKNN2

license:agpl-3.0

Qwen3 Reranker RKLLM

This model works fine; you can download it and try running it, but I haven't gotten around to writing a README yet. I also haven't tested its precision.

license:agpl-3.0