thuml

16 models

sundial-base-128m

A time series forecasting model with 128 million parameters. It is trained on diverse datasets, including UTSD and Salesforce's large datasets, to forecast a wide range of time-dependent data.

license:apache-2.0

1,188,486

rt1-world-model-single-step-rlvr

See https://github.com/thuml/RLVR-World for examples of using this model.

llama

rt1-world-model-multi-step-base

llama

Thoth-30B-A3B

license:apache-2.0

rt1-world-model-multi-step-rlvr

See https://github.com/thuml/RLVR-World for examples of using this model.

llama

rt1-world-model-single-step-base

See https://github.com/thuml/RLVR-World for examples of using this model.

llama

bytesized32-world-model-rlvr-task-specific-reward

See https://github.com/thuml/RLVR-World for examples of using this model.

```
@article{wu2025rlvr,
  title={RLVR-World: Training World Models with Reinforcement Learning},
  author={Jialong Wu and Shaofeng Yin and Ningya Feng and Mingsheng Long},
  journal={arXiv preprint arXiv:2505.13934},
  year={2025},
}
```

license:mit