manycore-research
SpatialLM1.1-Qwen-0.5B
SpatialLM1.1-Llama-1B
SpatialLM-Llama-1B
SpatialLM is a 3D large language model designed to process 3D point cloud data and generate structured 3D scene understanding outputs. These outputs include architectural elements such as walls, doors, and windows, as well as oriented object bounding boxes with their semantic categories. Unlike previous methods that require specialized equipment for data collection, SpatialLM can handle point clouds from diverse sources such as monocular video sequences, RGBD images, and LiDAR sensors. This multimodal architecture bridges the gap between unstructured 3D geometric data and structured 3D representations, offering high-level semantic understanding. It enhances spatial reasoning capabilities for applications in embodied robotics, autonomous navigation, and other complex 3D scene analysis tasks.

SpatialLM reconstructs a 3D layout from a monocular RGB video with MASt3R-SLAM. Results are aligned to the video with GT cameras for visualization.

| Model | Download |
| :-----------------: | ------------------------------------------------------------------------------ |
| SpatialLM-Llama-1B | 🤗 HuggingFace |
| SpatialLM-Qwen-0.5B | 🤗 HuggingFace |

In the current version of SpatialLM, input point clouds are assumed to be axis-aligned, with the z-axis as the up axis. This orientation is crucial for maintaining consistency in spatial understanding and scene interpretation across different datasets and applications. Example preprocessed point clouds, reconstructed from RGB videos using MASt3R-SLAM, are available in SpatialLM-Testset.

Use `rerun` to visualize the point cloud and the predicted structured 3D layout output.

To evaluate the performance of SpatialLM, we provide an `eval.py` script that reports the benchmark results on the SpatialLM-Testset; the results are listed in the Benchmark Results table below.

We provide a test set of 107 preprocessed point clouds, reconstructed from RGB videos using MASt3R-SLAM.
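The z-up input convention described above can be illustrated with a small sketch. It assumes a source point cloud that uses a y-up convention, which is an assumption made for illustration; the text does not say which convention any particular reconstruction tool produces. The function names are hypothetical and not part of SpatialLM. Note that this only fixes the up axis; aligning walls with the x and y axes is a separate step that is not shown.

```python
from typing import Iterable, List, Sequence, Tuple

Point = Tuple[float, float, float]


def y_up_to_z_up(points: Iterable[Point]) -> List[Point]:
    """Rotate points by +90 degrees about the x-axis.

    This maps the +y direction (assumed "up" in the source data) onto +z,
    and keeps the coordinate system right-handed: (x, y, z) -> (x, -z, y).
    """
    return [(x, -z, y) for (x, y, z) in points]


def extent_along_axis(points: Sequence[Point], axis: int) -> float:
    """Return max - min of the coordinates along one axis (0=x, 1=y, 2=z)."""
    values = [p[axis] for p in points]
    return max(values) - min(values)


# Two opposite corners of a hypothetical room stored y-up:
# 4.0 wide (x), 2.5 tall (y), 3.0 deep (z).
room_y_up = [(0.0, 0.0, 0.0), (4.0, 2.5, 3.0)]
room_z_up = y_up_to_z_up(room_y_up)

# After conversion, the room height should lie along the z-axis.
height = extent_along_axis(room_z_up, axis=2)
```

A quick sanity check after any such conversion is that the extent along z matches a plausible ceiling height for the scene.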
SpatialLM-Testset is quite challenging compared to prior clean RGBD scan datasets because of the noise and occlusions in point clouds reconstructed from monocular RGB videos.

| Dataset | Download |
| :---------------: | ---------------------------------------------------------------------------------- |
| SpatialLM-Testset | 🤗 Datasets |

Benchmark results on the challenging SpatialLM-Testset are reported in the following table:

| Method | SpatialLM-Llama-1B | SpatialLM-Qwen-0.5B |
| ---------------- | ---------------------- | ----------------------- |
| Floorplan | mean IoU | |
| wall | 78.62 | 74.81 |
| | | |
| Objects | F1 @.25 IoU (3D) | |
| curtain | 27.35 | 28.59 |
| nightstand | 57.47 | 54.39 |
| chandelier | 38.92 | 40.12 |
| wardrobe | 23.33 | 30.60 |
| bed | 95.24 | 93.75 |
| sofa | 65.50 | 66.15 |
| chair | 21.26 | 14.94 |
| cabinet | 8.47 | 8.44 |
| dining table | 54.26 | 56.10 |
| plants | 20.68 | 26.46 |
| tv cabinet | 33.33 | 10.26 |
| coffee table | 50.00 | 55.56 |
| side table | 7.60 | 2.17 |
| air conditioner | 20.00 | 13.04 |
| dresser | 46.67 | 23.53 |
| | | |
| Thin Objects | F1 @.25 IoU (2D) | |
| painting | 50.04 | 53.81 |
| carpet | 31.76 | 45.31 |
| tv | 67.31 | 52.29 |
| door | 50.35 | 42.15 |
| window | 45.4 | 45.9 |

SpatialLM-Llama-1B is derived from Llama3.2-1B-Instruct, which is licensed under the Llama3.2 license. SpatialLM-Qwen-0.5B is derived from the Qwen-2.5 series, originally licensed under the Apache 2.0 License.

All models are built upon the SceneScript point cloud encoder, licensed under the CC-BY-NC-4.0 License. TorchSparse, utilized in this project, is licensed under the MIT License.

If you find this work useful, please consider citing it.

We would like to thank the following projects that made this work possible:

Llama3.2 | Qwen2.5 | Transformers | SceneScript | TorchSparse
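To make the "F1 @.25 IoU" metric in the benchmark table more concrete, here is a minimal sketch of how such a score can be computed for one object category. It is a simplification under stated assumptions: the boxes are axis-aligned (the benchmark uses oriented boxes), and predictions are matched to ground truth greedily one-to-one by descending IoU. The matching procedure actually used by `eval.py` is not described here, so this should not be read as its implementation; all function names are hypothetical.

```python
from typing import Sequence, Tuple

# Axis-aligned box: (xmin, ymin, zmin, xmax, ymax, zmax). An assumption for
# simplicity; the benchmark itself reports results on oriented boxes.
Box = Tuple[float, float, float, float, float, float]


def volume(box: Box) -> float:
    return (box[3] - box[0]) * (box[4] - box[1]) * (box[5] - box[2])


def iou_3d(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned 3D boxes."""
    intersection = 1.0
    for axis in range(3):
        low = max(a[axis], b[axis])
        high = min(a[axis + 3], b[axis + 3])
        if high <= low:
            return 0.0  # no overlap along this axis
        intersection *= high - low
    return intersection / (volume(a) + volume(b) - intersection)


def f1_at_iou(preds: Sequence[Box], gts: Sequence[Box], threshold: float = 0.25) -> float:
    """F1 score where a prediction counts as correct if it is matched
    one-to-one to a ground-truth box with IoU >= threshold."""
    pairs = sorted(
        ((iou_3d(p, g), i, j) for i, p in enumerate(preds) for j, g in enumerate(gts)),
        reverse=True,
    )
    used_preds, used_gts = set(), set()
    true_positives = 0
    for iou, i, j in pairs:
        if iou < threshold:
            break  # remaining pairs overlap even less
        if i in used_preds or j in used_gts:
            continue
        used_preds.add(i)
        used_gts.add(j)
        true_positives += 1
    precision = true_positives / len(preds) if preds else 0.0
    recall = true_positives / len(gts) if gts else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


# Toy example: one prediction overlaps a ground-truth box, one is a miss.
ground_truth = [(0.0, 0.0, 0.0, 1.0, 1.0, 1.0), (5.0, 5.0, 5.0, 6.0, 6.0, 6.0)]
predictions = [(0.0, 0.0, 0.0, 1.0, 1.0, 0.5), (10.0, 10.0, 10.0, 11.0, 11.0, 11.0)]
score = f1_at_iou(predictions, ground_truth)
```

In the toy example, one of two predictions matches one of two ground-truth boxes, so precision and recall are both 0.5. The relatively loose 0.25 threshold tolerates the box inaccuracies expected from noisy, occluded reconstructions.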
FLUX.1 Wireframe Dev Lora
`FLUX.1 Wireframe [dev] LoRA` is an improved version of FLUX.1-Layout-ControlNet, which serves as a key component of SpatialGen. It is a LoRA for [FLUX.1 [dev]](https://huggingface.co/black-forest-labs/FLUX.1-dev), capable of generating an image from a text description while following the structure of a given wireframe image.

Paper: SPATIALGEN: Layout-guided 3D Indoor Scene Generation | Project Page | Code Repository

FLUX.1-Wireframe-dev-lora is licensed under the FLUX.1-dev Non-Commercial License.
FLUX.1-Panorama-dev-lora
`FLUX.1 Panorama [dev] LoRA` is a LoRA model for [FLUX.1 [dev]](https://huggingface.co/black-forest-labs/FLUX.1-dev), designed to generate high-quality panoramic images from text descriptions, with a focus on realistic interior design scenes. The model falls under the FLUX.1-dev Non-Commercial License.
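As a rough sketch of how a LoRA like this is typically applied to FLUX.1 [dev] with the `diffusers` library, the code below loads the base pipeline and attaches LoRA weights with `load_lora_weights`. Several things here are assumptions rather than facts from this model card: the LoRA repository id, the 2:1 panorama aspect ratio, and the sampling settings. The helper `panorama_size` is hypothetical and only computes image dimensions.

```python
from typing import Tuple

BASE_MODEL = "black-forest-labs/FLUX.1-dev"
# Assumed repository id; check the actual model page before use.
LORA_REPO = "manycore-research/FLUX.1-Panorama-dev-lora"


def panorama_size(height: int, multiple: int = 16) -> Tuple[int, int]:
    """Return (width, height) with an assumed 2:1 aspect ratio.

    The height is snapped down to a multiple of 16 so that both dimensions
    are divisible by 16.
    """
    if height < multiple:
        raise ValueError(f"height must be at least {multiple}")
    snapped = height // multiple * multiple
    return 2 * snapped, snapped


def generate_panorama(prompt: str, height: int = 1024):
    """Generate one image; requires a CUDA GPU plus `torch`, `diffusers`, `peft`."""
    # Heavy imports are kept local so the size helper works without them.
    import torch
    from diffusers import FluxPipeline

    pipe = FluxPipeline.from_pretrained(BASE_MODEL, torch_dtype=torch.bfloat16)
    # If the repository holds several weight files, `weight_name=` may be needed.
    pipe.load_lora_weights(LORA_REPO)
    pipe.to("cuda")

    width, height = panorama_size(height)
    result = pipe(
        prompt=prompt,
        width=width,
        height=height,
        num_inference_steps=28,  # placeholder value, not a recommended setting
        guidance_scale=3.5,      # placeholder value, not a recommended setting
    )
    return result.images[0]
```

Calling `generate_panorama("a bright living room with large windows")` would return a PIL image that can be written to disk with `.save(...)`.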
FLUX.1 Panorama Kontext Dev Lora
`FLUX.1 Panorama Kontext [dev] LoRA` is a LoRA model for [FLUX.1 Kontext [dev]](https://huggingface.co/black-forest-labs/FLUX.1-Kontext-dev), designed to generate high-quality panoramic images from perspective images, with a focus on realistic interior design scenes. The model falls under the FLUX.1-dev Non-Commercial License.
SpatialLM-Qwen-0.5B
SpatialGen-1.0
SpatialGen: Layout-guided 3D Indoor Scene Generation

Figure: Image-to-Scene Results (left) and Text-to-Scene Results (right).

TL;DR: Given a 3D semantic layout, SpatialGen can generate a 3D indoor scene conditioned on either a reference image (left) or a textual description (right) using a multi-view, multi-modal diffusion model.

- [Sep, 2025] We released the SpatialGen paper!
- [Aug, 2025] Initial release of SpatialGen-1.0!

- [x] Provide inference code of SpatialGen.
- [ ] Provide training instructions for SpatialGen.
- [ ] Release SpatialGen dataset.

| Model | Download |
| :-----------------------: | -------------------------------------------------------------------------------------|
| SpatialGen-1.0 | 🤗 HuggingFace |
| FLUX.1-Wireframe-dev-lora | 🤗 HuggingFace |

Tested with the following environment:

- Python 3.10
- PyTorch 2.3.1
- CUDA Version 12.1

We provide SpatialGen-Testset with 48 rooms, which are labeled with 3D layouts and come with 4.8K rendered images (48 x 100 views, including RGB, normal, depth, and semantic maps) for MVD inference.

SpatialGen-1.0 is derived from Stable-Diffusion-v2.1, which is licensed under the CreativeML Open RAIL++-M License.

We would like to thank the following projects that made this work possible:
FLUX.1 Layout ControlNet
`FLUX.1-Layout-ControlNet` is a key component of SpatialGen. `FLUX.1 Layout Controlnet [dev]` is a ControlNet conditioned on a semantic image. It generates a 2D image from a text description while following the layout of the semantic image. This ControlNet is applicable to [FLUX.1 [dev]](https://huggingface.co/black-forest-labs/FLUX.1-dev) and is used within the SpatialGen framework for 3D scene synthesis.

Paper: SPATIALGEN: Layout-guided 3D Indoor Scene Generation | Project Page | Code Repository

Example prompts:

- This serene Chinese-inspired bedroom with a traditional bed frame featuring intricate carvings. Use soft, muted colors for the bedding and walls. Hang delicate, calligraphic artworks above the bed. Place a refined wooden nightstand with a circular mirror beside the bed and a built-in wardrobe with sliding doors. Large windows should be adorned with lightweight, embroidered curtains that allow natural light to gently enter the room on the wooden board.
- This elegant European-style bedroom features a luxurious bed with a tufted headboard, adorned with fine linens. Incorporate classic abstract paintings above the bed and a sophisticated floating nightstand with a decorative mirror. Include a built-in wardrobe with ornate details and large windows draped with sheer, flowing curtains that let in soft, diffused light.
- This whimsical cartoon bedroom with a playful bed that has a fun, tufted headboard. The bedding should feature bright, cheerful patterns. Hang colorful, abstract cartoons on the walls above the bed. Include a quirky, floating nightstand with a round, animated mirror and a built-in wardrobe with open shelves filled with toys. Large windows with sheer, patterned curtains should let in plenty of light, creating a lively atmosphere.
- This futuristic cyberpunk bedroom with a sleek, high-tech bed featuring a digital display headboard. The bedding should have a metallic sheen with neon accents. Install holographic abstract art above the bed and a floating nightstand with a reflective, circular mirror. Add a built-in wardrobe with illuminated shelves and large windows with smart, translucent curtains that filter the city lights. The room should exude a cool, tech-savvy vibe.

To use FLUX.1-Layout-ControlNet with the 🧨 diffusers Python library, first install or upgrade `diffusers` and `peft`. Then you can use the `FluxControlNetPipeline` to run it.

FLUX.1-Layout-ControlNet is licensed under the FLUX.1-dev Non-Commercial License.
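The `FluxControlNetPipeline` usage mentioned above might look roughly like the following sketch. The ControlNet repository id, the conditioning scale, and the sampling settings are assumptions chosen for illustration and are not taken from this model card; `build_call_kwargs` and `generate_with_layout` are hypothetical helper names. The semantic layout image is supplied by the caller.

```python
from typing import Any, Dict

BASE_MODEL = "black-forest-labs/FLUX.1-dev"
# Assumed repository id; check the actual model page before use.
CONTROLNET_REPO = "manycore-research/FLUX.1-Layout-ControlNet"

# Install or upgrade the dependencies first, for example:
#   pip install -U diffusers peft


def build_call_kwargs(
    prompt: str,
    control_image: Any,
    conditioning_scale: float = 0.7,  # placeholder, not a recommended value
    steps: int = 28,                  # placeholder, not a recommended value
    guidance_scale: float = 3.5,      # placeholder, not a recommended value
) -> Dict[str, Any]:
    """Collect the keyword arguments passed to the pipeline call."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    return {
        "prompt": prompt,
        "control_image": control_image,
        "controlnet_conditioning_scale": conditioning_scale,
        "num_inference_steps": steps,
        "guidance_scale": guidance_scale,
    }


def generate_with_layout(prompt: str, semantic_image_path: str):
    """Generate one image that follows the given semantic layout image.

    Requires a CUDA GPU plus `torch`, `diffusers`, and `peft`.
    """
    # Heavy imports are kept local so the helper above works without them.
    import torch
    from diffusers import FluxControlNetModel, FluxControlNetPipeline
    from diffusers.utils import load_image

    controlnet = FluxControlNetModel.from_pretrained(
        CONTROLNET_REPO, torch_dtype=torch.bfloat16
    )
    pipe = FluxControlNetPipeline.from_pretrained(
        BASE_MODEL, controlnet=controlnet, torch_dtype=torch.bfloat16
    )
    pipe.to("cuda")

    control_image = load_image(semantic_image_path)
    result = pipe(**build_call_kwargs(prompt, control_image))
    return result.images[0]
```

Any of the example prompts above can be passed as `prompt`, together with the path to a semantic layout image, and the returned PIL image can be saved with `.save(...)`.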