Real2Sim Asset Generation
DISCOVERSE integrates real-world acquired data, 3D AIGC, and existing 3D assets (including formats such as 3DGS (.ply), meshes (.obj/.stl), and MJCF physics models (.xml)) in a unified manner, supporting their use as interactive scene nodes (objects and robots) or background nodes (scenes). We adopt 3DGS as the unified visual representation and integrate laser scanning, state-of-the-art generative models, and physics-based relighting to enhance the geometric and appearance fidelity of reconstructed radiance fields.
Installation Instructions
This project has been tested on Ubuntu 18.04+.
Setting up the Python environment for DiffusionLight (Step 3) and Mesh2GS (Step 5):
conda create -n mesh2gs python=3.10
conda activate mesh2gs
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118 # replace cu118 with your CUDA version
Please manually install the other dependencies listed in requirements.txt.
To set up the Python environment for TRELLIS (Step 1), we recommend following the official guide to create a new, separate environment to avoid conflicts.
Additionally, please install Blender (recommended version: 3.1.2) for Step 4. We strongly recommend running the related scripts (blender_renderer/glb_render.py and blender_renderer/obj_render.py) in the Scripting panel of the Blender executable. We do not recommend using the standalone Blender Python API (bpy) due to potential version-mismatch issues.
Step 1: Image-to-3D Generation using TRELLIS
Generate high-quality textured meshes from a single RGB image for use as object-level interactive scene nodes.
First, capture an RGB image of the target object. The object should be centered in the image and should not be too small (it should cover more than 50% of the pixels). The object does not need to be photographed inside the simulation scene; just keep the background as clean as possible (to simplify instance segmentation) and make sure the ambient lighting is white, uniform, and bright.
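If your capture setup cannot guarantee a clean background, an off-the-shelf matting tool can help. The snippet below is a minimal sketch using the rembg package (our suggestion, not a DISCOVERSE dependency) with hypothetical file names:

# Optional: strip the background from the captured photo before image-to-3D generation.
# Sketch only; assumes `pip install rembg` and Pillow are available.
from PIL import Image
from rembg import remove

image = Image.open("object_photo.jpg")        # your captured RGB image (hypothetical name)
cutout = remove(image)                        # RGBA image with the background removed
cutout.save("object_photo_clean.png")         # feed this PNG to the image-to-3D model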
Then, use state-of-the-art image-to-3D generation methods to reconstruct textured meshes from the captured RGB images.
- TRELLIS is the latest open-source, state-of-the-art 3D generative model and can produce high-quality textured meshes, 3DGS, or radiance fields. We recommend setting up a new environment for TRELLIS and following the official guide to run the image-to-3D generation pipeline (see the sketch below). We recommend saving textured meshes in .glb format for compatibility with the subsequent lighting estimation, Blender relighting, and Mesh2GS steps. Note: for a quick setup, if you do not need to align object appearance with the background, you can directly generate 3DGS (.ply) assets for DISCOVERSE and skip Steps 3-5.
- For higher-quality 3D generation results, we recommend commercial services such as Deemos Rodin (CLAY), Meshy, or TRIPO; they all offer free trials.
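For reference, a minimal image-to-3D run with TRELLIS might look like the following. This is a sketch adapted from the usage example in the official TRELLIS repository; verify the exact class and argument names against the official guide.

# Minimal TRELLIS image-to-3D sketch (run inside the TRELLIS environment).
from PIL import Image
from trellis.pipelines import TrellisImageTo3DPipeline
from trellis.utils import postprocessing_utils

pipeline = TrellisImageTo3DPipeline.from_pretrained("JeffreyXiang/TRELLIS-image-large")
pipeline.cuda()

image = Image.open("object_photo_clean.png")   # the captured (and cleaned) RGB image
outputs = pipeline.run(image)                  # yields 'gaussian', 'radiance_field', 'mesh'

# Export a textured mesh as .glb for Steps 3-5 ...
glb = postprocessing_utils.to_glb(outputs['gaussian'][0], outputs['mesh'][0])
glb.export("object.glb")

# ... or export 3DGS (.ply) directly if you skip appearance alignment (Steps 3-5).
outputs['gaussian'][0].save_ply("object.ply")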
Step 2: 3D Scene Reconstruction
Reconstruct background nodes as 3DGS scenes using scanners or multi-view RGB images.
We recommend using the LixelKity K1 scanner together with Lixel CyberColor to generate high-quality 3DGS scenes for use as background nodes. If you do not have a scanner, you can instead reconstruct the scene from multi-view RGB images with the original 3DGS pipeline.
Step 3: Lighting Estimation using DiffusionLight
Estimate HDR environment lighting maps from a single RGB image to prepare for Step 4 (aligning object appearance with background nodes).
Note: If you do not need to align object appearance with the background, you can download any .exr-format HDR environment map from Poly Haven, skip the following process, and proceed directly to Step 4.
Pre-trained Weights for Huggingface Models
First, prepare the input images. Capture an RGB image for each target background and resize it to 1024x1024. We recommend cropping so that the image retains as much of the background as possible; alternatively, you can reach this resolution by padding the image with black borders.
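As an illustration, the 1024x1024 preparation could be scripted as follows. This is only a sketch using Pillow, with hypothetical file names; it is not part of DiffusionLight itself.

# Pad a background photo to a square canvas with black borders, then resize to 1024x1024.
from PIL import Image

img = Image.open("background_photo.jpg")                     # hypothetical input name
side = max(img.size)
canvas = Image.new("RGB", (side, side), (0, 0, 0))           # black padding
canvas.paste(img, ((side - img.width) // 2, (side - img.height) // 2))
canvas.resize((1024, 1024), Image.LANCZOS).save("YourInputPath/background_0.png")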
Place all processed images in a single folder, set the absolute path of that folder as YourInputPath, and specify YourOutputPath as the folder in which to save results. Then run the following commands:
cd DiffusionLight
python inpaint.py --dataset YourInputPath --output_dir YourOutputPath
python ball2envmap.py --ball_dir YourOutputPath/square --envmap_dir YourOutputPath/envmap
python exposure2hdr.py --input_dir YourOutputPath/envmap --output_dir YourOutputPath/hdr
The final .exr results (saved in YourOutputPath/hdr/) will be used for the subsequent Blender PBR rendering.
Step 4: Physics-Based Relighting using Blender
Render the target object meshes into multi-view images by uniformly sampling cameras on a sphere and using Blender (bpy) together with custom HDR environment maps (which simulate distant lighting) for physics-based (pre-)relighting; the rendered images are then used for 3DGS optimization.
Please note that this is not true PBR functionality; it simply bakes lighting effects into the SH appearance of 3DGS to simulate the color tone of background scenes.
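To make the camera setup concrete, the snippet below sketches one common way to distribute viewpoints near-uniformly on a sphere (a Fibonacci lattice). It only illustrates the sampling idea and is not the exact scheme used by the rendering scripts.

# Sketch: near-uniform camera positions on a sphere via a Fibonacci lattice.
import numpy as np

def fibonacci_sphere(n_views: int, radius: float = 2.0) -> np.ndarray:
    i = np.arange(n_views)
    phi = np.arccos(1.0 - 2.0 * (i + 0.5) / n_views)   # polar angle
    theta = np.pi * (1.0 + 5.0 ** 0.5) * i              # golden-angle azimuth
    return radius * np.stack([np.sin(phi) * np.cos(theta),
                              np.sin(phi) * np.sin(theta),
                              np.cos(phi)], axis=-1)

cam_positions = fibonacci_sphere(100)                    # 100 cameras, all looking at the origin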
Prepare .exr HDR Environment Maps
Organize all HDR images to be used for (pre-)PBR into one folder, for example:
YourHDRPath
├── hdr_name_0.exr
├── hdr_name_1.exr
├── hdr_name_2.exr
...
└── hdr_name_n.exr
Render 3D Mesh Assets
For .glb Assets (e.g., Objaverse / Rodin Assets)
We strongly recommend using 3D mesh assets in .glb format, similar to Objaverse. All .glb assets to be converted should be placed in the same folder, for example:
YourInputPath
├── model_or_part_name_0.glb
├── model_or_part_name_1.glb
├── model_or_part_name_2.glb
...
└── model_or_part_name_n.glb
Then, paste and run the blender_renderer/glb_render.py script in the Scripting panel of the Blender executable with the following parameters:
--root_in_path YourInputPath
--root_hdr_path YourHDRPath
--root_out_path YourOutputPath
The rendering results will be saved in YourOutputPath, where each folder (named {hdr_name_i}_{model_or_part_name_i}) contains the rendering results of one 3D model under a given lighting condition: RGB images, depth maps, camera parameters, and the .obj geometry file of that model.
If the rendering quality is unsatisfactory, you can tune the following parameters:
- lit_strength: environment lighting intensity; higher values produce brighter renderings.
- lens: camera focal length. If the object appears too small in the rendering (too many pixels are wasted), increase this value; if only part of the object is visible, decrease it.
For .obj Assets (e.g., Robot Models)
If you are working with .obj-format assets, such as robot models, each model typically contains multiple texture and material maps. We recommend organizing each model's data into a separate folder, as follows:
YourInputPath
├── model_or_part_name_0
│ └── ...
├── model_or_part_name_1
│ ├── obj_name_1.obj
│ ├── mtl_name_1.mtl
│ ├── tex_name_1.png
│ └── ...
├── model_or_part_name_2
...
└── model_or_part_name_n
Robot models developed by DISCOVER LAB, including MMK2, AirBot, DJI, RM2, etc., can be obtained through this link (extraction code: 94po).
Then, paste and run the blender_renderer/obj_render.py script in the Scripting panel of the Blender executable with the following parameters:
--root_in_path YourInputPath
--root_hdr_path YourHDRPath
--root_out_path YourOutputPath
This script uses the same parameters as blender_renderer/glb_render.py.
Convert the camera parameters generated by Blender rendering to the format required by COLMAP by running:
cd blender_renderer
python models2colmap.py --root_path YourOutputPath
Ensure that when running obj_render.py / glb_render.py and models2colmap.py, the camera intrinsics (i.e., --resolution, --lens, --sensor_size) remain strictly consistent.
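For context, the core of such a conversion is a change of camera convention: Blender cameras look along -Z with +Y up, whereas COLMAP expects +Z forward with +Y down and stores world-to-camera rotations as (qw, qx, qy, qz) quaternions. The sketch below illustrates this convention change under those assumptions; it is not the actual models2colmap.py code.

# Sketch: Blender camera-to-world matrix -> COLMAP world-to-camera pose.
import numpy as np

def blender_c2w_to_colmap_w2c(c2w_blender: np.ndarray):
    flip = np.diag([1.0, -1.0, -1.0, 1.0])      # flip the camera's Y and Z axes
    w2c = np.linalg.inv(c2w_blender @ flip)     # world-to-camera in COLMAP convention
    R, t = w2c[:3, :3], w2c[:3, 3]
    qw = np.sqrt(max(0.0, 1.0 + np.trace(R))) / 2.0   # assumes qw != 0; handle that case properly in real code
    qx = (R[2, 1] - R[1, 2]) / (4.0 * qw)
    qy = (R[0, 2] - R[2, 0]) / (4.0 * qw)
    qz = (R[1, 0] - R[0, 1]) / (4.0 * qw)
    return np.array([qw, qx, qy, qz]), t        # qvec, tvec for a COLMAP images.txt entry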
Step 5: Mesh2GS
Convert textured meshes to 3DGS.
Run Mesh2GS for each 3D asset individually:
cd LitMesh2GS
python train.py -s YourOutputPath/model_or_part_name_i -m YourOutputPath/model_or_part_name_i/mesh2gs --data_device cuda --densify_grad_threshold 0.0002 -r 1
The 3DGS results converted from each 3D asset will be saved in a new mesh2gs folder under YourOutputPath/model_or_part_name_i.
Since 3DGS is inherently memory-inefficient, we recommend roughly controlling the number of generated 3DGS points by adjusting --densification_interval: the larger this value, the sparser the resulting 3DGS scene and the lower the memory footprint.
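If you have many assets, the per-asset command above can be wrapped in a small driver script. The following is only a sketch that assumes the Step 4 output layout, with --densification_interval set to an illustrative value.

# Sketch: run LitMesh2GS over every rendered model folder produced in Step 4.
import subprocess
from pathlib import Path

out_root = Path("YourOutputPath").resolve()              # same YourOutputPath as in Step 4
for model_dir in sorted(p for p in out_root.iterdir() if p.is_dir()):
    subprocess.run([
        "python", "train.py",
        "-s", str(model_dir),
        "-m", str(model_dir / "mesh2gs"),
        "--data_device", "cuda",
        "--densify_grad_threshold", "0.0002",
        "--densification_interval", "500",               # illustrative value; raise it to save memory
        "-r", "1",
    ], cwd="LitMesh2GS", check=True)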