This is the official PyTorch implementation of the paper "Learning Human Motion Representations: A Unified Perspective".
| Task                              | Document         |
| --------------------------------- | ---------------- |
| Pretrain                          | docs/pretrain.md |
| 3D human pose estimation          | docs/pose3d.md   |
| Skeleton-based action recognition | docs/action.md   |
| Mesh recovery                     | docs/mesh.md     |
Using MotionBERT for human-centric video representations
> **Hints**
>
> 1. The model can handle different input lengths (up to 243 frames), so there is no need to specify the input length explicitly.
> 2. The model uses 17 body keypoints (H36M format). If your data uses a different format, please convert it before feeding it to MotionBERT.
> 3. Please refer to `model_action.py` and `model_mesh.py` for examples of (easily) adapting MotionBERT to different downstream tasks.
> 4. For RGB videos, you need to extract 2D poses (`inference.md`), convert the keypoint format (`dataset_wild.py`), and then feed the result to MotionBERT (`infer_wild.py`).
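The input contract from the hints above can be sketched as follows. The backbone class here is a hypothetical stand-in (the actual model and loading utilities live in the repo); the point is the expected tensor layout of `(batch, frames, 17, 3)` with at most 243 frames:

```python
import torch

class DummyBackbone(torch.nn.Module):
    """Placeholder for a loaded MotionBERT backbone (not the real model)."""

    def forward(self, x):
        # x: (batch, frames, 17, 3) -- per-frame 2D keypoints (x, y, confidence)
        # in H36M format; the frame count may vary but must not exceed 243.
        assert x.shape[1] <= 243 and x.shape[2] == 17
        # Real MotionBERT returns learned motion representations; here we
        # just flatten the keypoint dimensions as a stand-in.
        return x.flatten(2)

model = DummyBackbone()
keypoints_2d = torch.randn(1, 100, 17, 3)  # a 100-frame clip, batch size 1
features = model(keypoints_2d)
print(features.shape)  # torch.Size([1, 100, 51])
```

Clips of different lengths can be passed without any configuration change, as long as they stay within the 243-frame limit.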
| Model                           | Download Link | Config                        | Performance      |
| ------------------------------- | ------------- | ----------------------------- | ---------------- |
| MotionBERT (162MB)              | HuggingFace   | pretrain/MB_pretrain.yaml     | -                |
| MotionBERT-Lite (61MB)          | HuggingFace   | pretrain/MB_lite.yaml         | -                |
| 3D Pose (H36M-SH, scratch)      | HuggingFace   | pose3d/MB_train_h36m.yaml     | 39.2mm (MPJPE)   |
| 3D Pose (H36M-SH, ft)           | HuggingFace   | pose3d/MB_ft_h36m.yaml        | 37.2mm (MPJPE)   |
| Action Recognition (x-sub, ft)  | HuggingFace   | action/MB_ft_NTU60_xsub.yaml  | 97.2% (Top1 Acc) |
| Action Recognition (x-view, ft) | HuggingFace   | action/MB_ft_NTU60_xview.yaml | 93.0% (Top1 Acc) |
| Mesh (with 3DPW, ft)            | HuggingFace   | mesh/MB_ft_pw3d.yaml          | 88.1mm (MPVE)    |
In most use cases (especially with finetuning), `MotionBERT-Lite` delivers similar performance with lower computational overhead.
If you find our work useful for your project, please consider citing the paper: