Spatial–Temporal Joint Network for point cloud completion
Authors: An, L., Zhou, P., Zhou, M., Wang, Y., Geng, G., Shui, W. and Tang, W.
Journal: Displays
Volume: 91
ISSN: 0141-9382
DOI: 10.1016/j.displa.2025.103245
Abstract: Point cloud completion technology plays a vital role in three-dimensional interactive display systems such as virtual reality (VR) and augmented reality (AR), especially in maintaining visual integrity and interaction accuracy in complex environments. However, the significant differences in data generated by different sensors challenge the generalization and performance of algorithms. This paper proposes a Spatial–Temporal Joint Network (STJN) method to enhance and repair incomplete point cloud data caused by limitations of acquisition equipment or environment, improving the generalization and performance across different datasets. We introduce a multi-composite position encoding method, combining local position encoding, local angle encoding, and local feature information, allowing each point to express various angular relationships more fully. This effectively captures the geometric information of point clouds, enhancing the model's perception of local structures and angular relationships. Additionally, we employ a dual-branch adaptive Mamba network in the encoding part. Through adaptive local feature information modules and Mamba global feature information modules, local–local and local–global combined learning is performed to fully extract point cloud features. In the decoding part, we use the Spatial–Temporal Joint Network, alternating the dual-branch adaptive Mamba network with the Neighborhood Cross-Transformer, to further achieve the interaction between local and global point cloud information. Our method is validated on multiple datasets, including the CAD synthetic datasets PCN, Completion3D, ShapeNet-55/34, the real camera multi-view dataset MVP, and the LiDAR-acquired dataset KITTI. Experimental results demonstrate that the proposed method achieves strong generalization and superior completion performance across different types of datasets, highlighting its broad application potential in 3D display and human–computer interaction systems.
Source: Scopus
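
The multi-composite position encoding described in the abstract combines local position and local angle information per point. A minimal sketch of that general idea is shown below; note that the k-NN neighborhood size, the choice of angle (here, the cosine between each neighbor direction and the mean neighbor direction), and the exact feature layout are illustrative assumptions, not the paper's actual implementation:

```python
import numpy as np

def knn_indices(points, k):
    """Indices of the k nearest neighbors for each point (self excluded)."""
    # Pairwise squared distances between all points: (N, N).
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    return np.argsort(d2, axis=1)[:, 1:k + 1]

def composite_encoding(points, k=8):
    """Per-neighbor features: [relative xyz | distance | cos(angle)] (illustrative)."""
    idx = knn_indices(points, k)            # (N, k)
    nbrs = points[idx]                      # neighbor coordinates, (N, k, 3)
    rel = nbrs - points[:, None, :]         # local position encoding, (N, k, 3)
    dist = np.linalg.norm(rel, axis=-1, keepdims=True)        # (N, k, 1)
    unit = rel / np.clip(dist, 1e-8, None)                    # unit directions
    # Local angle encoding (assumed form): cosine between each neighbor
    # direction and the normalized mean neighbor direction.
    mean_dir = unit.mean(axis=1, keepdims=True)
    mean_dir /= np.clip(np.linalg.norm(mean_dir, axis=-1, keepdims=True),
                        1e-8, None)
    cos_ang = (unit * mean_dir).sum(-1, keepdims=True)        # (N, k, 1)
    return np.concatenate([rel, dist, cos_ang], axis=-1)      # (N, k, 5)

pts = np.random.default_rng(0).random((32, 3))
feat = composite_encoding(pts, k=8)
print(feat.shape)  # (32, 8, 5)
```

In practice such per-neighbor geometric features would be fed through learned layers and fused with the local feature information the abstract mentions; this sketch only covers the geometric encoding step.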