A contextual fusion pyramid network for semantic segmentation of scene point clouds

Authors: Tian, H., Jiang, Z., Zhang, J., Liu, Z.

Journal: Multimedia Systems

Publication Date: 01/02/2026

Volume: 32

Issue: 1

eISSN: 1432-1882

ISSN: 0942-4962

DOI: 10.1007/s00530-025-02118-4

Abstract:

Point cloud semantic segmentation is critical to 3D scene understanding and analysis. The U-Net architecture has been successfully employed in many point cloud semantic segmentation approaches. However, the contextual semantics and correlations within the U-Net feature pyramid have received little attention and remain underexplored. We propose ConFusion-Net, a context-aware framework that incorporates contextual learning into U-Net to improve semantic segmentation of scene point clouds. The network consists of three major modules: (1) an encoder that incorporates bilateral feature aggregation into mixed residual expansion, learning feature representations that compensate across the geometric and semantic domains; (2) a contextual fusion module that integrates feature information at varying scales to strengthen the network's structure-to-detail perception; and (3) a context-aware decoder that uses cross-attention and contextual connections to bridge the semantic gap between the encoder and decoder. Ablation studies verify the contribution of each module design choice. Extensive experiments on several benchmarks show that the proposed method achieves state-of-the-art performance, outperforming competing methods.
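The abstract does not give the formulation of the decoder's cross-attention. For reference only, here is a minimal NumPy sketch of generic scaled dot-product cross-attention across a U-Net skip level: decoder point features act as queries, encoder features at the same stage act as keys and values, and the attended context is concatenated with the decoder features as a stand-in for a "contextual connection". This is not ConFusion-Net's published module. The function names (`cross_attention`, `fuse_skip`), the random projection matrices, the feature sizes, and the concatenation step are all hypothetical assumptions.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the per-row max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(dec_feats, enc_feats, w_q, w_k, w_v):
    """Generic scaled dot-product cross-attention (hypothetical sketch).

    dec_feats: (N_dec, C) decoder point features, used as queries.
    enc_feats: (N_enc, C) encoder point features at the skip level (keys/values).
    w_q, w_k, w_v: (C, D) projection matrices (hypothetical, random here).
    Returns the (N_dec, D) attended context and the (N_dec, N_enc) weights.
    """
    q = dec_feats @ w_q
    k = enc_feats @ w_k
    v = enc_feats @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])  # (N_dec, N_enc)
    attn = softmax(scores, axis=-1)          # each row sums to 1
    return attn @ v, attn


def fuse_skip(dec_feats, enc_feats, w_q, w_k, w_v):
    # Hypothetical "contextual connection": concatenate decoder features with
    # the encoder context they attended to, instead of a plain U-Net copy.
    ctx, _ = cross_attention(dec_feats, enc_feats, w_q, w_k, w_v)
    return np.concatenate([dec_feats, ctx], axis=-1)


rng = np.random.default_rng(0)
C, D = 32, 32
dec = rng.standard_normal((64, C))   # e.g. 64 upsampled decoder points
enc = rng.standard_normal((256, C))  # e.g. 256 encoder points at the same stage
w_q, w_k, w_v = (rng.standard_normal((C, D)) * C ** -0.5 for _ in range(3))

fused = fuse_skip(dec, enc, w_q, w_k, w_v)
print(fused.shape)  # (64, 64)
```

Unlike a plain U-Net skip, which copies encoder features to the decoder point by point, attention lets each decoder point weight every encoder point at that stage. That is one common way to reduce the encoder/decoder semantic gap the abstract refers to; the paper's actual design may differ.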

Source: Scopus