TCFAP-Net: Transformer-based Cross-feature Fusion and Adaptive Perception Network for large-scale point cloud semantic segmentation
Authors: Zhang, J., Jiang, Z., Qiu, Q. and Liu, Z.
Journal: Pattern Recognition
Volume: 154
ISSN: 0031-3203
DOI: 10.1016/j.patcog.2024.110630
Abstract: Point cloud semantic segmentation is a key ingredient in understanding real-world scenes. Most existing approaches perform poorly on scene boundaries and struggle to recognize objects of different scales. In this paper, we propose a novel framework that incorporates the Transformer into the U-Net architecture for inferring pointwise semantics. Specifically, the Transformer-based cross-feature fusion module is designed first to employ geometric and semantic information to learn feature offsets, overcoming the border ambiguity of segmentation results, and then to utilize the Transformer to learn cross-feature enhanced and fused encoder features. Additionally, to strengthen the overall network's structure-to-detail perception capability, the adaptive perception module is designed, which employs cross-attention to adaptively allocate weights to encoder features at varying resolutions, establishing long-range contextual dependencies. Ablation studies validate the individual contributions of our module design choices. Compared with existing competitive methods, our approach achieves state-of-the-art performance and exhibits superior results on benchmarks. Code is available at https://github.com/xiluo-cug/TCFAP-Net.
Source: Scopus
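The adaptive perception idea described in the abstract, using cross-attention so that features at one resolution attend over encoder features at another, can be illustrated with a minimal NumPy sketch. This is not the paper's implementation (see the linked repository for that); the function name, shapes, and the plain scaled dot-product formulation are all illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention_fuse(query_feats, context_feats):
    """Hypothetical cross-attention fusion sketch: points in one
    feature stream (queries) adaptively weight encoder features
    from another resolution (keys/values)."""
    d_k = query_feats.shape[-1]
    # Scaled dot-product attention: Q from one stream, K/V from the other.
    scores = query_feats @ context_feats.T / np.sqrt(d_k)   # (Nq, Nc)
    weights = softmax(scores, axis=-1)                      # rows sum to 1
    return weights @ context_feats                          # (Nq, C)

# Toy example: 8 fine-resolution points attend over 32 coarse points, C = 16.
rng = np.random.default_rng(0)
q = rng.normal(size=(8, 16))
ctx = rng.normal(size=(32, 16))
fused = cross_attention_fuse(q, ctx)
```

Here each output row is a convex combination of the coarse-resolution features, so every query point can draw on context from the entire scene, which is the long-range dependency property the abstract attributes to the adaptive perception module.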