Empower dynamic scene understanding through scene flow estimation and object segmentation
Authors: Li, Z.
Institution: Bournemouth University, Faculty of Media and Communication
Abstract: Understanding dynamic 3D scenes—critical for applications like autonomous navigation and mixed reality—requires parsing both motion (scene flow) and object interactions (segmentation). Scene flow captures 3D motion fields, while segmentation isolates objects, enabling systems to interpret evolving environments. Integrating these tasks offers a holistic view but faces computational challenges due to scene flow's high dimensionality.
This work proposes a lightweight deep learning architecture combining an enhanced Point Transformer for efficient feature extraction and a point-voxel correlation module for stable motion estimation.
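As a rough illustration of the point-voxel correlation idea (a hedged sketch, not the thesis implementation; all function names, shapes, grid resolution, and the voxel-averaging scheme below are assumptions), one can scatter second-frame features into a coarse voxel grid and correlate each first-frame point with the feature of the voxel its current flow estimate lands in:

```python
# Illustrative sketch only: a minimal point-voxel correlation in PyTorch.
# Shapes, grid size, and names are assumed, not taken from the thesis.
import torch

def voxelize_mean(points, feats, grid_size, vmin, vmax):
    """Scatter point features into a dense voxel grid by averaging.
    points: (N, 3) in [vmin, vmax], feats: (N, C) -> grid: (G, G, G, C)."""
    G, C = grid_size, feats.shape[1]
    idx = ((points - vmin) / (vmax - vmin) * G).long().clamp(0, G - 1)
    flat = idx[:, 0] * G * G + idx[:, 1] * G + idx[:, 2]
    grid = torch.zeros(G * G * G, C)
    count = torch.zeros(G * G * G, 1)
    grid.index_add_(0, flat, feats)                   # sum features per voxel
    count.index_add_(0, flat, torch.ones(len(points), 1))
    return (grid / count.clamp(min=1)).view(G, G, G, C)

def point_voxel_correlation(p1, f1, flow, grid, vmin, vmax):
    """Correlate each frame-1 feature with the voxel its warped point hits.
    p1: (N, 3), f1: (N, C), flow: (N, 3) current flow estimate -> (N, 1)."""
    G = grid.shape[0]
    idx = ((p1 + flow - vmin) / (vmax - vmin) * G).long().clamp(0, G - 1)
    f2 = grid[idx[:, 0], idx[:, 1], idx[:, 2]]        # (N, C) voxel features
    return (f1 * f2).sum(dim=1, keepdim=True)         # dot-product correlation

# Toy usage on random frames; a real model would iteratively refine the flow.
N, C, G = 1024, 32, 16
p1, p2 = torch.rand(N, 3) * 2 - 1, torch.rand(N, 3) * 2 - 1
f1, f2 = torch.randn(N, C), torch.randn(N, C)
grid = voxelize_mean(p2, f2, G, -1.0, 1.0)
corr = point_voxel_correlation(p1, f1, torch.zeros(N, 3), grid, -1.0, 1.0)
print(corr.shape)  # torch.Size([1024, 1])
```

Voxelizing one frame makes each per-point lookup O(1), which is one plausible reading of how such a correlation module stays lightweight and numerically stable.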
To bypass labor-intensive object annotations, scene flow is leveraged as auxiliary supervision. Instead of predicting masks for all points, this thesis focuses on key points, reducing complexity while maintaining accuracy. The proposed clustering-free approach achieves state-of-the-art results on indoor datasets.
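One way such auxiliary supervision could work (a hedged sketch; the loss below is an assumption, not the thesis objective) is to require that points grouped under the same predicted mask move coherently, penalising the flow variance inside each mask:

```python
# Hedged sketch: scene flow as auxiliary supervision for mask prediction.
# Assumption, not the thesis loss: points under one mask should share motion.
import torch

def flow_consistency_loss(masks, flow, eps=1e-6):
    """masks: (K, N) soft assignment of N points to K key-point masks,
    flow: (N, 3) estimated scene flow -> scalar loss."""
    w = masks / (masks.sum(dim=1, keepdim=True) + eps)    # normalise per mask
    mean_flow = w @ flow                                   # (K, 3) per-mask mean motion
    diff = flow.unsqueeze(0) - mean_flow.unsqueeze(1)      # (K, N, 3) deviations
    var = (w.unsqueeze(-1) * diff.pow(2)).sum(dim=(1, 2))  # (K,) weighted variance
    return var.mean()

# Toy usage: 4 hypothetical masks over 1024 points.
logits = torch.randn(4, 1024, requires_grad=True)
masks = torch.softmax(logits, dim=0)                       # each point split across masks
loss = flow_consistency_loss(masks, torch.randn(1024, 3))
loss.backward()                                            # gradients reach the mask head
print(loss.item())
```

Because the training signal comes from estimated motion rather than labels, an objective of this kind needs neither object annotations nor a separate clustering step.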
For temporal consistency, an unsupervised method integrates continuous point cloud sequences (encoding spatial embeddings) with time-independent queries (encoding object semantics). This enables gradual mask prediction across frames without direct labels, accommodating dynamic inputs. This framework advances dynamic scene understanding by harmonizing motion and segmentation, validated through competitive benchmarks and flexible input handling.
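A minimal sketch of how time-independent queries might be decoded against per-frame point features (layer sizes, names, and the attention design are all illustrative assumptions, not the thesis architecture):

```python
# Hedged sketch: shared object queries attending to per-frame point features.
import torch
import torch.nn as nn

class QueryMaskDecoder(nn.Module):
    def __init__(self, num_queries=8, dim=64):
        super().__init__()
        # One learned query per putative object, shared across all frames.
        self.queries = nn.Parameter(torch.randn(num_queries, dim))
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.proj = nn.Linear(dim, dim)

    def forward(self, frame_feats):
        """frame_feats: (T, N, dim) point features for T frames.
        Returns per-frame mask logits: (T, num_queries, N)."""
        T = frame_feats.shape[0]
        q = self.queries.unsqueeze(0).expand(T, -1, -1)  # same queries every frame
        obj, _ = self.attn(q, frame_feats, frame_feats)  # (T, K, dim) refined queries
        # Mask logits: dot product between refined queries and point features.
        return torch.einsum('tkd,tnd->tkn', self.proj(obj), frame_feats)

# Toy usage: 3 frames of 512 points with 64-dim features.
decoder = QueryMaskDecoder()
masks = decoder(torch.randn(3, 512, 64))
print(masks.shape)  # torch.Size([3, 8, 512])
```

Because the same learned queries attend to every frame, each query can carry one object identity through time, which is consistent with the abstract's gradual mask prediction across frames.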
https://eprints.bournemouth.ac.uk/41016/
Source: Manual