SD-Net: joint surgical gesture recognition and skill assessment

Authors: Zhang, J., Nie, Y., Lyu, Y., Yang, X., Chang, J. and Zhang, J.J.

Journal: International Journal of Computer Assisted Radiology and Surgery

Volume: 16

Issue: 10

Pages: 1675-1682

eISSN: 1861-6429

ISSN: 1861-6410

DOI: 10.1007/s11548-021-02495-x

Abstract:

Purpose: Surgical gesture recognition is an essential task for providing intraoperative context-aware assistance and scheduling clinical resources. However, previous methods are limited in capturing long-range temporal information, and many of them require additional sensors. To address these challenges, we propose a symmetric dilated network, SD-Net, to jointly recognize surgical gestures and assess surgical skill levels using only RGB surgical video sequences. Methods: We use symmetric 1D temporal dilated convolution layers to hierarchically capture gesture cues under different receptive fields, so that features over different time spans can be aggregated. In addition, a self-attention network is inserted in the middle to compute global frame-to-frame relations. Results: We evaluate our method on a robotic suturing task from the JIGSAWS dataset. On gesture recognition, the method outperforms the state of the art by up to ∼6 points in frame-wise accuracy and ∼8 points in F1@50 score. It also maintains 100% prediction accuracy on the skill assessment task under the LOSO validation scheme. Conclusion: The results indicate that our architecture is able to obtain representative surgical video features by extensively considering the spatial, temporal and relational context of the raw video input. Furthermore, the better performance in multi-task learning implies that surgical skill assessment has a complementary effect on the gesture recognition task.
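Illustrative note: the two building blocks named in the Methods can be sketched in a few lines of plain Python. This is only a minimal sketch under stated assumptions, not the authors' implementation. The kernel size of 3, the dilation schedule `[1, 2, 4, 8, 8, 4, 2, 1]`, and the identity query/key/value projections are hypothetical choices made for illustration; the abstract does not specify them.

```python
import math


def receptive_field(kernel_size, dilations):
    """Number of input frames one output frame can see after stacking the layers."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


def dilated_conv1d(x, weights, dilation):
    """Zero-padded 1D dilated convolution (cross-correlation, as in deep
    learning libraries) over a list of floats. Assumes an odd kernel length."""
    half = len(weights) // 2
    out = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(weights):
            idx = t + (k - half) * dilation
            if 0 <= idx < len(x):  # positions outside the sequence count as zero
                acc += w * x[idx]
        out.append(acc)
    return out


def self_attention(frames):
    """Scaled dot-product self-attention over T frame vectors.
    Queries, keys and values are the frames themselves (identity projections,
    an assumption made to keep the sketch short)."""
    dim = len(frames[0])
    out = []
    for q in frames:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in frames]
        top = max(scores)  # subtract the maximum for numerical stability
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        attn = [e / total for e in exps]
        out.append([sum(w * v[j] for w, v in zip(attn, frames)) for j in range(dim)])
    return out


# Hypothetical symmetric schedule: dilation grows, then shrinks again.
dilations = [1, 2, 4, 8, 8, 4, 2, 1]
print(receptive_field(3, dilations))  # 61 frames
```

With these assumed settings the stacked convolutions cover 61 frames per output frame, while the attention step lets every frame weigh every other frame in the sequence, which is the sense in which the abstract speaks of global frame-to-frame relations.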

http://eprints.bournemouth.ac.uk/36142/

Source: Scopus

SD-Net: joint surgical gesture recognition and skill assessment.

Authors: Zhang, J., Nie, Y., Lyu, Y., Yang, X., Chang, J. and Zhang, J.J.

Journal: Int J Comput Assist Radiol Surg

Volume: 16

Issue: 10

Pages: 1675-1682

eISSN: 1861-6429

DOI: 10.1007/s11548-021-02495-x

http://eprints.bournemouth.ac.uk/36142/

Source: PubMed

SD-Net: joint surgical gesture recognition and skill assessment

Authors: Zhang, J., Nie, Y., Lyu, Y., Yang, X., Chang, J. and Zhang, J.J.

Journal: International Journal of Computer Assisted Radiology and Surgery

Volume: 16

Issue: 10

Pages: 1675-1682

eISSN: 1861-6429

ISSN: 1861-6410

DOI: 10.1007/s11548-021-02495-x

http://eprints.bournemouth.ac.uk/36142/

Source: Web of Science (Lite)

SD-Net: joint surgical gesture recognition and skill assessment.

Authors: Zhang, J., Nie, Y., Lyu, Y., Yang, X., Chang, J. and Zhang, J.J.

Journal: International journal of computer assisted radiology and surgery

Volume: 16

Issue: 10

Pages: 1675-1682

eISSN: 1861-6429

ISSN: 1861-6410

DOI: 10.1007/s11548-021-02495-x

http://eprints.bournemouth.ac.uk/36142/

Source: Europe PubMed Central

SD-Net: joint surgical gesture recognition and skill assessment.

Authors: Zhang, J., Nie, Y., Lyu, Y., Yang, X., Chang, J. and Zhang, J.J.

Journal: International Journal of Computer Assisted Radiology and Surgery

Volume: 16

Pages: 1675-1682

ISSN: 1861-6410

http://eprints.bournemouth.ac.uk/36142/

Source: BURO EPrints