Dr Yan Gong
- ygong at bournemouth dot ac dot uk
- http://orcid.org/0000-0003-2853-2108
- Lecturer in Computer Science
- P302, Poole House
Biography
Yan Gong is a Lecturer in Computer Science at Bournemouth University. He received his PhD in Computer Science from Loughborough University in 2023, specialising in NLP, Cross-modal Learning, Generative AI, and AI Agents. During his PhD, he also worked as a part-time research assistant in AI and NLP at Loughborough University. Before his PhD, Dr Gong spent over six years in industry as a lead AI engineer, working on practical AI applications. He also holds a Master's degree in Communications and Signal Processing with distinction from Newcastle University, awarded in 2012. Dr Gong focuses on applying his research to real-world problems and actively seeks partnerships with industry stakeholders to bring his work to market.
Research
RESEARCH EXPERTISE
Natural Language Processing • Cross-modal Learning • Generative AI • Agentic AI • Embodied AI • Semantic Communication
Yan’s research investigates how artificial intelligence can understand, generate, and communicate information across different modalities, including text, images, video, audio, and embodied interaction. His work places particular emphasis on cross-modal learning, generative models, and intelligent agents, with the aim of developing AI systems that are more context-aware, adaptive, interpretable, and capable of supporting real-world applications.
PhD SUPERVISION
Yan welcomes enquiries from prospective PhD candidates interested in artificial intelligence, multimodal learning, generative AI, agentic systems, embodied AI, and semantic communication.
SUPERVISION INTERESTS
• Natural Language Processing and Generative AI: Research on language understanding, text generation, large language models, prompt-based learning, retrieval-augmented generation, and the responsible use of generative AI in practical applications.
• Cross-modal and Multimodal Learning: Methods for connecting and reasoning across text, image, video, and audio data, including cross-modal retrieval, vision-language models, multimodal representation learning, and multimodal reasoning.
• Agentic AI and Intelligent Systems: AI agents that can plan, reason, interact with tools, collaborate with users, and support decision-making in complex real-world environments.
• Embodied AI and Human-AI Interaction: Research exploring how AI systems can perceive, reason, and act in physical or simulated environments, including robotics, embodied agents, and human-centred AI interaction.
• Semantic Communication: AI-driven approaches to representing, transmitting, and reconstructing meaning rather than raw data, particularly in multimodal and resource-constrained communication scenarios.
Expertise related to UN Sustainable Development Goals
In 2015, UN member states agreed to 17 global Sustainable Development Goals (SDGs) to end poverty, protect the planet and ensure prosperity for all. This person's work contributes towards the following SDGs:
Quality education
"Ensure inclusive and equitable quality education and promote lifelong learning opportunities for all"
Decent work and economic growth
"Promote sustained, inclusive and sustainable economic growth, full and productive employment and decent work for all"
Industry, innovation and infrastructure
"Build resilient infrastructure, promote inclusive and sustainable industrialization and foster innovation"
Partnership for the Goals
"Strengthen the means of implementation and revitalize the Global Partnership for Sustainable Development"
Journal Articles
- Hou, J., Tan, Z., Hu, Q., Wang, P., Gong, Y., 2026. Multimodal hierarchical classification using cascade-of-thought. Information Processing and Management, 63 (3).
- Gong, Y., Chu, Z., Zhu, Z., Xiao, P., Zeng, M., Wang, Y., Pandey, H., Hou, J., 2026. Wireless Vision-Centered Semantic Communication for Smart City Environment: Pretrained Network and Quantization. IEEE Transactions on Consumer Electronics, 72 (1), 2383-2398.
- Hu, Q., Li, X., Hou, J., Wang, P., Gong, Y., 2026. Predicting Government Microblog Comment Popularity: Insights From Diffusion of Innovations. IEEE Transactions on Computational Social Systems, 1-15.
- Hou, J., Tan, Z., Hu, Q., Wang, P., Gong, Y., 2025. Multimodal hierarchical classification using cascade-of-thought. Information Processing & Management.
- Pandey, H.M., Gupta, A., Sarkar, S., Tomer, M., Johannes, S., Gong, Y., 2025. GEMMA-SQL: A Novel Text-to-SQL Model Based on Large Language Models. Applied Artificial Intelligence, 39 (1).
- Gong, Y., Cosma, G., Finke, A., 2024. VITR: Augmenting Vision Transformers with Relation-Focused Learning for Cross-modal Information Retrieval. ACM Transactions on Knowledge Discovery from Data, 18 (9).
- Gong, Y., Cosma, G., 2023. Improving visual-semantic embeddings by learning semantically-enhanced hard negatives for cross-modal information retrieval. Pattern Recognition, 137.
- Gong, Y., Cosma, G., Fang, H., 2021. On the limitations of visual-semantic embedding networks for image-to-text information retrieval. Journal of Imaging, 7 (8).
Conferences
- Gong, Y., Cosma, G., 2023. Boon: A Neural Search Engine for Cross-Modal Information Retrieval. MMIR 2023: Proceedings of the 1st International Workshop on Deep Multimodal Learning for Information Retrieval, Co-located with MM 2023, 29-37.
- Gong, Y., Cosma, G., Finke, A., 2023. Neural-Based Cross-Modal Search and Retrieval of Artwork. 2023 IEEE Symposium Series on Computational Intelligence (SSCI 2023), 264-269.
Preprints
- Pandey, H.M., Gupta, A., Sarkar, S., Tomer, M., Johannes, S., Gong, Y., 2025. GEMMA-SQL: A Novel Text-to-SQL Model Based on Large Language Models.
- Gong, Y., Cosma, G., 2023. Boon: A Neural Search Engine for Cross-Modal Information Retrieval.
- Gong, Y., Cosma, G., Finke, A., 2023. Neural-based Cross-modal Search and Retrieval of Artwork.
- Gong, Y., Cosma, G., Finke, A., 2023. VITR: Augmenting Vision Transformers with Relation-Focused Learning for Cross-Modal Information Retrieval.
- Gong, Y., Cosma, G., 2022. Improving Visual-Semantic Embeddings by Learning Semantically-Enhanced Hard Negatives for Cross-modal Information Retrieval.
Profile of Teaching PG
- L4 Introduction to Business Analytics; L5 Software Engineering; and L7 Industrial Skills and Professional Issues.
External Responsibilities
- Pattern Recognition (Elsevier), Reviewer
- ACM MM 23 Conference, Reviewer
- IEEE Journal of Biomedical and Health Informatics, Guest Editor
Journal Reviewing/Refereeing
- Neural Networks, 27 Mar 2025
- Knowledge and Information Systems, 21 Feb 2025
- Neural Networks, 20 Feb 2025
- Pattern Recognition, Anonymous peer review, 12 Feb 2025
- AI Communications, 10 Sep 2024
- Pattern Recognition, 01 Feb 2024
Qualifications
- PhD in Computer Science (Loughborough University, 2023)