2026 International Conference on Machine Vision, Detection and 3D Imaging Technology (MVDIT 2026)

KEYNOTE SPEAKERS 1

Prof. Lyu Ke, University of Chinese Academy of Sciences, China

Biography: Lyu Ke is a Distinguished Professor and Doctoral Supervisor at the University of Chinese Academy of Sciences. He is a leading figure of the National High-Level Talent Special Support Program ("Ten Thousand Talent Program"), a Young and Middle-Aged Science and Technology Innovation Leader under the Innovation Talent Promotion Program of the Ministry of Science and Technology, and a Distinguished Professor of the Beijing Higher Education Institution High-Level Talent Introduction and Cultivation Program. He also serves as a Dual-Appointed Professor at Peng Cheng National Laboratory. He is the Principal Investigator of a Group (Type A) project under the National Natural Science Foundation of China (NSFC) Innovation Research Group Program, the Chief Scientist of a special project under the National Key Research and Development Program ("Basic Scientific Research Conditions and Major Scientific Instrument and Equipment Development"), and a recipient of the State Council Special Government Allowance. His main research interests include image processing and intelligent information processing technologies. He has led over 30 research projects, including those funded by the NSFC, the National Key Research and Development Program, the Chinese Academy of Sciences' Instrument and Equipment Program, and the Beijing Municipal Education Commission. He has published over 150 papers in leading domestic and international academic journals and conferences, and has edited two books. 
His research achievements have earned him the Second Prize of the National Science and Technology Progress Award (2004 and 2009), the Second Prize of Beijing Municipal Science and Technology Award (2012), the Second Prize of the China Electronic Information Science and Technology Award by the Chinese Institute of Electronics (2012), the Chinese Academy of Sciences (Beijing Area) Technology Transfer Award (2017), and the Team Silver Award in the First National Postdoctoral Innovation and Entrepreneurship Competition (2021).

Speech Title: Key Technologies and Applications of Ultrasonic Microscope Development

Abstract: With the rapid advancement of intelligent manufacturing in China, the demand for intelligent detection equipment is increasing. As a core piece of equipment in precision inspection, the ultrasonic microscope directly affects the quality level of China's manufacturing industry. However, when faced with increasingly complex inspection tasks, domestic ultrasonic microscope products still suffer from a weak technological foundation and low detection precision, which severely constrain the industrial upgrading of China's intelligent manufacturing. To meet the intelligent precision inspection demands of high-end manufacturing, a crucial step in promoting the high-quality development of China's intelligent detection equipment industry is to break through key technologies and develop ultra-high frequency ultrasonic microscopes with domestically produced core components. These key technologies include ultra-high frequency acoustic transducers; ultra-high frequency pulse reception, acquisition, and preprocessing; high-speed precision scanning of complex surface profiles; visual 3D imaging inspection; and mechanical property evaluation of multilayer heterogeneous materials. This report will introduce the key technologies and research progress in the development of ultrasonic microscope equipment.

KEYNOTE SPEAKERS 2

Prof. Li Xi, Zhejiang University, China

Biography: Li Xi, PhD, is a Qiushi Distinguished Professor at Zhejiang University. He is a Fellow of IAPR, IET, and AAIA, a Member of the National Academy of Artificial Intelligence (NAAI, USA), an IEEE Senior Member, and a Distinguished Member of CCF and CSIG. He is a recipient of the National Outstanding Youth Science Fund and the National Young Distinguished Expert title, has been listed among the World's Top 2% Scientists (in both the Lifetime Scientific Influence Ranking and the Annual Scientific Influence Ranking), and was named a 2023 "Highly Cited Chinese Researcher" by Elsevier. He is the Chief Scientist of the Major Project on New-Generation Artificial Intelligence under the Science and Technology Innovation 2030 Program of the Ministry of Science and Technology, and the Principal Investigator of the National Natural Science Foundation Key Project, the Ministry of Education Key Program Research Project, the KJW Key Basic Research Project, the National Natural Science Foundation General Project, the Zhejiang Provincial Natural Science Foundation Major Project, and the Ningbo "Innovation & Entrepreneurship Yongjiang 2035" Key R&D Program Project. He is also a recipient of the Zhejiang Provincial Outstanding Youth Science Fund, a Zhejiang Provincial Distinguished Expert, a Hangzhou Qianjiang Distinguished Expert, and a Member of the Second Level of the Zhejiang Provincial 151 Talent Cultivation Project, and serves as a Distinguished Expert of the Expert Committee of the China Center for Information and Electronic Technology Development Strategy Research. His main research and development work focuses on computer vision, pattern recognition, and machine learning.

Speech Title: Image and Video World Model Generation Based on Multimodal Representation

Abstract: Image and video generation are currently active and challenging topics in artificial intelligence, particularly as an underlying approach to building interactive world models. This report focuses on data-driven AI learning methods and provides an in-depth analysis from multiple perspectives, including efficient multimodal generation, understanding, and representation. It systematically reviews the developmental stages of multimodal feature representation and learning, and presents representative research and practical applications that our team has carried out in recent years on feature-learning-based visual semantic analysis, understanding, and generation. Special attention is given to the potential application of these technologies in building real-time interactive world simulators driven by video generation. A world simulator aims to simulate the evolution of the real world by generating dynamic video sequences that conform to physical laws, thereby providing a foundation for decision-making, simulation, and content creation. This imposes core requirements on the efficiency, controllability, temporal consistency, and physical plausibility of the generative model. Finally, the report will discuss open problems and challenges in multimodal visual generation and understanding.

KEYNOTE SPEAKERS 3

Prof. Min Liu, Hunan University, China

Biography: Min Liu is a Second-Level Professor at Hunan University and Secretary of the Party Committee of the College of Artificial Intelligence and Robotics. He is a recipient of the National Science Fund for Distinguished Young Scholars, a Young Chang Jiang Scholar of the Ministry of Education, and Chief Scientist of the National Key R&D Program of China. He received his bachelor’s degree from Peking University and his Ph.D. from the University of California, Riverside. He also serves as Vice President of the Hunan Association of Automation, Director of the Key Laboratory of Advanced Manufacturing Visual Inspection and Control Technology of the Machinery Industry, Council Member of the China Society of Image and Graphics, and Deputy Director of its Youth Working Committee.

Speech Title: An Initial Study on Embodied Surgical Robots

Abstract: Breakthroughs in the core technologies of high-end medical equipment, such as surgical robots, and their comprehensive transformation and upgrading toward intelligence constitute a major national strategic task oriented toward the frontiers of global science and technology, major national needs, and the protection of people’s lives and health. They also provide decisive assurance and strong support for breaking the technological monopoly of Europe and the United States in high-end digital medical equipment. Current surgical robots lack effective multimodal collaborative perception systems for surgical targets and still impose high operational demands on surgeons, which severely limits their broader use in responding to major national emergencies, such as national defense security incidents, epidemics, and disasters. Embodied intelligence, by establishing a closed-loop interaction mechanism of “perception–cognition–action”, enables surgical robots to understand surgical environments, adapt to complex scenarios, and make intelligent decisions in a manner similar to human surgeons. This represents a key pathway for achieving a leap in their autonomous capabilities. In response to these challenges, this lecture provides an in-depth introduction to the basic principles and key methods of multimodal perception in surgical robots across the preoperative, intraoperative, and postoperative stages. It also presents preliminary progress made by our team in embodied-intelligence-driven autonomous surgical robotic operation, which is intended to support the reduction of medical accidents in China.

KEYNOTE SPEAKERS 4

Prof. Yinjie Lei, Sichuan University, China

Biography: Yinjie Lei is a Professor and Ph.D. Advisor who holds a distinguished chair at Sichuan University and serves as the Party Secretary of the School of Cybersecurity. Professor Lei holds multiple national and provincial talent honors, notably the National Young Talent Program and the Sichuan Distinguished Young Scholars, and leads research in multimodal perception and synthesis. In recent years, he has published over 80 papers in top-tier conferences such as CVPR, ICCV, and ECCV, as well as in renowned journals including IEEE TPAMI, TIP, and TNNLS. He serves as Area Chair (AC) for ACM MM 2024–2026 and as a Senior Program Committee (SPC) member for AAAI 2021–2026.

Speech Title: Multimodal Learning for Visual Understanding and Generation

Abstract: In recent years, the rapid advancement of information technology has positioned multimodal data as a pivotal carrier across scientific research, industry, and daily life. Whereas data from a single source can capture only partial aspects of an entity, the integration of heterogeneous information from multiple sources enables a more comprehensive and nuanced representation of the target object. This report highlights the original contributions of our research group in multimodal learning, systematically presenting findings in tasks such as scene perception, semantic parsing, and visual synthesis. Looking ahead, we further explore the prospective applications of multimodal learning in emerging areas, including embodied intelligence, offering insights to guide technological innovation and industrial development in this rapidly evolving field.
