iEarLM: A Middle-Outer Ear Lesion Recognition and Clinical Diagnostic Report Generation System Based on Artificial Intelligence and Augmented Reality (AIR) Technology

Category: Precision Health & Smart Medical

Exhibitor: NATIONAL KAOHSIUNG UNIVERSITY OF SCIENCE AND TECHNOLOGY

Booth No: N416

Characteristic

iEarLM (hereinafter referred to as the system) combines a digital otoscope, AR smart glasses, and an edge computing platform with AI-based image analysis and clinical report generation modules, forming an intelligent medical assistance system intended for clinical use. During an examination, eardrum images are captured with the digital otoscope and transmitted over a local network to the edge platform for real-time analysis. The system localizes lesions and identifies their contours, and the results are displayed simultaneously on the system interface and on the AR smart glasses. Physicians can therefore view the eardrum images and the AI analysis results directly within their field of view, without shifting attention between the otoscope and an external monitor, which supports an intuitive understanding of lesion location and morphology.
For image analysis, the system uses instance segmentation to identify eardrum lesions, delineating each lesion region and its contour at the pixel level. Existing otoscopic image assistance systems and prior studies typically provide only classification results or bounding-box localization; the proposed method additionally presents lesion shape and spatial extent. Image interpretation thus moves from simple abnormality detection toward a structural understanding of lesions, with the aim of improving diagnostic consistency and interpretation accuracy. For clinical workflow integration, the system converts recognition outputs into structured medical information and uses a large language model (LLM) with a retrieval-augmented generation (RAG) mechanism to automatically generate draft diagnostic reports. The generated content is grounded in the actual recognition results and supplemented by retrieved medical knowledge, which helps keep the draft clinically relevant. Physicians only need to review and refine the draft, so clinical documentation can be completed within the same workflow. This design integrates image interpretation with report writing, reducing administrative burden and improving operational efficiency.
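As an illustration of the step from recognition output to structured information, the following minimal Python sketch summarizes a binary lesion mask and assembles a draft-report prompt from it. It is not the system's implementation: the names LesionFinding, summarize_mask, and build_prompt, the field layout, the label "example_lesion", and the prompt wording are all hypothetical, and the retrieved knowledge snippets are passed in as plain strings rather than fetched from a real retrieval component.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class LesionFinding:
    """Hypothetical structured record for one segmented lesion."""
    label: str
    area_fraction: float              # lesion pixels / all image pixels
    bbox: tuple[int, int, int, int]   # (row_min, col_min, row_max, col_max)
    centroid: tuple[float, float]     # (row, col), mean of lesion pixels


def summarize_mask(mask: list[list[int]], label: str) -> LesionFinding | None:
    """Reduce a binary instance mask to a few structured measurements."""
    pixels = [(r, c)
              for r, row in enumerate(mask)
              for c, value in enumerate(row) if value]
    if not pixels:
        return None  # nothing was segmented
    rows = [r for r, _ in pixels]
    cols = [c for _, c in pixels]
    n = len(pixels)
    total = len(mask) * len(mask[0])
    return LesionFinding(
        label=label,
        area_fraction=n / total,
        bbox=(min(rows), min(cols), max(rows), max(cols)),
        centroid=(sum(rows) / n, sum(cols) / n),
    )


def build_prompt(finding: LesionFinding, snippets: list[str]) -> str:
    """Assemble an LLM prompt from the finding plus retrieved text (assumed format)."""
    context = "\n".join(f"- {s}" for s in snippets)
    return (
        "Draft an otoscopy report for physician review.\n"
        f"Finding: {finding.label}, covering {finding.area_fraction:.1%} of the image, "
        f"bounding box {finding.bbox}, centroid {finding.centroid}.\n"
        f"Reference notes:\n{context}"
    )


# Toy 4x4 mask with an L-shaped region (illustrative data only).
mask = [
    [0, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
finding = summarize_mask(mask, "example_lesion")
prompt = build_prompt(finding, ["placeholder retrieved note"])
```

In the toy mask, the bounding box spans four pixels while the lesion occupies three, which is the kind of shape information a mask carries and a box alone does not.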
The system is deployed on an edge computing architecture, so both image analysis and report generation run within the hospital environment, supporting real-time performance and keeping patient data on site. A functional prototype has been developed that supports the complete workflow of image acquisition, real-time recognition, visualization, and report generation, and further clinical validation in real-world settings is planned. The system can be applied in hospital otolaryngology departments, primary care clinics, telemedicine, and remote screening scenarios. It also has potential for integration with digital otoscope devices and healthcare information systems, and for collaboration with medical equipment manufacturers. The use of clinical data in this study was approved by the Institutional Review Board (IRB) of Chang Gung Medical Foundation (Approval No. 202500083B0C5001), so data acquisition and usage comply with medical ethics and privacy regulations, providing an ethical basis for subsequent clinical deployment.