Authors (including presenting author) :
NG GWY (1), CHEUNG VKL (1), SO SS (1), ZHANG X (2), LIU Y (2), LEUNG ASH (1), SO EHK (1), CHIA NH (1)
Affiliation :
(1) Multi-Disciplinary Simulation and Skills Centre, Queen Elizabeth Hospital, (2) Department of Computing, Hong Kong Polytechnic University
Introduction :
Clinical handover is a fundamental yet critical day-to-day practice throughout the patient journey. Reporting in the sequence of Identity/Introduction, Situation, Background, Assessment, and Recommendation, known as "ISBAR", has been adopted as the structured protocol for clinical handover under the Hospital Authority. In 2020, the Multi-disciplinary Simulation and Skills Centre (MDSSC) and the Hong Kong Polytechnic University (HKPU) co-developed the first computerized program to mimic human-to-human communication patterns in clinical handover through human-machine interaction between medical interns and Artificial Intelligence (AI).
Objectives :
This innovative project aims to i) establish a Grading System Analysis (GSA) of competence in ISBAR reporting and ii) evaluate inter-rater reliability among expert raters with medical or nursing backgrounds.
Methodology :
This is a reliability analysis of the consistency of raters' grading of clinical handover using the ISBAR communication framework. Bias-free assessment quality relied on the "Double-S Triple-A" approach, namely Screening, Standardization of the tool, Adequate rater training, Appropriate rater selection, and Accountability for operation. As part of the new training modality, AI-ISBAR has been embedded in the existing medical intern training held in MDSSC since Spring 2021. For the sample size calculation, a desired 95% confidence interval (CI) width of 0.3 with 2 raters, or 0.25 with 3 raters, requires no fewer than 24 medical interns given an expected Intra-Class Correlation (ICC) of 0.8. A two-way random-effects model with single measures and absolute agreement, i.e. ICC(2,1), was used to examine the extent to which two or more raters gave identical results under the same rating protocol.
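The stated minimum of 24 interns is consistent with Bonett's (2002) approximation for the confidence interval width of an ICC; the short Python sketch below reproduces both scenarios under that assumption. The use of Bonett's formula and the function name are illustrative, not taken from the project itself.

    import math

    def bonett_n(rho, k, width, z=1.96):
        # Bonett (2002) approximation: subjects needed so the 95% CI for an
        # expected ICC `rho` estimated with `k` raters has total width `width`.
        n = (8 * z**2 * (1 - rho)**2 * (1 + (k - 1) * rho)**2
             / (k * (k - 1) * width**2) + 1)
        return math.ceil(n)

    print(bonett_n(0.8, k=2, width=0.3))   # -> 24 (2 raters, CI width 0.3)
    print(bonett_n(0.8, k=3, width=0.25))  # -> 24 (3 raters, CI width 0.25)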
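For readers wishing to replicate the reliability analysis, a minimal sketch of the ICC(2,1) computation is given below, assuming long-format data and the third-party Python package pingouin; the interns, raters, and scores shown are hypothetical, not the study's data.

    import pandas as pd
    import pingouin as pg

    # Illustrative long-format ratings: one row per (intern, rater) pair;
    # scores are hypothetical checklist totals out of 26.
    df = pd.DataFrame({
        "intern": [i for i in range(1, 7) for _ in range(2)],
        "rater":  ["doctor", "nurse"] * 6,
        "score":  [22, 21, 18, 19, 25, 24, 15, 17, 20, 20, 23, 22],
    })

    icc = pg.intraclass_corr(data=df, targets="intern",
                             raters="rater", ratings="score")
    # "ICC2" is the two-way random-effects, single-measures,
    # absolute-agreement coefficient, i.e. ICC(2,1).
    print(icc.set_index("Type").loc["ICC2", ["ICC", "CI95%"]])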
Result & Outcome :
Of 40 audio clips that underwent preliminary screening for quality of articulation and voice-to-text transcription, thirty (75%) were rated by 2 independent expert raters with medical and nursing backgrounds (plus 1 AI rater auto-generated by the server from the voice-to-text transcript) using a standardized checklist of 26 items under the 5 "I-S-B-A-R" categories. Except for "Situation" (ICC = .69), all categories were rated across raters with excellent reliability, ranging from .81 to .95. Raters' accuracy in identifying "Very good/Excellent" performers was 80% (nurse), 85% (doctor), and 88% (AI) respectively. Our AI-ISBAR program can serve as a reliable and validated tool for education and assessment purposes, supported by substantial to almost perfect inter-rater agreement and accuracy in identifying good performers. In terms of cost-effectiveness and training capacity, utilizing AI-ISBAR could save 0.5 man-hours of experienced-instructor time per user in every formal session, whether used alone for deliberate practice or run by a trained administrator for high-stakes summative assessment.