AutoDiscern: Rating the Quality of Online Health Information with Hierarchical Encoder Attention-based Neural Networks


Patients increasingly turn to search engines and online content before, or in place of, talking with a health professional. Low-quality health information, which is common on the internet, presents risks to patients in the form of misinformation and a possibly poorer relationship with their physician. To address this, the DISCERN criteria (developed at the University of Oxford) are used to evaluate the quality of online health information. However, patients are unlikely to take the time to apply these criteria to the health websites they visit. We built an automated implementation of the DISCERN instrument (Brief version) using machine learning models. We compared a traditional model (Random Forest) with a hierarchical encoder attention-based neural network (HEA) model using two language embeddings, based on BERT and BioBERT. Averaged over all criteria, the HEA BERT and BioBERT models achieved F1-macro scores of 0.75 and 0.74, respectively, outperforming the Random Forest model (F1-macro = 0.69). Similarly, HEA BERT and BioBERT achieved average F1-micro scores of 0.80 and 0.81, respectively, vs. 0.76 for the Random Forest model. Overall, the neural network-based models achieved 81% and 86% average accuracy at 100% and 80% coverage, respectively, compared to 94% manual rating accuracy. The attention mechanism implemented in the HEA architectures provided ‘model explainability’ by identifying reasonable supporting sentences for the documents fulfilling the Brief DISCERN criteria. Our research suggests that it is feasible to automate online health information quality assessment, which is an important step towards empowering patients to become informed partners in the healthcare process.
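
The reported metrics (F1-macro, F1-micro, and accuracy at a given coverage) can be illustrated with a minimal sketch. It is not the evaluation code used in this work: the function names, the toy labels, and the confidence values below are hypothetical. In particular, the sketch assumes that "coverage" means the model abstains on its least confident predictions and is scored only on the remainder, which the abstract does not spell out.

```python
from collections import Counter


def f1_scores(y_true, y_pred):
    """Return (macro_f1, micro_f1) for single-label classification."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[t] += 1  # true class t was missed

    def f1(tp_, fp_, fn_):
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    # Macro: average the per-class F1 scores, so each class counts equally.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    # Micro: pool the counts over all classes, so each document counts equally.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return macro, micro


def accuracy_at_coverage(y_true, y_pred, confidence, coverage):
    """Accuracy on the `coverage` fraction of most confident predictions.

    Assumption: the remaining predictions are treated as abstentions.
    """
    n_keep = max(1, round(coverage * len(y_true)))
    ranked = sorted(range(len(y_true)), key=lambda i: confidence[i], reverse=True)
    kept = ranked[:n_keep]
    return sum(y_true[i] == y_pred[i] for i in kept) / n_keep


# Hypothetical ratings for one criterion (1 = fulfilled, 0 = not fulfilled).
y_true = [1, 1, 1, 1, 0, 0, 1, 1, 0, 1]
y_pred = [1, 1, 1, 1, 0, 1, 1, 0, 0, 1]
confidence = [0.95, 0.90, 0.85, 0.80, 0.90, 0.55, 0.75, 0.52, 0.70, 0.88]

macro_f1, micro_f1 = f1_scores(y_true, y_pred)
acc_full = accuracy_at_coverage(y_true, y_pred, confidence, coverage=1.0)
acc_80 = accuracy_at_coverage(y_true, y_pred, confidence, coverage=0.8)
print(macro_f1, micro_f1, acc_full, acc_80)
```

In this toy data the two errors are also the two least confident predictions, so dropping the bottom 20% raises accuracy. This mirrors the trade-off between accuracy and coverage reported above. For single-label classification, F1-micro coincides with plain accuracy, whereas F1-macro is pulled down by the weaker minority class.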

Laura Kinkead
Scientific Software Developer

I am interested in applying data science to improve healthcare.