Text Classification for Clinical Trial Operations: Evaluation and Comparison of Natural Language Processing Techniques.
Journal:
Therapeutic Innovation & Regulatory Science
Published Date:
Oct 30, 2020
Abstract
The ability to detect patterns and trends across protocol deviations (PDs) is key to ensuring high data quality and sufficient oversight of patient safety. In clinical trial operations, some business processes and work instructions limit efficient protocol deviation trending because a majority of protocol deviations are left unclassified. Unclassified deviations prevent clinical teams from identifying systemic issues or signals in the data. The unstructured text in protocol deviation descriptions is an important component of trial operation knowledge. Natural language processing (NLP) can make protocol deviation descriptions more accessible and can support information extraction and trending analysis. This paper reviews how two NLP approaches can be used to categorize and label protocol deviations across multiple therapeutic areas: term frequency-inverse document frequency (TF-IDF) features combined with a supervised support vector machine (SVM) classifier, and word embedding methods such as word2vec. NLP is a key tool that will lead to more data-driven decisions in clinical trial operations.
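To make the TF-IDF plus SVM approach concrete, the following is a minimal sketch in plain Python and not the pipeline used in the paper. The deviation descriptions, the two category labels, and all function names are hypothetical and chosen only for illustration. The sketch also assumes a binary problem with no bias term and a fixed learning rate, whereas real protocol deviation labeling has many categories and would more likely use a library such as scikit-learn (`TfidfVectorizer` with `LinearSVC`).

```python
import math
from collections import Counter

# Hypothetical protocol deviation descriptions with illustrative labels
# (assumed for this sketch; not data from the paper).
TRAIN = [
    ("visit occurred outside protocol window", "visit_schedule"),
    ("follow up visit missed by subject", "visit_schedule"),
    ("subject took prohibited medication during study", "medication"),
    ("incorrect dose of study medication dispensed", "medication"),
]
LABELS = ("visit_schedule", "medication")  # mapped to +1 and -1


def tokenize(text):
    """Lowercase whitespace tokenizer; real text would need more cleaning."""
    return text.lower().split()


def fit_idf(docs):
    """Smoothed inverse document frequency for every term in the corpus."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}


def tfidf(doc, idf):
    """L2-normalised TF-IDF vector as a sparse dict; unseen terms are dropped."""
    tf = Counter(t for t in doc if t in idf)
    vec = {t: c * idf[t] for t, c in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}


def train_linear_svm(vectors, targets, lam=0.01, eta=0.1, epochs=100):
    """Linear SVM trained by subgradient descent on the L2-regularised hinge loss."""
    w = {}
    for _ in range(epochs):
        for x, y in zip(vectors, targets):
            margin = y * sum(w.get(t, 0.0) * v for t, v in x.items())
            # Regularisation step: shrink every weight towards zero.
            for t in w:
                w[t] *= 1.0 - eta * lam
            # Hinge-loss step: only examples inside the margin move the weights.
            if margin < 1.0:
                for t, v in x.items():
                    w[t] = w.get(t, 0.0) + eta * y * v
    return w


def predict(text, idf, w, labels=LABELS):
    """Return the label on the side of the hyperplane where the text falls."""
    x = tfidf(tokenize(text), idf)
    score = sum(w.get(t, 0.0) * v for t, v in x.items())
    return labels[0] if score >= 0 else labels[1]


docs = [tokenize(text) for text, _ in TRAIN]
idf = fit_idf(docs)
X = [tfidf(d, idf) for d in docs]
y = [1 if label == LABELS[0] else -1 for _, label in TRAIN]
w = train_linear_svm(X, y)

print(predict("patient missed scheduled visit", idf, w))
print(predict("wrong dose of medication dispensed", idf, w))
```

TF-IDF down-weights terms that occur in many descriptions, so a word shared across documents (here "visit") receives a lower IDF than one seen only once (here "window"). The SVM then learns a signed weight per term, which is what allows an unseen description to be assigned to a category from the terms it shares with the labeled examples.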