Thursday, July 19, 2012

Why NLP should be wrapped with knowledge base


I have been working on improve the machine understandability of the Electronic Medical Records (EMR) for past few months. The main goal is to improve the efficiency of the routine activities of the hospitals. Hospital staff has to process thousands of documents to find the qualified patients for certain criteria daily(The heart failure patients who are not given instructions on their diets, activity level, medications, follow up plan etc..), we have not been able to help them in promising manner despite the enormous advancement of the technology.

Recent decisions in healthcare industry have forced people to rethink about their approaches to deal with routine activities. As a result of this, academics have been approached by the industry to solve their problems. This stimulates the current trend of exploring how to leverage the current technology in healthcare domain. Natural Language Processing(NLP), Linguistics, Data Mining, Information Retrieval, Machine Learning and Semantic Web are the frontline technologies being explored on this domain.

Within the project that I have been working on for last few months with motivated industry partner, we are concentrated on understanding EMR documents and perform queries on these documents. The main challenge for this task is the unstructured nature of the EMR documents, only 10%-20% of the EMR data has proper structure. Making the things worse, the unstructured portion of the EMR documents has most of the information that we need to understand to answer the queries accurately. Unfortunately, traditional information retrieval techniques are not capable of accepting this challenge due its inherent nature of the keyword based indexing and searching. The following example will illustrate the complexity of the problem.

Text in EMR: ‘The patient is suffering from hypertension, but he refuses the recurrent of chest pain or shortness of breath. He was suffering from atrial fibrillation during his last visit, but it is resolved.’

The keyword based approach will retrieve this patient for the query which search for ‘patient with atrial fibrillation’. Furthermore, this patient will be in the results for query searching for ‘patients with chest pain’. But the semantics of the sentence convey the opposite.

NLP techniques showed some promise on dealing with these issues. There are few NLP engines which are optimized to deal with these issues. These engines are capable of identifying the concepts in the documents, the negation semantics, detect the temporal aspects to certain extent and convert the unstructured text to structured format. This structured format can be used for the querying purposes. Unfortunately, some of issues remain beyond the reach by NLP techniques as I discuss below.

Knowledge workers have crucial role to play in routine activities of healthcare domain. Significant amount of explicit and implicit knowledge is being used by the healthcare professionals in their routine activities. The subjective and dynamic nature of this knowledge makes this problem hard and hence interesting for me as a researcher. A successful solution aiming to cater healthcare domain requirements should consider how to incorporate expert knowledge. This is the main reason why our solution is backed by a knowledge base which encodes healthcare domain knowledge in formal manner. Furthermore, the knowledge base helps us to improve the output by the NLP engine.

Let me discuss why I feel NLP is not sufficient to solve these problems (even though I am not a NLP expert). NLP techniques mostly fail to understand the big picture of the document which is very crucial in this domain. The EMR document can contain the various information about the patient/his family from different person’s point of view. For example there can be a statement in the EMR document ‘He denies knowledge of hypertension’, but after the examination, doctor decides that he has hypertension. The NLP output of this document will states both facts which are contradicting, because its intention is to convey the semantics of each sentence. But this is not sufficient for the problems that we deal with. We don’t want this patient to appear in the results for queries which search for ‘patients with hypertension’ as well as ‘patients without hypertension’.

However, if the domain expert is given with all the concepts(symptoms, medication in this case) he would be able to determine whether the patient has hypertension or not with good accuracy (he doesn’t need to read the document to determine this). This is a classic example that point to the value of having a knowledge base.

Apart from above scenario, knowledge base can play a crucial role in retrieving the results for certain queries.  For example, healthcare professionals have to put lot of manual effort to find out the patients who takes beta-blockers(beta-blocker is a category of medications). There are 20+ medications which belongs to beta blocker category. The manual process of finding patients who take beta blocker will have to search for the presence of any of this medication in the document. NLP will help us to identify these 20+ medications as concepts, but it is not enough to include these documents for the query search for ‘patients taking beta-blockers’. Availability of comprehensive knowledge base can solve this problem.

Above illustration is not to state that knowledge base is the ultimate solution, but I believe that with some optimized molding of several technologies can achieve the expected quality of such a solution.