Natural Language Processing (NLP)

Organizational information resides not only in structured databases but also in unstructured media such as contracts, memos, and emails.

Information from these unstructured sources often provides insight into the structured data. While the market offers plenty of options for automatically extracting data from structured content, industries such as healthcare and real estate still rely heavily on unstructured documents, and management needs a 360° view of the business to make informed decisions on marketing, investment, legal matters, and competition. Nihilent’s abstraction tool automates the extraction of information from unstructured sources.

Based on our background and research in Natural Language Processing (NLP), we have developed a system that uses Information Extraction (IE) and Information Retrieval (IR) algorithms to extract structured data from free text. It combines rule-based and machine-learning-based techniques to achieve this. The system reads and extracts data from sentences, paragraphs, or entire pages written in natural language using proprietary algorithms developed by Nihilent. Our information extraction solution involves three steps:

Information Extraction: Documents are scanned and uploaded to the system. The analyst can then write specific rules to be applied to the documents to extract the required data.
Information Retrieval: NLP and proprietary algorithms are applied to retrieve the data from the documents, using a combination of machine-learned and rule-based approaches.
QA and Results: After the data is extracted, it can be checked to confirm that the required data was retrieved, and new rules can be applied if needed. Once QA is complete, the user can export the results in the desired format, or a connector can be created to push the data to the IMS/BI system.
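
To make the three steps concrete, here is a minimal Python sketch of the rule-based part of such a workflow. It is an illustration only and not Nihilent's proprietary algorithm: the `Rule` class, the regular expressions, the sample lease text, and the field names are all hypothetical, and plain regular expressions stand in for the combined rule-based and machine-learned techniques described above.

```python
import json
import re
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Rule:
    """Hypothetical analyst-written rule: a field name and a regex with one capture group."""
    name: str
    pattern: str


def extract(text: str, rules: List[Rule]) -> Dict[str, Optional[str]]:
    """Apply every rule to the text; a field whose rule does not match is stored as None."""
    record: Dict[str, Optional[str]] = {}
    for rule in rules:
        match = re.search(rule.pattern, text, flags=re.IGNORECASE)
        record[rule.name] = match.group(1).strip() if match else None
    return record


def missing_fields(record: Dict[str, Optional[str]]) -> List[str]:
    """QA helper: list the fields that no rule managed to fill."""
    return [name for name, value in record.items() if value is None]


def export_json(record: Dict[str, Optional[str]]) -> str:
    """Export the extracted record as JSON (one possible 'desired format')."""
    return json.dumps(record, indent=2)


# Hypothetical free-text input, standing in for a scanned and uploaded document.
SAMPLE_TEXT = (
    "This lease is made on 1 March 2021. "
    "The tenant, Acme Corp, agrees to pay a monthly rent of USD 4,500."
)

# Hypothetical rules an analyst might write for lease documents.
RULES = [
    Rule("start_date", r"made on (\d{1,2} \w+ \d{4})"),
    Rule("tenant", r"tenant, ([^,]+),"),
    Rule("monthly_rent", r"rent of ([A-Z]{3} [\d,]+)"),
    Rule("deposit", r"deposit of ([A-Z]{3} [\d,]+)"),
]

record = extract(SAMPLE_TEXT, RULES)
print("Fields needing review:", missing_fields(record))
print(export_json(record))
```

In this sketch the sample text never mentions a deposit, so that field stays empty and is flagged for review. That mirrors the QA step, where an analyst would either confirm the value is absent from the document or write a new rule before exporting.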