Astronaut: A Multi-Level Text Annotation Platform for Fine-Grained NER

In many real-world NLP projects, companies hold large amounts of unstructured data that they want to make sense of, and several techniques can be applied to that data to extract meaningful entities and structure them in a more usable format.

Applying information extraction techniques usually requires preparing or gathering training data. Because unstructured data can be very specific to your domain, you may even need to bootstrap the data and set up a data collection pipeline, either fully or partially.

At Koverhoop, we help insurance brokers save time by automating mundane form-filling work based on lengthy insurance documents. For this, we use an information extraction technique known as Fine-Grained Named Entity Recognition. While working on this problem, we encountered several challenges, which led us to build tools that set some standards for this and related techniques and help with their industry adoption.

Astronaut helps remove the blockers to annotating such data. The data needs to be annotated and stored in a hierarchical fashion. For example, a span like "$ 40, 000" (with some context around it, of course) may be annotated with a coarse-grained entity at the first level, say Amount, and with a finer-grained entity such as Reduction Amount or Maximum Amount, depending on the context in which it appears.
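To make the two-level idea concrete, here is a minimal Python sketch of how one such annotation could be represented and serialized. It is only an illustration: the `SpanAnnotation` class, its field names, the `annotate` helper, and the sample sentence are all assumptions made for this example, not Astronaut's actual storage format.

```python
import json
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class SpanAnnotation:
    """Hypothetical record for one annotated span (not Astronaut's real schema)."""
    start: int                         # character offset where the span begins
    end: int                           # character offset just past the span
    text: str                          # the annotated surface text
    coarse_label: str                  # first-level entity, e.g. "Amount"
    fine_label: Optional[str] = None   # second-level entity, e.g. "Maximum Amount"


def annotate(document: str, span_text: str, coarse_label: str,
             fine_label: Optional[str] = None) -> SpanAnnotation:
    """Locate the first occurrence of span_text and attach both label levels."""
    start = document.find(span_text)
    if start == -1:
        raise ValueError(f"span {span_text!r} not found in document")
    return SpanAnnotation(start, start + len(span_text), span_text,
                          coarse_label, fine_label)


# Made-up sentence providing context around the span.
document = "The maximum amount payable under this policy is $ 40, 000 per claim."
annotation = annotate(document, "$ 40, 000", "Amount", "Maximum Amount")

# Serialize to JSON so the record can be stored for later training.
record = json.dumps(asdict(annotation))
print(record)
```

Keeping the coarse and fine labels as separate fields means the same stored records could serve either a coarse-grained or a fine-grained model, and the fine label can stay empty when the context is too ambiguous to decide.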

Astronaut helps annotate exactly this kind of data and store it for later use in training or further processing.