LightTag Quick Start

A quick guide to thinking in LightTag

LightTag helps you annotate a Dataset with a Schema. The combination of a Dataset and a Schema, together with the desired number of annotators per Example, is called a TaskDefinition.

To give a motivating example, suppose we want to train an entity recogniser on the Bible. We’d start by asking which entities we might be interested in recognizing, and possibly come up with

  • God
  • Person
  • Nation
  • Place
  • Pagan God
  • Action

which we will put together in a Schema and call that Schema “BibTags”.


LightTag uses the more general term Tag instead of entity (an Action isn’t really an entity, is it?). A collection of such Tags makes up a Schema. A Schema’s Tags should be mutually exclusive; that is, a particular span of text can be tagged as only one thing within that Schema. Sometimes we’d like to annotate the same span of text as more than one thing. For instance, “David” is a Person and also a Proper Noun. Because those two concepts, Person and Proper Noun, are not mutually exclusive, they belong in distinct Schemas.
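To make the mutual-exclusivity rule concrete, here is a minimal Python sketch. It is an illustration only and not LightTag's API: the `Schema` and `Annotation` classes, the `conflicts` helper, the `PartOfSpeech` schema name and the character offsets are all hypothetical. It shows that the same span may carry one Tag from each Schema, while two different Tags from the same Schema on one span count as a clash.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Schema:
    """Hypothetical model of a Schema: a named set of Tags."""
    name: str
    tags: frozenset


@dataclass(frozen=True)
class Annotation:
    """Hypothetical model of one tagged span (character offsets, end exclusive)."""
    schema: str
    tag: str
    start: int
    end: int


def conflicts(annotations):
    """Return pairs that give the same span different Tags within one Schema."""
    seen = {}
    clashes = []
    for ann in annotations:
        key = (ann.schema, ann.start, ann.end)
        # Remember the first annotation seen for this schema and span.
        previous = seen.setdefault(key, ann)
        if previous.tag != ann.tag:
            clashes.append((previous, ann))
    return clashes


bib_tags = Schema(
    "BibTags",
    frozenset({"God", "Person", "Nation", "Place", "Pagan God", "Action"}),
)
# Assumed second schema, invented for this example.
part_of_speech = Schema("PartOfSpeech", frozenset({"Proper Noun"}))

# "David" is assumed to occupy characters 0-5 of some Example.
ok = [
    Annotation("BibTags", "Person", 0, 5),
    Annotation("PartOfSpeech", "Proper Noun", 0, 5),
]
bad = ok + [Annotation("BibTags", "Place", 0, 5)]

print(len(conflicts(ok)))   # 0: one Tag per Schema on the span
print(len(conflicts(bad)))  # 1: Person and Place clash inside BibTags
```

Keying the check on the Schema name as well as the span is what lets Person and Proper Noun coexist on “David” without being reported.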

Being the good data scientists that we are, we split our initial dataset, the Bible, into two parts. From the first part we will make a “training set” and from the second part we will make a “test set”.

Generally we want the test set to be as accurate as possible, even if that means settling for a smaller set. Conversely, for the training set, we might tolerate lower confidence in each annotation for the sake of getting a much larger sample size.

To achieve this we’ll create two TaskDefinitions.

Our first Task Definition would be “Annotate Dataset Train with Schema BibTags, ensuring every Example is annotated by at least 1 annotator.”

Our second Task Definition would be “Annotate Dataset Test with Schema BibTags, ensuring every Example is annotated by at least 3 annotators.”

Once we’ve defined our two Task Definitions, LightTag will automatically derive the work that needs to be done and allocate individual Tasks to annotators.
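The two Task Definitions above, and the amount of work they imply, can be sketched in Python. This is a rough illustration under stated assumptions, not LightTag's API or its allocation logic: the `TaskDefinition` class, the `annotations_required` helper and the Example counts are made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskDefinition:
    """Hypothetical model: a Dataset, a Schema, and annotators per Example."""
    dataset: str
    schema: str
    annotators_per_example: int


def annotations_required(task_def, num_examples):
    """Minimum number of individual annotations needed to satisfy the definition."""
    return task_def.annotators_per_example * num_examples


train = TaskDefinition(dataset="Train", schema="BibTags", annotators_per_example=1)
test = TaskDefinition(dataset="Test", schema="BibTags", annotators_per_example=3)

# Made-up sizes: a large training set and a small test set.
print(annotations_required(train, 1000))  # 1000
print(annotations_required(test, 100))    # 300
```

With these assumed sizes, the smaller test set still needs 300 annotations because every Example is seen by three annotators, which is the price of the higher confidence we want there.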

The next section will describe how LightTag expects to ingest your data and why.