Uploading Your First Dataset

The first time you log in to LightTag you’ll arrive at the OnBoarding section which begins with defining a Dataset.

A Dataset is a collection of Examples with some specifications about how we’d like to show them to our annotators.

When you first start with LightTag we’ll ask you to specify a Dataset and upload examples. Here is what that process looks like

Uploading a dataset

Naming Our Datset

We gave our dataset a name “Example Upload” then selected the JSON file with our data.

Then LightTag read the JSON and saw what fields exist on the individual JSON objects.

Defining what to annotate

LightTag asked us “What field in the objects contains the text you want to annotate?” and we said Message

Uploading a dataset

Specifying the Message is the field that contains the data to be annotated

If you look at the animation above, it goes through a few other options that we will come back to in a moment. For now we’ve defined a Dataset named “Example Upload”, uploaded the actual data and told LightTag we want to annotate the text in the Message field.

Adding a Unique Constraint

LightTag wants to maintain the integrity of your data. If your data has a field on it that should be unique in the dataset you can specify as much and LightTag will enforce that unique constraint for you. This can be useful if you plan to update your datasets and want to ensure you don’t add duplicates. In our simple example we don’t have a field for this and so left that option blank.

Selecting a unique id

Selecting (or in this case, leaving blank) a Unique id

Showing multiple examples together

The way we defined our Dataset, LightTag will show an annotator exactly one message at a time. When we are dealing with independent texts (like distinct articles) this might make sense. But when dealing with a conversation we might want to have the annotator go through all of the messages in the conversation so that they can be aware of the context.

To facilitate this, LightTag lets you specify an optional aggregation_key. This tells LightTag that we should show all Examples with the a shared value of that key together. In our example, “Conv_id” is a good aggregation key

Selecting an aggregation key

Choosing Conv_id as the field to aggregate Examples by

Of course, we’d like to show the messages in a conversation in the order in which they came. LightTag allows us to specify an order key which does exactly that. In our case, a natural choice for sorting would be Time

Danger

Since JSON has no native representation of time, LightTag can’t sort by your date and time strings correctly. To get over this problem, either convert your dates to epoch time or add a field with the sequential order.