Uploading Multiple Models to LightTag

Check out the video explaining this notebook here

In this tutorial we’ll use four different models to generate suggestions. We’ll then use LightTag’s review feature to compare the models’ performance and generate a high-precision labeled dataset. To showcase the API and techniques, we’ll do this on two distinct datasets: data from the Federal Register and a collection of political tweets.

The models we’ll be using are the Named Entity Recognition components from each of the following: Stanford CoreNLP, spaCy (both the large en_core_web_lg and the small en_core_web_sm English models), and Zalando’s Flair.

Outline

This guide is broken down into a few parts as follows:

  1. First, we write utility functions for each of the models above, which run them on our text and return the results in LightTag’s expected format

  2. We’ll pull the datasets we want to process from LightTag and run each of the models on them

  3. We’ll unify the model outputs, since some of them output different names for the same thing (ORG vs ORGANIZATION)

  4. We’ll create a new Schema based on the unified tags

  5. Upload the model suggestions

  6. Review the Data in LightTag

  7. Pull metrics

Part 1 - Adapters for our four models

[52]:
from ltsession import LTSession # Thin wrapper over LightTag's api, get it here (https://gist.github.com/talolard/793563397c48dca32f75c9d4b6f8f560)
import spacy
import requests
import re
import pandas as pd # We use this to check ourselves
from flair.data import Sentence
from flair.models import SequenceTagger

CoreNLP

We’re running CoreNLP in a Docker container. CoreNLP trims whitespace and sometimes returns overlapping annotations, so we need to handle both of those cases.

[33]:
preWhiteSpace = re.compile(r'^\s+')

def stanford_to_lighttag_format(example,ent):
    '''
    Takes a LightTag example and a Stanford entity and returns a LightTag suggestion
    '''
    match = preWhiteSpace.search(example['content'])
    offset = match.end(0) if match else 0 # CoreNLP strips leading whitespace, so we use that regex to adjust offsets
    start = ent["characterOffsetBegin"] + offset
    end = ent["characterOffsetEnd"] + offset
    return {
        "example_id":example["id"],
        "start":start,
        "end":end,
        "tag":ent["ner"],
        "value":example['content'][start:end]
#         "tag_id":tagMap[sug["ner"]]
    }
# This is the URL of the CORENLP server running in a docker container
url='http://localhost:9000/?properties={"annotators":"ner","outputFormat":"json"}'
def process_with_stanford(example):
    '''
    Gets a LightTag example, runs coreNLP on it and returns a list of suggestions in LightTag format
    '''
    results = []
    txt = example['content'].encode('utf8') # We need to send it bytes
    data = requests.post(url,data=txt,).json() #Send to the container
    cursor = -1 # Track the end of the last CoreNLP annotation, so we can ignore overlapping ones
    for sentence in data['sentences']: # CoreNLP does sentence parsing as well, which we don't care about

        for entity in sentence["entitymentions"]: # Iterate over the entities
            sug = stanford_to_lighttag_format(example,entity) # Convert the Stanford entity to LightTag format
            if sug['start'] > cursor: # Don't accept overlaps
                results.append(sug)
                cursor=sug['end']
    return results #The list of lighttag suggestions

Spacy Big and Small

We’re using two of spaCy’s built-in NER models. It’s a little easier than Stanford.

[25]:
big_nlp = spacy.load("en_core_web_lg") # Load the big spacy model
small_nlp = spacy.load("en_core_web_sm") #Load the small spacy model

def spacyToSug(example,ent):
    return {
        "example_id":example["id"],
        "start":ent.start_char,
        "end":ent.end_char,
        "tag":ent.label_,
        "value":example['content'][ent.start_char:ent.end_char]
    }
def process_with_spacy_big(example):
    results = []
    doc = big_nlp(example['content'])
    for ent in doc.ents:
        results.append(spacyToSug(example,ent))
    return results
def process_with_spacy_small(example):
    results = []
    doc = small_nlp(example['content'])
    for ent in doc.ents:
        results.append(spacyToSug(example,ent))
    return results

Flair

Zalando’s Flair package has received many rave reviews and made some big claims, so it will be interesting to see how it fares against the others. The only thing to note is that if you are running on CPU, use the ner-fast model; the regular one is very slow.

[4]:
ftagger = SequenceTagger.load('ner-fast')
def flair_to_suggestions(example,ent):
    return {
        "example_id":example["id"],
        "start":ent.start_pos,
        "end":ent.end_pos,
        "tag":ent.tag,
        "value":example['content'][ent.start_pos:ent.end_pos]
#         "tag_id":tagMap[sug["ner"]]
    }

def process_with_flair(example):
    doc = Sentence(example['content'])
    ftagger.predict(doc)
    return [flair_to_suggestions(example,ent) for ent in doc.get_spans('ner')]

2019-10-17 14:53:55,792 loading file /home/tal/.flair/models/en-ner-fast-conll03-v0.4.pt

Putting them together

Here we write a helper function that receives a list of examples and processes each one with each of the models. We also collect the list of example_ids that we have seen, and later submit a testament for each (model, example) pair. A testament tells LightTag that the model saw the example, even if it made no predictions, which lets LightTag report accurate analytics.

[45]:
def process_multiple_examples(examples):
    models={ # Dictionary of models, each has a list of suggestions
        'spacy_big':[],
        'spacy_small':[],
        'stanford':[],
        'flair':[],

    }
    example_ids = [] # we use this to track which examples have been seen. Later we'll submit a testament to LightTag for each model
    for num,example in enumerate(examples):
        models['spacy_big'] += process_with_spacy_big(example)
        models['spacy_small'] += process_with_spacy_small(example)
        models['flair'] += process_with_flair(example)
        models['stanford'] += process_with_stanford(example)
        example_ids.append(example['id']) # Take note of the example_id we just processed
        if num % 10 == 0:
            print(num)
    return {'models':models,'example_ids':example_ids}

Part 2 - Get The Data From LightTag

We’ve already uploaded the data to our LightTag workspace, but if you’d like to follow along on yours, you can find the raw data here. The important point is that you pull the data from your LightTag workspace so that you have the example_ids.

[21]:
session = LTSession(workspace='demo',user='lighttag',pwd='Shiva666') # Start an API session
[62]:
fed_reg_examples = session.get('v1/projects/default/datasets/fedreg/examples/').json() # Retrieve the examples from the fedreg dataset
trump_examples = session.get('v1/projects/default/datasets/tweets/examples/').json() # Retrieve the examples from the tweets dataset
examples = fed_reg_examples[:250] + trump_examples[:250] #We take a subset because Flair is slow
[23]:
len(fed_reg_examples),len(trump_examples),len(examples)
[23]:
(1818, 6444, 8262)
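
Before running the models, it can help to confirm the shape of the records we just pulled. The adapter functions in Part 1 only rely on each example having an id and a content field; any other fields your workspace returns are ignored. A minimal sanity-check sketch (the exact set of extra fields may vary):

[ ]:
# Quick sanity check: the adapters above only assume these two fields exist
sample = fed_reg_examples[0]
assert 'id' in sample and 'content' in sample
print(list(sample.keys()))  # inspect whatever other fields your workspace returns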
[63]:
#Run all of the models on the data
result_dict = process_multiple_examples(examples)


0
10
20
...
490
[64]:
# This is what the results look like
model_outputs = result_dict['models']
model_outputs['stanford'][0]
[64]:
{'example_id': '40e46279-6602-4d97-bf61-66878d565d1d',
 'start': 4,
 'end': 23,
 'tag': 'MISC',
 'value': 'Trade Agreement Act'}

Part 3 - Normalizing the Output

We ran different models, and while all of them do “NER”, they use different terms and different granularities. In order to compare them, we need to normalize the tags they use, which is what we do below. We make a dictionary that maps each tag we want to replace to its replacement value, then iterate over the suggestions and apply it when necessary.

[65]:
mapper_dict = dict(ORGANIZATION='ORG',CARDINAL='NUMBER',LOCATION='GPE',LOC='GPE',COUNTRY='GPE',STATE_OR_PROVINCE='GPE',
            NATIONALITY='NORP',WORK_OF_ART='MISC',CITY='GPE',IDEOLOGY='NORP',PER='PERSON',ORDINAL='NUMBER',
            PRODUCT='MISC',RELIGION='NORP'
            )
replace_if_need = lambda tag: mapper_dict.get(tag,tag) # If the tag is in the dict, return its replacement, otherwise keep it
def normalize_suggestion(suggestion):
    suggestion['tag'] = replace_if_need(suggestion['tag'])
    return suggestion
for model_name in model_outputs:
    model_outputs[model_name] =list(map(normalize_suggestion,model_outputs[model_name]))
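
As a quick programmatic check that the mapping actually took effect, we can assert that none of the tag names we mapped away survive in the normalized output. A small sketch using the variables defined above:

[ ]:
# None of the tags we mapped away should appear in the normalized suggestions
remaining = {s['tag'] for suggestions in model_outputs.values() for s in suggestions}
assert remaining.isdisjoint(mapper_dict.keys()), remaining & set(mapper_dict.keys())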


Checking ourselves

[66]:

frames = []
for model_name in model_outputs:
    suggestions_pd = pd.DataFrame(model_outputs[model_name])
    suggestions_pd['model'] = model_name
    frames.append(suggestions_pd)
AllSuggestions = pd.concat(frames)

It’s really useful to look at a pivot table of tags vs. models, counting how often each model predicted each tag. This tells us nothing about who did a better job, but it can help us recognize overlapping tag names or tags we might not care about.

[68]:
AllSuggestions.pivot_table(index='model',columns='tag',values='start',aggfunc=len).fillna(0).T
[68]:
model             flair  spacy_big  spacy_small  stanford
tag
CAUSE_OF_DEATH      0.0        0.0          0.0       7.0
CRIMINAL_CHARGE     0.0        0.0          0.0       3.0
DATE                0.0      335.0        332.0     398.0
DURATION            0.0        0.0          0.0      32.0
EVENT               0.0        6.0         15.0       0.0
FAC                 0.0        7.0         25.0       0.0
GPE               351.0      565.0        563.0     596.0
HANDLE              0.0        0.0          0.0      90.0
LAW                 0.0      239.0        164.0       0.0
MISC              490.0       51.0         97.0     257.0
MONEY               0.0       58.0         43.0      31.0
NORP                0.0       52.0         56.0      71.0
NUMBER              0.0      585.0        608.0     846.0
ORG               726.0     1013.0       1012.0     552.0
PERCENT             0.0       19.0         25.0      24.0
PERSON            130.0      175.0        184.0     207.0
QUANTITY            0.0        1.0          1.0       0.0
SET                 0.0        0.0          0.0      20.0
TIME                0.0       26.0         21.0      17.0
TITLE               0.0        0.0          0.0     121.0
URL                 0.0        0.0          0.0     138.0

Part 4 - Defining a New Schema

In case we don’t already have a schema defined in LightTag that contains all of these tags, we can create one now with the API. We’ll take the list of tags that appeared in the dataframe we just calculated, then define a new schema.

[70]:
tags=AllSuggestions.tag.unique().tolist()


[73]:
schema_def = {
    'name':'ner-model-comparison',
    'tags':[{'name':t,'description':t } for t in tags]
}
new_schema =session.post('v1/projects/default/schemas/bulk/',json=schema_def)
[74]:
schema_id = new_schema.json()['id']
[75]:
schema_id
[75]:
11385

Part 5 - Registering the models and uploading suggestions

As we saw before, we need to register a model before we can upload suggestions to it. Models belong to a schema, like the one we just defined. In this example, we’ll iterate over the models we ran and, for each one, register it, upload its suggestions, and submit its testaments in one go.

[77]:
registered_models = {} # Keep track of the models we've already registered
for model_name in model_outputs:
    model_def = {  #definition of the model
        'schema':schema_id,
        'name':model_name,
        'metadata':{
            'anything':['you','want']
        }
    }
    response = session.post('v2/models/',json=model_def) # Send it to LightTag
    model = response.json() # Get back the model we just registered
    registered_models[model_name] = model # Store it for later

    session.post(model['url']+'suggestions/',json=model_outputs[model_name]) # Send the suggestions
    session.post(model['url']+'testaments/',json=result_dict['example_ids']) # Testaments tell LightTag all of the examples this model has seen
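
If you want to be defensive about failed uploads, you can capture the responses from the two POST calls inside the loop above and fail loudly on errors. A minimal sketch of that variant, assuming LTSession returns requests.Response objects (which the .json() calls above suggest):

[ ]:
# Hypothetical variant of the two upload calls inside the loop above:
# capture the responses and raise on any HTTP error instead of posting fire-and-forget
sug_resp = session.post(model['url']+'suggestions/', json=model_outputs[model_name])
test_resp = session.post(model['url']+'testaments/', json=result_dict['example_ids'])
for resp in (sug_resp, test_resp):
    resp.raise_for_status()  # requests' built-in check for HTTP error status codes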

Part 6 - Reviewing The Results In LightTag

Now that our suggestions have been uploaded, we want to know how our models compare to each other. We can get a rough sense by looking at the Inter Model Agreement in LightTag’s analytics dashboard. But agreement isn’t enough: we want to know who was right and who was wrong, and we’ll use LightTag’s review feature for that.

Inter Model Agreement

A quick way to see whether our models tend to agree or conflict. In our case, it looks like there is a lot of disagreement.

Review

Still, the question remains: are any of these models better than the others? Do they perform differently on the two datasets? There’s only one way to find out: by reviewing the data. Luckily, LightTag makes this easy with Review Mode.

Inter Model Agreement API

The same agreement numbers shown in the dashboard are also available programmatically:

[87]:
IMA = session.get("/v1/metrics/model/iaa/",params={"schema_id":schema_id}).json()
IMA = pd.DataFrame(IMA)
IMA.head()
[87]:
dataset id model_x model_y num_agree schema size
0 42d4181d-934f-4c58-850d-ecdf6fdeb830 62f92ec5-8ae5-4843-aa16-99a638360dc5/62f92ec5-... 62f92ec5-8ae5-4843-aa16-99a638360dc5 62f92ec5-8ae5-4843-aa16-99a638360dc5 2691 f46c639f-7359-4978-9104-62d23b20656d 2691
1 8f5bd425-ae8c-45f2-9cdd-f297ae6a5806 62f92ec5-8ae5-4843-aa16-99a638360dc5/62f92ec5-... 62f92ec5-8ae5-4843-aa16-99a638360dc5 62f92ec5-8ae5-4843-aa16-99a638360dc5 455 f46c639f-7359-4978-9104-62d23b20656d 455
2 42d4181d-934f-4c58-850d-ecdf6fdeb830 62f92ec5-8ae5-4843-aa16-99a638360dc5/96f8d8a1-... 62f92ec5-8ae5-4843-aa16-99a638360dc5 96f8d8a1-7e00-4ba8-b806-76ae745bf13e 1881 f46c639f-7359-4978-9104-62d23b20656d 2691
3 8f5bd425-ae8c-45f2-9cdd-f297ae6a5806 62f92ec5-8ae5-4843-aa16-99a638360dc5/96f8d8a1-... 62f92ec5-8ae5-4843-aa16-99a638360dc5 96f8d8a1-7e00-4ba8-b806-76ae745bf13e 346 f46c639f-7359-4978-9104-62d23b20656d 455
4 42d4181d-934f-4c58-850d-ecdf6fdeb830 62f92ec5-8ae5-4843-aa16-99a638360dc5/b24cf488-... 62f92ec5-8ae5-4843-aa16-99a638360dc5 b24cf488-8b83-4766-b6cd-cee6afac2fe7 353 f46c639f-7359-4978-9104-62d23b20656d 2691
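
The raw metrics come back keyed by model UUIDs, so it helps to map them back to the names we registered and compute an agreement rate per pair. A minimal sketch, assuming each entry in registered_models carries an 'id' field matching the model_x / model_y columns above:

[ ]:
# A minimal sketch: per-pair agreement rate, with model UUIDs mapped back to readable names.
# Assumes each registered model dict includes an 'id' matching model_x / model_y above.
id_to_name = {m['id']: name for name, m in registered_models.items()}
IMA['agreement_rate'] = IMA['num_agree'] / IMA['size']
IMA['model_x_name'] = IMA['model_x'].map(id_to_name)
IMA['model_y_name'] = IMA['model_y'].map(id_to_name)
IMA.pivot_table(index='model_x_name', columns='model_y_name', values='agreement_rate')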