{"cells":[{"cell_type":"markdown","execution_count":null,"metadata":{},"outputs":[],"source":"# Uploading Multiple Models to LightTag \n\n [Check out the video explaining this notebook here](https://youtu.be/OMi_JXUDxeM?t=20)\n\n\nIn this tutorial we'll use four different models to generate suggestions. \nWe'll then use LightTag's review feature to compare the models performance, and generate a high precision labeled data set.\nTo showcase the API and techniques, we'll do this on two distinct datasets, Data from the [Federal Register](https://www.federalregister.gov/) and a collection of politcal tweets. \n\nThe models we'll be using are the Named Entity Recognition components from each of the following: \n\n* [Spacy's Small model](https://spacy.io/models/en#en_core_web_sm)\n* [Spacy's Big Model](https://spacy.io/models/en#en_core_web_lg)\n* [Zalando's Flair](https://research.zalando.com/welcome/mission/research-projects/flair-nlp/) \n* [Stanford CoreNLP](https://stanfordnlp.github.io/CoreNLP/) running in a [docker container](https://github.com/NLPbox/stanford-corenlp-docker) (docker run -p 9000:9000 nlpbox/corenlp)\n\n## Outline \nThis guide is broken down into a few parts as follows: \n\n1. First, we write utility functions for each of the models above which run them on our text and return the results in LightTag's [expected format](suggestions.ipynb#3.-Create-your-suggestions)\n2. We'll pull the datasets we want to process from LightTag and run each of the models on them \n3. We'll unify the model outputs, since some of them output different names for the same thing (ORG vs ORGANIZATION) \n4. We'll create a new Schema based on the unified tags\n5. Upload the model suggestions \n8. Review the Data in LightTag \n9. Pull metrics"},{"cell_type":"markdown","execution_count":null,"metadata":{},"outputs":[],"source":"## Part 1 - Adapters for our four models"},{"cell_type":"code","execution_count":52,"metadata":{},"outputs":[],"source":"from ltsession import LTSession # Thin wrapper over LightTag's api, get it here (https://gist.github.com/talolard/793563397c48dca32f75c9d4b6f8f560)\nimport spacy\nimport requests\nimport pandas\nimport re\nimport pandas as pd # We use this to check ourselves\nfrom flair.data import Sentence\nfrom flair.models import SequenceTagger\n"},{"cell_type":"markdown","execution_count":null,"metadata":{},"outputs":[],"source":"#### CoreNLP \nWe're running CORENLP In a docker container.\nCoreNLP trims whitespaces and sometimes returns overlapping annotations, so we need to handle those cases"},{"cell_type":"code","execution_count":33,"metadata":{},"outputs":[],"source":"preWhiteSpace = re.compile('^\\s+')\n\ndef stanford_to_lighttag_format(example,ent):\n '''\n Takes a LightTag example and a stanford entitty and returns a LightTag Suggestion\n '''\n match = preWhiteSpace.search(example['content'])\n offset = match.end(0) if match else 0 #CORENLP strips whitespaces so we use that regex to adjust offsets\n start = ent[\"characterOffsetBegin\"] + offset\n end = ent[\"characterOffsetEnd\"] + offset\n return {\n \"example_id\":example[\"id\"],\n \"start\":start,\n \"end\":end,\n \"tag\":ent[\"ner\"],\n \"value\":example['content'][start:end]\n# \"tag_id\":tagMap[sug[\"ner\"]]\n }\n# This is the URL of the CORENLP server running in a docker container\nurl='http://localhost:9000/?properties={\"annotators\":\"ner\",\"outputFormat\":\"json\"}'\ndef process_with_stanford(example):\n '''\n Gets a LightTag example, runs coreNLP on it and returns a list of suggestions in LightTag 
format\n    '''\n    results = []\n    txt = example['content'].encode('utf8') # We need to send it bytes\n    data = requests.post(url, data=txt).json() # Send the text to the CoreNLP container\n    cursor = -1 # Track the last position of a CoreNLP annotation, so we can ignore overlapping ones\n    for sentence in data['sentences']: # CoreNLP does sentence parsing as well, which we don't care about\n        for entity in sentence[\"entitymentions\"]: # iterate over the entities\n            sug = stanford_to_lighttag_format(example,entity) # convert the Stanford entity to LightTag format\n            if sug['start']>cursor: # don't accept overlaps\n                results.append(sug)\n                cursor=sug['end']\n    return results # The list of LightTag suggestions"},{"cell_type":"markdown","execution_count":null,"metadata":{},"outputs":[],"source":"#### Spacy Big and Small \nWe're using the two built-in NER models from Spacy. It's a little easier than Stanford."},{"cell_type":"code","execution_count":25,"metadata":{},"outputs":[],"source":"big_nlp = spacy.load(\"en_core_web_lg\") # Load the big spacy model\nsmall_nlp = spacy.load(\"en_core_web_sm\") # Load the small spacy model\n\ndef spacyToSug(example,ent):\n    return {\n        \"example_id\":example[\"id\"],\n        \"start\":ent.start_char,\n        \"end\":ent.end_char,\n        \"tag\":ent.label_,\n        \"value\":example['content'][ent.start_char:ent.end_char]\n    }\ndef process_with_spacy_big(example):\n    results = []\n    doc = big_nlp(example['content'])\n    for ent in doc.ents:\n        results.append(spacyToSug(example,ent))\n    return results\ndef process_with_spacy_small(example):\n    results = []\n    doc = small_nlp(example['content'])\n    for ent in doc.ents:\n        results.append(spacyToSug(example,ent))\n    return results\n"},{"cell_type":"markdown","execution_count":16,"metadata":{},"outputs":[],"source":"#### Flair\nZalando's Flair package has received many rave reviews and made some big claims, so it will be interesting to see how it fares against the others. \nThe only thing to note is that if you are on CPU, use the ner-**fast** model; the regular one is very slow."},{"cell_type":"code","execution_count":4,"metadata":{},"outputs":[{"name":"stdout","output_type":"stream","text":"2019-10-17 14:53:55,792 loading file /home/tal/.flair/models/en-ner-fast-conll03-v0.4.pt\n"}],"source":"ftagger = SequenceTagger.load('ner-fast')\ndef flair_to_suggestions(example,ent):\n    return {\n        \"example_id\":example[\"id\"],\n        \"start\":ent.start_pos,\n        \"end\":ent.end_pos,\n        \"tag\":ent.tag,\n        \"value\":example['content'][ent.start_pos:ent.end_pos]\n    }\n\ndef process_with_flair(example):\n    doc = Sentence(example['content'])\n    ftagger.predict(doc)\n    return [flair_to_suggestions(example,ent) for ent in doc.get_spans('ner')]\n"},{"cell_type":"markdown","execution_count":null,"metadata":{},"outputs":[],"source":"#### Putting them together\n\nHere we write a helper function that receives a list of examples and processes each one with each of the models. \nWe collect the list of example_ids that we have seen and submit a [testament](/suggestions/#testaments-saying-that-a-model-has-nothing-to-say) for each (model, example) pair. 
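\nA testament is nothing more than the list of example ids a model processed. As a minimal sketch (purely illustrative, using the same testaments/ endpoint and payload shape we call in Part 5):\n\n```python\n# Sketch only: 'model' is a registered model dict returned by the LightTag API (see Part 5),\n# and 'seen_example_ids' is the list of example ids that model processed.\nseen_example_ids = [example['id'] for example in examples]\nsession.post(model['url'] + 'testaments/', json=seen_example_ids)\n```\n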
\nThis tells LightTag that the model saw the example, even if it made no predictions, which lets LightTag report accurate analytics.\n"},{"cell_type":"code","execution_count":45,"metadata":{},"outputs":[],"source":"def process_multiple_examples(examples):\n    models = { # Dictionary of models, each has a list of suggestions\n        'spacy_big':[],\n        'spacy_small':[],\n        'stanford':[],\n        'flair':[],\n    }\n    example_ids = [] # we use this to track which examples have been seen. Later we'll submit a testament to LightTag for each model\n    for num,example in enumerate(examples):\n        models['spacy_big'] += process_with_spacy_big(example)\n        models['spacy_small'] += process_with_spacy_small(example)\n        models['flair'] += process_with_flair(example)\n        models['stanford'] += process_with_stanford(example)\n        example_ids.append(example['id']) # Take note of the example_id we just processed\n        if num % 10 == 0:\n            print(num)\n    return {'models':models,'example_ids':example_ids}"},{"cell_type":"markdown","metadata":{},"source":"## Part 2 - Getting the Data From LightTag\nWe've already uploaded the data to our LightTag workspace, but if you'd like to follow along on yours, you can find the [raw data here](https://github.com/LightTag/ComparingNERModels).\nThe important point here is that you pull the data from your LightTag workspace so that you have the example_ids. \n"},{"cell_type":"code","execution_count":21,"metadata":{},"outputs":[],"source":"session = LTSession(workspace='demo',user='lighttag',pwd='Shiva666') # Start an API session"},{"cell_type":"code","execution_count":62,"metadata":{},"outputs":[],"source":"fed_reg_examples = session.get('v1/projects/default/datasets/fedreg/examples/').json() # Retrieve the examples from the fedreg dataset\ntrump_examples = session.get('v1/projects/default/datasets/tweets/examples/').json() # Retrieve the examples from the tweets dataset\nexamples = fed_reg_examples[:250] + trump_examples[:250] # We take a subset because Flair is slow"},{"cell_type":"code","execution_count":23,"metadata":{},"outputs":[{"data":{"text/plain":"(1818, 6444, 8262)"},"execution_count":23,"metadata":{},"output_type":"execute_result"}],"source":"len(fed_reg_examples),len(trump_examples),len(examples)"},{"cell_type":"code","execution_count":63,"metadata":{},"outputs":[{"name":"stdout","output_type":"stream","text":"0\n10\n20\n30\n40\n50\n60\n70\n80\n90\n100\n110\n120\n130\n140\n150\n160\n170\n180\n190\n200\n210\n220\n230\n240\n250\n260\n270\n280\n290\n300\n310\n320\n330\n340\n350\n360\n370\n380\n390\n400\n410\n420\n430\n440\n450\n460\n470\n480\n490\n"}],"source":"# Run all of the models on the data\nresult_dict = process_multiple_examples(examples)\n"},{"cell_type":"code","execution_count":64,"metadata":{},"outputs":[{"data":{"text/plain":"{'example_id': '40e46279-6602-4d97-bf61-66878d565d1d',\n 'start': 4,\n 'end': 23,\n 'tag': 'MISC',\n 'value': 'Trade Agreement Act'}"},"execution_count":64,"metadata":{},"output_type":"execute_result"}],"source":"# This is what the results look like\nmodel_outputs = result_dict['models']\nmodel_outputs['stanford'][0]"},{"cell_type":"markdown","execution_count":null,"metadata":{},"outputs":[],"source":"## Part 3 - Normalizing the Output\nWe ran different models, and while all of them do \"NER\", they use different terms and different granularities. \nIn order to compare them, we need to normalize the tags they use, which is what we do below. 
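\nBefore writing the mapping, it helps to see which tag names each model actually emitted. A small helper sketch (not part of the LightTag API, just a local check over the model_outputs dictionary we built above):\n\n```python\n# Sketch: list the distinct tag names each model produced, so we know which ones need unifying\ndef distinct_tags(model_outputs):\n    return {name: sorted({sug['tag'] for sug in sugs}) for name, sugs in model_outputs.items()}\n\ndistinct_tags(model_outputs)\n```\n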
\nWe make a dictionary that maps from the tag we want to replace to its replacement value, then iterate over the suggestions and apply it where necessary."},{"cell_type":"code","execution_count":65,"metadata":{},"outputs":[],"source":"mapper_dict = dict(ORGANIZATION='ORG',CARDINAL='NUMBER',LOCATION='GPE',LOC='GPE',COUNTRY='GPE',STATE_OR_PROVINCE='GPE',\n                   NATIONALITY='NORP',WORK_OF_ART='MISC',CITY='GPE',IDEOLOGY='NORP',PER='PERSON',ORDINAL='NUMBER',\n                   PRODUCT='MISC',RELIGION='NORP'\n                   )\nreplace_if_needed = lambda tag: mapper_dict.get(tag,tag) # if the tag is in the dict, give its replacement, otherwise keep it\ndef normalize_suggestion(suggestion):\n    suggestion['tag'] = replace_if_needed(suggestion['tag'])\n    return suggestion\nfor model_name in model_outputs:\n    model_outputs[model_name] = list(map(normalize_suggestion,model_outputs[model_name]))\n"},{"cell_type":"markdown","execution_count":null,"metadata":{},"outputs":[],"source":"#### Checking ourselves"},{"cell_type":"code","execution_count":66,"metadata":{},"outputs":[],"source":"frames = []\nfor model_name in model_outputs:\n    suggestions_pd = pd.DataFrame(model_outputs[model_name])\n    suggestions_pd['model'] = model_name\n    frames.append(suggestions_pd)\nAllSuggestions = pd.concat(frames, ignore_index=True)\n"},{"cell_type":"markdown","execution_count":null,"metadata":{},"outputs":[],"source":"It's really useful to look at a pivot table of the tags vs models, counting how often each model produced each tag. \nThis tells us nothing about who did a better job, but it can help us recognize overlapping tag names or tags we might not care about. \n"},{"cell_type":"code","execution_count":68,"metadata":{},"outputs":[{"data":{"text/html":"
","text/plain":"model flair spacy_big spacy_small stanford\ntag \nCAUSE_OF_DEATH 0.0 0.0 0.0 7.0\nCRIMINAL_CHARGE 0.0 0.0 0.0 3.0\nDATE 0.0 335.0 332.0 398.0\nDURATION 0.0 0.0 0.0 32.0\nEVENT 0.0 6.0 15.0 0.0\nFAC 0.0 7.0 25.0 0.0\nGPE 351.0 565.0 563.0 596.0\nHANDLE 0.0 0.0 0.0 90.0\nLAW 0.0 239.0 164.0 0.0\nMISC 490.0 51.0 97.0 257.0\nMONEY 0.0 58.0 43.0 31.0\nNORP 0.0 52.0 56.0 71.0\nNUMBER 0.0 585.0 608.0 846.0\nORG 726.0 1013.0 1012.0 552.0\nPERCENT 0.0 19.0 25.0 24.0\nPERSON 130.0 175.0 184.0 207.0\nQUANTITY 0.0 1.0 1.0 0.0\nSET 0.0 0.0 0.0 20.0\nTIME 0.0 26.0 21.0 17.0\nTITLE 0.0 0.0 0.0 121.0\nURL 0.0 0.0 0.0 138.0"},"execution_count":68,"metadata":{},"output_type":"execute_result"}],"source":"AllSuggestions.pivot_table(index='model',columns='tag',values='start',aggfunc=len).fillna(0).T"},{"cell_type":"markdown","execution_count":null,"metadata":{},"outputs":[],"source":"## Part 4 - Defining a New Schema \nIn case we don't already have a Schema defined in LightTag that contains all of these tags, we can create one now with the API. \nWe'll take a list of the tags that appeared from the dataframe we just calulated, then define a new schema"},{"cell_type":"code","execution_count":70,"metadata":{},"outputs":[],"source":"tags=AllSuggestions.tag.unique().tolist()\n\n"},{"cell_type":"code","execution_count":73,"metadata":{},"outputs":[],"source":"schema_def = {\n 'name':'ner-model-comparison',\n 'tags':[{'name':t,'description':t } for t in tags]\n}\nnew_schema =session.post('v1/projects/default/schemas/bulk/',json=schema_def)"},{"cell_type":"code","execution_count":74,"metadata":{},"outputs":[],"source":"schema_id = new_schema.json()['id']"},{"cell_type":"code","execution_count":75,"metadata":{},"outputs":[{"data":{"text/plain":"11385"},"execution_count":75,"metadata":{},"output_type":"execute_result"}],"source":""},{"cell_type":"markdown","execution_count":81,"metadata":{},"outputs":[{"data":{"text/plain":["'0741b2dc-cac3-42f8-a4c1-65d17a79a4c6'"]},"execution_count":81,"metadata":{},"output_type":"execute_result"}],"source":"## Part 5 - Registering the models and uploading suggestions. \nAs we [saw before](suggestions/#2.-Registering-a-SuggestionModel) we need to register a model before we can upload suggestions to it. Models belong to a schema, like the one we just defined. In this example, we'll iterate over the models we calculated and register them, upload suggestions and submit testaments in one go \n\n\n"},{"cell_type":"code","execution_count":77,"metadata":{},"outputs":[],"source":"registerd_models = {} # Capture the models we registered already \nfor model_name in model_outputs:\n model_def = { #definition of the model\n 'schema':schema_id,\n 'name':model_name,\n 'metadata':{\n 'anything':['you','want']\n }\n }\n response = session.post('v2/models/',json=model_def) # Send it to LightTag\n model = response.json() # Get back the model we just regitered\n registerd_models[model_name] = model #Store it for later\n\n session.post(model['url']+'suggestions/',json=model_outputs[model_name]) #Send the suggestions\n session.post(model['url']+'testaments/',json=result_dict['example_ids']) # Testaments, tells LightTag all of the examples this model has seen\n \n\n \n\n\n \n"},{"cell_type":"markdown","execution_count":null,"metadata":{},"outputs":[],"source":"## Part 6 - Reviewing The Results In LightTag \nNow that our suggestions have been uploaded, we want to know how our models compare to each other. 
\nWe can get a rough sense by looking at the *Inter Model Agreement* in LightTag's analytics dashboard. But agreement isn't enough; we want to know who was right and who was wrong, and we'll use LightTag's review feature for that.\n"},{"cell_type":"markdown","execution_count":null,"metadata":{},"outputs":[],"source":"#### Inter Model Agreement\nThe agreement matrix is a quick way to see whether our models tend to agree or conflict. In our case, it looks like there is a lot of disagreement. \n![agreement](./img/agreement.png \"Inter Model Agreement for what we just uploaded\")\n\n#### Inter Model Agreement API \nThe same agreement numbers are available programmatically; the last cell in this notebook pulls them from the metrics endpoint into a dataframe.\n\n#### Review\nStill, the question remains: are any of these models better than the others? Do they perform differently on the two datasets? \nThere's only one way to find out: by reviewing the data. Luckily, LightTag makes this easy with Review Mode.\n"},{"cell_type":"code","execution_count":87,"metadata":{},"outputs":[{"data":{"text/html":"
","text/plain":" dataset \\\n0 42d4181d-934f-4c58-850d-ecdf6fdeb830 \n1 8f5bd425-ae8c-45f2-9cdd-f297ae6a5806 \n2 42d4181d-934f-4c58-850d-ecdf6fdeb830 \n3 8f5bd425-ae8c-45f2-9cdd-f297ae6a5806 \n4 42d4181d-934f-4c58-850d-ecdf6fdeb830 \n\n id \\\n0 62f92ec5-8ae5-4843-aa16-99a638360dc5/62f92ec5-... \n1 62f92ec5-8ae5-4843-aa16-99a638360dc5/62f92ec5-... \n2 62f92ec5-8ae5-4843-aa16-99a638360dc5/96f8d8a1-... \n3 62f92ec5-8ae5-4843-aa16-99a638360dc5/96f8d8a1-... \n4 62f92ec5-8ae5-4843-aa16-99a638360dc5/b24cf488-... \n\n model_x model_y \\\n0 62f92ec5-8ae5-4843-aa16-99a638360dc5 62f92ec5-8ae5-4843-aa16-99a638360dc5 \n1 62f92ec5-8ae5-4843-aa16-99a638360dc5 62f92ec5-8ae5-4843-aa16-99a638360dc5 \n2 62f92ec5-8ae5-4843-aa16-99a638360dc5 96f8d8a1-7e00-4ba8-b806-76ae745bf13e \n3 62f92ec5-8ae5-4843-aa16-99a638360dc5 96f8d8a1-7e00-4ba8-b806-76ae745bf13e \n4 62f92ec5-8ae5-4843-aa16-99a638360dc5 b24cf488-8b83-4766-b6cd-cee6afac2fe7 \n\n num_agree schema size \n0 2691 f46c639f-7359-4978-9104-62d23b20656d 2691 \n1 455 f46c639f-7359-4978-9104-62d23b20656d 455 \n2 1881 f46c639f-7359-4978-9104-62d23b20656d 2691 \n3 346 f46c639f-7359-4978-9104-62d23b20656d 455 \n4 353 f46c639f-7359-4978-9104-62d23b20656d 2691 "},"execution_count":87,"metadata":{},"output_type":"execute_result"}],"source":"IMA = session.get(\"/v1/metrics/model/iaa/\",params={\"schema_id\":schema_id}).json()\nIMA = pd.DataFrame(IMA)\nIMA.head()"}],"nbformat":4,"nbformat_minor":2,"metadata":{"language_info":{"name":"python","codemirror_mode":{"name":"ipython","version":3}},"orig_nbformat":2,"file_extension":".py","mimetype":"text/x-python","name":"python","npconvert_exporter":"python","pygments_lexer":"ipython3","version":3}}