{ "cells": [ { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Working With Task Results\n", "\n", "When you run a task on LightTag you'll want to get back the annotations, classifications and relationships annotated in that task. You can retrieve that data either through the UI or through the [API](/api#Downloading-Results). The output structure is the same, and in this document we'll use a pre-downloaded file. \n", "\n", "The downloaded object contains some metadata about the task and a list of examples, each with the annotations you applied to it. This document will show you how to extract relevant information, manipulate it to glean insights and join the various pieces of data together. " ] }, { "cell_type": "code", "execution_count": 136, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dict_keys(['id', 'examples', 'schema', 'dataset', 'relations', 'name', 'annotators_per_example'])" ] }, "execution_count": 136, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd # We use pandas to manipulate the data\n", "import json # Load the data from JSON\n", "from pprint import pprint # Pretty-print nested objects\n", "with open('/home/tal/Downloads/classes2_annotations (1).json') as f: # Our downloaded results\n", "    results = json.load(f)\n", "results.keys()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The results object has the following keys: \n", "* **id** The id of the Task \n", "* **name** The name of the Task\n", "* **schema** The Schema the task used, including its tags and classes. \n", "* **relations** Any relationships that may have been annotated\n", "* **examples** A list of the examples in the dataset. 
Each example has a list of **annotations** and a list of **classifications** " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## The Example Object\n", "The example object contains:\n", "\n", "* **content** The content that was annotated\n", "* **example_id** The id of the example\n", "* **metadata** Any metadata you included with the example\n", "* **seen_by** A list of the annotators that saw this example during the task\n", "* **annotations** A list of annotations made to this example\n", "* **classifications** A list of classifications applied to this example" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " annotations classifications \\\n", "0 [] [] \n", "1 [] [] \n", "\n", " content \\\n", "0 This proposed rule is issued under Marketing O... \n", "1 1. Medical Device User Fee Agreement IV Commit... \n", "\n", " example_id \\\n", "0 11f00552-c0d3-46f3-8cc9-8643a7e74694 \n", "1 13c5b003-9741-4342-90c7-dfc7fc86650e \n", "\n", " metadata seen_by \n", "0 {'end': 263, 'key': 'agreement', 'type': 'pror... [] \n", "1 {'end': 36, 'key': 'agreement', 'type': 'rule'... [] " ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "examples = results['examples']\n", "pd.DataFrame(examples[:2])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Annotations\n", "Each annotation object in an example's `annotations` list corresponds to a tagged span in that example. We call that a Tagged Token. Each annotation object carries its tag, span and example, a list of the annotators that made the annotation (annotated_by), and its validation status if it has been reviewed. The fields on the annotation object are: \n", "\n", "* **example_id** The id of the example that was annotated\n", "* **start** The start offset of the span that was annotated \n", "* **end** The end offset of the span that was annotated \n", "* **tag** The name of the tag that was applied\n", "* **tag_id** The id of the tag that was applied\n", "* **tagged_token_id** A unique id for the (example_id, start, end, tag) combination. Useful for aggregating and joining. \n", "* **value** The text that was annotated\n", "* **reviewed** Has this annotation been seen in review mode (true/false)\n", "* **correct** Has this been reviewed and set as correct (true/false - null if not reviewed) \n", "* **annotated_by** A list of the annotators that made this annotation " ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "Empty DataFrame\n", "Columns: []\n", "Index: []" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "example = examples[0]\n", "pd.DataFrame(example['annotations'][:3])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Annotation Analytics" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### How to Merge All Annotations from All Examples \n", "Here's a one-liner that extracts all of the annotations from all of the examples. \n", "This is useful for analytics, as we'll see below." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "all_annotations = [a for ex in examples for a in ex['annotations']] # Flatten the per-example lists" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### How to Count the Tag/Word Pairs (i.e. a weighted dictionary)" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "tag DATE DURATION GPE HANDLE MONEY NUMBER ORG PERSON TIME TITLE\n", "value \n", " 0.025 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0\n", " 02-55 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0\n", " 02424 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0\n", " 06 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0\n", " 1 2.0 0.0 0.0 0.0 1.0 2.0 1.0 1.0 1.0 0.0" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "Annotations = pd.DataFrame(all_annotations)\n", "Annotations.pivot_table(index='value',columns='tag',values='example_id',aggfunc=len).fillna(0).head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Calculating Inter-Annotator Agreement\n", "To calculate inter-annotator agreement, we'll compare which annotators produced each tagged_token_id. \n", "To get the data in the format we need, we'll use pandas' json_normalize utility." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " annotator annotator_id timestamp \\\n", "0 2019-10-16 08:48:05.621963 6 2019-10-24T13:36:21.917742+00:00 \n", "1 2019-10-16 08:48:05.592811 4 2019-10-24T13:36:22.043508+00:00 \n", "2 2019-10-16 08:48:05.621963 6 2019-10-24T13:36:21.914699+00:00 \n", "3 2019-10-16 08:48:05.652277 8 2019-10-24T13:36:21.782039+00:00 \n", "4 2019-10-16 08:48:05.621963 6 2019-10-24T13:36:21.908996+00:00 \n", "\n", " tagged_token_id example_id \\\n", "0 8568a9cd-ae42-4c06-942a-c3d1d158c541 a954d097-c61c-4192-bcea-09121c7f2d07 \n", "1 80313a6b-a227-4005-a3b9-e400715c9fc6 a954d097-c61c-4192-bcea-09121c7f2d07 \n", "2 24f8561a-d5c9-4728-a9d2-99eecf578fe3 a954d097-c61c-4192-bcea-09121c7f2d07 \n", "3 5fb9aed3-0a22-4076-b04c-e22efb10d72b a954d097-c61c-4192-bcea-09121c7f2d07 \n", "4 56840eeb-342d-4e6c-b970-1a9a022d622e a954d097-c61c-4192-bcea-09121c7f2d07 \n", "\n", " tag value \n", "0 PERSON or \n", "1 HANDLE the \n", "2 TIME is \n", "3 NUMBER the \n", "4 DATE agreement " ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# One row per (annotation, annotator) pair. pd.json_normalize needs pandas >= 1.0;\n", "# older releases expose it as pd.io.json.json_normalize\n", "IAAData = pd.json_normalize(all_annotations, record_path='annotated_by',\n", "                            meta=['tagged_token_id', 'example_id', 'tag', 'value'])\n", "IAAData.head()" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "annotator 2019-10-16 08:48:05.539866 \\\n", "tagged_token_id \n", "000117f3-f28b-4e90-b87a-5cca1e913749 0.0 \n", "0001915f-3893-4866-a218-ebe4419d8661 1.0 \n", "0001ea06-01f8-4996-afd1-d5eb878909b0 0.0 \n", "0005bd41-a479-496c-a4e6-2527af145ef4 0.0 \n", "0006265e-9eee-499b-955f-fc2cf2bae8fe 0.0 \n", "\n", "annotator 2019-10-16 08:48:05.577156 \\\n", "tagged_token_id \n", "000117f3-f28b-4e90-b87a-5cca1e913749 0.0 \n", "0001915f-3893-4866-a218-ebe4419d8661 0.0 \n", "0001ea06-01f8-4996-afd1-d5eb878909b0 0.0 \n", "0005bd41-a479-496c-a4e6-2527af145ef4 0.0 \n", "0006265e-9eee-499b-955f-fc2cf2bae8fe 1.0 \n", "\n", "annotator 2019-10-16 08:48:05.592811 \\\n", "tagged_token_id \n", "000117f3-f28b-4e90-b87a-5cca1e913749 0.0 \n", "0001915f-3893-4866-a218-ebe4419d8661 0.0 \n", "0001ea06-01f8-4996-afd1-d5eb878909b0 0.0 \n", "0005bd41-a479-496c-a4e6-2527af145ef4 0.0 \n", "0006265e-9eee-499b-955f-fc2cf2bae8fe 0.0 \n", "\n", "annotator 2019-10-16 08:48:05.607580 \\\n", "tagged_token_id \n", "000117f3-f28b-4e90-b87a-5cca1e913749 1.0 \n", "0001915f-3893-4866-a218-ebe4419d8661 0.0 \n", "0001ea06-01f8-4996-afd1-d5eb878909b0 0.0 \n", "0005bd41-a479-496c-a4e6-2527af145ef4 0.0 \n", "0006265e-9eee-499b-955f-fc2cf2bae8fe 0.0 \n", "\n", "annotator 2019-10-16 08:48:05.621963 \\\n", "tagged_token_id \n", "000117f3-f28b-4e90-b87a-5cca1e913749 0.0 \n", "0001915f-3893-4866-a218-ebe4419d8661 0.0 \n", "0001ea06-01f8-4996-afd1-d5eb878909b0 0.0 \n", "0005bd41-a479-496c-a4e6-2527af145ef4 0.0 \n", "0006265e-9eee-499b-955f-fc2cf2bae8fe 0.0 \n", "\n", "annotator 2019-10-16 08:48:05.637460 \\\n", "tagged_token_id \n", "000117f3-f28b-4e90-b87a-5cca1e913749 0.0 \n", "0001915f-3893-4866-a218-ebe4419d8661 0.0 \n", "0001ea06-01f8-4996-afd1-d5eb878909b0 0.0 \n", "0005bd41-a479-496c-a4e6-2527af145ef4 0.0 \n", "0006265e-9eee-499b-955f-fc2cf2bae8fe 0.0 \n", "\n", "annotator 2019-10-16 08:48:05.652277 \\\n", "tagged_token_id \n", "000117f3-f28b-4e90-b87a-5cca1e913749 0.0 \n", 
"0001915f-3893-4866-a218-ebe4419d8661 0.0 \n", "0001ea06-01f8-4996-afd1-d5eb878909b0 0.0 \n", "0005bd41-a479-496c-a4e6-2527af145ef4 0.0 \n", "0006265e-9eee-499b-955f-fc2cf2bae8fe 0.0 \n", "\n", "annotator 2019-10-16 08:48:05.667258 \\\n", "tagged_token_id \n", "000117f3-f28b-4e90-b87a-5cca1e913749 0.0 \n", "0001915f-3893-4866-a218-ebe4419d8661 0.0 \n", "0001ea06-01f8-4996-afd1-d5eb878909b0 0.0 \n", "0005bd41-a479-496c-a4e6-2527af145ef4 0.0 \n", "0006265e-9eee-499b-955f-fc2cf2bae8fe 0.0 \n", "\n", "annotator 2019-10-16 08:48:05.681504 \\\n", "tagged_token_id \n", "000117f3-f28b-4e90-b87a-5cca1e913749 0.0 \n", "0001915f-3893-4866-a218-ebe4419d8661 0.0 \n", "0001ea06-01f8-4996-afd1-d5eb878909b0 0.0 \n", "0005bd41-a479-496c-a4e6-2527af145ef4 1.0 \n", "0006265e-9eee-499b-955f-fc2cf2bae8fe 0.0 \n", "\n", "annotator 2019-10-16 08:48:05.696058 tal \n", "tagged_token_id \n", "000117f3-f28b-4e90-b87a-5cca1e913749 0.0 0.0 \n", "0001915f-3893-4866-a218-ebe4419d8661 0.0 0.0 \n", "0001ea06-01f8-4996-afd1-d5eb878909b0 0.0 1.0 \n", "0005bd41-a479-496c-a4e6-2527af145ef4 0.0 0.0 \n", "0006265e-9eee-499b-955f-fc2cf2bae8fe 0.0 0.0 " ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "IAAPivot = IAAData.pivot_table(index='tagged_token_id',columns='annotator',values='timestamp',aggfunc=len).fillna(0)\n", "IAAPivot.head()" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "annotator 2019-10-16 08:48:05.539866 \\\n", "annotator \n", "2019-10-16 08:48:05.539866 3529.0 \n", "2019-10-16 08:48:05.577156 16.0 \n", "2019-10-16 08:48:05.592811 7.0 \n", "2019-10-16 08:48:05.607580 15.0 \n", "2019-10-16 08:48:05.621963 15.0 \n", "\n", "annotator 2019-10-16 08:48:05.577156 \\\n", "annotator \n", "2019-10-16 08:48:05.539866 16.0 \n", "2019-10-16 08:48:05.577156 3941.0 \n", "2019-10-16 08:48:05.592811 22.0 \n", "2019-10-16 08:48:05.607580 15.0 \n", "2019-10-16 08:48:05.621963 13.0 \n", "\n", "annotator 2019-10-16 08:48:05.592811 \\\n", "annotator \n", "2019-10-16 08:48:05.539866 7.0 \n", "2019-10-16 08:48:05.577156 22.0 \n", "2019-10-16 08:48:05.592811 3816.0 \n", "2019-10-16 08:48:05.607580 7.0 \n", "2019-10-16 08:48:05.621963 11.0 \n", "\n", "annotator 2019-10-16 08:48:05.607580 \\\n", "annotator \n", "2019-10-16 08:48:05.539866 15.0 \n", "2019-10-16 08:48:05.577156 15.0 \n", "2019-10-16 08:48:05.592811 7.0 \n", "2019-10-16 08:48:05.607580 3310.0 \n", "2019-10-16 08:48:05.621963 14.0 \n", "\n", "annotator 2019-10-16 08:48:05.621963 \\\n", "annotator \n", "2019-10-16 08:48:05.539866 15.0 \n", "2019-10-16 08:48:05.577156 13.0 \n", "2019-10-16 08:48:05.592811 11.0 \n", "2019-10-16 08:48:05.607580 14.0 \n", "2019-10-16 08:48:05.621963 3584.0 \n", "\n", "annotator 2019-10-16 08:48:05.637460 \\\n", "annotator \n", "2019-10-16 08:48:05.539866 14.0 \n", "2019-10-16 08:48:05.577156 13.0 \n", "2019-10-16 08:48:05.592811 16.0 \n", "2019-10-16 08:48:05.607580 11.0 \n", "2019-10-16 08:48:05.621963 11.0 \n", "\n", "annotator 2019-10-16 08:48:05.652277 \\\n", "annotator \n", "2019-10-16 08:48:05.539866 3.0 \n", "2019-10-16 08:48:05.577156 8.0 \n", "2019-10-16 08:48:05.592811 14.0 \n", "2019-10-16 08:48:05.607580 14.0 \n", "2019-10-16 08:48:05.621963 9.0 \n", "\n", "annotator 2019-10-16 08:48:05.667258 \\\n", "annotator \n", "2019-10-16 08:48:05.539866 19.0 \n", "2019-10-16 08:48:05.577156 21.0 \n", "2019-10-16 08:48:05.592811 18.0 \n", 
"2019-10-16 08:48:05.607580 14.0 \n", "2019-10-16 08:48:05.621963 15.0 \n", "\n", "annotator 2019-10-16 08:48:05.681504 \\\n", "annotator \n", "2019-10-16 08:48:05.539866 19.0 \n", "2019-10-16 08:48:05.577156 21.0 \n", "2019-10-16 08:48:05.592811 13.0 \n", "2019-10-16 08:48:05.607580 10.0 \n", "2019-10-16 08:48:05.621963 29.0 \n", "\n", "annotator 2019-10-16 08:48:05.696058 tal \n", "annotator \n", "2019-10-16 08:48:05.539866 16.0 17.0 \n", "2019-10-16 08:48:05.577156 10.0 21.0 \n", "2019-10-16 08:48:05.592811 20.0 15.0 \n", "2019-10-16 08:48:05.607580 8.0 12.0 \n", "2019-10-16 08:48:05.621963 11.0 20.0 " ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# This dot product counts how often each pair of annotators made the same annotation;\n", "# the diagonal holds each annotator's own total\n", "AgreementCount = IAAPivot.T.dot(IAAPivot)\n", "AgreementCount.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Getting the Agreement Count for Each Annotation" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "tagged_token_id\n", "000117f3-f28b-4e90-b87a-5cca1e913749 1\n", "0001915f-3893-4866-a218-ebe4419d8661 1\n", "0001ea06-01f8-4996-afd1-d5eb878909b0 1\n", "0005bd41-a479-496c-a4e6-2527af145ef4 1\n", "0006265e-9eee-499b-955f-fc2cf2bae8fe 1\n", "Name: annotator, dtype: int64" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "TokenAgreement = IAAData.groupby('tagged_token_id').annotator.count()\n", "TokenAgreement.head()" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "1 39664\n", "2 785\n", "3 10\n", "Name: annotator, dtype: int64" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# How many annotations were made by exactly 1, 2, 3... annotators\n", "TokenAgreement.value_counts()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Get the Annotations That Had Exactly 3-Person Agreement\n", "\n" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " annotator annotator_id \\\n", "2252 2019-10-16 08:48:05.681504 10 \n", "2253 2019-10-16 08:48:05.667258 9 \n", "2254 tal 1 \n", "2443 2019-10-16 08:48:05.652277 8 \n", "2444 2019-10-16 08:48:05.637460 7 \n", "\n", " timestamp tagged_token_id \\\n", "2252 2019-10-24T13:35:16.789966+00:00 fe6547d0-e207-4aee-9ea3-64d305ef2dea \n", "2253 2019-10-24T13:35:15.935764+00:00 fe6547d0-e207-4aee-9ea3-64d305ef2dea \n", "2254 2019-10-24T13:35:15.254039+00:00 fe6547d0-e207-4aee-9ea3-64d305ef2dea \n", "2443 2019-10-24T13:35:01.552501+00:00 c0b30335-604f-4dfb-b671-fdc999b9e67f \n", "2444 2019-10-24T13:35:01.060189+00:00 c0b30335-604f-4dfb-b671-fdc999b9e67f \n", "\n", " example_id tag value \n", "2252 b863213d-98d5-4d20-9525-4fc7c9448e2d DATE in \n", "2253 b863213d-98d5-4d20-9525-4fc7c9448e2d DATE in \n", "2254 b863213d-98d5-4d20-9525-4fc7c9448e2d DATE in \n", "2443 bbd0eefb-feee-40cd-8e34-54519378f146 DATE revised \n", "2444 bbd0eefb-feee-40cd-8e34-54519378f146 DATE revised " ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "Agreed = IAAData.groupby('tagged_token_id').filter(lambda x: len(x)==3)\n", "Agreed.head()" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " timestamp \\\n", "tagged_token_id annotator_id annotator \n", "1d5053e0-da64-4ba0-aac9-d125fe825983 1 tal 2019-10-24T13:30:45.360454+00:00 \n", " 8 2019-10-16 08:48:05.652277 2019-10-24T13:30:43.706208+00:00 \n", " 10 2019-10-16 08:48:05.681504 2019-10-24T13:30:44.883886+00:00 \n", "629c905c-1126-491f-a030-f1b25bfad6bf 1 tal 2019-10-24T13:34:32.025165+00:00 \n", " 8 2019-10-16 08:48:05.652277 2019-10-24T13:34:31.722251+00:00 \n", "\n", " example_id \\\n", "tagged_token_id annotator_id annotator \n", "1d5053e0-da64-4ba0-aac9-d125fe825983 1 tal edd6e7a7-58cb-4c38-895a-1bf5eba7dac9 \n", " 8 2019-10-16 08:48:05.652277 edd6e7a7-58cb-4c38-895a-1bf5eba7dac9 \n", " 10 2019-10-16 08:48:05.681504 edd6e7a7-58cb-4c38-895a-1bf5eba7dac9 \n", "629c905c-1126-491f-a030-f1b25bfad6bf 1 tal f0cc12da-9736-4d81-a1b7-7359e5988b9a \n", " 8 2019-10-16 08:48:05.652277 f0cc12da-9736-4d81-a1b7-7359e5988b9a \n", "\n", " tag \\\n", "tagged_token_id annotator_id annotator \n", "1d5053e0-da64-4ba0-aac9-d125fe825983 1 tal DATE \n", " 8 2019-10-16 08:48:05.652277 DATE \n", " 10 2019-10-16 08:48:05.681504 DATE \n", "629c905c-1126-491f-a030-f1b25bfad6bf 1 tal TIME \n", " 8 2019-10-16 08:48:05.652277 TIME \n", "\n", " value \n", "tagged_token_id annotator_id annotator \n", "1d5053e0-da64-4ba0-aac9-d125fe825983 1 tal types \n", " 8 2019-10-16 08:48:05.652277 types \n", " 10 2019-10-16 08:48:05.681504 types \n", "629c905c-1126-491f-a030-f1b25bfad6bf 1 tal safeguard \n", " 8 2019-10-16 08:48:05.652277 safeguard " ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Nicer display for excel / reporting\n", "Agreed.set_index(['tagged_token_id','annotator_id','annotator']).sort_index().head()\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Classifications\n", "Classifications are statements about the entire document. They live in the **classifications** object inside of an example. 
Each classification has the following properties: \n", "* **definition_id** The id of the task definition that the classification was made in\n", "* **example_id** The id of the example that was classified\n", "* **classname** The name of the class that was applied\n", "* **class_id** The id of the class that was applied\n", "* **classified_by** A list of the annotators that made this classification" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'definition_id': '2f660f4d-aa9f-4d92-8d4f-b2163c847016',\n", " 'example_id': 'a954d097-c61c-4192-bcea-09121c7f2d07',\n", " 'classname': 'Kendra Ritter',\n", " 'class_id': '6b193f1c-1267-4f4f-b576-c0f186fbf779',\n", " 'classified_by': [{'annotator_id': 4,\n", " 'timestamp': None,\n", " 'annotator': '2019-10-16 08:48:05.592811'},\n", " {'annotator_id': 6,\n", " 'timestamp': None,\n", " 'annotator': '2019-10-16 08:48:05.621963'}]}" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Take all the classifications from all of the examples\n", "all_classifications = [c for ex in examples for c in ex['classifications']]\n", "all_classifications[0]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Classification Analytics\n", "The following cells show common queries you might want to run on your classifications." ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " annotator annotator_id timestamp \\\n", "0 2019-10-16 08:48:05.592811 4 None \n", "1 2019-10-16 08:48:05.621963 6 None \n", "2 2019-10-16 08:48:05.652277 8 None \n", "3 2019-10-16 08:48:05.696058 11 None \n", "4 2019-10-16 08:48:05.592811 4 None \n", "\n", " example_id classname \n", "0 a954d097-c61c-4192-bcea-09121c7f2d07 Kendra Ritter \n", "1 a954d097-c61c-4192-bcea-09121c7f2d07 Kendra Ritter \n", "2 a954d097-c61c-4192-bcea-09121c7f2d07 Kelly Patterson \n", "3 ab911160-25ea-421b-bfef-4a8802efad05 Kendra Ritter \n", "4 ab911160-25ea-421b-bfef-4a8802efad05 Kelly Patterson " ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# One row per (classification, annotator) pair. pd.json_normalize needs pandas >= 1.0;\n", "# older releases expose it as pd.io.json.json_normalize\n", "Classifications = pd.json_normalize(all_classifications, record_path='classified_by', meta=['example_id', 'classname'])\n", "Classifications.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Show the Classifications for Each Example\n", "Build a table with one row per example_id that counts how many times the example was assigned each class.\n" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "classname Brittany Fisher Kelly Patterson \\\n", "example_id \n", "a8b3e83b-226e-4e5e-ab37-977291abbfd9 0.0 0.0 \n", "a8e70531-f6ee-4b43-a2d8-9cf3c7e78905 0.0 0.0 \n", "a91ea8ca-6a16-468d-8ceb-3b74d8c8961a 0.0 0.0 \n", "a938eb3a-9b09-47d3-8d5b-f91314f56487 0.0 1.0 \n", "a940e3e9-e919-45f7-9d5e-fd3c0300a233 0.0 2.0 \n", "\n", "classname Kendra Ritter Thomas Woodward \\\n", "example_id \n", "a8b3e83b-226e-4e5e-ab37-977291abbfd9 1.0 0.0 \n", "a8e70531-f6ee-4b43-a2d8-9cf3c7e78905 2.0 1.0 \n", "a91ea8ca-6a16-468d-8ceb-3b74d8c8961a 1.0 2.0 \n", "a938eb3a-9b09-47d3-8d5b-f91314f56487 2.0 0.0 \n", "a940e3e9-e919-45f7-9d5e-fd3c0300a233 0.0 1.0 \n", "\n", "classname Valerie Delgado \n", "example_id \n", "a8b3e83b-226e-4e5e-ab37-977291abbfd9 0.0 \n", "a8e70531-f6ee-4b43-a2d8-9cf3c7e78905 0.0 \n", "a91ea8ca-6a16-468d-8ceb-3b74d8c8961a 0.0 \n", "a938eb3a-9b09-47d3-8d5b-f91314f56487 0.0 \n", "a940e3e9-e919-45f7-9d5e-fd3c0300a233 0.0 " ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ClassPivot = Classifications.pivot_table(index='example_id',columns='classname',values='annotator_id',aggfunc=len).fillna(0)\n", "ClassPivot.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Calculate the Co-occurrence of Classes\n", "It's often useful to see whether some pair of classes is frequently confused, that is, applied to the same example by different annotators. This is a quick way to do so: " ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "<table>\n", "  <thead>\n",
"    <tr><th>classname</th><th>Brittany Fisher</th><th>Kelly Patterson</th><th>Kendra Ritter</th><th>Thomas Woodward</th><th>Valerie Delgado</th></tr>\n",
"    <tr><th>classname</th><th></th><th></th><th></th><th></th><th></th></tr>\n",
"  </thead>\n", "  <tbody>\n",
"    <tr><th>Brittany Fisher</th><td>269.0</td><td>110.0</td><td>111.0</td><td>81.0</td><td>44.0</td></tr>\n",
"    <tr><th>Kelly Patterson</th><td>110.0</td><td>620.0</td><td>301.0</td><td>183.0</td><td>52.0</td></tr>\n",
"    <tr><th>Kendra Ritter</th><td>111.0</td><td>301.0</td><td>920.0</td><td>273.0</td><td>91.0</td></tr>\n",
"    <tr><th>Thomas Woodward</th><td>81.0</td><td>183.0</td><td>273.0</td><td>541.0</td><td>58.0</td></tr>\n",
"    <tr><th>Valerie Delgado</th><td>44.0</td><td>52.0</td><td>91.0</td><td>58.0</td><td>150.0</td></tr>\n",
"  </tbody>\n", "</table>\n", "
" ], "text/plain": [ "classname Brittany Fisher Kelly Patterson Kendra Ritter \\\n", "classname \n", "Brittany Fisher 269.0 110.0 111.0 \n", "Kelly Patterson 110.0 620.0 301.0 \n", "Kendra Ritter 111.0 301.0 920.0 \n", "Thomas Woodward 81.0 183.0 273.0 \n", "Valerie Delgado 44.0 52.0 91.0 \n", "\n", "classname Thomas Woodward Valerie Delgado \n", "classname \n", "Brittany Fisher 81.0 44.0 \n", "Kelly Patterson 183.0 52.0 \n", "Kendra Ritter 273.0 91.0 \n", "Thomas Woodward 541.0 58.0 \n", "Valerie Delgado 58.0 150.0 " ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "Confusion = ClassPivot.T.dot(ClassPivot)\n", "Confusion" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Joining Classifications to the original document" ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "<table>\n", "  <thead>\n",
"    <tr><th></th><th></th><th>metadata</th><th>Brittany Fisher</th><th>Kelly Patterson</th><th>Kendra Ritter</th><th>Thomas Woodward</th><th>Valerie Delgado</th></tr>\n",
"    <tr><th>example_id</th><th>content</th><th></th><th></th><th></th><th></th><th></th><th></th></tr>\n",
"  </thead>\n", "  <tbody>\n",
"    <tr><th>ffa438d9-a4f9-4548-998b-11c0bf1add66</th><th>This action, pursuant to 5 U.S.C. 553, amends regulations issued to carry out a marketing order as defined in 7 CFR 900.2(j). This final rule is issued under Marketing Order No. 985 (7 CFR part 985), as amended, regulating the handling of spearmint oil produced in the Far West (Washington, Idaho, Oregon, and designated parts of Nevada and Utah). Part 985 (referred to as “the Order”) is effective under the Agricultural Marketing Agreement Act of 1937, as amended (7 U.S.C. 601-674), hereinafter referred to as the “Act.” The Committee locally administers the Marketing Order and is comprised of spearmint oil producers operating within the area of production, and a public member.</th><td>{'end': 445, 'key': 'agreement', 'type': 'rule...</td><td>0.0</td><td>1.0</td><td>1.0</td><td>1.0</td><td>0.0</td></tr>\n",
"    <tr><th>ffac4212-8328-4dfb-8434-952fb4643dd0</th><th>A recovery plan for Astragalus desereticus was not prepared; therefore, specific delisting criteria were not developed for the species. However, in 2005, we invited agencies with management or ownership authorities within the species' habitat to serve on a team to develop an interagency conservation agreement for Astragalus desereticus intended to facilitate a coordinated conservation effort between the agencies (UDWR et al. 2006, entire). The Conservation Agreement for Deseret milkvetch (Astragalus desereticus) (Conservation Agreement) was signed and approved by UDWR, UDOT, SITLA, and the Service in 2006 and will remain in effect for 30 years. The Conservation Agreement provides guidance to stakeholders to address threats and establish goals to ensure long-term survival of the species (UDWR et al. 2006, p. 7). Conservation actions contained in the Conservation Agreement (in italics), efforts to accomplish these actions, and their current status are described below.</th><td>{'end': 470, 'key': 'agreement', 'type': 'pror...</td><td>1.0</td><td>1.0</td><td>1.0</td><td>0.0</td><td>0.0</td></tr>\n",
"  </tbody>\n", "</table>\n", "
" ], "text/plain": [ " metadata \\\n", "example_id content \n", "ffa438d9-a4f9-4548-998b-11c0bf1add66 This action, pursuant to 5 U.S.C. 553, amends r... {'end': 445, 'key': 'agreement', 'type': 'rule... \n", "ffac4212-8328-4dfb-8434-952fb4643dd0 A recovery plan for Astragalus desereticus was ... {'end': 470, 'key': 'agreement', 'type': 'pror... \n", "\n", " Brittany Fisher \\\n", "example_id content \n", "ffa438d9-a4f9-4548-998b-11c0bf1add66 This action, pursuant to 5 U.S.C. 553, amends r... 0.0 \n", "ffac4212-8328-4dfb-8434-952fb4643dd0 A recovery plan for Astragalus desereticus was ... 1.0 \n", "\n", " Kelly Patterson \\\n", "example_id content \n", "ffa438d9-a4f9-4548-998b-11c0bf1add66 This action, pursuant to 5 U.S.C. 553, amends r... 1.0 \n", "ffac4212-8328-4dfb-8434-952fb4643dd0 A recovery plan for Astragalus desereticus was ... 1.0 \n", "\n", " Kendra Ritter \\\n", "example_id content \n", "ffa438d9-a4f9-4548-998b-11c0bf1add66 This action, pursuant to 5 U.S.C. 553, amends r... 1.0 \n", "ffac4212-8328-4dfb-8434-952fb4643dd0 A recovery plan for Astragalus desereticus was ... 1.0 \n", "\n", " Thomas Woodward \\\n", "example_id content \n", "ffa438d9-a4f9-4548-998b-11c0bf1add66 This action, pursuant to 5 U.S.C. 553, amends r... 1.0 \n", "ffac4212-8328-4dfb-8434-952fb4643dd0 A recovery plan for Astragalus desereticus was ... 0.0 \n", "\n", " Valerie Delgado \n", "example_id content \n", "ffa438d9-a4f9-4548-998b-11c0bf1add66 This action, pursuant to 5 U.S.C. 553, amends r... 0.0 \n", "ffac4212-8328-4dfb-8434-952fb4643dd0 A recovery plan for Astragalus desereticus was ... 
0.0 " ] }, "execution_count": 35, "metadata": {}, "output_type": "execute_result" } ], "source": [ "Examples = pd.DataFrame(examples,columns=['content','metadata','example_id'])\n", "ExamplesWithClasses = pd.merge(Examples,ClassPivot,on='example_id')\n", "ExamplesWithClasses = ExamplesWithClasses.set_index(['example_id','content']).sort_index()\n", "ExamplesWithClasses.tail(2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Check how frequently each annotator uses each class\n", "It's often useful to see whether a particular annotator is biased toward a particular class. Normalized value counts per annotator give the share of that annotator's classifications that went to each class: " ] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "1704" ] }, "execution_count": 37, "metadata": {}, "output_type": "execute_result" } ], "source": [] }, { "cell_type": "code", "execution_count": 46, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "<table>\n", "  <thead>\n",
"    <tr><th></th><th></th><th>classname</th></tr>\n",
"    <tr><th>annotator</th><th>classname</th><th></th></tr>\n",
"  </thead>\n", "  <tbody>\n",
"    <tr><th rowspan=5>2019-10-16 08:48:05.539866</th><th>Kelly Patterson</th><td>0.335484</td></tr>\n",
"    <tr><th>Kendra Ritter</th><td>0.277419</td></tr>\n",
"    <tr><th>Thomas Woodward</th><td>0.180645</td></tr>\n",
"    <tr><th>Brittany Fisher</th><td>0.129032</td></tr>\n",
"    <tr><th>Valerie Delgado</th><td>0.077419</td></tr>\n",
"  </tbody>\n", "</table>\n", "
" ], "text/plain": [ " classname\n", "annotator classname \n", "2019-10-16 08:48:05.539866 Kelly Patterson 0.335484\n", " Kendra Ritter 0.277419\n", " Thomas Woodward 0.180645\n", " Brittany Fisher 0.129032\n", " Valerie Delgado 0.077419" ] }, "execution_count": 46, "metadata": {}, "output_type": "execute_result" } ], "source": [ "Classifications.groupby('annotator').classname.value_counts(normalize=True).to_frame().head(5)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Relations\n", "If you've had your team create relationships, you can get those results from the relations object. It lives outside \n", "the Examples object because relations can span multiple examples (such as in co-reference resolution). The relations object lists the individual nodes in a relationship, each with a link to its parent, a list of its children and metadata: \n", "\n", "* **id** The unique identifier of this relation node\n", "* **materialized_path** The path of node ids from the root down to this node, typically used when you load this data into a [relational database](https://gabi.dev/tag/materialized-path/)\n", "* **parent_id** The id of this node's parent in the relationship\n", "* **children** A list of the ids of the children of this node\n", "* **type** The kind of node: *token* if it was made from an entity, *pseudo_node* if it was made from a Pseudo Node (such as a non-terminal like NP) \n", "* **relation_type** The type annotation on the edge of the relationship, i.e. how this node relates to its parent\n", "* **annotator_id** The id of the annotator that made this relation\n", "* **annotator** The name of the annotator\n", "* **tagged_token_id** If this node came from an annotated entity, this is the unique id of the Tagged Token (as described in the Annotation section) \n" ] }, { "cell_type": "code", "execution_count": 137, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "<table>\n", "  <thead>\n",
"    <tr><th></th><th>annotator</th><th>annotator_id</th><th>children</th><th>id</th><th>materialized_path</th><th>parent_id</th><th>pseudo_node_type</th><th>relation_type</th><th>tagged_token_id</th><th>type</th></tr>\n",
"  </thead>\n", "  <tbody>\n",
"    <tr><th>0</th><td>tal</td><td>1</td><td>[0c11cbf1-70d8-4472-9504-c186013cf6f2]</td><td>501e60b3-7572-48eb-8caf-0a73bbe52165</td><td>501e60b3-7572-48eb-8caf-0a73bbe52165</td><td>None</td><td>None</td><td>None</td><td>ae5de32e-11d0-4ba3-ad6d-24893095b186</td><td>token</td></tr>\n",
"    <tr><th>1</th><td>tal</td><td>1</td><td>[42c4cf4d-1306-4e13-a649-72758df50266, b031c51...</td><td>0c11cbf1-70d8-4472-9504-c186013cf6f2</td><td>501e60b3-7572-48eb-8caf-0a73bbe52165/0c11cbf1-...</td><td>501e60b3-7572-48eb-8caf-0a73bbe52165</td><td>None</td><td>None</td><td>97701568-648f-4bf5-b6e1-649a36b4da76</td><td>token</td></tr>\n",
"  </tbody>\n", "</table>\n", "
" ], "text/plain": [ " annotator annotator_id children \\\n", "0 tal 1 [0c11cbf1-70d8-4472-9504-c186013cf6f2] \n", "1 tal 1 [42c4cf4d-1306-4e13-a649-72758df50266, b031c51... \n", "\n", " id \\\n", "0 501e60b3-7572-48eb-8caf-0a73bbe52165 \n", "1 0c11cbf1-70d8-4472-9504-c186013cf6f2 \n", "\n", " materialized_path \\\n", "0 501e60b3-7572-48eb-8caf-0a73bbe52165 \n", "1 501e60b3-7572-48eb-8caf-0a73bbe52165/0c11cbf1-... \n", "\n", " parent_id pseudo_node_type relation_type \\\n", "0 None None None \n", "1 501e60b3-7572-48eb-8caf-0a73bbe52165 None None \n", "\n", " tagged_token_id type \n", "0 ae5de32e-11d0-4ba3-ad6d-24893095b186 token \n", "1 97701568-648f-4bf5-b6e1-649a36b4da76 token " ] }, "execution_count": 137, "metadata": {}, "output_type": "execute_result" } ], "source": [ "relations = results['relations']\n", "Relations = pd.DataFrame(relations)\n", "Relations[:2]\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Relation Analytics" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Adding data from Annotations\n", "You'll typically want the annotation data behind each relation node. Get it by joining Relations to the Annotations on `tagged_token_id`." ] }, { "cell_type": "code", "execution_count": 66, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "<table>\n", "  <thead>\n",
"    <tr><th></th><th>annotator</th><th>annotator_id</th><th>children</th><th>id</th><th>materialized_path</th><th>parent_id</th><th>pseudo_node_type</th><th>relation_type</th><th>tagged_token_id</th><th>type</th><th>example_id</th><th>start</th><th>end</th><th>value</th><th>tag</th></tr>\n",
"  </thead>\n", "  <tbody>\n",
"    <tr><th>0</th><td>tal</td><td>1</td><td>[0c11cbf1-70d8-4472-9504-c186013cf6f2]</td><td>501e60b3-7572-48eb-8caf-0a73bbe52165</td><td>501e60b3-7572-48eb-8caf-0a73bbe52165</td><td>None</td><td>None</td><td>None</td><td>ae5de32e-11d0-4ba3-ad6d-24893095b186</td><td>token</td><td>feb0438c-4a27-4e0f-877c-686e65728d88</td><td>126</td><td>135</td><td>appointed</td><td>TITLE</td></tr>\n",
"    <tr><th>1</th><td>tal</td><td>1</td><td>[42c4cf4d-1306-4e13-a649-72758df50266, b031c51...</td><td>0c11cbf1-70d8-4472-9504-c186013cf6f2</td><td>501e60b3-7572-48eb-8caf-0a73bbe52165/0c11cbf1-...</td><td>501e60b3-7572-48eb-8caf-0a73bbe52165</td><td>None</td><td>None</td><td>97701568-648f-4bf5-b6e1-649a36b4da76</td><td>token</td><td>feb0438c-4a27-4e0f-877c-686e65728d88</td><td>37</td><td>44</td><td>company</td><td>TITLE</td></tr>\n",
"    <tr><th>2</th><td>tal</td><td>1</td><td>[]</td><td>42c4cf4d-1306-4e13-a649-72758df50266</td><td>501e60b3-7572-48eb-8caf-0a73bbe52165/0c11cbf1-...</td><td>0c11cbf1-70d8-4472-9504-c186013cf6f2</td><td>None</td><td>None</td><td>0bff51a0-802d-4055-aa5f-e098a73854dc</td><td>token</td><td>feb0438c-4a27-4e0f-877c-686e65728d88</td><td>67</td><td>76</td><td>successor</td><td>MONEY</td></tr>\n",
"    <tr><th>3</th><td>tal</td><td>1</td><td>[]</td><td>b031c51e-e378-41c8-ba7c-274c52683ba6</td><td>501e60b3-7572-48eb-8caf-0a73bbe52165/0c11cbf1-...</td><td>0c11cbf1-70d8-4472-9504-c186013cf6f2</td><td>None</td><td>None</td><td>a7cd1d90-fe36-40c6-81c2-40b8e710b4d7</td><td>token</td><td>feb0438c-4a27-4e0f-877c-686e65728d88</td><td>13</td><td>18</td><td>means</td><td>TITLE</td></tr>\n",
"  </tbody>\n", "</table>\n", "
" ], "text/plain": [ " annotator annotator_id children \\\n", "0 tal 1 [0c11cbf1-70d8-4472-9504-c186013cf6f2] \n", "1 tal 1 [42c4cf4d-1306-4e13-a649-72758df50266, b031c51... \n", "2 tal 1 [] \n", "3 tal 1 [] \n", "\n", " id \\\n", "0 501e60b3-7572-48eb-8caf-0a73bbe52165 \n", "1 0c11cbf1-70d8-4472-9504-c186013cf6f2 \n", "2 42c4cf4d-1306-4e13-a649-72758df50266 \n", "3 b031c51e-e378-41c8-ba7c-274c52683ba6 \n", "\n", " materialized_path \\\n", "0 501e60b3-7572-48eb-8caf-0a73bbe52165 \n", "1 501e60b3-7572-48eb-8caf-0a73bbe52165/0c11cbf1-... \n", "2 501e60b3-7572-48eb-8caf-0a73bbe52165/0c11cbf1-... \n", "3 501e60b3-7572-48eb-8caf-0a73bbe52165/0c11cbf1-... \n", "\n", " parent_id pseudo_node_type relation_type \\\n", "0 None None None \n", "1 501e60b3-7572-48eb-8caf-0a73bbe52165 None None \n", "2 0c11cbf1-70d8-4472-9504-c186013cf6f2 None None \n", "3 0c11cbf1-70d8-4472-9504-c186013cf6f2 None None \n", "\n", " tagged_token_id type \\\n", "0 ae5de32e-11d0-4ba3-ad6d-24893095b186 token \n", "1 97701568-648f-4bf5-b6e1-649a36b4da76 token \n", "2 0bff51a0-802d-4055-aa5f-e098a73854dc token \n", "3 a7cd1d90-fe36-40c6-81c2-40b8e710b4d7 token \n", "\n", " example_id start end value tag \n", "0 feb0438c-4a27-4e0f-877c-686e65728d88 126 135 appointed TITLE \n", "1 feb0438c-4a27-4e0f-877c-686e65728d88 37 44 company TITLE \n", "2 feb0438c-4a27-4e0f-877c-686e65728d88 67 76 successor MONEY \n", "3 feb0438c-4a27-4e0f-877c-686e65728d88 13 18 means TITLE " ] }, "execution_count": 66, "metadata": {}, "output_type": "execute_result" } ], "source": [ "AnnotationsInfo = Annotations[['tagged_token_id','example_id','start','end','value','tag']]\n", "RelationNodesWithAnnotation = pd.merge(Relations,AnnotationsInfo,on='tagged_token_id')\n", "RelationNodesWithAnnotation.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Creating a Parent Child Table " ] }, { "cell_type": "code", "execution_count": 67, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "<table>\n", "  <thead>\n",
"    <tr><th></th><th></th><th></th><th></th><th></th><th></th><th>annotator_parent</th><th>annotator_id_parent</th><th>children_parent</th><th>materialized_path_parent</th><th>parent_id_parent</th><th>pseudo_node_type_parent</th><th>relation_type_parent</th><th>tagged_token_id_parent</th><th>type_parent</th><th>example_id_parent</th><th>...</th><th>children_child</th><th>materialized_path_child</th><th>parent_id_child</th><th>pseudo_node_type_child</th><th>relation_type_child</th><th>tagged_token_id_child</th><th>type_child</th><th>example_id_child</th><th>start_child</th><th>end_child</th></tr>\n",
"    <tr><th>id_parent</th><th>value_parent</th><th>tag_parent</th><th>id_child</th><th>value_child</th><th>tag_child</th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th></tr>\n",
"  </thead>\n", "  <tbody>\n",
"    <tr><th rowspan=2>0c11cbf1-70d8-4472-9504-c186013cf6f2</th><th rowspan=2>company</th><th rowspan=2>TITLE</th><th>42c4cf4d-1306-4e13-a649-72758df50266</th><th>successor</th><th>MONEY</th><td>tal</td><td>1</td><td>[42c4cf4d-1306-4e13-a649-72758df50266, b031c51...</td><td>501e60b3-7572-48eb-8caf-0a73bbe52165/0c11cbf1-...</td><td>501e60b3-7572-48eb-8caf-0a73bbe52165</td><td>None</td><td>None</td><td>97701568-648f-4bf5-b6e1-649a36b4da76</td><td>token</td><td>feb0438c-4a27-4e0f-877c-686e65728d88</td><td>...</td><td>[]</td><td>501e60b3-7572-48eb-8caf-0a73bbe52165/0c11cbf1-...</td><td>0c11cbf1-70d8-4472-9504-c186013cf6f2</td><td>None</td><td>None</td><td>0bff51a0-802d-4055-aa5f-e098a73854dc</td><td>token</td><td>feb0438c-4a27-4e0f-877c-686e65728d88</td><td>67</td><td>76</td></tr>\n",
"    <tr><th>b031c51e-e378-41c8-ba7c-274c52683ba6</th><th>means</th><th>TITLE</th><td>tal</td><td>1</td><td>[42c4cf4d-1306-4e13-a649-72758df50266, b031c51...</td><td>501e60b3-7572-48eb-8caf-0a73bbe52165/0c11cbf1-...</td><td>501e60b3-7572-48eb-8caf-0a73bbe52165</td><td>None</td><td>None</td><td>97701568-648f-4bf5-b6e1-649a36b4da76</td><td>token</td><td>feb0438c-4a27-4e0f-877c-686e65728d88</td><td>...</td><td>[]</td><td>501e60b3-7572-48eb-8caf-0a73bbe52165/0c11cbf1-...</td><td>0c11cbf1-70d8-4472-9504-c186013cf6f2</td><td>None</td><td>None</td><td>a7cd1d90-fe36-40c6-81c2-40b8e710b4d7</td><td>token</td><td>feb0438c-4a27-4e0f-877c-686e65728d88</td><td>13</td><td>18</td></tr>\n",
"    <tr><th>501e60b3-7572-48eb-8caf-0a73bbe52165</th><th>appointed</th><th>TITLE</th><th>0c11cbf1-70d8-4472-9504-c186013cf6f2</th><th>company</th><th>TITLE</th><td>tal</td><td>1</td><td>[0c11cbf1-70d8-4472-9504-c186013cf6f2]</td><td>501e60b3-7572-48eb-8caf-0a73bbe52165</td><td>None</td><td>None</td><td>None</td><td>ae5de32e-11d0-4ba3-ad6d-24893095b186</td><td>token</td><td>feb0438c-4a27-4e0f-877c-686e65728d88</td><td>...</td><td>[42c4cf4d-1306-4e13-a649-72758df50266, b031c51...</td><td>501e60b3-7572-48eb-8caf-0a73bbe52165/0c11cbf1-...</td><td>501e60b3-7572-48eb-8caf-0a73bbe52165</td><td>None</td><td>None</td><td>97701568-648f-4bf5-b6e1-649a36b4da76</td><td>token</td><td>feb0438c-4a27-4e0f-877c-686e65728d88</td><td>37</td><td>44</td></tr>\n",
"  </tbody>\n", "</table>\n", "<p>3 rows × 24 columns</p>\n", "
" ], "text/plain": [ " annotator_parent \\\n", "id_parent value_parent tag_parent id_child value_child tag_child \n", "0c11cbf1-70d8-4472-9504-c186013cf6f2 company TITLE 42c4cf4d-1306-4e13-a649-72758df50266 successor MONEY tal \n", " b031c51e-e378-41c8-ba7c-274c52683ba6 means TITLE tal \n", "501e60b3-7572-48eb-8caf-0a73bbe52165 appointed TITLE 0c11cbf1-70d8-4472-9504-c186013cf6f2 company TITLE tal \n", "\n", " annotator_id_parent \\\n", "id_parent value_parent tag_parent id_child value_child tag_child \n", "0c11cbf1-70d8-4472-9504-c186013cf6f2 company TITLE 42c4cf4d-1306-4e13-a649-72758df50266 successor MONEY 1 \n", " b031c51e-e378-41c8-ba7c-274c52683ba6 means TITLE 1 \n", "501e60b3-7572-48eb-8caf-0a73bbe52165 appointed TITLE 0c11cbf1-70d8-4472-9504-c186013cf6f2 company TITLE 1 \n", "\n", " children_parent \\\n", "id_parent value_parent tag_parent id_child value_child tag_child \n", "0c11cbf1-70d8-4472-9504-c186013cf6f2 company TITLE 42c4cf4d-1306-4e13-a649-72758df50266 successor MONEY [42c4cf4d-1306-4e13-a649-72758df50266, b031c51... \n", " b031c51e-e378-41c8-ba7c-274c52683ba6 means TITLE [42c4cf4d-1306-4e13-a649-72758df50266, b031c51... \n", "501e60b3-7572-48eb-8caf-0a73bbe52165 appointed TITLE 0c11cbf1-70d8-4472-9504-c186013cf6f2 company TITLE [0c11cbf1-70d8-4472-9504-c186013cf6f2] \n", "\n", " materialized_path_parent \\\n", "id_parent value_parent tag_parent id_child value_child tag_child \n", "0c11cbf1-70d8-4472-9504-c186013cf6f2 company TITLE 42c4cf4d-1306-4e13-a649-72758df50266 successor MONEY 501e60b3-7572-48eb-8caf-0a73bbe52165/0c11cbf1-... \n", " b031c51e-e378-41c8-ba7c-274c52683ba6 means TITLE 501e60b3-7572-48eb-8caf-0a73bbe52165/0c11cbf1-... 
\n", "501e60b3-7572-48eb-8caf-0a73bbe52165 appointed TITLE 0c11cbf1-70d8-4472-9504-c186013cf6f2 company TITLE 501e60b3-7572-48eb-8caf-0a73bbe52165 \n", "\n", " parent_id_parent \\\n", "id_parent value_parent tag_parent id_child value_child tag_child \n", "0c11cbf1-70d8-4472-9504-c186013cf6f2 company TITLE 42c4cf4d-1306-4e13-a649-72758df50266 successor MONEY 501e60b3-7572-48eb-8caf-0a73bbe52165 \n", " b031c51e-e378-41c8-ba7c-274c52683ba6 means TITLE 501e60b3-7572-48eb-8caf-0a73bbe52165 \n", "501e60b3-7572-48eb-8caf-0a73bbe52165 appointed TITLE 0c11cbf1-70d8-4472-9504-c186013cf6f2 company TITLE None \n", "\n", " pseudo_node_type_parent \\\n", "id_parent value_parent tag_parent id_child value_child tag_child \n", "0c11cbf1-70d8-4472-9504-c186013cf6f2 company TITLE 42c4cf4d-1306-4e13-a649-72758df50266 successor MONEY None \n", " b031c51e-e378-41c8-ba7c-274c52683ba6 means TITLE None \n", "501e60b3-7572-48eb-8caf-0a73bbe52165 appointed TITLE 0c11cbf1-70d8-4472-9504-c186013cf6f2 company TITLE None \n", "\n", " relation_type_parent \\\n", "id_parent value_parent tag_parent id_child value_child tag_child \n", "0c11cbf1-70d8-4472-9504-c186013cf6f2 company TITLE 42c4cf4d-1306-4e13-a649-72758df50266 successor MONEY None \n", " b031c51e-e378-41c8-ba7c-274c52683ba6 means TITLE None \n", "501e60b3-7572-48eb-8caf-0a73bbe52165 appointed TITLE 0c11cbf1-70d8-4472-9504-c186013cf6f2 company TITLE None \n", "\n", " tagged_token_id_parent \\\n", "id_parent value_parent tag_parent id_child value_child tag_child \n", "0c11cbf1-70d8-4472-9504-c186013cf6f2 company TITLE 42c4cf4d-1306-4e13-a649-72758df50266 successor MONEY 97701568-648f-4bf5-b6e1-649a36b4da76 \n", " b031c51e-e378-41c8-ba7c-274c52683ba6 means TITLE 97701568-648f-4bf5-b6e1-649a36b4da76 \n", "501e60b3-7572-48eb-8caf-0a73bbe52165 appointed TITLE 0c11cbf1-70d8-4472-9504-c186013cf6f2 company TITLE ae5de32e-11d0-4ba3-ad6d-24893095b186 \n", "\n", " type_parent \\\n", "id_parent value_parent tag_parent id_child value_child tag_child 
\n", "0c11cbf1-70d8-4472-9504-c186013cf6f2 company TITLE 42c4cf4d-1306-4e13-a649-72758df50266 successor MONEY token \n", " b031c51e-e378-41c8-ba7c-274c52683ba6 means TITLE token \n", "501e60b3-7572-48eb-8caf-0a73bbe52165 appointed TITLE 0c11cbf1-70d8-4472-9504-c186013cf6f2 company TITLE token \n", "\n", " example_id_parent \\\n", "id_parent value_parent tag_parent id_child value_child tag_child \n", "0c11cbf1-70d8-4472-9504-c186013cf6f2 company TITLE 42c4cf4d-1306-4e13-a649-72758df50266 successor MONEY feb0438c-4a27-4e0f-877c-686e65728d88 \n", " b031c51e-e378-41c8-ba7c-274c52683ba6 means TITLE feb0438c-4a27-4e0f-877c-686e65728d88 \n", "501e60b3-7572-48eb-8caf-0a73bbe52165 appointed TITLE 0c11cbf1-70d8-4472-9504-c186013cf6f2 company TITLE feb0438c-4a27-4e0f-877c-686e65728d88 \n", "\n", " ... \\\n", "id_parent value_parent tag_parent id_child value_child tag_child ... \n", "0c11cbf1-70d8-4472-9504-c186013cf6f2 company TITLE 42c4cf4d-1306-4e13-a649-72758df50266 successor MONEY ... \n", " b031c51e-e378-41c8-ba7c-274c52683ba6 means TITLE ... \n", "501e60b3-7572-48eb-8caf-0a73bbe52165 appointed TITLE 0c11cbf1-70d8-4472-9504-c186013cf6f2 company TITLE ... \n", "\n", " children_child \\\n", "id_parent value_parent tag_parent id_child value_child tag_child \n", "0c11cbf1-70d8-4472-9504-c186013cf6f2 company TITLE 42c4cf4d-1306-4e13-a649-72758df50266 successor MONEY [] \n", " b031c51e-e378-41c8-ba7c-274c52683ba6 means TITLE [] \n", "501e60b3-7572-48eb-8caf-0a73bbe52165 appointed TITLE 0c11cbf1-70d8-4472-9504-c186013cf6f2 company TITLE [42c4cf4d-1306-4e13-a649-72758df50266, b031c51... \n", "\n", " materialized_path_child \\\n", "id_parent value_parent tag_parent id_child value_child tag_child \n", "0c11cbf1-70d8-4472-9504-c186013cf6f2 company TITLE 42c4cf4d-1306-4e13-a649-72758df50266 successor MONEY 501e60b3-7572-48eb-8caf-0a73bbe52165/0c11cbf1-... \n", " b031c51e-e378-41c8-ba7c-274c52683ba6 means TITLE 501e60b3-7572-48eb-8caf-0a73bbe52165/0c11cbf1-... 
\n", "501e60b3-7572-48eb-8caf-0a73bbe52165 appointed TITLE 0c11cbf1-70d8-4472-9504-c186013cf6f2 company TITLE 501e60b3-7572-48eb-8caf-0a73bbe52165/0c11cbf1-... \n", "\n", " parent_id_child \\\n", "id_parent value_parent tag_parent id_child value_child tag_child \n", "0c11cbf1-70d8-4472-9504-c186013cf6f2 company TITLE 42c4cf4d-1306-4e13-a649-72758df50266 successor MONEY 0c11cbf1-70d8-4472-9504-c186013cf6f2 \n", " b031c51e-e378-41c8-ba7c-274c52683ba6 means TITLE 0c11cbf1-70d8-4472-9504-c186013cf6f2 \n", "501e60b3-7572-48eb-8caf-0a73bbe52165 appointed TITLE 0c11cbf1-70d8-4472-9504-c186013cf6f2 company TITLE 501e60b3-7572-48eb-8caf-0a73bbe52165 \n", "\n", " pseudo_node_type_child \\\n", "id_parent value_parent tag_parent id_child value_child tag_child \n", "0c11cbf1-70d8-4472-9504-c186013cf6f2 company TITLE 42c4cf4d-1306-4e13-a649-72758df50266 successor MONEY None \n", " b031c51e-e378-41c8-ba7c-274c52683ba6 means TITLE None \n", "501e60b3-7572-48eb-8caf-0a73bbe52165 appointed TITLE 0c11cbf1-70d8-4472-9504-c186013cf6f2 company TITLE None \n", "\n", " relation_type_child \\\n", "id_parent value_parent tag_parent id_child value_child tag_child \n", "0c11cbf1-70d8-4472-9504-c186013cf6f2 company TITLE 42c4cf4d-1306-4e13-a649-72758df50266 successor MONEY None \n", " b031c51e-e378-41c8-ba7c-274c52683ba6 means TITLE None \n", "501e60b3-7572-48eb-8caf-0a73bbe52165 appointed TITLE 0c11cbf1-70d8-4472-9504-c186013cf6f2 company TITLE None \n", "\n", " tagged_token_id_child \\\n", "id_parent value_parent tag_parent id_child value_child tag_child \n", "0c11cbf1-70d8-4472-9504-c186013cf6f2 company TITLE 42c4cf4d-1306-4e13-a649-72758df50266 successor MONEY 0bff51a0-802d-4055-aa5f-e098a73854dc \n", " b031c51e-e378-41c8-ba7c-274c52683ba6 means TITLE a7cd1d90-fe36-40c6-81c2-40b8e710b4d7 \n", "501e60b3-7572-48eb-8caf-0a73bbe52165 appointed TITLE 0c11cbf1-70d8-4472-9504-c186013cf6f2 company TITLE 97701568-648f-4bf5-b6e1-649a36b4da76 \n", "\n", " type_child \\\n", "id_parent value_parent 
tag_parent id_child value_child tag_child \n", "0c11cbf1-70d8-4472-9504-c186013cf6f2 company TITLE 42c4cf4d-1306-4e13-a649-72758df50266 successor MONEY token \n", " b031c51e-e378-41c8-ba7c-274c52683ba6 means TITLE token \n", "501e60b3-7572-48eb-8caf-0a73bbe52165 appointed TITLE 0c11cbf1-70d8-4472-9504-c186013cf6f2 company TITLE token \n", "\n", " example_id_child \\\n", "id_parent value_parent tag_parent id_child value_child tag_child \n", "0c11cbf1-70d8-4472-9504-c186013cf6f2 company TITLE 42c4cf4d-1306-4e13-a649-72758df50266 successor MONEY feb0438c-4a27-4e0f-877c-686e65728d88 \n", " b031c51e-e378-41c8-ba7c-274c52683ba6 means TITLE feb0438c-4a27-4e0f-877c-686e65728d88 \n", "501e60b3-7572-48eb-8caf-0a73bbe52165 appointed TITLE 0c11cbf1-70d8-4472-9504-c186013cf6f2 company TITLE feb0438c-4a27-4e0f-877c-686e65728d88 \n", "\n", " start_child \\\n", "id_parent value_parent tag_parent id_child value_child tag_child \n", "0c11cbf1-70d8-4472-9504-c186013cf6f2 company TITLE 42c4cf4d-1306-4e13-a649-72758df50266 successor MONEY 67 \n", " b031c51e-e378-41c8-ba7c-274c52683ba6 means TITLE 13 \n", "501e60b3-7572-48eb-8caf-0a73bbe52165 appointed TITLE 0c11cbf1-70d8-4472-9504-c186013cf6f2 company TITLE 37 \n", "\n", " end_child \n", "id_parent value_parent tag_parent id_child value_child tag_child \n", "0c11cbf1-70d8-4472-9504-c186013cf6f2 company TITLE 42c4cf4d-1306-4e13-a649-72758df50266 successor MONEY 76 \n", " b031c51e-e378-41c8-ba7c-274c52683ba6 means TITLE 18 \n", "501e60b3-7572-48eb-8caf-0a73bbe52165 appointed TITLE 0c11cbf1-70d8-4472-9504-c186013cf6f2 company TITLE 44 \n", "\n", "[3 rows x 24 columns]" ] }, "execution_count": 67, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ParentChild = pd.merge(RelationNodesWithAnnotation,RelationNodesWithAnnotation,\n", "                       left_on='id',right_on='parent_id',how='inner',suffixes=['_parent','_child'])\n", 
"ParentChild.set_index(['id_parent','value_parent','tag_parent','id_child','value_child','tag_child']).sort_index()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Forming The Relationship Tree\n", "Sometimes you'll want relationships presented in tree format. To build the tree, start at each root and recursively attach every node's children, which is a [depth-first traversal](https://en.wikipedia.org/wiki/Depth-first_search). Here's an example:" ] }, { "cell_type": "code", "execution_count": 140, "metadata": {}, "outputs": [], "source": [ "nodesById = {node['id']:node for node in relations} # Make a map from node_id to the node" ] }, { "cell_type": "code", "execution_count": 138, "metadata": {}, "outputs": [], "source": [ "def is_root(node):\n", "    # If a node has no parent it is a root\n", "    return node['parent_id'] is None\n", "roots = [root for root in filter(is_root,relations)] # Gets the list of roots\n" ] }, { "cell_type": "code", "execution_count": 132, "metadata": {}, "outputs": [], "source": [ "def process_node(node):\n", "    children_ids = node['children']\n", "    children = [] \n", "    print(children_ids) # Show the child ids of the node being visited\n", "    for child_id in children_ids:\n", "        child = nodesById[child_id] # Look up the child node\n", "        child = process_node(child) # Recursively call this function to attach children to the child\n", "        children.append(child) # Append the updated child to the list of children\n", "    node['children_objects'] = children # Attach the children to the node\n", "    return node" ] }, { "cell_type": "code", "execution_count": 142, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['0c11cbf1-70d8-4472-9504-c186013cf6f2']\n", "['42c4cf4d-1306-4e13-a649-72758df50266', 'b031c51e-e378-41c8-ba7c-274c52683ba6']\n", "[]\n", "[]\n", "{'annotator': 'tal',\n", " 'annotator_id': 1,\n", " 'children': ['0c11cbf1-70d8-4472-9504-c186013cf6f2'],\n", " 
'children_objects': [{'annotator': 'tal',\n", " 'annotator_id': 1,\n", " 'children': ['42c4cf4d-1306-4e13-a649-72758df50266',\n", "
'b031c51e-e378-41c8-ba7c-274c52683ba6'],\n", " 'children_objects': [{'annotator': 'tal',\n", " 'annotator_id': 1,\n", " 'children': [],\n", " 'children_objects': [],\n", " 'id': '42c4cf4d-1306-4e13-a649-72758df50266',\n", " 'materialized_path': '501e60b3-7572-48eb-8caf-0a73bbe52165/0c11cbf1-70d8-4472-9504-c186013cf6f2/42c4cf4d-1306-4e13-a649-72758df50266',\n", " 'parent_id': '0c11cbf1-70d8-4472-9504-c186013cf6f2',\n", " 'pseudo_node_type': None,\n", " 'relation_type': None,\n", " 'tagged_token_id': '0bff51a0-802d-4055-aa5f-e098a73854dc',\n", " 'type': 'token'},\n", " {'annotator': 'tal',\n", " 'annotator_id': 1,\n", " 'children': [],\n", " 'children_objects': [],\n", " 'id': 'b031c51e-e378-41c8-ba7c-274c52683ba6',\n", " 'materialized_path': '501e60b3-7572-48eb-8caf-0a73bbe52165/0c11cbf1-70d8-4472-9504-c186013cf6f2/b031c51e-e378-41c8-ba7c-274c52683ba6',\n", " 'parent_id': '0c11cbf1-70d8-4472-9504-c186013cf6f2',\n", " 'pseudo_node_type': None,\n", " 'relation_type': None,\n", " 'tagged_token_id': 'a7cd1d90-fe36-40c6-81c2-40b8e710b4d7',\n", " 'type': 'token'}],\n", " 'id': '0c11cbf1-70d8-4472-9504-c186013cf6f2',\n", " 'materialized_path': '501e60b3-7572-48eb-8caf-0a73bbe52165/0c11cbf1-70d8-4472-9504-c186013cf6f2',\n", " 'parent_id': '501e60b3-7572-48eb-8caf-0a73bbe52165',\n", " 'pseudo_node_type': None,\n", " 'relation_type': None,\n", " 'tagged_token_id': '97701568-648f-4bf5-b6e1-649a36b4da76',\n", " 'type': 'token'}],\n", " 'id': '501e60b3-7572-48eb-8caf-0a73bbe52165',\n", " 'materialized_path': '501e60b3-7572-48eb-8caf-0a73bbe52165',\n", " 'parent_id': None,\n", " 'pseudo_node_type': None,\n", " 'relation_type': None,\n", " 'tagged_token_id': 'ae5de32e-11d0-4ba3-ad6d-24893095b186',\n", " 'type': 'token'}\n" ] } ], "source": [ "from pprint import pprint \n", "pprint(process_node(roots[0]))\n", "\n" ] } ], "metadata": { "file_extension": ".py", "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": 
{ "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.8" }, "mimetype": "text/x-python", "name": "python", "npconvert_exporter": "python", "pygments_lexer": "ipython3", "version": 3 }, "nbformat": 4, "nbformat_minor": 2 }