
Working With Task Results

When you run a task on LightTag, you’ll want to get back the annotations, classifications and relationships created in that task. You can retrieve that data either through the UI or through the API. The output structure is the same either way; in this document we’ll use a previously downloaded file.

The downloaded object contains some metadata about the task and a list of examples, each with the annotations you applied to it. This document shows you how to extract the relevant information, manipulate it to glean insights, and join the various pieces of data together.

[136]:
import json  # Load the data from JSON
from pprint import pprint  # Pretty-print nested objects

import pandas as pd  # We use pandas to manipulate the data

# Our downloaded results
with open('/home/tal/Downloads/classes2_annotations (1).json') as f:
    results = json.load(f)
results.keys()
[136]:
dict_keys(['id', 'examples', 'schema', 'dataset', 'relations', 'name', 'annotators_per_example'])

The results object has the following keys:

  • id The id of the Task

  • name The name of the Task

  • schema The Schema the task used, including its tags and classes

  • relations Any relationships that may have been annotated

  • examples A list of the examples in the dataset. Each example has a list of Annotations and Classifications

The Example Object

The example object contains:

  • content The content that was annotated

  • metadata Any metadata you included with the example

  • seen_by A list of the annotators that saw this example during the task

  • annotations A list of annotations made to this example

  • classifications A list of classifications applied to this example
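
As a rough sketch of how these fields fit together, the snippet below builds a per-example summary from a hand-made list shaped like `results['examples']`. The records are invented for illustration, and the shape of the `seen_by` entries (plain names here) is an assumption; only the field names come from the export.

```python
import pandas as pd

# Hypothetical records shaped like results['examples']; all values are invented,
# and representing seen_by entries as plain strings is an assumption.
sample_examples = [
    {
        "example_id": "ex-1",
        "content": "Alice joined Acme.",
        "metadata": {"type": "rule"},
        "seen_by": ["annotator_a", "annotator_b"],
        "annotations": [{"tag": "PERSON"}, {"tag": "ORG"}],
        "classifications": [],
    },
    {
        "example_id": "ex-2",
        "content": "No entities here.",
        "metadata": {"type": "prorule"},
        "seen_by": [],
        "annotations": [],
        "classifications": [],
    },
]

# One row per example, with the length of each list-valued field
summary = pd.DataFrame(
    [
        {
            "example_id": ex["example_id"],
            "n_seen_by": len(ex["seen_by"]),
            "n_annotations": len(ex["annotations"]),
            "n_classifications": len(ex["classifications"]),
        }
        for ex in sample_examples
    ]
)
print(summary)
```

A summary like this is a quick way to spot examples that nobody has seen or annotated yet.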

[3]:
examples = results['examples']
pd.DataFrame(examples[:2])
[3]:
annotations classifications content example_id metadata seen_by
0 [] [] This proposed rule is issued under Marketing O... 11f00552-c0d3-46f3-8cc9-8643a7e74694 {'end': 263, 'key': 'agreement', 'type': 'pror... []
1 [] [] 1. Medical Device User Fee Agreement IV Commit... 13c5b003-9741-4342-90c7-dfc7fc86650e {'end': 36, 'key': 'agreement', 'type': 'rule'... []

Annotations

Each annotation object in the annotations list corresponds to a span in an example together with a tag. We call that a Tagged Token. Each annotation object carries its tag, span and example, a list of the annotators that made that annotation (annotated_by), and its validation status if it has been reviewed. The fields on the annotation object are:

  • example_id The id of the example that was annotated

  • start The start offset of the span that was annotated

  • end The end offset of the span that was annotated

  • tag The name of the tag that was applied

  • tag_id The id of the tag that was applied

  • tagged_token_id A unique id for the combination (example_id, start, end, tag). Useful for aggregating and joining.

  • value The text that was annotated

  • reviewed Has this annotation been seen in review mode (true/false)

  • correct Has this been reviewed and set as correct (true/false - null if not reviewed)

  • annotated_by A list of the annotators that made this annotation
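
The sketch below shows one way to sanity-check an annotation against its example text. It assumes that start and end behave like Python slice offsets (character-based, with an exclusive end). The sample rows later in this document are consistent with that (start 126 and end 135 for the nine-character value "appointed"), but treat it as an assumption and verify it on your own export. The helper `span_text` and the sample data are hypothetical.

```python
def span_text(content, annotation):
    """Slice the annotated span out of the example text.

    Assumes start/end are character offsets with an exclusive end.
    """
    return content[annotation["start"]:annotation["end"]]


# Hypothetical example text and annotation (not taken from the export)
content = "The company appointed a successor."
annotation = {"start": 4, "end": 11, "tag": "ORG", "value": "company"}

recovered = span_text(content, annotation)
print(recovered == annotation["value"])
```

If the recovered text does not match `value`, the offsets probably use a different convention (for example bytes rather than characters).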

[4]:
example = examples[0]
pd.DataFrame(example['annotations'][:3])
[4]:

Annotation Analytics

How to Merge All Annotations from All Examples

Here’s a one-liner that extracts all of the annotations from all of the examples. This is useful for analytics, as we’ll see below.

[5]:
all_annotations = [annotation for ex in examples for annotation in ex['annotations']]

How to Count the Tag/Word Pairs (e.g. a weighted dictionary)

[6]:
Annotations = pd.DataFrame(all_annotations)
Annotations.pivot_table(index='value',columns='tag',values='example_id',aggfunc=len).fillna(0).head()
[6]:
tag DATE DURATION GPE HANDLE MONEY NUMBER ORG PERSON TIME TITLE
value
0.025 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
02-55 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0
02424 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
06 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0
1 2.0 0.0 0.0 0.0 1.0 2.0 1.0 1.0 1.0 0.0
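
For a pure count table, `pd.crosstab` is an alternative to `pivot_table` with `aggfunc=len`. The sketch below runs on a small invented frame with the same `value` and `tag` columns; it returns integer counts and fills missing combinations with 0 on its own.

```python
import pandas as pd

# Toy stand-in for the Annotations frame (values are made up)
toy_annotations = pd.DataFrame(
    {
        "value": ["1", "1", "06", "1"],
        "tag": ["DATE", "DATE", "HANDLE", "MONEY"],
    }
)

# Rows are annotated texts, columns are tags, cells are occurrence counts
tag_word_counts = pd.crosstab(toy_annotations["value"], toy_annotations["tag"])
print(tag_word_counts)
```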

Calculating Inter Annotator Agreement

To calculate inter-annotator agreement, we’ll compare individual annotators on each tagged_token_id. To get the data into the format we need, we’ll use pandas’ json_normalize utility, which produces one row per (annotation, annotator) pair.

[7]:
# pd.json_normalize is the top-level name since pandas 1.0; pd.io.json.json_normalize is deprecated
IAAData = pd.json_normalize(all_annotations, record_path='annotated_by',
                            meta=['tagged_token_id', 'example_id', 'tag', 'value'])
IAAData.head()
[7]:
annotator annotator_id timestamp tagged_token_id example_id tag value
0 2019-10-16 08:48:05.621963 6 2019-10-24T13:36:21.917742+00:00 8568a9cd-ae42-4c06-942a-c3d1d158c541 a954d097-c61c-4192-bcea-09121c7f2d07 PERSON or
1 2019-10-16 08:48:05.592811 4 2019-10-24T13:36:22.043508+00:00 80313a6b-a227-4005-a3b9-e400715c9fc6 a954d097-c61c-4192-bcea-09121c7f2d07 HANDLE the
2 2019-10-16 08:48:05.621963 6 2019-10-24T13:36:21.914699+00:00 24f8561a-d5c9-4728-a9d2-99eecf578fe3 a954d097-c61c-4192-bcea-09121c7f2d07 TIME is
3 2019-10-16 08:48:05.652277 8 2019-10-24T13:36:21.782039+00:00 5fb9aed3-0a22-4076-b04c-e22efb10d72b a954d097-c61c-4192-bcea-09121c7f2d07 NUMBER the
4 2019-10-16 08:48:05.621963 6 2019-10-24T13:36:21.908996+00:00 56840eeb-342d-4e6c-b970-1a9a022d622e a954d097-c61c-4192-bcea-09121c7f2d07 DATE agreement
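
To make the roles of `record_path` and `meta` concrete, here is a minimal sketch on two invented annotations. `record_path` names the nested list that is exploded into rows, and `meta` names the parent fields copied onto every row. The ids and annotator names are hypothetical.

```python
import pandas as pd

# Two hypothetical annotations; the first was made by two annotators
toy_annotations = [
    {
        "tagged_token_id": "t1",
        "tag": "DATE",
        "value": "1999",
        "annotated_by": [
            {"annotator": "ann_a", "annotator_id": 1},
            {"annotator": "ann_b", "annotator_id": 2},
        ],
    },
    {
        "tagged_token_id": "t2",
        "tag": "ORG",
        "value": "Acme",
        "annotated_by": [{"annotator": "ann_a", "annotator_id": 1}],
    },
]

# One row per entry of annotated_by, with the parent fields repeated
flat = pd.json_normalize(
    toy_annotations,
    record_path="annotated_by",
    meta=["tagged_token_id", "tag", "value"],
)
print(flat)
```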
[8]:
IAAPivot = IAAData.pivot_table(index='tagged_token_id',columns='annotator',values='timestamp',aggfunc=len).fillna(0)
IAAPivot.head()
[8]:
annotator 2019-10-16 08:48:05.539866 2019-10-16 08:48:05.577156 2019-10-16 08:48:05.592811 2019-10-16 08:48:05.607580 2019-10-16 08:48:05.621963 2019-10-16 08:48:05.637460 2019-10-16 08:48:05.652277 2019-10-16 08:48:05.667258 2019-10-16 08:48:05.681504 2019-10-16 08:48:05.696058 tal
tagged_token_id
000117f3-f28b-4e90-b87a-5cca1e913749 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0001915f-3893-4866-a218-ebe4419d8661 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0001ea06-01f8-4996-afd1-d5eb878909b0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0
0005bd41-a479-496c-a4e6-2527af145ef4 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0
0006265e-9eee-499b-955f-fc2cf2bae8fe 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
[9]:

# Each off-diagonal cell counts the tagged tokens that both annotators produced;
# the diagonal holds each annotator's own total
AgreementCount = IAAPivot.T.dot(IAAPivot)
AgreementCount.head()
[9]:
annotator 2019-10-16 08:48:05.539866 2019-10-16 08:48:05.577156 2019-10-16 08:48:05.592811 2019-10-16 08:48:05.607580 2019-10-16 08:48:05.621963 2019-10-16 08:48:05.637460 2019-10-16 08:48:05.652277 2019-10-16 08:48:05.667258 2019-10-16 08:48:05.681504 2019-10-16 08:48:05.696058 tal
annotator
2019-10-16 08:48:05.539866 3529.0 16.0 7.0 15.0 15.0 14.0 3.0 19.0 19.0 16.0 17.0
2019-10-16 08:48:05.577156 16.0 3941.0 22.0 15.0 13.0 13.0 8.0 21.0 21.0 10.0 21.0
2019-10-16 08:48:05.592811 7.0 22.0 3816.0 7.0 11.0 16.0 14.0 18.0 13.0 20.0 15.0
2019-10-16 08:48:05.607580 15.0 15.0 7.0 3310.0 14.0 11.0 14.0 14.0 10.0 8.0 12.0
2019-10-16 08:48:05.621963 15.0 13.0 11.0 14.0 3584.0 11.0 9.0 15.0 29.0 11.0 20.0
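
Raw counts are hard to compare when annotators did different amounts of work. One possible normalization (not something the export provides) is the Jaccard overlap: shared tagged tokens divided by the tagged tokens either annotator produced. The sketch below uses a small invented 0/1 matrix in place of IAAPivot and assumes each annotator appears at most once per tagged token.

```python
import numpy as np
import pandas as pd

# Toy 0/1 matrix: rows are tagged tokens, columns are annotators (made-up data)
pivot = pd.DataFrame(
    {"ann_a": [1, 1, 0, 1], "ann_b": [1, 0, 1, 1], "ann_c": [0, 0, 1, 0]},
    index=["t1", "t2", "t3", "t4"],
)

both = pivot.T.dot(pivot)       # shared counts, as in the cell above
totals = np.diag(both.values)   # each annotator's own total

# |A or B| = |A| + |B| - |A and B|
union = totals[:, None] + totals[None, :] - both.values
jaccard = both / union
print(jaccard.round(2))
```

A value near 1 means two annotators marked almost the same set of tagged tokens; a value near 0 means they rarely overlapped.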

Getting the Agreements for each annotation

[10]:
TokenAgreement = IAAData.groupby('tagged_token_id').annotator.count()
TokenAgreement.head()
[10]:
tagged_token_id
000117f3-f28b-4e90-b87a-5cca1e913749    1
0001915f-3893-4866-a218-ebe4419d8661    1
0001ea06-01f8-4996-afd1-d5eb878909b0    1
0005bd41-a479-496c-a4e6-2527af145ef4    1
0006265e-9eee-499b-955f-fc2cf2bae8fe    1
Name: annotator, dtype: int64
[11]:
# How often did we have an agreement of 1,2,3...
TokenAgreement.value_counts()
[11]:
1    39664
2      785
3       10
Name: annotator, dtype: int64
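
The distribution above can be turned into a single headline number. As a small sketch, the snippet below re-enters the three counts from that output and computes the share of tagged tokens that two or more annotators produced.

```python
import pandas as pd

# Counts copied from the value_counts() output above:
# agreement level -> number of tagged tokens
agreement_counts = pd.Series({1: 39664, 2: 785, 3: 10})

multi = agreement_counts[agreement_counts.index >= 2].sum()
share_multi = multi / agreement_counts.sum()
print(f"{share_multi:.2%} of tagged tokens were produced by two or more annotators")
```

On your own data you can replace the hand-entered Series with `TokenAgreement.value_counts()`.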

Get The Annotations that had exactly 3 person agreement

[12]:
Agreed = IAAData.groupby('tagged_token_id').filter(lambda x: len(x)==3)
Agreed.head()
[12]:
annotator annotator_id timestamp tagged_token_id example_id tag value
2252 2019-10-16 08:48:05.681504 10 2019-10-24T13:35:16.789966+00:00 fe6547d0-e207-4aee-9ea3-64d305ef2dea b863213d-98d5-4d20-9525-4fc7c9448e2d DATE in
2253 2019-10-16 08:48:05.667258 9 2019-10-24T13:35:15.935764+00:00 fe6547d0-e207-4aee-9ea3-64d305ef2dea b863213d-98d5-4d20-9525-4fc7c9448e2d DATE in
2254 tal 1 2019-10-24T13:35:15.254039+00:00 fe6547d0-e207-4aee-9ea3-64d305ef2dea b863213d-98d5-4d20-9525-4fc7c9448e2d DATE in
2443 2019-10-16 08:48:05.652277 8 2019-10-24T13:35:01.552501+00:00 c0b30335-604f-4dfb-b671-fdc999b9e67f bbd0eefb-feee-40cd-8e34-54519378f146 DATE revised
2444 2019-10-16 08:48:05.637460 7 2019-10-24T13:35:01.060189+00:00 c0b30335-604f-4dfb-b671-fdc999b9e67f bbd0eefb-feee-40cd-8e34-54519378f146 DATE revised
[13]:
# Nicer display for excel / reporting
Agreed.set_index(['tagged_token_id','annotator_id','annotator']).sort_index().head()

[13]:
timestamp example_id tag value
tagged_token_id annotator_id annotator
1d5053e0-da64-4ba0-aac9-d125fe825983 1 tal 2019-10-24T13:30:45.360454+00:00 edd6e7a7-58cb-4c38-895a-1bf5eba7dac9 DATE types
8 2019-10-16 08:48:05.652277 2019-10-24T13:30:43.706208+00:00 edd6e7a7-58cb-4c38-895a-1bf5eba7dac9 DATE types
10 2019-10-16 08:48:05.681504 2019-10-24T13:30:44.883886+00:00 edd6e7a7-58cb-4c38-895a-1bf5eba7dac9 DATE types
629c905c-1126-491f-a030-f1b25bfad6bf 1 tal 2019-10-24T13:34:32.025165+00:00 f0cc12da-9736-4d81-a1b7-7359e5988b9a TIME safeguard
8 2019-10-16 08:48:05.652277 2019-10-24T13:34:31.722251+00:00 f0cc12da-9736-4d81-a1b7-7359e5988b9a TIME safeguard

Classifications

Classifications are statements about the entire document. They live in the classifications object inside of an example. Each classification has the following properties:

  • definition_id The id of the task definition that the classification was made in

  • example_id The id of the example that was classified

  • classname The class that was applied

  • class_id The id of the class that was applied

  • classified_by A list of the annotators that made this classification

[19]:
# Take all the classifications from all of the examples
all_classifications = [c for ex in examples for c in ex['classifications']]
all_classifications[0]
[19]:
{'definition_id': '2f660f4d-aa9f-4d92-8d4f-b2163c847016',
 'example_id': 'a954d097-c61c-4192-bcea-09121c7f2d07',
 'classname': 'Kendra Ritter',
 'class_id': '6b193f1c-1267-4f4f-b576-c0f186fbf779',
 'classified_by': [{'annotator_id': 4,
   'timestamp': None,
   'annotator': '2019-10-16 08:48:05.592811'},
  {'annotator_id': 6,
   'timestamp': None,
   'annotator': '2019-10-16 08:48:05.621963'}]}

Classification Analytics

The following cells show common queries you might want to run on your classifications.

[22]:
Classifications = pd.json_normalize(all_classifications, record_path='classified_by', meta=['example_id', 'classname'])
Classifications.head()
[22]:
annotator annotator_id timestamp example_id classname
0 2019-10-16 08:48:05.592811 4 None a954d097-c61c-4192-bcea-09121c7f2d07 Kendra Ritter
1 2019-10-16 08:48:05.621963 6 None a954d097-c61c-4192-bcea-09121c7f2d07 Kendra Ritter
2 2019-10-16 08:48:05.652277 8 None a954d097-c61c-4192-bcea-09121c7f2d07 Kelly Patterson
3 2019-10-16 08:48:05.696058 11 None ab911160-25ea-421b-bfef-4a8802efad05 Kendra Ritter
4 2019-10-16 08:48:05.592811 4 None ab911160-25ea-421b-bfef-4a8802efad05 Kelly Patterson

Show the classifications for each example

Get a table with one row per example_id, counting how many times it was assigned each class.

[25]:
ClassPivot = Classifications.pivot_table(index='example_id',columns='classname',values='annotator_id',aggfunc=len).fillna(0)
ClassPivot.head()
[25]:
classname Brittany Fisher Kelly Patterson Kendra Ritter Thomas Woodward Valerie Delgado
example_id
a8b3e83b-226e-4e5e-ab37-977291abbfd9 0.0 0.0 1.0 0.0 0.0
a8e70531-f6ee-4b43-a2d8-9cf3c7e78905 0.0 0.0 2.0 1.0 0.0
a91ea8ca-6a16-468d-8ceb-3b74d8c8961a 0.0 0.0 1.0 2.0 0.0
a938eb3a-9b09-47d3-8d5b-f91314f56487 0.0 1.0 2.0 0.0 0.0
a940e3e9-e919-45f7-9d5e-fd3c0300a233 0.0 2.0 0.0 1.0 0.0
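
A table like this is often reduced to one label per example. The sketch below shows one way to do that with a majority vote, using an invented frame shaped like ClassPivot. Note the assumption baked in: `idxmax` returns the first column on ties, so ties are flagged separately rather than resolved.

```python
import pandas as pd

# Toy counts shaped like ClassPivot (class names reused from above; counts are made up)
class_pivot = pd.DataFrame(
    {
        "Kelly Patterson": [0.0, 0.0, 2.0, 1.0],
        "Kendra Ritter": [1.0, 2.0, 0.0, 1.0],
        "Thomas Woodward": [0.0, 1.0, 1.0, 0.0],
    },
    index=["ex-1", "ex-2", "ex-3", "ex-4"],
)

top_count = class_pivot.max(axis=1)
majority = class_pivot.idxmax(axis=1)  # the first column wins on ties
# An example is tied when more than one class reaches the top count
is_tie = class_pivot.eq(top_count, axis=0).sum(axis=1) > 1

consensus = pd.DataFrame({"majority": majority, "votes": top_count, "is_tie": is_tie})
print(consensus)
```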

Calculate the co-occurrence of classes

It’s often useful to see whether some pair of classes is frequently confused. A quick way to check is to count how often two classes were applied to the same example:

[27]:
Confusion = ClassPivot.T.dot(ClassPivot)
Confusion
[27]:
classname Brittany Fisher Kelly Patterson Kendra Ritter Thomas Woodward Valerie Delgado
classname
Brittany Fisher 269.0 110.0 111.0 81.0 44.0
Kelly Patterson 110.0 620.0 301.0 183.0 52.0
Kendra Ritter 111.0 301.0 920.0 273.0 91.0
Thomas Woodward 81.0 183.0 273.0 541.0 58.0
Valerie Delgado 44.0 52.0 91.0 58.0 150.0
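
To pull the most frequently co-applied pair out of such a matrix, you can mask the diagonal and look for the largest remaining cell. The sketch below does this on the top-left 3×3 corner of the table above, re-entered by hand; it is one possible approach rather than a built-in feature.

```python
import numpy as np
import pandas as pd

labels = ["Brittany Fisher", "Kelly Patterson", "Kendra Ritter"]
# Values copied from the first three rows and columns of the table above
confusion = pd.DataFrame(
    [[269.0, 110.0, 111.0], [110.0, 620.0, 301.0], [111.0, 301.0, 920.0]],
    index=labels,
    columns=labels,
)

# Hide the diagonal (a class paired with itself), keep everything else
off_diagonal = confusion.where(~np.eye(len(confusion), dtype=bool))

# Flatten to (row class, column class) -> count and take the largest entry
pair_counts = off_diagonal.stack()
most_confused = pair_counts.idxmax()
print(most_confused, pair_counts.max())
```

Because the matrix is symmetric, each pair appears twice; `idxmax` simply returns the first occurrence.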

Joining Classifications to the original document

[35]:
Examples = pd.DataFrame(examples,columns=['content','metadata','example_id'])
ExamplesWithClasses = pd.merge(Examples,ClassPivot,on='example_id')
ExamplesWithClasses = ExamplesWithClasses.set_index(['example_id','content']).sort_index()
ExamplesWithClasses.tail(2)
[35]:
metadata Brittany Fisher Kelly Patterson Kendra Ritter Thomas Woodward Valerie Delgado
example_id content
ffa438d9-a4f9-4548-998b-11c0bf1add66 This action, pursuant to 5 U.S.C. 553, amends regulations issued to carry out a marketing order as defined in 7 CFR 900.2(j). This final rule is issued under Marketing Order No. 985 (7 CFR part 985), as amended, regulating the handling of spearmint oil produced in the Far West (Washington, Idaho, Oregon, and designated parts of Nevada and Utah). Part 985 (referred to as “the Order”) is effective under the Agricultural Marketing Agreement Act of 1937, as amended (7 U.S.C. 601-674), hereinafter referred to as the “Act.” The Committee locally administers the Marketing Order and is comprised of spearmint oil producers operating within the area of production, and a public member. {'end': 445, 'key': 'agreement', 'type': 'rule... 0.0 1.0 1.0 1.0 0.0
ffac4212-8328-4dfb-8434-952fb4643dd0 A recovery plan for Astragalus desereticus was not prepared; therefore, specific delisting criteria were not developed for the species. However, in 2005, we invited agencies with management or ownership authorities within the species' habitat to serve on a team to develop an interagency conservation agreement for Astragalus desereticus intended to facilitate a coordinated conservation effort between the agencies (UDWR et al. 2006, entire). The Conservation Agreement for Deseret milkvetch (Astragalus desereticus) (Conservation Agreement) was signed and approved by UDWR, UDOT, SITLA, and the Service in 2006 and will remain in effect for 30 years. The Conservation Agreement provides guidance to stakeholders to address threats and establish goals to ensure long-term survival of the species (UDWR et al. 2006, p. 7). Conservation actions contained in the Conservation Agreement (in italics), efforts to accomplish these actions, and their current status are described below. {'end': 470, 'key': 'agreement', 'type': 'pror... 1.0 1.0 1.0 0.0 0.0

Check how frequently each annotator uses each class

It’s often useful to see whether a particular annotator is biased toward a particular class. You can do so like this:

[46]:
Classifications.groupby('annotator').classname.value_counts(normalize=True).to_frame().head(5)
[46]:
classname
annotator classname
2019-10-16 08:48:05.539866 Kelly Patterson 0.335484
Kendra Ritter 0.277419
Thomas Woodward 0.180645
Brittany Fisher 0.129032
Valerie Delgado 0.077419
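
The same proportions can also be laid out as a wide table, one row per annotator and one column per class, which makes annotators easier to compare side by side. The sketch below uses `pd.crosstab` with `normalize="index"` on an invented frame with the same two columns.

```python
import pandas as pd

# Toy stand-in for the Classifications frame (made-up annotators and classes)
toy_classifications = pd.DataFrame(
    {
        "annotator": ["ann_a", "ann_a", "ann_a", "ann_b", "ann_b"],
        "classname": ["X", "X", "Y", "Y", "Y"],
    }
)

# Each row sums to 1: the share of an annotator's classifications per class
class_share = pd.crosstab(
    toy_classifications["annotator"],
    toy_classifications["classname"],
    normalize="index",
)
print(class_share)
```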

Relations

If you’ve had your team create relationships, you can get those results from the relations object. It lives outside the examples list because relations can span multiple examples (such as in co-reference resolution). The relations object lists the individual nodes in a relationship, each with a link to its parent, a list of its children and some metadata:

  • id The unique identifier of this relation node

  • materialized_path A unique identifier of this node, typically used when you load this data into a relational database

  • parent_id The id of this node’s parent in the relationship

  • children A list of the ids of the children of this node

  • type The kind of node: token if it was made from an entity, pseudo_node if it was made from a Pseudo Node (such as a non-terminal like NP)

  • relation_type The type annotation on the edge of the relationship, i.e. how this node relates to its parent

  • annotator_id The id of the annotator that made this relation

  • annotator The name of the annotator

  • tagged_token_id If this node came from an annotated entity, this is the unique id of the Tagged Token (as described in the Annotation section)
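
In the sample output in this document, materialized_path looks like the ids of a node’s ancestors followed by its own id, joined with "/". Assuming that holds for your export, depth and ancestry can be read straight from the string. The helpers `path_depth` and `ancestor_ids` and the shortened ids below are hypothetical.

```python
def path_depth(materialized_path):
    """Depth of a node: 0 for a root, 1 for its children, and so on."""
    return materialized_path.count("/")


def ancestor_ids(materialized_path):
    """Ids of all ancestors, from the root down to the direct parent."""
    return materialized_path.split("/")[:-1]


# Shortened, made-up ids standing in for the UUIDs in the export
path = "root-id/child-id/leaf-id"
print(path_depth(path), ancestor_ids(path))
```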

[137]:
relations = results['relations']
Relations = pd.DataFrame(relations)
Relations[:2]

[137]:
annotator annotator_id children id materialized_path parent_id pseudo_node_type relation_type tagged_token_id type
0 tal 1 [0c11cbf1-70d8-4472-9504-c186013cf6f2] 501e60b3-7572-48eb-8caf-0a73bbe52165 501e60b3-7572-48eb-8caf-0a73bbe52165 None None None ae5de32e-11d0-4ba3-ad6d-24893095b186 token
1 tal 1 [42c4cf4d-1306-4e13-a649-72758df50266, b031c51... 0c11cbf1-70d8-4472-9504-c186013cf6f2 501e60b3-7572-48eb-8caf-0a73bbe52165/0c11cbf1-... 501e60b3-7572-48eb-8caf-0a73bbe52165 None None 97701568-648f-4bf5-b6e1-649a36b4da76 token

Relation Analytics

Adding data from Annotations

You’ll typically want the annotation data behind each node. Get it by joining Relations to Annotations on tagged_token_id.

[66]:
AnnotationsInfo = Annotations[['tagged_token_id','example_id','start','end','value','tag']]
RelationNodesWithAnnotation = pd.merge(Relations,AnnotationsInfo,on='tagged_token_id')
RelationNodesWithAnnotation.head()
[66]:
annotator annotator_id children id materialized_path parent_id pseudo_node_type relation_type tagged_token_id type example_id start end value tag
0 tal 1 [0c11cbf1-70d8-4472-9504-c186013cf6f2] 501e60b3-7572-48eb-8caf-0a73bbe52165 501e60b3-7572-48eb-8caf-0a73bbe52165 None None None ae5de32e-11d0-4ba3-ad6d-24893095b186 token feb0438c-4a27-4e0f-877c-686e65728d88 126 135 appointed TITLE
1 tal 1 [42c4cf4d-1306-4e13-a649-72758df50266, b031c51... 0c11cbf1-70d8-4472-9504-c186013cf6f2 501e60b3-7572-48eb-8caf-0a73bbe52165/0c11cbf1-... 501e60b3-7572-48eb-8caf-0a73bbe52165 None None 97701568-648f-4bf5-b6e1-649a36b4da76 token feb0438c-4a27-4e0f-877c-686e65728d88 37 44 company TITLE
2 tal 1 [] 42c4cf4d-1306-4e13-a649-72758df50266 501e60b3-7572-48eb-8caf-0a73bbe52165/0c11cbf1-... 0c11cbf1-70d8-4472-9504-c186013cf6f2 None None 0bff51a0-802d-4055-aa5f-e098a73854dc token feb0438c-4a27-4e0f-877c-686e65728d88 67 76 successor MONEY
3 tal 1 [] b031c51e-e378-41c8-ba7c-274c52683ba6 501e60b3-7572-48eb-8caf-0a73bbe52165/0c11cbf1-... 0c11cbf1-70d8-4472-9504-c186013cf6f2 None None a7cd1d90-fe36-40c6-81c2-40b8e710b4d7 token feb0438c-4a27-4e0f-877c-686e65728d88 13 18 means TITLE
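
`pd.merge` defaults to an inner join, so any node without a matching annotation is dropped. If pseudo nodes carry no tagged_token_id (an assumption based on the field description above), a left join keeps them, with empty annotation columns. The sketch below contrasts the two on invented frames.

```python
import pandas as pd

# Hypothetical relation nodes: one token node and one pseudo node
toy_relations = pd.DataFrame(
    [
        {"id": "n1", "type": "token", "tagged_token_id": "t1"},
        {"id": "n2", "type": "pseudo_node", "tagged_token_id": None},
    ]
)
toy_annotations = pd.DataFrame(
    [{"tagged_token_id": "t1", "value": "company", "tag": "TITLE"}]
)

# Inner join (the default): only nodes with a matching annotation survive
inner = pd.merge(toy_relations, toy_annotations, on="tagged_token_id")

# Left join: every node is kept; unmatched nodes get NaN annotation columns
left = pd.merge(toy_relations, toy_annotations, on="tagged_token_id", how="left")
print(len(inner), len(left))
```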

Creating a Parent Child Table

[67]:
ParentChild = pd.merge(RelationNodesWithAnnotation,RelationNodesWithAnnotation,
                       left_on='id',right_on='parent_id',how='inner',suffixes=['_parent','_child'])
ParentChild.set_index(['id_parent','value_parent','tag_parent','id_child','value_child','tag_child']).sort_index()
[67]:
annotator_parent annotator_id_parent children_parent materialized_path_parent parent_id_parent pseudo_node_type_parent relation_type_parent tagged_token_id_parent type_parent example_id_parent ... children_child materialized_path_child parent_id_child pseudo_node_type_child relation_type_child tagged_token_id_child type_child example_id_child start_child end_child
id_parent value_parent tag_parent id_child value_child tag_child
0c11cbf1-70d8-4472-9504-c186013cf6f2 company TITLE 42c4cf4d-1306-4e13-a649-72758df50266 successor MONEY tal 1 [42c4cf4d-1306-4e13-a649-72758df50266, b031c51... 501e60b3-7572-48eb-8caf-0a73bbe52165/0c11cbf1-... 501e60b3-7572-48eb-8caf-0a73bbe52165 None None 97701568-648f-4bf5-b6e1-649a36b4da76 token feb0438c-4a27-4e0f-877c-686e65728d88 ... [] 501e60b3-7572-48eb-8caf-0a73bbe52165/0c11cbf1-... 0c11cbf1-70d8-4472-9504-c186013cf6f2 None None 0bff51a0-802d-4055-aa5f-e098a73854dc token feb0438c-4a27-4e0f-877c-686e65728d88 67 76
b031c51e-e378-41c8-ba7c-274c52683ba6 means TITLE tal 1 [42c4cf4d-1306-4e13-a649-72758df50266, b031c51... 501e60b3-7572-48eb-8caf-0a73bbe52165/0c11cbf1-... 501e60b3-7572-48eb-8caf-0a73bbe52165 None None 97701568-648f-4bf5-b6e1-649a36b4da76 token feb0438c-4a27-4e0f-877c-686e65728d88 ... [] 501e60b3-7572-48eb-8caf-0a73bbe52165/0c11cbf1-... 0c11cbf1-70d8-4472-9504-c186013cf6f2 None None a7cd1d90-fe36-40c6-81c2-40b8e710b4d7 token feb0438c-4a27-4e0f-877c-686e65728d88 13 18
501e60b3-7572-48eb-8caf-0a73bbe52165 appointed TITLE 0c11cbf1-70d8-4472-9504-c186013cf6f2 company TITLE tal 1 [0c11cbf1-70d8-4472-9504-c186013cf6f2] 501e60b3-7572-48eb-8caf-0a73bbe52165 None None None ae5de32e-11d0-4ba3-ad6d-24893095b186 token feb0438c-4a27-4e0f-877c-686e65728d88 ... [42c4cf4d-1306-4e13-a649-72758df50266, b031c51... 501e60b3-7572-48eb-8caf-0a73bbe52165/0c11cbf1-... 501e60b3-7572-48eb-8caf-0a73bbe52165 None None 97701568-648f-4bf5-b6e1-649a36b4da76 token feb0438c-4a27-4e0f-877c-686e65728d88 37 44

3 rows × 24 columns

Forming The Relationship Tree

Sometimes you’ll want relationships presented in tree format. That takes a tree traversal: the example below starts from a root and recursively attaches each node’s children (a depth-first walk).

[140]:
nodesById = {node['id']:node for node in relations} # Make a map from node_id to the node
[138]:
def is_root(node):
    # If a node has no parent it is a root
    return node['parent_id'] is None
roots = [node for node in relations if is_root(node)]  # The list of root nodes

[132]:
def process_node(node):
    children_ids = node['children']
    children = []
    print(children_ids)
    for child_id in children_ids:
        child = nodesById[child_id] # Look up the child node
        child = process_node(child) # Recursively call this function to attach children to the child
        children.append(child) # Append the updated child to the list of children
    node['children_objects'] = children # Attach the children to the node
    return node
[142]:
from pprint import pprint
pprint(process_node(roots[0]))


['0c11cbf1-70d8-4472-9504-c186013cf6f2']
['42c4cf4d-1306-4e13-a649-72758df50266', 'b031c51e-e378-41c8-ba7c-274c52683ba6']
[]
[]
{'annotator': 'tal',
 'annotator_id': 1,
 'children': ['0c11cbf1-70d8-4472-9504-c186013cf6f2'],
 'children_objects': [{'annotator': 'tal',
                       'annotator_id': 1,
                       'children': ['42c4cf4d-1306-4e13-a649-72758df50266',
                                    'b031c51e-e378-41c8-ba7c-274c52683ba6'],
                       'children_objects': [{'annotator': 'tal',
                                             'annotator_id': 1,
                                             'children': [],
                                             'children_objects': [],
                                             'id': '42c4cf4d-1306-4e13-a649-72758df50266',
                                             'materialized_path': '501e60b3-7572-48eb-8caf-0a73bbe52165/0c11cbf1-70d8-4472-9504-c186013cf6f2/42c4cf4d-1306-4e13-a649-72758df50266',
                                             'parent_id': '0c11cbf1-70d8-4472-9504-c186013cf6f2',
                                             'pseudo_node_type': None,
                                             'relation_type': None,
                                             'tagged_token_id': '0bff51a0-802d-4055-aa5f-e098a73854dc',
                                             'type': 'token'},
                                            {'annotator': 'tal',
                                             'annotator_id': 1,
                                             'children': [],
                                             'children_objects': [],
                                             'id': 'b031c51e-e378-41c8-ba7c-274c52683ba6',
                                             'materialized_path': '501e60b3-7572-48eb-8caf-0a73bbe52165/0c11cbf1-70d8-4472-9504-c186013cf6f2/b031c51e-e378-41c8-ba7c-274c52683ba6',
                                             'parent_id': '0c11cbf1-70d8-4472-9504-c186013cf6f2',
                                             'pseudo_node_type': None,
                                             'relation_type': None,
                                             'tagged_token_id': 'a7cd1d90-fe36-40c6-81c2-40b8e710b4d7',
                                             'type': 'token'}],
                       'id': '0c11cbf1-70d8-4472-9504-c186013cf6f2',
                       'materialized_path': '501e60b3-7572-48eb-8caf-0a73bbe52165/0c11cbf1-70d8-4472-9504-c186013cf6f2',
                       'parent_id': '501e60b3-7572-48eb-8caf-0a73bbe52165',
                       'pseudo_node_type': None,
                       'relation_type': None,
                       'tagged_token_id': '97701568-648f-4bf5-b6e1-649a36b4da76',
                       'type': 'token'}],
 'id': '501e60b3-7572-48eb-8caf-0a73bbe52165',
 'materialized_path': '501e60b3-7572-48eb-8caf-0a73bbe52165',
 'parent_id': None,
 'pseudo_node_type': None,
 'relation_type': None,
 'tagged_token_id': 'ae5de32e-11d0-4ba3-ad6d-24893095b186',
 'type': 'token'}
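
If you would rather visit nodes level by level without modifying them, a breadth-first traversal with a queue is an alternative to the recursive function above. The sketch below is a minimal version on a made-up node map with the same shape as the tree printed above (a root, one child, two leaves); `bfs_order` is a hypothetical helper, not part of any LightTag library.

```python
from collections import deque


def bfs_order(root_id, nodes_by_id):
    """Return (node_id, depth) pairs in breadth-first order from root_id."""
    order = []
    queue = deque([(root_id, 0)])
    while queue:
        node_id, depth = queue.popleft()
        order.append((node_id, depth))
        # Children are stored as ids, so look each one up when it is dequeued
        for child_id in nodes_by_id[node_id]["children"]:
            queue.append((child_id, depth + 1))
    return order


# Made-up ids; only the 'children' field is needed for the traversal
toy_nodes = {
    "root": {"id": "root", "parent_id": None, "children": ["a"]},
    "a": {"id": "a", "parent_id": "root", "children": ["b", "c"]},
    "b": {"id": "b", "parent_id": "a", "children": []},
    "c": {"id": "c", "parent_id": "a", "children": []},
}

for node_id, depth in bfs_order("root", toy_nodes):
    print("  " * depth + node_id)
```

Because it is iterative, this version also avoids Python’s recursion limit on very deep trees.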