Consistency in OntoNotes |
July 29th, 2013 |
ling, tech |
Like many Heartland states, Iowa has had trouble keeping young people down on the farm or anywhere within state lines. (en/bn/abc_0001:0)
A human linguist manually parsed this into a tree:
(TOP (S (PP-MNR (IN Like) (NP (JJ many) (NNP Heartland) (NNS states))) (, ,) (NP-SBJ (NNP Iowa)) (VP (VBZ has) (VP (VBN had) (NP (NP (NN trouble)) (S-NOM (NP-SBJ (-NONE- *PRO*)) (VP (VBG keeping) (NP (JJ young) (NNS people)) (ADVP-LOC (ADVP (RB down) (PP (IN on) (NP (DT the) (NN farm)))) (CC or) (ADVP (RB anywhere) (PP (IN within) (NP (NN state) (NNS lines)))))))))) (. .)))
Then it went through several additional layers of annotation to specify things like "Heartland" being a location, "Iowa" being referred to in later sentences with "it", and the relationship of the various arguments to the main verb, "had".
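The bracketed notation is just nested lists: each node is a label followed by either a word or more nodes. Here is a minimal sketch of reading it in Python, assuming nothing beyond the notation shown above. The `parse` and `leaves` helpers are hypothetical names for illustration, not part of any OntoNotes tooling, and the input is one subtree copied from the tree above.

```python
import re

def parse(text):
    """Parse a bracketed tree string into nested lists.

    Each node becomes [label, child, ...]; a leaf is [tag, word].
    """
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def node():
        nonlocal pos
        assert tokens[pos] == "("
        pos += 1
        result = [tokens[pos]]              # the node label
        pos += 1
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                result.append(node())       # a child node
            else:
                result.append(tokens[pos])  # the word of a leaf
                pos += 1
        pos += 1                            # consume the closing ")"
        return result

    return node()

def leaves(tree):
    """Yield (tag, word) pairs from left to right, traces included."""
    if len(tree) == 2 and isinstance(tree[1], str):
        yield tree[0], tree[1]
    else:
        for child in tree[1:]:
            yield from leaves(child)

# One subtree of the sentence above.
tree = parse("(PP-MNR (IN Like) (NP (JJ many) (NNP Heartland) (NNS states)))")
print([word for _, word in leaves(tree)])
# ['Like', 'many', 'Heartland', 'states']
```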
There are many places in this process where one could make mistakes, and inconsistent data makes things much harder for machine learning systems. From the beginning, OntoNotes was intended to generate high quality data by doing most of the annotation work twice and then adjudicating any disagreements. [1] But how consistent is the final product?
One way to measure this is to look at a document in the corpus that was accidentally included multiple times. The duplication wasn't noticed at the time, so the document was annotated repeatedly. Documents wsj_0190, wsj_0364, wsj_0511, wsj_0696, wsj_1056, wsj_1228, wsj_1382, wsj_1557, and wsj_1558 all read:
Companies listed below reported quarterly profit substantially different from the average of analysts' estimates. The companies are followed by at least three analysts, and had a minimum five-cent change in actual earnings per share. Estimated and actual results involving losses are omitted. The percent difference compares actual profit with the 30-day estimate where at least three analysts have issues forecasts in the past 30 days. Otherwise, actual profit is compared with the 300-day estimate.
My guess is that this appears multiple times in the corpus because it was printed multiple times in the Wall Street Journal. This means it's somewhat atypical and is kind of boilerplate-ish, but we do at least have a lot of copies of it.
How many different ways did these sentences get analyzed? Let's go sentence by sentence.
"Companies listed below reported quarterly profit substantially different from the average of analysts' estimates."
wsj_0190, wsj_0364, wsj_1228: (TOP (S (NP-SBJ (NP (NNS Companies)) (VP (VBN listed) (NP (-NONE- *)) (ADVP-LOC (IN below)))) (VP (VBD reported) (NP (NP (JJ quarterly) (NN profit)) (ADJP (RB substantially) (JJ different) (PP (IN from) (NP (NP (DT the) (NN average)) (PP (IN of) (NP (NP (NNS analysts) (POS ')) (NNS estimates)))))))) (. .)))
wsj_0511, wsj_1557, wsj_1558: (TOP (S (NP-SBJ (NP (NNS Companies)) (VP (VBN listed) (NP (-NONE- *)) (PP-LOC (IN below)))) (VP (VBD reported) (NP (NP (JJ quarterly) (NN profit)) (ADJP (RB substantially) (JJ different) (PP (IN from) (NP (NP (DT the) (NN average)) (PP (IN of) (NP (NP (NNS analysts) (POS ')) (NNS estimates)))))))) (. .)))
wsj_0696, wsj_1056, wsj_1382: (TOP (S (NP-SBJ (NP (NNS Companies)) (VP (VBN listed) (NP (-NONE- *)) (ADVP-LOC (RB below)))) (VP (VBD reported) (NP (NP (JJ quarterly) (NN profit)) (ADJP (RB substantially) (JJ different) (PP (IN from) (NP (NP (DT the) (NN average)) (PP (IN of) (NP (NP (NNS analysts) (POS ')) (NNS estimates)))))))) (. .)))
These three versions differ only in their analysis of "listed below". We see (ADVP-LOC (IN below)), (PP-LOC (IN below)), and (ADVP-LOC (RB below)).
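Spotting the one differing label in trees this long is tedious by eye. As a rough sketch, and not the method used to write this post, you can split each tree into tokens and diff the token lists with Python's standard `difflib`. The `tokens` and `tree_diff` helpers are hypothetical names, and the inputs are the "listed below" subtrees from the three versions above.

```python
import difflib
import re

def tokens(tree_text):
    """Split a bracketed tree into parentheses, labels and words."""
    return re.findall(r"\(|\)|[^\s()]+", tree_text)

def tree_diff(a, b):
    """Return (tokens_in_a, tokens_in_b) for each region where a and b differ."""
    ta, tb = tokens(a), tokens(b)
    matcher = difflib.SequenceMatcher(None, ta, tb, autojunk=False)
    return [(ta[i1:i2], tb[j1:j2])
            for op, i1, i2, j1, j2 in matcher.get_opcodes()
            if op != "equal"]

v1 = "(VP (VBN listed) (NP (-NONE- *)) (ADVP-LOC (IN below)))"
v2 = "(VP (VBN listed) (NP (-NONE- *)) (PP-LOC (IN below)))"
v3 = "(VP (VBN listed) (NP (-NONE- *)) (ADVP-LOC (RB below)))"

print(tree_diff(v1, v2))  # [(['ADVP-LOC'], ['PP-LOC'])]
print(tree_diff(v1, v3))  # [(['IN'], ['RB'])]
```

A token diff like this works well when the trees have the same shape and differ only in labels. When the bracketing itself differs, the output is harder to interpret.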
Proposition annotation specifies the relationship between various arguments of verbs. For this sentence we had two sets:
wsj_0190, wsj_0364, wsj_0696, wsj_1056, wsj_1228, wsj_1382: 1 list.01 ----- 1:0-rel 2:0-ARG1 3:1-ARG2 0:1*2:0-LINK-PCR 4 report.01 ----- 4:0-rel 0:2-ARG0 5:2-ARG1
wsj_0511, wsj_1557, wsj_1558: 1 list.01 ----- 1:0-rel 2:0-ARG1 3:1-ARGM-LOC 0:1*2:0-LINK-PCR 4 report.01 ----- 4:0-rel 0:2-ARG0 5:2-ARG1
The disagreement is over whether "below" is the second argument of "listed" or a locative modifier, and was probably caused by the corresponding disagreement in the parsing.
All named entity annotation passes identified "quarterly" as a date.
"The companies are followed by at least three analysts, and had a minimum five-cent change in actual earnings per share."
wsj_0190: (TOP (S (NP-SBJ-1 (DT The) (NNS companies)) (VP (VP (VBP are) (VP (VBN followed) (NP (-NONE- *-1)) (PP (IN by) (NP-LGS (QP (ADVP (IN at) (JJS least)) (CD three)) (NNS analysts))))) (, ,) (CC and) (VP (VBD had) (NP (NP (DT a) (JJ minimum) (NML (CD five) (HYPH -) (NN cent)) (NN change)) (PP (IN in) (NP (NP (JJ actual) (NNS earnings)) (PP (IN per) (NP (NN share)))))))) (. .)))
wsj_0364, wsj_0511, wsj_0696, wsj_1056, wsj_1228, wsj_1382, wsj_1557, wsj_1558: (TOP (S (NP-SBJ-1 (DT The) (NNS companies)) (VP (VP (VBP are) (VP (VBN followed) (NP (-NONE- *-1)) (PP (IN by) (NP-LGS (QP (ADVP (RB at) (RBS least)) (CD three)) (NNS analysts))))) (, ,) (CC and) (VP (VBD had) (NP (NP (DT a) (JJ minimum) (NML (CD five) (HYPH -) (NN cent)) (NN change)) (PP-LOC (IN in) (NP (NP (JJ actual) (NNS earnings)) (PP (IN per) (NP (NN share)))))))) (. .)))
The two versions here disagree in two places. In wsj_0190 we have "at least three" being (ADVP (IN at) (JJS least)), which is weird because the ADVP would be an adverb phrase without an adverb, just a preposition (IN) and superlative adjective (JJS). The others, with (ADVP (RB at) (RBS least)), are much more reasonable. That adverb phrase consists of an adverb (RB) and a superlative adverb (RBS).
They also disagree on whether the prepositional phrase "in actual earnings per share" should be locative. I don't see why it would be, but wsj_0190 is the only one that doesn't mark it that way, so maybe I'm missing something.
For propositional annotation we had:
wsj_1228, wsj_1382, wsj_1557, wsj_1558: 3 follow.02 ----- 3:0-rel 4:0-ARG1 5:1-ARG0 12 have.03 ----- 12:0-rel 0:1-ARG0 13:2-ARG1
wsj_0190, wsj_0364, wsj_0511, wsj_0696, wsj_1056: 3 follow.02 ----- 3:0-rel 4:0-ARG1 5:1-ARG0 12 have.03 ----- 12:0-rel 4:0-ARG0 13:2-ARG1
The disagreement here isn't a real disagreement. Node 0:1 is "the companies" and node 4:0 is a trace that refers back to "the companies".
All named entity annotation passes identified "at least three" as numeric and "five-cent" as money. All coreference annotation passes matched "the companies" back to "companies listed below" in the previous sentence.
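The node pointers in the proposition annotations can be resolved mechanically. The sketch below assumes they follow the PropBank-style convention of "terminal index : height", where terminals are counted from zero with traces included, and height is the number of levels to climb from that terminal. That reading is consistent with the 0:1 and 4:0 examples above, but treat it as my assumption. `parse`, `resolve` and `words` are hypothetical helpers, and the tree is an abbreviated copy of the start of this sentence's parse.

```python
import re

def parse(text):
    """Parse a bracketed tree into nested lists: [label, child, ...]."""
    toks = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0
    def node():
        nonlocal pos
        pos += 1                          # skip "("
        result = [toks[pos]]
        pos += 1
        while toks[pos] != ")":
            if toks[pos] == "(":
                result.append(node())
            else:
                result.append(toks[pos])
                pos += 1
        pos += 1                          # skip ")"
        return result
    return node()

def is_leaf(node):
    return len(node) == 2 and isinstance(node[1], str)

def resolve(tree, pointer):
    """Return the node named by a 'terminal:height' pointer."""
    terminal, height = (int(n) for n in pointer.split(":"))
    paths = []                            # root-to-leaf path for each terminal
    def walk(node, path):
        path = path + [node]
        if is_leaf(node):
            paths.append(path)
        else:
            for child in node[1:]:
                walk(child, path)
    walk(tree, [])
    return paths[terminal][-1 - height]   # climb `height` levels from the leaf

def words(node):
    if is_leaf(node):
        return [node[1]]
    return [w for child in node[1:] for w in words(child)]

# Abbreviated: only the first five terminals of the sentence.
tree = parse("(S (NP-SBJ-1 (DT The) (NNS companies)) "
             "(VP (VBP are) (VP (VBN followed) (NP (-NONE- *-1)))))")

print(words(resolve(tree, "0:1")))  # ['The', 'companies']
print(resolve(tree, "4:0"))         # ['-NONE-', '*-1']
```

Under that convention, 0:1 is the subject noun phrase and 4:0 is the trace `*-1`, whose index matches `NP-SBJ-1`, which is why the two annotations point at the same thing.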
"Estimated and actual results involving losses are omitted."
wsj_0190, wsj_0511, wsj_1056, wsj_1228, wsj_1557, wsj_1558: (TOP (S (NP-SBJ-1 (NP (ADJP (VBN Estimated) (CC and) (JJ actual)) (NNS results)) (VP (VBG involving) (NP (NNS losses)))) (VP (VBP are) (VP (VBN omitted) (NP (-NONE- *-1)))) (. .)))
wsj_0364, wsj_0696, wsj_1382: (TOP (S (NP-SBJ-1 (NP (ADJP (JJ Estimated) (CC and) (JJ actual)) (NNS results)) (VP (VBG involving) (NP (NNS losses)))) (VP (VBP are) (VP (VBN omitted) (NP (-NONE- *-1)))) (. .)))
We see "estimated" being interpreted as either a past participle (VBN) or an adjective (JJ). Both are pretty reasonable.
For propositions, only the instances of "estimated" tagged as VBN were eligible for annotation. All of those were annotated as:
0 estimate.01 ----- 0:0-rel 3:0,4:1-ARG1
All documents had the other two propositions annotated the same way:
4 involve.01 ----- 4:0-rel 0:2-ARG2 5:1-ARG1 7 omit-v omit.01 ----- 7:0-rel 8:0-ARG1
No named entity pass found anything here.
"The percent difference compares actual profit with the 30-day estimate where at least three analysts have issues forecasts in the past 30 days."
wsj_0190: (TOP (S (NP-SBJ (DT The) (NN percent) (NN difference)) (VP (VBZ compares) (NP (JJ actual) (NN profit)) (PP-CLR (IN with) (NP (DT the) (NML (CD 30) (HYPH -) (NN day)) (NN estimate))) (SBAR-ADV (WHADVP-1 (WRB where)) (S (NP-SBJ (QP (ADVP (RB at) (RBS least)) (CD three)) (NNS analysts)) (VP (VBP have) (VP (NNS issues) (NP (NNS forecasts)) (PP-TMP (IN in) (NP (DT the) (JJ past) (CD 30) (NNS days))) (ADVP-LOC (-NONE- *T*-1))))))) (. .)))
wsj_0364: (TOP (S (NP-SBJ (DT The) (NN percent) (NN difference)) (VP (VBZ compares) (NP (JJ actual) (NN profit)) (PP-CLR (IN with) (NP (NP (DT the) (NML (CD 30) (HYPH -) (NN day)) (NN estimate)) (SBAR-LOC (WHADVP-1 (WRB where)) (S (NP-SBJ (QP (ADVP (RB at) (RBS least)) (CD three)) (NNS analysts)) (VP (VBP have) (NP (NNS issues) (NNS forecasts)) (PP-TMP (IN in) (NP (DT the) (JJ past) (CD 30) (NNS days))) (ADVP-LOC (-NONE- *T*-1)))))))) (. .)))
wsj_0511, wsj_0696, wsj_1382: (TOP (S (NP-SBJ (DT The) (NN percent) (NN difference)) (VP (VBZ compares) (NP (JJ actual) (NN profit)) (PP-CLR (IN with) (NP (NP (DT the) (NML (CD 30) (HYPH -) (NN day)) (NN estimate)) (SBAR (WHADVP-1 (WRB where)) (S (NP-SBJ (QP (ADVP (RB at) (RBS least)) (CD three)) (NNS analysts)) (VP (VBP have) (VP (NNS issues) (NP (NNS forecasts)) (PP-TMP (IN in) (NP (DT the) (JJ past) (CD 30) (NNS days))) (ADVP-LOC (-NONE- *T*-1))))))))) (. .)))
wsj_1056: (TOP (S (NP-SBJ (DT The) (NN percent) (NN difference)) (VP (VBZ compares) (NP (JJ actual) (NN profit)) (PP-CLR (IN with) (NP (DT the) (NML (CD 30) (HYPH -) (NN day)) (NN estimate))) (SBAR-ADV (WHADVP-1 (WRB where)) (S (NP-SBJ (QP (ADVP (RB at) (RBS least)) (CD three)) (NNS analysts)) (VP (VBP have) (VP (NNS issues) (NP (NNS forecasts)) (ADVP-LOC (-NONE- *T*-1)) (PP-TMP (IN in) (NP (DT the) (JJ past) (CD 30) (NNS days)))))))) (. .)))
wsj_1228: (TOP (S (NP-SBJ (DT The) (NN percent) (NN difference)) (VP (VBZ compares) (NP (JJ actual) (NN profit)) (PP-CLR (IN with) (NP (NP (DT the) (NML (CD 30) (HYPH -) (NN day)) (NN estimate)) (SBAR (WHADVP-1 (WRB where)) (S (NP-SBJ (QP (ADVP (RB at) (RBS least)) (CD three)) (NNS analysts)) (VP (VBP have) (NP (NNS issues) (NNS forecasts)) (PP-TMP (IN in) (NP (DT the) (JJ past) (CD 30) (NNS days))) (ADVP-LOC (-NONE- *T*-1)))))))) (. .)))
wsj_1557, wsj_1558: (TOP (S (NP-SBJ (DT The) (NN percent) (NN difference)) (VP (VBZ compares) (NP (JJ actual) (NN profit)) (PP-CLR (IN with) (NP (NP (DT the) (CD 30-day) (NN estimate)) (SBAR (WHADVP-1 (WRB where)) (S (NP-SBJ (QP (ADVP (RB at) (RBS least)) (CD three)) (NNS analysts)) (VP (VB have) (VP (NNS issues) (NP (NNS forecasts)) (PP-TMP (IN in) (NP (DT the) (JJ past) (CD 30) (NNS days))) (ADVP-LOC (-NONE- *T*-1))))))))) (. .)))
This one is complicated, and has several different issues. First, there's a disagreement between
(with the 30-day estimate) (where at least three analysts have issues forecasts in the past 30 days).
and
(with the 30-day estimate (where at least three analysts have issues forecasts in the past 30 days.))
Is the "where" clause under the "with" clause?
Second, there's a disagreement over where the trace goes. All of them put it at the end except for wsj_1056 which puts it after "forecasts". I don't understand traces well enough to say what's going on here.
Third, there's a typo of "issues" for "issued", and so it's tagged as a plural noun (NNS) when it should be a verb. This comes from automated part of speech tagging that is supposed to be hand-corrected but in this case was missed. Some of the parses have it heading a verb phrase (VP), which makes sense except for the tag, while others nonsensically treat "issues forecasts" as a noun phrase.
Fourth, there's a disagreement in tokenization. Most of them break "30-day" into (NML (CD 30) (HYPH -) (NN day)) but wsj_1557 and wsj_1558 leave it as a simple (CD 30-day). I think all hyphens in this corpus are supposed to be split, so I'm not sure why these two are left connected.
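If hyphens really are supposed to be split everywhere, which is my guess and not something I've confirmed, then unsplit tokens are easy to flag automatically. Here is a minimal sketch with a hypothetical `unsplit_hyphens` helper. It looks at leaves only, and skips traces like `*T*-1` whose hyphen is part of the index.

```python
import re

# A leaf is "(TAG word)" with no nested parentheses.
LEAF = re.compile(r"\(([^\s()]+)\s+([^\s()]+)\)")

def unsplit_hyphens(tree_text):
    """Return (tag, word) leaves whose word still has an internal hyphen."""
    return [(tag, word)
            for tag, word in LEAF.findall(tree_text)
            if tag != "-NONE-"              # ignore traces such as *T*-1
            and "-" in word.strip("-")]     # ignore a bare "-" token

split = "(NP (DT the) (NML (CD 30) (HYPH -) (NN day)) (NN estimate))"
joined = "(NP (DT the) (CD 30-day) (NN estimate))"

print(unsplit_hyphens(split))   # []
print(unsplit_hyphens(joined))  # [('CD', '30-day')]
```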
Fifth, in wsj_0364 the clause "where at least three analysts have issues forecasts in the past 30 days" is marked as locative but not in the others. As before, I don't see how this use has anything to do with location.
For proposition annotation, the two parse trees where the "where" clause isn't under the "with" clause get an extra argument:
wsj_0190, wsj_1056: 3 compare.01 ----- 3:0-rel 0:1-ARG0 4:1-ARG1 6:1-ARG2 12:2-ARGM-ADV
wsj_0364, wsj_0511, wsj_0696, wsj_1228, wsj_1557, wsj_1558: 3 compare.01 ----- 3:0-rel 0:1-ARG0 4:1-ARG1 6:1-ARG2
All named entity annotation passes labeled "30-day" and "the past 30 days" as dates, and "at least three" as a number.
"Otherwise, actual profit is compared with the 300-day estimate."
wsj_0190, wsj_0364, wsj_0511, wsj_0696, wsj_1056, wsj_1228, wsj_1382: (TOP (S (ADVP (RB Otherwise)) (, ,) (NP-SBJ-1 (JJ actual) (NN profit)) (VP (VBZ is) (VP (VBN compared) (NP (-NONE- *-1)) (PP-CLR (IN with) (NP (DT the) (NML (CD 300) (HYPH -) (NN day)) (NN estimate))))) (. .)))
wsj_1557, wsj_1558: (TOP (S (ADVP (RB Otherwise)) (, ,) (NP-SBJ-1 (JJ actual) (NN profit)) (VP (VBZ is) (VP (VBN compared) (NP (-NONE- *-1)) (PP-CLR (IN with) (NP (DT the) (CD 300-day) (NN estimate))))) (. .)))
These two differ only in whether the "300-day" is split at the hyphen, and the two documents that don't split it are the same two that didn't split "30-day". Those two also grouped together in each of the previous cases and are numbered sequentially, so I'm not sure we should really be treating wsj_1557 and wsj_1558 as independent annotations.
The proposition annotations are almost the same, but some annotate the trace as a direct semantic link (LINK-PCR):
wsj_0190, wsj_0364, wsj_0511, wsj_0696: 5 compare.01 ----- 5:0-rel 0:1-ARGM-DIS 6:0-ARG1 7:1-ARG2 6:0*6:0-LINK-PCR
wsj_1056, wsj_1228, wsj_1382, wsj_1557, wsj_1558: 5 compare.01 ----- 5:0-rel 0:1-ARGM-DIS 6:0-ARG1 7:1-ARG2
I can't think of what a link from 6:0 to 6:0 would mean, but this seems simple enough to have been cleaned up manually if it were actually invalid.
All named entity annotation passes labeled "300-day" as a date. All coreference passes but one connected "actual profit" back to "actual profit" in the previous sentence.
Summary
Almost all the disagreement here was in the parses, which are also the most complex annotations. There wasn't enough named entity or coreference evaluation to get a good sense of how accurate they are.
(Note: I worked on OntoNotes at BBN from 2008 to 2010.)
[1] OntoNotes: The 90% Solution. Eduard Hovy, Mitchell Marcus, Martha Palmer, Lance Ramshaw, and Ralph Weischedel (2006).