In this post we are going to explore spaCy’s matchers.
Setup
First install spaCy: https://spacy.io/docs/usage/
Detecting Hyponyms
To learn about spaCy’s matchers, we are going to implement two patterns from Marti Hearst’s paper on Automatic Acquisition of Hyponyms from Large Text Corpora.
A hyponym expresses a type-of relationship: X is a hyponym of Y if X is a kind of Y.
The opposite relationship is called a hypernym, where Y is a hypernym of X if X is a kind of Y. [2]
In Hearst’s paper, lexical patterns are used to extract these hyponyms from natural text.
For example, from the sentence
The bow lute, such as the Bambara ndang, is plucked and has an individual curved neck for each string.
we know that a “Bambara ndang” is a type of “bow lute”.
This type of pattern can be generalized as:
NP_0 such as {NP_1, NP_2, …, (and | or)} NP_n
and this implies that
hyponym(NP_i, NP_0) for each i ∈ [1, n]
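Before bringing in spaCy, the rule above can be illustrated in plain Python. This is only a sketch: the function name `hearst_such_as` is made up for this post, and it assumes the noun phrases have already been identified.

```python
def hearst_such_as(hypernym, hyponyms):
    """Return one (hyponym, hypernym) pair for each NP_i listed after 'such as'."""
    return [(np_i, hypernym) for np_i in hyponyms]


# NP_0 is "bow lute"; the list after "such as" holds a single noun phrase.
pairs = hearst_such_as("bow lute", ["Bambara ndang"])
for hyponym, hypernym in pairs:
    print(f"hyponym({hyponym}, {hypernym})")
```

The hard part, of course, is finding NP_0 and the NP_i in raw text, which is what the rest of this post is about.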
Now let’s implement this!
spaCy splits each word or punctuation mark into an individual token.
The DET
bow NOUN
lute NOUN
, PUNCT
such ADJ
as ADP
the DET
Bambara PROPN
ndang NOUN
is VERB
plucked VERB
and CONJ
has VERB
an DET
individual ADJ
curved ADJ
neck NOUN
for ADP
each DET
string NOUN
. PUNCT
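The tokenization step on its own can be tried without downloading a model. The sketch below assumes spaCy v3 and uses `spacy.blank("en")`, which only tokenizes; the part-of-speech column in a listing like the one above needs a trained pipeline, and the exact tags depend on the model version.

```python
import spacy

# Tokenizer-only English pipeline: no trained model is needed just to split text.
nlp = spacy.blank("en")
doc = nlp(
    "The bow lute, such as the Bambara ndang, is plucked and has an "
    "individual curved neck for each string."
)

tokens = [token.text for token in doc]
for token in doc:
    # token.pos_ is an empty string here; a trained pipeline would fill it in.
    print(token.text, token.pos_)
```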
We can extract the noun phrases using Doc.noun_chunks, which yields each base noun phrase as a Span object.
The bow lute
the Bambara ndang
an individual curved neck
each string
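A list like the one above could be produced roughly as follows. This sketch assumes spaCy v3 and that the `en_core_web_sm` model is installed, since `Doc.noun_chunks` needs a dependency parse. The helper `base_noun_phrases` is a name invented for this post; it returns an empty list when no parse is available, so the snippet still runs on a tokenizer-only pipeline.

```python
import spacy


def base_noun_phrases(doc):
    """Return the text of each base noun phrase, or [] if the doc has no parse."""
    if not doc.has_annotation("DEP"):
        return []
    return [chunk.text for chunk in doc.noun_chunks]


try:
    # Assumption: this trained pipeline has been downloaded beforehand.
    nlp = spacy.load("en_core_web_sm")
except OSError:
    # Fallback so the snippet still runs; no noun chunks will be found.
    nlp = spacy.blank("en")

doc = nlp(
    "The bow lute, such as the Bambara ndang, is plucked and has an "
    "individual curved neck for each string."
)
phrases = base_noun_phrases(doc)
for phrase in phrases:
    print(phrase)
```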
In the Hearst paper the patterns described all use the noun phrases rather than individual parts of speech.
We want to merge each noun phrase into a single token so that we can match on it with a Matcher.
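One way to do the merge is `Doc.retokenize`. The sketch below assumes spaCy v3. To keep it runnable without a trained model, the noun phrases are given as hard-coded token ranges; with a trained pipeline we would pass `list(doc.noun_chunks)` instead. The helper name `merge_spans` is made up for this post, and tagging every merged token as NOUN is a simplification.

```python
import spacy


def merge_spans(doc, spans):
    """Merge each span into a single token and tag it as a NOUN."""
    with doc.retokenize() as retokenizer:
        for span in spans:
            retokenizer.merge(span, attrs={"POS": "NOUN"})
    return doc


nlp = spacy.blank("en")  # tokenizer only
doc = nlp(
    "The bow lute, such as the Bambara ndang, is plucked and has an "
    "individual curved neck for each string."
)

# Assumption: these ranges stand in for the spans from doc.noun_chunks.
noun_phrases = [doc[0:3], doc[6:9], doc[14:18], doc[19:21]]
merge_spans(doc, noun_phrases)

for token in doc:
    print(token.text, token.pos_)
```

The merges are applied together when the `with` block exits, so the spans can safely be collected against the original token indices.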
Now when we look at the tokens, we see that each noun phrase has been merged into a single token:
The bow lute NOUN
, PUNCT
such ADJ
as ADP
the Bambara ndang NOUN
is VERB
plucked VERB
and CONJ
has VERB
an individual curved neck NOUN
for ADP
each string NOUN
. PUNCT
We can now use a Matcher to extract Hearst’s patterns. We will only match phrases of the form
X such as Y
and if Y contains more than one noun phrase we will extract these post-match.
Extract the match
hyponym(the Bambara ndang,The bow lute)
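For reference, here is one way the “X such as Y” match could be written. It is a sketch under a few assumptions: spaCy v3 (where `Matcher.add` takes a list of patterns), noun phrases merged by hand from hard-coded token ranges rather than from `doc.noun_chunks`, and a pattern name, `SUCH_AS`, chosen for this post.

```python
import spacy
from spacy.matcher import Matcher

nlp = spacy.blank("en")  # tokenizer only
doc = nlp(
    "The bow lute, such as the Bambara ndang, is plucked and has an "
    "individual curved neck for each string."
)

# Assumption: hard-coded ranges stand in for the merged noun chunks.
with doc.retokenize() as retokenizer:
    for span in (doc[0:3], doc[6:9]):
        retokenizer.merge(span, attrs={"POS": "NOUN"})

matcher = Matcher(nlp.vocab)
pattern = [
    {"POS": "NOUN"},                 # X, the hypernym
    {"IS_PUNCT": True, "OP": "?"},   # optional comma before "such as"
    {"LOWER": "such"},
    {"LOWER": "as"},
    {"POS": "NOUN"},                 # Y, the first hyponym
]
matcher.add("SUCH_AS", [pattern])

pairs = []
for match_id, start, end in matcher(doc):
    span = doc[start:end]
    pairs.append((span[-1].text, span[0].text))

for hyponym, hypernym in pairs:
    print(f"hyponym({hyponym}, {hypernym})")
```

The first and last tokens of each matched span are the hypernym and the hyponym, so no further parsing of the match is needed for the single-hyponym case.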
We can test this on a more complicated example from which multiple hyponyms are extracted.
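The post-match step could look something like the sketch below. The sentence is made up for illustration, the noun phrases are single words tagged by hand in place of merged noun chunks, and the helper `hyponyms_after` is a name invented for this post. It assumes spaCy v3.

```python
import spacy
from spacy.matcher import Matcher


def hyponyms_after(doc, end):
    """Collect further noun-phrase tokens from the list that continues at `end`."""
    found = []
    i = end
    while i < len(doc):
        token = doc[i]
        if token.pos_ == "NOUN":
            found.append(token.text)
        elif token.text != "," and token.lower_ not in ("and", "or"):
            break  # anything else ends the list
        i += 1
    return found


nlp = spacy.blank("en")  # tokenizer only
doc = nlp("Fruits such as apples, pears and plums are sweet.")
for i in (0, 3, 5, 7):
    doc[i].pos_ = "NOUN"  # assumption: hand tagging stands in for merged noun chunks

matcher = Matcher(nlp.vocab)
pattern = [
    {"POS": "NOUN"},
    {"IS_PUNCT": True, "OP": "?"},
    {"LOWER": "such"},
    {"LOWER": "as"},
    {"POS": "NOUN"},
]
matcher.add("SUCH_AS", [pattern])

pairs = []
for match_id, start, end in matcher(doc):
    hypernym = doc[start].text
    hyponyms = [doc[end - 1].text] + hyponyms_after(doc, end)
    pairs.extend((hyponym, hypernym) for hyponym in hyponyms)

for hyponym, hypernym in pairs:
    print(f"hyponym({hyponym}, {hypernym})")
```

Walking forward only over commas, “and”, “or” and noun phrases keeps the matcher pattern simple, at the cost of missing lists that contain anything else between the items.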