.relex
Tools for semi-supervised chemical relationship extraction using the Snowball algorithm.
.relex.cluster
Cluster of phrase objects and associated cluster dictionaries.
class chemdataextractor.relex.cluster.Cluster(label=None, learning_rate=0.5)
Bases: object
Base Snowball cluster, used to combine similar phrases.
__init__(label=None, learning_rate=0.5)
Create a new cluster.
Keyword Arguments:
label (str) – The label of this cluster (default: None)
order (list) – The order of entities that all phrases in this cluster must share (default: None)
learning_rate (float) – How quickly to update confidences based on new information (default: 0.5)
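The original Snowball algorithm updates a confidence by blending the newly computed value with the previous one through an update weight. The sketch below assumes `learning_rate` plays that role; `update_confidence` is a hypothetical helper, not part of ChemDataExtractor.

```python
def update_confidence(old_confidence: float, new_confidence: float,
                      learning_rate: float = 0.5) -> float:
    """Blend a new confidence estimate with the previous one.

    Hypothetical helper: assumes a Snowball-style update in which
    learning_rate is the share of weight given to the new information.
    """
    if not 0.0 <= learning_rate <= 1.0:
        raise ValueError("learning_rate must be between 0 and 1")
    return learning_rate * new_confidence + (1.0 - learning_rate) * old_confidence


# With the default learning rate the result is the mean of old and new.
print(update_confidence(1.0, 0.5))  # 0.75
```

A learning rate of 1.0 discards the old confidence entirely, while values near 0 make the confidence change only slowly as new evidence arrives.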
add_phrase(phrase)
Add a phrase to this cluster and update the word dictionary and token weights.
Parameters:
phrase (chemdataextractor.relex.phrase.Phrase) – The phrase to add to the cluster
update_dictionaries(phrase)
Update all dictionaries in this cluster with the given phrase.
Parameters:
phrase (chemdataextractor.relex.phrase.Phrase) – The phrase used to update the dictionaries
static add_tokens(dictionary, tokens)
Add the specified tokens to the specified dictionary.
Parameters:
dictionary (OrderedDict) – The dictionary to add tokens to
tokens (list of str) – The tokens to add
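The three methods above can be illustrated with a toy cluster. This is a minimal sketch under several assumptions: each phrase is a plain dict of `prefix`, `middle` and `suffix` token lists, the cluster keeps one `OrderedDict` per element mapping tokens to counts, and a token's weight is its count divided by the total. `ToyCluster` and its attributes are hypothetical; the real `Cluster` may lay out its dictionaries differently.

```python
from collections import OrderedDict


class ToyCluster:
    """Hypothetical stand-in for a Snowball cluster (not the library class).

    Assumes phrases are dicts with 'prefix', 'middle' and 'suffix' token
    lists and that each dictionary maps a token to its occurrence count.
    """

    ELEMENTS = ("prefix", "middle", "suffix")

    def __init__(self):
        self.phrases = []
        self.dictionaries = {name: OrderedDict() for name in self.ELEMENTS}

    def add_phrase(self, phrase):
        """Store the phrase, then refresh the token dictionaries."""
        self.phrases.append(phrase)
        self.update_dictionaries(phrase)

    def update_dictionaries(self, phrase):
        """Add the tokens of every phrase element to its own dictionary."""
        for name in self.ELEMENTS:
            self.add_tokens(self.dictionaries[name], phrase[name])

    @staticmethod
    def add_tokens(dictionary, tokens):
        """Increment each token's count, inserting unseen tokens at the end."""
        for token in tokens:
            dictionary[token] = dictionary.get(token, 0) + 1

    def token_weights(self, name):
        """Normalise the counts in one dictionary so the weights sum to 1."""
        dictionary = self.dictionaries[name]
        total = sum(dictionary.values())
        return {token: count / total for token, count in dictionary.items()}


cluster = ToyCluster()
cluster.add_phrase({"prefix": ["the"], "middle": ["has", "a", "Tc", "of"], "suffix": ["K"]})
cluster.add_phrase({"prefix": ["the"],
                    "middle": ["has", "a", "Curie", "temperature", "of"],
                    "suffix": ["K"]})
print(cluster.dictionaries["middle"]["has"])  # 2
```

Tokens shared by many phrases in the cluster end up with larger weights, so they dominate later similarity comparisons.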
.relex.entity
Entity object.
class chemdataextractor.relex.entity.Entity(text, tag, parse_expression, start, end)
Bases: object
A base entity, the fundamental unit of a Relation.
__init__(text, tag, parse_expression, start, end)
Create a new Entity.
Parameters:
text (str) – The text of the entity
tag (str or list) – The name of the entity
parse_expression – How the entity is identified in text
start (int) – The start index of the entity in the tokens
end (int) – The end index of the entity in the tokens
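To show how `start` and `end` locate an entity in a tokenised sentence, here is a hypothetical stand-in built with `dataclasses`. It assumes `end` is exclusive, like a Python slice; the parameters above do not say whether the real `Entity` uses an inclusive or exclusive end, and `parse_expression` is reduced to a plain string here.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class ToyEntity:
    """Hypothetical stand-in for an Entity (not the library class)."""
    text: str
    tag: Union[str, List[str]]
    parse_expression: str  # placeholder for the real parse expression
    start: int             # index of the entity's first token
    end: int               # assumed exclusive end index, like a slice

    def tokens_in(self, sentence_tokens: List[str]) -> List[str]:
        """Return the tokens this entity covers in a sentence."""
        return sentence_tokens[self.start:self.end]


tokens = ["BiFeO3", "has", "a", "Curie", "temperature", "of", "1103", "K"]
specifier = ToyEntity("Curie temperature", "specifier", "Curie temperature", 3, 5)
print(specifier.tokens_in(tokens))  # ['Curie', 'temperature']
```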
.relex.pattern
Extraction pattern object.
class chemdataextractor.relex.pattern.Pattern(entities=None, elements=None, label=None, sentences=None, order=None, relations=None, confidence=0)
Bases: object
Pattern object: fundamentally the same as a phrase, except that it is assigned a confidence.
.relex.phrase
Phrase object.
class chemdataextractor.relex.phrase.Phrase(sentence_tokens, relations, prefix_length, suffix_length)
Bases: object
__init__(sentence_tokens, relations, prefix_length, suffix_length)
Phrase object.
Class for handling which relations and entities appear in a sentence; the base type used for clustering and for generating extraction patterns.
Parameters:
sentence_tokens (list) – The sentence tokens from which to generate the phrase
relations (list) – List of Relation objects to be tagged in the sentence
prefix_length (int) – Number of tokens to assign to the prefix
suffix_length (int) – Number of tokens to assign to the suffix
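The prefix and suffix lengths bound how much context is kept around the tagged entities. A minimal sketch, assuming entity spans are sorted, non-overlapping `(start, end)` token indices with exclusive ends, that the prefix is taken just before the first entity, the suffix just after the last, and that each middle is the run of tokens between two consecutive entities. `split_phrase` is hypothetical, not the library's implementation.

```python
from typing import List, Tuple


def split_phrase(tokens: List[str], spans: List[Tuple[int, int]],
                 prefix_length: int, suffix_length: int):
    """Split a sentence into prefix, middles and suffix around entity spans.

    Hypothetical helper: spans are (start, end) token indices with an
    exclusive end, sorted by start and non-overlapping.
    """
    if not spans:
        raise ValueError("at least one entity span is required")
    first_start = spans[0][0]
    last_end = spans[-1][1]
    # Up to prefix_length tokens immediately before the first entity.
    prefix = tokens[max(0, first_start - prefix_length):first_start]
    # One middle per gap between consecutive entities.
    middles = [tokens[end:next_start]
               for (_, end), (next_start, _) in zip(spans, spans[1:])]
    # Up to suffix_length tokens immediately after the last entity.
    suffix = tokens[last_end:last_end + suffix_length]
    return prefix, middles, suffix


tokens = ["The", "Curie", "temperature", "of", "BiFeO3", "is", "1103", "K", "."]
spans = [(1, 3), (4, 5), (6, 7)]  # specifier, compound, value
print(split_phrase(tokens, spans, prefix_length=1, suffix_length=1))
# (['The'], [['of'], ['is']], ['K'])
```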
.relex.relationship
Classes for defining new chemical relationships.
class chemdataextractor.relex.relationship.Relation(entities, confidence)
Bases: object
Relation class.
Essentially a placeholder for a set of related entities.
.relex.snowball
.relex.utils
Various utility functions.
chemdataextractor.relex.utils.match_score(pi, pj, prefix_weight=0.1, middle_weight=0.8, suffix_weight=0.1)
Compute the match between two phrases using dot products of their vectors. The dot products are weighted by prefix_weight, middle_weight and suffix_weight; the defaults put more emphasis on matching the middles.
Parameters:
pi – Phrase or pattern
pj – Phrase or pattern
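A minimal sketch of the weighted score, assuming each phrase or pattern has already been reduced to one prefix, one middle and one suffix vector, each normalised to unit length. Here a phrase is a plain dict of NumPy arrays; `toy_match_score` is hypothetical, whereas the real function takes Phrase or Pattern objects.

```python
import numpy as np


def toy_match_score(pi, pj, prefix_weight=0.1, middle_weight=0.8, suffix_weight=0.1):
    """Weighted sum of dot products between corresponding phrase elements.

    Hypothetical helper: pi and pj are dicts of 'prefix', 'middle' and
    'suffix' NumPy vectors, assumed to be unit length.
    """
    return (prefix_weight * np.dot(pi["prefix"], pj["prefix"])
            + middle_weight * np.dot(pi["middle"], pj["middle"])
            + suffix_weight * np.dot(pi["suffix"], pj["suffix"]))


p1 = {"prefix": np.array([1.0, 0.0]), "middle": np.array([0.0, 1.0]),
      "suffix": np.array([0.0, 1.0])}
p2 = {"prefix": np.array([0.0, 1.0]), "middle": np.array([0.0, 1.0]),
      "suffix": np.array([0.0, 1.0])}
score = toy_match_score(p1, p2)  # prefixes differ, middles and suffixes agree
```

With unit vectors, identical phrases score the sum of the weights (1.0 with the defaults), and a mismatched prefix costs only 0.1 while a mismatched middle costs 0.8.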
chemdataextractor.relex.utils.vectorise(phrase, cluster)
Vectorise a phrase object against a given cluster.
Parameters:
phrase (chemdataextractor.relex.phrase.Phrase) – The phrase to vectorise
cluster (chemdataextractor.relex.cluster.Cluster) – The cluster to vectorise the phrase against
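One way to vectorise a phrase against a cluster, offered as a sketch under assumptions: the cluster keeps a `{token: weight}` dictionary per phrase element, each vector component is a token's weight if the phrase contains that token and zero otherwise, and non-zero vectors are scaled to unit length. `toy_vectorise` and its dict-based inputs are hypothetical.

```python
import numpy as np


def toy_vectorise(phrase, cluster_weights):
    """Build one vector per phrase element over the cluster's vocabulary.

    Hypothetical helper: phrase maps element names ('prefix', 'middle',
    'suffix') to token lists; cluster_weights maps the same names to
    {token: weight} dicts. Non-zero vectors are scaled to unit length.
    """
    vectors = {}
    for name, weights in cluster_weights.items():
        present = set(phrase.get(name, []))
        # One component per cluster token, in the dictionary's order.
        vector = np.array([weight if token in present else 0.0
                           for token, weight in weights.items()])
        norm = np.linalg.norm(vector)
        vectors[name] = vector / norm if norm > 0 else vector
    return vectors


weights = {"prefix": {"the": 1.0},
           "middle": {"has": 0.5, "Tc": 0.25, "of": 0.25},
           "suffix": {"K": 1.0}}
phrase = {"prefix": ["the"], "middle": ["has", "a", "Tc", "of"], "suffix": ["eV"]}
vectors = toy_vectorise(phrase, weights)
```

Because every phrase is vectorised against the same cluster vocabulary, the resulting vectors share one coordinate system, which is what makes a dot-product comparison meaningful. Tokens outside the cluster's vocabulary, such as "a" above, simply contribute nothing.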
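chemdataextractor.relex.utils.match(phrase, cluster, prefix_weight, middles_weight, suffix_weight)
Vectorise the phrase against this cluster to determine the match score.
Parameters:
phrase (chemdataextractor.relex.phrase.Phrase) – The phrase to match
cluster (chemdataextractor.relex.cluster.Cluster) – The cluster to match the phrase against
prefix_weight, middles_weight, suffix_weight (float) – Weights for the prefix, middles and suffix contributions to the score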
chemdataextractor.relex.utils.mode_rows(a)
Find the modal (most frequent) row of a 2D array.
Parameters:
a (np.array) – The 2D array to process
Returns:
The most frequent row
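A common way to find the modal row with NumPy, offered as a sketch rather than the library's implementation: `np.unique` with `axis=0` collapses duplicate rows, and `return_counts=True` reports how often each one occurs. The name `toy_mode_rows` is hypothetical.

```python
import numpy as np


def toy_mode_rows(a):
    """Return the most frequent row of a 2D array.

    Ties go to the row that sorts first, because np.unique returns
    the unique rows in lexicographic order.
    """
    a = np.asarray(a)
    if a.ndim != 2 or a.shape[0] == 0:
        raise ValueError("expected a non-empty 2D array")
    rows, counts = np.unique(a, axis=0, return_counts=True)
    return rows[np.argmax(counts)]


data = np.array([[1, 2], [3, 4], [1, 2], [5, 6]])
print(toy_mode_rows(data))  # [1 2]
```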
chemdataextractor.relex.utils.KnuthMorrisPratt(text, pattern)
Yield all starting positions of copies of the pattern in the text.
Calling conventions are similar to str.find, but the arguments can be lists or iterators, not just strings; it returns all matches, not just the first one; and it does not need the whole text in memory at once. Whenever it yields, it will have read the text exactly up to and including the match that caused the yield.
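The behaviour described above can be sketched with the textbook prefix-function formulation of Knuth-Morris-Pratt. The function name `knuth_morris_pratt` is hypothetical and the library may compute its shift table differently; the sketch consumes `text` one item at a time, so generators and token lists work as well as strings, and it yields each match as soon as the match's last item is read.

```python
def knuth_morris_pratt(text, pattern):
    """Yield every starting index of pattern in text, overlaps included.

    Hypothetical sketch: text may be any iterable; pattern is copied
    into a list so it can be indexed.
    """
    pattern = list(pattern)
    if not pattern:
        raise ValueError("pattern must be non-empty")
    # failure[i]: length of the longest proper prefix of pattern[:i + 1]
    # that is also a suffix of it.
    failure = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = failure[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        failure[i] = k
    # Scan the text once, never moving backwards.
    matched = 0
    for index, item in enumerate(text):
        while matched > 0 and item != pattern[matched]:
            matched = failure[matched - 1]
        if item == pattern[matched]:
            matched += 1
        if matched == len(pattern):
            yield index - len(pattern) + 1
            # Fall back so that overlapping matches are still found.
            matched = failure[matched - 1]


print(list(knuth_morris_pratt("abababa", "aba")))  # [0, 2, 4]
```

Token lists work the same way, which is what makes the function useful for locating multi-token phrases in tokenised sentences.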