.relex
For performing semi-supervised chemical Relationship Extraction using the Snowball Algorithm.
.relex.cluster
Cluster of phrase objects and associated cluster dictionaries
-
class chemdataextractor.relex.cluster.Cluster(label=None, learning_rate=0.5)
Bases: object
Base Snowball Cluster, used to combine similar phrases
-
__init__(label=None, learning_rate=0.5)
Create a new cluster
Keyword Arguments: - label (str) – The label of this cluster (default: None)
- order (list) – The order of entities that all phrases in this cluster must share (default: None)
- learning_rate (float) – How quickly to update confidences based on new information (default: 0.5)
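The reference does not state how learning_rate enters the confidence update. One plausible reading, common in Snowball-style systems, is a weighted blend of the old and newly observed confidence. The sketch below illustrates that reading only; update_confidence is a hypothetical helper, not part of chemdataextractor.

```python
def update_confidence(old_confidence, new_confidence, learning_rate=0.5):
    """Blend a previous confidence with a newly observed one.

    Assumed rule (not confirmed by the library docs):
        updated = learning_rate * new + (1 - learning_rate) * old
    """
    if not 0.0 <= learning_rate <= 1.0:
        raise ValueError("learning_rate must lie in [0, 1]")
    return learning_rate * new_confidence + (1.0 - learning_rate) * old_confidence


# With the default rate of 0.5 the result is the midpoint of old and new.
print(update_confidence(0.5, 1.0))  # 0.75
```

Under this assumption, a rate of 0 ignores new evidence entirely and a rate of 1 discards the previous confidence.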
-
add_phrase(phrase)
Add a phrase to this cluster and update the word dictionary and token weights
Parameters: phrase (chemdataextractor.relex.phrase.Phrase) – The phrase to add to the cluster
-
update_dictionaries(phrase)
Update all dictionaries in this cluster
Parameters: phrase (chemdataextractor.relex.phrase.Phrase) – The phrase to update
-
static add_tokens(dictionary, tokens)
Add the specified tokens to the specified dictionary
Parameters: - dictionary (OrderedDict) – The dictionary to add tokens to
- tokens (list of str) – The tokens to add
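The entry above does not say what is stored for each token. As a minimal sketch, assuming the dictionary simply maps each token to an occurrence count, adding tokens could look like the following. count_tokens is a hypothetical stand-in, and the real add_tokens may store different values.

```python
from collections import OrderedDict


def count_tokens(dictionary, tokens):
    """Add tokens to an ordered dictionary, counting repeats.

    Assumption: values are plain occurrence counts. New tokens are
    appended in first-seen order; known tokens are incremented.
    """
    for token in tokens:
        dictionary[token] = dictionary.get(token, 0) + 1
    return dictionary


words = OrderedDict()
count_tokens(words, ["melting", "point", "melting"])
print(list(words.items()))  # [('melting', 2), ('point', 1)]
```

An OrderedDict keeps tokens in insertion order, which gives every token a stable position when the dictionary is later used to build vectors.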
-
.relex.entity
Entity object
-
class chemdataextractor.relex.entity.Entity(text, tag, parse_expression, start, end)
Bases: object
A base entity, the fundamental unit of a Relation
-
__init__(text, tag, parse_expression, start, end)
Create a new Entity
Parameters: - text (str) – The text of the entity
- tag (str or list) – The name of the entity
- parse_expression – How the entity is identified in text
- start (int) – The start index of the entity in tokens
- end (int) – The end index of the entity in tokens
-
.relex.pattern
Extraction pattern object
-
class chemdataextractor.relex.pattern.Pattern(entities=None, elements=None, label=None, sentences=None, order=None, relations=None, confidence=0)
Bases: object
Pattern object, fundamentally the same as a phrase except assigned a confidence
.relex.phrase
Phrase object
-
class chemdataextractor.relex.phrase.Phrase(sentence_tokens, relations, prefix_length, suffix_length)
Bases: object
-
__init__(sentence_tokens, relations, prefix_length, suffix_length)
Phrase Object
Class for handling which relations and entities appear in a sentence; the base type used for clustering and generating extraction patterns
Parameters: - sentence_tokens (list) – The sentence tokens from which to generate the Phrase
- relations (list) – List of Relation objects to be tagged in the sentence
- prefix_length (int) – Number of tokens to assign to the prefix
- suffix_length (int) – Number of tokens to assign to the suffix
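The prefix_length and suffix_length arguments suggest the usual Snowball layout, in which the tokens before the first entity form the prefix, the tokens between entities form a middle, and the tokens after the last entity form the suffix. The sketch below shows that split for two entities. split_context is a hypothetical helper, and the half-open (start, end) spans are an assumption rather than the documented behaviour of Phrase.

```python
def split_context(tokens, first_span, second_span, prefix_length, suffix_length):
    """Split a token list around two entity spans.

    Spans are assumed to be half-open (start, end) token indices,
    with first_span ending at or before the start of second_span.
    """
    first_start, first_end = first_span
    second_start, second_end = second_span
    # Up to prefix_length tokens immediately before the first entity.
    prefix = tokens[max(0, first_start - prefix_length):first_start]
    # Everything between the two entities.
    middle = tokens[first_end:second_start]
    # Up to suffix_length tokens immediately after the second entity.
    suffix = tokens[second_end:second_end + suffix_length]
    return prefix, middle, suffix


tokens = ["The", "melting", "point", "of", "benzene", "is", "278", "K", "."]
prefix, middle, suffix = split_context(tokens, (4, 5), (6, 8), 2, 1)
print(prefix, middle, suffix)  # ['point', 'of'] ['is'] ['.']
```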
-
.relex.relationship
Classes for defining new chemical relationships
-
class chemdataextractor.relex.relationship.Relation(entities, confidence)
Bases: object
Relation class
Essentially a placeholder for related entities
.relex.snowball
.relex.utils
Various utility functions
-
chemdataextractor.relex.utils.match_score(pi, pj, prefix_weight=0.1, middle_weight=0.8, suffix_weight=0.1)
Compute the match between two phrases using a dot product of vectors. The dot products are weighted to put more emphasis on matching the middles.
Parameters: - pi – Phrase or pattern
- pj – Phrase or pattern
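As an illustration of a weighted dot-product match, the sketch below combines the prefix, middle and suffix similarities using the default weights from the signature. It assumes each phrase is represented by three equal-length numeric vectors held in a plain dict; that representation and the name weighted_match are hypothetical, not the library's internals.

```python
def dot(u, v):
    """Dot product of two equal-length numeric sequences."""
    return sum(a * b for a, b in zip(u, v))


def weighted_match(pi, pj, prefix_weight=0.1, middle_weight=0.8, suffix_weight=0.1):
    """Weighted sum of per-part dot products.

    pi and pj are assumed to be dicts with 'prefix', 'middle' and
    'suffix' vectors. The middle part dominates the score.
    """
    return (prefix_weight * dot(pi["prefix"], pj["prefix"])
            + middle_weight * dot(pi["middle"], pj["middle"])
            + suffix_weight * dot(pi["suffix"], pj["suffix"]))


a = {"prefix": [1, 0], "middle": [0, 1], "suffix": [1, 0]}
b = {"prefix": [1, 0], "middle": [1, 0], "suffix": [1, 0]}
# The prefix and suffix agree but the middles are orthogonal,
# so the score stays low.
score = weighted_match(a, b)
```

If the vectors are unit-normalised, the score lies between 0 and 1 whenever the weights sum to 1.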
-
chemdataextractor.relex.utils.vectorise(phrase, cluster)
Vectorise a phrase object against a given cluster
Parameters: - phrase – The phrase to vectorise
- cluster – The cluster to vectorise against
-
chemdataextractor.relex.utils.match(phrase, cluster, prefix_weight, middles_weight, suffix_weight)
Vectorise the phrase against this cluster to determine the match score
Parameters: - phrase – The phrase to match
- cluster – The cluster to match against
-
chemdataextractor.relex.utils.mode_rows(a)
Find the modal row of a 2D array
Parameters: a (np.array) – The 2D array to process
Returns: The most frequent row
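To make "modal row" concrete, the sketch below finds the most frequent row using only the standard library. It is a stand-in that accepts any iterable of rows and is not the library's NumPy-based implementation; modal_row is a hypothetical name.

```python
from collections import Counter


def modal_row(rows):
    """Return the most frequent row as a list.

    Rows are converted to tuples so they can be counted. An empty
    input raises IndexError.
    """
    counts = Counter(tuple(row) for row in rows)
    row, _ = counts.most_common(1)[0]
    return list(row)


print(modal_row([[1, 2], [3, 4], [1, 2]]))  # [1, 2]
```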
-
chemdataextractor.relex.utils.KnuthMorrisPratt(text, pattern)
- Yields all starting positions of copies of the pattern in the text.
- Calling conventions are similar to string.find, but its arguments can be lists or iterators, not just strings; it returns all matches, not just the first one; and it does not need the whole text in memory at once. Whenever it yields, it will have read the text exactly up to and including the match that caused the yield.
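The behaviour described above can be sketched with a textbook Knuth-Morris-Pratt generator. This is an independent illustration under the name kmp_search, not the library's source: it consumes the text one item at a time, yields every match start, and works on lists or iterators as well as strings. Yielding nothing for an empty pattern is a choice made for this sketch.

```python
def kmp_search(text, pattern):
    """Yield every start index at which pattern occurs in text."""
    pattern = list(pattern)
    if not pattern:
        return
    # failure[i]: length of the longest proper prefix of pattern[:i + 1]
    # that is also a suffix of it.
    failure = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = failure[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        failure[i] = k
    matched = 0  # number of pattern items currently matched
    for pos, item in enumerate(text):
        while matched > 0 and item != pattern[matched]:
            matched = failure[matched - 1]
        if item == pattern[matched]:
            matched += 1
        if matched == len(pattern):
            yield pos - len(pattern) + 1
            # Fall back so that overlapping matches are also found.
            matched = failure[matched - 1]


print(list(kmp_search("abababa", "aba")))  # [0, 2, 4]
tokens = ["the", "melting", "point", "of", "X", "is", "the", "melting", "point"]
print(list(kmp_search(tokens, ["melting", "point"])))  # [1, 7]
```

Because the text is only iterated once and never indexed, it can be a generator or a stream of tokens.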