Template Parsers
New in CDE v2.0.0 are automated parser templates for simple quantity models (e.g. boiling point). These parsers are designed to work well “out of the box” on most properties and can easily be extended to fit new model types. They achieve higher precision than AutoSentenceParser, which is primarily used for Snowball.
Currently we have 2 template parsers:
chemdataextractor.parse.template.QuantityModelTemplateParser
: for simple quantity models (CEM, Specifier, Value, Unit)
chemdataextractor.parse.template.MultiQuantityModelTemplateParser
: for sentences that contain multiple relationships in one sentence, e.g. “respectively” phrases
In [1]:
from chemdataextractor.parse.template import QuantityModelTemplateParser, MultiQuantityModelTemplateParser
These parsers have multiple built-in properties that return parse phrases. These can be viewed with dir:
In [2]:
[i for i in dir(QuantityModelTemplateParser) if not i.startswith('__')]
Out[2]:
['_get_data',
 '_root_phrase',
 '_specifier',
 'cem_after_specifier_and_value_phrase',
 'cem_before_specifier_and_value_phrase',
 'cem_phrase',
 'extract_error',
 'extract_units',
 'extract_value',
 'interpret',
 'model',
 'parse_sentence',
 'prefix',
 'root',
 'specifier_and_value',
 'specifier_before_cem_and_value_phrase',
 'specifier_phrase',
 'trigger_phrase',
 'value_phrase',
 'value_specifier_cem_phrase']
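To pick out only the phrase built-ins from that listing, one option is to filter on the name. The sketch below works on a copy of the names shown above rather than on the class itself, so it runs without ChemDataExtractor installed. Treating the `_phrase` suffix as the marker of a phrase built-in is an assumption based purely on the naming; members such as `prefix`, `root` or `specifier_and_value` may well return parse phrases too.

```python
# Names copied from the dir() listing above.
members = [
    '_get_data', '_root_phrase', '_specifier',
    'cem_after_specifier_and_value_phrase',
    'cem_before_specifier_and_value_phrase',
    'cem_phrase', 'extract_error', 'extract_units', 'extract_value',
    'interpret', 'model', 'parse_sentence', 'prefix', 'root',
    'specifier_and_value', 'specifier_before_cem_and_value_phrase',
    'specifier_phrase', 'trigger_phrase', 'value_phrase',
    'value_specifier_cem_phrase',
]

# Assumption: public members ending in '_phrase' are the phrase built-ins.
phrase_builtins = [
    name for name in members
    if name.endswith('_phrase') and not name.startswith('_')
]

for name in phrase_builtins:
    print(name)
```

With the library installed, the same filter can be applied to `dir(QuantityModelTemplateParser)` directly.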
These parsers can be used like any other, by adding them to your models.
In [3]:
from chemdataextractor.model.units.temperature import TemperatureModel
from chemdataextractor.parse.elements import I
from chemdataextractor.model import Compound, StringType, ModelType
from chemdataextractor.doc import Sentence
class MyTemperatureModel(TemperatureModel):
    specifier = StringType(parse_expression=I('Tc'), required=True)
    compound = ModelType(Compound, required=True)
    parsers = [QuantityModelTemplateParser(), MultiQuantityModelTemplateParser()]
The parsers should work on pretty much all basic sentences:
In [4]:
s = Sentence('It was found that BiFeO3 is really cool and has a Tc of 1093 K.')
s.models = [MyTemperatureModel]
In [5]:
import pprint
In [6]:
pprint.pprint(s.records.serialize())
[{'Compound': {'names': ['BiFeO3']}}]
As previously mentioned, we can also handle “respectively”-type phrases:
In [7]:
s = Sentence('LaMnO3 and HoMnO3 exhibit crazy values with Tc equal to 100 and 200 K, respectively')
s.models = [MyTemperatureModel]
pprint.pprint(s.records.serialize())
[{'Compound': {'names': ['LaMnO3']}},
 {'Compound': {'names': ['HoMnO3']}},
 {'MyTemperatureModel': {'compound': {'Compound': {'names': ['HoMnO3']}},
                         'raw_units': 'K',
                         'raw_value': '200',
                         'specifier': 'Tc',
                         'units': 'Kelvin^(1.0)',
                         'value': [200.0]}},
 {'MyTemperatureModel': {'compound': {'Compound': {'names': ['LaMnO3']}},
                         'raw_units': 'K',
                         'raw_value': '100',
                         'specifier': 'Tc',
                         'units': 'Kelvin^(1.0)',
                         'value': [100.0]}}]
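The positional pairing that a “respectively” sentence implies can be illustrated in plain Python. This is only a toy sketch of the idea, not how MultiQuantityModelTemplateParser is implemented: the helper `pair_respectively` and both regular expressions are hypothetical and far cruder than CDE's tokenizer and CEM tagger.

```python
import re


def pair_respectively(names, values):
    """Pair the i-th name with the i-th value, as 'respectively' implies."""
    if len(names) != len(values):
        raise ValueError("'respectively' needs equally many names and values")
    return list(zip(names, values))


sentence = ('LaMnO3 and HoMnO3 exhibit crazy values with Tc equal to '
            '100 and 200 K, respectively')

# Toy pattern (assumption): a capitalised token containing at least one digit
# is treated as a chemical formula.
names = re.findall(r'\b[A-Z][A-Za-z]*\d[A-Za-z0-9]*\b', sentence)

# Toy pattern (assumption): standalone numbers are the property values.
values = [float(v) for v in re.findall(r'\b\d+(?:\.\d+)?\b', sentence)]

pairs = pair_respectively(names, values)
print(pairs)
```

The pairing matches the records above: LaMnO3 goes with 100 K and HoMnO3 with 200 K.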
Creating new Templates
The templates are good starting points, but you can of course create your own. Simply create a new class that inherits from BaseAutoParser and BaseSentenceParser. All you need to implement is a root property; however, you can also override the interpret method if you wish. Take a look at the template.py file for examples.
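The shape of such a template (a required root property plus an overridable interpret step) can be sketched without the library. Everything below is a hypothetical stand-in: `ToyTemplateParser` and `ToyTcParser` are not ChemDataExtractor classes, and a regular expression takes the place of the parse elements that a real root would be built from. Only the member names `root`, `interpret` and `parse_sentence` are borrowed from the listing earlier on this page.

```python
import re
from abc import ABC, abstractmethod


class ToyTemplateParser(ABC):
    """Hypothetical stand-in for a template parser (not a CDE class)."""

    @property
    @abstractmethod
    def root(self):
        """Pattern a sentence must match; every subclass must supply it."""

    def interpret(self, match):
        """Turn a match into a record; subclasses may override this."""
        return match.groupdict()

    def parse_sentence(self, sentence):
        """Return a record for the sentence, or None if root does not match."""
        match = self.root.search(sentence)
        return self.interpret(match) if match else None


class ToyTcParser(ToyTemplateParser):
    """Only root is implemented; interpret is inherited unchanged."""

    @property
    def root(self):
        # Assumption: a regex stands in for CDE's parse elements.
        return re.compile(
            r'(?P<compound>\S+) has a (?P<specifier>Tc) '
            r'of (?P<raw_value>\d+) (?P<raw_units>K)'
        )


record = ToyTcParser().parse_sentence('BiFeO3 has a Tc of 1093 K.')
print(record)
```

In a real template the new class would derive from BaseAutoParser and BaseSentenceParser as described above, and root would return a parse expression assembled from the model's fields; template.py remains the reference for how that is done.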