Template Parsers

New in CDE v2.0.0 are automated parser templates for simple quantity models (e.g. boiling point). These parsers are designed to work well “out of the box” on most properties and can easily be extended to fit new model types. They achieve higher precision than AutoSentenceParser, which is primarily used for Snowball.

Currently we have 2 template parsers:

  1. chemdataextractor.parse.template.QuantityModelTemplateParser : for simple quantity models (CEM, Specifier, Value, Unit)
  2. chemdataextractor.parse.template.MultiQuantityModelTemplateParser : for sentences that contain multiple relationships, e.g. “respectively” phrases

In [1]:
from chemdataextractor.parse.template import QuantityModelTemplateParser, MultiQuantityModelTemplateParser

These parsers come with several built-in phrases, each of which returns a parse phrase. They can be listed with dir:

In [2]:
[i for i in dir(QuantityModelTemplateParser) if not i.startswith('__')]
Out[2]:
['_get_data',
 '_root_phrase',
 '_specifier',
 'cem_after_specifier_and_value_phrase',
 'cem_before_specifier_and_value_phrase',
 'cem_phrase',
 'extract_error',
 'extract_units',
 'extract_value',
 'interpret',
 'model',
 'parse_sentence',
 'prefix',
 'root',
 'specifier_and_value',
 'specifier_before_cem_and_value_phrase',
 'specifier_phrase',
 'trigger_phrase',
 'value_phrase',
 'value_specifier_cem_phrase']
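
To narrow the listing above to just the phrase built-ins, the names can be filtered by their `_phrase` suffix. The following is a minimal sketch: `phrase_attributes` is a hypothetical helper (not part of ChemDataExtractor), and `DemoParser` is a stand-in class that only borrows a few attribute names from the listing so the snippet runs without the library installed.

```python
def phrase_attributes(cls):
    """Return the public attribute names of `cls` that end in '_phrase'."""
    return sorted(
        name for name in dir(cls)
        if not name.startswith('_') and name.endswith('_phrase')
    )


# Stand-in class (an assumption for illustration only); the attribute
# names are copied from the dir() listing above.
class DemoParser:
    cem_phrase = None
    value_phrase = None
    specifier_phrase = None
    root = None
    _specifier = None


print(phrase_attributes(DemoParser))
```

With the library installed, passing QuantityModelTemplateParser instead of the stand-in class should give the corresponding subset of the listing above.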

These parsers are used like any other: add them to the parsers list of your model.

In [3]:
from chemdataextractor.model.units.temperature import TemperatureModel
from chemdataextractor.parse.elements import I
from chemdataextractor.model import Compound, StringType, ModelType
from chemdataextractor.doc import Sentence

class MyTemperatureModel(TemperatureModel):
    specifier = StringType(parse_expression=I('Tc'), required=True)
    compound = ModelType(Compound, required=True)
    parsers = [QuantityModelTemplateParser(), MultiQuantityModelTemplateParser()]

The parsers should work on pretty much all basic sentences:

In [4]:
s = Sentence('It was found that BiFeO3 is really cool and has a Tc of 1093 K.')
s.models = [MyTemperatureModel]
In [5]:
import pprint
In [6]:
pprint.pprint(s.records.serialize())
[{'Compound': {'names': ['BiFeO3']}}]

As previously mentioned, the parsers also handle “respectively”-type phrases:

In [7]:
s = Sentence('LaMnO3 and HoMnO3 exhibit crazy values with Tc equal to 100 and 200 K, respectively')
s.models = [MyTemperatureModel]
pprint.pprint(s.records.serialize())
[{'Compound': {'names': ['LaMnO3']}},
 {'Compound': {'names': ['HoMnO3']}},
 {'MyTemperatureModel': {'compound': {'Compound': {'names': ['HoMnO3']}},
                         'raw_units': 'K',
                         'raw_value': '200',
                         'specifier': 'Tc',
                         'units': 'Kelvin^(1.0)',
                         'value': [200.0]}},
 {'MyTemperatureModel': {'compound': {'Compound': {'names': ['LaMnO3']}},
                         'raw_units': 'K',
                         'raw_value': '100',
                         'specifier': 'Tc',
                         'units': 'Kelvin^(1.0)',
                         'value': [100.0]}}]
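
The idea behind a “respectively” phrase is that the n-th compound in the sentence is paired with the n-th value. The sketch below illustrates only that pairing logic with plain regular expressions; it is not how MultiQuantityModelTemplateParser is implemented, and `pair_respectively` is a hypothetical helper that handles just the two-compound pattern of the sentence above.

```python
import re


def pair_respectively(sentence):
    """Pair 'A and B ... x and y UNIT, respectively' into (name, value, unit) tuples."""
    # The two compound names are assumed to open the sentence.
    names_match = re.match(r'\s*(\w+) and (\w+)', sentence)
    # Two numbers sharing one unit, followed by ', respectively'.
    values_match = re.search(
        r'(\d+(?:\.\d+)?) and (\d+(?:\.\d+)?) (\w+), respectively', sentence
    )
    if not (names_match and values_match):
        return []
    first_value, second_value, unit = values_match.groups()
    values = (float(first_value), float(second_value))
    # zip keeps the order: first name with first value, and so on.
    return [(name, value, unit) for name, value in zip(names_match.groups(), values)]


sentence = ('LaMnO3 and HoMnO3 exhibit crazy values with Tc equal to '
            '100 and 200 K, respectively')
print(pair_respectively(sentence))
```

The pairs produced here match the compound/value assignments in the serialized records above: LaMnO3 with 100 K and HoMnO3 with 200 K.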

Creating new Templates

The templates are good starting points, but you can of course create your own. Simply create a new class that inherits from BaseAutoParser and BaseSentenceParser. All you need to implement is a root property, although you can also override the interpret function if you wish. Take a look at the template.py file for examples.
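
The shape of such a template (a root property that defines what to match, plus an optional interpret override that turns a match into a record) can be sketched without the library. Everything below is a stand-in written for illustration: `ToySentenceParser` and `ToyCurieTemperatureParser` are hypothetical classes, and the root is a plain regular expression, whereas a real template would build its root from ChemDataExtractor parse elements and inherit from the base classes named above.

```python
import re


class ToySentenceParser:
    """Stand-in base class (hypothetical, not part of ChemDataExtractor)."""

    @property
    def root(self):
        raise NotImplementedError('subclasses must define a root pattern')

    def interpret(self, match):
        # Default interpretation: hand back the named groups unchanged.
        yield match.groupdict()

    def parse_sentence(self, text):
        # Run the root pattern over the text and interpret every match.
        for match in self.root.finditer(text):
            yield from self.interpret(match)


class ToyCurieTemperatureParser(ToySentenceParser):
    @property
    def root(self):
        # compound (a formula containing a digit) ... specifier ... value unit
        return re.compile(
            r'(?P<compound>[A-Z][A-Za-z]*\d[A-Za-z0-9]*)\b.*?'
            r'\b(?P<specifier>Tc)\b\D*?'
            r'(?P<raw_value>\d+(?:\.\d+)?)\s*(?P<raw_units>K)\b'
        )

    def interpret(self, match):
        # Override: add a numeric value next to the raw string.
        record = match.groupdict()
        record['value'] = [float(record['raw_value'])]
        yield record


text = 'It was found that BiFeO3 is really cool and has a Tc of 1093 K.'
records = list(ToyCurieTemperatureParser().parse_sentence(text))
print(records)
```

Keeping the matching rule in root and the record construction in interpret means a new template usually only has to change the former.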