Automated parsing

Automated parsers in ChemDataExtractor extract data from tables and from simple sentences. First, import the required elements from ChemDataExtractor:

In [2]:

from chemdataextractor.doc import Document
from chemdataextractor.doc.table import Table
from chemdataextractor.model.units import TemperatureModel
from chemdataextractor.model.model import Compound, ModelType, StringType
from chemdataextractor.parse.elements import I
from chemdataextractor.parse.actions import join

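Next, define a model. It sets two required fields: a specifier, which matches the phrase that identifies the property in the text (here "Glass transition temperature" or "Tg"), and the compound the value belongs to.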

In [3]:

class GlassTransitionTemperature(TemperatureModel):
    specifier_expr = ((I('Glass') + I('transition') + I('temperature')) | I('Tg')).add_action(join)
    specifier = StringType(parse_expression=specifier_expr, required=True, contextual=True, updatable=True)
    compound = ModelType(Compound, required=True, contextual=True)
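As a rough illustration of what the specifier expression is meant to match, the sketch below mimics it in plain Python, without ChemDataExtractor. It assumes that `I` matches a single token case-insensitively, that `+` chains tokens in sequence, that `|` offers an alternative, and that `join` merges the matched tokens with spaces. The function name `match_specifier` is hypothetical, and this is an analogy rather than the library's parser.

```python
def match_specifier(tokens):
    """Return the first matching specifier phrase joined by spaces, or None.

    Plain-Python analogy (an assumption, not ChemDataExtractor code) for
    ((I('Glass') + I('transition') + I('temperature')) | I('Tg')).add_action(join)
    """
    # Each pattern is a sequence of lowercase tokens; alternatives are tried in order.
    patterns = [["glass", "transition", "temperature"], ["tg"]]
    lowered = [token.lower() for token in tokens]
    for start in range(len(tokens)):
        for pattern in patterns:
            end = start + len(pattern)
            # Case-insensitive comparison of the token window against the pattern.
            if lowered[start:end] == pattern:
                # Join the original tokens, preserving their casing.
                return " ".join(tokens[start:end])
    return None
```

Called on `["The", "glass", "transition", "temperature", "was", "measured"]`, the function returns the three matched tokens as one string; called on a token list containing neither phrase, it returns `None`.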

Finally, we can parse a paper:

In [4]:

doc = Document.from_file("./data/j.jallcom.2016.03.103.xml")
doc.models = [GlassTransitionTemperature]

for record in doc.records:
    print(record.serialize())

---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
<ipython-input-4-822410711194> in <module>
----> 1 doc = Document.from_file("./data/j.jallcom.2016.03.103.xml")
      2 doc.models = [GlassTransitionTemperature]
      3
      4 for record in doc.records:
      5     print(record.serialize())

~/Documents/cdestuff/Development/cdedev/chemdataextractor/doc/document.py in from_file(cls, f, fname, readers)
    158         """
    159         if isinstance(f, six.string_types):
--> 160             f = io.open(f, 'rb')
    161         if not fname and hasattr(f, 'name'):
    162             fname = f.name

FileNotFoundError: [Errno 2] No such file or directory: './data/j.jallcom.2016.03.103.xml'
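The traceback shows that `Document.from_file` opens the given path directly, so the call fails when the file is not present relative to the current working directory. A minimal sketch of a check that could run before parsing is given below; the helper name `require_file` is hypothetical and not part of ChemDataExtractor, and it uses only the standard library.

```python
from pathlib import Path


def require_file(path_str):
    """Return the resolved Path for path_str, or raise with a clearer message."""
    path = Path(path_str)
    if not path.is_file():
        # Report the working directory, since relative paths are resolved against it.
        raise FileNotFoundError(
            f"{path_str!r} not found (working directory: {Path.cwd()}); "
            "check the path or place the paper in the expected folder first."
        )
    return path.resolve()


# Hypothetical usage with the notebook's objects (requires ChemDataExtractor):
# doc = Document.from_file(str(require_file("./data/j.jallcom.2016.03.103.xml")))
```

Failing early with the working directory in the message makes it easier to tell a wrong relative path from a file that is missing altogether.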
