Automated parsing
The automated parsers in ChemDataExtractor extract data from tables and from simple sentences. First, we import the required elements from ChemDataExtractor:
In [2]:
from chemdataextractor.doc import Document
from chemdataextractor.doc.table import Table
from chemdataextractor.model.units import TemperatureModel
from chemdataextractor.model.model import Compound, ModelType, StringType
from chemdataextractor.parse.elements import I
from chemdataextractor.parse.actions import join
Next, we define a model, setting the mandatory specifier element and a compound.
In [3]:
class GlassTransitionTemperature(TemperatureModel):
    # Match "glass transition temperature" or "Tg", ignoring case, and join the matched tokens
    specifier_expr = ((I('Glass') + I('transition') + I('temperature')) | I('Tg')).add_action(join)
    specifier = StringType(parse_expression=specifier_expr, required=True, contextual=True, updatable=True)
    # The compound that the extracted temperature belongs to
    compound = ModelType(Compound, required=True, contextual=True)
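To make the specifier expression more concrete, here is a minimal plain-Python sketch of the matching behaviour it describes. It is not ChemDataExtractor's implementation: the function name match_specifier is hypothetical, and the sketch assumes that I(...) matches a single token case-insensitively, that + chains consecutive tokens, that | offers an alternative, and that join merges the matched tokens with a single space.

```python
def match_specifier(tokens):
    """Return the first specifier found in a list of tokens, or None.

    Illustrative stand-in for the specifier_expr above (hypothetical helper,
    not part of ChemDataExtractor).
    """
    phrase = ["glass", "transition", "temperature"]
    lowered = [token.lower() for token in tokens]
    for i in range(len(tokens)):
        # Three consecutive tokens, compared case-insensitively
        if lowered[i:i + 3] == phrase:
            # Assumed effect of the join action: tokens merged with spaces
            return " ".join(tokens[i:i + 3])
        # Alternative branch: the abbreviation on its own
        if lowered[i] == "tg":
            return tokens[i]
    return None


print(match_specifier("The glass Transition temperature of the alloy".split()))
print(match_specifier(["TG", "=", "300", "K"]))
```

The original casing of the matched tokens is kept, which is why a sentence and a table header can yield differently capitalised specifiers for the same property.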
Finally, we can parse a paper:
In [4]:
doc = Document.from_file("./data/j.jallcom.2016.03.103.xml")
doc.models = [GlassTransitionTemperature]

for record in doc.records:
    print(record.serialize())
---------------------------------------------------------------------------
FileNotFoundError Traceback (most recent call last)
<ipython-input-4-822410711194> in <module>
----> 1 doc = Document.from_file("./data/j.jallcom.2016.03.103.xml")
2 doc.models = [GlassTransitionTemperature]
3
4 for record in doc.records:
5 print(record.serialize())
~/Documents/cdestuff/Development/cdedev/chemdataextractor/doc/document.py in from_file(cls, f, fname, readers)
158 """
159 if isinstance(f, six.string_types):
--> 160 f = io.open(f, 'rb')
161 if not fname and hasattr(f, 'name'):
162 fname = f.name
FileNotFoundError: [Errno 2] No such file or directory: './data/j.jallcom.2016.03.103.xml'
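The FileNotFoundError above means that the XML file was not found relative to the working directory when the cell was run. One way to catch this earlier is to check the path before handing it to the parser. The following is a minimal sketch using only the standard library; the helper name resolve_input is hypothetical and not part of ChemDataExtractor.

```python
from pathlib import Path


def resolve_input(path):
    """Return path as a Path if it points to a file, else raise a descriptive error.

    Hypothetical helper for illustration only.
    """
    p = Path(path)
    if not p.is_file():
        parent = p.parent
        # List the XML files that are present, to help spot a typo in the name
        available = sorted(x.name for x in parent.glob("*.xml")) if parent.is_dir() else []
        raise FileNotFoundError(
            f"{p} does not exist; XML files found in {parent}: {available}"
        )
    return p
```

The checked path could then be passed on as a string, for example Document.from_file(str(resolve_input("./data/j.jallcom.2016.03.103.xml"))).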