chemdataextractor

.config

Config file reader/writer.

chemdataextractor.config.construct_yaml_str(self, node)[source]

Override the default string handling function to always return unicode objects.

class chemdataextractor.config.Config(path=None)[source]

Bases: collections.abc.MutableMapping

Read and write to config file.

A config object is essentially a string key-value store that can be treated like a dictionary:

from chemdataextractor.config import Config

c = Config()
c['foo'] = 'bar'
print(c['foo'])

The file location may be specified:

c = Config('~/matt/anotherconfig.yml')
c['where'] = 'in a different file'

If no location is specified, the environment variable CHEMDATAEXTRACTOR_CONFIG is checked and used if available. Otherwise, a standard config location is used, which varies depending on the operating system. You can check the location using the path property. For more information see https://github.com/ActiveState/appdirs
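The lookup order described above can be sketched with the standard library alone. This is an illustration, not the library's code: resolve_config_path and ASSUMED_DEFAULT are hypothetical names, and the real default location is computed per operating system rather than hard-coded.

```python
import os

# Hypothetical placeholder; the real default is OS-dependent (see appdirs).
ASSUMED_DEFAULT = os.path.join(".config", "ChemDataExtractor", "chemdataextractor.yml")


def resolve_config_path(path=None, environ=None):
    """Return the config path using the documented order of precedence."""
    environ = os.environ if environ is None else environ
    # 1. An explicitly supplied path wins.
    if path:
        return path
    # 2. Otherwise the CHEMDATAEXTRACTOR_CONFIG environment variable is used, if set.
    env_path = environ.get("CHEMDATAEXTRACTOR_CONFIG")
    if env_path:
        return env_path
    # 3. Otherwise fall back to the standard location (placeholder here).
    return ASSUMED_DEFAULT
```

Passing environ explicitly keeps the sketch easy to exercise without touching the real process environment.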

It is possible to edit the file by hand with a text editor. It is in YAML format.

Warning: multiple instances of Config() pointing to the same file will not see each other's changes, and will overwrite the entire file when any key is changed.
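To make the warning concrete, here is a small stand-in that reproduces the same hazard. FileBackedStore is a hypothetical class, not the library's Config, and it uses JSON rather than YAML so that it runs with the standard library only. The assumption it encodes is the behaviour the warning describes: each instance reads the file once and rewrites the whole file on every change.

```python
import json
import os
import tempfile


class FileBackedStore:
    """Hypothetical stand-in: load once at creation, rewrite everything on set."""

    def __init__(self, path):
        self.path = path
        self._data = {}
        if os.path.exists(path):
            with open(path) as f:
                self._data = json.load(f)

    def __setitem__(self, key, value):
        self._data[key] = value
        # The whole file is replaced with this instance's view of the data.
        with open(self.path, "w") as f:
            json.dump(self._data, f)

    def __getitem__(self, key):
        return self._data[key]


def demonstrate_lost_update():
    """Return the final file contents after two instances write in turn."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "config.json")
        a = FileBackedStore(path)
        b = FileBackedStore(path)  # created before 'a' writes anything
        a["foo"] = "bar"
        b["where"] = "in a different file"  # 'b' never saw 'foo'
        with open(path) as f:
            return json.load(f)
```

After both writes the file holds only the key set through the second instance, which is why sharing one instance (such as the global config object) is the safer pattern.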

__init__(path=None)[source]

Parameters: path (string) – (Optional) Path to config file location.

path

The path to the config file.

clear()[source]

Clear all values from config.

chemdataextractor.config.config = <Config: /home/docs/.config/ChemDataExtractor/chemdataextractor.yml>

Global config instance.

.data

Tools for loading and caching data files.

class chemdataextractor.data.Package(path)[source]

Bases: object

Data package.

__init__(path)[source]

Initialize self. See help(type(self)) for accurate signature.

remote_path

local_path

remote_exists()[source]

local_exists()[source]

download(force=False)[source]

chemdataextractor.data.PACKAGES = [
    <Package: models/cem_crf-1.0.pickle>,
    <Package: models/cem_crf_chemdner_cemp-1.0.pickle>,
    <Package: models/cem_dict_cs-1.0.pickle>,
    <Package: models/cem_dict-1.0.pickle>,
    <Package: models/clusters_chem1500-1.0.pickle>,
    <Package: models/pos_ap_genia_nocluster-1.0.pickle>,
    <Package: models/pos_ap_genia-1.0.pickle>,
    <Package: models/pos_ap_wsj_genia_nocluster-1.0.pickle>,
    <Package: models/pos_ap_wsj_genia-1.0.pickle>,
    <Package: models/pos_ap_wsj_nocluster-1.0.pickle>,
    <Package: models/pos_ap_wsj-1.0.pickle>,
    <Package: models/pos_crf_genia_nocluster-1.0.pickle>,
    <Package: models/pos_crf_genia-1.0.pickle>,
    <Package: models/pos_crf_wsj_genia_nocluster-1.0.pickle>,
    <Package: models/pos_crf_wsj_genia-1.0.pickle>,
    <Package: models/pos_crf_wsj_nocluster-1.0.pickle>,
    <Package: models/pos_crf_wsj-1.0.pickle>,
    <Package: models/punkt_chem-1.0.pickle>,
]

Currently active data packages.

chemdataextractor.data.get_data_dir()[source]

Return path to the data directory.

chemdataextractor.data.find_data(path, warn=True)[source]

Return the absolute path to a data file within the data directory.

chemdataextractor.data.load_model(path)[source]

Load a model from a pickle file in the data directory. The result is cached, so each model is loaded only once.
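The load-once behaviour can be illustrated with a generic cached pickle loader. This is a sketch under assumptions, not the library's implementation: load_pickle_cached and demo_cached_load are hypothetical names, functools.lru_cache is swapped in as the caching mechanism, and the path is used as given instead of being resolved against the data directory.

```python
import functools
import os
import pickle
import tempfile


@functools.lru_cache(maxsize=None)
def load_pickle_cached(path):
    """Unpickle the file at 'path'; repeated calls reuse the cached object."""
    with open(path, "rb") as f:
        return pickle.load(f)


def demo_cached_load():
    """Write a small pickle, load it twice, and report whether the object was reused."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "model.pickle")
        with open(path, "wb") as f:
            pickle.dump({"weights": [1, 2, 3]}, f)
        first_load = load_pickle_cached(path)
        second_load = load_pickle_cached(path)
        return first_load is second_load
```

Because the cache returns the same object each time, callers should treat a loaded model as read-only. Only unpickle files from sources you trust.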

.errors

Error classes for ChemDataExtractor.

exception chemdataextractor.errors.ChemDataExtractorError[source]

Bases: Exception

Base ChemDataExtractor exception.

exception chemdataextractor.errors.ReaderError[source]

Bases: chemdataextractor.errors.ChemDataExtractorError

Raised when a reader is unable to read a document.

exception chemdataextractor.errors.ModelNotFoundError[source]

Bases: chemdataextractor.errors.ChemDataExtractorError

Raised when a model file could not be found.
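Because both specific errors derive from the base exception, callers can handle one case specially and catch the rest generically. The sketch below defines local stand-ins that mirror the documented hierarchy so that it runs without the library installed; with the library available, the same names would be imported from chemdataextractor.errors instead. The classify helper is hypothetical.

```python
# Local stand-ins mirroring the documented class hierarchy (assumption:
# they are used here only so the example is self-contained).
class ChemDataExtractorError(Exception):
    """Base exception."""


class ReaderError(ChemDataExtractorError):
    """A reader was unable to read a document."""


class ModelNotFoundError(ChemDataExtractorError):
    """A model file could not be found."""


def classify(exc):
    """Raise 'exc' and report which handler catches it."""
    try:
        raise exc
    except ModelNotFoundError:
        # Most specific handler first.
        return "missing model"
    except ChemDataExtractorError:
        # Catches ReaderError and any other subclass of the base exception.
        return "other ChemDataExtractor error"
    except Exception:
        return "unrelated error"
```

Ordering the handlers from most to least specific matters, since the first matching except clause wins.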

.utils

Miscellaneous utility functions.

chemdataextractor.utils.memoized_property(fget)[source]

Decorator to create memoized properties.
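One common way to build such a decorator is to store the computed value on the instance under a private attribute name. The following is a minimal sketch of that idea, not necessarily how the library implements it; memoized_property_sketch and the Document class are hypothetical.

```python
import functools


def memoized_property_sketch(fget):
    """Property that computes its value on first access, then reuses it."""
    attr_name = "_" + fget.__name__

    @functools.wraps(fget)
    def getter(self):
        # Compute and store the value only if it is not already cached.
        if not hasattr(self, attr_name):
            setattr(self, attr_name, fget(self))
        return getattr(self, attr_name)

    return property(getter)


class Document:
    """Hypothetical class with an expensive derived attribute."""

    def __init__(self, text):
        self.text = text
        self.compute_count = 0

    @memoized_property_sketch
    def tokens(self):
        self.compute_count += 1
        return self.text.split()
```

Accessing tokens repeatedly on the same Document runs the split only once; each instance keeps its own cached value.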

chemdataextractor.utils.memoize(obj)[source]

Decorator to create memoized functions, methods or classes.
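A memoizing decorator typically keeps a dictionary keyed on the call arguments. The sketch below shows that pattern under the assumption that all arguments are hashable; memoize_sketch and square are hypothetical names, and the library's own decorator may build its cache key differently.

```python
import functools


def memoize_sketch(func):
    """Cache results of 'func' keyed on its (hashable) arguments."""
    cache = {}

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # Sort keyword arguments so that call order does not affect the key.
        key = (args, tuple(sorted(kwargs.items())))
        if key not in cache:
            cache[key] = func(*args, **kwargs)
        return cache[key]

    wrapper.cache = cache  # exposed so callers can inspect or clear it
    return wrapper


calls = []


@memoize_sketch
def square(x):
    calls.append(x)  # records each real computation
    return x * x
```

Calling square with the same argument twice appends to calls only once, which is a quick way to confirm the cache is being hit.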

chemdataextractor.utils.python_2_unicode_compatible(klass)[source]

Fix __str__, __unicode__ and __repr__ methods under Python 2.

class chemdataextractor.utils.Singleton[source]

Bases: type

Singleton metaclass.
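A singleton metaclass usually overrides __call__ so that a class is instantiated at most once. The sketch below shows the general pattern in Python 3 syntax; SingletonSketch and Registry are hypothetical names, and this is not presented as the library's implementation.

```python
class SingletonSketch(type):
    """Metaclass that returns the same instance for every call to a class."""

    _instances = {}

    def __call__(cls, *args, **kwargs):
        # Create the instance on first use; afterwards return the stored one.
        if cls not in SingletonSketch._instances:
            SingletonSketch._instances[cls] = super().__call__(*args, **kwargs)
        return SingletonSketch._instances[cls]


class Registry(metaclass=SingletonSketch):
    """Hypothetical class that should exist only once per process."""

    def __init__(self):
        self.items = []
```

Note that constructor arguments passed after the first instantiation are ignored in this sketch, since the stored instance is returned unchanged.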

chemdataextractor.utils.flatten(x)[source]

Return a single flat list containing elements from nested lists.
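As an illustration of the intended result, here is a small recursive flattener. It is a sketch with one stated assumption: only list and tuple values are treated as nested containers, so strings are kept whole. The name flatten_sketch is hypothetical, and the library function may treat other iterables differently.

```python
def flatten_sketch(items):
    """Return a flat list of the elements in arbitrarily nested lists/tuples."""
    result = []
    for element in items:
        if isinstance(element, (list, tuple)):
            # Recurse into nested containers and splice in their elements.
            result.extend(flatten_sketch(element))
        else:
            result.append(element)
    return result
```

For example, flatten_sketch([1, [2, [3, "ab"]], 4]) yields [1, 2, 3, "ab", 4].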

chemdataextractor.utils.first(el)[source]

chemdataextractor.utils.ensure_dir(path)[source]

Ensure a directory exists.
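In Python 3 the same effect is available from the standard library through os.makedirs with exist_ok=True. The sketch below shows that equivalent; ensure_dir_sketch and demo_ensure_dir are hypothetical names, and the library function may differ in details such as its return value.

```python
import os
import tempfile


def ensure_dir_sketch(path):
    """Create 'path' (and any missing parents) if it does not already exist."""
    os.makedirs(path, exist_ok=True)
    return path


def demo_ensure_dir():
    """Create a nested directory twice and report whether it exists."""
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "models", "cache")
        ensure_dir_sketch(target)
        ensure_dir_sketch(target)  # second call is a no-op, not an error
        return os.path.isdir(target)
```

The call is idempotent, so it is safe to run before every write to the directory.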