API Reference

Note

Private methods are not included in the documentation!

config

Config file reader/writer.

data

Tools for loading and caching data files.

errors

Error classes for ChemDataExtractor.

utils

Miscellaneous utility functions.


biblio

Tools for dealing with bibliographic information.

biblio.bibtex

BibTeX parser.

biblio.person

Tools for parsing people’s names from strings into various name components.

biblio.xmp

Parse metadata stored as XMP (Extensible Metadata Platform).


cli

ChemDataExtractor command-line interface.

cli.cem

Chemical entity mention (CEM) commands.

cli.chemdner

Command-line tools for dealing with the CHEMDNER corpus.

cli.cluster

Word clusters command-line interface.

cli.config

Commands for managing ChemDataExtractor configuration.

cli.data

Data and model management interface.

cli.dict

Commands for building a dictionary-based chemical named entity recognizer.

cli.evaluate

Commands for running evaluations.

cli.pos

Part-of-speech tagging commands.

cli.tokenize

Tokenizer command-line interface.


doc

Document processing.

doc.document

Document model.

doc.element

Document elements.

doc.figure

Figure document elements.

doc.meta

Metadata document elements.

doc.table

Table document elements.

doc.text

Text-based document elements.


eval

Evaluation of extraction results.

eval.evaluation


model

Classes for representing chemical models.

model.base

Data model for extracted information.

model.model

Model classes for physical properties.

model.units

Types for representing quantities, dimensions, and units.

model.units.unit

Base types for making units.

model.units.dimension

Base types for dimensions.

model.units.quantity_model

Base types for making quantity models.

model.units.length

Units and models for lengths.

model.units.mass

Units and models for masses.

model.units.time

Units and models for times.

model.units.temperature

Units and models for temperatures.


nlp

Chemistry-aware natural language processing framework.

nlp.abbrev

Abbreviation detection.

nlp.new_cem

New and improved named entity recognition (NER) for chemical entity mentions (CEM).

nlp.allennlpwrapper

Tagger wrappers that wrap AllenNLP functionality.

nlp.cem

Named entity recognition (NER) for chemical entity mentions (CEM).

nlp.corpus

Tools for reading and writing text corpora.

nlp.lexicon

Cache features of previously seen words.

nlp.pos

Part-of-speech tagging.

nlp.tag

Tagger implementations.

nlp.tokenize

Word and sentence tokenizers.


parse

Parse text using rule-based grammars.

parse.actions

Actions to perform during parsing.

parse.auto

Parser for automatic parsing, without user-written parsing rules.

parse.base

Base classes for parsing sentences and tables.

parse.cem

Chemical entity mention parser elements.

parse.common

Common parser elements.

parse.context

parse.elements

Parser elements.

parse.ir

IR spectrum text parser.

parse.mp

Melting point text parser.

parse.nmr

NMR text parser.

parse.tg

Glass transition temperature parser.

parse.uvvis

UV-vis text parser.


reader

Reader classes that read a file and produce a ChemDataExtractor Document object.

reader.acs

Readers for documents from the ACS.

reader.base

Abstract base classes for document readers.

reader.cssp

Readers for ChemSpider SyntheticPages.

reader.markup

XML and HTML readers based on lxml.

reader.nlm

Readers for NLM Journal Archiving and Interchange DTD XML files.

reader.pdf

PDF document reader.

reader.plaintext

Plain text document reader.

reader.rsc

Readers for documents from the RSC.

reader.uspto

Readers for USPTO patents.

reader.elsevier

Elsevier XML reader.

reader.springer

Readers for documents from Springer.


relex

relex.cluster

Cluster of phrase objects and associated cluster dictionaries.

relex.entity

Entity object.

relex.pattern

Extraction pattern object.

relex.phrase

Phrase object.

relex.relationship

Classes for defining new chemical relationships.

relex.snowball

relex.utils

Various utility functions.


scrape

Declarative scraping framework for extracting structured data from HTML and XML documents.

scrape.base

Abstract base classes that define the interface for Scrapers, Fields, Crawlers, etc.

scrape.clean

Clean HTML or XML by removing tags completely or replacing with their contents.

scrape.csstranslator

Extend cssselect to improve handling of pseudo-elements.

scrape.entity

An entity to extract.

scrape.fields

Fields to define on an entity.

scrape.scraper

Concrete classes for scraping and searching.

scrape.selector

Tool for selecting content from HTML or XML using CSS or XPath expressions.

scrape.pub

Scraping tools for specific publishers.

scrape.pub.nlm

Tools for scraping documents from NLM Journal Archiving and Interchange DTD XML files.

scrape.pub.rsc

Tools for scraping documents from The Royal Society of Chemistry.

scrape.pub.springer

Tools for scraping documents from Springer, Biomed Central and Chemistry Central XML files.

scrape.pub.elsevier

Tools for scraping documents from Elsevier.


text

Tools for processing text.

text.chem

Chemistry text handling tools.

text.latex

Tools for converting LaTeX to Unicode.

text.normalize

Tools for normalizing text.

text.processors

Text processors.