.biblio
Miscellaneous tools for parsing bibliographic information, such as BibTeX files and author names.
.biblio.bibtex
BibTeX parser.
-
class chemdataextractor.biblio.bibtex.BibtexParser(data, **kwargs)
Bases: object
A class for parsing a BibTeX string into JSON or a python data structure.
Example usage:
    with open('example.bib', 'r') as f:
        bib = BibtexParser(f.read())
    bib.parse()
    print(bib.records_list)
    print(bib.json)
-
__init__(data, **kwargs)
Initialize BibtexParser with data.
Optional metadata passed as keyword arguments is included in the JSON output, e.g. collection, label, description, id, owner, created, modified, source.
Example usage:
    bib = BibtexParser(data, created=str(datetime.utcnow()), owner='mcs07')
-
classmethod parse_names(names)
Parse a string of names separated by “and”, as in a BibTeX author field.
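The splitting rule can be illustrated with a minimal, standalone sketch. It is not the library's implementation: `split_names` is a hypothetical helper, and it assumes that only a bare `and` outside curly brackets separates two names.

```python
def split_names(names):
    """Split a BibTeX-style name list on 'and' at brace depth zero.

    Hypothetical helper for illustration; not part of chemdataextractor.
    """
    parts, current, depth = [], [], 0
    for token in names.split():
        # Only a bare "and" outside any braces separates two names.
        if token == 'and' and depth == 0:
            if current:
                parts.append(' '.join(current))
            current = []
            continue
        current.append(token)
        # Track brace depth so "{Smith and Sons}" stays in one piece.
        depth += token.count('{') - token.count('}')
    if current:
        parts.append(' '.join(current))
    return parts


authors = split_names('von Beethoven, Ludwig and {Smith and Sons} and Ford, Henry')
print(authors)
```

Each resulting string could then be handed to a name parser such as PersonName.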
-
size
Return the number of records parsed.
-
records_list
Return the records as a list of dictionaries.
-
metadata
Return metadata for the parsed collection of records.
-
json
Return a list of records as a JSON string. Follows the BibJSON convention.
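As a rough illustration of how records and keyword-argument metadata might end up in one JSON string, the sketch below uses only the standard library. The function name `to_collection_json`, the sample record, and the top-level `metadata`/`records` layout are assumptions made for this example; the exact structure produced by BibtexParser may differ.

```python
import json


def to_collection_json(records, **metadata):
    """Serialize a list of record dicts plus collection metadata.

    Hypothetical helper: the 'metadata' and 'records' keys are an
    assumed layout, not a statement about BibtexParser's output.
    """
    collection = {'metadata': metadata, 'records': list(records)}
    return json.dumps(collection, sort_keys=True)


# Made-up sample record, for illustration only.
records = [{'type': 'article', 'id': 'smith2014', 'title': 'An example title', 'year': '2014'}]
output = to_collection_json(records, owner='mcs07', source='example.bib')
print(output)
```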
-
.biblio.person
Tools for parsing people’s names from strings into various name components.
-
class chemdataextractor.biblio.person.PersonName(fullname=None, from_bibtex=False)
Bases: dict
Class for parsing a person’s name into its constituent parts.
Parses a name string into title, firstname, middlename, nickname, prefix, lastname, suffix.
Example usage:
p = PersonName('von Beethoven, Ludwig')
PersonName acts like a dict:
    print(p)
    print(p['firstname'])
    print(json.dumps(p))
Name components can also be accessed as attributes:
    print(p.lastname)
Instances can be reused by setting the name property:
    p.name = 'Henry Ford Jr. III'
    print(p)
Two PersonName objects are equal if every name component matches exactly. For fuzzy matching, use the could_be method. This returns True for names that are not explicitly inconsistent.
This class was written with the intention of parsing BibTeX author names, so name components enclosed within curly brackets will not be split.
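The dict-like behaviour and the difference between exact and fuzzy comparison can be sketched with a small stand-in class. `NameDict` is hypothetical, and its `could_be` rule (components present in both names must agree, with an initial counting as compatible with a full name) is an assumption for illustration rather than the logic PersonName actually uses.

```python
import json


class NameDict(dict):
    """Hypothetical stand-in: a dict whose keys are readable as attributes."""

    def __getattr__(self, key):
        # Called only when normal attribute lookup fails.
        try:
            return self[key]
        except KeyError:
            raise AttributeError(key)

    def could_be(self, other):
        """Return True unless a component present in both names conflicts."""
        for key in set(self) & set(other):
            a, b = self[key].lower(), other[key].lower()
            # Assumed rule: an initial such as "L." is compatible with "Ludwig".
            if not (a.startswith(b.rstrip('.')) or b.startswith(a.rstrip('.'))):
                return False
        return True


full = NameDict(firstname='Ludwig', prefix='von', lastname='Beethoven')
short = NameDict(firstname='L.', lastname='Beethoven')
other = NameDict(firstname='Karl', lastname='Beethoven')

print(full.lastname)           # attribute access
print(json.dumps(short))       # serializes like a plain dict
print(full == short)           # exact comparison of every component
print(full.could_be(short))    # fuzzy comparison
print(full.could_be(other))
```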
-
fullname
-
.biblio.xmp
Parse metadata stored as XMP (Extensible Metadata Platform).
This is commonly embedded within PDF documents, and can be extracted using the PDFMiner framework.
More information is available on the Adobe website:
-
class chemdataextractor.biblio.xmp.XmpParser(ns_map={
    'http://crossref.org/crossmark/1.0/': 'crossmark',
    'http://ns.adobe.com/pdf/1.3/': 'pdf',
    'http://ns.adobe.com/pdfx/1.3/': 'pdfx',
    'http://ns.adobe.com/xap/1.0/': 'xap',
    'http://ns.adobe.com/xap/1.0/mm/': 'xapmm',
    'http://ns.adobe.com/xap/1.0/rights/': 'rights',
    'http://prismstandard.org/namespaces/basic/2.0/': 'prism',
    'http://purl.org/dc/elements/1.1/': 'dc',
    'http://www.w3.org/1999/02/22-rdf-syntax-ns#': 'rdf',
    'http://www.w3.org/XML/1998/namespace': 'xml'})
Bases: object
A parser that converts an XMP metadata string into a python dictionary.
Usage:
    parser = XmpParser()
    metadata = parser.parse(xmpstring)
Common namespaces are abbreviated in the output using the definitions in xmp.NS_MAP. If an abbreviation for a namespace is not defined in NS_MAP, the full URL is used as the key in the output dictionary. It is possible to override NS_MAP when initializing the parser:
    parser = XmpParser(ns_map={'http://www.w3.org/XML/1998/namespace': 'xml'})
    metadata = parser.parse(xmpstring)
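The abbreviation rule can be sketched with the standard library alone. The `abbreviate` helper, the trimmed-down `NS_MAP`, and the sample XML (including the `http://example.org/ns/` namespace) are assumptions for this illustration; how XmpParser arranges its output dictionary is not shown here.

```python
import xml.etree.ElementTree as ET

# Trimmed-down mapping for the example; the real default has more entries.
NS_MAP = {
    'http://purl.org/dc/elements/1.1/': 'dc',
    'http://ns.adobe.com/pdf/1.3/': 'pdf',
}


def abbreviate(tag, ns_map=NS_MAP):
    """Turn an ElementTree tag '{namespace-url}name' into (namespace, name).

    Hypothetical helper for illustration; not part of chemdataextractor.
    """
    if tag.startswith('{'):
        url, name = tag[1:].split('}', 1)
        # Fall back to the full URL when no abbreviation is defined.
        return ns_map.get(url, url), name
    return None, tag


xml_text = (
    '<root xmlns:dc="http://purl.org/dc/elements/1.1/" '
    'xmlns:ex="http://example.org/ns/">'
    '<dc:title>A title</dc:title><ex:note>A note</ex:note></root>'
)
pairs = [abbreviate(el.tag) for el in ET.fromstring(xml_text)]
print(pairs)
```

The first element uses a namespace that has an abbreviation, while the second falls back to its full URL.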
-
__init__(ns_map={
    'http://crossref.org/crossmark/1.0/': 'crossmark',
    'http://ns.adobe.com/pdf/1.3/': 'pdf',
    'http://ns.adobe.com/pdfx/1.3/': 'pdfx',
    'http://ns.adobe.com/xap/1.0/': 'xap',
    'http://ns.adobe.com/xap/1.0/mm/': 'xapmm',
    'http://ns.adobe.com/xap/1.0/rights/': 'rights',
    'http://prismstandard.org/namespaces/basic/2.0/': 'prism',
    'http://purl.org/dc/elements/1.1/': 'dc',
    'http://www.w3.org/1999/02/22-rdf-syntax-ns#': 'rdf',
    'http://www.w3.org/XML/1998/namespace': 'xml'})
Initialize self. See help(type(self)) for accurate signature.
-