Searching RSC

In [1]:
from chemdataextractor.scrape.pub.rsc import RscSearchScraper

Suppose we want to examine data from papers involving aspirin. Before we can download those papers, we first have to find them. To search for papers published by the RSC (Royal Society of Chemistry), we can use the RscSearchScraper class:

In [2]:
query_text = "Aspirin"
scrape = RscSearchScraper().run(query_text)

The results can then be serialized to a list of JSON-style records, from which properties such as the DOI of each paper can be read.

In [3]:
results = scrape.serialize()
print(str(results[0]['doi']).encode('utf-8'))
b'10.1039/c6cp06202d'
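
To gather the DOIs of every hit rather than only the first, the serialized list can be filtered with plain Python. The following is a minimal sketch, assuming each record is a dict that may or may not carry a 'doi' key. collect_dois is a hypothetical helper name, not part of ChemDataExtractor, and sample_results stands in for scrape.serialize() so that the snippet runs without network access.

```python
def collect_dois(results):
    """Return the unique, non-empty DOIs from a list of result dicts, in order."""
    seen = set()
    dois = []
    for record in results:
        doi = record.get('doi')  # .get() tolerates records without a DOI
        if doi and doi not in seen:
            seen.add(doi)
            dois.append(doi)
    return dois


# Stand-in for scrape.serialize(); only the DOI value is taken from the output above.
sample_results = [{'doi': '10.1039/c6cp06202d'}, {'doi': '10.1039/c6cp06202d'}, {}]
print(collect_dois(sample_results))
```

In a live session, collect_dois(results) would be called on the list returned by scrape.serialize() instead of the stand-in.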

Other properties that can be extracted include:

  • Title
  • PDF URL
  • HTML URL
  • Landing URL
  • Journal
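
The other properties can be read from a record in the same way as the DOI. The sketch below assumes the dictionary keys are the snake_case forms of the names listed above ('title', 'pdf_url', 'html_url', 'landing_url', 'journal'); those key names are an assumption and should be checked against the keys of a real record, for example with results[0].keys(). describe is a hypothetical helper, and the stand-in record contains only the DOI shown earlier.

```python
# Assumed key names, inferred from the property list above; verify them
# against an actual record before relying on them.
ASSUMED_KEYS = ['title', 'journal', 'doi', 'landing_url', 'html_url', 'pdf_url']


def describe(record, keys=ASSUMED_KEYS):
    """Map each requested key to its value, or None if the record lacks it."""
    return {key: record.get(key) for key in keys}


# Stand-in record; in practice pass results[0] from scrape.serialize().
record = {'doi': '10.1039/c6cp06202d'}
for key, value in describe(record).items():
    print(f'{key}: {value}')
```

Using .get() rather than indexing means a record that lacks a field (for instance, no PDF link) yields None instead of raising a KeyError.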