Inferred Properties

ChemDataExtractor provides the InferredProperty class so that you can save time on boilerplate code in your parser by making relationships between fields explicit. Let’s look at how you can do this by looking at how this functionality is used within the ChemDataExtractor library to express the relationship between raw_value and value in the QuantityModel class.

class QuantityModel(six.with_metaclass(_QuantityModelMeta, BaseModel)):
    raw_value = StringType(required=True, contextual=True)
    raw_units = StringType(required=True, contextual=True)
    value = InferredProperty(ListType(FloatType(), sorted_=True),
                             origin_field='raw_value', inferrer=infer_value, contextual=True)
    units = InferredProperty(UnitType(),
                             origin_field='raw_units', inferrer=infer_unit, contextual=True)
    error = InferredProperty(FloatType(),
                             origin_field='raw_value', inferrer=infer_error, contextual=True)

Let’s break down what we’ve done with the value property step by step. First, we specify that it’s an InferredProperty. Similarly to what we do with ListType, we first pass in the type of the content. In this case, it’s a sorted list of floats. We can then specify the origin field, and the inferrer used. The inferrer is a function which takes as input the value of the origin field, and the BaseModel instance for which the value is being inferred, and returns the inferred value, or None. Let’s take a look at what the inferrer for inferring values looks like:

def infer_value(string, instance):
    value = None
    if string != 'NoValue' and string != '':
        try:
            value = extract_value(string)
        except (TypeError, IndexError) as e:
            log.debug(e)
    return value

So here we can see the implementation for infer_value as included in ChemDataExtractor. The implementation is incredibly simple; it just tries to extract the value from the string and returns the extracted values.

Having defined the relationship and written this function, you no longer need to write any custom interpretation code in your parser. Any parser that extracts a QuantityModel will automatically default to using the infer_value function when required to extract the value.

Note

While ChemDataExtractor will default to using the infer_value function in this case, if your parser includes custom behaviour to set the value property, that will take priority and automatically override the inferring of properties for that field.