How To Add A New Model

ResINS strives to be a repository for all models of INS instruments, but it is inevitable that there will be some that have not been included yet. Fortunately, ResINS was designed to make adding new models as straightforward as possible, though depending on how unique the model is, it may require significant amount of work:

How to add a new parameter-only model

Adding a new model which only alters the parameters of another, existing model is the simpler task; all that is required is to edit the YAML data file of the corresponding instrument. Now, there is going to be some difference in how to do this when creating a model for personal use only and when contributing to ResINS, but the overall process is identical to adding a new version, so see that guide for details. Here, going forward, it is assumed that you have a YAML data file ready for editing.

A model is a “property” of a version of an instrument and so it must be added to a particular version (though YAML magic can be used to avoid repetition). Therefore, to add a new model, up to two new entries must be added inside models key of a particular version of an instrument:

  • If adding a new version of an existing model:

    1. A new entry must be added whose key has the same name as the previous versions, but whose version number is incremented by one. E.g., given a model called model1 and which has versions model1_v1 and model1_v2, the new model should be called model1_v3.

      • The corresponding value should be a dictionary containing the data for the model, see below for details.

    2. If this new version should become the new default version for the model (for example in the case of a bugfix), please edit the associated alias entry to point to the new version. For example, using the above names, there should be an entry model1: "model1_v2" which should be changed to model1: "model1_v3".

      • In this case, the key-value pair should already exist and only needs to be changed.

      • The value of this entry must be kept as a string which must correspond to a valid key in the models dictionary.

  • If adding a new model, unrelated to the others:

    1. A new entry must be added whose key has the name following the schema: {model_name}_v1 (since this is a new model, its first version should have the number 1), e.g. model1_v1.

      • The corresponding value should be a dictionary containing the data for the model, see below for details.

    2. A new entry must be added whose key is only the model_name from above.

      • The corresponding value must be a string whose value must be the key added previously, so (using the above example) model1: "model1_v1".

How to specify model parameters

With the model keys added, the next step is to add the data associated with the versioned model. The data must follow the YAML file spec, where the guidance on what belongs where can also be found, but the general points to keep in mind are as follows:

  • The function entry must have a value that corresponds to one of the existing models. See resins.models.MODELS for a dictionary that maps these function values to ResINS model objects. The created model will use the corresponding object.

  • The parameters dictionary must contain all the parameters that the relevant model expects (see the associated API documentation, especially that of the ModelData subclass) and that is not included in configurations.

    • It might be useful to look at the existing YAML files that use the same model as the configurations are likely to be the same. Normally, only the values of some of the parameters are likely to be different between different use-cases of the same model.

    • Ultimately, the parameters and configurations depend on the mathematics/physics of the model and the physical INS instrument. If unsure, and especially when contributing to ResINS, do not hesitate to contact us on our GitHub.

How to add a new algorithm for a new model

Creating a new model that uses new physics/mathematics - ones that are not yet implemented in ResINS - is significantly more work than the case above, since Python code will have to be written. Though, before we start, some notes on the procedures for different outcomes:

  • For personal use, the new code can be placed wherever - it will have to be registered with ResINS as explained below.

  • If contributing to ResINS, please use a new file in resins/src/resins/models.

How to add a new model data class

The first step should be creating a new class that will hold all the data accessible to the new model. It does not have to be completed immediately - it is ok to continue adding to it as the model is being developed - but it is a good starting point since the model class will need this class.

The data class must be a subclass of resins.models.model_base.ModelData (please read its documentation for details on how it and its subclasses are supposed to work) and it must be decorated with the dataclasses.dataclass() decorator with the following arguments:

  • init=True

  • repr=True

  • frozen=True

  • slots=True

  • kw_only=True

Thus, for example:

from dataclasses import dataclass
from resins.models.model_base import ModelData

@dataclass(init=True, repr=True, frozen=True, slots=True, kw_only=True)
class TestModelData(ModelData):
    param1: int
    param2: bool

The parameters in this class should be specified as is normal for a dataclass and should represent all the parameters required by the model. I.e., these should be the combination of the values from the YAML file parameters and from the chosen options. How these required values will be split between the two places does not matter for the Python code as get_resolution_function() combines the two before creating the data object (i.e. TestModelData).

Note

If the YAML file is going to contain default values and/or restrictions on some arguments for the model (e.g. the model takes e_init and YAML file specifies that the default value is 100 and that only values between 10 and 1000 are allowed), you will need to reimplement the resins.models.model_base.ModelData.defaults and/or resins.models.model_base.ModelData.restrictions properties (the documentation does not need to be overwritten, just the code). For example:

from dataclasses import dataclass
from typing import Any
from resins.models.model_base import ModelData

@dataclass(init=True, repr=True, frozen=True, slots=True, kw_only=True)
class TestModelData(ModelData):
    default_e_init: int
    e_init_restrictions: list[int]

    @property
    def defaults(self) -> dict[str, list[int | float]]:
        return {'e_init': self.default_e_init}

    @property
    def restrictions(self) -> dict[str, Any]:
        return {'e_init': self.e_init_restrictions}

How to add a new model class

With the data class in place, it is possible to create the model, which is a subclass of resins.models.model_base.InstrumentModel (see its documentation for detailed specification of how to inherit from it). This must specify three class-level variables:

  • input - an integer specifying the number of arguments that the ` __call__` method takes

  • output - an integer specifying the number of outputs the __call__ method returns

  • data_class - a reference to the data class created above, for example:

class TestModel(InstrumentModel):
    input = 1
    output = 1
    data_class = TestModelData

Next, the __init__ method should be defined. The function signature:

  • Must take model_data (an instance of the data class above) as its first argument.

  • May take any number of other argument (usually representing settings, but may be others).

  • Must not expect the independent variables (i.e. the variables that the model is a function of such as energy transfer or momentum).

  • Must take **kwargs.

The body of __init__():

  • Must call super().__init__.

  • Should (if necessary) perform any validation of the arguments, e.g. that the e_init is within the allowed range etc.

  • Should perform as much of the calculation as possible without the independent variables. These pre-computed, intermediate values should be stored as instance variables.

    • For more complex calculations, it might be advisable to break them up into multiple methods. Any such methods should be private (i.e. start with _) and, if possible, should be made @staticmethod or @classmethod.

  • It should not keep a reference to model_data.

For example:

from resins.models.model_base import InstrumentModel, InvalidInputError

class TestModel(InstrumentModel):
    def __init__(self, model_data: TestModelData, e_init: float | None = None, **_):
        super().__init__(model_data)

        if e_init is None:
            e_init = model_data.default_e_init
        elif not model_data.e_init_restrictions[0] <= e_init <= model_data.e_init_restrictions[1]:
            raise InvalidInputError('Good message')

        self.useful_value = 0.5 * e_init ** 3

Lastly, the __call__ method must be implemented:

  • It must take as arguments all the independent variables that the model models the resolution as a function of.

  • It must accept *args and **kwargs.

  • It should perform the remaining computation of the resolution, using the instance variables.

  • It must return the resolution at the values of independent variables provided via the arguments.

For example:

from jaxtyping import Float
import numpy as np
from resins.models.model_base import InstrumentModel

class TestModel(InstrumentModel):
    def __call__(self, frequencies: Float[np.ndarray, 'frequencies'], *args, **kwargs
                 ) -> Float[np.ndarray, 'sigma']:
        return frequencies * self.useful_value

Then, with the model code complete, all that remains is to register it with ResINS:

  • If the above code is outside the ResINS repo (i.e. for personal use), somewhere in your program (before the model is intended to be used) a new key-value pair has to be inserted into resins.models.MODELS:

from resins.models import MODELS
from resins import Instrument

from custom_model_source import TestModel

MODELS['test_model'] = TestModel

instr = Instrument.from_file('path/to/data.yaml', 'version')
model = instrument.get_resolution_function(model_name='model1', e_init=200)
assert isinstance(model, TestModel)
  • If the above code is inside the ResINS repo (i.e. to be submitted to the code-base), the resins.models.MODELS dictionary (found at resins/src/resins/models/__init__.py) has to be modified by adding a new key-value pair, where the key is the “name” of the function and the value is a reference to the above-created model.

    • The key can be anything as it is not exposed to the user - it is only present here and in the YAML data files, but it should be somewhat relevant. The only thing that matters is that it is globally unique.

from resins.models.test_model import TestModel

MODELS = {
    ...
    'test_model': TestModel,
}

How to add the data

Now, with all the code in place, only the data that the model will use has to be added. Since the code is written, the case has effectively become the same as adding a parameter-only model (see the guide for more details). The only difference is that, in this case, it is possible to tweak the code (if necessary) to make everything nicer. Along with that, though, comes the responsibility of structuring the data appropriately - there is no example among the other YAML files to look to. How to do this is up to you, but some points of advice are:

  • The configurations section should reflect the physical INS instrument, see configuration.

    • All parameters that change depending on which option for a configuration is chosen should be in the configurations section. The should not be any parameters that are present in both the parameters and configurations sections.

  • If the parameters section contains a large number of parameters, it can be a good idea to group some of these parameters into dictionaries.

    • This is especially recommended if there is a logical reason for the grouping, for example grouping all the moderator parameters together.

    • To maintain type hinting, the corresponding fields in the Python (the model data class) can be made in typing.TypedDict:

model1_v1:
    parameters:
        param1: 1
        param2: 2
        param3: 3
        param4: 4

and

class TestModelData(ModelData):
    param1: int
    param2: int
    param3: int
    param4: int

can be changed into:

model1_v1:
    parameters:
        param1: 1
        group1:
            param2: 2
            param3: 3
            param4: 4

and

class TestModelData(ModelData):
    param1: int
    group1: Group1

class Group1(TypedDict):
    param2: int
    param3: int
    param4: int