Configurations

Top-level entries

backend

Required. The following backends are supported:

  • pytorch / torch: uses dlex.torch.BaseModel as the base model and dlex.datasets.torch.Dataset as the base dataset

  • tensorflow / tf: partially supported

  • sklearn: partially supported

random_seed

Seed for all random number generators (numpy, random, torch, tensorflow, etc.).
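As a rough illustration of what a single seed for several generators involves, the sketch below seeds whichever of the listed libraries are importable. The helper name seed_everything is hypothetical and this is not dlex's own code; for TensorFlow 2.x, tf.random.set_seed(seed) plays the same role.

```python
import random


def seed_everything(seed: int) -> None:
    """Seed every random number generator that is importable here."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # numpy is not installed; nothing to seed
    try:
        import torch
        torch.manual_seed(seed)
    except ImportError:
        pass  # torch is not installed; nothing to seed


# Seeding twice with the same value reproduces the same sequence.
seed_everything(42)
first = [random.random() for _ in range(3)]
seed_everything(42)
second = [random.random() for _ in range(3)]
print(first == second)
```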

Environment

Suppose you want to tune hyper-parameters or run several experiments with similar configurations (e.g. run one model on different datasets). Environments provide a convenient way to define multiple experiments within a single configuration file.

Each environment has a name and several entries. The argument --env, if specified, indicates which environment(s) to run.

python -m dlex.train -c ./model_configs/demo.yml --env small

Entries for each environment:

variables

Each environment comes with a list of variables and their values. The value of each variable can be a single value or a list. If any variable is given a list, every combination of variable values is run.
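The expansion into combinations can be pictured as a Cartesian product over the variable values. The following is a minimal sketch under that assumption; expand_variables and the dataset names are hypothetical and not part of dlex.

```python
from itertools import product


def expand_variables(variables: dict) -> list:
    """Return one dict per combination of variable values."""
    names = list(variables)
    # Wrap scalars in a one-element list so they take part in the product.
    value_lists = [v if isinstance(v, list) else [v] for v in variables.values()]
    return [dict(zip(names, combo)) for combo in product(*value_lists)]


variables = {
    "dataset": ["mnist", "cifar10"],  # hypothetical dataset names
    "num_layers": [2, 4, 8],
    "batch_size": 128,                # single value, shared by every run
}
runs = expand_variables(variables)
print(len(runs))  # 2 datasets * 3 depths * 1 batch size
for run in runs:
    print(run)
```

With two datasets and three values of num_layers, the environment yields six runs, each with batch_size 128.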

report

When some variables are assigned multiple values, each combination produces one or more results. Use report to specify how these results are displayed.

  • type: table or raw

  • row / col: when type is table, the names of the variables shown along the rows / columns
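To make the table layout concrete, here is a small sketch that arranges results by one row variable and one column variable. The function format_table, the dataset names, and the scores are placeholders for illustration only; the actual report format produced by dlex may differ.

```python
def format_table(results: dict, rows: list, cols: list) -> str:
    """Lay out results keyed by (row_value, col_value) as a text table."""
    header = [""] + [str(c) for c in cols]
    lines = [header]
    for r in rows:
        # Combinations without a result are shown as "-".
        lines.append([str(r)] + [str(results.get((r, c), "-")) for c in cols])
    widths = [max(len(line[i]) for line in lines) for i in range(len(header))]
    return "\n".join(
        " | ".join(cell.ljust(w) for cell, w in zip(line, widths)).rstrip()
        for line in lines
    )


# Placeholder scores, not measured results.
results = {
    ("mnist", 2): 0.5, ("mnist", 4): 0.6,
    ("cifar10", 2): 0.7,
}
table = format_table(results, rows=["mnist", "cifar10"], cols=[2, 4])
print(table)
```

Here row corresponds to dataset and col to num_layers, matching the report entry of the small environment in the example config.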

default

All environments are run by default. Set default to false to exclude an environment from the default execution; it can then only be run by naming it with --env on the command line.
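The selection rule can be sketched as follows, assuming the environments have already been loaded into a dict. The function select_environments is hypothetical and only mirrors the behavior described here.

```python
def select_environments(envs: dict, requested=None) -> list:
    """Pick the environment names to run.

    requested holds the names given via --env, or None when the flag is absent.
    """
    if requested:
        unknown = [name for name in requested if name not in envs]
        if unknown:
            raise KeyError(f"Unknown environment(s): {unknown}")
        return list(requested)
    # No --env: run every environment unless it sets default: false.
    return [name for name, cfg in envs.items() if cfg.get("default", True)]


envs = {
    "small": {"variables": {"batch_size": 128}},
    "large": {"variables": {"batch_size": 32}, "default": False},
}
print(select_environments(envs))             # only environments left in the default run
print(select_environments(envs, ["large"]))  # explicitly requested environment
```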

Below is an example of a config file with environments.

env:
  small:
    variables:
      dataset: [list of data sets]
      num_layers: [list of model properties 'num_layers']
      batch_size: 128
    report:
      type: table
      row: dataset
      col: num_layers
  large:
    variables:
      dataset: [list of data sets]
      batch_size: 32
      num_layers: 19
dataset:
  name: ~dataset
  ...
model:
  num_layers: ~num_layers
  ...
train:
  batch_size: ~batch_size
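In the example, values such as ~dataset refer to environment variables by name. One way to picture the substitution is a recursive walk over the loaded config; the sketch below assumes the config is already a nested dict and that a string starting with ~ names a variable. The function resolve and the sample values are hypothetical.

```python
def resolve(node, values: dict):
    """Recursively replace '~name' strings with values[name]."""
    if isinstance(node, dict):
        return {key: resolve(child, values) for key, child in node.items()}
    if isinstance(node, list):
        return [resolve(child, values) for child in node]
    if isinstance(node, str) and node.startswith("~"):
        return values[node[1:]]  # raises KeyError for an undefined variable
    return node


config = {
    "dataset": {"name": "~dataset"},
    "model": {"num_layers": "~num_layers"},
    "train": {"batch_size": "~batch_size"},
}
# One combination of variable values from an environment (hypothetical values).
values = {"dataset": "mnist", "num_layers": 4, "batch_size": 128}
resolved = resolve(config, values)
print(resolved)
```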

Model

Model options are defined under the top-level model entry. They are passed to the model class and can be accessed through self.configs. Standard configs include:

name:

relative path to model class (similar to importing a module)

The remaining entries are model hyper-parameters (dimension, number of layers, etc.).
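Resolving a class from a dotted path works much like an import statement. The sketch below uses importlib with a standard-library class as a stand-in; load_class is hypothetical, and the base package against which dlex resolves relative paths is not shown here.

```python
import importlib


def load_class(path: str):
    """Import 'package.module.ClassName' and return the class object."""
    module_name, _, class_name = path.rpartition(".")
    if not module_name:
        raise ValueError(f"Expected a dotted path, got {path!r}")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


# A standard-library class stands in for a model class.
cls = load_class("collections.OrderedDict")
instance = cls(num_layers=4)
print(cls.__name__, dict(instance))
```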

Data Set

Dataset options are defined under the top-level dataset entry. They are passed to the model class and can be accessed through self.configs. Standard configs include:

name:

relative path to the dataset class (inherited from dlex.datasets.DatasetBuilder)

Train

The train entry is mapped to TrainConfig, which has the following fields:

class dlex.configs.TrainConfig(num_epochs: int = None, num_workers: int = None, batch_size: int = None, optimizer: dlex.configs.OptimizerConfig = None, lr_scheduler: dict = None, eval: list = <factory>, max_grad_norm: float = 5.0, save_every: str = '1e', log_every: str = '5s', cross_validation: int = None)
Parameters
  • num_epochs (int) – Number of epochs

  • batch_size (int) – Batch size

  • optimizer (OptimizerConfig) –

  • lr_scheduler (dict) –

  • eval (list) – List of sets to be evaluated during training. Empty: no evaluation. Accepted values: test, dev (or valid). If both test and valid sets are present, the test result for the model with the best valid result is also recorded. dev and valid can be used interchangeably.

  • max_grad_norm (float) –

  • save_every (str) – Time interval for saving model. Use s, m, h for number of seconds, minutes, hours. Use e for number of epochs. Examples: 100s, 30m, 2h, 1e

  • log_every (str) – Time interval for logging to file
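The interval format used by save_every (a number followed by s, m, h, or e) can be parsed as in the sketch below. The function parse_interval and its return convention are hypothetical and only illustrate the documented format.

```python
import re

# Seconds per time unit; "e" (epochs) is handled separately.
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}


def parse_interval(text: str):
    """Return ('epochs', n) or ('seconds', n) for strings like '1e' or '30m'."""
    match = re.fullmatch(r"(\d+)([smhe])", text.strip())
    if match is None:
        raise ValueError(f"Invalid interval: {text!r}")
    amount, unit = int(match.group(1)), match.group(2)
    if unit == "e":
        return ("epochs", amount)
    return ("seconds", amount * _UNIT_SECONDS[unit])


for example in ["100s", "30m", "2h", "1e"]:
    print(example, parse_interval(example))
```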

class dlex.configs.OptimizerConfig(name: str = 'sgd')
Parameters
  • name (str) – One of sgd, adam

Test

The test entry is mapped to TestConfig, which has the following fields:

class dlex.configs.TestConfig(batch_size: int = None, metrics: list = <factory>)
Parameters
  • batch_size (int) –

  • metrics (list) – List of metrics for evaluation.
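The <factory> default in the signature suggests that these config classes are dataclasses. Under that assumption, mapping a config entry onto such a class can be sketched with a local stand-in; TestConfigSketch, from_entry, and the metric name acc are hypothetical and do not import or reproduce dlex code.

```python
from dataclasses import dataclass, field, fields
from typing import List, Optional


@dataclass
class TestConfigSketch:
    """Local stand-in mirroring the documented TestConfig signature."""
    batch_size: Optional[int] = None
    metrics: List[str] = field(default_factory=list)


def from_entry(cls, entry: dict):
    """Build a dataclass instance from a config dict, rejecting unknown keys."""
    known = {f.name for f in fields(cls)}
    unknown = set(entry) - known
    if unknown:
        raise ValueError(f"Unknown keys: {sorted(unknown)}")
    return cls(**entry)


test_cfg = from_entry(TestConfigSketch, {"batch_size": 64, "metrics": ["acc"]})
print(test_cfg)
```

Fields missing from the entry fall back to the defaults in the signature, so an empty test entry gives batch_size None and an empty metrics list.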