Configurations

Top-level entries

backend

Required. The following backends are supported:

pytorch / torch: uses dlex.torch.BaseModel as the base model and dlex.datasets.torch.Dataset as the base dataset
tensorflow / tf: partially supported
sklearn: partially supported

random_seed

Random seed applied to all random number generators (numpy, random, torch, tensorflow, etc.).
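
As an illustration of what seeding several generators at once can look like, here is a minimal sketch. The helper name seed_everything is hypothetical and is not part of dlex; numpy and torch are seeded only if they are installed, and tensorflow is left as a comment.

```python
import random


def seed_everything(seed: int) -> list:
    """Seed every installed random number generator (hypothetical helper).

    Returns the names of the libraries that were seeded.
    """
    random.seed(seed)
    seeded = ["random"]

    try:
        import numpy as np
        np.random.seed(seed)
        seeded.append("numpy")
    except ImportError:
        pass  # numpy is not installed; nothing to seed

    try:
        import torch
        torch.manual_seed(seed)
        seeded.append("torch")
    except ImportError:
        pass  # torch is not installed; nothing to seed

    # With TensorFlow 2 installed, tf.random.set_seed(seed) would go here.
    return seeded


print(seed_everything(42))
```

Calling the helper twice with the same seed should make subsequent draws from each seeded generator repeat.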

Environment

Suppose you want to tune hyper-parameters or run several experiments with similar configurations (e.g. run a model on different datasets). Environments provide a convenient way to define multiple experiments within a single configuration file.

Each environment has a name and several entries. The --env argument, if specified, selects which environment(s) to run.

python -m dlex.train -c ./model_configs/demo.yml --env small

Entries for each environment:

variables

Each environment comes with a list of variables and their values. The value of each variable can be a single value or a list. If it is a list, every combination of variable values is run.
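
The combination rule above amounts to a Cartesian product over the variable values. A minimal sketch, assuming the variables have already been loaded into a dict; expand_variables is a hypothetical name, not a dlex function, and the dataset names are illustrative placeholders.

```python
from itertools import product


def expand_variables(variables: dict) -> list:
    """Return one dict per combination of variable values."""
    names = list(variables)
    # Wrap single values in a list so every variable can be iterated.
    value_lists = [v if isinstance(v, list) else [v] for v in variables.values()]
    return [dict(zip(names, combo)) for combo in product(*value_lists)]


# Placeholder values, chosen only for illustration.
variables = {
    "dataset": ["dataset_a", "dataset_b"],
    "num_layers": [2, 4],
    "batch_size": 128,
}
for combination in expand_variables(variables):
    print(combination)
```

Two datasets and two layer counts give four combinations, each with the single batch_size value.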

report

When some variables are assigned multiple values, each combination produces one or more results. Use report to specify how these results are displayed.

type: table or raw
row / col: when type is table, the names of the variables displayed as rows / columns
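
To make the table layout concrete, the sketch below arranges one result per variable combination into rows and columns. It is an assumption about the general idea, not dlex's actual report code; build_table and the result strings are hypothetical placeholders.

```python
def build_table(results: list, row: str, col: str) -> str:
    """Format (variables, value) pairs as a plain-text table.

    Each `variables` dict must contain the `row` and `col` variables.
    """
    row_keys, col_keys, cells = [], [], {}
    for variables, value in results:
        r, c = variables[row], variables[col]
        if r not in row_keys:
            row_keys.append(r)
        if c not in col_keys:
            col_keys.append(c)
        cells[(r, c)] = value

    header = [f"{row} / {col}"] + [str(c) for c in col_keys]
    lines = [" | ".join(header)]
    for r in row_keys:
        # Combinations without a result are shown as "-".
        values = [str(cells.get((r, c), "-")) for c in col_keys]
        lines.append(" | ".join([str(r)] + values))
    return "\n".join(lines)


# Placeholder results, not real measurements.
results = [
    ({"dataset": "dataset_a", "num_layers": 2}, "result_a2"),
    ({"dataset": "dataset_a", "num_layers": 4}, "result_a4"),
    ({"dataset": "dataset_b", "num_layers": 2}, "result_b2"),
]
print(build_table(results, row="dataset", col="num_layers"))
```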

default

Set to false to exclude the environment from default execution; it can then only be run by passing --env on the command line. By default, all environments are run.
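
The selection rule described here can be sketched as follows, assuming the env entry has been parsed into a dict. select_environments is a hypothetical helper and does not reflect how dlex implements the --env option.

```python
def select_environments(envs: dict, requested: list = None) -> list:
    """Return the names of the environments to run.

    If `requested` is given (as with --env), only those are run.
    Otherwise every environment whose `default` entry is not false is run.
    """
    if requested:
        unknown = [name for name in requested if name not in envs]
        if unknown:
            raise KeyError(f"Unknown environment(s): {unknown}")
        return list(requested)
    return [name for name, env in envs.items() if env.get("default", True)]


envs = {
    "small": {"variables": {"batch_size": 128}},
    "large": {"variables": {"batch_size": 32}, "default": False},
}
print(select_environments(envs))
print(select_environments(envs, ["large"]))
```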

Below is an example of a config file with environments.

env:
  small:
    variables:
      dataset: [list of data sets]
      num_layers: [list of model properties 'num_layers']
      batch_size: 128
    report:
      type: table
      row: dataset
      col: num_layers
  large:
    variables:
      dataset: [list of data sets]
      batch_size: 32
      num_layers: 19
dataset:
  name: ~dataset
  ...
model:
  num_layers: ~num_layers
  ...
train:
  batch_size: ~batch_size
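
In the example above, entries such as ~dataset refer to environment variables. The sketch below shows one way such references could be resolved for a single combination of values. It assumes a reference is always a whole string starting with ~ and works on an already parsed config; substitute is a hypothetical helper, not dlex's implementation.

```python
def substitute(config, values: dict):
    """Recursively replace '~name' strings with values['name']."""
    if isinstance(config, dict):
        return {key: substitute(item, values) for key, item in config.items()}
    if isinstance(config, list):
        return [substitute(item, values) for item in config]
    if isinstance(config, str) and config.startswith("~"):
        # Strip the leading '~' and look up the variable by name.
        return values[config[1:]]
    return config


template = {
    "dataset": {"name": "~dataset"},
    "model": {"num_layers": "~num_layers"},
    "train": {"batch_size": "~batch_size"},
}
# Placeholder values for one combination.
values = {"dataset": "dataset_a", "num_layers": 2, "batch_size": 128}
print(substitute(template, values))
```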

Model

Model options are defined in the top-level model entry, which is passed to the model class and can be accessed through self.configs. Standard configs include:

name: relative path to the model class (similar to importing a module)

and model hyper-parameters (dimension, number of layers, etc.)

Data Set

Dataset options are defined in the top-level dataset entry, which is passed to the model class and can be accessed through self.configs. Standard configs include:

name: relative path to the dataset class (inherited from dlex.datasets.DatasetBuilder)

Train

The train entry is mapped to TrainConfig, with the following fields and methods:

class dlex.configs.TrainConfig(num_epochs: int = None, num_workers: int = None, batch_size: int = None, optimizer: dlex.configs.OptimizerConfig = None, lr_scheduler: dict = None, eval: list = <factory>, max_grad_norm: float = 5.0, save_every: str = '1e', log_every: str = '5s', cross_validation: int = None)

Parameters

num_epochs (int) – Number of epochs
batch_size (int) – Batch size
optimizer (OptimizerConfig) –
lr_scheduler (dict) –
eval (list) – List of sets to be evaluated during training. Empty: no evaluation. Accepted values: test, dev (or valid). If both test and valid sets are present, the test result for the model with the best valid result will also be recorded. dev and valid can be used interchangeably.
max_grad_norm (float) –
save_every (str) – Time interval for saving model. Use s, m, h for number of seconds, minutes, hours. Use e for number of epochs. Examples: 100s, 30m, 2h, 1e
log_every (str) – Time interval for logging to file
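
The interval format used by save_every and log_every can be parsed along these lines. This is a sketch based only on the description above; parse_interval and its return convention are hypothetical and not taken from dlex.

```python
import re
from typing import Tuple

# Seconds per time unit; 'e' (epochs) is handled separately.
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}


def parse_interval(text: str) -> Tuple[str, int]:
    """Parse strings such as '100s', '30m', '2h' or '1e'.

    Returns ('epoch', n) for epoch-based intervals and ('second', n)
    for time-based intervals, converted to seconds.
    """
    match = re.fullmatch(r"(\d+)([smhe])", text.strip())
    if match is None:
        raise ValueError(f"Invalid interval: {text!r}")
    amount, unit = int(match.group(1)), match.group(2)
    if unit == "e":
        return ("epoch", amount)
    return ("second", amount * _UNIT_SECONDS[unit])


for example in ["100s", "30m", "2h", "1e"]:
    print(example, parse_interval(example))
```

Keeping epochs separate from wall-clock units lets the caller compare against an epoch counter or against elapsed time, whichever applies.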

class dlex.configs.OptimizerConfig(name: str = 'sgd')

Parameters

name (str) – One of sgd, adam

Test

The test entry is mapped to TestConfig, with the following fields and methods:

class dlex.configs.TestConfig(batch_size: int = None, metrics: list = <factory>)

Parameters

batch_size (int) –
metrics (list) – List of metrics for evaluation.