Configurations
Top-level entries
- backend
Must be included. The following backends are supported:
pytorch / torch: use dlex.torch.BaseModel as the base model and dlex.datasets.torch.Dataset as the base dataset
tensorflow / tf: partially supported
sklearn: partially supported
- random_seed
Random seed for all randomization engines (numpy, random, torch, tensorflow etc.)
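As a rough illustration of what seeding several randomization engines from one value can look like, the sketch below seeds Python's random module and, when it is installed, NumPy. The helper name seed_everything is hypothetical and not part of dlex; torch and tensorflow would need their own seeding calls, which are omitted here.

```python
import random


def seed_everything(seed: int) -> None:
    """Hypothetical helper (not part of dlex): seed the engines that are available."""
    random.seed(seed)
    try:
        import numpy as np  # optional dependency, skipped when missing
    except ImportError:
        return
    np.random.seed(seed)


seed_everything(42)
first = [random.random() for _ in range(3)]
seed_everything(42)
second = [random.random() for _ in range(3)]
# Re-seeding with the same value reproduces the same sequence.
assert first == second
```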
Environment
Assume that you want to tune hyper-parameters or run several experiments with similar configurations (e.g. run a model on different datasets). Environments provide a convenient interface for defining multiple experiments within a single configuration file.
Each environment has a name and several entries. The argument --env, if specified, indicates which environment(s) to run.
python -m dlex.train -c ./model_configs/demo.yml --env small
Entries for each environment:
- variables
Each environment comes with a list of variables and values. The value of each variable can be a single value or a list. If it is a list, all possible combinations of variable values will be examined.
- report
When some variables are assigned multiple values, each combination produces one or more results. Use report to specify how these results are displayed in the report.
type: table or raw
row / col: when type is table, the name of the variable displayed as row / col
- default
Set to false to exclude the environment from default execution. In that case, it can only be run by passing --env on the command line. All environments are run by default.
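The way list-valued variables expand into combinations, and how a table report could arrange results by row and col, can be sketched in a few lines of Python. This is only an illustration under assumptions: expand_variables and as_table are hypothetical helpers, the dataset names are placeholders, and the integer "scores" are dummy values standing in for real metrics. It is not dlex's own implementation.

```python
from itertools import product


def expand_variables(variables: dict) -> list:
    """Return one dict per combination of variable values.

    A scalar value is treated as a single-element list.
    """
    names = list(variables)
    value_lists = [v if isinstance(v, list) else [v] for v in variables.values()]
    return [dict(zip(names, combo)) for combo in product(*value_lists)]


def as_table(results: list, row: str, col: str) -> dict:
    """Arrange (combination, score) pairs as table[row_value][col_value]."""
    table = {}
    for combo, score in results:
        table.setdefault(combo[row], {})[combo[col]] = score
    return table


# Placeholder values, not real data set names.
variables = {
    "dataset": ["dataset_a", "dataset_b"],
    "num_layers": [2, 4, 8],
    "batch_size": 128,
}
combinations = expand_variables(variables)
print(len(combinations))  # 2 * 3 * 1 = 6

# Dummy scores: the run index stands in for a real metric.
results = [(combo, index) for index, combo in enumerate(combinations)]
table = as_table(results, row="dataset", col="num_layers")
for dataset, cells in table.items():
    print(dataset, cells)
```

With row: dataset and col: num_layers, each data set becomes one row and each num_layers value one column, which matches the table layout described for the report entry.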
Below is an example of a config file with environments.
env:
    small:
        variables:
            dataset: [list of data sets]
            num_layers: [list of model properties 'num_layers']
            batch_size: 128
        report:
            type: table
            row: dataset
            col: num_layers
    large:
        variables:
            dataset: [list of data sets]
            batch_size: 32
            num_layers: 19
dataset:
    name: ~dataset
    ...
model:
    num_layers: ~num_layers
    ...
train:
    batch_size: ~batch_size
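In the example above, values such as ~dataset refer to environment variables by name. A minimal sketch of how such references might be substituted for one combination of values is shown below. It assumes the config has already been loaded into plain Python dicts and that references appear as strings starting with ~; the resolve function is hypothetical, the values are placeholders, and dlex may handle this differently.

```python
def resolve(node, values: dict):
    """Recursively replace strings of the form '~name' with values['name']."""
    if isinstance(node, dict):
        return {key: resolve(child, values) for key, child in node.items()}
    if isinstance(node, list):
        return [resolve(child, values) for child in node]
    if isinstance(node, str) and node.startswith("~"):
        return values[node[1:]]
    return node


# Mirrors the non-env part of the example config.
config = {
    "dataset": {"name": "~dataset"},
    "model": {"num_layers": "~num_layers"},
    "train": {"batch_size": "~batch_size"},
}
# One combination of variable values ("dataset_a" is a placeholder name).
values = {"dataset": "dataset_a", "num_layers": 19, "batch_size": 32}

resolved = resolve(config, values)
print(resolved["train"]["batch_size"])  # 32
```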
Model
Model options are defined in model at top level; they are passed to the model class and can be accessed in self.configs. Standard configs include:
- name:
relative path to the model class (similar to importing a module)
and model hyper-parameters (dimension, number of layers, etc.)
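Resolving a class from a dotted path "similar to importing a module" is commonly done with importlib. The sketch below shows that general technique on a standard-library class. load_class is a hypothetical helper, and the package root against which dlex resolves the relative path is not stated here, so a stdlib path is used as a stand-in.

```python
import importlib


def load_class(path: str):
    """Import 'package.module.ClassName' and return the class object."""
    module_name, _, class_name = path.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


# A standard-library class stands in for a model class.
cls = load_class("collections.OrderedDict")
print(cls.__name__)  # OrderedDict
```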
Data Set
Data set options are defined in dataset at top level; they are passed to the model class and can be accessed in self.configs. Standard configs include:
- name:
relative path to the dataset class (inherited from dlex.datasets.DatasetBuilder)
Train
The train entry is mapped into TrainConfig with the following fields and methods:
class dlex.configs.TrainConfig(num_epochs: int = None, num_workers: int = None, batch_size: int = None, optimizer: dlex.configs.OptimizerConfig = None, lr_scheduler: dict = None, eval: list = <factory>, max_grad_norm: float = 5.0, save_every: str = '1e', log_every: str = '5s', cross_validation: int = None)
Parameters
num_epochs (int) – Number of epochs
batch_size (int) – Batch size
optimizer (OptimizerConfig) –
lr_scheduler (dict) –
eval (list) – List of sets to be evaluated during training. Empty: no evaluation. Accepted values: test, dev (or valid). If both test and valid sets are present, the test result for the model with the best valid result will also be recorded. dev and valid can be used interchangeably.
max_grad_norm (float) –
save_every (str) – Time interval for saving model. Use s, m, h for number of seconds, minutes, hours. Use e for number of epochs. Examples: 100s, 30m, 2h, 1e
log_every (str) – Time interval for logging to file
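The interval strings accepted by save_every and log_every (a number followed by s, m, h or e) can be parsed with a short regular expression. The sketch below is one possible way to do it. parse_interval and the unit names it returns are hypothetical and are not taken from dlex.

```python
import re

# Suffixes as documented for save_every; the long names are illustrative only.
_UNITS = {"s": "seconds", "m": "minutes", "h": "hours", "e": "epochs"}


def parse_interval(text: str):
    """Split a string such as '30m' into (30, 'minutes')."""
    match = re.fullmatch(r"(\d+)([smhe])", text.strip())
    if match is None:
        raise ValueError(f"unrecognized interval: {text!r}")
    return int(match.group(1)), _UNITS[match.group(2)]


for example in ["100s", "30m", "2h", "1e"]:
    print(example, "->", parse_interval(example))
```

Keeping the unit separate from the count lets a training loop compare time-based intervals against a clock and epoch-based intervals against the epoch counter.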
class dlex.configs.OptimizerConfig(name: str = 'sgd')
Args:
name (str): One of sgd, adam
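To show how a train entry could map onto typed config objects, the sketch below uses stand-in dataclasses that mirror some of the documented fields and defaults. TrainSketch, OptimizerSketch and train_from_dict are hypothetical names for illustration; they are not the dlex classes, and the real mapping may differ.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class OptimizerSketch:
    """Stand-in for OptimizerConfig (illustration only)."""
    name: str = "sgd"


@dataclass
class TrainSketch:
    """Stand-in for TrainConfig with a subset of the documented fields."""
    num_epochs: Optional[int] = None
    batch_size: Optional[int] = None
    optimizer: Optional[OptimizerSketch] = None
    eval: list = field(default_factory=list)
    max_grad_norm: float = 5.0
    save_every: str = "1e"
    log_every: str = "5s"


def train_from_dict(entry: dict) -> TrainSketch:
    """Build a TrainSketch from a dict such as the parsed train entry."""
    entry = dict(entry)  # copy so the caller's dict is left untouched
    optimizer = entry.pop("optimizer", None)
    if optimizer is not None:
        optimizer = OptimizerSketch(**optimizer)
    return TrainSketch(optimizer=optimizer, **entry)


train = train_from_dict({
    "num_epochs": 10,
    "batch_size": 32,
    "optimizer": {"name": "adam"},
    "eval": ["dev", "test"],
})
print(train.optimizer.name)  # adam
print(train.save_every)      # 1e
```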