Getting Started

Install

dlex can be installed using pip:

pip install dlex

Set up an experiment

Step 1: Folder structure

The following folder structure is recommended as a best practice:

Experiment/
|-- model_configs
|-- model_outputs*
|-- model_reports*
|-- logs*
|-- saved_models*
|-- src
|   |-- datasets
|   |   |-- <dataset>.py
|   |-- models
|   |   |-- <model>.py
|-- README.md

Model parameters and outputs are saved to ./saved_models and ./model_outputs unless DLEX_SAVED_MODELS_PATH and DLEX_MODEL_OUTPUTS_PATH are specified. The folders marked with * do not need to be created manually; they are created automatically during training and evaluation if missing.
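
For example, to redirect these folders elsewhere, set the environment variables before running dlex (the paths below are purely illustrative):

export DLEX_SAVED_MODELS_PATH=/data/experiments/saved_models
export DLEX_MODEL_OUTPUTS_PATH=/data/experiments/model_outputs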

Step 2: Define dataset

Two classes are required to load data and feed it to the model.

  • DatasetBuilder: handles downloading and preprocessing data. DatasetBuilder should be framework- and config-independent.

  • PytorchDataset, TensorflowDataset, SklearnDataset: handle loading the dataset from storage and perform shuffling, sorting, batching, etc. using each framework's own concepts.

from dlex.configs import AttrDict
from dlex.datasets.torch import PytorchDataset
from dlex.datasets.builder import DatasetBuilder

class SampleDatasetBuilder(DatasetBuilder):
    def __init__(self, params: AttrDict):
        super().__init__(params)

    def maybe_download_and_extract(self, force=False):
        super().maybe_download_and_extract(force)
        # Download dataset...
        # self.download_and_extract([some url], self.get_raw_data_dir())

    def maybe_preprocess(self, force=False):
        super().maybe_preprocess(force)
        # Preprocess data...

    def get_pytorch_wrapper(self, mode: str):
        return PytorchSampleDataset(self, mode)

class PytorchSampleDataset(PytorchDataset):
    def __init__(self, builder, mode):
        super().__init__(builder, mode)
        # Load data from preprocessed files...
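
The dataset wrapper typically also implements __len__ and __getitem__ so that dlex can shuffle and batch it. A minimal sketch, assuming the preprocessed data can be loaded into a list of (input, label) pairs (self._data is an illustrative field name, not part of the dlex API):

class PytorchSampleDataset(PytorchDataset):
    def __init__(self, builder, mode):
        super().__init__(builder, mode)
        # Illustrative: load preprocessed data as a list of (input, label) pairs
        self._data = []

    def __len__(self):
        return len(self._data)

    def __getitem__(self, idx):
        return self._data[idx]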

In this example, we use the built-in dlex.datasets.image.MNIST and dlex.datasets.image.CIFAR10, so you do not need to write even a line of dataset code!
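
Selecting a built-in dataset is done entirely in the configuration file (see Step 4), along the lines of:

dataset:
  name: dlex.datasets.image.MNIST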

Step 3: Construct model

Besides the standard module definition, a model also needs to support loss calculation, training, prediction, and exporting predictions to a specified format.

A model receives a MainConfig instance containing all configurations and a dataset instance (one of PytorchDataset, TensorflowDataset or SklearnDataset, depending on the framework).

import torch.nn as nn

from dlex.torch.models.base import ClassificationModel
from dlex.torch import Batch


class VGG(ClassificationModel):
    LAYERS = {
        'VGG11': [64, 'M', 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M'],
        'VGG13': [64, 64, 'M', 128, 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M'],
        'VGG16': [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M', 512, 512, 512, 'M', 512, 512, 512, 'M'],
        'VGG19': [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 256, 'M', 512, 512, 512, 512, 'M', 512, 512, 512, 512,
                  'M'],
    }

    def __init__(self, params, dataset):
        super().__init__(params, dataset)

        cfg = params.model
        layers = []
        in_channels = dataset.num_channels
        for x in self.LAYERS[cfg.vgg_type or 'VGG11']:
            if x == 'M':
                layers.append(nn.MaxPool2d(kernel_size=2, stride=2))
            else:
                layers.append(nn.Conv2d(in_channels, x, kernel_size=3, padding=1))
                layers.append(nn.BatchNorm2d(x))
                layers.append(nn.ReLU(inplace=True))
                in_channels = x
        layers.append(nn.AvgPool2d(kernel_size=1, stride=1))  # kernel 1, stride 1: effectively an identity layer
        self.features = nn.Sequential(*layers)

        self.classifier = nn.Linear(512, dataset.num_classes)
        self.softmax = nn.LogSoftmax(dim=-1)  # normalize over the class dimension, not the batch

    def forward(self, batch: Batch):
        out = self.features(batch.X)
        out = out.view(out.size(0), -1)
        out = self.classifier(out)
        out = self.softmax(out)
        return out
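
For intuition on the classifier's input size of 512: each 'M' entry halves the spatial resolution, so a 32x32 CIFAR10 image passes through five pooling stages (32 -> 16 -> 8 -> 4 -> 2 -> 1) and ends as a 512x1x1 feature map. A standalone sanity check using plain torch (independent of dlex):

import torch
import torch.nn as nn

# Build only the convolutional feature extractor for VGG11 on 3-channel input
cfg = [64, 'M', 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M']
layers, in_channels = [], 3
for x in cfg:
    if x == 'M':
        layers.append(nn.MaxPool2d(kernel_size=2, stride=2))
    else:
        layers += [nn.Conv2d(in_channels, x, kernel_size=3, padding=1),
                   nn.BatchNorm2d(x),
                   nn.ReLU(inplace=True)]
        in_channels = x
features = nn.Sequential(*layers)

out = features(torch.randn(1, 3, 32, 32))
print(out.shape)  # torch.Size([1, 512, 1, 1]) -- flattened to 512 for the classifier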

Step 4: Configurations

Much repetitive code can be eliminated with configurations; see the complete guide in Configurations. In the file below, values prefixed with ~ (such as ~vgg_type and ~dataset) refer to the variables declared under env, so this single file describes runs for both VGG16 and VGG19 on CIFAR10.

backend: pytorch
env:
  cifar:
    title: Results on CIFAR10
    variables:
      dataset:
        - dlex.datasets.image.CIFAR10
      vgg_type:
        - VGG16
        - VGG19
model:
  name: src.models.vgg.VGG
  vgg_type: ~vgg_type
dataset:
  name: ~dataset
train:
  eval: [dev, test]
  batch_size: 256
  num_epochs: 30
  optimizer:
    name: adam
    lr: 0.01
  max_grad_norm: 1.0
test:
  metrics: [err]

Step 5: Train & evaluate

Training, evaluation and inference are all handled by dlex. Simply execute one of these commands:

dlex train -c <config_path>
dlex evaluate -c <config_path>
dlex infer -c <config_path>

or

python -m dlex.train -c <config_path>
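
For the example above, assuming the configuration file was saved as model_configs/vgg.yml (the file name is illustrative):

dlex train -c model_configs/vgg.yml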