Getting Started
Set up an experiment
Step 1: Folder structure
The following folder structure is recommended:
Experiment/
|-- model_configs
|-- model_outputs*
|-- model_reports*
|-- logs*
|-- saved_models*
|-- src
|   |-- datasets
|   |   |-- <dataset>.py
|   |-- models
|   |   |-- <model>.py
|-- README.md
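The hand-written part of this layout can be scaffolded with a short script. This is a minimal sketch and not part of dlex: the helper name scaffold_experiment is hypothetical, and it deliberately leaves out the folders marked with *.

```python
from pathlib import Path

# Folders that are written by hand; the starred folders are left out on purpose.
REQUIRED_DIRS = ["model_configs", "src/datasets", "src/models"]


def scaffold_experiment(root):
    """Create the hand-written part of the experiment layout under `root`."""
    root = Path(root)
    for rel in REQUIRED_DIRS:
        (root / rel).mkdir(parents=True, exist_ok=True)
    readme = root / "README.md"
    if not readme.exists():
        readme.write_text("# Experiment\n")
    return root
```

Calling scaffold_experiment("Experiment") twice is harmless, since existing folders and an existing README.md are left untouched.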
Model parameters and outputs are saved to ./saved_models and ./model_outputs unless DLEX_SAVED_MODELS_PATH and DLEX_MODEL_OUTPUTS_PATH are specified. The folders marked with * do not need to be created manually; they are created during training and evaluation if missing.
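How the default locations and the overrides interact can be sketched as follows. This assumes that DLEX_SAVED_MODELS_PATH and DLEX_MODEL_OUTPUTS_PATH are read as environment variables, which the text above does not state explicitly; resolve_dir is a hypothetical helper, not a dlex function.

```python
import os


def resolve_dir(env_name, default, environ=None):
    """Return the override stored under `env_name` if set, else `default`.

    The chosen folder is created when missing. Reading the override from the
    environment is an assumption made for this sketch.
    """
    environ = os.environ if environ is None else environ
    path = environ.get(env_name) or default
    os.makedirs(path, exist_ok=True)
    return path
```

For example, resolve_dir("DLEX_SAVED_MODELS_PATH", "./saved_models") falls back to ./saved_models when the variable is unset or empty.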
Step 2: Define dataset
Two classes are required to load data and feed it to the model:
DatasetBuilder: handles downloading and preprocessing data. DatasetBuilder should be framework- and config-independent.
PytorchDataset, TensorflowDataset, SklearnDataset: handle loading the dataset from storage and perform shuffling, sorting, batching, etc. using the concepts of each framework.
from dlex.configs import AttrDict
from dlex.datasets.torch import PytorchDataset
from dlex.datasets.builder import DatasetBuilder


class SampleDatasetBuilder(DatasetBuilder):
    def __init__(self, params: AttrDict):
        super().__init__(params)

    def maybe_download_and_extract(self, force=False):
        super().maybe_download_and_extract(force)
        # Download dataset...
        # self.download_and_extract([some url], self.get_raw_data_dir())

    def maybe_preprocess(self, force=False):
        super().maybe_preprocess(force)
        # Preprocess data...

    def get_pytorch_wrapper(self, mode: str):
        return PytorchSampleDataset(self, mode)


class PytorchSampleDataset(PytorchDataset):
    def __init__(self, builder, mode):
        super().__init__(builder, mode)
        # Load data from preprocessed files...
In this example, we use dlex.datasets.image.MNIST and dlex.datasets.image.CIFAR10, so no dataset code needs to be written.
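Both maybe_ methods of the builder take a force flag. A common reading of that naming, assumed here rather than taken from the dlex source, is that a step is skipped when its result already exists unless force=True. The file-based helper below is a hypothetical stand-in that illustrates the pattern with a toy lower-casing step:

```python
import os


def maybe_preprocess(raw_path, out_path, force=False):
    """Write a lower-cased copy of raw_path to out_path.

    Returns True if the file was (re)written and False if the step was skipped.
    """
    if os.path.exists(out_path) and not force:
        return False  # Result already on disk: skip the step.
    with open(raw_path) as src, open(out_path, "w") as dst:
        for line in src:
            dst.write(line.lower())
    return True
```

The first call does the work, a second call returns False immediately, and force=True redoes the step regardless.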
Step 3: Construct model
Besides the standard module definition, a model also needs to support loss calculation, training, prediction, and writing predictions in a specified format.
The model receives a MainConfig instance containing all configurations and a dataset instance (one of PytorchDataset, TensorflowDataset or SklearnDataset, depending on the framework).
import torch.nn as nn
from dlex.torch.models.base import ClassificationModel
from dlex.torch import Batch


class VGG(ClassificationModel):
    # Numbers are conv output channels; 'M' marks a 2x2 max-pooling layer.
    LAYERS = {
        'VGG11': [64, 'M', 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M'],
        'VGG13': [64, 64, 'M', 128, 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M'],
        'VGG16': [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M', 512, 512, 512, 'M', 512, 512, 512, 'M'],
        'VGG19': [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 256, 'M', 512, 512, 512, 512, 'M', 512, 512, 512, 512,
                  'M'],
    }

    def __init__(self, params, dataset):
        super().__init__(params, dataset)
        cfg = params.model
        layers = []
        in_channels = dataset.num_channels
        for x in self.LAYERS[cfg.vgg_type or 'VGG11']:
            if x == 'M':
                layers.append(nn.MaxPool2d(kernel_size=2, stride=2))
            else:
                layers.append(nn.Conv2d(in_channels, x, kernel_size=3, padding=1))
                layers.append(nn.BatchNorm2d(x))
                layers.append(nn.ReLU(inplace=True))
                in_channels = x
        layers.append(nn.AvgPool2d(kernel_size=1, stride=1))
        self.features = nn.Sequential(*layers)
        self.classifier = nn.Linear(512, dataset.num_classes)
        # Normalize over the class dimension (dim=1), not over the batch (dim=0).
        self.softmax = nn.LogSoftmax(dim=1)

    def forward(self, batch: Batch):
        out = self.features(batch.X)
        out = out.view(out.size(0), -1)  # Flatten to (batch_size, features).
        out = self.classifier(out)
        out = self.softmax(out)
        return out
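The classifier's input size of 512 follows from the layer table when the input images are 32×32, as in CIFAR10: each 3×3 convolution with padding 1 keeps the spatial size, and each of the five 'M' entries halves it, leaving a 512×1×1 feature map. The sketch below checks that arithmetic in plain Python. flattened_features is a hypothetical helper, and the 32×32 input size is an assumption about what the dataset wrapper delivers.

```python
VGG16 = [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M',
         512, 512, 512, 'M', 512, 512, 512, 'M']


def flattened_features(layers, input_size):
    """Number of values per sample after the conv/pool stack and flattening."""
    size, channels = input_size, None
    for x in layers:
        if x == 'M':
            size //= 2      # MaxPool2d(kernel_size=2, stride=2) halves height and width
        else:
            channels = x    # Conv2d(kernel_size=3, padding=1) keeps height and width
    return channels * size * size


print(flattened_features(VGG16, 32))  # 512
```

With 64×64 inputs the same stack would yield 2048 values per sample, so the nn.Linear(512, ...) layer would have to be resized accordingly.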
Step 4: Configurations
Much boilerplate code can be replaced by configuration. See the complete guide in Configurations.
backend: pytorch
env:
  cifar:
    title: Results on CIFAR10
    variables:
      dataset:
        - dlex.datasets.image.CIFAR10
      vgg_type:
        - VGG16
        - VGG19
model:
  name: src.models.vgg.VGG
  vgg_type: ~vgg_type
dataset:
  name: ~dataset
train:
  eval: [dev, test]
  batch_size: 256
  num_epochs: 30
  optimizer:
    name: adam
    lr: 0.01
  max_grad_norm: 1.0
test:
  metrics: [err]
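The ~vgg_type and ~dataset entries refer to the lists under variables. Assuming each environment is expanded into one run per combination of variable values (an interpretation of the config above, not dlex's actual loader), the expansion can be sketched as follows. The names substitute and expand_variables are hypothetical.

```python
import itertools


def substitute(node, values):
    """Recursively replace '~name' strings by values[name]."""
    if isinstance(node, dict):
        return {k: substitute(v, values) for k, v in node.items()}
    if isinstance(node, list):
        return [substitute(v, values) for v in node]
    if isinstance(node, str) and node.startswith("~") and node[1:] in values:
        return values[node[1:]]
    return node


def expand_variables(config, variables):
    """Yield one concrete config per combination of variable values."""
    names = list(variables)
    for combo in itertools.product(*(variables[n] for n in names)):
        yield substitute(config, dict(zip(names, combo)))


variables = {"dataset": ["dlex.datasets.image.CIFAR10"],
             "vgg_type": ["VGG16", "VGG19"]}
config = {"model": {"name": "src.models.vgg.VGG", "vgg_type": "~vgg_type"},
          "dataset": {"name": "~dataset"}}
runs = list(expand_variables(config, variables))
print([r["model"]["vgg_type"] for r in runs])  # ['VGG16', 'VGG19']
```

Under this reading, the cifar environment above describes two runs that differ only in the VGG variant.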
Step 5: Train & evaluate
Training and evaluation are handled by dlex. Simply execute these commands:
dlex train -c <config_path>
dlex evaluate -c <config_path>
dlex infer -c <config_path>
or
python -m dlex.train -c <config_path>
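To run several of these commands in sequence from a script, one option is a thin subprocess wrapper. This sketch only assembles the command lines shown above; dlex_command and run_pipeline are hypothetical helpers, and the dlex executable is assumed to be on the PATH.

```python
import subprocess


def dlex_command(action, config_path):
    """Build the argument list for `dlex <action> -c <config_path>`."""
    return ["dlex", action, "-c", config_path]


def run_pipeline(config_path, actions=("train", "evaluate"), runner=subprocess.run):
    """Run each action in order; check=True raises on the first failing command."""
    for action in actions:
        runner(dlex_command(action, config_path), check=True)
```

A call such as run_pipeline("<config_path>") trains and then evaluates. Passing a custom runner makes the wrapper easy to dry-run without starting a training job.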