Dataset

class dlex.datasets.DatasetBuilder(params: dlex.configs.MainConfig)

Base class for preparing data. A subclass should handle downloading the data set and creating all files required for training.

download_and_extract(url: str, folder_path: str = None, filename: str = None)

Download the file at url and extract it.

Parameters
  • url (str) – url to download

  • folder_path (str, optional) – location for the extracted files. If None, the value returned by get_raw_data_dir is used.
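
As a rough illustration of what a helper like download_and_extract might do, the sketch below fetches an archive with the standard library and unpacks it into a target folder. It is not dlex's implementation: the name fetch_and_unpack and the fallback to the last URL path component when filename is None are assumptions made for this example.

```python
import os
import shutil
import urllib.request


def fetch_and_unpack(url: str, folder_path: str, filename: str = None) -> str:
    """Download `url` into `folder_path` and unpack it there (hypothetical helper)."""
    os.makedirs(folder_path, exist_ok=True)
    # Assumption: use the last URL path component when no filename is given.
    filename = filename or os.path.basename(url.split("?")[0])
    archive_path = os.path.join(folder_path, filename)
    urllib.request.urlretrieve(url, archive_path)
    # shutil infers the archive format (zip, tar, tar.gz, ...) from the extension.
    shutil.unpack_archive(archive_path, folder_path)
    return archive_path
```

The function returns the path of the downloaded archive so that a caller can delete it once the extracted files are in place.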

abstract evaluate(pred, ref, metric: str)
Parameters
  • pred

  • ref

  • metric

Returns

abstract format_output(y_pred, batch_item: dlex.torch.datatypes.BatchItem) → Tuple[str, str, str]

Get representations of model inputs and results in readable format

Parameters
  • y_pred

  • batch_item (BatchItem) –

Returns

A tuple containing string representations of input, ground-truth and predicted values
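
A minimal sketch of the kind of function this describes is shown below. The BatchItem class here is a hypothetical stand-in with assumed fields X and Y; the real dlex.torch.datatypes.BatchItem may be structured differently.

```python
from typing import NamedTuple, Tuple


class BatchItem(NamedTuple):
    """Hypothetical stand-in for dlex.torch.datatypes.BatchItem."""
    X: list  # model input (assumed field name)
    Y: int   # ground-truth label (assumed field name)


def format_output(y_pred: int, batch_item: BatchItem) -> Tuple[str, str, str]:
    """Return readable strings for the input, the ground truth and the prediction."""
    return str(batch_item.X), str(batch_item.Y), str(y_pred)


print(format_output(1, BatchItem(X=[0.5, 0.2], Y=1)))
```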

get_processed_data_dir() → str

Get the directory to store pre-processed files

get_raw_data_dir() → str

Get the directory to store the raw data set

get_working_dir() → str

Get the working directory

static is_better_result(metric: str, best_result: float, new_result: float)

Compare new result with previous best result

Parameters
  • metric (str) – name of metric

  • best_result (float) – current best result

  • new_result (float) – new result to compare against the current best

Returns

True if the new result is better than the current best under this metric.
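
The comparison depends on whether a metric is minimized or maximized. The sketch below shows one way to express that; the metric names in LOWER_IS_BETTER are illustrative assumptions, not the set that dlex actually recognizes.

```python
# Assumption: illustrative names for metrics where a smaller value is better.
LOWER_IS_BETTER = {"wer", "loss", "err"}


def is_better_result(metric: str, best_result: float, new_result: float) -> bool:
    """Return True if `new_result` improves on `best_result` for `metric`."""
    if metric in LOWER_IS_BETTER:
        return new_result < best_result
    # Every other metric (accuracy, F1, ...) is treated as higher-is-better.
    return new_result > best_result


print(is_better_result("acc", 0.80, 0.85))
print(is_better_result("wer", 0.20, 0.25))
```

A strict comparison means a tie does not count as an improvement, so an equal score would not replace the stored best result.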

abstract maybe_download_and_extract(force=False)
Parameters

force – if True, download and extract even when the files already exist
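
The effect of force can be pictured with the sketch below, which skips the work when the target directory already contains files. The helper maybe_prepare and its prepare callback are hypothetical names for this example and do not come from dlex.

```python
import os


def maybe_prepare(target_dir: str, prepare, force: bool = False) -> bool:
    """Call `prepare(target_dir)` unless the directory already has files.

    Returns True if `prepare` was called. All names here are hypothetical.
    """
    already_there = os.path.isdir(target_dir) and len(os.listdir(target_dir)) > 0
    if already_there and not force:
        return False  # existing files are reused
    os.makedirs(target_dir, exist_ok=True)
    prepare(target_dir)
    return True
```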

Returns

class dlex.datasets.torch.Dataset(builder, mode: str)

Load data from pre-processed files and prepare batches for training

Parameters
  • builder (DatasetBuilder) –

  • mode (str) – one of train / valid (or dev) / test

shuffle(seed=42)

Shuffle the data

Parameters

seed (int) –
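
A fixed default seed suggests that the shuffled order is meant to be reproducible. The sketch below shows one way to get that behaviour; ToyDataset and its _data list are hypothetical stand-ins and say nothing about how dlex stores its items.

```python
import random


class ToyDataset:
    """Hypothetical stand-in for a data set that keeps its items in a list."""

    def __init__(self, items):
        self._data = list(items)

    def shuffle(self, seed: int = 42) -> None:
        # A private Random instance leaves the global random state untouched
        # and gives the same order every time for a given seed.
        random.Random(seed).shuffle(self._data)


a, b = ToyDataset(range(10)), ToyDataset(range(10))
a.shuffle()
b.shuffle()
print(a._data == b._data)
```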

ImageDataset

class dlex.datasets.image.ImageDataset(params: dlex.configs.MainConfig)
format_output(y_pred, batch_item, tag='default') → Tuple[str, str, str]

Get representations of model inputs and results in readable format

Parameters
  • y_pred

  • batch_item (BatchItem) –

Returns

A tuple containing string representations of input, ground-truth and predicted values

property input_shape
Returns

shape of the input image

property num_channels
Returns

number of channels in the input image
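
The two properties can be thought of as read-only views of the image dimensions. The sketch below assumes a channels-first (channels, height, width) layout, which the source does not confirm; ToyImageDataset is a hypothetical stand-in.

```python
class ToyImageDataset:
    """Hypothetical stand-in; the channels-first layout is an assumption."""

    def __init__(self, channels: int, height: int, width: int):
        self._shape = (channels, height, width)

    @property
    def input_shape(self):
        # Shape of a single input image.
        return self._shape

    @property
    def num_channels(self) -> int:
        # Number of channels, read from the first axis of the shape.
        return self._shape[0]


ds = ToyImageDataset(3, 32, 32)
print(ds.input_shape, ds.num_channels)
```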

class dlex.datasets.image.torch.PytorchImageDataset(builder, mode)