scripts package

Submodules

scripts.configs module

Model configurations used to generate the presented results.

create_config()

Creates the configuration used to build the three final models:

  1. Independent

  2. Two-Stage

  3. Single-Stage

Returns

Final model configuration

create_conversion_config(input_shape: Tuple[int, int, int]) → Dict

Creates the configuration to convert trained TF models to TFLite models.

Parameters

input_shape – shape of the expected input features

Returns

Conversion configuration

create_tflite_config()

Creates the TFLite configuration for the three final models:

  1. Independent

  2. Two-Stage

  3. Single-Stage

Returns

Final TFLite model configuration

scripts.constants module

Commonly used constant values.

scripts.convert_models_to_tflite module

Script to convert TF models to TFLite models.

Example:

$ python -m scripts.convert_models_to_tflite

To change the underlying configuration, consult ./scripts/configs.py.

scripts.generate_cropped_dataset module

Script to create a cropped version of the dataset generated using process_files.py. Each image is cropped to its bounding box and keeps its original pixels, i.e. it is not resized.

Example:

Input: 200px x 300px

Bounding Box: 10px x 20px

Output: 10px x 20px

For usage, run ./generate_cropped_dataset.py -h
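The cropping step can be sketched as follows. This is a minimal sketch, not the script's implementation: it assumes the image is a row-major list of pixel rows and that the bounding box is given in absolute pixels in the (y_min, x_min, y_max, x_max) order used by extract_label; the helper crop_to_bbox is hypothetical.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # assumed RGB pixel


def crop_to_bbox(image: Sequence[Sequence[Pixel]],
                 bbox: Tuple[int, int, int, int]) -> List[List[Pixel]]:
    """Hypothetical helper: cut the bounding box out of the image without resizing.

    bbox is assumed to be (y_min, x_min, y_max, x_max) in absolute pixels,
    with the max values exclusive.
    """
    y_min, x_min, y_max, x_max = bbox
    # Plain slicing keeps the original pixels, so the output size equals the box size.
    return [list(row[x_min:x_max]) for row in image[y_min:y_max]]


if __name__ == "__main__":
    image = [[(0, 0, 0)] * 200 for _ in range(300)]  # 200px wide, 300px high
    cropped = crop_to_bbox(image, (50, 40, 70, 50))  # 10px wide, 20px high box
    print(len(cropped[0]), "x", len(cropped))  # width x height
```

Because no interpolation happens, a 10px x 20px box always yields a 10px x 20px output, matching the example above.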

scripts.helpers module

Collection of helper functions used in script files.

class ProgressBar(max_num_elements: int, size: int = 20)

Bases: object

Helper class to send progress bar data to stdout.

Example:
>>> from time import sleep
>>> pb = ProgressBar(10)
>>> for i in range(10):
...     pb.step(i)
...     sleep(1)
...     if i == 9:
...         pb.done()
[====================] 100%

static done() → None

Call this to finalize the progress bar.

step(index: int) → None

Executes a step of the progress bar. Depending on the current index, this either updates the output or redraws the previous state.

Parameters

index – current index
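For illustration, here is a minimal sketch of how such a bar could be rendered; the function render_bar and its rounding are assumptions, not the ProgressBar implementation. It maps the 0-based index to a number of filled cells out of size.

```python
import sys


def render_bar(index: int, max_num_elements: int, size: int = 20) -> str:
    """Hypothetical renderer: build the bar text for a 0-based index."""
    done = (index + 1) / max_num_elements  # fraction of completed steps
    filled = int(done * size)
    return "[" + "=" * filled + " " * (size - filled) + f"] {int(done * 100)}%"


if __name__ == "__main__":
    for i in range(10):
        # "\r" moves the cursor back to the line start so the bar redraws in place.
        sys.stdout.write("\r" + render_bar(i, 10))
        sys.stdout.flush()
    sys.stdout.write("\n")
```

For the last index of the docstring example, render_bar(9, 10) produces the same final line, "[====================] 100%".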

class ThreadWithReturnValue(group=None, target=None, name=None, args=None, kwargs=None, Verbose=None)

Bases: threading.Thread

Custom thread class that supports return values, based on https://stackoverflow.com/a/6894023/6904543

join(*args) → Any

Joins the thread and returns the return value of the target function.

run() → None

Overridden Thread.run method.
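The idea behind the linked pattern is to store the target's result in run() and hand it back from join(). A minimal sketch with a hypothetical class name, ResultThread; like the common form of that pattern, it relies on the private attributes _target, _args and _kwargs that CPython's Thread.__init__ sets, so treat it as an illustration rather than a stable API.

```python
import threading
from typing import Any, Callable, Optional


class ResultThread(threading.Thread):
    """Hypothetical sketch: a Thread whose join() returns the target's result."""

    def __init__(self, target: Optional[Callable[..., Any]] = None,
                 args: tuple = (), kwargs: Optional[dict] = None) -> None:
        super().__init__(target=target, args=args, kwargs=kwargs)
        self._result: Any = None

    def run(self) -> None:
        # Call the target ourselves so that its return value can be kept.
        if self._target is not None:
            self._result = self._target(*self._args, **self._kwargs)

    def join(self, timeout: Optional[float] = None) -> Any:
        super().join(timeout)
        return self._result


if __name__ == "__main__":
    thread = ResultThread(target=pow, args=(2, 10))
    thread.start()
    print(thread.join())
```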

decision(message: str) → bool

Helper that prompts the user with message and returns the decision as a bool.
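A minimal sketch of such a yes/no prompt, assuming the accepted answers are y/yes and n/no; the function ask_decision and its ask parameter (which lets tests replace input()) are hypothetical.

```python
from typing import Callable


def ask_decision(message: str, ask: Callable[[str], str] = input) -> bool:
    """Hypothetical helper: repeat the prompt until the answer is yes or no."""
    while True:
        answer = ask(f"{message} [y/n] ").strip().lower()
        if answer in ("y", "yes"):
            return True
        if answer in ("n", "no"):
            return False
        print("Please answer 'y' or 'n'.")
```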

move_files(files: List[str], target_dir: str) → List[str]

Moves the files given by their file names in files into target_dir.
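A minimal sketch of this behavior using shutil.move. Returning the new paths mirrors the List[str] return type, but the exact return value of move_files is not documented, so move_files_sketch is a hypothetical stand-in.

```python
import os
import shutil
from typing import Iterable, List


def move_files_sketch(files: Iterable[str], target_dir: str) -> List[str]:
    """Hypothetical sketch: move each file into target_dir and return the new paths."""
    os.makedirs(target_dir, exist_ok=True)
    new_paths = []
    for path in files:
        destination = os.path.join(target_dir, os.path.basename(path))
        # shutil.move also works across file systems (copy, then delete).
        new_paths.append(shutil.move(path, destination))
    return new_paths
```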

scripts.preselect_files module

Script to prepare iNaturalist image files based on input parameters.

calculate_stats(combined_list: List[Union[str, int]]) → None
Parameters

combined_list

create_local_copy(combined_list: List[Union[str, int]], base_dir: str, storage_dir: str, requires_decisions: bool, num_elements: int) → List[str]

Creates a local copy of num_elements samples per record in combined_list, taking the samples from base_dir and copying them into storage_dir.

Uses multi-threading.

Parameters
  • combined_list – List of order directories

  • base_dir – parent directory of input files

  • storage_dir – storage directory to create copy in

  • requires_decisions – if True, asks for approval before adding a sample

  • num_elements – required number of samples per order

Returns

List of new file locations
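The multi-threaded copy can be sketched with concurrent.futures.ThreadPoolExecutor. This is a swapped-in standard-library technique, not necessarily what create_local_copy uses (the module ships its own ThreadWithReturnValue); copy_in_parallel is hypothetical, and the per-order sampling and approval steps are left out.

```python
import os
import shutil
from concurrent.futures import ThreadPoolExecutor
from typing import List


def copy_in_parallel(files: List[str], storage_dir: str, max_workers: int = 8) -> List[str]:
    """Hypothetical sketch: copy files into storage_dir using a thread pool."""
    os.makedirs(storage_dir, exist_ok=True)

    def copy_one(path: str) -> str:
        # shutil.copy2 keeps file metadata and returns the destination path.
        return shutil.copy2(path, os.path.join(storage_dir, os.path.basename(path)))

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves input order, so the results line up with files.
        return list(pool.map(copy_one, files))
```

Copying is I/O-bound, so threads can overlap the waiting time even though the GIL serializes Python bytecode.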

decide_image(img_path: str) → bool
Parameters

img_path

Returns

get_directory_by_prefix(prefix: str, base_dir: str) → str
Parameters
  • prefix

  • base_dir

Returns

get_id_range_for_search_term(file_list: List[str], search_terms: List[str]) → List[int]

Filters the given list of file names by a list of search terms.

Parameters
  • file_list – list of file names

  • search_terms – list of search terms to search for

Returns

list of ids matching the search terms
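A minimal sketch of the filter, assuming names follow the iNaturalist 2021 directory pattern <id>_<kingdom>_..._<genus>_<species> (compare the 00420 prefix mentioned for process_directory) and that matching is a case-insensitive substring test; ids_for_search_terms is a hypothetical name.

```python
from typing import List


def ids_for_search_terms(file_list: List[str], search_terms: List[str]) -> List[int]:
    """Hypothetical sketch: IDs of all names that contain any of the search terms.

    Assumes each name starts with a numeric ID followed by "_".
    """
    terms = [term.lower() for term in search_terms]
    return [
        int(name.split("_", 1)[0])  # leading numeric ID, e.g. "00420" -> 420
        for name in file_list
        if any(term in name.lower() for term in terms)
    ]
```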

get_id_ranges_from_input_directory(root_directory: str) → Dict[str, List[int]]
Parameters

root_directory – root directory to get labels from

Returns

a dictionary containing species-ID ranges for all genera
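Grouping species IDs by genus can be sketched as below, again assuming the naming pattern <id>_<kingdom>_..._<genus>_<species>, so the genus is the second-to-last token. The hypothetical id_ranges_by_genus takes directory names instead of reading root_directory itself; passing os.listdir(root_directory) would apply it to a directory.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def id_ranges_by_genus(directory_names: Iterable[str]) -> Dict[str, List[int]]:
    """Hypothetical sketch: map each genus to the sorted species IDs it contains."""
    ranges: Dict[str, List[int]] = defaultdict(list)
    for name in directory_names:
        parts = name.split("_")
        # parts[0] is the numeric ID, parts[-2] the (assumed) genus token.
        ranges[parts[-2]].append(int(parts[0]))
    return {genus: sorted(ids) for genus, ids in ranges.items()}
```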

get_n_random_elements(container: List[Union[str, int]], num_elements: int) → Set[Union[str, int]]

Returns num_elements randomly chosen elements of container.

Parameters
  • container – the container to extract the elements from

  • num_elements – number of random elements to extract

Returns

set of num_elements random elements out of container
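A minimal sketch using random.sample, which draws without replacement; the name n_random_elements and the optional seed parameter are hypothetical additions for reproducibility.

```python
import random
from typing import Optional, Sequence, Set, TypeVar

T = TypeVar("T")


def n_random_elements(container: Sequence[T], num_elements: int,
                      seed: Optional[int] = None) -> Set[T]:
    """Hypothetical sketch: num_elements distinct positions drawn without replacement."""
    rng = random.Random(seed)  # seeded generator for reproducible selections
    # random.sample raises ValueError if num_elements > len(container).
    return set(rng.sample(container, num_elements))
```

If container holds duplicate values, the returned set can be smaller than num_elements.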

init_random_dataset_ids(num_elements: int) → List[str]
Parameters

num_elements

Returns

process_directories(combined_list: List[Union[str, int]], base_dir: str, out_dir: str = '../data/iNat', requires_decisions: bool = False, num_elements: int = 30, move_files_only: bool = False) → None

Generates a local copy of the data set.

Parameters
  • combined_list – List of sample directories

  • base_dir – root directory of source files

  • out_dir – target directory

  • requires_decisions – if True requires approval before adding a sample

  • num_elements – required number of samples per order/species

  • move_files_only – if True moves

process_directory(dir_prefix: str, base_dir: str, out_dir: str, requires_decisions: bool = False, num_samples: int = 30) → List[str]

Copies num_samples files from the input directory to the target directory. Set requires_decisions=True to require approval for each sample.

Parameters
  • dir_prefix – prefix of source/input directory, e.g. 00420

  • base_dir – root directory of source

  • out_dir – target/output directory

  • requires_decisions – if True asks for approval before copying each sample

  • num_samples – required number of samples

Returns

List of new file locations

scan_input_directory(root_directory: str) → None

Prints statistics for the input directory.

Parameters

root_directory – input directory to print statistics for

test_dataset(combined_list: List[Union[str, int]], base_dir: str, expected_num_elements: int) → None

Verifies the consistency and integrity of the dataset.

Parameters
  • combined_list

  • base_dir

  • expected_num_elements

scripts.process_files module

Script to convert an input directory into a data set directory based on a provided bounding box configuration file.

For usage:

python -m scripts.process_files -h

bounding_box_stats(files: Dict) → Dict

Generates statistics for the available bounding boxes. Calculates min, max, mean and avg of their widths and heights.

Parameters

files – dictionary of annotated image files

Returns

general statistics for bounding boxes
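A minimal sketch of the width/height statistics, assuming each bounding box is a dictionary with the keys y_min, x_min, y_max and x_max (the coordinate names listed for extract_label). The hypothetical bbox_size_stats takes a plain list of boxes rather than the files dictionary.

```python
from statistics import mean
from typing import Dict, List


def bbox_size_stats(boxes: List[Dict[str, float]]) -> Dict[str, Dict[str, float]]:
    """Hypothetical sketch: min, max and mean of bounding-box widths and heights."""
    widths = [box["x_max"] - box["x_min"] for box in boxes]
    heights = [box["y_max"] - box["y_min"] for box in boxes]
    return {
        name: {"min": min(values), "max": max(values), "mean": mean(values)}
        for name, values in (("width", widths), ("height", heights))
    }
```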

create_config_file(test_set: Dict[str, Dict[str, Union[Dict[str, float], str]]], train_set: Dict[str, Dict[str, Union[Dict[str, float], str]]], val_set: Dict[str, Dict[str, Union[Dict[str, float], str]]], label_names: Set[str], output_file: str) → None

Creates a JSON dataset configuration file from the given data.

Parameters
  • test_set – annotated test data set

  • train_set – annotated train data set

  • val_set – annotated validation data set

  • label_names – set of available class labels in the data set

  • output_file – target file path, including the file name, for the configuration file

create_dataset_structure(files: Dict[str, Dict[str, Union[Dict[str, float], str]]], label_names: Set[str], dir_name: str, test_split: float, val_split: float) → None

Generates the dataset directory structure for the given data.

Parameters
  • files – files to process

  • label_names – set of available class labels

  • dir_name – name of target directory

  • test_split – share of test data (in [0, 1])

  • val_split – share of validation data (in [0, 1])

create_directory_structure(directory: str, class_names: Set[str]) → Tuple[str, str, str]

Generates test, train and validation directories.

Parameters
  • directory – root directory to create directories in

  • class_names – set of available class names in the data set

Returns

Tuple holding the test, train and validation directories
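A minimal sketch of the directory layout, assuming the split directories are literally named test, train and validation and each contains one subdirectory per class; make_split_dirs is a hypothetical name.

```python
import os
from typing import Iterable, Tuple


def make_split_dirs(directory: str, class_names: Iterable[str]) -> Tuple[str, str, str]:
    """Hypothetical sketch: create <directory>/{test,train,validation}/<class>."""
    names = sorted(class_names)  # materialize once, since it is iterated per split
    test_dir, train_dir, val_dir = (
        os.path.join(directory, split) for split in ("test", "train", "validation")
    )
    for split_dir in (test_dir, train_dir, val_dir):
        for name in names:
            # exist_ok makes the call safe to repeat on an existing layout.
            os.makedirs(os.path.join(split_dir, name), exist_ok=True)
    return test_dir, train_dir, val_dir
```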

extract_file_name(elem: Dict) → str

Extracts file name from dictionary.

Parameters

elem – dictionary holding image annotations

Returns

the file name of the annotated image

extract_label(elem: Dict) → Tuple[Dict[str, float], str]

Extracts labels from element dictionary.

Parameters

elem – dictionary holding image annotations

Returns

bounding box coordinates (y_min, x_min, y_max, x_max) as a dictionary, and the class label

generate_statistics(files: Dict, target_directory: str, label_names: Set[str]) → None

Generates statistics over the processed data sets.

Parameters
  • files – dictionary holding annotated image files

  • target_directory – root directory for files

  • label_names – available class labels in the data set

get_genera_file_stats_for_directory(directory: str, label_names: Set[str]) → Dict[str, int]

Counts the number of samples for each genus.

Parameters
  • directory – source root directory

  • label_names – set of class names/subdirectories to find in root directory

Returns

dictionary mapping each genus to its file count (genus: file_count)
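Counting samples per class directory can be sketched with os.scandir. The hypothetical count_files_per_class counts only regular files and, as a design choice of this sketch, reports a missing class directory as 0 instead of raising.

```python
import os
from typing import Dict, Iterable


def count_files_per_class(directory: str, label_names: Iterable[str]) -> Dict[str, int]:
    """Hypothetical sketch: number of regular files in each class subdirectory."""
    counts: Dict[str, int] = {}
    for label in label_names:
        class_dir = os.path.join(directory, label)
        if not os.path.isdir(class_dir):
            counts[label] = 0  # missing class directory counts as empty
            continue
        with os.scandir(class_dir) as entries:
            # Nested directories are ignored; only regular files are counted.
            counts[label] = sum(1 for entry in entries if entry.is_file())
    return counts
```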

load_element(elem, in_directory: str) → Dict

Loads an element from the data dictionary.

Parameters
  • elem – annotated image

  • in_directory – source directory of annotated image

Returns

converted annotations

load_labels_from_bbox_file(bbox_file: str, in_directory: str) → Dict[str, Dict[str, Union[Dict[str, float], str]]]

Loads all available labels from the dataset file.

Parameters
  • bbox_file – file holding BBox annotations

  • in_directory – source directory of annotations

Returns

dictionary of annotated source image files

load_labels_from_bbox_files(bbox_files: List[str], in_directory: str) → Dict[str, Dict[str, Union[Dict[str, float], str]]]

Loads all labels from all provided files.

Parameters
  • bbox_files – list of BBox annotation files

  • in_directory – source directory for annotated images

Returns

dictionary holding annotated images

split_labeled_files(files: Dict[str, Dict[str, Union[Dict[str, float], str]]], test_share: float, validation_share: float) → Tuple[Dict[str, Dict[str, Union[Dict[str, float], str]]], Dict[str, Dict[str, Union[Dict[str, float], str]]], Dict[str, Dict[str, Union[Dict[str, float], str]]]]

Splits given files into test/train/validation sets.

Parameters
  • files – Annotated image dictionary

  • test_share – share of test samples (in [0, 1])

  • validation_share – share of validation samples (in [0, 1])

Returns

three annotation dictionaries: test, train, validation
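A minimal sketch of a random split with shares in [0, 1]; split_files and its seed parameter are hypothetical, and rounding the share sizes with round() is an assumption. Whatever is not assigned to the test or validation set ends up in train.

```python
import random
from typing import Dict, List, Optional, Tuple, TypeVar

V = TypeVar("V")


def split_files(files: Dict[str, V], test_share: float, validation_share: float,
                seed: Optional[int] = None) -> Tuple[Dict[str, V], Dict[str, V], Dict[str, V]]:
    """Hypothetical sketch: random test/train/validation split of an annotation dict."""
    keys: List[str] = sorted(files)  # sort first so a fixed seed is reproducible
    random.Random(seed).shuffle(keys)
    num_test = round(len(keys) * test_share)
    num_val = round(len(keys) * validation_share)

    def subset(selected: List[str]) -> Dict[str, V]:
        return {key: files[key] for key in selected}

    # Returned in the documented order: test, train, validation.
    return (subset(keys[:num_test]),
            subset(keys[num_test + num_val:]),
            subset(keys[num_test:num_test + num_val]))
```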

spread_files(files: Dict[str, Dict[str, Union[Dict[str, float], str]]], target_directory: str, label_names: Set[str]) → Dict[str, Dict[str, Union[Dict[str, float], str]]]

Copies files from the input directory into the target directory, nested in class directories.

Parameters
  • files – annotated image dictionary

  • target_directory – target root directory

  • label_names – set of available class names in the data set

Returns

copy of files with the new locations in target_directory

test_output_directory(directory: str, label_names: Set[str]) → Tuple[Dict, Dict, Dict]

Validates the output directory for correctness.

Parameters
  • directory – root directory of data set

  • label_names – available class labels in data set

Returns

statistics for the test, train and validation sets

scripts.reuse_labels module

Use this script to copy the labeled images from the mounted volume into the dedicated structure.

extract_file_name(elem: Dict) → str

Extract file name from element dictionary.

Parameters

elem – element dictionary

Returns

file name

get_directory_from_prefix(in_dir, prefix)

Searches in_dir for a directory whose name starts with prefix.

Parameters
  • in_dir – parent directory

  • prefix – search term to look for

Returns

path to target directory
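A minimal sketch of the prefix search; find_dir_with_prefix is hypothetical, and returning None when nothing matches is an assumption, since the documented function's behavior in that case is not described.

```python
import os
from typing import Optional


def find_dir_with_prefix(in_dir: str, prefix: str) -> Optional[str]:
    """Hypothetical sketch: path of the first subdirectory whose name starts with prefix."""
    for name in sorted(os.listdir(in_dir)):  # sorted for a deterministic match
        path = os.path.join(in_dir, name)
        if name.startswith(prefix) and os.path.isdir(path):
            return path
    return None  # assumption: no match yields None
```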

move_file(elem, input_dir, target_dir)

Moves files from one directory into another.

Parameters
  • elem – files to move

  • input_dir – source to move from

  • target_dir – target to move files to

Returns

list of files in target directory

process_in_multi_threads(file_content, input_dir, target_dir) → None

Moves the files listed in file_content from input_dir to target_dir using multiple threads.

Parameters
  • file_content – list of file names

  • input_dir – source directory

  • target_dir – target directory

process_in_single_thread(file_content, input_dir, target_dir) → None

Copies the files listed in file_content from input_dir to target_dir using a single thread.

Parameters
  • file_content – files to move

  • input_dir – source directory

  • target_dir – target directory
