scripts package
Submodules
scripts.configs module
Model configurations used to generate the presented results.
- create_config()
Creates the configuration used to build the three final models:
Independent
Two-Stage
Single-Stage
- Returns
Final model configuration
scripts.constants module
Commonly used constant values.
scripts.convert_models_to_tflite module
Script to convert TensorFlow (TF) models to TensorFlow Lite (TFLite) models.
- Example:
$ python -m scripts.convert_models_to_tflite
To change the underlying configuration, see ./scripts/configs.py.
scripts.generate_cropped_dataset module
Script to create a cropped version of the dataset generated by process_files.py. Each image is cropped to its bounding box and keeps its original pixels, so the crop is not rescaled.
- Example:
Input: 200px x 300px image with a 10px x 20px bounding box
Output: 10px x 20px
For usage, run ./generate_cropped_dataset.py -h
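The crop-without-rescaling behaviour can be illustrated with the minimal sketch below. It is not the script's code. It assumes that an image is a list of pixel rows and that the bounding box is given in absolute pixel coordinates as (y_min, x_min, y_max, x_max) with exclusive maxima. The helper name `crop_to_bbox` is hypothetical.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # assumed RGB triple
Image = List[List[Pixel]]     # list of rows, each row a list of pixels


def crop_to_bbox(image: Image, bbox: Sequence[int]) -> Image:
    """Return the pixels inside bbox = (y_min, x_min, y_max, x_max).

    Maxima are exclusive. Pixels are copied unchanged, so the crop keeps
    the original resolution instead of being resized.
    """
    y_min, x_min, y_max, x_max = bbox
    height, width = len(image), len(image[0])
    if not (0 <= y_min < y_max <= height and 0 <= x_min < x_max <= width):
        raise ValueError(f"bounding box {tuple(bbox)} lies outside the image")
    return [row[x_min:x_max] for row in image[y_min:y_max]]


# A 200px wide x 300px tall image with a 10px wide x 20px tall box.
image = [[(0, 0, 0)] * 200 for _ in range(300)]
cropped = crop_to_bbox(image, (5, 40, 25, 50))
print(len(cropped[0]), "x", len(cropped))  # → 10 x 20
```

Because the function only slices, the output size always equals the box size, matching the 200px x 300px to 10px x 20px example.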
scripts.helpers module
Collection of helper functions used in script files.
- class ProgressBar(max_num_elements: int, size: int = 20)
Bases: object
Helper class to send progress bar data to stdout.
- Example:
>>> from time import sleep
>>> pb = ProgressBar(10)
>>> for i in range(10):
...     pb.step(i)
...     sleep(1)
...     if i == 9:
...         pb.done()
[====================] 100%
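A class with this interface could look roughly like the sketch below. This is an illustration, not the helper's actual implementation. The rendering (a bar of `size` characters plus a percentage, redrawn in place with a carriage return) is an assumption modelled on the example output. The `stream` argument is a hypothetical addition that defaults to stdout and makes the output easy to capture.

```python
import io
import sys
from typing import Optional, TextIO


class ProgressBar:
    """Minimal progress bar sketch; not the module's actual implementation."""

    def __init__(self, max_num_elements: int, size: int = 20,
                 stream: Optional[TextIO] = None) -> None:
        self.max_num_elements = max_num_elements
        self.size = size
        # Hypothetical extra parameter: defaults to stdout, as documented.
        self.stream = stream if stream is not None else sys.stdout

    def step(self, i: int) -> None:
        # Element i (0-based) has finished, so i + 1 elements are done.
        done = min(i + 1, self.max_num_elements)
        filled = done * self.size // self.max_num_elements
        percent = done * 100 // self.max_num_elements
        bar = "=" * filled + " " * (self.size - filled)
        # "\r" returns to the line start so the next step overwrites the bar.
        self.stream.write(f"\r[{bar}] {percent}%")
        self.stream.flush()

    def done(self) -> None:
        # End the line so subsequent output starts on a fresh line.
        self.stream.write("\n")
        self.stream.flush()


buffer = io.StringIO()
pb = ProgressBar(10, stream=buffer)
for i in range(10):
    pb.step(i)
pb.done()
last_frame = buffer.getvalue().splitlines()[-1]
print(last_frame)  # → [====================] 100%
```

Integer division keeps the percentage exact and avoids floating-point artifacts such as 28.999...%.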
- class ThreadWithReturnValue(group=None, target=None, name=None, args=None, kwargs=None, Verbose=None)
Bases: threading.Thread
Custom thread class that makes the target's return value available to the caller, based on the implementation found at https://stackoverflow.com/a/6894023/6904543
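The common pattern for such a class is to override `run()` so that it stores the target's result, and `join()` so that it returns that result. The sketch below follows this pattern under the documented constructor signature. It is an illustration, not the module's exact code. `Verbose` is accepted but ignored here.

```python
import threading
from typing import Any, Callable, Optional


class ThreadWithReturnValue(threading.Thread):
    """Thread whose join() hands back the target's return value (sketch)."""

    def __init__(self, group=None, target: Optional[Callable[..., Any]] = None,
                 name: Optional[str] = None, args=None, kwargs=None,
                 Verbose=None) -> None:
        # Verbose is accepted only to mirror the documented signature; it is unused.
        args = tuple(args) if args is not None else ()
        kwargs = dict(kwargs) if kwargs is not None else {}
        super().__init__(group=group, target=target, name=name,
                         args=args, kwargs=kwargs)
        # Keep our own references so run() does not rely on Thread internals.
        self._fn, self._fn_args, self._fn_kwargs = target, args, kwargs
        self._result: Any = None

    def run(self) -> None:
        # Call the target and remember its result instead of discarding it.
        if self._fn is not None:
            self._result = self._fn(*self._fn_args, **self._fn_kwargs)

    def join(self, timeout: Optional[float] = None) -> Any:
        # Wait like Thread.join, then return whatever the target produced.
        super().join(timeout)
        return self._result


worker = ThreadWithReturnValue(target=pow, args=(2, 10))
worker.start()
print(worker.join())  # → 1024
```

If `join()` times out before the target finishes, this sketch returns `None`.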
scripts.preselect_files module
Script to prepare iNaturalist image files based on input parameters.
- calculate_stats(combined_list: List[Union[str, int]]) → None
- Parameters
combined_list –
- Returns
- create_local_copy(combined_list: List[Union[str, int]], base_dir: str, storage_dir: str, requires_decisions: bool, num_elements: int) → List[str]
Creates a local copy of num_elements samples per entry in combined_list, copying them from base_dir into storage_dir.
Uses multithreading.
- Parameters
combined_list – List of order directories
base_dir – parent directory of input files
storage_dir – storage directory to create copy in
requires_decisions – if True, asks for approval before adding a sample
num_elements – required number of samples per order
- Returns
List of new file locations
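The sketch below shows one way to do a multithreaded copy of random samples. It is not the module's implementation. The helper name `copy_samples` and the `seed` parameter are hypothetical. It assumes each entry of the directory list is a subdirectory of `base_dir`. It uses `concurrent.futures.ThreadPoolExecutor` and omits the interactive approval step controlled by `requires_decisions`.

```python
import os
import random
import shutil
import tempfile
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def copy_samples(directories: List[str], base_dir: str, storage_dir: str,
                 num_elements: int, seed: int = 0) -> List[str]:
    """Copy up to num_elements random files per directory using a thread pool."""
    rng = random.Random(seed)  # seeded so the selection is reproducible
    jobs: List[Tuple[str, str]] = []
    for directory in directories:
        src_dir = os.path.join(base_dir, directory)
        files = sorted(os.listdir(src_dir))
        chosen = rng.sample(files, min(num_elements, len(files)))
        dst_dir = os.path.join(storage_dir, directory)
        os.makedirs(dst_dir, exist_ok=True)
        jobs.extend((os.path.join(src_dir, f), os.path.join(dst_dir, f))
                    for f in chosen)
    # Copying is I/O-bound, so threads can overlap the time spent waiting on disk.
    with ThreadPoolExecutor(max_workers=8) as pool:
        return list(pool.map(lambda job: shutil.copy2(*job), jobs))


with tempfile.TemporaryDirectory() as base, tempfile.TemporaryDirectory() as out:
    os.makedirs(os.path.join(base, "00420"))
    for n in range(5):
        open(os.path.join(base, "00420", f"img_{n}.jpg"), "w").close()
    copied = copy_samples(["00420"], base, out, num_elements=3)
    print(len(copied))  # → 3
```

`pool.map` preserves input order, so the returned list of new file locations lines up with the selected files.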
- get_directory_by_prefix(prefix: str, base_dir: str) → str
- Parameters
prefix –
base_dir –
- Returns
- get_id_range_for_search_term(file_list: List[str], search_terms: List[str]) → List[int]
Filters the given list of file names by a list of search terms.
- Parameters
file_list – list of file names
search_terms – list of search terms to search for
- Returns
list of ids matching the search terms
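A filter of this kind can be sketched as follows. It assumes, hypothetically, that names consist of a numeric ID prefix followed by underscore-separated taxonomy parts (for example "00420_Animalia_..._Genus_species"). The actual naming scheme and matching rule of the script may differ, and the helper name `ids_matching` is illustrative.

```python
from typing import List


def ids_matching(file_list: List[str], search_terms: List[str]) -> List[int]:
    """Return the numeric ID prefix of every name containing a search term."""
    ids = []
    for name in file_list:
        # Assumed naming scheme: "<id>_<kingdom>_..._<genus>_<species>".
        prefix, _, taxonomy = name.partition("_")
        parts = taxonomy.split("_")
        # Whole-token match, so "Aves" does not match inside longer words.
        if any(term in parts for term in search_terms):
            ids.append(int(prefix))
    return ids


names = [
    "00001_Animalia_Arthropoda_Insecta_Lepidoptera_Nymphalidae_Vanessa_atalanta",
    "00002_Animalia_Chordata_Aves_Passeriformes_Corvidae_Corvus_corax",
    "00003_Animalia_Arthropoda_Insecta_Lepidoptera_Pieridae_Pieris_rapae",
]
print(ids_matching(names, ["Lepidoptera"]))  # → [1, 3]
```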
- get_id_ranges_from_input_directory(root_directory: str) → Dict[str, List[int]]
- Parameters
root_directory – root directory to get labels from
- Returns
a dictionary containing species-ID ranges for all genera
- get_n_random_elements(container: List[Union[str, int]], num_elements: int) → Set[Union[str, int]]
Returns num_elements randomly chosen elements of container.
- Parameters
container – the container to extract the elements from
num_elements – number of random elements to extract
- Returns
set of num_elements random elements from container
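The return type is a set, so sampling without replacement fits naturally. The sketch below uses the standard library's `random.sample`. The helper name and the optional `seed` parameter are hypothetical, and this is not the module's code. Note that if `container` holds duplicates, the resulting set can have fewer than `num_elements` members.

```python
import random
from typing import List, Optional, Set, Union


def n_random_elements(container: List[Union[str, int]], num_elements: int,
                      seed: Optional[int] = None) -> Set[Union[str, int]]:
    """Pick num_elements distinct positions of container at random."""
    if num_elements > len(container):
        raise ValueError("num_elements exceeds the size of container")
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    # random.sample draws without replacement; the result is returned as a set.
    return set(rng.sample(container, num_elements))


picked = n_random_elements(list(range(100)), 30, seed=1)
print(len(picked))  # → 30
```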
- process_directories(combined_list: List[Union[str, int]], base_dir: str, out_dir: str = '../data/iNat', requires_decisions: bool = False, num_elements: int = 30, move_files_only: bool = False) → None
Generates a local copy of the data set.
- Parameters
combined_list – List of sample directories
base_dir – root directory of source files
out_dir – target directory
requires_decisions – if True requires approval before adding a sample
num_elements – required number of samples per order/species
move_files_only – if True moves
- Returns
- process_directory(dir_prefix: str, base_dir: str, out_dir: str, requires_decisions: bool = False, num_samples: int = 30) → List[str]
Copies num_samples files from the input directory to the target directory. Pass requires_decisions=True to require approval for each sample.
- Parameters
dir_prefix – prefix of source/input directory, e.g. 00420
base_dir – root directory of source
out_dir – target/output directory
requires_decisions – if True asks for approval before copying each sample
num_samples – required number of samples
- Returns
List of new file locations
scripts.process_files module
Script to convert an input directory into a data set directory based on a provided bounding box configuration file.
- For usage:
python -m scripts.process_files -h
- bounding_box_stats(files: Dict) → Dict
Generates statistics for the available bounding boxes. Calculates the min, max, mean and avg width and height.
- Parameters
files – dictionary of annotated image files
- Returns
general statistics for bounding boxes
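Per-dimension bounding-box statistics could be computed as in the sketch below. It assumes, hypothetically, that each annotation stores its box under a "bbox" key with y_min, x_min, y_max and x_max entries. The real layout of the annotation dictionary may differ. The sketch reports min, max and mean, and it raises `ValueError` on an empty input.

```python
from statistics import mean
from typing import Dict


def bbox_stats(files: Dict[str, Dict]) -> Dict[str, Dict[str, float]]:
    """Summarise bounding-box widths and heights across all annotated files."""
    widths, heights = [], []
    for annotation in files.values():
        box = annotation["bbox"]  # assumed keys: y_min, x_min, y_max, x_max
        widths.append(box["x_max"] - box["x_min"])
        heights.append(box["y_max"] - box["y_min"])
    return {
        name: {"min": min(values), "max": max(values), "mean": mean(values)}
        for name, values in (("width", widths), ("height", heights))
    }


files = {
    "a.jpg": {"bbox": {"y_min": 0, "x_min": 0, "y_max": 20, "x_max": 10}, "label": "Vanessa"},
    "b.jpg": {"bbox": {"y_min": 5, "x_min": 5, "y_max": 45, "x_max": 35}, "label": "Pieris"},
}
stats = bbox_stats(files)
print(stats)
```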
- create_config_file(test_set: Dict[str, Dict[str, Union[Dict[str, float], str]]], train_set: Dict[str, Dict[str, Union[Dict[str, float], str]]], val_set: Dict[str, Dict[str, Union[Dict[str, float], str]]], label_names: Set[str], output_file: str) → None
Creates a JSON dataset configuration file from the given data.
- Parameters
test_set – annotated test data set
train_set – annotated train data set
val_set – annotated validation data set
label_names – set of available class labels in the data set
output_file – target file path + name for the configuration file
- Returns
- create_dataset_structure(files: Dict[str, Dict[str, Union[Dict[str, float], str]]], label_names: Set[str], dir_name: str, test_split: float, val_split: float) → None
Generates the dataset directory structure for the given data.
- Parameters
files – files to process
label_names – set of available class labels
dir_name – name of target directory
test_split – share of test data (in [0, 1])
val_split – share of validation data (in [0, 1])
- Returns
- create_directory_structure(directory: str, class_names: Set[str]) → Tuple[str, str, str]
Generates test, train and validation directories.
- Parameters
directory – root directory to create directories in
class_names – set of available class names in the data set
- Returns
Tuple holding test, train and validation directory
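A layout with one subdirectory per class inside each split could be created as in the sketch below. The split folder names "test", "train" and "validation" and the helper name `make_split_dirs` are assumptions for illustration. The script may use different names.

```python
import os
import tempfile
from typing import Set, Tuple


def make_split_dirs(directory: str, class_names: Set[str]) -> Tuple[str, str, str]:
    """Create <directory>/<split>/<class> folders and return the three split roots."""
    test_dir, train_dir, val_dir = (
        os.path.join(directory, split) for split in ("test", "train", "validation")
    )
    for split_dir in (test_dir, train_dir, val_dir):
        for class_name in sorted(class_names):
            # exist_ok=True makes repeated runs on an existing tree harmless.
            os.makedirs(os.path.join(split_dir, class_name), exist_ok=True)
    return test_dir, train_dir, val_dir


with tempfile.TemporaryDirectory() as tmp:
    test_dir, train_dir, val_dir = make_split_dirs(tmp, {"Vanessa", "Pieris"})
    listing = sorted(os.listdir(train_dir))
    print(listing)  # → ['Pieris', 'Vanessa']
```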
- extract_file_name(elem: Dict) → str
Extracts file name from dictionary.
- Parameters
elem – dictionary holding image annotations
- Returns
the file name of the annotated image
- extract_label(elem: Dict) → Tuple[Dict[str, float], str]
Extracts the bounding box and class label from an element dictionary.
- Parameters
elem – dictionary holding image annotations
- Returns
Bounding Box coordinates and class label: ((y_min, x_min, y_max, x_max), label)
- generate_statistics(files: Dict, target_directory: str, label_names: Set[str]) → None
Generates statistics over the processed data sets.
- Parameters
files – dictionary holding annotated image files
target_directory – root directory for files
label_names – available class labels in the data set
- Returns
- get_genera_file_stats_for_directory(directory: str, label_names: Set[str]) → Dict[str, int]
Counts number of samples for each genus.
- Parameters
directory – source root directory
label_names – set of class names/subdirectories to find in root directory
- Returns
Dictionary mapping each genus to its file count (genus: file_count)
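Per-genus counting can be sketched with `os.scandir`, as below. This is an illustration, not the module's code. The helper name is hypothetical, and treating a missing class folder as a count of zero is an assumed design choice.

```python
import os
import tempfile
from typing import Dict, Set


def count_files_per_class(directory: str, label_names: Set[str]) -> Dict[str, int]:
    """Map each class subdirectory name to the number of regular files inside it."""
    stats: Dict[str, int] = {}
    for label in sorted(label_names):
        class_dir = os.path.join(directory, label)
        if not os.path.isdir(class_dir):
            stats[label] = 0  # a missing class folder counts as empty
            continue
        with os.scandir(class_dir) as entries:
            stats[label] = sum(1 for entry in entries if entry.is_file())
    return stats


with tempfile.TemporaryDirectory() as root:
    for label, count in (("Vanessa", 2), ("Pieris", 1)):
        os.makedirs(os.path.join(root, label))
        for n in range(count):
            open(os.path.join(root, label, f"{n}.jpg"), "w").close()
    stats = count_files_per_class(root, {"Vanessa", "Pieris", "Corvus"})
    print(stats)  # → {'Corvus': 0, 'Pieris': 1, 'Vanessa': 2}
```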
- load_element(elem, in_directory: str) → Dict
Loads an element from the data dictionary.
- Parameters
elem – annotated image
in_directory – source directory of annotated image
- Returns
converted annotations
- load_labels_from_bbox_file(bbox_file: str, in_directory: str) → Dict[str, Dict[str, Union[Dict[str, float], str]]]
Loads all available labels from a dataset file.
- Parameters
bbox_file – file holding BBox annotations
in_directory – source directory of annotations
- Returns
dictionary of annotated source image files
- load_labels_from_bbox_files(bbox_files: List[str], in_directory: str) → Dict[str, Dict[str, Union[Dict[str, float], str]]]
Loads all labels from all provided files.
- Parameters
bbox_files – list of BBox annotation files
in_directory – source directory for annotated images
- Returns
dictionary holding annotated images
- split_labeled_files(files: Dict[str, Dict[str, Union[Dict[str, float], str]]], test_share: float, validation_share: float) → Tuple[Dict[str, Dict[str, Union[Dict[str, float], str]]], Dict[str, Dict[str, Union[Dict[str, float], str]]], Dict[str, Dict[str, Union[Dict[str, float], str]]]]
Splits given files into test/train/validation sets.
- Parameters
files – Annotated image dictionary
test_share – fraction of test samples (in [0, 1])
validation_share – fraction of validation samples (in [0, 1])
- Returns
three annotation dictionaries: test, train, validation
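A shuffled split into test, train and validation dictionaries can be sketched as below. The seeded shuffle, the rounding down of shares, and the rule that leftovers go to train are assumptions, not necessarily what the script does. The helper name `split_files` is hypothetical.

```python
import random
from typing import Dict, List, Tuple, TypeVar

V = TypeVar("V")


def split_files(files: Dict[str, V], test_share: float, validation_share: float,
                seed: int = 0) -> Tuple[Dict[str, V], Dict[str, V], Dict[str, V]]:
    """Shuffle the keys, then cut them into test, train and validation dictionaries."""
    if not (0 <= test_share <= 1 and 0 <= validation_share <= 1
            and test_share + validation_share <= 1):
        raise ValueError("shares must lie in [0, 1] and sum to at most 1")
    keys = sorted(files)  # sort first so a seeded shuffle is reproducible
    random.Random(seed).shuffle(keys)
    n_test = int(len(keys) * test_share)  # rounded down; leftovers go to train
    n_val = int(len(keys) * validation_share)

    def subset(selected: List[str]) -> Dict[str, V]:
        return {key: files[key] for key in selected}

    return (subset(keys[:n_test]),
            subset(keys[n_test + n_val:]),
            subset(keys[n_test:n_test + n_val]))


files = {f"img_{i:03d}.jpg": {"label": "Vanessa"} for i in range(100)}
test_set, train_set, val_set = split_files(files, test_share=0.2, validation_share=0.1)
print(len(test_set), len(train_set), len(val_set))  # → 20 70 10
```

The returned tuple follows the documented order: test, train, validation.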
- spread_files(files: Dict[str, Dict[str, Union[Dict[str, float], str]]], target_directory: str, label_names: Set[str]) → Dict[str, Dict[str, Union[Dict[str, float], str]]]
Copies files from the input directory into the target directory, nested in class directories.
- Parameters
files – annotated image dictionary
target_directory – target root directory
label_names – set of available class names in the data set
- Returns
copy of files with new location in target_directory
scripts.reuse_labels module
Use this script to copy the labeled images from the mounted volume into the dedicated structure.
- extract_file_name(elem: Dict) → str
Extract file name from element dictionary.
- Parameters
elem – element dictionary
- Returns
file name
- get_directory_from_prefix(in_dir, prefix)
Searches in_dir for a directory whose name starts with prefix.
- Parameters
in_dir – parent directory
prefix – search term to look for
- Returns
path to target directory
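A prefix search of this kind could be sketched as below. The helper name `find_dir_by_prefix` is hypothetical. Choosing the first match in sorted order and returning `None` when nothing matches are assumptions, since the documentation does not say how ties or misses are handled.

```python
import os
import tempfile
from typing import Optional


def find_dir_by_prefix(in_dir: str, prefix: str) -> Optional[str]:
    """Return the first subdirectory of in_dir whose name starts with prefix."""
    for name in sorted(os.listdir(in_dir)):  # sorted for a deterministic match
        path = os.path.join(in_dir, name)
        if name.startswith(prefix) and os.path.isdir(path):
            return path
    return None  # no match: callers must handle None


with tempfile.TemporaryDirectory() as volume:
    for name in ("00420_Animalia_Arthropoda", "00421_Animalia_Chordata"):
        os.makedirs(os.path.join(volume, name))
    match = find_dir_by_prefix(volume, "00421")
    print(os.path.basename(match))  # → 00421_Animalia_Chordata
    missing = find_dir_by_prefix(volume, "99999")
```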
- move_file(elem, input_dir, target_dir)
Moves files from one directory into another.
- Parameters
elem – files to move
input_dir – source to move from
target_dir – target to move files to
- Returns
list of files in target directory