Results

Metrics

Regression

GIoU

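During training and in the log output, the value reported as "GIoU Loss" is GIoULoss = 1 - GIoU. The metric used for comparison in the result tables is GIoU itself, recovered as GIoU = 1 - GIoULoss. Because GIoU lies in [-1, 1], the loss lies in [0, 2], so a logged loss above 1 corresponds to a negative GIoU.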
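As a rough illustration of how GIoU and the logged loss relate, here is a minimal sketch for two axis-aligned boxes. The `(x1, y1, x2, y2)` box format and the function names are assumptions made for this example, not taken from the project code, and the boxes are assumed to have positive area.

```python
def giou(box_a, box_b):
    """Generalized IoU of two axis-aligned boxes given as (x1, y1, x2, y2).

    Assumes both boxes have positive area (hypothetical helper, not the
    project's implementation).
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)

    # Intersection rectangle; its area is zero if the boxes do not overlap.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = area_a + area_b - inter

    # Smallest axis-aligned box that encloses both inputs.
    enclose_w = max(ax2, bx2) - min(ax1, bx1)
    enclose_h = max(ay2, by2) - min(ay1, by1)
    enclose = enclose_w * enclose_h

    iou = inter / union
    return iou - (enclose - union) / enclose


def giou_loss(box_a, box_b):
    """Loss as it appears in the logs: 1 - GIoU, ranging over [0, 2]."""
    return 1.0 - giou(box_a, box_b)


def giou_from_loss(loss):
    """Convert a logged GIoU Loss back to the GIoU used in the tables."""
    return 1.0 - loss
```

For example, `giou_from_loss(0.6161582)` gives roughly 0.3838, which is how a logged loss maps to a table entry.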

RMSE

The root-mean-square error (RMSE) is the same quantity that is reported during training; the values differ only by the variation that dropout introduces.
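A minimal sketch of the RMSE computation, assuming predictions and targets are flat sequences of numbers (for example, flattened box coordinates). How the project actually flattens or averages the coordinates is not stated here, so that part is an assumption.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error between two equal-length numeric sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))
```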

Classification

Accuracy

Regular accuracy: ACC = (TP + TN)/(TP + TN + FP + FN)

F1

Harmonic mean of precision and recall: F1 = TP / (TP + (FN + FP)/2)
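The two formulas above can be checked with a short sketch that works directly on confusion counts. It covers only the binary case as written; if the task has more than two classes, the way per-class scores are averaged is not stated here. The function names are chosen for this example.

```python
def accuracy(tp, tn, fp, fn):
    """ACC = (TP + TN) / (TP + TN + FP + FN)."""
    return (tp + tn) / (tp + tn + fp + fn)


def f1_score(tp, fp, fn):
    """F1 = TP / (TP + (FN + FP) / 2)."""
    return tp / (tp + (fn + fp) / 2)


def f1_from_precision_recall(tp, fp, fn):
    """Same F1, written explicitly as the harmonic mean of precision and recall."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

With TP = 8, TN = 5, FP = 2, FN = 1, both F1 variants agree, which confirms that the closed form is the harmonic mean of precision and recall.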

Models that solve a single task

Regression (Reg)

Default dataset (Reg1)

INet
###########################
    GIoU Loss:   0.6161582
    RMSE Loss:   25.917557

MobileNet
###########################
    GIoU Loss:   0.61227506
    RMSE Loss:   25.372694

VGG-16
###########################
    GIoU Loss:   0.947507
    RMSE Loss:   24.218405

Result:

| Arch      | GIoU   | RMSE    |
|-----------|--------|---------|
| INet      | 0.3838 | 25.9176 |
| MobileNet | 0.3877 | 25.3727 |
| VGG-16    | 0.0525 | 24.2184 |

Augmented dataset (Reg2)

INet
###########################
    GIoU Loss:   0.6330171
    RMSE Loss:   26.344194

MobileNet
##########################
    GIoU Loss:   0.43982178
    RMSE Loss:   17.006126

VGG-16
##########################
    GIoU Loss:   1.399976
    RMSE Loss:   40.743893

Result:

| Arch      | GIoU    | RMSE    |
|-----------|---------|---------|
| INet      | 0.3670  | 26.3442 |
| MobileNet | 0.5602  | 17.0061 |
| VGG-16    | -0.4000 | 40.7439 |

Classification (Clf)

Default dataset (Clf1)

INet
##########################
    Accuracy:    0.6422222222222222
    f1 score:    0.6353292590535011

MobileNet
##########################
    Accuracy:    0.8511111111111112
    f1 score:    0.8462153544896847

VGG-16
##########################
    Accuracy:    0.84
    f1 score:    0.8367397903530476

Result:

| Arch      | Accuracy | F1     |
|-----------|----------|--------|
| INet      | 0.6422   | 0.6353 |
| MobileNet | 0.8511   | 0.8462 |
| VGG-16    | 0.8400   | 0.8367 |

Augmented dataset (Clf2)

INet
##########################
    Accuracy:    0.6422222222222222
    f1 score:    0.62984677235293

MobileNet
##########################
    Accuracy:    0.8333333333333334
    f1 score:    0.8261255977635148

VGG-16
##########################
    Accuracy:    0.8466666666666667
    f1 score:    0.8438866903702877

Result:

| Arch      | Accuracy | F1     |
|-----------|----------|--------|
| INet      | 0.6422   | 0.6298 |
| MobileNet | 0.8333   | 0.8261 |
| VGG-16    | 0.8467   | 0.8439 |

Uncropped dataset (Clf3)

MobileNet
##########################
    Accuracy:    0.7844444444444445
    f1 score:    0.779331433620758
Result:

| Arch      | Accuracy | F1     |
|-----------|----------|--------|
| MobileNet | 0.7844   | 0.7793 |

Models that solve two tasks at the same time

2-Head-Predictor (2Head)

Default dataset (2Head1)

INet
##########################
Regression:
    GIoU Loss:   0.64344555
    RMSE Loss:   24.211178
Classification:
    Accuracy:    0.49333333333333335
    f1 score:    0.4762167417804296

MobileNet
##########################
Regression:
    GIoU Loss:   1.4751697
    RMSE Loss:   39.193527
Classification:
    Accuracy:    0.7311111111111112
    f1 score:    0.7224772507688828

VGG-16
##########################
Regression:
    GIoU Loss:   1.5730777
    RMSE Loss:   43.680496
Classification:
    Accuracy:    0.17333333333333334
    f1 score:    0.0590909090909091

Result:

Classification:

| Arch      | Accuracy | F1     |
|-----------|----------|--------|
| INet      | 0.4933   | 0.4762 |
| MobileNet | 0.7311   | 0.7225 |
| VGG-16    | 0.1733   | 0.0591 |

Localization:

| Arch      | GIoU    | RMSE    |
|-----------|---------|---------|
| INet      | 0.3566  | 24.2112 |
| MobileNet | -0.4752 | 39.1935 |
| VGG-16    | -0.5731 | 43.6805 |

Augmented dataset (2Head2)

INet
##########################
Regression:
    GIoU Loss:   0.6958105
    RMSE Loss:   21.909544
Classification:
    Accuracy:    0.48444444444444446
    f1 score:    0.4625029774807487

MobileNet
##########################
Regression:
    GIoU Loss:   0.63962024
    RMSE Loss:   20.518587
Classification:
    Accuracy:    0.7555555555555555
    f1 score:    0.7496034996537605

VGG-16
##########################
Regression:
    GIoU Loss:   1.4623789
    RMSE Loss:   38.729637
Classification:
    Accuracy:    0.78
    f1 score:    0.77794664790425

Result:

Classification:

| Arch      | Accuracy | F1     |
|-----------|----------|--------|
| INet      | 0.4844   | 0.4625 |
| MobileNet | 0.7556   | 0.7496 |
| VGG-16    | 0.7800   | 0.7779 |

Localization:

| Arch      | GIoU    | RMSE    |
|-----------|---------|---------|
| INet      | 0.3042  | 21.9095 |
| MobileNet | 0.3604  | 20.5186 |
| VGG-16    | -0.4624 | 38.7296 |

Comparisons

(best Reg) MobileNet + (best Clf) MobileNet

Classification:
 #########################
    Accuracy:    0.5288888888888889
    f1 score:    0.5097587081088926

Result:

  • 2-S: two-stage method

  • 2-S-1: two-stage without retraining (BBReg: MobileNet, Clf: MobileNet)

  • 2-S-2: two-stage with the classifier retrained (BBReg: MobileNet, Clf: INet)

  • 2-S-3: two-stage with the classifier retrained (BBReg: MobileNet, Clf: MobileNet)

  • 2-Head: 2-Head MobileNet

  • Clf3: classifier trained on the original (uncropped) images

Classification:

| Arch   | Accuracy | F1     |
|--------|----------|--------|
| Clf3   | 0.7844   | 0.7793 |
| 2-S-1  | 0.5289   | 0.5098 |
| 2-S-2  | 0.5289   | 0.5269 |
| 2-S-3  | 0.5666   | 0.5661 |
| 2-Head | 0.7556   | 0.7496 |

Localization:

| Arch   | GIoU   | RMSE    |
|--------|--------|---------|
| Clf3   | -      | -       |
| 2-S-1  | 0.3877 | 25.3727 |
| 2-S-2  | 0.3877 | 25.3727 |
| 2-S-3  | 0.3877 | 25.3727 |
| 2-Head | 0.3604 | 20.5186 |

RaspberryPi

The trained models have been optimized to run on a Raspberry Pi microcomputer. To this end, quantization techniques were applied; these are reviewed in the paper.
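Which quantization scheme was used is not stated in this section. Purely as a generic illustration of the idea, the following sketch maps floating-point values to 8-bit integers with an affine scale and zero point and then maps them back. The function names and the int8 range are assumptions for this example and do not describe the project's actual conversion pipeline.

```python
def quantize_int8(values):
    """Affine quantization of a list of floats to the int8 range [-128, 127].

    Generic illustration only; not the project's conversion code.
    """
    # Extend the range so that 0.0 is always representable.
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    # Fall back to a scale of 1.0 when every value is zero.
    scale = (hi - lo) / 255.0 or 1.0
    zero_point = round(-128 - lo / scale)
    quantized = [
        max(-128, min(127, round(v / scale) + zero_point)) for v in values
    ]
    return quantized, scale, zero_point


def dequantize(quantized, scale, zero_point):
    """Map int8 values back to approximate floats."""
    return [(q - zero_point) * scale for q in quantized]
```

The round trip loses precision on the order of one quantization step (`scale`), which is the trade-off that buys a smaller and faster model.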

Inference tests

Original:

Model evaluation for: "independent":
Classification:
 ===================================
    Accuracy:   0.916
    f1 score:   0.9167668857681328
Localization:
 ===================================
    GIoU:   0.43616188
    RMSE:   17.347866
Testing inference for 5 samples for: "independent":
AVG inference time: 2.4359699999999997s.

Model evaluation for: "two-stage":
Classification:
 ===================================
    Accuracy:   0.544
    f1 score:   0.5522557272067077
Localization:
 ===================================
    GIoU:   0.43616188
    RMSE:   17.347866
Testing inference for 5 samples for: "two-stage":
AVG inference time: 3.467743s.

Model evaluation for: "single-stage":
Classification:
 ===================================
    Accuracy:   0.92
    f1 score:   0.9196824463343246
Localization:
 ===================================
    GIoU:   0.62672853
    RMSE:   19.672493
Testing inference for 5 samples for: "single-stage":
AVG inference time: 1.2404s.

TFLite:

Model evaluation for: "independent":
Classification:
 ===================================
    Accuracy:   0.92
    f1 score:   0.9209747544826291
Localization:
 ===================================
    GIoU:   0.44598112
    RMSE:   18.026241
Testing inference for 5 samples for: "independent":
AVG inference time: 1.324131s.


Model evaluation for: "two-stage":
Classification:
 ===================================
    Accuracy:   0.52
    f1 score:   0.5284798070186199
Localization:
 ===================================
    GIoU:   0.44598112
    RMSE:   18.026241
Testing inference for 5 samples for: "two-stage":
AVG inference time: 2.34496s.


Model evaluation for: "single-stage":
Classification:
 ===================================
    Accuracy:   0.608
    f1 score:   0.5803820077663839
Localization:
 ===================================
    GIoU:   0.81673026
    RMSE:   21.786606
Testing inference for 5 samples for: "single-stage":
AVG inference time: 0.731425s.
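
The average inference times in the logs above are means over 5 samples. A minimal sketch of such a measurement could look like the following; `predict` stands in for whatever callable runs a model on one sample and is a hypothetical name, not the project's API.

```python
import time


def average_inference_time(predict, samples):
    """Return the mean wall-clock time in seconds of predict(sample)."""
    if not samples:
        raise ValueError("need at least one sample")
    total = 0.0
    for sample in samples:
        start = time.perf_counter()
        predict(sample)  # hypothetical model call; the result is discarded
        total += time.perf_counter() - start
    return total / len(samples)
```

With only a handful of samples the mean is sensitive to one-off costs such as model loading or cache warm-up, so a warm-up call before timing may give steadier numbers.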