Deep learning for real-time fruit detection and orchard fruit load estimation: benchmarking of ‘MangoYOLO’

Bibliographic Details
Published in	Precision Agriculture Vol. 20; no. 6; pp. 1107-1135
Main Authors	Koirala, A., Walsh, K. B., Wang, Z., McCarthy, C.
Format	Journal Article
Language	English
Published	New York: Springer US, 01.12.2019; Springer Nature B.V.
Subjects	Agriculture; Algorithms; Atmospheric Sciences; Biomedical and Life Sciences; cameras; canopy; Chemistry and Earth Sciences; Comparative studies; comparative study; Computer architecture; computer hardware; Computer Science; computer simulation; Cultivars; data collection; Deep learning; Design criteria; Digital cameras; digital images; farms; Floodlighting; Fruits; Graphics processing units; Image acquisition; Image detection; Life Sciences; Lighting; Mangoes; Orchards; Physics; Pixels; Remote Sensing/Photogrammetry; Soil Science & Conservation; Statistics for Engineering; Tiles; Training; Trees; Mango; Fruit detection; Yield estimation

Summary:	The performance of six existing deep learning architectures was compared for the task of detecting mango fruit in images of tree canopies. Images of trees (n = 1 515) from across five orchards were acquired at night using a 5 megapixel RGB digital camera and 720 W of LED flood lighting in a rig mounted on a farm utility vehicle operating at 6 km/h. The two-stage deep learning architectures Faster R-CNN(VGG) and Faster R-CNN(ZF), and the single-stage techniques YOLOv3, YOLOv2, YOLOv2(tiny) and SSD, were trained with both original resolution and 512 × 512 pixel versions of 1 300 training tiles, while YOLOv3 was run only with 512 × 512 pixel images, giving a total of eleven models. A new architecture was also developed, based on features of YOLOv3 and YOLOv2(tiny), on the design criteria of accuracy and speed for the current application. This architecture, termed ‘MangoYOLO’, was trained using: (i) the 1 300 tile training set, (ii) the COCO dataset before training on the mango training set, and (iii) a daytime image training set of a previous publication, to create the MangoYOLO models ‘s’, ‘pt’ and ‘bu’, respectively. Average Precision plateaued with use of around 400 training tiles. MangoYOLO(pt) achieved an F1 score of 0.968 and Average Precision of 0.983 on a test set independent of the training set, outperforming other algorithms, with a detection speed of 8 ms per 512 × 512 pixel image tile while using just 833 MB of GPU memory per image (on the NVIDIA GeForce GTX 1070 Ti GPU used for in-field application). The MangoYOLO model also outperformed other models in processing of full images, requiring just 70 ms per image (2 048 × 2 048 pixels) (i.e., capable of processing ~14 fps) with use of 4 417 MB of GPU memory. The model was robust in use with images of other orchards, cultivars and lighting conditions. MangoYOLO(bu) achieved an F1 score of 0.89 on a daytime mango image dataset.
Using a correction factor estimated, per orchard, from the ratio between a human count of fruit in images of the two sides of sample trees and a hand harvest count of all fruit on those trees, MangoYOLO(pt) achieved orchard fruit load estimates within 4.6 to 15.2% of packhouse fruit counts for the five orchards considered. The labelled images (1 300 training, 130 validation and 300 test) of this study are available for comparative studies.
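The arithmetic behind two figures in the summary, the frame rate implied by a per-image processing time and the correction-factor adjustment of a detected fruit count, can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, the direction of the ratio (harvest count divided by image count) is an assumption based on the wording of the summary, and all counts in the usage lines are invented placeholder values.

```python
def frames_per_second(latency_ms: float) -> float:
    """Convert a per-image processing time (ms) into frames per second."""
    return 1000.0 / latency_ms


def correction_factor(harvest_count: float, image_count: float) -> float:
    """Ratio of the hand harvest count to the human count of fruit visible
    in images of both sides of the same sample trees.
    Assumption: the ratio is taken in this direction, so that multiplying
    an image-based count by it accounts for fruit hidden from the camera."""
    return harvest_count / image_count


def estimate_fruit_load(detected_count: float, factor: float) -> float:
    """Scale an orchard-wide detected fruit count by the correction factor."""
    return detected_count * factor


def percent_difference(estimate: float, reference: float) -> float:
    """Absolute difference between estimate and reference, as a percentage
    of the reference (e.g. a packhouse count)."""
    return abs(estimate - reference) / reference * 100.0


if __name__ == "__main__":
    # 70 ms per full image is the figure reported in the summary.
    print(f"{frames_per_second(70):.1f} fps")

    # Placeholder values only; none of these counts come from the study.
    factor = correction_factor(harvest_count=1200, image_count=1000)
    estimate = estimate_fruit_load(detected_count=50_000, factor=factor)
    print(f"{percent_difference(estimate, reference=57_000):.1f} % from reference")
```

The correction factor is computed per orchard in the summary, so in practice each orchard would carry its own factor rather than sharing one value.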
ISSN:	1385-2256 1573-1618
DOI:	10.1007/s11119-019-09642-0