TasselNetv2: in-field counting of wheat spikes with context-augmented local regression networks

Haipeng Xiong; Zhiguo Cao; Hao Lu; Simon Madec; Liang Liu; Chunhua Shen

doi:10.1186/s13007-019-0537-2

Plant Methods. 2019 Dec 11;15:150. doi: 10.1186/s13007-019-0537-2. eCollection 2019.

Authors

Haipeng Xiong¹, Zhiguo Cao¹, Hao Lu¹, Simon Madec², Liang Liu¹, Chunhua Shen³

Affiliations

¹ National Key Laboratory of Science and Technology on Multi-Spectral Information Processing, School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan, 430074 People's Republic of China.
² INRA-EMMAH-CAPTE, 84914 Avignon, France.
³ School of Computer Science, The University of Adelaide, Adelaide, SA 5005 Australia.

Abstract

Background: Grain yield of wheat is closely associated with the population density of wheat spikes, i.e., spike number m⁻². To obtain this index reliably and efficiently, wheat spikes must be counted accurately and automatically. Computer vision technologies have shown great potential to automate this task effectively and at low cost. Counting wheat spikes is a typical visual counting problem, which has been studied extensively in computer vision under the name of object counting. TasselNet, one of the state-of-the-art counting approaches, is a convolutional neural network-based local regression model and currently holds the best record on counting maize tassels. However, when applied to wheat spikes, TasselNet cannot predict accurate counts when spikes are only partially present in a local patch.

Results: In this paper, we make an important observation: the counting performance of local regression networks can be significantly improved by adding visual context to the local patches. Moreover, such context can be treated as part of the receptive field without increasing the model capacity. We thus propose a simple yet effective contextual extension of TasselNet: TasselNetv2. When TasselNetv2 is implemented in a fully convolutional form, both training and inference can be greatly sped up by reducing redundant computations. In addition, we collected and labeled a large-scale wheat spike counting (WSC) dataset, with 1764 high-resolution images and 675,322 manually annotated instances. Extensive experiments show that TasselNetv2 not only achieves state-of-the-art performance on the WSC dataset (91.01% counting accuracy) but is also more than an order of magnitude faster than TasselNet (13.82 fps on 912 × 1216 images). The generality of TasselNetv2 is further demonstrated by advancing the state of the art on both the Maize Tassels Counting and ShanghaiTech Crowd Counting datasets.
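The two ideas in the Results (context enlarges each patch's receptive field without adding model capacity, and a fully convolutional form avoids recomputing features on overlapping windows) can be illustrated with a toy NumPy sketch. It is not the authors' architecture: the pointwise `features` function is a hypothetical stand-in for the shared convolutional layers (real convolutions have spatial extent, which the toy omits to keep borders simple), the linear regressor `W` is random, and the parameters `patch` (side of the region whose count is predicted) and `ctx` (context margin added on each side) are illustrative names. The sketch compares cropping and processing every context-augmented window separately with computing the feature map once and regressing each window from it.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

rng = np.random.default_rng(0)
C = 4                                  # hypothetical number of feature channels
A, B = rng.normal(size=C), rng.normal(size=C)


def features(img):
    """Toy per-pixel feature extractor (hypothetical stand-in for shared conv layers)."""
    return np.maximum(img[..., None] * A + B, 0.0)   # shape (H, W, C)


def count_patchwise(img, W, patch, ctx):
    """Crop every context-augmented window and run the full model on each crop."""
    padded = np.pad(img, ctx)          # zero margin so border patches also get context
    win = patch + 2 * ctx              # receptive field = patch + context on each side
    total, evaluated = 0.0, 0
    for i in range(0, img.shape[0], patch):
        for j in range(0, img.shape[1], patch):
            crop = padded[i:i + win, j:j + win]
            total += float(np.sum(features(crop) * W))  # predicted count of inner patch
            evaluated += crop.size     # features recomputed on overlapping context
    return total, evaluated


def count_fully_conv(img, W, patch, ctx):
    """Compute features once on the padded image, then regress every window."""
    padded = np.pad(img, ctx)
    f = features(padded)               # shared feature map, computed a single time
    win = patch + 2 * ctx
    # windows of shape (nH, nW, C, win, win), taken with stride `patch`
    windows = sliding_window_view(f, (win, win), axis=(0, 1))[::patch, ::patch]
    local_counts = np.einsum("hwcyx,yxc->hw", windows, W)
    return float(local_counts.sum()), padded.size


patch, ctx = 4, 2
img = rng.random((16, 16))             # image sides must be multiples of `patch`
W = rng.normal(scale=0.01, size=(patch + 2 * ctx, patch + 2 * ctx, C))

total_a, evals_a = count_patchwise(img, W, patch, ctx)
total_b, evals_b = count_fully_conv(img, W, patch, ctx)
print(np.isclose(total_a, total_b), evals_a, evals_b)  # -> True 1024 400
```

Both routes give the same image-level count, because the inner patches tile the image and each window's prediction only depends on its own context-augmented crop. For a 16 × 16 image with `patch=4` and `ctx=2`, the 8 × 8 windows overlap their neighbours, so the patch-wise loop evaluates 1024 feature positions while the shared map needs only 400; this overlap is the kind of redundancy a fully convolutional implementation removes.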

Conclusions: This paper describes TasselNetv2 for counting wheat spikes, which simultaneously addresses two important use cases in plant counting: improving counting accuracy without increasing model capacity, and improving efficiency without sacrificing accuracy. It is a promising candidate for deployment in real-time systems with high-throughput demands. In particular, TasselNetv2 can achieve sufficiently accurate results when trained from scratch with small networks, and adopting larger pre-trained networks can further boost accuracy. In practice, one can trade off performance against efficiency according to the application scenario. Code and models are available at: https://tinyurl.com/TasselNetv2.

Keywords: Context fusion; Convolutional models; Local regression networks; Object counting; Wheat spikes.