A Robust Lightweight Vision Transformer for Classification of Crop Diseases
Abstract
Rice, wheat, and maize are important food grains consumed by most of the population in Asian countries (such as India, Japan, Singapore, Malaysia, China, and Thailand). Production of these crops is affected by biotic and abiotic factors that cause diseases in several parts of the plant (including leaves, stems, roots, nodes, and panicles). If not detected at an early stage, a severe infection impairs plant growth and causes extensive crop damage, reducing yield and productivity and thereby undermining a country’s economy. Early safeguarding measures are often overlooked because farmers lack awareness and because crop diseases are so varied. In this manuscript, a lightweight vision transformer (MaxViT) with 814.7 K learnable parameters and 85 layers is designed for classifying crop diseases in paddy and wheat. The MaxViT DNN architecture consists of a convolutional block attention module (CBAM), a squeeze-and-excitation (SE) module, and a depth-wise (DW) convolution, followed by a ConvNeXt module. This architecture enhances feature representation by eliminating redundant information (using CBAM) and aggregating spatial information (using SE); together with the spatial filtering performed by the DW layer, this improves the overall classification performance. The proposed model was tested on a paddy dataset (7857 images in eight classes, obtained from local paddy farms in Lalgudi district, Tiruchirappalli) and a wheat dataset (5000 images in five classes, downloaded from the Kaggle platform). Classification performance for the various diseases was evaluated in terms of accuracy, sensitivity, specificity, mean accuracy, precision, F1-score, and Matthews correlation coefficient (MCC). On the paddy dataset, the model’s overall accuracy was 99.43% during training and 98.47% during testing. On the wheat dataset, the training and testing accuracies were 94% and 92.8%, respectively.
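As an illustration of how several of the metrics listed above are commonly computed, the following minimal sketch derives per-class accuracy, sensitivity, specificity, precision, F1-score, and MCC from a multi-class confusion matrix using the standard one-vs-rest definitions. The abstract does not state the exact definitions used in the manuscript (for example, how mean accuracy is averaged), so the function names and the small three-class confusion matrix here are hypothetical and serve only as an example.

```python
from math import sqrt


def one_vs_rest_counts(confusion, k):
    """Return (tp, fp, fn, tn) for class k.

    Assumes confusion[i][j] counts samples of true class i predicted as class j.
    """
    tp = confusion[k][k]
    fn = sum(confusion[k]) - tp                  # true class k, predicted as another class
    fp = sum(row[k] for row in confusion) - tp   # another class, predicted as k
    tn = sum(map(sum, confusion)) - tp - fn - fp
    return tp, fp, fn, tn


def class_metrics(tp, fp, fn, tn):
    """Standard binary metrics; a ratio with a zero denominator is reported as 0.0."""
    def ratio(num, den):
        return num / den if den else 0.0

    sensitivity = ratio(tp, tp + fn)             # also called recall
    specificity = ratio(tn, tn + fp)
    precision = ratio(tp, tp + fp)
    f1 = ratio(2 * precision * sensitivity, precision + sensitivity)
    mcc = ratio(tp * tn - fp * fn,
                sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
    return {
        "accuracy": ratio(tp + tn, tp + fp + fn + tn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
        "mcc": mcc,
    }


# Hypothetical three-class confusion matrix (not data from the manuscript).
confusion = [
    [8, 1, 1],
    [2, 7, 1],
    [0, 1, 9],
]
for k in range(len(confusion)):
    print(k, class_metrics(*one_vs_rest_counts(confusion, k)))
```

Averaging the per-class values over all classes gives macro-averaged scores, which is one common way to summarize multi-class performance.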
Ablation analysis was carried out to study the contribution of each module to the improvement in performance. The model’s performance was found to be unaffected by the presence of noise. Additionally, the proposed model has fewer parameters than pre-trained networks, which allows it to train faster.
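To give a sense of why depth-wise convolution helps keep the parameter count small, the sketch below compares the learnable parameters of a standard convolution with those of a depth-wise convolution of the same kernel size. The channel count and kernel size are hypothetical values chosen for illustration and are not taken from the proposed architecture.

```python
def standard_conv_params(c_in, c_out, k, bias=True):
    """Learnable parameters of a standard k x k 2-D convolution."""
    return k * k * c_in * c_out + (c_out if bias else 0)


def depthwise_conv_params(channels, k, bias=True):
    """Learnable parameters of a depth-wise k x k convolution.

    Each input channel is filtered by its own single k x k kernel,
    so the output has the same number of channels as the input.
    """
    return k * k * channels + (channels if bias else 0)


# Hypothetical layer size (an assumption, not the manuscript's configuration).
channels, k = 64, 3
std = standard_conv_params(channels, channels, k)
dw = depthwise_conv_params(channels, k)
print(f"standard: {std}, depth-wise: {dw}, ratio: {std / dw:.1f}")
```

For a layer that keeps the channel count unchanged, the weight count drops by a factor equal to the number of channels, because the depth-wise layer does not mix information across channels.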
Date
01-08-2025
Author
Karthick Mookkandi
Malaya Kumar Nath
Sanghamitra Subhadarsini Dash
Madhusudhan Mishra
Radak Blange