Intrusion Detection for Agricultural IoT Networks: An Empirical Comparison of Supervised and Unsupervised Machine Learning Approaches
Abstract:
Introduction
The integration of Internet of Things (IoT) technologies into agricultural practices, commonly referred to as smart agriculture or AgriIoT, is revolutionizing the sector by enabling unprecedented levels of precision, productivity, and resource efficiency. Through the deployment of distributed sensor networks, automated irrigation, and intelligent monitoring systems, farmers can optimize inputs, reduce waste, and enhance crop yields, which is vital for global food security.
However, this increasing reliance on interconnected technologies has outpaced the development of corresponding security measures. Agricultural IoT systems are often deployed in physically exposed, remote environments, frequently utilizing low-cost devices with limited computational power, memory, and energy, which precludes the implementation of traditional, heavy security protocols.
This landscape creates fertile ground for diverse cyberattacks, including denial-of-service (DoS), botnet infiltration, and false data injection, which can directly disrupt food supply chains, lead to economic losses, and pose risks to critical agricultural assets. Despite these threats, cybersecurity in AgriIoT is severely underserved in the current literature.
This paper presents a comprehensive empirical comparison of supervised and unsupervised machine learning models applied to the CICIDS2017 dataset, a modern benchmark for intrusion detection. The evaluation encompasses five distinct models, assessing their effectiveness in attack classification and anomaly detection, and contextualizes the findings within the resource-constrained requirements of agricultural environments such as those found in developing countries like Nepal.
Background
IoT in Smart Agriculture
Smart agriculture leverages a complex, multi-layered architecture—comprising perception, network, edge, fog, and cloud layers—to manage agricultural operations efficiently. This paradigm relies on the integration of smart sensing technologies, including soil moisture sensors, weather stations, automated irrigation, and livestock tracking systems. The architecture relies on diverse communication protocols, including long-range, low-power solutions like LoRaWAN for wide-area field monitoring, ZigBee for mesh networks within greenhouses, and lightweight application-level protocols like MQTT and CoAP. This protocol heterogeneity creates significant security complexity, as each layer and protocol presents different attack vectors.
Cybersecurity Threats in Agricultural IoT
AgriIoT networks are increasingly exposed to a broad spectrum of cyber threats, exacerbated by the lack of standardized security protocols and the presence of heterogeneous, often unpatched, devices. Common threats include DoS/DDoS attacks, malware and ransomware, botnet participation, spoofing, man-in-the-middle (MitM) attacks, and side-channel vulnerabilities. For example, a false data injection attack targeting soil moisture sensors could manipulate readings to trigger unnecessary irrigation, leading to crop waterlogging, or suppress irrigation entirely, destroying valuable harvests. The IT/OT convergence allows attackers to bridge office-level breaches to operational disruption in the field, a cyber-physical intersection that is especially consequential in the agricultural context.
Machine Learning for Intrusion Detection
Supervised models, such as Random Forest and Support Vector Machines, excel at classifying known attack types from labeled datasets. Unsupervised models, such as Autoencoders, detect anomalies by modeling the baseline behavior of the network, making them suitable for identifying previously unseen or zero-day attacks. An Autoencoder is trained to reconstruct its input, and the reconstruction error (the difference between the original input and its reconstruction) serves as the anomaly score: a flow whose error exceeds a chosen threshold is flagged as anomalous. In security systems there is a fundamental trade-off between the false-positive rate and recall: a system configured for high recall captures more attacks but at the cost of more false positives.
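The thresholding step described above can be sketched in a few lines. This is a minimal illustration, not the procedure used in this study: the mean-squared-error metric, the nearest-rank quantile rule, and the function names are all assumptions made for the example.

```python
import math
from typing import Sequence


def reconstruction_error(x: Sequence[float], x_hat: Sequence[float]) -> float:
    """Mean squared error between an input vector and its reconstruction."""
    if len(x) != len(x_hat):
        raise ValueError("input and reconstruction must have the same length")
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def calibrate_threshold(benign_errors: Sequence[float], quantile: float = 0.95) -> float:
    """Return the error value at or below which `quantile` of benign flows fall.

    Assumption: a nearest-rank quantile over errors measured on benign traffic.
    """
    if not benign_errors:
        raise ValueError("need at least one benign error")
    ordered = sorted(benign_errors)
    rank = max(1, math.ceil(quantile * len(ordered)))
    return ordered[rank - 1]


def is_anomalous(error: float, threshold: float) -> bool:
    """A flow is flagged when its reconstruction error exceeds the threshold."""
    return error > threshold


if __name__ == "__main__":
    threshold = calibrate_threshold([0.4, 0.1, 0.3, 0.2], quantile=0.5)
    print(threshold, is_anomalous(0.3, threshold))
```

Lowering the quantile lowers the threshold, which flags more flows: recall rises, and so does the false-positive rate. This is the trade-off noted above.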
Related Work
The development of efficient NIDS for IoT is a primary security objective, with numerous studies focusing on dimensionality reduction, feature selection, and lightweight model design. Bella et al. proposed an efficient IDS utilizing a CNN–decision forest combination, achieving high detection accuracy and rapid inference. Li et al. compared feature selection versus feature extraction, providing actionable insights for optimizing NIDS performance. While effective for general IoT, such models often rely on centralized processing and lack the context-awareness required to distinguish benign agricultural sensor fluctuations from malicious anomalies.
Research dedicated to smart agriculture has emphasized holistic frameworks. Ali et al. provided a systematic synthesis of AI applications in AgriIoT cybersecurity, documenting nearly 30 specific threat vectors. Thilakarathne et al. proposed a deception-based threat-intelligence platform using honeypots, and Pasca et al. developed a "vulnerable-by-design" framework for generating labeled datasets integrating both cyber threats and sensor faults. For anomaly detection, Maseer et al. benchmarked ten algorithms on CICIDS2017 and found ensemble methods like Random Forest most stable and accurate, while Meidan et al. (N-BaIoT) employed deep autoencoders to detect botnet traffic with low false-alarm rates.
Research Gap
Several gaps remain: a reliance on outdated datasets that fail to represent modern heterogeneous AgriIoT traffic; IDS methodologies tested only in idealized laboratory settings rather than field-validated on real AgriIoT edge hardware; insufficient research into interactions between cyber-attacks and physical sensor failures; and a need for resource-efficient models balancing detection performance against the severe constraints of agricultural sensor networks in developing contexts. This study takes a step toward the last of these gaps by evaluating both supervised and unsupervised approaches on the CICIDS2017 benchmark using CPU-only hardware.
Methodology
Dataset
This study utilizes the CICIDS2017 dataset, a comprehensive, realistic benchmark containing diverse network traffic including modern DDoS, DoS, infiltration, and web attacks. It comprises 2,827,876 total records with 78 network-traffic features, providing a rich, high-dimensional space for machine-learning evaluation. The flows are labeled by attack type, enabling both binary (normal vs. malicious) and multi-class classification.
Preprocessing
The pipeline involved cleaning the raw parquet files, handling missing values, and splitting data to prevent leakage. The supervised training set combined normal samples with attack samples, yielding approximately 100,000 samples for computational efficiency while maintaining representative attack patterns. Min–Max scaling mapped all numeric features into the [0, 1] range to prevent features with larger dynamic ranges from dominating gradient updates in neural-network models. Splitting was stratified to ensure proportional class representation.
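One way to keep scaling and splitting leak-free is to split first and then derive the Min–Max statistics from the training portion only. The sketch below illustrates that idea with the standard library. It is not the study's pipeline; the function names, the clipping of out-of-range held-out values, and the handling of constant features are assumptions made for the example.

```python
import random
from collections import defaultdict
from typing import Hashable, List, Sequence, Tuple


def stratified_split(labels: Sequence[Hashable], test_fraction: float,
                     seed: int = 0) -> Tuple[List[int], List[int]]:
    """Return (train_indices, test_indices) with per-class proportions preserved."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


def fit_min_max(rows: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Per-feature minimum and maximum, computed from training rows only."""
    columns = list(zip(*rows))
    return [min(c) for c in columns], [max(c) for c in columns]


def apply_min_max(rows: Sequence[Sequence[float]], mins: Sequence[float],
                  maxs: Sequence[float]) -> List[List[float]]:
    """Scale rows into [0, 1] using previously fitted statistics."""
    scaled = []
    for row in rows:
        out = []
        for value, lo, hi in zip(row, mins, maxs):
            span = hi - lo
            # Assumption: constant features are mapped to 0.0.
            z = 0.0 if span == 0 else (value - lo) / span
            # Assumption: held-out values outside the training range are clipped.
            out.append(min(1.0, max(0.0, z)))
        scaled.append(out)
    return scaled
```

Fitting on the training rows and reusing the same statistics for held-out rows is the same pattern as calling `fit` on the training data and `transform` on the test data with scikit-learn's `MinMaxScaler`.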
Models
Five machine-learning models were evaluated:
- Random Forest — an ensemble of decision trees that reduces overfitting via feature randomness and bootstrap sampling, capturing complex non-linear relationships in network traffic.
- Decision Tree — a non-parametric method that recursively partitions the feature space by information gain, offering high interpretability for security analysts.
- Logistic Regression — a linear baseline that illustrates the limitations of linear models on the non-linear patterns of modern intrusions.
- Autoencoder — an unsupervised network (78 → 128 → 64 → 16 → 64 → 128 → 78) trained to reconstruct its input; the reconstruction error serves as the anomaly metric.
- Variational Autoencoder (VAE) — a generative model learning the probability distribution of the input within a latent space governed by a prior, providing a probabilistic framework for anomaly detection.
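To relate the Autoencoder layout listed above to edge-device memory budgets, its size can be estimated by counting parameters. The sketch assumes plain fully connected layers with bias terms and 32-bit weights; the text gives only the layer widths, so both are assumptions, and any normalization layers would add to the total.

```python
from typing import Sequence

# Layer widths as listed for the Autoencoder: 78 -> 128 -> 64 -> 16 -> 64 -> 128 -> 78.
LAYER_SIZES = [78, 128, 64, 16, 64, 128, 78]


def dense_parameter_count(sizes: Sequence[int]) -> int:
    """Weights plus biases for a stack of fully connected layers."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))


params = dense_parameter_count(LAYER_SIZES)
float32_bytes = params * 4  # assumption: 4 bytes per parameter

print(params)         # 38878
print(float32_bytes)  # 155512
```

Under these assumptions the model weights occupy roughly 152 KiB, excluding activations and any runtime overhead.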
Experimental Setup
The project was developed in Python 3.12 using the uv package manager. Deep-learning models (Autoencoder, VAE) used PyTorch; supervised models (Random Forest, Decision Tree, Logistic Regression) used scikit-learn. Crucially, all training and evaluation ran on a standard laptop CPU without GPU acceleration—an intentional choice mirroring the resource constraints of edge hardware in agricultural deployments. Supervised models were exceptionally fast: Random Forest trained in 3.01 s, Decision Tree in 2.48 s, and Logistic Regression in 2.45 s, underscoring their suitability for rapid, real-time intrusion detection on resource-limited hardware.
Results
The performance of the models evaluated on CICIDS2017 is summarized in Table 1 and Figure 3. Supervised models demonstrate exceptional performance, with Random Forest and Decision Tree providing near-identical F1 scores above 0.996 (0.9965 and 0.9967, respectively). Random Forest exhibits a marginally superior ROC-AUC of 0.9998 and PR-AUC of 0.9996, indicating that it ranks attacks above benign traffic more consistently across decision thresholds. Logistic Regression lags at F1=0.8921, reflecting the limitations of linear models on high-dimensional, non-linear attack boundaries.
| Model | F1 | ROC-AUC | PR-AUC | Precision | Recall |
|---|---|---|---|---|---|
| Random Forest | 0.9965 | 0.9998 | 0.9996 | 0.9982 | 0.9947 |
| Decision Tree | 0.9967 | 0.9976 | 0.9973 | 0.9965 | 0.9969 |
| Logistic Regression | 0.8921 | 0.9797 | 0.9657 | 0.9282 | 0.8588 |
| Autoencoder (full) | 0.6914 | 0.8363 | 0.7914 | 0.6792 | 0.7042 |
| Autoencoder (quick) | 0.6975 | 0.8023 | 0.7281 | 0.5361 | 0.9979 |
| VAE (quick) | 0.6445 | 0.7219 | 0.6727 | 0.4766 | 0.9946 |
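As a consistency check on the table above, F1 can be recomputed from the reported precision and recall using F1 = 2PR / (P + R). The values below are copied from the table; the helper name and the dictionary layout are illustrative only.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) for each row of the results table.
REPORTED = {
    "Random Forest": (0.9982, 0.9947, 0.9965),
    "Decision Tree": (0.9965, 0.9969, 0.9967),
    "Logistic Regression": (0.9282, 0.8588, 0.8921),
    "Autoencoder (full)": (0.6792, 0.7042, 0.6914),
    "Autoencoder (quick)": (0.5361, 0.9979, 0.6975),
    "VAE (quick)": (0.4766, 0.9946, 0.6445),
}

for name, (p, r, reported) in REPORTED.items():
    recomputed = f1_score(p, r)
    print(f"{name}: recomputed F1 = {recomputed:.4f}, reported = {reported:.4f}")
```

Small differences in the fourth decimal place are expected, because the table's precision and recall are themselves rounded.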
Among unsupervised models, the "quick" configurations of the Autoencoder and VAE behave very differently from the full Autoencoder, whose precision (0.6792) and recall (0.7042) are roughly balanced: the quick Autoencoder trades precision (0.5361) for near-perfect recall (0.9979), catching nearly all malicious instances while generating many false alarms, and the VAE shows similarly high recall (0.9946) with the lowest precision (0.4766). The gap between the best supervised F1 (Decision Tree, 0.9967, with Random Forest at 0.9965) and the best unsupervised F1 (Autoencoder quick, 0.6975) is approximately 0.30. Notably, the VAE exhibited training instability from a KL-divergence explosion at epoch 11; the fix, KL annealing, was not applied here and remains future work.
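For reference, KL annealing is commonly implemented as a weight on the KL term that grows from zero during the first epochs. The sketch below shows one common linear variant. It was not run in this study; the warm-up length is a hypothetical value, and whether this schedule would resolve the instability observed at epoch 11 is untested.

```python
def kl_weight(epoch: int, warmup_epochs: int = 10, max_weight: float = 1.0) -> float:
    """Linear schedule: ramps from 0 to max_weight over warmup_epochs, then holds."""
    if warmup_epochs <= 0:
        return max_weight
    return max_weight * min(1.0, epoch / warmup_epochs)


def vae_loss(reconstruction_loss: float, kl_divergence: float,
             epoch: int, warmup_epochs: int = 10) -> float:
    """Total loss = reconstruction term + annealed KL term."""
    return reconstruction_loss + kl_weight(epoch, warmup_epochs) * kl_divergence


if __name__ == "__main__":
    for epoch in (0, 5, 10, 20):
        print(epoch, kl_weight(epoch))
```

Early in training the model is driven mostly by reconstruction; the KL term is phased in gradually, which limits how strongly a large KL value can dominate the gradient in the first epochs.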
Discussion
Supervised vs. Unsupervised ML
Supervised learning remains the gold standard for high-accuracy attack detection where comprehensive labeled data exists, but the inherent class imbalance and the prohibitive cost of continuous labeling in AgriIoT limit fully supervised deployment. Unsupervised learning offers adaptability: though it shows lower classification metrics, it requires no labels, allowing continuous, autonomous baseline modeling of normal traffic—crucial where novel, zero-day threats emerge in unlabeled protocols. For operators, the choice hinges on labeling budget and risk tolerance, with the ideal being a hybrid architecture.
Why Random Forest Excels
Random Forest's superior performance stems from its ensemble nature, which reduces overfitting and captures non-linear relationships in complex traffic. Across the 78 CICIDS2017 features, it identifies the most discriminative—flow duration, packet length, inter-arrival times—via impurity-reduction rankings, while bootstrap aggregating prevents over-dependence on any single redundant feature, yielding superior generalization and stability.
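The impurity-reduction idea behind these rankings can be illustrated on a toy split. The sketch uses Gini impurity, which is the default criterion of scikit-learn's `RandomForestClassifier`; the text does not state which criterion was configured, so treat that choice, and the toy labels, as assumptions.

```python
from collections import Counter
from typing import Hashable, Sequence


def gini(labels: Sequence[Hashable]) -> float:
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def impurity_decrease(parent: Sequence[Hashable], left: Sequence[Hashable],
                      right: Sequence[Hashable]) -> float:
    """Parent impurity minus the size-weighted impurity of the two children."""
    n = len(parent)
    weighted = len(left) / n * gini(left) + len(right) / n * gini(right)
    return gini(parent) - weighted


if __name__ == "__main__":
    parent = ["benign", "benign", "attack", "attack"]
    # A split that separates the classes perfectly removes all impurity.
    print(impurity_decrease(parent, parent[:2], parent[2:]))
    # A split that leaves both children mixed removes none.
    print(impurity_decrease(parent, ["benign", "attack"], ["benign", "attack"]))
```

A feature's importance in the forest is then, roughly, the total impurity decrease from all splits on that feature, averaged over trees.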
Limitations
The study is limited by reliance on CICIDS2017, which does not represent the specific traffic profiles of industrial AgriIoT protocols such as LoRaWAN or Modbus. Training deep models on real edge hardware presents challenges not captured in this CPU simulation, and a single dataset may not span the heterogeneous, multi-vendor configurations of real AgriIoT systems. Finally, the binary "normal vs. malicious" paradigm restricts operational utility—operators require multi-class categorization to distinguish, e.g., a DDoS attack from a subtle false-data-injection attempt, as these demand different mitigations.
Developing-Country Context
In contexts like Nepal—characterized by small-scale, fragmented landholdings, intermittent connectivity, limited budgets, and scarce localized cybersecurity expertise—the simplicity of Decision Tree and the low-latency potential of lightweight Autoencoders are particularly relevant. The emphasis must shift toward edge-based, offline-capable IDS that operate autonomously on low-cost hardware, paired with culturally aware, accessible interfaces that empower farmers to monitor security risks without extensive technical training.
Proposed Hybrid IDS Architecture
To leverage the distinct advantages of both paradigms, we propose a two-tier hybrid IDS for resource-constrained AgriIoT. The first tier is a lightweight supervised classifier—ideally Random Forest, given its performance here—acting as the primary defense against known attack signatures. The second tier is an unsupervised Autoencoder configured for anomaly detection, identifying deviations the primary classifier might miss.
The decision flow is structured for efficiency and coverage: each network flow is first evaluated by the Random Forest. If classified with confidence above a threshold, the result is accepted and the appropriate action taken immediately. If the classifier produces a low-confidence result, the flow is passed to the second-tier Autoencoder, which computes its reconstruction error; if the error exceeds a dynamically calibrated threshold, the flow is flagged as a potential novel, zero-day threat. Because Random Forest acts as a highly precise filter, the Autoencoder decides on only a small fraction of traffic—mitigating the excessive false positives of pure anomaly-based systems while preserving the ability to catch previously unseen threats.
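The decision flow above can be expressed as a short routing function. This is a sketch of the proposed logic rather than a tested implementation: the callables, threshold values, and verdict labels are hypothetical, and treating a low-confidence flow with low reconstruction error as benign is an assumption, since the description does not specify that branch.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


@dataclass(frozen=True)
class Verdict:
    label: str       # "benign", "attack", or "suspected_novel"
    decided_by: str  # "classifier" or "autoencoder"


def hybrid_decide(
    flow: Sequence[float],
    classify: Callable[[Sequence[float]], Tuple[str, float]],
    reconstruction_error: Callable[[Sequence[float]], float],
    confidence_threshold: float = 0.9,  # hypothetical value
    error_threshold: float = 0.05,      # hypothetical value; calibrated dynamically in the proposal
) -> Verdict:
    """Tier 1: accept confident classifier output. Tier 2: autoencoder check."""
    label, confidence = classify(flow)
    if confidence >= confidence_threshold:
        return Verdict(label, "classifier")
    if reconstruction_error(flow) > error_threshold:
        return Verdict("suspected_novel", "autoencoder")
    # Assumption: low confidence but normal-looking traffic is treated as benign.
    return Verdict("benign", "autoencoder")


if __name__ == "__main__":
    flow = [0.1, 0.2, 0.3]
    print(hybrid_decide(flow, lambda f: ("attack", 0.99), lambda f: 0.0))
    print(hybrid_decide(flow, lambda f: ("benign", 0.60), lambda f: 0.20))
```

Because the second tier runs only on low-confidence flows, the Autoencoder's cost and its false alarms are confined to that fraction of traffic.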
Conclusion
This research demonstrates a dramatic performance gap between supervised and unsupervised learning for Agricultural IoT intrusion detection. Anchored by benchmarking on CICIDS2017, the tree-based supervised models achieve near-optimal classification: Random Forest (F1=0.9965, ROC-AUC=0.9998) and Decision Tree (F1=0.9967, ROC-AUC=0.9976) are effectively tied on F1, with Random Forest ahead on ROC-AUC, while Logistic Regression serves as the baseline (F1=0.8921). The quick Autoencoder and VAE configurations achieved exceptionally high recall (above 0.99) but were hampered by low precision (0.4766–0.5361), confirming their role as highly sensitive yet error-prone anomaly detectors.
Future work should develop AgriIoT-specific benchmark datasets encapsulating LoRaWAN, MQTT, and Modbus traffic; conduct field validation on actual low-power edge hardware; explore federated learning for privacy-preserving cross-farm threat intelligence; and implement algorithmic refinements such as KL annealing for VAE stability and a transition from binary to multi-class attack categorization.
References
- N. Abdalgawad et al., "Generative Deep Learning to Detect Cyberattacks for the IoT-23 Dataset," IEEE Access, 2022.
- A. O. Adewusi, N. R. Chiekezie, and N. L. Eyo-Udo, "Securing Smart Agriculture: Cybersecurity Challenges and Solutions in IoT-driven Farms," World Journal of Advanced Research and Reviews, 2022.
- Z. Ahmad, A. Shahid Khan, C. Wai Shiang, J. Abdullah, and F. Ahmad, "Network Intrusion Detection System: A Systematic Study of Machine Learning and Deep Learning Approaches," Trans. Emerging Telecommunications Technologies, 2021.
- B. Ahmed, H. Shabbir, S. R. Naqvi, and L. Peng, "Smart Agriculture: Current State, Opportunities, and Challenges," IEEE Access, 2024.
- A. Alfahaid et al., "Machine Learning-Based Security Solutions for IoT Networks: A Comprehensive Survey," Sensors, 2025.
- G. Ali, M. M. Mijwil, B. A. Buruga, M. Abotaleb, and I. Adamopoulos, "A Survey on Artificial Intelligence in Cybersecurity for Smart Agriculture," Mesopotamian Journal of Computer Science, 2024.
- E. Altulaihan, M. A. Almaiah, and A. Aljughaiman, "Anomaly Detection IDS for Detecting DoS Attacks in IoT Networks Based on Machine Learning Algorithms," Sensors, 2024.
- K. Bella et al., "An Efficient Intrusion Detection System for IoT Security Using CNN Decision Forest," PeerJ Computer Science, 2024.
- E. Dritsas and M. Trigka, "A Survey on Cybersecurity in IoT," Future Internet, 2025.
- ETSI, "Cyber Security for Consumer Internet of Things: Baseline Requirements," ETSI EN 303 645, 2020.
- W. Fei, H. Ohno, and S. Sampalli, "A Systematic Review of IoT Security: Research Potential, Challenges, and Future Directions," ACM Computing Surveys, 2024.
- IEC, "Industrial Communication Networks — Network and System Security," IEC 62443 series, 2013.
- C. Ioannou and V. Vassiliou, "Network Attack Classification in IoT Using Support Vector Machines," Journal of Sensor and Actuator Networks, 2021.
- J. Li, M. S. Othman, H. Chen, and L. M. Yusuf, "Optimizing IoT Intrusion Detection System: Feature Selection versus Feature Extraction in Machine Learning," Journal of Big Data, 2024.
- Z. K. Maseer, R. Yusof, N. Bahaman, S. A. Mostafa, and C. F. M. Foozy, "Benchmarking of Machine Learning for Anomaly-Based Intrusion Detection Systems in the CICIDS2017 Dataset," IEEE Access, 2021.
- Y. Meidan et al., "N-BaIoT: Network-based Detection of IoT Botnet Attacks Using Deep Autoencoders," IEEE Pervasive Computing, 2018.
- T. Miller et al., "The IoT and AI in Agriculture: The Time Is Now—A Systematic Review of Smart Sensing Technologies," Sensors, 2025.
- NIST, "Foundational Cybersecurity Activities for IoT Device Manufacturers," NISTIR 8259, 2020.
- R. Panigrahi and S. Borah, "A Detailed Analysis of CICIDS2017 Dataset for Designing Intrusion Detection Systems," Indonesian Journal of Data and Science, 2024.
- E. M. Pasca, D. Delinschi, R. Erdei, I. Baraian, and O. D. Matei, "A Vulnerable-by-Design IoT Sensor Framework for Cybersecurity in Smart Agriculture," Agriculture, 2025.
- P. A. A. Resende and A. C. Drummond, "A Survey of Random Forest Based Methods for Intrusion Detection Systems," ACM Computing Surveys, 2019.
- N.-A. Stoian, "Machine Learning for Anomaly Detection in IoT Networks: Malware Analysis on the IoT-23 Data Set," 2020.
- N. N. Thilakarathne, M. S. A. Bakar, P. E. Abas, and H. Yassin, "A Novel Cyber Threat Intelligence Platform for Evaluating the Risk Associated with Smart Agriculture," Scientific Reports, 2025.
- I. Ullah and Q. H. Mahmoud, "Design and Development of a Deep Learning-Based Model for Anomaly Detection in IoT Networks," IEEE Access, 2021.
- I. Ullah and Q. H. Mahmoud, "Design and Development of RNN Anomaly Detection Model for IoT Networks," IEEE Access, 2022.
- A. Verma and V. Ranga, "Machine Learning Based Intrusion Detection Systems for IoT Applications," Wireless Personal Communications, 2020.