Smart Agriculture & IoT Security

2025

Intrusion Detection for Agricultural IoT Networks: An Empirical Comparison of Supervised and Unsupervised Machine Learning Approaches

Hasta Bahadur Chhetri

La Grandee International College, Pokhara, Nepal

E-mail: mail@bimql.link, doi: 10.5281/zenodo.agri-iot

Abstract:

The rapid proliferation of Internet of Things (IoT) technologies has fundamentally transformed global agriculture, enabling precision farming and automated resource management. However, this digital transformation has simultaneously expanded the attack surface, leaving interconnected agricultural systems vulnerable to sophisticated cyber threats. While machine learning (ML)-based Intrusion Detection Systems (IDSs) have shown promise for securing general IoT networks, their performance and applicability in the specific, resource-constrained context of Agricultural IoT (AgriIoT) remain under-researched. This paper addresses this gap by empirically evaluating five ML models—Random Forest, Decision Tree, Logistic Regression, Autoencoder, and Variational Autoencoder—using the CICIDS2017 benchmark dataset. Experimental results demonstrate that supervised models, particularly Random Forest (F1=0.9965, ROC-AUC=0.9998), significantly outperform unsupervised models in identifying labeled network attacks. Conversely, unsupervised Autoencoders, despite lower overall metrics, provide essential capabilities for detecting zero-day anomalies in unlabeled scenarios, which are prevalent in real-world AgriIoT deployments where attack patterns are continuously evolving. The findings highlight the critical importance of selecting ML approaches based on data availability and the specific operational requirements of smart farms. Furthermore, this study underscores the necessity for lightweight IDS architectures suitable for deployment in developing agricultural contexts, such as Nepal, where infrastructure and technical resources are often limited.

Introduction

The integration of Internet of Things (IoT) technologies into agricultural practices, commonly referred to as smart agriculture or AgriIoT, is revolutionizing the sector by enabling unprecedented levels of precision, productivity, and resource efficiency. Through the deployment of distributed sensor networks, automated irrigation, and intelligent monitoring systems, farmers can optimize inputs, reduce waste, and enhance crop yields, which is vital for global food security.

However, this increasing reliance on interconnected technologies has outpaced the development of corresponding security measures. Agricultural IoT systems are often deployed in physically exposed, remote environments, frequently utilizing low-cost devices with limited computational power, memory, and energy, which precludes the implementation of traditional, heavy security protocols.

This landscape creates fertile ground for diverse cyberattacks, including denial-of-service (DoS), botnet infiltration, and false data injection, which can directly disrupt food supply chains, lead to economic losses, and pose risks to critical agricultural assets. Despite these threats, cybersecurity in AgriIoT is severely underserved in the current literature.

This paper presents a comprehensive empirical comparison of supervised and unsupervised machine learning models applied to the CICIDS2017 dataset, a modern benchmark for intrusion detection. The evaluation encompasses five distinct models, assessing their effectiveness in attack classification and anomaly detection, and contextualizes the findings within the resource-constrained requirements of agricultural environments such as those found in developing countries like Nepal.

Background

IoT in Smart Agriculture

Smart agriculture leverages a complex, multi-layered architecture—comprising perception, network, edge, fog, and cloud layers—to manage agricultural operations efficiently. This paradigm relies on the integration of smart sensing technologies, including soil moisture sensors, weather stations, automated irrigation, and livestock tracking systems. The architecture relies on diverse communication protocols, including long-range, low-power solutions like LoRaWAN for wide-area field monitoring, ZigBee for mesh networks within greenhouses, and lightweight application-level protocols like MQTT and CoAP. This protocol heterogeneity creates significant security complexity, as each layer and protocol presents different attack vectors.

Fig. 1. AgriIoT multi-layer architecture and cyber threat vectors.

Cybersecurity Threats in Agricultural IoT

AgriIoT networks are increasingly exposed to a broad spectrum of cyber threats, exacerbated by the lack of standardized security protocols and the presence of heterogeneous, often unpatched, devices. Common threats include DoS/DDoS attacks, malware and ransomware, botnet participation, spoofing, man-in-the-middle (MitM) attacks, and side-channel vulnerabilities. For example, a false data injection attack targeting soil moisture sensors could manipulate readings to trigger unnecessary irrigation, leading to crop waterlogging, or suppress irrigation entirely, destroying valuable harvests. IT/OT convergence allows attackers to escalate an office-level breach into operational disruption in the field, a cyber-physical risk that is especially acute in the agricultural context.

Machine Learning for Intrusion Detection

Supervised models, such as Random Forest and Support Vector Machines, excel at classifying known attack types from labeled datasets. Unsupervised models, such as Autoencoders, detect anomalies by modeling the baseline behavior of the network, making them suitable for identifying previously unseen or zero-day attacks. An Autoencoder is trained to reconstruct its input, and the reconstruction error (the difference between the original input and its reconstruction) serves as the anomaly score: a flow whose error exceeds a chosen threshold is flagged as anomalous. In security systems there is a fundamental trade-off between the false-positive rate and recall: lowering the threshold captures more attacks but at the cost of more false positives.
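The threshold trade-off can be illustrated with a minimal sketch. It assumes the threshold is set at a percentile of reconstruction errors observed on benign traffic, which is one common convention and not necessarily the calibration used in this study. The error values and the helper names `percentile` and `recall_and_fpr` are hypothetical.

```python
def percentile(values, q):
    """Return the q-th percentile (0-100) using linear interpolation."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("values must be non-empty")
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def recall_and_fpr(benign_errors, attack_errors, threshold):
    """Flag a flow as anomalous when its reconstruction error exceeds threshold."""
    true_pos = sum(e > threshold for e in attack_errors)
    false_pos = sum(e > threshold for e in benign_errors)
    return true_pos / len(attack_errors), false_pos / len(benign_errors)


# Hypothetical reconstruction errors (illustrative values, not from the study).
benign_errors = [0.01, 0.02, 0.02, 0.03, 0.03, 0.04, 0.05, 0.06, 0.08, 0.12]
attack_errors = [0.05, 0.07, 0.09, 0.15, 0.20, 0.30]

strict = percentile(benign_errors, 95)  # high threshold: fewer false alarms
loose = percentile(benign_errors, 70)   # low threshold: more attacks caught

strict_recall, strict_fpr = recall_and_fpr(benign_errors, attack_errors, strict)
loose_recall, loose_fpr = recall_and_fpr(benign_errors, attack_errors, loose)

print(f"strict: recall={strict_recall:.2f} fpr={strict_fpr:.2f}")
print(f"loose:  recall={loose_recall:.2f} fpr={loose_fpr:.2f}")
```

On this toy data the looser threshold raises recall and the false-positive rate together, which is the same pattern the trade-off describes.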

Related Work

The development of efficient network intrusion detection systems (NIDS) for IoT is a primary security objective, with numerous studies focusing on dimensionality reduction, feature selection, and lightweight model design. Bella et al. proposed an efficient IDS utilizing a CNN–decision forest combination, achieving high detection accuracy and rapid inference. Li et al. compared feature selection versus feature extraction, providing actionable insights for optimizing NIDS performance. While effective for general IoT, such models often rely on centralized processing and lack the context-awareness required to distinguish benign agricultural sensor fluctuations from malicious anomalies.

Research dedicated to smart agriculture has emphasized holistic frameworks. Ali et al. provided a systematic synthesis of AI applications in AgriIoT cybersecurity, documenting nearly 30 specific threat vectors. Thilakarathne et al. proposed a deception-based threat-intelligence platform using honeypots, and Pasca et al. developed a "vulnerable-by-design" framework for generating labeled datasets integrating both cyber threats and sensor faults. For anomaly detection, Maseer et al. benchmarked ten algorithms on CICIDS2017 and found ensemble methods like Random Forest most stable and accurate, while Meidan et al. (N-BaIoT) employed deep autoencoders to detect botnet traffic with low false-alarm rates.

Research Gap

Several gaps remain: a reliance on outdated datasets that fail to represent modern heterogeneous AgriIoT traffic; IDS methodologies tested only in idealized laboratory settings rather than field-validated on real AgriIoT edge hardware; insufficient research into interactions between cyber-attacks and physical sensor failures; and a need for resource-efficient models balancing detection performance against the severe constraints of agricultural sensor networks in developing contexts. This study takes a step toward the last of these gaps by evaluating both supervised and unsupervised approaches on the CICIDS2017 benchmark using CPU-only hardware; the dataset and field-validation gaps remain open and are revisited under Limitations.

Methodology

Dataset

This study utilizes the CICIDS2017 dataset, a comprehensive, realistic benchmark containing diverse network traffic including modern DDoS, DoS, infiltration, and web attacks. It comprises 2,827,876 total records with 78 network-traffic features, providing a rich, high-dimensional space for machine-learning evaluation. The flows are labeled by attack type, enabling both binary (normal vs. malicious) and multi-class classification.
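As a small illustration of the two labeling schemes, the sketch below collapses per-flow attack labels into a binary target while keeping the original labels for multi-class use. The label strings are hypothetical examples; the exact spelling and casing depend on the dataset release, so the comparison is made case-insensitive.

```python
from collections import Counter


def to_binary(label):
    """Map a flow label to 0 (normal) or 1 (malicious)."""
    return 0 if label.strip().lower() == "benign" else 1


# Hypothetical label values; real strings depend on the dataset release.
labels = ["BENIGN", "DDoS", "BENIGN", "PortScan", "DoS Hulk", "BENIGN"]

binary = [to_binary(label) for label in labels]
multi_class_counts = Counter(labels)   # one class per attack type
binary_counts = Counter(binary)        # normal vs. malicious

print(binary)
print(dict(multi_class_counts))
print(dict(binary_counts))
```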

Fig. 2. Experimental methodology pipeline.

Preprocessing

The pipeline involved cleaning the raw parquet files, handling missing values, and splitting data to prevent leakage. The supervised training set combined normal samples with attack samples, yielding approximately 100,000 samples for computational efficiency while maintaining representative attack patterns. Min–Max scaling mapped all numeric features into the [0, 1] range to prevent features with larger dynamic ranges from dominating gradient updates in neural-network models. Splitting was stratified to ensure proportional class representation.
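The splitting and scaling steps can be sketched as follows. This is a standard-library illustration under stated assumptions, not the study's pipeline: it assumes the scaler statistics are learned from the training split only (one common way to avoid leakage), and the toy rows, feature meanings, and function names are hypothetical. In a scikit-learn workflow, `train_test_split(..., stratify=y)` and `MinMaxScaler` play the same roles.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=42):
    """Split indices so each class keeps roughly the same share in both parts."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = max(1, round(len(indices) * test_fraction))
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


def fit_min_max(train_rows):
    """Learn per-feature minimum and maximum from the training rows only."""
    columns = list(zip(*train_rows))
    return [min(c) for c in columns], [max(c) for c in columns]


def apply_min_max(rows, mins, maxs):
    """Scale rows with training statistics; constant features map to 0.0."""
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, mins, maxs)]
        for row in rows
    ]


# Hypothetical toy flows: [flow_duration, mean_packet_length]
rows = [[10.0, 60.0], [200.0, 1500.0], [15.0, 64.0], [180.0, 1400.0],
        [12.0, 70.0], [220.0, 1450.0], [11.0, 62.0], [14.0, 66.0],
        [190.0, 1480.0], [13.0, 68.0]]
labels = [0, 1, 0, 1, 0, 1, 0, 0, 1, 0]

train_idx, test_idx = stratified_split(labels, test_fraction=0.3)
train_rows = [rows[i] for i in train_idx]
test_rows = [rows[i] for i in test_idx]

mins, maxs = fit_min_max(train_rows)            # statistics from training data
train_scaled = apply_min_max(train_rows, mins, maxs)
test_scaled = apply_min_max(test_rows, mins, maxs)
print(len(train_scaled), len(test_scaled))
```

Training features land in [0, 1] by construction. Held-out values can fall slightly outside that range when they exceed the training minimum or maximum, which is expected when the scaler is fitted on the training split alone.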

Models

Five machine-learning models were evaluated:

Random Forest — an ensemble of decision trees that reduces overfitting via feature randomness and bootstrap sampling, capturing complex non-linear relationships in network traffic.
Decision Tree — a non-parametric method that recursively partitions the feature space by information gain, offering high interpretability for security analysts.
Logistic Regression — a linear baseline that illustrates the limitations of linear models on the non-linear patterns of modern intrusions.
Autoencoder — an unsupervised network (78 → 128 → 64 → 16 → 64 → 128 → 78) trained to reconstruct its input; the reconstruction error serves as the anomaly metric.
Variational Autoencoder (VAE) — a generative model learning the probability distribution of the input within a latent space governed by a prior, providing a probabilistic framework for anomaly detection.
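As a rough check on how lightweight the Autoencoder is, the sketch below counts its parameters from the listed layer widths. It assumes plain fully connected layers with bias terms and float32 storage; the source gives only the layer sizes, so both are assumptions.

```python
# Layer widths taken from the Autoencoder description above.
LAYER_SIZES = [78, 128, 64, 16, 64, 128, 78]


def dense_parameter_count(layer_sizes):
    """Count weights plus biases for a stack of fully connected layers."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


params = dense_parameter_count(LAYER_SIZES)
size_kib = params * 4 / 1024  # assumption: 4 bytes per float32 parameter
print(f"{params} parameters, about {size_kib:.0f} KiB as float32")
```

Under these assumptions the network has 38,878 parameters, or roughly 152 KiB of weights.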

Experimental Setup

The project was developed in Python 3.12 using the uv package manager. Deep-learning models (Autoencoder, VAE) used PyTorch; supervised models (Random Forest, Decision Tree, Logistic Regression) used scikit-learn. Crucially, all training and evaluation ran on a standard laptop CPU without GPU acceleration—an intentional choice mirroring the resource constraints of edge hardware in agricultural deployments. Supervised models were exceptionally fast: Random Forest trained in 3.01 s, Decision Tree in 2.48 s, and Logistic Regression in 2.45 s, underscoring their suitability for rapid, real-time intrusion detection on resource-limited hardware.

Results

The performance of the models evaluated on CICIDS2017 is summarized in Table 1 and Figure 3. Supervised models perform strongly: Random Forest and Decision Tree reach near-identical F1 scores above 0.996 (0.9965 and 0.9967, respectively). Random Forest has the higher ROC-AUC (0.9998 vs. 0.9976) and PR-AUC (0.9996 vs. 0.9973), indicating more consistent separation of the two classes across decision thresholds. Logistic Regression lags at F1=0.8921, reflecting the limitations of linear models on high-dimensional, non-linear attack boundaries.

Model	F1	ROC-AUC	PR-AUC	Precision	Recall
Random Forest	0.9965	0.9998	0.9996	0.9982	0.9947
Decision Tree	0.9967	0.9976	0.9973	0.9965	0.9969
Logistic Regression	0.8921	0.9797	0.9657	0.9282	0.8588
Autoencoder (full)	0.6914	0.8363	0.7914	0.6792	0.7042
Autoencoder (quick)	0.6975	0.8023	0.7281	0.5361	0.9979
VAE (quick)	0.6445	0.7219	0.6727	0.4766	0.9946

Table 1. Performance comparison on the CICIDS2017 dataset.
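The F1 column can be cross-checked against the precision and recall columns using F1 = 2PR / (P + R). The sketch below recomputes it from the values in Table 1. Because the table is rounded to four decimals, small differences in the last digit are expected.

```python
# (reported F1, precision, recall) copied from Table 1.
TABLE_1 = {
    "Random Forest": (0.9965, 0.9982, 0.9947),
    "Decision Tree": (0.9967, 0.9965, 0.9969),
    "Logistic Regression": (0.8921, 0.9282, 0.8588),
    "Autoencoder (full)": (0.6914, 0.6792, 0.7042),
    "Autoencoder (quick)": (0.6975, 0.5361, 0.9979),
    "VAE (quick)": (0.6445, 0.4766, 0.9946),
}


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


for model, (reported, precision, recall) in TABLE_1.items():
    computed = f1_score(precision, recall)
    print(f"{model:<22} reported={reported:.4f} computed={computed:.4f}")

max_diff = max(abs(f1_score(p, r) - f1) for f1, p, r in TABLE_1.values())
best_supervised = max(TABLE_1[m][0] for m in
                      ("Random Forest", "Decision Tree", "Logistic Regression"))
best_unsupervised = max(TABLE_1[m][0] for m in
                        ("Autoencoder (full)", "Autoencoder (quick)", "VAE (quick)"))
f1_gap = best_supervised - best_unsupervised
print(f"largest rounding difference: {max_diff:.5f}, F1 gap: {f1_gap:.4f}")
```

All reported F1 values agree with the recomputed ones to within rounding, and the gap between the highest supervised and highest unsupervised F1 is about 0.30.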

Fig. 3. Model performance comparison on CICIDS2017.

Among unsupervised models, the "quick" configurations of the Autoencoder and VAE sit at a very different operating point from the full Autoencoder (precision 0.6792, recall 0.7042): the quick Autoencoder trades precision (0.5361) for near-perfect recall (0.9979), catching nearly all malicious instances while generating many false alarms, and the VAE shows similarly high recall (0.9946) with the lowest precision (0.4766). The F1 gap between the best supervised scores (Decision Tree, 0.9967; Random Forest, 0.9965) and the best unsupervised score (Autoencoder quick, 0.6975) is approximately 0.30. Notably, the VAE exhibited training instability from a KL-divergence explosion at epoch 11; the fix, KL annealing, was not applied here and remains future work.
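For readers unfamiliar with KL annealing, the following is a minimal sketch of one common variant, a linear warm-up of the KL weight. It was not part of this study's experiments. The warm-up length, the function names, and the choice of a linear schedule are assumptions made for illustration; the KL term is the standard closed form for a diagonal Gaussian posterior against a standard normal prior.

```python
import math


def kl_weight(epoch, warmup_epochs=10, max_weight=1.0):
    """Linearly ramp the KL weight from 0 to max_weight over warmup_epochs."""
    if warmup_epochs <= 0:
        return max_weight
    return max_weight * min(1.0, epoch / warmup_epochs)


def gaussian_kl(mu, log_var):
    """KL(N(mu, sigma^2) || N(0, 1)), summed over latent dimensions."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv)
                      for m, lv in zip(mu, log_var))


def vae_loss(reconstruction_error, mu, log_var, epoch, warmup_epochs=10):
    """Reconstruction term plus the annealed KL term."""
    beta = kl_weight(epoch, warmup_epochs)
    return reconstruction_error + beta * gaussian_kl(mu, log_var)


# Hypothetical latent statistics for a single sample.
mu, log_var = [1.0, -0.5], [0.0, 0.2]
for epoch in (0, 5, 10, 20):
    loss = vae_loss(0.30, mu, log_var, epoch)
    print(f"epoch={epoch:>2} beta={kl_weight(epoch):.1f} loss={loss:.4f}")
```

Early epochs are dominated by the reconstruction term, and the KL penalty is phased in gradually. Whether this schedule would resolve the instability observed here would need to be tested.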

Fig. 4. Precision–recall trade-off: supervised vs. unsupervised.

Discussion

Supervised vs. Unsupervised ML

Supervised learning remains the gold standard for high-accuracy attack detection where comprehensive labeled data exists, but the inherent class imbalance and the prohibitive cost of continuous labeling in AgriIoT limit fully supervised deployment. Unsupervised learning offers adaptability: although it scores lower on classification metrics, it requires no labels and can continuously model a baseline of normal traffic, which is crucial where novel, zero-day threats emerge in unlabeled traffic. For operators, the choice hinges on labeling budget and risk tolerance, and a hybrid architecture is the ideal.

Why Random Forest Excels

Random Forest's strong performance stems from its ensemble nature, which reduces overfitting and captures non-linear relationships in complex traffic. Across the 78 CICIDS2017 features, impurity-reduction rankings highlight the most discriminative ones, such as flow duration, packet length, and inter-arrival times. Bootstrap aggregating reduces variance across trees, and random feature selection at each split prevents over-dependence on any single redundant feature, which improves generalization and stability.
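The idea behind impurity-reduction rankings can be shown on a single split. The sketch below computes the weighted Gini impurity decrease for two toy features; tree ensembles aggregate such decreases over all splits and trees to rank features. The data, thresholds, and feature names are hypothetical, and Gini is used here as an assumption about the impurity measure.

```python
def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def impurity_decrease(values, labels, threshold):
    """Weighted Gini reduction from splitting on values <= threshold."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    children = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(labels) - children


# Hypothetical toy data: 0 = normal, 1 = malicious.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
flow_duration = [1.0, 2.0, 1.5, 2.5, 9.0, 8.0, 9.5, 7.0]  # separates the classes
noise_feature = [5.0, 1.0, 5.0, 1.0, 5.0, 1.0, 5.0, 1.0]  # unrelated to the label

informative = impurity_decrease(flow_duration, labels, threshold=5.0)
uninformative = impurity_decrease(noise_feature, labels, threshold=3.0)
print(f"flow_duration: {informative:.2f}, noise_feature: {uninformative:.2f}")
```

A feature that cleanly separates the classes removes all impurity at the split, while an unrelated feature removes none, so the former ranks higher.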

Limitations

The study is limited by its reliance on CICIDS2017, which does not represent the specific traffic profiles of AgriIoT and industrial protocols such as LoRaWAN or Modbus. Training and running deep models on real edge hardware presents challenges not captured by these laptop-CPU experiments, and a single dataset may not span the heterogeneous, multi-vendor configurations of real AgriIoT systems. Finally, the binary "normal vs. malicious" paradigm restricts operational utility: operators require multi-class categorization to distinguish, e.g., a DDoS attack from a subtle false-data-injection attempt, as these demand different mitigations.

Developing-Country Context

In contexts like Nepal—characterized by small-scale, fragmented landholdings, intermittent connectivity, limited budgets, and scarce localized cybersecurity expertise—the simplicity of Decision Tree and the low-latency potential of lightweight Autoencoders are particularly relevant. The emphasis must shift toward edge-based, offline-capable IDS that operate autonomously on low-cost hardware, paired with culturally aware, accessible interfaces that empower farmers to monitor security risks without extensive technical training.

Proposed Hybrid IDS Architecture

To leverage the distinct advantages of both paradigms, we propose a two-tier hybrid IDS for resource-constrained AgriIoT. The first tier is a lightweight supervised classifier—ideally Random Forest, given its performance here—acting as the primary defense against known attack signatures. The second tier is an unsupervised Autoencoder configured for anomaly detection, identifying deviations the primary classifier might miss.

The decision flow is structured for efficiency and coverage: each network flow is first evaluated by the Random Forest. If classified with confidence above a threshold, the result is accepted and the appropriate action taken immediately. If the classifier produces a low-confidence result, the flow is passed to the second-tier Autoencoder, which computes its reconstruction error; if the error exceeds a dynamically calibrated threshold, the flow is flagged as a potential novel, zero-day threat. Because Random Forest acts as a highly precise filter, the Autoencoder decides on only a small fraction of traffic—mitigating the excessive false positives of pure anomaly-based systems while preserving the ability to catch previously unseen threats.
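The decision flow can be sketched as a small routing function. This is an illustration of the proposal rather than a tested implementation: the threshold values, the verdict names, and the `Decision` type are hypothetical, and the handling of a low-confidence flow whose reconstruction error stays below the threshold (treated as benign here) is an assumption, since the text does not specify it. The reconstruction error is passed as a callable so that the Autoencoder runs only for flows that reach the second tier.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Decision:
    verdict: str  # "benign", "known_attack", or "possible_novel_attack"
    tier: int     # 1 = supervised classifier, 2 = Autoencoder


def hybrid_decide(attack_probability: float,
                  compute_reconstruction_error: Callable[[], float],
                  confidence_threshold: float = 0.9,
                  error_threshold: float = 0.05) -> Decision:
    """Route one flow through the two-tier IDS."""
    # Tier 1: accept the classifier's answer when it is confident either way.
    confidence = max(attack_probability, 1.0 - attack_probability)
    if confidence > confidence_threshold:
        verdict = "known_attack" if attack_probability >= 0.5 else "benign"
        return Decision(verdict, tier=1)
    # Tier 2: only low-confidence flows pay the cost of the Autoencoder.
    if compute_reconstruction_error() > error_threshold:
        return Decision("possible_novel_attack", tier=2)
    return Decision("benign", tier=2)  # assumption: below threshold means benign


# Hypothetical flows: (classifier attack probability, reconstruction error).
flows = [(0.98, 0.01), (0.02, 0.01), (0.60, 0.20), (0.55, 0.01)]
results = [hybrid_decide(p, lambda e=err: e) for p, err in flows]
for (p, err), decision in zip(flows, results):
    print(f"p_attack={p:.2f} error={err:.2f} -> {decision.verdict} (tier {decision.tier})")
```

In practice `error_threshold` would be calibrated on recent benign traffic rather than fixed, in line with the dynamically calibrated threshold described above.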

Fig. 5. Proposed two-tier hybrid IDS architecture for AgriIoT.

Conclusion

This research demonstrates a large performance gap between supervised and unsupervised learning for intrusion detection on CICIDS2017, the benchmark used here as a proxy for Agricultural IoT traffic. The tree-based supervised models achieve near-optimal classification: Random Forest (F1=0.9965, ROC-AUC=0.9998) and Decision Tree (F1=0.9967, ROC-AUC=0.9976) are nearly tied on F1, with Random Forest ahead on ROC-AUC, while Logistic Regression serves as the linear baseline (F1=0.8921). The quick Autoencoder and VAE configurations achieved very high recall (above 0.99) but low precision (0.4766–0.5361), whereas the full Autoencoder was more balanced (precision 0.6792, recall 0.7042). Together these results confirm the role of unsupervised models as sensitive yet error-prone anomaly detectors.

Future work should develop AgriIoT-specific benchmark datasets encapsulating LoRaWAN, MQTT, and Modbus traffic; conduct field validation on actual low-power edge hardware; explore federated learning for privacy-preserving cross-farm threat intelligence; and implement algorithmic refinements such as KL annealing for VAE stability and a transition from binary to multi-class attack categorization.

References

N. Abdalgawad et al., "Generative Deep Learning to Detect Cyberattacks for the IoT-23 Dataset," IEEE Access, 2022.
A. O. Adewusi, N. R. Chiekezie, and N. L. Eyo-Udo, "Securing Smart Agriculture: Cybersecurity Challenges and Solutions in IoT-driven Farms," World Journal of Advanced Research and Reviews, 2022.
Z. Ahmad, A. Shahid Khan, C. Wai Shiang, J. Abdullah, and F. Ahmad, "Network Intrusion Detection System: A Systematic Study of Machine Learning and Deep Learning Approaches," Trans. Emerging Telecommunications Technologies, 2021.
B. Ahmed, H. Shabbir, S. R. Naqvi, and L. Peng, "Smart Agriculture: Current State, Opportunities, and Challenges," IEEE Access, 2024.
A. Alfahaid et al., "Machine Learning-Based Security Solutions for IoT Networks: A Comprehensive Survey," Sensors, 2025.
G. Ali, M. M. Mijwil, B. A. Buruga, M. Abotaleb, and I. Adamopoulos, "A Survey on Artificial Intelligence in Cybersecurity for Smart Agriculture," Mesopotamian Journal of Computer Science, 2024.
E. Altulaihan, M. A. Almaiah, and A. Aljughaiman, "Anomaly Detection IDS for Detecting DoS Attacks in IoT Networks Based on Machine Learning Algorithms," Sensors, 2024.
K. Bella et al., "An Efficient Intrusion Detection System for IoT Security Using CNN Decision Forest," PeerJ Computer Science, 2024.
E. Dritsas and M. Trigka, "A Survey on Cybersecurity in IoT," Future Internet, 2025.
ETSI, "Cyber Security for Consumer Internet of Things: Baseline Requirements," ETSI EN 303 645, 2020.
W. Fei, H. Ohno, and S. Sampalli, "A Systematic Review of IoT Security: Research Potential, Challenges, and Future Directions," ACM Computing Surveys, 2024.
IEC, "Industrial Communication Networks — Network and System Security," IEC 62443 series, 2013.
C. Ioannou and V. Vassiliou, "Network Attack Classification in IoT Using Support Vector Machines," Journal of Sensor and Actuator Networks, 2021.
J. Li, M. S. Othman, H. Chen, and L. M. Yusuf, "Optimizing IoT Intrusion Detection System: Feature Selection versus Feature Extraction in Machine Learning," Journal of Big Data, 2024.
Z. K. Maseer, R. Yusof, N. Bahaman, S. A. Mostafa, and C. F. M. Foozy, "Benchmarking of Machine Learning for Anomaly-Based Intrusion Detection Systems in the CICIDS2017 Dataset," IEEE Access, 2021.
Y. Meidan et al., "N-BaIoT: Network-based Detection of IoT Botnet Attacks Using Deep Autoencoders," IEEE Pervasive Computing, 2018.
T. Miller et al., "The IoT and AI in Agriculture: The Time Is Now—A Systematic Review of Smart Sensing Technologies," Sensors, 2025.
NIST, "Foundational Cybersecurity Activities for IoT Device Manufacturers," NISTIR 8259, 2020.
R. Panigrahi and S. Borah, "A Detailed Analysis of CICIDS2017 Dataset for Designing Intrusion Detection Systems," Indonesian Journal of Data and Science, 2024.
E. M. Pasca, D. Delinschi, R. Erdei, I. Baraian, and O. D. Matei, "A Vulnerable-by-Design IoT Sensor Framework for Cybersecurity in Smart Agriculture," Agriculture, 2025.
P. A. A. Resende and A. C. Drummond, "A Survey of Random Forest Based Methods for Intrusion Detection Systems," ACM Computing Surveys, 2019.
N.-A. Stoian, "Machine Learning for Anomaly Detection in IoT Networks: Malware Analysis on the IoT-23 Data Set," 2020.
N. N. Thilakarathne, M. S. A. Bakar, P. E. Abas, and H. Yassin, "A Novel Cyber Threat Intelligence Platform for Evaluating the Risk Associated with Smart Agriculture," Scientific Reports, 2025.
I. Ullah and Q. H. Mahmoud, "Design and Development of a Deep Learning-Based Model for Anomaly Detection in IoT Networks," IEEE Access, 2021.
I. Ullah and Q. H. Mahmoud, "Design and Development of RNN Anomaly Detection Model for IoT Networks," IEEE Access, 2022.
A. Verma and V. Ranga, "Machine Learning Based Intrusion Detection Systems for IoT Applications," Wireless Personal Communications, 2020.