- The paper introduces an innovative IDS framework integrating LSTM networks and feature embedding to effectively capture temporal dependencies and categorical data.
- The proposed method achieves a binary classification accuracy of 99.72% on the UNSW-NB15 dataset, outperforming traditional models.
- The study highlights faster convergence using a Many-to-Many training strategy and sets the stage for real-time network security applications.
Network Intrusion Detection based on LSTM and Feature Embedding
Introduction
The paper "Network Intrusion Detection based on LSTM and Feature Embedding" (arXiv:1911.11552) presents an approach to intrusion detection systems (IDS) that combines Long Short-Term Memory (LSTM) networks with feature embedding. The authors aim to address the limitations of existing IDS methods by incorporating both temporal dependencies and categorical data into the model, so that attack patterns spanning several network records can be recognized.
Background and Motivation
Traditional IDS methods often rely on expert-defined signatures or anomaly detection techniques, which can suffer from high false positive rates and may not effectively recognize new attack patterns. Machine learning approaches offer a compelling alternative, exploiting large-scale datasets to automatically learn malicious activity patterns. However, many existing machine learning solutions lack the ability to leverage sequential data effectively, which is critical given that network activities are inherently temporal sequences.
Recurrent Neural Networks (RNNs), particularly LSTM networks, offer a promising way to capture these temporal dependencies, since LSTMs were designed to retain information across long sequences. Additionally, the categorical fields commonly found in network traffic logs (e.g., protocol types, states, services) can be incorporated through embedding techniques typically used in NLP tasks.
The paper focuses on integrating these two key elements—temporal dependencies through LSTM and categorical feature embedding—to enhance detection performance.
Figure 1: Embedded words in a continuous vector space. Words are represented as vectors with semantic meaning.
Methods and Model Architecture
The proposed IDS architecture consists of three main components: embedding, LSTM, and fully connected layers. Categorical inputs are mapped to continuous vectors by the embedding layers and concatenated with the continuous features before being fed into the LSTM layer, which captures sequential patterns across records. For binary classification, an additional fully connected layer transforms the multi-class predictions into binary outputs.
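The embedding-and-concatenation step described above can be illustrated with a minimal sketch. The field names, vocabularies, embedding width, and helper functions below are hypothetical and chosen only for illustration; in a trained model the lookup tables would be learned rather than randomly initialised.

```python
import random

# Hypothetical vocabularies for three categorical fields (illustrative values only).
VOCABS = {
    "proto": ["tcp", "udp", "icmp"],
    "service": ["-", "http", "dns", "ftp"],
    "state": ["FIN", "CON", "INT"],
}
EMBED_DIM = 4  # assumed embedding width, not taken from the paper


def build_embedding_tables(vocabs, dim, seed=0):
    """Create one randomly initialised lookup table per categorical field."""
    rng = random.Random(seed)
    return {
        field: {value: [rng.uniform(-0.1, 0.1) for _ in range(dim)] for value in values}
        for field, values in vocabs.items()
    }


def embed_record(record, tables, continuous_fields):
    """Look up each categorical value, then append the continuous features."""
    vector = []
    for field in tables:  # dict insertion order keeps the layout stable
        vector.extend(tables[field][record[field]])
    vector.extend(float(record[name]) for name in continuous_fields)
    return vector


tables = build_embedding_tables(VOCABS, EMBED_DIM)
record = {"proto": "tcp", "service": "http", "state": "FIN", "dur": 0.12, "sbytes": 496}
x_t = embed_record(record, tables, continuous_fields=["dur", "sbytes"])
print(len(x_t))  # 3 fields * 4 dims + 2 continuous features = 14
```

Each record in a sequence would be converted this way, and the resulting vectors would form the per-step inputs to the LSTM layer.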
Figure 2: Model architecture: embedding, LSTM, and fully connected layers. 'Fully Connected 2' is used only for binary classification.
Learning Strategies
Two training strategies are discussed: Many-to-One (M2O) and Many-to-Many (M2M). M2O trains the model using the final output of a sequence, while M2M utilizes error signals from all outputs within a sequence, potentially speeding up convergence.

Figure 3: Two learning methods: (a) M2O training learns only the last output, and (b) M2M training learns all the outputs in the sequence.
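The difference between the two strategies comes down to which outputs contribute to the loss. The sketch below assumes a per-step cross-entropy loss and assumes that M2M averages over the sequence (the paper may sum instead); the probabilities and labels are made-up values.

```python
import math


def cross_entropy(probs, label):
    """Negative log-likelihood of the true class for one time step."""
    return -math.log(probs[label])


def m2o_loss(seq_probs, seq_labels):
    """Many-to-One: only the last output in the sequence contributes."""
    return cross_entropy(seq_probs[-1], seq_labels[-1])


def m2m_loss(seq_probs, seq_labels):
    """Many-to-Many: average the loss over every output in the sequence."""
    losses = [cross_entropy(p, y) for p, y in zip(seq_probs, seq_labels)]
    return sum(losses) / len(losses)


# Hypothetical per-step class probabilities for a length-3 sequence, 2 classes.
seq_probs = [[0.9, 0.1], [0.6, 0.4], [0.2, 0.8]]
seq_labels = [0, 0, 1]

print(m2o_loss(seq_probs, seq_labels))  # depends on the final step only
print(m2m_loss(seq_probs, seq_labels))  # depends on all three steps
```

Because M2M produces an error signal at every step, each training sequence yields more gradient information per pass, which is one plausible reason for the faster convergence reported.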
Additionally, a multi-to-binary (M2B) classification strategy is presented, converting multi-class attack type predictions into binary classification of normal vs. attack.

Figure 4: M2B classification: (a) The model is trained to perform multi-classification, (b) The prediction results are merged into binary classification results.
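As a rough illustration of the merging step, the sketch below uses a fixed argmax-and-merge rule: the highest-scoring class is selected, and every non-normal class is mapped to "attack". This is an assumption standing in for the paper's learned 'Fully Connected 2' layer, and the class index order and score values are hypothetical.

```python
# The ten traffic categories of UNSW-NB15; the index order here is an assumption.
CLASSES = [
    "Normal", "Fuzzers", "Analysis", "Backdoors", "DoS",
    "Exploits", "Generic", "Reconnaissance", "Shellcode", "Worms",
]
NORMAL = CLASSES.index("Normal")


def multi_to_binary(class_scores):
    """Return 0 (normal) or 1 (attack) from a vector of multi-class scores."""
    predicted = max(range(len(class_scores)), key=class_scores.__getitem__)
    return 0 if predicted == NORMAL else 1


# Hypothetical score vectors for two records.
dos_like = [0.05, 0.02, 0.01, 0.01, 0.60, 0.20, 0.05, 0.03, 0.02, 0.01]
normal_like = [0.90, 0.01, 0.01, 0.01, 0.02, 0.02, 0.01, 0.01, 0.005, 0.005]

print(multi_to_binary(dos_like))     # 1: the top class is DoS, an attack
print(multi_to_binary(normal_like))  # 0: the top class is Normal
```

With this rule, confusing one attack type for another does not count as a binary error, which helps explain why binary accuracy can exceed multi-class accuracy.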
Experimental Results
Utilizing the UNSW-NB15 dataset, the proposed LSTM models achieved significant performance improvements over other methods such as Random Forest and MLP. Notably, the LSTMs with feature embedding reached a binary classification accuracy of 99.72%, showcasing their capability in handling time-series and categorical data simultaneously.
Figure 5: Binary-classification accuracy graphs on the validation data: M2M, and M2M with embedding. The horizontal axis indicates the length of sequence.
The experiments demonstrated that the use of feature embedding improved accuracy by approximately 2% in multi-classification settings, while M2M yielded faster convergence.
Figure 6: Multi-classification accuracy graphs on the validation data: M2M, and M2M with embedding. The horizontal axis indicates the length of sequence.
Implications and Future Work
Integrating LSTM and feature embedding into an IDS provides a framework for capturing complex attack patterns and addressing limitations of traditional methods. The paper presents these models as suitable for real-time detection in dynamic network environments, supported by its measurements of prediction time per sequence.
Future work could explore model optimization for embedded systems and IoT environments, where computational resources are limited, as well as further refinement of sequence length requirements for practical real-time applications.
Figure 7: Prediction time in seconds per sequence with various sequence lengths.
Conclusion
The paper contributes an advanced approach leveraging LSTM and feature embedding to enhance IDS capabilities in detecting network intrusions. Experimentally, the method demonstrated clear benefits in accuracy and real-time applicability, paving the way for improved network security solutions. Future developments could focus on reducing model complexity and further improving detection rates across varied network environments.