Deep Packet: A Novel Approach For Encrypted Traffic Classification Using Deep Learning (1709.02656v3)

Published 8 Sep 2017 in cs.LG, cs.CR, and cs.NI

Abstract: Internet traffic classification has become more important with rapid growth of current Internet network and online applications. There have been numerous studies on this topic which have led to many different approaches. Most of these approaches use predefined features extracted by an expert in order to classify network traffic. In contrast, in this study, we propose a deep learning based approach which integrates both feature extraction and classification phases into one system. Our proposed scheme, called "Deep Packet," can handle both traffic characterization in which the network traffic is categorized into major classes (e.g., FTP and P2P) and application identification in which end-user applications (e.g., BitTorrent and Skype) identification is desired. Contrary to most of the current methods, Deep Packet can identify encrypted traffic and also distinguishes between VPN and non-VPN network traffic. After an initial pre-processing phase on data, packets are fed into Deep Packet framework that embeds stacked autoencoder and convolution neural network in order to classify network traffic. Deep packet with CNN as its classification model achieved recall of 0.98 in application identification task and 0.94 in traffic categorization task. To the best of our knowledge, Deep Packet outperforms all of the proposed classification methods on UNB ISCX VPN-nonVPN dataset.

Summary

The paper introduces Deep Packet, a framework that automates feature extraction and leverages deep learning to classify encrypted network traffic effectively.
It employs stacked autoencoders and 1D-CNN models, achieving recall rates of 0.98 for application identification and 0.94 for traffic characterization.
By classifying encrypted traffic directly, the method supports network management tasks such as QoS provisioning and anomaly detection, where traditional classification techniques struggle.

Deep Packet: A Deep Learning-Based Approach to Encrypted Traffic Classification

The paper "Deep Packet: A Novel Approach For Encrypted Traffic Classification Using Deep Learning" by Mohammad Lotfollahi, Mahdi Jafari Siavoshani, Ramin Shirali Hossein Zade, and Mohammadsadegh Saberian introduces an innovative method for network traffic classification leveraging deep learning techniques. The approach aims to address the challenges associated with the increasing volume of encrypted Internet traffic, which traditional classification methods struggle to manage effectively.

Background and Challenges

Network traffic classification is essential for numerous modern network management tasks, including Quality-of-Service (QoS) provisioning, anomaly detection, and pricing strategies. The task becomes notably complex with the prevalent use of encryption to ensure user privacy, as encrypted packets obscure data patterns essential for classification. Traditional methods, such as port-based classification and deep packet inspection (DPI), are either outdated due to port obfuscation techniques or invasive and ineffective against encrypted traffic.

Proposed Solution

The paper presents "Deep Packet," which integrates feature extraction and classification phases using deep learning methods, specifically stacked autoencoders (SAE) and one-dimensional convolutional neural networks (1D-CNN). This integration is crucial for providing a unified and efficient mechanism for both application identification and traffic characterization, including distinguishing between VPN and non-VPN traffic.

Key Contributions:

Automated Feature Extraction: Unlike traditional methods that rely on hand-engineered features crafted by domain experts, Deep Packet learns its features directly from packet data, reducing the dependence on expert judgment about which features matter.
Handling Encrypted Traffic: Deep Packet outperforms existing methods by accurately classifying encrypted traffic, a task complicated by the pseudo-random nature of encrypted data.
Granular and Coarse Classification: The framework supports both fine-grained application identification (e.g., distinguishing between Skype and BitTorrent) and broader traffic characterization (e.g., assigning traffic to major classes such as FTP and P2P). It can also distinguish VPN from non-VPN traffic.

Methodology

The methodology section details the architecture of the two deep learning models utilized:

Stacked Autoencoder (SAE):
- Comprises five fully connected layers, employing dropout to prevent over-fitting.
- Pre-trained in a greedy layer-wise fashion followed by fine-tuning with backpropagation.
- Includes a final softmax classifier for the classification task.
One-dimensional Convolutional Neural Network (1D-CNN):
- Uses a combination of convolutional layers and fully connected layers.
- Hyperparameters are fine-tuned using a grid search to find the optimal configuration for the dataset.
- Final architecture employs a softmax classifier for output.
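
The two building blocks named above, a one-dimensional convolution and a softmax output, can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the function names, the toy kernel, and the ReLU activation are assumptions chosen for illustration, and a real model would use a deep learning library with learned weights.

```python
import math
from typing import List, Sequence


def conv1d(signal: Sequence[float], kernel: Sequence[float], stride: int = 1) -> List[float]:
    """Valid (no padding) 1D cross-correlation, the operation a 1D-CNN layer applies."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(0, len(signal) - k + 1, stride)
    ]


def relu(values: Sequence[float]) -> List[float]:
    """Element-wise ReLU; the activation choice here is an assumption."""
    return [max(0.0, v) for v in values]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax that turns class scores into probabilities."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Toy input standing in for a few normalized packet bytes (hypothetical values).
features = relu(conv1d([0.0, 0.5, 1.0, 0.5, 0.0], [1.0, 0.0, -1.0]))
probabilities = softmax(features)
```

Here the kernel slides along the byte sequence and responds to local patterns, and the softmax converts the resulting scores into a probability per class.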

Both models were evaluated using the ISCX VPN-nonVPN dataset, comprising real-world encrypted traffic labeled by application and activity. The pre-processing phase was particularly crucial, involving the truncation of packet payloads to a fixed length and zero-padding for consistent input sizes.
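
The truncation and zero-padding step can be sketched as follows. The fixed length of 1500 bytes and the scaling of each byte to the range [0, 1] are assumptions made for this illustration, as is the function name; the summary above does not state these details.

```python
from typing import List

FIXED_LEN = 1500  # assumed fixed input length in bytes (not stated in the summary)


def to_fixed_length(payload: bytes, length: int = FIXED_LEN) -> List[float]:
    """Truncate or zero-pad raw bytes to `length`, then scale each byte to [0, 1]."""
    clipped = payload[:length]                        # truncate long payloads
    padded = clipped + bytes(length - len(clipped))   # zero-pad short payloads
    return [b / 255.0 for b in padded]                # normalization is an assumption


# Short payloads are padded, long ones are cut; both end up the same size.
short_vec = to_fixed_length(b"\xff\x00\x80", length=5)
long_vec = to_fixed_length(bytes(range(10)), length=3)
```

Every packet then maps to a vector of identical size, which is what a fixed-width input layer requires.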

Experimental Results

The one-dimensional CNN achieved a recall of 0.98 for application identification and 0.94 for traffic characterization. The stacked autoencoder also performed well, with a recall of 0.95 for application identification and 0.92 for traffic characterization. According to the authors, these results surpass previously proposed classification methods on the UNB ISCX VPN-nonVPN dataset, including those that rely on hand-engineered features.
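
For readers unfamiliar with the metric, recall for a class is the fraction of that class's true samples that the model labels correctly, TP / (TP + FN). The sketch below computes it per class from a confusion matrix; the function name and the toy counts are made up for illustration and are not results from the paper.

```python
from typing import List, Sequence


def per_class_recall(confusion: Sequence[Sequence[int]]) -> List[float]:
    """Recall per class, with rows as true classes and columns as predicted classes."""
    recalls = []
    for i, row in enumerate(confusion):
        support = sum(row)  # all samples whose true class is i (TP + FN)
        recalls.append(row[i] / support if support else 0.0)
    return recalls


# Hypothetical two-class confusion matrix (illustrative counts only).
toy_confusion = [
    [8, 2],  # true class 0: 8 correct, 2 mislabeled as class 1
    [1, 9],  # true class 1: 1 mislabeled as class 0, 9 correct
]
toy_recalls = per_class_recall(toy_confusion)
```

How the per-class values are averaged into a single reported figure is not specified in this summary, so the sketch stops at per-class recall.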

Discussion

The paper's analysis includes a thorough examination of model performances through confusion matrices and hierarchical clustering. Notably, the clustering results corroborate the intrinsic similarities between different applications, validating the network's ability to extract meaningful features. The paper also highlights the effectiveness of Deep Packet in classifying encrypted traffic, attributed to its capability to learn underlying patterns associated with different encryption schemes.

Implications and Future Work

By automating feature extraction, Deep Packet reduces the reliance on domain expertise and can accelerate the deployment of traffic classification systems in dynamic and evolving network environments. The success of deep learning models in this domain also opens the door for further advancements, such as multi-channel classification and improved handling of highly anonymized traffic like that tunneled through Tor.

Future developments could focus on enhancing the granularity of classification, extending Deep Packet's capabilities to more complex network environments, and exploring the integration of other deep learning frameworks, such as recurrent neural networks (RNNs), for capturing temporal dependencies in network traffic.

In conclusion, the Deep Packet framework shows that deep learning can classify encrypted traffic without hand-engineered features, and it provides a baseline for future work on learning-based network management.
