
Data Mining and Machine Learning in Cybersecurity

Chapter 7
Machine Learning for Profiling Network Traffic
Character is that which reveals moral purpose, exposing the class of things a man chooses and avoids.
Aristotle
7.1 Introduction
In this chapter, we address techniques for profiling network traffic. We investigate a large number of methods for profiling normal or anomalous behaviors in cyberinfrastructures so that we can detect anomalous patterns accurately and efficiently. With misuse detection systems, we extract rules or signatures from prior knowledge to characterize anomalous behaviors or intrusions. With anomaly detection systems, we attempt to learn normal behaviors so that we can recognize both known and unknown anomalous patterns as deviations from them. With hybrid intrusion detection systems, we combine the normal and anomalous profiling processes to improve the detection rate and decrease the false-alarm rate. For all three types of intrusion detection systems (IDSs), it is essential to profile either normal or anomalous behaviors before launching detection procedures. In this chapter, we focus on the components of networks that involve prior interesting events.
Network administrators monitor a huge volume of network traffic flows to identify hidden problems, such as attacks or misuse of services, to analyze the network traffic, and to identify significant patterns in the traffic flows. For such monitoring to be successful, we must provide a tool that can generalize and elucidate the significant characteristics or signatures of network traffic in a report, so that the network administrators reading the report will understand the dominant behaviors in the network, such as the communities of hosts, the providers (servers) of services, and malicious flows.
In Chapter 6, we discussed scan detection and introduced several methods of scan characterization, such as BAM and ScanVis. The philosophy of profiling network traffic is similar to that of scan characterization, and we can regard scan characterization as a specific application of the techniques in this chapter. A scan, or portscan, is a malicious behavior in network traffic, and its characterization, including clustering and visualization, can help network administrators detect scan attacks. Similarly, profiling facilitates the detection of broader dominant events in networks.
In this chapter, we introduce profiling techniques and data-mining and machine-learning applications in the profiling systems. First, we define network traffic profiling and introduce related knowledge in the network traffic. Then, we categorize profiling methods according to the pattern types of interest in the network, such as applications. We expose the challenges in network profiling. Second, we introduce the data-mining and machine-learning solutions to the difficulties in network traffic profiling. We outline the workflow of profiling and concentrate on the roles of data-mining and machine-learning methods in the pattern learning and recognition phases.
Third, we study several network profiling systems in detail. We present the fundamental data-mining and machine-learning techniques in these systems, including supervised classification and clustering methods. Then, we illustrate the implementation processes and performance of these techniques in application studies. In addition, we briefly summarize other network traffic profiling methodologies and corresponding applications. Finally, we summarize the development of network traffic profiling systems and introduce several research directions presented in the literature.
7.2 Network Traffic Profiling and Related Network Traffic Knowledge
Profiling modules run clustering algorithms or other data-mining and machine-learning methods to group similar network connections and search for dominant behaviors. We distinguish profiling from the term “profile” used for anomaly detection in Chapter 3. In anomaly detection, we aim to group similar normal data and build a normal model so that we can identify outliers. In profiling modules, however, we focus on grouping similar network behaviors and finding the trends that these behaviors follow.
As with the scan detection introduced in Chapter 6, network profiling methods have been developed for other specific applications, such as heavy hitters, gaming, chatting, p2p, and suspicious traffic in FTP, HTTP, and SMTP. Such profiling applications require access to a system capable of capturing interactions between hosts through empirical signatures or statistical analysis.
Currently, researchers are interested in profiling common behaviors in network traffic. Such behaviors include the communications between hosts and the performance of the hosts. The communication between hosts can be characterized using entropy, traffic volume, feature distributions, and so on. Host performance is reflected in how hosts use ports to provide services or take part in other interactions. The host IP addresses and the associated port numbers are used for profiling, to investigate the traffic flows.
Researchers are attempting to solve two of the largest problems in network profiling: the huge amount of network traffic flows and the difficulties in detecting patterns in the traffic data and in the learned patterns. For example, even if we extract association rules to describe the correlation between traffic flows, the huge number of rules still hampers profiling analysis and pattern recognition. In this case, clustering methods along with data-mining techniques are needed to extract the dominant patterns efficiently and effectively. Furthermore, visualization capability can strengthen the role of network traffic profiling in cyber administration.
7.3 Machine Learning and Network Traffic Profiling
In this chapter, we focus on network traffic profiling but specifically not on the pattern detection process. Hence, the workflow in Figure 7.1 includes four steps: data collection, data preprocessing, network traffic profiling, and reporting.
[Figure 7.1 flowchart: Collection of network traffic data → Data preprocessing → Network profiling algorithm → Report and analysis]
Figure 7.1 Workflow of network traffic profiling.
The network traffic data can be collected online or offline. Most of the profiling techniques work on online data, but only offline data have been used in the applications. Offline profiling is sufficient for some applications, such as traffic classification at the application level using graphlets (Karagiannis et al., 2005). In data preprocessing, features are selected according to the profiling objective or the subsequent analysis. A network profiling algorithm can be signature-based classification, a data-mining or machine-learning clustering method, or IP blacklist filtering.
We focus on data-mining and machine-learning clustering methods and only briefly introduce the other methods in the application studies that follow. Both supervised machine-learning and clustering methods are used in the network traffic profiling or pattern learning process. These techniques include common methods, such as association rules mining and classification, k-means clustering, DBSCAN, AutoClass, and shared nearest neighbor (SNN), and application-specific algorithms, such as the cluster miner in AutoFocus (Estan et al., 2003). Profiling results can be further simplified and abstracted to aid the cyber administrator in analyzing profiling reports. Visualization tools can aid in this process.
7.4 Data-Mining and Machine-Learning Applications in Network Profiling
In Application Study 1, we examine the NETMINE framework, which demonstrates how to aggregate and classify association rules from traffic flows, and generalize association rules to guide analysis. In Application Study 2, AutoFocus displays methods for aggregating traffic flows into clusters over the resource consumption, along a single feature and joint features. Application Study 3 contains an example of how to extract significant clusters of behaviors, classify behaviors, and characterize the dominant interactions between dimensions using data mining and entropy, to profile the communication patterns between end users and services. In Application Study 4, we introduce how to use the SNN profiling module in the Minnesota Intrusion Detection System (MINDS) and discover unexpected patterns in network traffic. Application Study 5 demonstrates the traffic pattern classification using k-means, DBSCAN, and AutoClass over traffic statistical features. Examples of data mining and machine learning for network profiling are categorized in Table 1.6.
Application Study 1: NETMINE Using Association Rules Mining and Classification for Network Traffic Profiling
Apiletti et al. (2008) designed the NETMINE framework to characterize network communications. The objective of NETMINE was to extract the principal association rules in network communities to facilitate the exploration and recognition of significant traffic patterns for cyber administrators or domain experts. As shown in Figure 7.2, NETMINE consists of four components: data capturing, data stream aggregation and filtering, association rule mining, and association rules aggregation.
[Figure 7.2 flowchart: Data capture → Data stream aggregation and filtering → Association rule generation → Aggregation of rules]
Figure 7.2 Workflow of NETMINE.
Apiletti et al. used available data collection tools to capture network traces. They concurrently preprocessed the captured traces and packets with continuous queries to reduce the sample data size. In these queries, they aggregated similar traffic packets over a continuous sliding time window and filtered out the less-correlated packets before pattern extraction. Given a set of protocol features F = {f1, …, fn}, such as source IP address, each packet is described by values for a subset of F, and the associations among these features can be represented using association rules. The sliding window has two parameters: the window size and the moving step of the window, both measured in a time unit (e.g., seconds). The window size determines the time span that the aggregating and filtering rules cover in the continuous queries.
The aggregating function groups the packets that share similar features, such as source IP address. Then, the filtering function removes the packets that account for less than a threshold of the aggregated traffic flows in the sliding window. The preprocessed streaming packets include a large number of infrequent flows, which can still convey relevant information. To extract rules from those seemingly trivial flows, Apiletti et al. proposed aggregating, or generalizing, the feature values or association rules in a hierarchical taxonomy. For example, they aggregated IP addresses into subnets and TCP port numbers into three categories, as shown in Figure 7.3.
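The aggregation and filtering steps can be sketched in Python as follows. This is a minimal illustration, not the NETMINE implementation: the function names, the (timestamp, source IP) packet representation, and the `min_share` threshold are assumptions made for the example.

```python
from collections import Counter

def sliding_windows(packets, window_size, step):
    """Yield (start_time, packets_in_window) for a time-based sliding window.

    `packets` is a list of (timestamp_seconds, source_ip) tuples sorted by time
    (a hypothetical, simplified packet record).
    """
    if not packets:
        return
    start, end = packets[0][0], packets[-1][0]
    t = start
    while t <= end:
        yield t, [p for p in packets if t <= p[0] < t + window_size]
        t += step

def aggregate_and_filter(window_packets, min_share):
    """Group packets by source IP and keep groups holding at least
    `min_share` of the packets in the window."""
    counts = Counter(src for _, src in window_packets)
    total = sum(counts.values())
    return {src: n for src, n in counts.items() if total and n / total >= min_share}

packets = [
    (0.0, "10.0.0.1"), (0.4, "10.0.0.1"), (0.9, "10.0.0.2"),
    (1.2, "10.0.0.1"), (1.7, "10.0.0.3"), (2.5, "10.0.0.1"),
]
# Window size of 2 s, moved forward in steps of 1 s.
for start, window in sliding_windows(packets, window_size=2.0, step=1.0):
    print(start, aggregate_and_filter(window, min_share=0.5))
```

A larger window size smooths out short bursts, while a smaller step produces more overlapping windows and therefore more frequent updates.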
[Figure 7.3 diagram. (a) Address taxonomy: Address → Local address → Subnet 1 (130.192.1.1, 130.192.1.2, …, 130.192.1.254), Subnet 2 (130.192.2.1, 130.192.2.2, …, 130.192.2.254), …, Subnet 254 (130.192.254.1, 130.192.254.2, …, 130.192.254.254); Address → External address → other addresses. (b) Port taxonomy: TCP port → Well known (1–1023), Registered (1024–49151), Dynamic port (49152–65535).]
Figure 7.3 Examples of hierarchical taxonomy in generalizing association rules. (a) Taxonomy for address. (b) Taxonomy for ports. (Reprinted from Comput. Netw., 53, Apiletti, D., Baralis, E., Cerquitelli, T., and D’Elia, V., Characterizing network traffic by means of the NetMine framework, 774–789, Copyright (2008), with permission from Elsevier.)
The items in lower levels were aggregated only when their generated rules were below the minimum support value. Itemsets were generated from lower-level k−1 to higher-level k in iteration k. Only the itemsets above the support level were used for apriori rules generation. Then, the generalized rules were classified into groups according to the basic features in network traffic. For example, traffic flows can be semantically represented by the rules {source IP} → {destination IP} and {destination IP} → {source IP}, each with its own direction of deduction. Services can be represented by the rules {destination address} → {destination port} and {destination port} → {destination address}, and service usage can be represented by the rules {destination port} → {source address} and {source address} → {destination port}. The combination of these rules can generate three other basic groups, e.g., traffic flow and service: {destination address} → {destination port, source address} and {destination port, source address} → {destination address}.
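To make the taxonomy-driven generalization and the rule groups concrete, the following Python sketch maps an address or port to a more general item and labels a rule by the features it involves. The local range 130.192.0.0/16 and the port boundaries follow Figure 7.3; the function names, the level numbering, and the catch-all label "combined" are assumptions for illustration only.

```python
import ipaddress

# Assumed local address range, taken from the examples in Figure 7.3.
LOCAL_NET = ipaddress.ip_network("130.192.0.0/16")

def generalize_address(ip, level):
    """Level 0: the address itself; level 1: its /24 subnet (or 'external address');
    level 2: 'local address' or 'external address'."""
    is_local = ipaddress.ip_address(ip) in LOCAL_NET
    if level == 0:
        return ip
    if not is_local:
        return "external address"
    if level == 1:
        return str(ipaddress.ip_network(f"{ip}/24", strict=False))
    return "local address"

def generalize_port(port, level):
    """Level 0: the port number; level 1: its TCP port category."""
    if level == 0:
        return str(port)
    if port <= 1023:
        return "well known"
    if port <= 49151:
        return "registered"
    return "dynamic"

def classify_rule(antecedent, consequent):
    """Label a rule by the set of features on both sides (hypothetical helper)."""
    features = frozenset(antecedent) | frozenset(consequent)
    groups = {
        frozenset({"source address", "destination address"}): "traffic flow",
        frozenset({"destination address", "destination port"}): "service",
        frozenset({"destination port", "source address"}): "service usage",
    }
    return groups.get(features, "combined")

print(generalize_address("130.192.1.7", 1), generalize_port(8080, 1))
print(classify_rule({"destination address"}, {"destination port"}))
```

Raising the level only for items whose rules fall below the minimum support keeps frequent low-level items intact, which matches the aggregation condition described above.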
Apiletti et al. evaluated the proposed methods on two data sets. The data sets were captured on the backbone network at the Politecnico di Torino. The selected features included source address/port, destination address/port, and flow size. To facilitate the selection of the generalized rules, they used the lift quality index of a rule X → Y, defined as follows:
\[ \mathrm{lift}(X, Y) = \frac{\mathrm{support}(X \rightarrow Y)}{\mathrm{support}(X)\,\mathrm{support}(Y)} \tag{7.1} \]
In the above equation, X and Y are two itemsets, and support(X), support(Y), and support(X → Y) are the supports of X, Y, and the rule X → Y, respectively. A lift value of 1 indicates that the two itemsets are independent of one another; a lift value greater (less) than 1 indicates a positive (negative) correlation. Experimental results showed that generalized rules were extracted. These generalized rules contained less frequent itemsets that would not have met the minimum support level if considered as individual rules. The generalized rules made up a higher percentage of the total rules when the support threshold was increased.
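As a worked illustration of the lift index, the following Python sketch computes supports as fractions of transactions and takes the support of the rule X → Y to be the fraction of transactions containing both X and Y. The item strings and the toy transactions are made up for the example, and the code assumes both itemsets occur at least once.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def lift(transactions, x, y):
    """Lift of the rule X -> Y; assumes support(X) and support(Y) are nonzero."""
    joint = support(transactions, set(x) | set(y))
    return joint / (support(transactions, x) * support(transactions, y))

# Each transaction is the set of items observed for one flow (toy data).
transactions = [
    {"src=A", "dport=80"},
    {"src=A", "dport=80"},
    {"src=A", "dport=22"},
    {"src=B", "dport=22"},
]

print(lift(transactions, {"src=A"}, {"dport=80"}))  # above 1: positive correlation
print(lift(transactions, {"src=B"}, {"dport=80"}))  # below 1: negative correlation
```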
This method extracts generalized association rules, which provide a high-level abstraction of the network traffic and allow the discovery of unexpected and more interesting traffic rules. The proposed technique exploits taxonomies to drive the pruning phase of the extraction process. Extracted correlations are automatically aggregated into more general association rules according to a frequency threshold. Eventually, extracted rules are classified into groups according to their semantic meaning, thus allowing a domain expert to focus on the most relevant patterns.
Application Study 2: AutoFocus for Clustering Multidimensional Traffic
Aggregation on one feature or on a few features can generalize the network flows, e.g., using association rule generalization in NETMINE. Without any prior knowledge, however, this approach can select the wrong dimensions for aggregation, which can lead the administrator to insignificant features. Thus, identifying the significant features among the traffic streams is necessary. To obtain meaningful aggregation, clustering methods have been proposed in network traffic profiling (Ertöz et al., 2003; Estan et al., 2003; Chandola et al., 2006; Xu et al., 2008).
Estan et al. (2003) proposed a method, called AutoFocus, to automatically characterize and cluster network traffic based on resource consumption along multiple dimensions. Resource consumption was defined as the share of traffic volume covered by a cluster, e.g., measured by the number of packets. AutoFocus compressed, combined, and prioritized the clustering results into an easily comprehensible report. Five features were included in this research: source IP/port, destination IP/port, and protocol. A traffic cluster was defined by sets of possible values of these features. As shown in Figure 7.4, AutoFocus consists of three steps: data collection, cluster mining, and report formatting. Data collection, also called the traffic parser, accepts packet traces and other raw network data. The cluster miner is the principal element of AutoFocus, and its clustering algorithm has four main components: computing clusters, compressing traffic clusters, computing traffic changes, and prioritizing clusters in a report. In the report, users can recognize traffic categories and clusters, which are presented graphically after aggregation and ranking.
The input data included seven attributes: the five features listed above, and the packet and byte counters. The packet counter reports the number of matched packets in terms of the five features, while the byte counter accounts for the number of bytes in the packets. The “estimate” counter can be computed as the sum of the “estimates” of its children. First, for a single feature, source IP addresses are listed as leaves with subnets as nodes and roots in the hierarchical tree architecture.

[Figure 7.4 flowchart: Data collection → Clusters miner (Computing clusters → Compressing traffic report → Computing traffic changes → Prioritizing clusters) → Report]
Figure 7.4 Workflow of AutoFocus.
Each node, including the leaves and roots, has a counter. A counter value above the predefined threshold indicates that the corresponding node is a cluster. Once these clusters are found, the multiple one-dimensional hierarchies are combined into a dimension-overlapping structure. Each node in the structure has a parent from each dimension. Clusters are generated when their counters are above the threshold. Optimization methods help to prune the clustering space by focusing on clusters whose one-dimensional ancestors are above the threshold, and by batching clusters.
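The one-dimensional step for source IP addresses can be sketched in Python as follows. This is an illustrative simplification rather than the AutoFocus algorithm itself: the fixed set of prefix lengths, the absolute packet threshold, and the function names are assumptions, and the multidimensional combination and pruning are not shown.

```python
import ipaddress
from collections import Counter

def prefix_counters(flows, prefix_lengths=(32, 24, 16, 8, 0)):
    """Add each flow's packet count to every prefix (tree node) containing its source IP.

    `flows` is a list of (source_ip, packet_count) pairs; the prefix lengths
    stand in for the levels of the address hierarchy (an assumption).
    """
    counters = Counter()
    for src_ip, packet_count in flows:
        for length in prefix_lengths:
            node = ipaddress.ip_network(f"{src_ip}/{length}", strict=False)
            counters[str(node)] += packet_count
    return counters

def heavy_clusters(flows, threshold):
    """Return the nodes whose packet counter reaches the threshold."""
    return {node: n for node, n in prefix_counters(flows).items() if n >= threshold}

flows = [("10.1.1.1", 40), ("10.1.1.2", 5), ("10.1.2.9", 30), ("192.168.0.5", 25)]
for node, count in sorted(heavy_clusters(flows, threshold=30).items()):
    print(node, count)
```

Because a node's counter is the sum over its descendants, every ancestor of a reported cluster is also above the threshold, which is why the compression step described next is needed to keep the report short.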
Second, the compression algorithm traverses all clusters in the order of a specific measure. Each cluster has an “estimate” counter that accounts for the maximum “estimate” among all dimensions (here we have five features). For each dimension, the maximum “estimate” of a cluster corresponds to the sum of the “estimates” of its children. A cluster is reported when the deviation between its “estimate” and real traffic data is above the threshold, or when the “estimate” is replaced by real traffic data.
Third, in each measurement time interval, the actual change of each reported cluster from the previous step is compared to the estimated change of that cluster. A cluster is reported when the difference between the actual change and the estimated change is greater than the threshold. Fourth, clusters are ranked using a measure called the unexpectedness score. Assuming that the features (dimensions) are independent of each other, the unexpectedness score is defined as the deviation from a uniform model. Given a cluster with a real percentage of volume X% and its features having independent real percentages of volume {Y1%, …, Yd%}, the unexpectedness of the cluster is \( X\% \,/ \prod_{i=1}^{d} Y_i\% \), where d is the dimension size of the cluster and i = 1, …, d. This score measures anomalous behavior among dimensions.
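A small Python sketch of this score follows, reading it as the cluster's actual share of traffic divided by the share expected if the features were independent (the product of the per-feature shares). The function name and the sample numbers are illustrative assumptions, not values from Estan et al.

```python
from math import prod

def unexpectedness(cluster_share, feature_shares):
    """Ratio of a cluster's actual traffic share to the share expected
    under independence of its features (shares given as fractions)."""
    expected_share = prod(feature_shares)
    return cluster_share / expected_share

# Hypothetical cluster: a source prefix carrying 25% of the traffic and a
# destination port carrying 40%; together they would be expected to carry 10%.
score = unexpectedness(cluster_share=0.20, feature_shares=[0.25, 0.40])
print(score)  # a score well above 1 means more traffic than independence predicts
```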

Estan et al. evaluated AutoFocus using the three collected traces on three cyberinfrastructures. The first trace was collected from 31 days of data on a small network exchange point in San Diego; the second trace was collected over 39 days of connections in a large research institute. The third trace was composed of an 8 h trace from an OC-48 backbone link. The investigation showed that AutoFocus recognized unexpected patterns in network traffic, such as a weekly pattern, a temporary network outage, a worm epidemic, and p2p applications.
Application Study 3: Using Information-Theoretic Techniques in Network Traffic Profiling (Table 1.6, F.5)
Xu et al. (2008) developed a general methodology to automatically discover and elucidate significant behavior profiles from Internet backbone traffic, using entropy-based data-mining techniques. The authors focused on profiling the communication patterns of network traffic at a level of abstraction that helps network administrators understand and identify anomalous events easily. Four end-host and service features were included in this research: source IP/port and destination IP/port. Along each feature dimension, the traffic flows that shared the same feature value in that dimension were aggregated into a cluster. Hence, clusters were generated in all four dimensions. In each dimension, the significance of clusters was measured using entropy. As shown in Figure 7.5, the proposed method consisted of three steps: data collection, traffic profiling, and reporting. Data collection accepted packet-header traces in networks.
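The clustering along one dimension and the entropy of the resulting distribution can be sketched in Python as follows. This shows only the basic quantities: the flow tuple layout, the function names, and the toy flows are assumptions, and how the authors turn entropy into a significance criterion is not covered here.

```python
from collections import Counter
from math import log2

# Assumed flow record layout: (source IP, source port, destination IP, destination port).
SRC_IP, SRC_PORT, DST_IP, DST_PORT = range(4)

def cluster_sizes(flows, dimension):
    """Count the flows that share each feature value along one dimension."""
    return Counter(flow[dimension] for flow in flows)

def entropy(sizes):
    """Shannon entropy, in bits, of the distribution given by the cluster sizes."""
    total = sum(sizes.values())
    return -sum((n / total) * log2(n / total) for n in sizes.values())

flows = [
    ("10.0.0.1", 5001, "10.0.1.9", 80),
    ("10.0.0.1", 5002, "10.0.1.9", 80),
    ("10.0.0.1", 5003, "10.0.1.7", 443),
    ("10.0.0.2", 5004, "10.0.1.9", 80),
]

by_src = cluster_sizes(flows, SRC_IP)
print(by_src, entropy(by_src))
```

Low entropy along a dimension means that a few feature values dominate the traffic, whereas entropy close to its maximum means that the flows are spread almost evenly across values.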
[Figure 7.5 flowchart: Collecting data → Traffic profiling (Extracting significant clustering → Characterizing traffic behaviors → Modeling interactions in clusters) → Reporting]
Figure 7.5 Workflow of network traffic profiling as proposed in Xu et al. (2008). (Xu, K., Zhang, Z.-L., and Bhattacharyya, S., Internet traffic behavior profiling for network security monitoring, IEEE/ACM Trans. Netw. © 2008 IEEE.)