Monitoring and Early Warning for Internet Worms Cliff Changchun Zou, Lixin Gao, Weibo Gong, Don Towsley University of Massachusetts at Amherst {czou, lgao, gong}@ecs.umass.edu, [email protected] ABSTRACT After the Code Red incident in 2001 and the SQL Slammer in January 2003, it is clear that a simple self-propagating worm can quickly spread across the Internet, infects most vulnerable computers before people can take effective countermeasures. The fast spreading nature of worms calls for a worm monitoring and early warning system. In this paper, we propose effective algorithms for early detection of the presence of a worm and the corresponding monitoring system. Based on epidemic model and observation data from the monitoring system, by using the idea of “detecting the trend, not the rate” of monitored illegitimated scan traffic, we propose to use a Kalman filter to detect a worm’s propagation at its early stage in real-time. In addition, we can effectively predict the overall vulnerable population size, and correct the bias in the observed number of infected hosts. Our simulation experiments for Code Red and SQL Slammer show that with observation data from a small fraction of IP addresses, we can detect the presence of a worm when it infects only 1% to 2% of the vulnerable computers on the Internet. Categories and Subject Descriptors K.6.5 [Management of computing and information systems]: Security and Protection—Invasive software General Terms Security, Algorithms Keywords Monitoring, Early detection, Worm propagation 1. INTRODUCTION Since the Morris worm in 1988 [21], the security threat posed by worms has steadily increased, especially in the last several years. In 2001, the Code Red and Nimda infected Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. CCS’03, October 27–30, 2003, Washington, DC, USA. Copyright 2003 ACM 1-58113-738-9/03/0010 ...$5.00. hundreds of thousands of computers [17, 22], causing millions of dollars loss to our society [8]. After a relatively quiet period, the SQL Slammer appeared on January 25th, 2003, and quickly spread throughout the Internet [19]. Because of its very fast scan rate, Slammer infected more than 90% of vulnerable computers on the Internet within 10 minutes [19]. In addition, the large amount of scan packets sent out by Slammer caused a global-scale denial of service attack to the Internet. Many networks across Asia, Europe, and America were effectively shut down for several hours [6]. Currently, some organizations and security companies, such as the CERT, CAIDA, and SANS Institute [3, 4, 23], are monitoring the Internet and paying close attention to any abnormal traffic. When they observe abnormal network activities, their security experts will immediately analyze these incidents. However, no nation-scale malware monitoring and defense center exists. Given the fast spreading nature of Internet worms and their heavy damage to our society, it seems appropriate to setup a nation-scale worm monitoring and early warning system. In order to detect an unknown (zero-day) worm, a straightforward way is to use various threshold-based anomaly detection methods to detect the presence of a worm. We can directly use some well-studied methods established in the anomaly intrusion detection area. However, many thresholdbased anomaly detections have the trouble to deal with their high false alarm rate. In the case of worm detection, we find that there is a major difference between a worm’s propagation and a hacker’s intrusion attack: the propagation of a worm code exhibits simple attack behaviors and usually follows some dynamic models because it is usually a global large-scale propagation; on the other hand, a hacker’s intrusion attack, which is more complicated, usually targets one or a set of specific computers and does not follow any well-defined dynamic model in most cases. Therefore, we do not use any threshold-based anomaly detection methods in this paper. Instead, we fully exploit a worm’s simple behavior based on well-studied epidemic models. We present a Kalman filter to detect the propagation of a worm in its early stage based on observed illegitimated scan traffic, which includes both real worm scans and background noise. The Kalman filter will not only make use of the correlation of the history trace of observation data (not just a burst of traffic at one time), but also the dynamic trend of the propagation of a worm — at the beginning of a worm’s spreading when there are little human counteractions or network congestions, a worm propagates almost exponentially with a constant, positive infection rate. The Kalman filter is activated when the monitoring system encounters a surge of illegitimated scan activities. If the worm infection rate estimated by the Kalman filter stabilizes and oscillates a little bit around a constant positive value, we can claim that the illegitimated scan activities are mainly caused by a worm, even if the estimated value of the worm’s infection rate is still not well converged. If the illegitimated scan traffic is caused by non-worm noise, the traffic will not have the exponential growth trend, then the estimated value of infection rate would oscillate around without a fixed central point, or it would oscillate around zero. In other words, the Kalman filter is used to detect the presence of a worm by detecting the trend, not the rate, of the observed illegitimated scan traffic. In this way, the unpredictable, noisy illegitimated scan traffic we observe everyday will not cause many false alarms to our detection system — such background noise will cause great trouble to traditional threshold-based detection methods. Our algorithms can also provide the estimated value of a worm’s scan rate and its vulnerable population size. With such forecast information, people can take appropriate actions to deal with the worm. In addition, we present a formula to correct the bias in the number of infected hosts observed by monitors— this bias has been mentioned in [5] and [20], but neither of them has presented methods to correct the bias. 1.1 Related Work In recent years, people have paid attention to the necessity of monitoring the Internet for malicious activities. Moore presented the concept of “network telescope”, in analogy to light telescope, by using a small fraction of IP space to observe security incidents on the global Internet [20]. Yegneswaran et al. pointed out that there was no obvious addressing biases when using the “network telescope” monitoring methodology [26]. “Honeynet” is a network of honeypots trying to gather comprehensive information of attacks [12]. Symantec Corp. has an “enterprise early warning solution”, which can collect IDS and firewall attack data from the security systems of thousands of partners to keep track of the latest attack techniques [25]. The SANS Institute set up the “Internet Storm Center” in November 2000, which could gather the log data from participants’ intrusion detection sensors distributed around the world [16]. It has quickly expanded to gather more than 3, 000, 000 intrusion detection log entries every day. Berk et al. proposed a monitoring system by collecting ICMP “Destination Unreachable” messages generated by routers for packets to non-existent IP addresses [2]. Based on such a monitoring system, they also presented a threshold-based detection system called TRAFFEN. The monitoring system we present can be incorporated into the current monitoring systems such as the SANS “Internet Storm Center”. Our contribution in this context is to point out the infrastructure specifically for worm monitoring, and what data should be collected for worm early detection. We also emphasize the functionality of egress monitors, which has been ignored in previous research. Worm monitors can be ingress or egress filters on routers, which can cover more IP space and gather more comprehensive information than the log data from intrusion detection sensors or firewalls. In the area of virus and worm modelling, Kephart, White and Chess of IBM performed a series of studies from 1991 to 1993 on viral infection based on epidemiology models [13, 14, 15]. Staniford et al. used the classical simple epidemic model to model the spread of Code Red right after the Code Red incident on July 19th, 2001 [24]. Their model matched well the increasing part of the observation data. Zou et al. presented a “two-factor” worm model that considered both the effect of human countermeasures and the effect of the congestion caused by worm scan traffic [27]. Chen et al. presented a discrete-time version worm model that considered the patching and cleaning effect during a worm’s propagation [5]. For a very fast spreading worm such as Slammer, it is necessary to have automatic response and mitigation mechanisms. Moore et al. discussed the effect of Internet quarantine for containing worm propagation [18]. However, they did not present how to detect a worm in its early stage. The CounterMalice devices from Silicon Defense company can separate an enterprise network into cells, automatically block a worm’s traffic when detecting the worm. In this way, an infected host inside a cell will not be able to infect computers in other cells of the enterprise network [9]. However, the white paper did not explain how the CounterMalice devices detect a worm at its early stage. 1.2 Discussions In this paper, we mainly focus on worms that uniformly scan the Internet. The most widespread Internet worms, including both Code Red and Slammer, belong to this category (although the Slammer has a bad-coded random number generator, the generator has a good random initial seed. Thus “it is likely that all Internet addresses would be probed equally” [19] by Slammer). Uniform scan is the simplest and yet an efficient way for a worm to propagate when the worm has no prior knowledge of where vulnerable computers reside. We assume that the IP infrastructure is the current IPv4. If IPv6 replaces IPv4, the 2128 IP space of the IPv6 would make it futile for a worm to propagate through blindly random IP scans. However, we believe IPv6 will not replace IPv4 in the near future, and worms will continue to use the random scan technique to spread on the Internet. 2. WORM PROPAGATION MODEL A promising approach for modelling and evaluating the behavior of malware is the use of fluid models. Fluid models are appropriate for a system that consists of a large number of vulnerable hosts involved in a malware attack. The simple epidemic model assumes that each host resides in one of two states: susceptible or infectious. The model further assumes that, once infected by a virus or a worm, a host remains in the infectious state forever. Thus any host has only one possible state transition: susceptible → infectious [10]. The simple epidemic model for a finite population is dIt (1) = βIt [N − It ], dt where It is the number of infected hosts at time t; N is the size of population; and β is called the pairwise rate of infection in epidemic studies [10]. At t = 0, I0 hosts are infectious while the remaining N − I0 hosts are susceptible. This model could capture the mechanism of a uniform scan worm [24], especially for the initial part of a worm’s propagation when the effect of human counteractions and congestion is ignorable [27]. In this paper, we only study the initial part of worm spreading for the purpose of early detection. Therefore, it is suitable for us to use the simple epidemic model (1) instead of other complex models, such as the two-factor worm model in [27]. 4.5 Slow finish phase 3.5 It 3 2.5 2 1.5 1 Slow start phase 100 (2) Fast spread phase 3. 0.5 0 0 2 It = (1 + α)It−1 − βIt−1 where α = βN . We call α the infection rate, the average number of vulnerable hosts that can be infected per unit time by one infected host during the early stage of worm propagation. Before we go on to discuss how to use the worm model to detect and predict worm propagation, we will first present the monitoring system design in the next Section 3 and discuss data collection issues in Section 4. 5 x 10 5 4 “t” as the discrete time index from now on. For example, It means the number of infected hosts at the real time t∆. The discrete-time version of the simple epidemic model (1) can be written as [10]: 200 300 Time t 400 500 600 Figure 1: Worm propagation model For the epidemic model (1), Fig. 1 shows the dynamics of It as time goes on for one set of parameters. We can roughly partition a worm’s propagation into three phases: the slow start phase, the fast spread phase, and the slow finish phase. During the slow start phase, since It N , the number of infected hosts increases exponentially (model (1) becomes dIt /dt ≈ βN It ). After many hosts are infected and then participate in infecting others, the worm enters the fast spread phase where vulnerable hosts are infected at a fast, near linear speed. When most of vulnerable computers have been infected, the worm enters the slow finish phase because the few leftover vulnerable computers are difficult for the worm to search out. Our task is to detect the presence of a worm in its slow start phase. Table 1: Notations in this paper Notation Definition N Number of hosts under consideration ∆ The length of monitoring interval (time unit in discrete-time model) It Number of infected hosts at time t∆ β Pairwise rate of infection α Infection rate per infected host, α = βN Ct Number of infected hosts monitored by time t∆ Zt Monitored worm scan rate at time t∆ η Average scan rate per infected host p Probability a worm scan is monitored yt Measurement data in Kalman filter wt White noise in observation at time t∆ δ Constant in equation yt = δIt + wt R Variance of observation error MWC Abbr. of “Malware Warning Center” α̂ Estimated value of α Aτ Transpose of a matrix A N (µ, σ 2 ) Normal distribution with mean µ and variance σ 2 In this paper, we use discrete-time model for worm modelling. Time is divided into intervals of length ∆, where ∆ is the discrete time unit. To simplify the notations, we use MONITORING SYSTEM In this section, we propose the architecture of a worm monitoring system. The monitoring system aims to provide comprehensive observation data on a worm’s activities for the early detection of the worm. The monitoring system consists of a Malware Warning Center (MWC) and distributed monitors as shown in Figure 2. 3.1 Monitoring System Architecture Figure 2: A generic worm monitoring system There are two kinds of monitors: ingress scan monitors and egress scan monitors. Ingress scan monitors are located on gateways or border routers of local networks. They can be the ingress filters on border routers of the local networks, or separated passive network monitors. The goal of an ingress scan monitor is to monitor scan traffic coming into a local network by logging incoming traffic to unused IP addresses. For management reason, Local network administrators know how addresses inside their networks are allocated; it is relatively easy for them to set up the ingress scan monitor on routers in their local networks. For example, during the Code Red incident on July 19th, 2002, a /8 network at UCSD and two /16 networks at Lawrence Berkeley Laboratory were used to collect Code Red scan traffic. All port 80 TCP SYN packets coming in to nonexistent IP addresses in these networks were considered to be Code Red scans [17]. Berk et al. presented a worm monitoring system by collecting the ICMP “Destination Unreachable” messages generated by routers for packets to unused IP addresses [2]. In fact, such ICMP data are essentially the same data as the data collected by the ingress scan monitors here: when encountering packets to unused IP addresses, the routers of local networks can either send ICMP messages to the monitoring system of [2], or send such information to the MWC of the monitoring system in this paper. An egress scan monitor is located at the egress point of a local network. It can be set up as a part of the egress filter on the routers of a local network. The goal of an egress scan monitor is to monitor the outgoing traffic from a network to infer a potential worm’s scan behavior. Ingress scan monitors listen to the global traffic on the Internet; they are the sensors of the global worm incidents (or called “network telescope” in [20]). However, it is difficult to determine the behavior of each individual worm from the data collected by ingress scan monitors since ingress scan monitors cannot capture most of the scans sent out by an infected host. On the other hand, if a computer inside a local network is infected, the egress scan monitor on this network’s routers can observe most of the scans sent out by the compromised computer. The closer the egress scan monitor is to an infected computer, the more complete data could be obtained about the worm’s scan behavior. For worm early warning at real-time, distributed monitors are required to send observation data to the MWC continuously without significant delay, even when the worm scan traffic has caused congestion to the Internet. For this reason, a tree-like hierarchy of data mixers can be set up between monitors and the MWC: the MWC is the root; the leaves of the tree are monitors. The monitors nearby a data mixer send observed data to the data mixer. After fusing the data together, the data mixer passes the data to a higher level data mixer or directly to MWC. An example of data fusion is the removal of redundant addresses from the list of infected hosts. However, the tree structure of data mixers creates single points of failure, thus there is a trade-off in designing this hierarchical structure. 3.2 Location for Distributed Monitors Ingress scan monitors on a local network may need to be put on several routers instead of only on the border router — the border router may not know the usage of all IP addresses of this local network. In addition, since worms might choose different destination addresses by using different preferences, e.g., non-uniform scanning, we need to use multiple address blocks with different sizes and characteristics to ensure proper coverage. For egress scan monitors, worms on different infected computers will exhibit different behaviors. For example, the Slammer’s scan rate is constrained by an infected computer’s bandwidth [19]. Therefore, we need to set up multiple egress filters to record the scan behaviors of many infected hosts at different locations and in different network environments. In this way, the monitoring system could obtain a comprehensive view of the behaviors of a worm. 4. DATA COLLECTION AND BIAS CORRECTION After setting up the monitoring system, we need to determine what kind of data should be collected. The main task for an egress scan monitor is to determine the behaviors of a worm, such as the worm’s average scan rate and scan distribution. Denote η as the average worm scan rate, which is the average number of scans sent out by an infected host per monitoring interval ∆. The ingress scan monitors record two types of data: the number of scans they receive during the t-th monitoring interval, t = 1, 2, · · · and the IP addresses of infected hosts that have sent scans to the monitors by the time t∆. If all monitors send observation data to the MWC every monitoring interval, the MWC can obtain the following observation data at each discrete time epoch t, t = 1, 2, · · · : (1). The worm’s scan distribution, e.g., uniform scan or scan with address preference, (2). The worm’s average scan rate η, (3). The total number of scans monitored in a monitoring interval from time (t − 1)∆ to t∆, denoted by Zt , (4). The number of infected hosts observed by time t∆, denoted by Ct . In this paper, we primarily focus on worms that uniformly scan the Internet. Let p denote the probability that a worm scan is monitored by the monitoring system. If the ingress scan monitors cover m IP addresses, then a worm scan has the probability p = m/232 to hit the monitors. We assume that in the discrete-time model all changes happen right before the discrete time epoch t, then we have E[Zt ] = ηpIt−1 (3) 4.1 Correction of Biased Observation Ct Each worm scan has a small probability p of being observed by the monitoring system, thus an infected host will send out many scans before one of them is observed by the ingress scan monitors (like a Bernoulli trial with a small success probability). Therefore, the number of infected hosts monitored by time t∆, Ct , is not proportional to It . This bias has been mentioned in [5] and [20], but never been corrected. In the following, we present an effective way to obtain an accurate estimate for the number of infected hosts It based on Ct and η. In the real world, different infected hosts of a worm have different scan rates. To derive the bias correction formula, let us first assume that all infected hosts have the same scan rate η (we will show the effect of removing this assumption in the following simulation). In a monitoring interval ∆, a worm sends out η scans on average, thus the monitoring system has the probability 1 − (1 − p)η to detect at least one scan from an infected host in a monitoring interval. At the time (t − 1)∆, the monitoring system has observed Ct−1 infected hosts among the overall infected ones It−1 . During the next monitoring interval from (t − 1)∆ to t∆, every host of those not yet observed ones, It−1 − Ct−1 , has the probability 1 − (1 − p)η to be observed. Suppose in the discrete-time model, all changes happen right before the discrete time epoch t, then the average number of infected hosts monitored by time t∆ conditioned on Ct−1 is E[Ct |Ct−1 ] = Ct−1 + (It−1 − Ct−1 )[1 − (1 − p)η ]. (4) Removing the conditioning on Ct−1 yields E[Ct ] = E[Ct−1 ] + (It−1 − E[Ct−1 ])[1 − (1 − p)η ]. (5) Then we can derive the formula for It as: It = E[Ct+1 ] − (1 − p)η E[Ct ] . 1 − (1 − p)η (6) 5 Since E[Ct ] is unknown in one incident of a worm’s propagation, we replace E[Ct ] by Ct and derive the estimate as x 10 3.5 Infected hosts It Observed infected C t Estimated It after Ct+1 − (1 − p)η Ct . Iˆt = 1 − (1 − p)η # of infected hosts 3 Now we analyze how the statistical observation error of Ct affects the estimated value of It . Without considering non-worm noise, suppose the observation data Ct is Ct = E[Ct ] + wt (8) (9) 2 1.5 1 0.5 where the statistical observation error wt is a white noise with variance R. Substituting (8) into (7) yields Iˆt = It + µt , bias correction 2.5 (7) 0 100 200 300 400 500 Time t (minute) 600 700 Figure 3: Estimate It based on the biased observation data Ct (Monitoring 217 IP space) where the error µt is 5 4 η wt+1 − (1 − p) wt . 1 − (1 − p)η (10) 3.5 # of infected hosts µt = Since E[µt ] = 0, the estimated value Iˆt is unbiased (under the assumption that all infected hosts have the same scan rate η). The variance of the error of Iˆt is V ar[µt ] = E[µ2t ] 1 + (1 − p)2η = R [1 − (1 − p)η ]2 We simulate Code Red propagation to check the validity of the bias correction formula (7). In the simulation, N = 360, 000. The monitoring system covers 217 IP addresses (equal to two Class B networks). The monitoring interval ∆ is set to be one minute; the average worm scan rate is η = 358 per minute. Because different infected hosts have different scan rates, we assume each host has a scan rate x that is predetermined by the normal distribution N (η, σ 2 ) where σ = 100 in the simulation (x is bounded by x ≥ 1. We will explain how we choose these parameters in Section 6). The simulation result is shown in Fig. 3. Fig. 3 shows that the observed number of infected hosts, Ct , deviates substantially from the real value It . After the bias correction by using (7), the estimated Iˆt matches It well in the simulation before the worm enters slow finish phase ( Iˆt deviates a little from It in the slow finish phase). Equation (10) shows that the estimated value should be unbiased because in deriving the bias correction formula we have assumed that all hosts have the same scan rate η, which is not the case in this simulation. In our simulation, some hosts have very small scan rate; these hosts will take much longer time to hit the monitors than other hosts. Thus in the slow finish phase, many unobserved infected hosts are hosts with very low scan rate. Therefore, the bias correction formula has some error due to the decreasing of the average scan rate for those unobserved infected hosts. In fact, we run another Infected hosts It Observed infected C t Estimated It after bias correction 2.5 2 1.5 1 (11) The equation above shows that V ar[µt ] is always larger than R, which means the statistical error of observation Ct is amplified by the bias correction formula. In addition, if the ingress scan monitors cover smaller size of IP space, p would decrease, then (11) shows that the estimate Iˆt would be noisier. For this reason, if we want to get an accurate estimate Iˆt through bias correction, the monitoring system must cover enough IP space. 3 x 10 0.5 0 100 200 300 400 500 Time t (minute) 600 700 Figure 4: Estimate It based on the biased observation data Ct (Monitoring 214 IP space) simulation by letting all hosts having the same scan rate η (i.e., let σ = 0); then the Iˆt after bias correction matches well with It for the whole period of a worm’s propagation. Fig. 4 shows the simulation results if the monitors only cover 214 IP addresses. The estimate Iˆt after the bias correction is very noisy because of the error amplification effect described by (11). 5. EARLY WARNING AND ESTIMATION OF WORM VIRULENCE In this section, we propose estimation methods based on recursive filtering algorithms (e.g., Kalman filters [1],) for stochastic dynamic systems. At the MWC, we recursively estimate the parameters β, N , and α based on observation data at each monitoring interval (these three parameters have the relationship α = βN ). In the following, we will first provide a Kalman filter type algorithm to estimate parameters α and β. Let y1 , y2 , · · · , yt , be measurement data used by the Kalman filter algorithm. Suppose the observations have one monitoring interval delay: yt = δIt−1 + wt (12) where wt is the observation error. δ is a constant ratio: if we use Zt as yt , then δ = ηp as shown in (3); if we use Iˆt−1 derived from Ct by the bias correction (7), then δ = 1. 5.1 Estimation Based on Kalman Filter From (12), we have It−1 = yt /δ − wt /δ. (13) Substituting (13) into the worm model (2) yields an equation describing the relationship between yt and the worm’s parameters: yt = (1 + α)yt−1 − β 2 yt−1 + νt , δ (14) where the noise νt is 2 νt = wt − (1 + α)wt−1 − β(wt−1 − 2yt−1 wt−1 )/δ. (15) A recursive least square algorithm for α and β can be cast into a standard Kalman filter format [1]. Let α̂t and β̂t denote the estimated value of α and β at time t∆, respec 1+α . If tively. Define the system state vector as Xt = −β/δ 2 we denote Ht = [ yt−1 yt−1 ], then the system is described by Xt = Xt−1 (16) yt = Ht Xt + νt The Kalman filter to estimate the system state Xt is 2 ] Ht = [ yt−1 yt−1 Kt = Pt−1 Htτ /(Ht Pt−1 Htτ + Rν ) (17) Pt = (I − Kt Ht )Pt−1 X̂t = X̂t−1 + Kt (yt − Ht X̂t−1 ) where Rν is the variance of noise νt and can be set to 1. From the experiments, we find that the value of Rν is not important. νt in (15) is a correlated noise. The Kalman filter (17) can be extended to consider such correlated noise to derive unbiased estimates of α and β in theory. However, if we use the unbiased filter, we will have more parameters to estimate, then the new filter will converge slower than the proposed filter (17). Our experiments also confirm this conjecture. In this paper the primary objective is to derive the rough estimate of α as quickly as possible. Therefore, it is better to use the simple Kalman filter (17) for worm early detection. If we use Zt as the measurement yt in the Kalman filter but do not know δ (e.g., if we do not have data from egress scan monitors), we still can estimate the infection rate α by letting δ = 1. The Kalman filter (17) does not depend on δ in estimating α; the value of δ only affects the estimated value of β. 5.2 Estimation of the Vulnerable Population The parameter β in model (2) is on the order of 1/N . Thus in the Kalman filter above, the two elements in the state Xt differ in the order of N . For this reason, the Kalman filter performs poorly in estimating the small value β. Consequently, the estimation of N by using N̂ = α̂/β̂ is not good. We present an effective way to estimate the population N based on η and the estimate α from the Kalman filter above. A uniformly scanning worm sends out on average η scans per monitoring interval; each scan has the probability N/232 to hit a host in the population under consideration. Hence, at the beginning when most hosts in the population are vulnerable, a worm can infect, on average, ηN/232 hosts per monitoring interval (the probability of two scans sent out by a single infected host hitting the same target is negligible). From the definition of infection rate α, we have α = ηN/232 . Therefore, the population N is N= 232 α η (18) where the average worm scan rate η can be obtained directly from egress scan monitors in the monitoring system. We can use this equation to estimate N along with the Kalman filter in estimating α. In this way, the estimation of N can have similar convergence properties to that of the estimation of α from the Kalman filter. 5.3 Overview of the Steps to Detect a Worm The MWC collects and aggregates reports of worm scans from all distributed monitors in real-time. For each TCP or UDP port, the MWC has an alarm threshold for monitored illegitimated scan traffic Zt . The observed number of scans Zt , which contains non-worm noises, is below this threshold at most time when there is no global spreading worm. If the scan traffic is over the alarm threshold for several consecutive monitoring intervals, e.g., Zt is over the threshold for three consecutive times, the Kalman filter will be activated. Then the MWC will begin to record Ct and calculate the average worm scan rate η from the reports of egress scan monitors. Because Ct is a cumulative observation data that could cumulate all non-worm noise, the MWC begins to record data Ct only after the Kalman filter is activated. The Kalman filter can either use Ct or Zt to estimate all the parameters of the worm at the time t∆ (t = 1, 2, 3, · · · ). For accuracy, the MWC can also use two filters based on Ct and Zt respectively in order to cross check to verify the results. The recursive estimation will continue until the estimated value of α shows a trend: if the estimate α̂ stabilizes and oscillates slightly around a positive constant value, we have detected the presence of a worm; if the estimate α̂ does not stabilize in a long time, or it stabilizes and oscillates around zero, we believe the surge of illegitimated traffic is caused by non-worm noise. 6. 6.1 SIMULATION EXPERIMENTS Simulation Settings We simulate both Code Red on July 19th, 2001 [7] and the SQL Slammer on January 25th, 2003 [19]. First, we explain how we choose the simulation parameters. In the case of Code Red, more than 359, 000 Code Red infected hosts were observed on July 19th, 2001 by CAIDA [17]. Thus in our simulation we set the Code Red vulnerable population N = 360, 000. Staniford et al. [24] used a different format but the same epidemic model as (1) to model Code Red, where their model’s parameter K has K = βN = α [27]. They determined that K = 1.8 for the time scale of one hour. Therefore, for the discrete time unit of one minute in our simulation, α = 1.8/60 = 0.03. From (18) we can reversely derive η = 232 α/N = 358 per minute, i.e., Code Red sent out on average about 358/60 = 5.965 scans per second. Because different infected hosts have different scan rates, we assume that each host has a constant scan rate x , a rate that is independently predetermined by a normal distribution N (η, σ 2 ) where σ = 100 (the scan rate x is bounded by x ≥ 1). In our simulation, the ingress scan monitors cover 220 IP space. We also assume I0 = 10 at the beginning. SQL Slammer propagated in the same way as Code Red by randomly generating target IP addresses to scan [19]. According to [19], a Slammer infected host sent out on average 4,000 scans per second at the beginning. The authors in [19] also observed that 75,000 hosts were infected in the first 30 minutes. Thus in the case of Slammer simulation, we set η = 4000 and N = 100, 000 (since many infected hosts did not scan the monitors in the first 30 minutes due to the congestions caused by Slammer). Because the scan rate of Slammer was bandwidth limited and varied a lot among different computers, in our simulation the scan rate x of a host is predetermined by the normal distribution N (4000, 20002 ) (x is bounded by x ≥ 1). In the discrete-time simulation, the monitoring interval ∆ is set to be one minute for Code Red, and one second for the very fast SQL Slammer worm. 6.2 Background Noise Consideration We need to consider background non-worm noise in our simulations. Fortunately, Goldsmith provided simple data of the background noise for Code Red activities monitored on a Class B network (covers 216 IP addresses) [11]. Goldsmith recorded TCP port 80 SYN requests from Internet hosts to any unused IP addresses inside his local network — such method is the same as the ingress scan monitors in our proposed monitoring system. The observation data showed that the background noise was small compared to Code Red traffic and the noise did not vary much. If we use normal distribution to model the background noise, then for each hour the number of noise scans follows N (110.5, 302 ) and the number of noise sources follows N (17.4, 3.32 ). We try to hold the statistics of the observed background noise in our experiments: we monitor 220 IP space, which is 16 times as large as what Goldsmith monitored, so the number of noise scans or noise sources should be enlarged by 16 times; we use one minute in stead of one hour as the monitoring interval, thus we should decrease the number of noise scans or noise sources by 60. In this way, in our Code Red simulations, the noise added into the observation data at each monitoring interval follows N (29.5, 82 ) for Zt and N (4.63, 0.8932 ) for Ct . Of course, this kind of extension of noise is very rough, but it is the best we can do based on the data available. Currently, we are trying to obtain the detailed log data on Code Red and Slammer from others in order to have more realistic experiments. For the SQL Slammer, we do not have any observed data on its background noise. Thus we simply assume that it has the same background noise as the Code Red. In the simulation experiments, the alarm threshold for Zt is set to be two times as large as the mean value of the background noise. The Kalman filter will be activated when the illegitimated scan traffic Zt is over the alarm threshold for three consecutive monitoring intervals. In this way, the Kalman filter will not be frequently activated by the surge of noise traffic. 6.3 Code Red Simulation and Early Warning For Code Red, Fig. 5(a) shows It in one simulation run as a function of time; Fig. 5(b) shows the estimate of infection rate α̂ as time goes on by using observation data Zt . In this simulation run, Zt at time 126, 127 and 128 minutes are over the alarm threshold 59, thus the Kalman filter is activated at 128 minutes. Fig. 5(c) shows the estimate α̂ by using Ct after bias correction (7). Both estimates converge to the real value of α, but the estimate by using Ct is smoother. Our objective of the estimation is to find whether the estimate stabilizes and oscillates slightly around a positive constant value. Therefore in the following, we will always use Ct after bias correction to estimate α with the Kalman filter if not mentioned explicitly. We estimate the vulnerable population size N by Equation (18) at each discrete time. Fig. 6 shows the estimated value of N as time goes on. For comparison, we also use the estimated parameter β̂ derived from the Kalman filter (17) to directly calculate N̂ by N̂ = α̂/β̂. The result is also shown in Fig. 6. This figure shows that Equation (18) can provide a more accurate estimate of N than the direct estimation from the Kalman filter. In this simulation run, Code Red infects 1% of the vulnerable population at 199 minutes, and 2% of population at 223 minutes. Fig. 5(c) shows that during the time when Code Red infects 1% to 2% population, the estimate of the worm’s infection rate α has already stabilized to show the constant positive property (though the estimated value is still very rough). Therefore, the MWC can detect the presence of the worm when it infects about 1% to 2% of vulnerable population (the estimation of N is rougher but it is less important than the worm infection rate in order to detect a worm). 6.3.1 Variability in Worm Propagation Worm propagation is in fact a stochastic process. In order to check if the Kalman filter detection algorithm works well under most cases, we use the same parameter settings in the previous simulation to run the Code Red simulation for 100 times. Fig. 7 shows the upper and lower bounds and the average value of the number of infected hosts in these 100 simulation runs. For each of these 100 simulation runs, we use the Kalman filter to estimate the worm’s infection rate α. Among these 100 runs, the worm infects 2% of the vulnerable population at the time between 200 to 258 minutes. During the estimation process of each simulation run, the estimated value of α will gradually decrease its oscillation and converge to its true value (as shown in Fig. 5(c)). For each simulation run, we obtain the maximum and minimum values of estimated α after the worm infects 2% population — the oscillation of the estimated α will not exceed this boundary after the worm infects 2% population. This boundary tells us how well we can obtain a stabilized estimation. The oscillation bounds are shown in Fig. 8 for these 100 simulation runs. In order to check if the Kalman filter has worse performance when the worm propagates faster, in Fig. 8 we have sorted the 100 simulation runs according to the time when the worm infects 2% population. In other words, the worm in simulation i in Fig. 8 infects 2% of vulnerable population earlier than the worm in simulation j if i < j. This figure shows that the performance of the Kalman filter is not affected by the variability of the spreading speed of the worm. One reason is that the Kalman filter is activated earlier when the worm spreads faster. Another reason is that when the worm spreads faster, the signal-to-noise ratio of the observed data will become higher, thus the estimation will converge faster. 5 x 10 3.5 2.5 2 1.5 1 Real value of α Estimated value of α 0.25 0.2 0 100 200 300 400 Time t (minute) 500 0.1 0.1 0.05 0 600 0.2 0.15 0.15 0.05 0.5 Real value of α Estimated value of α 0.25 Estimated infection rate Estimated infection rate # of infected hosts 3 150 a. Code Red propagation 200 250 Time t (minute) 300 0 350 b. use observed data Zt 150 200 250 Time t (minute) 300 350 c. use Ct after bias correction Figure 5: Code Red simulation and Kalman filter estimation of infection rate (for one simulation run) 5 5 x 10 x 10 3 4 3 2 Real population size N From estimated β From η and estimated α 1 150 200 250 300 Time t (minute) Real value of infection rate Upper bound of estimates Lower bound of estimates 0.08 0.06 2 1.5 0.04 1 0.02 0.5 350 400 Figure 6: Estimate of population N for Code Red 6.3.2 2.5 Upper bound Lower bound Mean value Estimated infection rate bound 5 0 0.1 3.5 # of infected hosts Estimated population N 6 0 0 100 200 300 400 Time t (minute) Figure 7: Variability in Code Red simulation (for 100 simulation runs) Early Warning for a “Hit-list" Worm “Hit-list” concept was first presented in [24]. The worm has a built-in list that contains thousands IP addresses of potentially vulnerable machines. The worm will first propagate by only scanning computers on this list. After infecting most hosts on the list, the worm uses random scans to try to infect other vulnerable hosts on the Internet. At the hit-list scanning phase, the worm does not send probes to nonexistent IP addresses. Therefore, we assume that the monitoring system could collect worm scans only after the worm finishes the hit-list scanning phase. In our simulation, we assume that at time 0 the worm has already infected all hosts on the hit-list, i.e., it has a large number of initially infected hosts. Because of the worm’s high scan traffic when it begins to randomly scan, the Kalman filter will be activated immediately at the beginning. Fig. 9(a) shows the propagation of a hit-list worm that has a 1000-entry hit-list compared with previous Code Red propagation (they have the same simulation settings except the number of initially infected hosts). From Fig. 9(a) we observe that the hit-list does not affect the worm propagation pattern. Fig. 9(b)(c) show the estimation of the worm infection rate α and the vulnerable population N , respectively. Compared to Fig. 5(c), Fig. 9(b) shows that the Kalman filter estimation of the infection rate converges to the true value faster than the estimation for a worm without hit-list. It is because the worm with hit-list sends out larger amount of scans; the signal-to-noise ratio of the ob- 500 600 0 20 40 60 80 Simulation Run ( sorted ) 100 Figure 8: Bound of α̂ after worm infects 1% population (for 100 simulation runs) servation data in this experiment is higher than what used in the original Code Red experiment. In this simulation run, the hit-list worm infects 1% and 2% of vulnerable population at time 45 and 69 minutes, respectively. Fig. 9(b) shows that the estimate has already stabilized with only small oscillation by 69 minutes. Therefore, the WMC can still detect the presence of the hit-list worm when it infects only 1% to 2% of vulnerable population. 6.4 Slammer Simulation and Early Warning For SQL Slammer, Fig. 10 shows the results for one simulation run. Fig. 10(a) shows that without congestion and human counteractions, the SQL Slammer can infect most of the vulnerable hosts within 3 minutes. In reality, the Slammer quickly reduced its propagation speed because its huge scan traffic caused congestion or even broke down some networks. Therefore, it took about 10 minutes to infect 90% of vulnerable computers [19] instead of the 3 minutes here. However, at the beginning when there were few congestions and human counteractions, the Slammer still propagated according to the epidemic model used here (see Fig. 3 in [19]). From Fig. 10(a), we know that the worm infects 1% vulnerable population at time 45 seconds. Fig. 10(b) shows the estimated value of the infection rate α as time goes on. It shows that when the worm infects 1% population, the estimated value of α is already stabilized with small oscillation for a while (though the oscillation central point is higher 5 5 x 10 2.5 2 1.5 1 0.25 0.2 0.15 0.1 0.05 0.5 0 0 100 200 300 400 Time t (minute) 500 a. Hit-list worm propagation Real population size N Estimated value 4 3 2 1 0 600 x 10 5 Estimated population N # of infected hosts 3 Estimated infection rate 1000 entry hit−list Code Red simulation 6 Real value of α Estimated value of α 3.5 50 100 Time t (minute) 0 150 b. Kalman filter estimation of α 50 100 Time t (minute) 150 200 c. Estimation of population size N Figure 9: Hit-list worm simulation and Kalman filter estimation (for one simulation run) 4 x 10 Estimated infection rate # of infected hosts 8 6 4 2 0 0 2 0.5 50 100 Time t (second) 150 a. SQL Slammer propagation 200 Real value of α Estimated value of α 0.4 0.3 0.2 Estimated infection rate 10 Real value of α Estimated value of α 1.5 1 0.5 0.1 0 20 40 60 Time t (second) 80 b. Kalman filter estimation of α (time unit: 1 second) 100 0 15 35 55 75 Time t (second) 95 115 c. Kalman filter estimation of α (time unit: 5 seconds) Figure 10: SQL Slammer simulation and Kalman filter estimation (for one simulation run) than the true value at this moment). Thus the MWC can detect the worm when it infects about 1% of vulnerable population. Fig. 10(b) shows that at the beginning of estimation, the Kalman filter overestimates the value of α. This is because the discrete-time model (2) is a first-order discretization of the continuous model (1). The discretization introduces error in the infection rate α when the worm keeps exponentially increasing at the beginning. Fig. 10(c) shows the simulation results when we use 5 seconds as the monitoring interval ∆. Because now the discrete time unit is larger, the discretization error shows up clearly in this figure — the estimated value α̂ is higher than its true value. 7. DISCUSSIONS AND FUTURE WORKS We have used the epidemic model for the estimation and prediction. While this gives good results so far, we need to develop more detailed models to reflect a future worm’s dynamics. For example, if a worm spreads through a topology, or spreads by exploiting multiple vulnerabilities, or is a meta-server worm, then the dynamics will not always follow the epidemic model. For non-uniformly scanning worm, such as the Code Red II, we might have different observations by setting up the monitors on different places. The non-uniform scanning behavior of a worm may also affect the bias correction (7). For a future unknown worm, through analysis of the worm scan distribution by using the data from egress scan monitors, the MWC can determine if the worm is uniformly scanning the Internet or not. If it is not, the MWC can use data Zt , or directly use data Ct without bias correction, to detect and predict the worm. The monitoring interval ∆ is an important parameter in the system design. For slow spreading worm, it could be set to be long, but for fast spreading worm such as the Slammer, the time interval should be quite small in order to catch up with the worm’s speed. How can we select the appropriated ∆ before we know the worm’s presence and its speed? We need to do further research on designing a recursive estimation algorithm that uses adaptive sampling rate. Currently, one way we think of is to tag the time stamp with each observed scans. Then at the MWC, several estimators run in parallel with different monitoring intervals — from the tagged time stamp the correct Ct or Zt for every estimators can easily be restored. It could be useful to develop distributed estimation algorithms so as to reduce the latency and traffic for the report to a central server. We may also want to use a continuous version of the Kalman filter. This approach would reduce the significance of the monitoring interval selection and would work nicely with the distributed estimation setting. The worm detection method presented here assumes that only worm scans can cause exponential increasing scan traffic to monitors, while other background scan noise cannot. We believe this is a reasonable assumption. If we want to further improve the detection accuracy, however, we can add some other rule sets in the detection system. For example, in order to distinguish a worm attack from a DDoS attack, we can exploit the differences between them: DDoS attack has one or several targets while a worm’s propagation has no specific target. The infrastructure of the monitoring system in this paper is already built up in the real world, such as the SANS’s “Internet Storm Center” [16] or Symantec’s enterprise early warning network [25]. However, there are still significant practical issues in setting up such monitoring system, especially the security and privacy issues in data sharing. For a fast spreading worm such as the SQL Slammer, human’s manual actions will not catch up with its speed even if the early warning is provided at the beginning of the worm’s spreading. Automatic mitigation is the only way to defend against such kind of worm attack. How to decrease false alarm rate, detect a worm earlier, and collect observation data in time are the key factors in incorporating the early warning system with automatic mitigation. 8. CONCLUSIONS We propose a monitoring and early warning system for Internet worms to provide an accurate triggering signal for mitigation mechanisms in the early stage of a future worm. Such system is needed in view of the propagation scale and speed of the past worms. Although we have been lucky that the previous worms have not been very malicious, the same can not be said for the future worms. Based on the idea “detecting the trend, not the rate” of monitored illegitimated scan traffic, we present a Kalman filter to detect the presence of a worm in its early stage. The analysis and simulation studies indicate that such a system is feasible, and the “trend detection” methodology poses many interesting research issues. We hope this paper would generate interests of discussion and participation in this topic and eventually lead to an effective monitoring and early warning system. 9. ACKNOWLEDGEMENTS This work is supported in part by ARO contract DAAD1901-1-0610; by DARPA under Contract DOD F30602-00-0554; by NSF under grant EIA-0080119, ANI9980552, ANI-0208116, and by Air Force Research Lab. 10. REFERENCES [1] B.D.O. Anderson and J. Moore. Optimal Filtering. Prentice Hall, 1979. [2] V.H. Berk, R.S. Gray, and G. Bakos. Using sensor networks and data fusion for early detection of active worms. In Proc. of the SPIE AeroSense, 2003. [3] Cooperative Association for Internet Data Analysis. http://www.caida.org [4] CERT Coordination Center. http://www.cert.org [5] Z. Chen, L. Gao, and K. Kwiat. Modeling the Spread of Active Worms, In IEEE INFOCOM, 2003. [6] CNN News. Computer worm grounds flights, blocks ATMs. http://europe.cnn.com/2003/TECH/internet/01/25/ internet.attack/ [7] eEye Digital Security. .ida ”Code Red” Worm. 2001. http://www.eeye.com/html/Research/Advisories/ AL20010717.html [8] USA Today News. The cost of Code Red: $1.2 billion. http://www.usatoday.com/tech/news/2001-08-01code-red-costs.htm [9] CounterMalice: military-grade worm containment. http://www.silicondefense.com/products/countermalice/ [10] D.J. Daley and J. Gani. Epidemic Modelling: An Introduction. Cambridge University Press, 1999. [11] Dave Goldsmith. Possible CodeRed Connection Attempts. Incidients maillist. http://lists.jammed.com/incidents/2001/07/0149.html [12] Honeynet Project. Know Your Enemy: Honeynets. http://project.honeynet.org/papers/honeynet/ [13] J. O. Kephart and S. R. White. Directed-graph Epidemiological Models of Computer Viruses. In Proc. of IEEE Symposimum on Security and Privacy, pages 343-359, 1991. [14] J. O. Kephart, D. M. Chess, and S. R. White. Computers and Epidemiology. In IEEE Spectrum, 1993. [15] J. O. Kephart and S. R. White. Measuring and Modeling Computer Virus Prevalence. In Proc. of IEEE Symposimum on Security and Privacy, 1993. [16] Internet Storm Center. http://isc.incidents.org/ [17] D. Moore, C. Shannon, and J. Brown. Code-Red: a case study on the spread and victims of an Internet Worm. In Proc. ACM/USENIX Internet Measurement Workshop, France, November, 2002. [18] D. Moore, C. Shannon, G. M. Voelker, and S. Savage. Internet Quarantine: Requirements for Containing Self-Propagating Code. In IEEE INFOCOM, 2003. [19] D. Moore, V. Paxson, S. Savage, C. Shannon, S. Staniford, and N. Weaver. Inside the Slammer Worm. IEEE Security and Privacy, 1(4):33-39, July 2003. [20] D. Moore. Network Telescopes: Observing Small or Distant Security Events. In USENIX Security, 2002. [21] D. Seeley. A tour of the worm. In Proc. of the Winter Usenix Conference, San Diego, CA, 1989. [22] CAIDA. Dynamic Graphs of the Nimda worm. http://www.caida.org/dynamic/analysis/security/nimda/ [23] SANS Institute. http://www.sans.org [24] S. Staniford, V. Paxson, and N. Weaver. How to Own the Internet in Your Spare Time. In 11th Usenix Security Symposium, San Francisco, August, 2002. [25] Symantec Early Warning Solutions. Symantec Corp. http://enterprisesecurity.symantec.com/ SecurityServices/content.cfm?ArticleID=1522 [26] V. Yegneswaran, P. Barford, and J. Ullrich. Internet Intrusions: Global Characteristics and Prevalence. In ACM SIGMETRICS, June, 2003. [27] C.C. Zou, W. Gong, and D. Towsley. Code Red Worm Propagation Modeling and Analysis. In 9th ACM Symposium on Computer and Communication Security, pages 138-147, Washington DC, 2002.