Modeling Virus Propagation in Peer to Peer Networks


Modeling Virus Propagation in Peer-to-Peer
Networks
R.W. Thommes M.J. Coates
Department of Electrical and Computer Engineering Department of Electrical and Computer Engineering
McGill University McGill University
3480 University St, Montreal, QC, Canada H3A 2A7 3480 University St, Montreal, QC, Canada H3A 2A7
Email: rthomm@tsp.ece.mcgill.ca Email: coates@ece.mcgill.ca
Abstract The popularity of peer-to-peer (P2P) net- model. Finally, in Section V we examine the effect that
works makes them an attractive target to the creators
varying a number of model parameters has on the steady-
of viruses and other malicious code. Indeed, recently a
state behaviour of the network.
number of viruses designed specifically to spread via P2P
networks have emerged. In this paper we present a model
A. P2P Network Overview
which predicts how a P2P-based virus propagates through
This section highlights the key features shared by
a network. This model is a modified version of the S-E-
popular P2P Networks, including Kazaa, eDonkey2000,
I (Susceptible-Exposed-Infected) model from the field of
and Gnutella [5]. Every peer connected to the network
epidemiology. Our model classifies each peer as falling
has a shared folder containing all the files the user wishes
into one of three categories based on the number of
infected files it is sharing. We derive differential equations to make publicly available for download by others on the
which comprise the deterministic model and examine the
network. When a user wants to download a file, he begins
expected behaviour of the P2P network as predicted by
by sending out a search request. Eventually he will
these equations.
receive back a list of files matching the search criteria.
Keywords peer-to-peer networks, viruses, epidemiology
The specific manner in which this list is generated varies
among the various P2P networks, but in all cases the
I. INTRODUCTION
query response is the result of the examination of the
Several factors make peer-to-peer (P2P) networks par- shared folders of a subset of all peers connected to
the network. Once the user elects to download one of
ticularly susceptible to the spreading of malicious code.
the files from the list, his client attempts to set up a
The early P2P networks such as Napster could only be
connection to a peer sharing the file and begins receiving
used to trade MP3 files, which essentially cannot contain
malicious code [1]. However, contemporary P2P net- the file. Depending on the specific network, the client
may attempt to simultaneously download different parts
works such as Kazaa / Fastrack [2] and eDonkey2000 [3]
of the file from a number of peers in order to expedite
are able to disseminate executable files which may
the operation. P2P clients typically save new downloaded
contain viruses. As P2P has increasingly entered the
files in the shared folder  making them immediately
mainstream  the eDonkey2000 network alone typically
available to other users.
has over 2 million users connected at any given time [4]
 many users lack the technical knowledge to detect A number of worms and viruses that exploit P2P
suspicious files or scan for viruses. As a result, malicious networks have already surfaced. The majority of these
files may not be rapidly contained. behave in a similar fashion. Specifically, when a user
In this paper we examine how files infected with downloads a file containing the virus and executes it,
viruses propagate through a P2P network. We begin by a number of new files containing the virus are created
presenting a relatively simple model in Section II and and placed in the client s shared directory. Some types
describing its various parameters. Next, in Section III of viruses, including Achar [6] and Gotorm [7], generate
we derive several differential equations that govern the a fixed list of filenames when executed. More advanced
expected evolution of the network over time. In Sec- viruses, such as Bare [8] and Krepper [9], randomly pick
tion IV we analyze the steady-state behaviour of our the list of filenames from a large pool of candidates.
B. Related Work every one of the N peers making up the network falls
into one of the three categories. Thus, for all values of
The advent of mathematical Epidemiology  the field
t, N = S(t) +E(t) +I(t).
of biology which models how diseases spread in a
We assume that the total number of uninfected files
population  is generally credited to McKendrick and
in the network is fixed at M. The total number of
his seminal 1926 paper [10]. Previous work in applying
infected files at time t is given by K(t). The expected
epidemiology to modeling how computer viruses and
proportion of infected files in the network, q(t), is
other malware spreads between machines dates back
K(t)
therefore q(t) =K(t)+M . When a user downloads a
to the early 1990s: Kephart and White published a
file, we assume the probability of choosing an infected
paper [11] on the topic in 1991. More recently, Zou et al.
file will be dependent on the prevalence of infected files
utilized epidemiology to model the spread of the Code
in the network. We model this dependence as being
Red across the Internet [12].
time-invariant in the sense that it only depends on the
There have been a number of recent papers which
current value of q(t), and denote the function mapping
model file propagation in P2P networks. Two notable
q(t) to the probability of downloading an infected file
examples include a 2005 paper by Dumitriu et al. [13]
as f{q(t)}. In Section IV we set f{q(t)} = Ä…q(t) to
which models the spread of polluted files in P2P net-
simplify our analysis. However, we concede that this may
works, and a 2004 paper by Qiu and Srikant [14] which
not necessarily reflect the download behaviour in P2P
models the performance of the BitTorrent P2P protocol.
networks in an accurate manner.
II. MODEL DESCRIPTION
There are three distinct events that may occur in the
The intent of our model is to predict the expected network which affect one or more of the time-varying
behaviour of a virus which spreads through a P2P variables described above. These events include a peer
network in the form of malicious code embedded in ex- downloading a file from another, a peer executing a
ecutable files shared by peers. We make the simplifying shared file, and an Infected peer recovering. The average
assumption that all users download files to their shared rates at which each of these events occurs are governed
folder. We are not concerned with the transfer of media by three parameters:
files which cannot contain malicious code, and do not
S: Average rate, in files per minute, at which
model them. Note that we use the term user in this paper
each peer downloads new files (this includes time spent
to refer to a person using a P2P client program. The term
searching and setting up the connection to another peer).
peer is used to collectively refer to a P2P client and the
user directing its behaviour.
E: Average rate, in files per minute, at which
This model classifies all peers as falling into one of
each peer executes shared files. We assume that a peer
three classes: Susceptible, Exposed, or Infected:
executes files in the order in which they are downloaded.
Susceptible  Peers that are not sharing any infected
R: Average rate, in  recoveries per minute , at
files, but are at risk of downloading infected files. The
which Infected peers recover. A recovery occurs when
number of peers in this category at time t is denoted by
all infected files are removed, returning the peer state
S(t).
to Susceptible.
Exposed  Peers that have downloaded one or more
infected files, but have not executed them. The number
III. MODEL EQUATIONS
of peers in this category at time t is denoted by E(t).
Table I summarizes which time-varying variables are
affected by each of the three events that may occur in
Infected  Peers that have executed an infected file.
the network:
Upon execution, a total of of c infected files reside in
The state progression for all peers in our model is
the peer s shared folder. The number of peers in this
S E I S.... We now derive the differential
category at time t is denoted by I(t).
equations that govern the evolution of our P2P model.
A. Rate at which number of Infected peers change
An Infected client may be detected by the user, who
will then proceed to remove all the infected files, thereby When an Infected peer recovers, the number of In-
returning the state of the peer to Susceptible. At all times, fected peers decreases by one. Recoveries occur at rate
Event Variables Affected
infected files are created when an Exposed peer becomes
File downloaded q(t), S(t), E(t)
Infected. The rate of change is thus E(t)S(c - 1).
File executed q(t), E(t), I(t)
An Infected peer will always share c files, so a
Peer recovers q(t), I(t), R(t)
recovery results in a reduction of c infected files. The
TABLE I. Variables potentially affected by each possible event
rate is therefore -I(t)Rc. The overall rate of change
of K is therefore:
RI(t). When an Exposed peer executes an infected file, dK(t)
= S(t)Sf{q(t)}+E(t)E(c-1)-I(t)Rc (4)
the number of Infected peers increases by one. Since files
dt
are executed in order of download, the file executed by
We note that if the names of generated files are chosen
an Exposed peer will always be the infected file which
from a pool of names >> c, Infected peers can continue
it had downloaded to become Exposed . This occurs at
to download infected files and the above equation does
a rate of EE(t). Therefore,
not hold. However, we will not consider this case in any
additional detail in this paper.
dI(t)
= -RI(t) +EE(t) (1)
dt
IV. STEADY-STATE BEHAVIOUR
B. Rate at which number of Exposed peers change
If the P2P network reaches a steady-state equilibrium
dE(T ) dI(T ) dS(T )
The rate at which the number of Exposed peers
by some time t = T , then = = =0.
dt dt dt
decreases due to infection is given by the negative of
Ü Ü
Defining ź, I, S, as the steady-state values of, respec-
the second term in (1). The rate at which previously
tively, E(t), I(t), and S(t), Equation (1) implies that:
Susceptible peers become Exposed is dependent on the
E
aggregate rate at which they download files: SS(t), Ü
I = ź (5)
R
multiplied by the probability that a downloaded file is
infected: f{q(t)}. The overall rate is therefore:
If we define Ä and µ as, respectively, the expected
number of infected files each Exposed and Infected
dE(t)
= -EE(t) +SS(t)f{q(t)} (2)
peer is sharing in steady-state, then q, the proportion
Ü
dt
of infected files in steady-state may be expressed as:
C. Rate at which number of Susceptible peers change
Ü
źÄ + Iµ
This is governed by the negatives of the the first term
q = (6)
Ü
in (1) and the second term in (2): Ü
M + źÄ + Iµ
dS(t)
Substituting (5) into (6) provides:
= -SS(t)f{q(t)} + RI(t) (3)
dt
ź(ÄR + µE)
D. Rate at which number of infected files in the network
q = (7)
Ü
MR + ź(ÄR + µE)
changes
There are three events which result in a change in
If f{q} > 0, equation (2) implies that, in steady state:
Ü
the number of infected files in the network: a peer
E
Ü
downloads an infected file, an Exposed peer becomes S = ź (8)
Sf{q}
Ü
Infected, and an Infected peer recovers. We assume that
Ü Ü
all downloaded files are executed, and that a peer does
Since S = N - I - ź, equation (5) can be utilized to
not download any additional files prior to executing the
express N as:
most recently downloaded file.
E
Ü
Peers cannot share more than one copy of a file with S = N - ź(1 + ) (9)
R
the same name. If the number of unique filenames is lim-
If f{q(t)} is proportional to q(t): f{q(t)} = Ä…q(t),
ited to c, only Susceptible peers can download infected
we may obtain a closed-form expression for ź by
files. Exposed peers do not download any additional files
substituting (7) into (8), equating with (9), and solving
before becoming Infected, and Infected peers are sharing
for ź:
all c possible infected files. Thus, the rate of change due
to downloads is S(t)Sf{q(t)}.
RÄ…(NS(µE + ÄR) - MER)
ź = ; q >0
Ü
An Exposed peer always has one infected file before
(ÄR + µE)(SÄ…(R + E) +ER)
becoming Infected, meaning in all cases c - 1 new (10)
5
Ü
x 10
The expression for I follows trivially from (10)
8
and (5):
7
EÄ…(NS(µE + ÄR) - MER)
Ü
I = ; q >0
Ü
6
(ÄR + µE)(SÄ…(R + E) +ER)
(11)
5
Ü
If q =0, it follows from (6) that ź = I =0. It is of
Ü
4
interest to consider Equation (11) as it approaches 0. In
the limiting case, approached from above, we have the
3
equality
2
NS(µE + ÄR) =MER (12)
1
Since we assume that all downloaded files are eventually
0
0 200 400 600 800 1000
executed, it follows that E = S if we consider these
Time (hours)
rates to be averaged over a sufficiently long interval.
Under this assumption, (12) provides the following min-
Fig. 2. The number of infected peers vs. time for different initial
imum average recover rate, min in order for all infected
conditions. The solid line corresponds to 10 000 infected files initially
R
in the network, the dashed line: 100 000 initial infected files, the
files to eventually be removed from a P2P network:
dotted line: 1 000 000 initial infected files
NµE
min = ; M>NÄ (13)
R
files and both Ä… and E may be approximated as linear
M - NÄ
over the ranges considered, whereas the dependence on
This equation indicates that, if f{q(t)} = Ä…q(t), then
c suggests a log-function.
min is a linearly increasing function of E.
R
VI. CONCLUSION
V. RESULTS
We have presented a model of how infected files
In this section we provide some examples of virus
spread in a P2P network, and derived expressions for the
behaviour in P2P network predicted by our model. The
steady-state behaviour in the case where the probability
value of Ä is 1 and µ = c, which follows from the
of a peer downloading an infected file is proportional to
discussion in Section III-D. Figure 1 illustrates how the
the prevalence of infected files in the network. In future
number of peers falling into each of the three categories
work we will derive a function mapping file popularity
evolve over time, and eventually reach a steady state.
to download rates in a manner that more closely mirrors
In this case, E = S =3.47 × 10-3 files per minute,
user behaviour in an actual P2P network and model the
which corresponds to 5 downloads per day. The average
dynamics of a virus that can choose file names from a
time for a peer to recover is 24 hours, meaning R is
pool much larger than c.
6.94 × 10-4. The number of peers, N, is 2 million and
there are 60 million clean files M. This example makes
REFERENCES
use of the model in which the number of unique possible
[1] F-Secure,  F-secure hoax information pages: Mp3 virus,
files is limited to c, and c is 10. Finally, f{q(t)} =
http://www.f-secure.com/hoaxes/mp3.shtml, 1998.
0.5q(t). Initially, there are 10 000 Exposed peers, each
[2]  Kazaa, www.kazaa.com.
sharing one infected file. [3]  Edonkey2000, www.edonkey2000.com.
[4]  eDonkey2000 server list, http://ocbmaurice.no-ip.org/slist
In Figure 2 we examine the number of infected
/serverlist.html.
peers in the network when varying the initial number
[5]  Gnutella protocol development, http://rfc-gnutella.
of infected peers. After about 700 hours, the three
sourceforge.net.
[6] Viruslist.com,  P2p-worm.win32.achar.a, http://www.viruslist.
networks reach the same steady-state. This is also the
com/en/viruses/encyclopedia?virusid=23893, May 2003.
behaviour implicitly predicted by equation 11, since it is
[7] Symantec,  W32.hllw.gotorm, http://securityresponse.
independent of any initial condition (as long as at least
symantec.com/avcenter/venc/data/w32.hllw.gotorm.html,
one infected file initially exists in the network).
August 2003.
[8] Viruscan,  W32/bare.worm, http://www.virus-scan-software
Figures 3, 4, and 5 examine the effect that, respec-
.com/latest-virus-software/latest-viruses/w32bare-worm.shtml,
tively, c, Ä…, and E have on the steady-state number of
2003.
infected peers and the proportion of infected files. The
[9] Sophos,  Sophos virus analysis: Troj/krepper-g, http://www
relationship between the number of infected peers and .sophos.com/virusinfo/analyses/trojkrepperg.html, July 2004.
Number of Infected Peers
6
x 10
0.07
2
0.06
Susceptible Peers
0.05
1.5
0.04
1
0.03
0.02
0.5 Infected Peers
0.01
Exposed Peers
0 0
0 200 400 600 800 1000 0 200 400 600 800 1000
Time (hours) Time (hours)
(a) The number of peers in each group (b) The proportion of infected files
Fig. 1. Example of the dynamic behaviour of a P2P network exposed to a virus. The network reaches steady-state after about 600 hours.
5
5
x 10
x 10
15
15
10
10
5
5
0
0
0.2
0.8
0.15
0.6
0.1
0.4
0.05
0.2
0
0.2 0.25 0.3 0.35 0.4 0.45 0.5
0
0 20 40 60 80 100
Download Rate (Files per Hour)
c
Fig. 5. Examining the effect of E on the steady-state number of
Fig. 3. Examining the effect of c on the steady-state number of
infected peers and infected files.
infected peers and infected files.
5
x 10
[10] A.G. McKendrick,  Applications of mathematics to medical
15
problems, Proc. Edinb. Math. Soc., vol. 44, pp. 98 130, 1926.
[11] J.O. Kephart and S.R. White,  Directed-graph epidemiological
10
models of computer viruses, in Proc. IEEE Symp. Security and
Privacy, Oakland, CA, May 1991.
5
[12] C.C. Zou, W. Gong, and D. Towsley,  Code red worm propa-
gation modeling and analysis, in Proc. ACM Conf. Computer
0
and Comm. Soc., Washington DC, Nov 2002.
[13] D. Dumitriu, E. Knightly, A. Kuzmanovic, I. Stoica, and
0.1
W. Zwaenepoel,  Denial-of-service resilience in peer-to-peer
file-sharing systems, in Proc. ACM Sigmetrics, Banff, Canada,
June 2005.
0.05 [14] D. Qiu and R. Srikant,  Modeling and performance analysis of
bittorrent-like peer-to-peer networks, in Proc. ACM Sigcomm,
Portland, OR, Aug. 2004.
0
0.5 0.6 0.7 0.8 0.9 1
Ä…
Fig. 4. Examining the effect of Ä… on the steady-state number of
infected peers and infected files.
Number of Peers
Proportion of Infected Files
Number of Infected Peers
Number of Infected Peers
Proportion of Infected Files
Proportion of Infected Peers
Number of Infected Peers
Proportion of Infected Files


Wyszukiwarka

Podobne podstrony:
Gardner A Multiplicity of Intelligences In tribute to Professor Luigi Vignolo
Computer Virus Propagation Model Based on Variable Propagation Rate
Kopelmann, Rosette Cultural variation in response to strategic emotions
Shock wave propagation in a branched duct
Flashback to the 1960s LSD in the treatment of autism
How to Debate Leftists and Win In Their Own Game Travis L Hughes
IEEE Finding Patterns in Three Dimensional Graphs Algorithms and Applications to Scientific Data M
[WAŻNE] Minister Falah Bakir s letter to Wall Street Journal Don t forget Kurds role in Iraq (05
Introduction to multivariate calibration in analytical chemistry
Sex Secrets How To Turn A Woman On, Satisfy Her In A Big Way
Halting viruses in scale free networks
Dynamic Performance of a Microturbine Connected to a Low Voltage Network
using linux to install windows xp with network booting
Malware in Popular Networks
Worm Propagation Modeling and Analysis under Dynamic Quarantine Defense

więcej podobnych podstron