We're firing-up the Esphion blog again! We've had our heads-down for the past year growing the business and developing the next generation of our solution. So, there is lots to talk about. More to come...
My apologies for not posting a new blog entry in such a long time. A busy work schedule, business trips and the holidays thrown in for good measure prevented me from paying as much attention to the blog as I should have.
Well, we are one month into the new year now. Worms and viruses keep coming at us. One vulnerability that was discussed at great length in recent weeks was the so-called WMF (Windows Meta File) flaw. Merely by visiting a compromised website, a user's PC could become infected.
Interestingly, in September of last year I wrote an article about the possibility of an emerging black-market for vulnerabilities. The idea is that certain individuals are willing to pay money to get their hands on exploits, which allow them to compromise more machines. These machines can then be used for lucrative businesses, such as spam, p0rn hosting, DDoS attacks, click-fraud, etc.
Today eWeek reported that the WMF exploit was available for money, in exactly this black-market for vulnerabilities, weeks before security researchers even knew about it. For $4000 the exploit was offered in the middle of December last year by Russian hacker groups. Here is an interesting quote from the article:
There are dozens of these sites with hackers offering zero-day code for sale all the time. They even have a mechanism to test the code to make sure it is legitimate and will get past anti-virus software.
The lesson we can learn from this is: Zero-day attacks will remain a threat to our network and computer security. Therefore, we will continue to see attacks that manage to evade signature based security solutions.
Several publications today commented on the new SANS Top-20 report, which was just published. Noteworthy about this year's report is the fact that in 2005 the authors of worms have moved their attention from operating system bugs to vulnerabilities in application code and even network devices.
Largely, this can be attributed to increased security efforts by the OS vendors, such as Microsoft's now regularly scheduled 'patch Tuesdays'. But while a lot of attention has been paid to OS security and the automatic and regular application of necessary patches, much less has happened for application software. For the most part, patching of applications is still irregular, slow, manual, and basically ... patchy.
In 2005 there was a remarkable number of highly critical vulnerabilities in various networking devices, such as switches and routers, but also in security software. We all remember the Witty worm, which so elegantly took advantage of a protocol-parsing vulnerability in firewall code. It demonstrated that even with a relatively small vulnerable population, it is possible to accumulate significant numbers of zombies, if the worm is well written.
What are the consequences of these trends? With increased attention to the non-OS layers of the overall computation stack, we can expect to see more diverse worms. An individual worm may not infect many millions of systems, but can still deliver a sizable population of zombies to its creators.
Since worms are now focusing on applications and network devices as well, there are obviously many more potential points at which a network may become compromised. If patching and securing a network felt like a never-ending nightmare before - apparently it is going to get worse now. I think it is safe to assume that the developments reflected in the Top-20 report point towards a heightened possibility for any organization to experience a worm outbreak on its internal networks, and will result in ever larger windows between the discovery of a vulnerability and patches eventually being available and applied.
What can be done? Well, we have to accept the fact that outbreaks are going to happen. There are too many attack points to guard them all. The famous dissolving perimeter has just become even more of a problem. While a key focus for every organization has to remain on outbreak prevention, it is vitally important also to devote resources to outbreak management. Assume that an outbreak will happen - what are you going to do about it? Signature based systems, which may be great at preventing known exploits from passing through, will be powerless when faced with a zero-day vulnerability.
An ideal, towards which every organization could strive, is the self-defending network. In such a network, anomaly detection systems are deployed to ensure that any outbreak, even of a zero-day worm, can be detected, analyzed and mitigated quickly.
In general, we can say this: Outbreaks will happen, but only if you see them, and understand them, will you be able to do something against them. Anomaly detection solutions provide instant visibility and analysis of suspicious behavior on the network. Therefore, they are a vital component in any outbreak management strategy.
We all know that it is a good idea to place smoke detectors throughout our houses. In theory, though, we could also wait for the neighbors, or the community in general, to call the fire-department once they see flames coming out of our house. Or if there is a larger fire in our neighborhood, we could assume that the sound of the sirens will be enough to alert us.
But we don't think that way. While community action is good, and emergency broadcasts about approaching fires are certainly very necessary, we still also place our own smoke-detectors in our house. Why is that?
The reason of course is that our own smoke-detectors are best situated to detect a fire in the very early stages, right where it matters - in our house. At such an early stage, there is only very little smoke. No neighbor will see this, no fire sirens will sound, no fire-trucks will come rushing down our driveway. There is too little smoke for anyone else to see. Instead, we very much need to have our own personal alarm in our own, private space. Nobody else has the same insight that we have, nobody else can see or smell the air in our own house, and nobody else can detect this fire, our very own personal problem, as fast as we ourselves can.
This is quite obvious, and I am sure that nobody will really disagree with these statements.
Therefore, it is surprising to me that many network operators and corporations are placing the security of their network into the hands of the community, if you will, without allowing for the presence of their own smoke detectors. What I am referring to is the tendency of many organizations to feel safe and secure, once they have installed IPSs (Intrusion Prevention Systems) in their network, which receive updates and new signatures from the global data center of the IPS vendor. Let me ask you this:
What does the global data center know about localized anomalies and issues, that are specific to your own network?
If your network is targeted by a specific DDoS attack, or if there is a zero day worm spreading in your network, or if there is a traffic anomaly caused by equipment failure or misconfiguration, or if there are failing or misbehaving applications or users... how can some global data center, which does not see any of this, be of help to you?
The answer, of course, is that it cannot help. To get back to the analogy of the smoke and fire, these data centers are great to inform you of fires in your community. They can sound the sirens to alert you to approaching storms, which also have affected others already. In some cases, they may even be able to prevent issues on your network, by uploading a signature for one of those global anomalies before your network is hit.
However, these global data centers are quite useless when it comes to detecting the first wisps of smoke, indicating something that is specific to your network alone.
So, to be truly protected, you see that you need your own smoke-detector for your network. These network smoke-detectors are called anomaly detection systems. Every mission critical network has to have one of those, since signature-based IPSs alone cannot help at all with any issue that is localized, and that is affecting your network, either by accident or by design.
The good thing is, though, that you don't need to throw away your investment in IPSs or deep-inspection firewalls. Quite the opposite. A good anomaly detection system will be able to produce fine-grained signatures out of the first traces of an anomaly, which can then be fed to the IPSs for filtering. See my articles here and here.
For fire-protection, we rely on a community effort along with local smoke detection. We intuitively know that this is best practice. The same holds true for our networks. Having IPSs, which can protect against known threats, is necessary. But at the same time, having an anomaly detection solution as the local smoke detector is equally necessary, and equally needs to be considered as best practice. Without it, our threat detection and management capabilities are simply not complete.
Please allow me to get a bit philosophical today, about highly complex, dynamic systems. I promise, at the end there is a rather important network angle to all of this...
The Butterfly Effect
In chaos theory, the term butterfly effect describes a particularly interesting observation: Given a sufficiently complex and dynamic system, even the smallest variation of the starting conditions will result in unpredictable long-term behavior. The effect probably got its name from the example that even the gentle flapping of a butterfly's wings will potentially influence the earth's weather and climate (a phenomenally complex system) in the long run. In other words, if this butterfly would not have flapped its wings, then maybe we would not have had a storm on the other side of the world a few months later.
Incredible as this may sound, such is the behavior of highly complex, dynamic systems: Changing just a single parameter may eventually result in a completely different and unexpected outcome. The butterfly effect can be demonstrated with various mathematical formulas. Real world systems usually 'suffer' from permanent additional random input, which makes accurate predictions about their behavior even more difficult. On the most basic level of matter, the Brownian motion of particles is completely unpredictable. If you allow me to stretch the metaphor further, the Brownian motion essentially transforms every single atom into a randomly flapping little butterfly. Keeping this picture in mind, you can see that any attempt at accurate long-term prediction is doomed to failure. Our notoriously inaccurate long-term weather forecasts are a good example.
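One formula commonly used to demonstrate the butterfly effect is the logistic map, x(n+1) = r * x(n) * (1 - x(n)). Here is a minimal sketch in Python. The function name, the choice of r = 4 (a chaotic regime of the map) and the two starting values are my own assumptions for illustration. The two runs start one part in a billion apart, and the printed gap shows how quickly they drift away from each other.

```python
def logistic_trajectory(x0, r=4.0, steps=60):
    """Iterate the logistic map x -> r * x * (1 - x) and return all values."""
    values = [x0]
    for _ in range(steps):
        values.append(r * values[-1] * (1.0 - values[-1]))
    return values


# Two starting conditions that differ by one part in a billion.
a = logistic_trajectory(0.2)
b = logistic_trajectory(0.2 + 1e-9)

# Distance between the two runs at every step.
gaps = [abs(x - y) for x, y in zip(a, b)]
for step in (0, 10, 20, 30, 40, 50):
    print(f"step {step:2d}: gap = {gaps[step]:.3e}")
```

The first few steps are still predictable (the short-term forecast). After a few dozen steps the two runs have nothing to do with each other anymore, even though the formula and the parameter are identical.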
Interestingly, we are surrounded by such complex systems in our daily life. The air that flows over our cars causes chaotic vortices behind it, raindrops run down our windows in unpredictable paths. Yet, albeit often entirely unpredictable, we are very rarely surprised by these effects. Why is that? Because we have learned to deal with them on a macroscopic, rather than microscopic level.
Take the example of the raindrops on the window: We may not be able to predict their exact path, but we do know that eventually the water will flow down. The drop may zig or zag, but unless it gets stuck somewhere half way, it will eventually make its way downward. Intuitively, we realize that any attempt to predict the exact motion of those drops is entirely futile. Our understanding of the world is sufficiently confirmed by the simple fact that the drops will eventually find their way down. That's all we need to know and all we can know.
The whole is more than the sum of the parts
Here is another interesting observation, which contributes to the unpredictable behavior of complex systems. 1+1 is 2. In math this is simple. In the real world, however, we have cases where the whole certainly is different than the sum of the parts.
For example, take two sufficiently sized lumps of weapons-grade Uranium. Put them together. What do you get? Not one big lump of Uranium, but instead an entirely surprising crater in the ground. Another good example is our brain. It is made up of a huge number of very simple cells, the Neurons, which are connected via the Synapses. Put it all together, and you don't have a blob of cells, but something rather wonderful and astonishing: A brain capable of thoughts and memories.
Again, we are surrounded by examples of this. In fact we are an example of this effect. Yet, we tend to not think about this at all. Why? Because in many cases, we see the result of the whole before we even realize that there are all those parts playing a role in it.
And what does this have to do with networks?
Glad you asked...
Consider today's networks: They are getting more complex, no doubt about it. There are more vendors, more pieces of equipment, more architectures, more people and computers, more use models and applications. So, not only are the individual networks more complex than in the past, they are also more unique. The uniqueness is not surprising, considering that there is this increasing number of variables, which defines the network. No two organizations will have the exact same network.
So, now imagine one of those overtaxed corporate or provider networks, operating more or less within acceptable boundaries. Suddenly, and quite possibly completely out of the control of the network operator, a new application is unleashed onto the network. Skype is an application like that. A worm outbreak is an extreme case of such an application. P2P traffic is another example, which has been building up and morphing and shifting over the last couple of years. So, what happens when something new is added to the network, may it be another application or another piece of equipment, or another batch of users? The truth is that very often, not even the network operator will know...
Many operators try to deal with this on a macroscopic level. As we have seen, this is the natural tendency for us. So they add large amounts of excess capacity to the network, hoping it will be able to deal with whatever comes their way. But of course, this is inefficient. Corporate network operators don't even have that option, since they need to worry about more than just bandwidth and availability - they also need to ensure the security of the network and its attached computers.
Chaotic complexity of networks
Quite often, we hear from potential customers that they don't even know anymore what exactly is happening on their network. That is how complex these systems have become. The fact that there is some excess capacity has often been the saving grace of those installations. Just like a huge number of chaotically moving water drops can be controlled on a macroscopic level, by forcing them all through a water pipe, many network operators have taken a step back, and simply hope that the excess capacity will have the same effect: It all keeps flowing, even though we have no idea what is really going on within the pipe.
But at the moment a more fine-grained control is required, for example for detailed SLAs, or just for network security considerations, this approach fails. Then we are suddenly back to rules, signatures and policies. What do they represent? A microscopic approach to network and traffic management.
Remember the case of the raindrops running down the window? We have seen that their motion is unpredictable. If you observe a single drop, you can probably make a pretty good prediction of what the motion will be like in the next tenth of a second (a short-term forecast), but after that, predictions will become inaccurate.
Networks have demonstrably reached a point where exact predictions about their behavior are not possible anymore. Change a little bit, add a little bit, and the outcome is unpredictable and often surprising.
The futility of rules
Why then, do we still rely on rules, policies and signatures in our attempt to control those networks? It is essentially a law of nature that a microscopic control approach is not well suited at all for a complex, and possibly chaotic, system.
Firstly, it has become quite impossible to write enough rules to cover all the use cases and situations for a network. Secondly, once there is just a slight change to the network, the unpredictable nature of its behavior may render many of those rules useless. As a result, maintaining rule sets and policies for complex networks quickly becomes a never ending Sisyphean task.
Taking an intelligent macro-view
Networks are becoming more complex and more unique. The number of variables that describe those complex systems is always increasing. In light of this, I propose that operators of complex networks should not rely on controls on the microscopic level. Instead, a macro-view of the network needs to be taken. Instead of providing rules for every network condition, the overall behavior of the network should be considered. This leads us then to the field of behavioral anomaly detection.
I am not proposing that all rule and signature-based systems should be torn out of a network installation. These systems are useful to provide some basic boundaries within which the network traffic has to operate. Almost like the water pipes for the chaotic moving water. However, it is not possible to exhaustively describe the behavior of the network with those rules. Instead of wasting time and resources in attempting the impossible, it makes much more sense to complement the network architecture with a rule-less behavioral anomaly detection system. This system will be able to detect macro-trends (comparable to an observation such as raindrops run downwards), without having to know the exact detail about every single packet.
If the anomaly detection system is good, it will manufacture detailed rules and signatures on the fly, which allow the operator to handle anomalies as they happen. This allows mitigation without having to specify ahead of time, in rules and signatures, what to look for. These on-the-fly rules apply to the behavior of the network right now. This is comparable to the more accurate short-term forecast versus the much less accurate long-term forecast.
I hope I was able to provide some food for thought and highlight some aspects of the nature of complex systems. Today's networks approach complexity levels, which already result in unpredictable, chaotic behavior. The butterfly effect and the simple statement that the whole is often more than the sum of the parts, nicely describe the situation.
Chaotic systems are inherently unpredictable in their behavior. Therefore, any attempt to express predictions is doomed from the outset. If we can agree that many networks may exhibit complex and possibly chaotic behavior, then it is instantly obvious that some aspects cannot possibly be covered by rule and signature sets. Writing those sets and maintaining them can be utterly frustrating and, in the end, useless.
As a solution to this dilemma, I propose the deployment of behavioral anomaly detection systems, which are able to observe the network and provide more accurate rules on-the-fly and in realtime, taking into account the current network condition and actually observed anomalies.
So, next time your network faces a meltdown, remember the butterfly and the raindrops. This is probably a good time then to remember that pre-supplied rules and signatures are not any more accurate than long-term weather forecasts, and that something more intelligent is needed in the network.
Let me try to clear up some confusion about the meaning of zero day protection. Unfortunately, many vendors of security solutions modify the definition of this term as needed, to make their products appear in the most positive light. After all, they all want to be able to say: We offer zero day protection!
Well, not so quick, please.
First of all, it is important to distinguish between two different concepts: The zero day vulnerability on one hand, and the zero day exploit on the other. Too often, those two terms are used interchangeably, even though they mean something very different.
Zero Day Vulnerability
If there is some vulnerability in a system, which nobody except the discoverer of that vulnerability knows about, then we talk about a zero day vulnerability. What this implies is that the security community and public do not know about the vulnerability, at all. Therefore there are no signatures for it and no patches. If the discoverer of the vulnerability is set on compromising other computers, then they may start to attack systems at their leisure. Everyone will be taken by surprise.
Zero Day Exploit
This is something entirely different: A zero day exploit is the term used to describe the attempt to take advantage of a known vulnerability, but with a new kind of exploit. If an IPS or IDS has rules to detect the attempt to take advantage of this known vulnerability, then they may detect this event. In effect, even though we have not seen a particular exploit before, we may still be protected, because just trying to take advantage of the underlying vulnerability will result in some specific session / packet content that can be discovered via signatures. Examples here are different mutations of the same worm, which all take advantage of the same vulnerability, but have somewhat modified code to do so.
One may discuss the theoretical definitions all day long. In the end, for the individual network operator, all that matters is whether their network is protected. The best signature-based system does not help, if it does not have the latest signatures, yet. For example, the Witty worm was, in effect, exploiting a zero day vulnerability for most networks. The particular vulnerability it exploited had been announced about one day before the worm broke out. Thus, the vulnerability was known to the security community. However, most networks did not have signatures for it, yet, and thus, for all practical purposes, it was a zero day vulnerability as far as they were concerned.
Zero Day Protection
So, what is zero day protection then? For vendors of traditional signature based systems (most IDSs and IPSs), zero day protection is the ability to protect against zero day exploits. They rely on the fact that they know ahead of time of a particular vulnerability. This allows them to provide signatures for the mere attempt to take advantage of the vulnerability. As we have seen with the Witty worm, this approach does not guarantee protection, or even detection of a new worm. Other trends are further contributing to the ever shrinking time window between discovery of a vulnerability and the release of a new worm trying to take advantage of it.
True zero day protection therefore cannot ever rely on any prior knowledge. For true zero day protection, a security solution needs to be able to discover abnormal behavior of hosts or networks, without needing any signature databases. In addition, such a solution needs to be able to extract fine-grained signatures from the observed anomaly. Only if both conditions are met is it possible to architect self-defending networks, which can deal even with a true zero day vulnerability.
Yesterday, CERT issued this warning. It describes a vulnerability in a special protocol processing module of the popular SNORT intrusion detection software. Using a maliciously crafted packet, an attacker may gain control over the machine that runs SNORT (!). Given the simplicity of this exploit (a buffer overflow), it is expected that a worm will appear soon.
A very similar scenario took place with the Witty worm earlier last year. What do these scenarios have in common? In both cases, a deep-packet inspection security solution, performing protocol analysis, actually became a security weakness. Somewhat similar cases have also repeatedly happened with various anti-virus software packages from different vendors.
Ok, I admit it: The title of this article is probably more controversial than necessary. Firewalls and IPS solutions, which are capable of deep-packet inspection and protocol analysis, by themselves are quite useful. They do not represent a security risk any more than any normal server or desktop does...
Wait a minute...
We do assume that servers and desktops are a security risk of sorts, or else why would we deploy a deep-packet inspection firewall? Why should servers or desktops be a security risk? Because those machines need to parse the various application level requests that come their way over the network. And in doing so, they may be vulnerable to maliciously malformed requests and packets, if the code performing the parsing contains a bug. Buffer overflow exploits are quite common, caused by missing or incorrect range checking of the parameters that are contained within the request.
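To make the range-checking point concrete, here is a minimal sketch of a parser for a made-up request format: one length byte, followed by that many bytes of a name. The format, the buffer size and the function name are hypothetical and only serve as illustration. Python itself will not overflow a buffer, so the sketch only shows where the checks belong. In a language like C, leaving out the first check is exactly what allows an attacker-supplied length to write past the end of a fixed buffer.

```python
MAX_NAME_LEN = 32  # size of the fixed buffer a C implementation might use


def parse_name_field(request: bytes) -> bytes:
    """Parse a hypothetical request: 1 length byte, then that many name bytes."""
    if len(request) < 1:
        raise ValueError("request is empty")
    declared_len = request[0]
    # Range check 1: the declared length must fit the destination buffer.
    if declared_len > MAX_NAME_LEN:
        raise ValueError("declared length exceeds buffer size")
    # Range check 2: the declared length must not exceed the data actually sent.
    if declared_len > len(request) - 1:
        raise ValueError("declared length exceeds request size")
    return request[1:1 + declared_len]


print(parse_name_field(b"\x05hello"))
try:
    parse_name_field(b"\xffhello")  # sender claims 255 bytes, sends only 5
except ValueError as err:
    print("rejected:", err)
```

Note that the length value comes from the sender. Every such value in a request is under the control of a potential attacker and needs its own check.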
So, what do we do to combat the weakness introduced by code that has to perform the complex request parsing? We introduce even more code that performs complex request parsing, in the form of a deep-packet inspection firewall or protocol parsing IPS/IDS. Somehow, this does not seem to make any sense. Now, instead of one potentially vulnerable system (the server, for example), we have two potentially vulnerable systems on the network.
Of course, it can be argued that the vendors of the security solutions will be extra careful in writing their protocol parsers. But in the end, the people writing them are still just that: People. Human beings, who are prone to make mistakes once in a while, just like the authors of server and client software sometimes make mistakes.
In general, the more complex the solution, the more room there is for potential vulnerabilities. In this case, the parsing of potentially maliciously formed packets and requests can be quite complex. The attacker is in control here, because the security solution has to parse whatever was sent their way.
There is a class of security solutions which do not suffer from that particular weakness: ordinary firewalls, IPSs and IDSs that perform simple signature matching, anomaly detection systems that work on traffic meta data, and so on. All of these solutions have in common that they don't really parse packets or application requests. They look at fixed data fields in packet headers, for example. Not much parsing going on in that case. Or they count packets and packet types. Also, nothing the attacker sends needs to be parsed.
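As a rough illustration of looking at fixed header fields and counting, here is a small Python sketch. It reads a single byte at a fixed offset (the protocol number, which sits at byte 9 of an IPv4 header) and counts packets per protocol. The helper that builds demo headers and all names are my own assumptions for the example, not a description of any particular product.

```python
import struct
from collections import Counter

PROTO_OFFSET = 9  # the protocol number sits at byte 9 of every IPv4 header


def make_ipv4_header(proto: int) -> bytes:
    """Build a minimal 20-byte IPv4 header for the demo (checksum left at 0)."""
    return struct.pack("!BBHHHBBH4s4s", 0x45, 0, 20, 0, 0, 64, proto, 0,
                       bytes([10, 0, 0, 1]), bytes([10, 0, 0, 2]))


def count_protocols(packets):
    """Count packets per IP protocol by reading one fixed byte per packet."""
    counts = Counter()
    for pkt in packets:
        if len(pkt) > PROTO_OFFSET:  # too-short packets are skipped, not parsed
            counts[pkt[PROTO_OFFSET]] += 1
    return counts


# Three TCP packets (protocol 6), one UDP packet (protocol 17), one runt.
packets = [make_ipv4_header(6)] * 3 + [make_ipv4_header(17)] + [b"\x00"]
counts = count_protocols(packets)
print(dict(counts))
```

There is no length field to trust and no nested structure to follow here. Whatever the attacker puts into the packet, the code only ever touches one byte at a known position, after a single length check.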
The moral of the story is a well-known security architecture paradigm: Don't trust a single-layer or single-solution approach to security. In this case, the ideal solution would be to pair the protocol-analyzing deep-inspection firewall or IPS/IDS with an anomaly detection system. The former takes care of well-known exploits, the latter of all the zero-day exploits and site specific anomalies. It may even produce signatures for zero-day vulnerabilities on the fly and feed them to the deep-packet inspection firewall or IPS (see here for more on that). In addition, the anomaly detection solution may also notice when the protocol-parsing solution has been commandeered by an attacker for different tasks.
Put in other words: If one needs the analysis and filtering of protocol-parsing and deep-packet inspection, it would be a good idea to have something else watching the watch-man, so to speak.
Departing from the usual text-only style of my articles, today I would like to share a picture with you. It came out of an attempt to find a graphical representation of the necessity for fine-grained filters when it comes to the mitigation of network anomalies. I talked about that topic before.
The point I repeatedly made in this blog is: Unless you have fine-grained filtering capabilities in place, any attempt to mitigate a network anomaly may be comparable to doing open-heart surgery with an axe. In particular, Netflow based solutions are always severely limited in the accuracy of any filter recommendations they can produce. This is caused by the fact that they don't have access to any of the information needed to produce such fine-grained signatures. Instead, they see an abstraction of the traffic (flow-records) rather than the traffic itself.
Packet-based anomaly detection solutions, however, can see all the information they need to produce truly fine-grained signatures, because they have access to the raw packet data.
The concept I would like to introduce then is the Smallest Possible Superset (SPS) of an anomaly. The SPS describes how well the recommended mitigation filter matches the anomalous traffic. Of course, the filter should ideally cover 100% of the anomaly. However, if the filter is too broad, it will cover more than necessary, resulting in innocent traffic being filtered as well. The smaller the SPS the better. This graphic illustrates the point:
Illustration of the Smallest Possible Superset (SPS) of an anomaly
We can see the overall traffic in a network, illustrated as the gray area. The anomaly, for example a DDoS attack on a web-server, is presented in red. The purple area is the SPS, which is described by the mitigation filter that was recommended by an anomaly detection solution.
In case of a flow-based system, the SPS may be quite large. Imagine the web-server is under TCP-SYN attack to port 80 from random source addresses. If the anomaly detection solution does not see the raw packets, all it can do is to recommend that all SYN packets to port 80 of that web-server be filtered. Clearly, that would shut down any further activity for that server. Even perfectly legitimate connection requests would be denied.
A packet-based anomaly detection solution, however, can look for identifying characteristics in the SYN packets that make up the attack. Thus, once identified, those characteristics can be used to describe the SPS much more accurately.
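The difference between the two filters can be put into numbers with a small Python sketch. Everything in it is made up for illustration: the packet counts, the names, and in particular the assumption that all attack packets share one TCP window size. A real attack may or may not leave such a convenient fingerprint.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Syn:
    dst_port: int
    window: int
    is_attack: bool  # ground truth, known only because this is a demo


def broad_filter(p: Syn) -> bool:
    """Flow-level view: all we can say is 'SYN to port 80'."""
    return p.dst_port == 80


def fine_filter(p: Syn) -> bool:
    """Packet-level view: also match the (assumed) shared window size."""
    return p.dst_port == 80 and p.window == 512


def evaluate(filter_fn, packets):
    """Return (fraction of attack packets matched, legitimate packets matched)."""
    attack_total = sum(1 for p in packets if p.is_attack)
    matched = [p for p in packets if filter_fn(p)]
    attack_hit = sum(1 for p in matched if p.is_attack)
    collateral = sum(1 for p in matched if not p.is_attack)
    return attack_hit / attack_total, collateral


# 1000 attack SYNs sharing one window size, 50 legitimate SYNs with varied ones.
packets = [Syn(80, 512, True) for _ in range(1000)]
packets += [Syn(80, 8192 + 100 * i, False) for i in range(50)]

broad_cov, broad_collateral = evaluate(broad_filter, packets)
fine_cov, fine_collateral = evaluate(fine_filter, packets)
print("broad filter:", broad_cov, broad_collateral)
print("fine filter: ", fine_cov, fine_collateral)
```

Both filters cover the whole anomaly. The difference is in the second number, the legitimate connection requests that get dropped along with the attack. That number is what the size of the SPS is about.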
Especially in conjunction with an in-line filtering system, for example a good IPS, such real-time and fine-grained mitigation filters can be very effectively implemented. Overall network impact on the legitimate operation of the network will be significantly reduced, due to the fact that the SPS is much smaller, and very closely matches the anomalous traffic.
In past postings to this blog, I have often talked about the merits of anomaly detection: How it can proactively protect networks against the unknown and how it can improve ROI on existing investments in security and infrastructure. Nevertheless, even though anomaly detection should by now be part of a best-practices approach to any network security architecture, there still is the need for customer education.
The reason for this is that traditional security solutions typically were based on deterministic rules. Firewalls blocked specified ports. IPSs/IDSs blocked/detected certain pre-specified signatures. We now know that such deterministic approaches are not sufficient anymore when faced with modern threats such as zero-day worms or rapidly changing DDoS attacks. Therefore, anomaly detection has become a necessary addition to any multi-layered security approach.
Because the deterministic security solutions have dominated our thinking whenever network security was considered for such a long time, it is sometimes difficult to appreciate how anomaly detection differs from that approach, even if the business values are in theory obvious.
Therefore, in this article, I would like to give a brief introduction into how we at Esphion perform anomaly detection.
What Esphion's anomaly detection is NOT
First, it is important to understand what we do not do:
Our anomaly detection does not rely on any pre-specified rules, any baselines, any models, any signatures, or any other prior knowledge.
This is a very important point. The moment you use prior knowledge for anything, you are faced with two problems:
IPSs and IDSs require constant updates to their signature databases. Firewalls need updated lists of ports to block or allow. This configuration needs to be performed regularly. If a new threat or anomaly emerges, then those solutions are blind to this until they have been updated.
Please note that even many vendors in the anomaly detection market still require prior knowledge. This often does not come in the form of explicit rules and signatures, but instead requires those solutions to baseline the normal behavior of the network. This usually takes some time during which the system cannot report on anomalies, but instead simply observes the traffic in the network. This time is usually called the bedding-in period. How long this period is depends on the vendor's specific approach. However, many vendors recommend several days or even weeks. Obviously, if the usage profile of your network changes considerably, this bedding-in period needs to be repeated. Consequently, such solutions are not very scalable in real-world network environments.
Esphion's approach to anomaly detection is completely independent of any prior knowledge. Not only do we not use any established rules or signatures, but we also do not require a bedding-in period. Once installed, Esphion's solution is virtually instantaneously ready to report on observed network anomalies. We recommend allowing around two hours to pass after installation, but that is all. This time is also not used for any baselining activity, but instead to prime a statistical pre-processing pipeline in our system. Once this is done, however, this step does not need to be repeated, even if the network environment should change.
So, what's important to remember from all of this? The moment you rely on any prior knowledge of any kind, whether configured or learned, you are already at risk. Esphion's solution fortunately does not have this problem.
The basis of detection: Traffic meta-data
Our approach to anomaly detection is scalable because it restricts heavy-duty traffic analysis to when it is really needed. Therefore, much higher packet rates can be supported than in the case of IPSs or IDSs, which need to perform in-depth scanning of every packet.
Instead, our anomaly detection utilizes data about the network traffic. In effect, the presence of anomalies is detected by means of traffic meta-data. As network packets pass by our listening sensors (we call them agents), we merely record statistics about this traffic. For example, how many TCP packets, how many UDP packets, etc. Slightly more detailed, we may record how many TCP-Syn packets we see, how many TCP-Fin packets, and so on. We keep track of a few thousand such statistics.
This data is constantly collected and forms the foundation on which our anomaly detection is built. How so? As it turns out, one can detect even the onset of network-impacting anomalies by careful examination of those statistics. Under normal usage conditions, these statistics behave in certain ways relative to each other. In the face of an anomaly, for example a DDoS attack or a worm outbreak, the way those statistics relate changes subtly.
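To make the idea of traffic meta-data a little more concrete, here is a minimal sketch in Python. It is not Esphion's implementation: the packet records, the counter names and the helper functions update_stats and syn_fin_ratio are all assumptions made up for illustration. It only shows how cheap per-packet counters can be kept, and how a relationship between two of them (here, SYNs per FIN) can be derived.

```python
from collections import Counter

# Hypothetical packet records: (protocol, set of TCP flags). A real sensor
# would parse these fields from captured packet headers.
def update_stats(stats, protocol, tcp_flags=frozenset()):
    """Increment the per-category counters for one observed packet."""
    stats["total"] += 1
    stats[protocol] += 1
    if protocol == "tcp":
        for flag in tcp_flags:
            stats["tcp_" + flag] += 1
    return stats

def syn_fin_ratio(stats):
    """One example of a relationship between two statistics: SYNs per FIN."""
    fins = stats["tcp_fin"]
    return stats["tcp_syn"] / fins if fins else float("inf")

# Invented sample traffic, for illustration only.
packets = [
    ("tcp", {"syn"}),
    ("tcp", {"syn", "ack"}),
    ("tcp", {"fin", "ack"}),
    ("udp", set()),
]
stats = Counter()
for proto, flags in packets:
    update_stats(stats, proto, flags)

print(stats["tcp"], stats["udp"], syn_fin_ratio(stats))
```

Under normal conditions one would expect a ratio like this to stay within a fairly narrow band, while a SYN flood would push it up sharply. The real system tracks a few thousand such statistics, not just one ratio.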
The brain of detection: Neural networks
If you were to plot the various statistics that we collect about the network traffic, you would see how they are changing during times of network anomalies. The changes would be subtle, and it would not look very specific to the casual observer. Even so, you would probably be able to realize that something strange is going on. You can do that, because every human being has a great pattern recognition engine: The brain. It may not be good at quantifying (what exactly is going on and to what degree), but certainly good at qualifying that there is something amiss.
To provide this capability in an automated fashion, Esphion utilizes neural networks to observe the traffic meta-data. Neural networks, as we know, are a computer-based emulation of the brain's neurons. These neural networks know how to recognize that the network traffic is changing in ways that are not seen during normal operation. So, in effect, the neural networks act as a 24/7 intelligent observer of network meta-data.
The footwork of anomaly detection: Zero-day signature extraction
Remember that so far we have only worked with traffic meta-data: Light-weight statistics about the network traffic. However, in order to properly mitigate an anomaly a little bit more information is needed. Since Esphion's agents are listening to the raw network traffic, we have access to all the information we need, including data contained in the packet headers and payloads.
Once the neural networks have detected the presence of a network anomaly, the agents are instructed to capture actual sample traffic. We then perform a more CPU intensive analysis only on this truly relevant, pre-qualified traffic. This analysis looks to find any patterns, which uniquely distinguish this traffic from other, normal traffic on the network.
Note that this is not the comparison of observed traffic to pre-configured signature databases. As discussed, that would be the wrong approach. Instead, the analysis starts with no assumptions at all and simply detects if there are elements in the observed traffic, which are unique to the packets that are part of the anomaly. As a result, we get very fine-grained zero-day signatures, usually within seconds after the onset of an anomaly. No matter if it is a DDoS attack, a worm outbreak, or a network malfunction - very quickly the network operator has a detailed signature in their hands, which can then be used to surgically remove the offending traffic.
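The general idea of starting with no assumptions can be sketched as follows. This is a deliberately simplified illustration, not our actual algorithms: the function extract_signature, its thresholds, the packet field names and all sample values are assumptions invented for this example. It looks for field values that are present in almost all of the anomalous sample packets, but almost never in a sample of ordinary traffic.

```python
from collections import Counter

def extract_signature(anomalous, normal, min_share=0.9, max_normal_share=0.05):
    """Return {field: value} pairs common in anomalous packets but rare in normal ones."""
    def shares(packets):
        # Fraction of packets in which each (field, value) pair occurs.
        counts = Counter()
        for pkt in packets:
            counts.update(pkt.items())
        n = len(packets) or 1
        return {key: c / n for key, c in counts.items()}

    anomalous_shares = shares(anomalous)
    normal_shares = shares(normal)
    return {
        field: value
        for (field, value), share in anomalous_shares.items()
        if share >= min_share
        and normal_shares.get((field, value), 0.0) <= max_normal_share
    }

# Invented sample data: ten anomalous UDP packets from different sources,
# and twenty ordinary packets (web and DNS traffic).
anomalous = [
    {"proto": "udp", "dst_port": 9999, "length": 404, "src": f"10.0.0.{i}"}
    for i in range(10)
]
normal = [
    {"proto": "tcp", "dst_port": 80, "length": 60 + i, "src": "10.0.1.1"}
    for i in range(10)
] + [
    {"proto": "udp", "dst_port": 53, "length": 80, "src": "10.0.1.2"}
    for _ in range(10)
]

signature = extract_signature(anomalous, normal)
print(signature)  # {'dst_port': 9999, 'length': 404}
```

Note that the protocol (UDP) does not end up in the signature, because it is also common in the normal traffic. Only the elements that actually distinguish the anomalous packets survive, which is what makes the resulting filter fine-grained.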
Well, there it is: A high-level overview of how our anomaly detection works. The key points are that there is absolutely no prior knowledge required, and thus, the system cannot be blind-sided by zero-day anomalies, changing network conditions, or long bedding-in periods. Combine this with intelligent neural networks, and the capability to observe the raw network traffic to extract zero-day signatures, and the result is a truly powerful additional layer of security, which should be present in any mission critical network.
Lately, as we are talking to customers and partners, we are hearing one particular message more and more often: Disappointment in the performance of IPSs ... budgets being reallocated from IPS deployment projects to NBAD projects. In this article, I would like to explore where the disappointment comes from, and how IPS deployments (and investments) can be rescued.
The wonderful world of IPS marketing
Unhappiness with IPSs. Where does this remarkable turn of events come from? After all, not too long ago, IPSs were heralded as the be-all and end-all of network security. The one-stop-shop for all that is required to keep an enterprise's data, hosts and networks safe and sound.
Even then, though, many security analysts already pointed out the accepted best-practices approach: There should always be multiple layers to any network security architecture. Do not rely on a single point solution. However, there is something inherently attractive about an IPS for any organization, and this message proved to be more powerful than any warning: Here is one box, which can act as a firewall, but also as a much smarter filter on the data that needs to be let through. It may even scan e-mail for viruses. And it is all updated automatically, remotely, right? No problem then! If a new worm or virus should come around, there will be new signatures for it uploaded on the device in a hurry, and my network will be safe, right? One box to secure it all...
Too much hype
Ironically, this simple and effective marketing message now turns out to be the IPS's undoing. Even more ironically, the technology of the IPSs is fundamentally sound. These are good devices, which really work. The problem is that claims for their capabilities have been blown out of proportion. Therefore, customer expectations have been elevated to levels, which the technology cannot meet in the real world.
The prime example for this is the claim that IPSs can protect a network against zero day attacks or anomalies. But the simple fact is that a signature-based pattern matching mechanism, the core technology of an IPS, can never detect zero day exploits or anomalies. Even those IPSs which can detect RFC violations in protocols are essentially matching a signature (the description of the protocol in the RFC) against the observed traffic.
What's the problem?
If your network security relies on signatures then you have to ask yourself: Where do those signatures come from? In the case of IPSs, the signatures are generated by human beings in the IPS vendor's data center. Skilled security specialists are working there to examine any newly found type of malware or anomaly, and generate signatures suitable for their device. These are then uploaded to all the installations out there. But there are of course two significant shortcomings to this approach:
So, what happened in many of those networks in which shiny, new IPSs were installed? Time and again, even those networks protected by IPSs were taken down or were affected by zero day outbreaks, anomalies or attacks. And obviously, any organization that bought into the marketing message of the IPSs, is disappointed (to say the least) when this happens after a significant investment into IPS solutions.
What IPSs are good at
I said earlier that I believe IPSs are in fact good devices, with good technology. You may wonder how I can say that in light of the disadvantages I just listed. Well, I also said that the problem was mostly with overblown customer expectations. I really do think that IPSs are very good at what they are designed to do: Look for patterns, even deep inside of packets or connections, match those against some known set of signatures, and perform some specified action based on that. This is all, no more no less. Many IPSs have become very good at this, and provide wonderfully fine-grained means to examine and filter traffic.
Clearly, though, this impressive ability does not help with any zero day anomaly, anything that is specific to the network in which the IPS is deployed, or anything the IPS does not know about in advance.
How to rescue the IPS investment
Many organizations that have invested heavily in IPSs are now wondering how they could better leverage this investment. Is there a way to effectively use IPSs even when faced with true zero day anomalies? Glad you asked, because as you can imagine, I think we have an answer...
As we have seen, IPSs are very good when they know what they are looking for. If they have accurate signature databases, then they will be able to find whatever matches those signatures. If it is not in there, however, then the IPS is effectively blind. So, the key obviously is to get the signatures for any zero day anomaly into the IPS as quickly as possible.
This cannot be achieved if we have to rely on some remote data center, in which humans work at human speed on a selective set of anomalies that they have been made aware of. We need something that can look at whatever anomaly or issue your network is facing right now. The anomaly must not only be detected, but also must be analyzed to the point where an accurate signature for this anomaly can be provided. Automatically, and rapidly.
The solution you are looking for is a packet-based network anomaly detection solution. In Esphion's netDeFlect, we not only have the ability to detect anomalies such as DDoS attacks, worm outbreaks, misbehaving applications and other network impacting events in seconds, thanks to our specially trained neural networks. We also have zero day signature extraction modules. These analyze the anomalous traffic right there, in your network, and extract the characterizing signature for the anomaly. All of this takes place fully automatically, and within just seconds.
The network operator can then choose to have this signature translated into a variety of formats: router ACLs, IDS signatures, and also IPS signatures. These are ready to be applied to the IPS, which then becomes an instantaneous mitigation device, even for zero day anomalies that are entirely local to your network. For additional information about how this works, and how you can get a powerful, self-defending network security architecture out of this, please see my article here.
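As a rough illustration of what such a translation step might look like, here is a small Python sketch. The signature dictionary and the functions to_ios_acl and to_snort_rule are hypothetical names chosen for this example, and the two output formats (a Cisco IOS-style extended ACL line and a Snort-style rule) are assumed targets. This is not netDeFlect's actual output.

```python
def to_ios_acl(sig, acl_number=101):
    """Render a signature as a Cisco IOS-style extended ACL deny line."""
    line = f"access-list {acl_number} deny {sig['proto']} any any"
    if "dst_port" in sig:
        line += f" eq {sig['dst_port']}"
    return line

def to_snort_rule(sig, sid=1000001):
    """Render the same signature as a Snort-style drop rule."""
    port = sig.get("dst_port", "any")
    return (
        f"drop {sig['proto']} any any -> any {port} "
        f'(msg:"extracted anomaly signature"; sid:{sid}; rev:1;)'
    )

# Invented signature, for illustration only.
signature = {"proto": "udp", "dst_port": 1434}

print(to_ios_acl(signature))
print(to_snort_rule(signature))
```

The point is simply that one extracted signature can feed several kinds of devices. A router ACL can only express coarse criteria such as protocol and port, while an IPS rule could carry additional, finer-grained match conditions.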
So, what do we get out of all of this? I think the message I would like to get across is this: IPSs are good devices, but even they need to be part of a multi-layer security architecture. This is best practice, and has held true in the past, and continues to hold true, even in the age of powerful, multi-function devices, such as IPSs. IPSs are good at what they are doing: Finding what they know about. But for zero day anomalies, an IPS can only be effective if it is supported by one of those additional layers in the security architecture: An intelligent anomaly detection system, which has the ability to extract fine-grained signatures even for zero day anomalies.
The capabilities of such an anomaly detection solution and of IPSs are the perfect match: Fine-grained signatures are derived out of the anomaly detection and analysis. And only IPSs have the fine-grained filtering capabilities to really take advantage of those signatures.
So, if you have already invested in IPSs, but would like to really leverage and protect this investment, by using those IPSs even during zero day anomalies, do consider the deployment of a network anomaly detection solution. That is how to get the most out of your IPS.
We like to describe netDeFlect, our anomaly detection solution, as proactive. However, you might ask, how can detection be proactive? After all, detection by definition takes place as or after something happens. Being proactive, however, implies that something is done before an event takes place. The dictionary definition of proactive (according to dictionary.com) is:
pro·ac·tive or pro-ac·tive
Acting in advance to deal with an expected difficulty; anticipatory...
Clearly, therefore, detection cannot be proactive.
In our case, though, the word proactive does not refer to the actual detection of an anomaly. As good as it is, even netDeFlect cannot detect an anomaly before it happens, of course. Our research department always works on great new technologies to be added into our products, but the ability to see into the future has not been implemented, yet.
In the context of netDeFlect, the word proactive means several things:
The detection of something can never be proactive. But as we can see, the way netDeFlect operates certainly is proactive, and the new capabilities that it provides to network operators, are so as well.
You can read plenty here about our views - as with all approaches there are different techniques being applied. For instance, we stand apart from the competition by taking a packet-based approach to detecting changes in network behavior - using neural networking. That way we are not dependent on aging signature databases and can generate fine-grained signatures in seconds.
We're thrilled to have been named one of Red Herring's Top 100 Private Companies of Asia. The list is Red Herring’s 2005 selection of the 100 most promising private technology companies in Asia. Cool. Here is what Greg (our CEO) had to say in his press release:
“The last twelve months have seen Esphion achieve some major milestones. We brought our high speed neural network based appliances to market and have quickly developed a notable client base. We have expanded development facilities in New Zealand, established sales operations throughout Asia Pacific, Europe and most recently, in the United States. The Red Herring award involved a rigorous appraisal of our business and as such, it is extremely gratifying to receive recognition that we are on the right path.”
In my recent article about the shrinking patch-window, we already talked about the need for zero-day anomaly detection in networks. There is less and less time available for patching any new vulnerabilities, since it takes malware authors now only around two days to release a new worm, after an exploit has been published. Clearly, this trend is only going to get worse, thereby making signature-based systems even more vulnerable.
Yesterday then, the Sydney Morning Herald published an article, in which it is also claimed that zero-day threats are becoming more common. Even more interesting is the article's brief discussion about the increasingly common practice of software vendors to offer bounties for newly discovered vulnerabilities. The idea is that if someone finds a vulnerability in a piece of software they are more likely to contact the vendor if there is a chance to get some money for it, instead of just posting it on the Internet. That way, the vulnerability may be fixed before anyone else, including worm authors, will find out about it.
However, the article points out that if the offered bounty is not satisfactory to the discoverer of the vulnerability, they may just find someone else who is willing to pay more. The implication is, of course, that we may witness the beginning of a large-scale black-market for newly discovered vulnerabilities.
We all know that by now much of the malware that we are inundated with is written for financial gain: Spam zombies, DDoS zombies for extortion, botnets for rent, phishing, information gathering, etc. Obviously, there is money to be made. Therefore, it is not at all far-fetched to think that the same individuals who make money by renting out or using botnets would be willing to pay for a newly discovered vulnerability. After all, such a vulnerability would allow them to gather even more zombies through the release of a new worm.
Therefore, we may be faced with the fact that now a new line of individuals can benefit from illegitimate activities on the Internet: Computer experts who find vulnerabilities for a living, and sell them off to the highest bidder. It seems to me that the business of using, writing and enabling malware draws wider and wider circles.
Obviously, for the operators of mission critical networks, zero-day anomaly detection is now needed more than ever, because things only seem to take turns for the worse.
In my previous blog entry, I talked about the new liabilities faced by organizations whose data security may have been compromised by worms. To summarize: Having a worm outbreak on the internal network in certain industries may violate various regulations, and thus cause legal consequences for affected organizations. I argued that anomaly detection systems, which can alert operators to the presence of a worm in the network and aid in the rapid mitigation of the outbreak, have become part of best-practices and thus should be mandatory for all organizations.
Today then, as if to make the point, Red Herring published this article, in which they talk about the arrival of the business worm. It describes how hard big financial organizations have been hit by the Zotob worm, and how smaller, more targeted worms can be used to extract business information. It also elaborates on the fact that the Zotob worm was mostly confined to corporate environments, where an explosive outbreak occurred behind the heavily defended perimeter of the network. This, of course, supports the point I made repeatedly in this blog: You need anomaly detection within your network to be alerted to a worm outbreak. Defending the perimeter is close to useless in preventing these events, as Zotob has shown.
The article concludes with these fitting remarks:
With the time between the discovery of a vulnerability and a virus outbreak shortened significantly, enterprise users will have to institute new protocols to deal with worms and viruses of the future...
In short, they will have to become more proactive ... To do this, they will need to take care of security problems before they reach users ...
In an opinionated, but insightful article on The Inquirer web-site, an author mentions the case of a large financial organization, whose network was compromised by the recent Zotob worm. The article then goes on to discuss the liabilities that financial and health organizations may face when it is discovered that they had such a security breach.
Worm outbreaks as security events
Firstly, it is important to recognize that a worm infection in any corporate network is indeed a security breach: You cannot guarantee which backdoors or trojans are or were running on an infected machine. Thus, any data on that machine may have been accessed by some unauthorized outside party.
This means then that organizations in certain industries, in which regulations apply such as the Sarbanes-Oxley act, may face severe penalties for any such security incident. At the same time, these organizations are burdened with an IT infrastructure, which is prone to be buggy and with exploitable vulnerabilities popping up with either no or only very short advance notice. As we know, the patch-window is shrinking, which gives organizations less and less time to react, and may even be entirely absent in the case of a true zero-day exploit.
Claiming best-practices as defense
One of the key defenses that an organization has against any legal charges is to claim that whichever procedures and systems they had in place are commonly accepted as best-practice in the industry. For example, for the longest time, having firewalls was best-practice, and not much more was agreed upon for network security. These days, we know that firewalls alone are not sufficient, and defense in-depth is needed. For any large corporation, best-practice includes also timely patching and internal firewalls, just to name a few items.
All of these defense mechanisms, however, have a serious drawback: They are based on prior-knowledge. On rules and signatures, which have to be configured or loaded and kept up-to-date. Defending against the unknown, such as a zero-day worm, or a worm which takes advantage of a very recent exploit, is not readily possible with those traditional means of security.
Organizations need to improve their security architectures
I think it is only a matter of time before the various regulatory agencies realize that many organizations in critical industries and sectors are just not doing enough. For example, behavioral based anomaly detection has been around for a while. With truly intelligent solutions, such as those based on neural networks, these systems can rapidly detect outbreaks even of zero-day worms. With the right solution, mitigation signatures can be extracted in real-time, without having to rely on any prior knowledge. With that in place, networks can protect themselves rapidly. See here for more detail on how this can be done.
Anomaly detection is already part of the best-practices
Therefore, anomaly detection is a readily available tool, which is ready for main-stream deployment. The regulatory agencies will soon realize that there is no reason not to have those solutions as part of a best-practices security architecture.
So when will organizations learn that they need to significantly improve their security approach? The article says:
Here is when they'll learn, when someone notices that getting infected violates a whole bunch of laws, and that brings down the legal hammers on them.
It is not necessary to let it come that far. Organizations today already have cost effective and very powerful solutions available in the form of intelligent, neural-network powered behavioral anomaly detection systems. Deploying those in your network will lend a lot of credibility to any claim of following best-practices.
InformationWeek today is running an article, in which they discuss how the currently active Zotob worm illustrates one key emerging fact about computer and network security: The so-called patch-window is rapidly disappearing.
The patch-window, of course, is the time between announcement of a vulnerability and the arrival of the first malware which tries to exploit that vulnerability. In that time period a vulnerable machine was therefore not likely to be exploited, and thus, this was the time period that the network or system administrators had to apply the necessary security patches.
Often, the authors of this malware actually derive the exploit code from reverse-engineering the patches, which are issued by the vendors. In the past, the patch-window was measured in weeks or even months. These days, as was the case with the Zotob worm, it just took a few days. Apparently, the authors of the malware are becoming more adept and efficient in constructing working exploit code and releasing it into the wild.
One could claim that even a patch window of a few days should be sufficient, since modern operating systems tend to provide convenient and easy-to-use patch mechanisms. However, that is simply not the case: Everyone trying to upgrade all the computers in a sufficiently large enterprise network will be able to attest to that.
And if the ever shrinking patch-window is not enough of a concern, there is always the looming threat of a true zero-day worm, a worm that takes advantage of a previously completely unknown exploit, or an exploit for which no patch at all is available. The Witty worm was a great example of how real this threat is.
Johannes Ullrich, chief research officer at the SANS Internet Storm Center, is quoted in the InformationWeek article as saying: "Defense in depth is your only chance to survive the early release of malware." Defense in depth means that one does not simply rely on perimeter defenses, such as firewalls, but instead has security systems in place throughout the network.
It is our position that ideally, since signatures of an exploit are not available for zero-day threats, these security systems should not rely on signatures, or on any prior knowledge. Instead, behavioral anomaly detection can be used to great effect in those situations. For more details, may I also point out what I wrote here about where to deploy anomaly detection. Also see here about how those systems allow for the construction of a self-healing and adaptive network infrastructure, which can deal with those issues, no matter what happens to the patch window.
Today I want to talk a little bit about some of the underlying technologies we use in netDeFlect, Esphion's network anomaly detection solution. There are some core technologies, such as neural networks, which enable us to detect network anomalies by means of a highly-trained, specialized artificial intelligence, which is present in each netDeFlect installation. Other core technologies include components for high-speed packet processing, sophisticated data structures for fast lookups of information and advanced visualization and reporting capabilities.
To provide a complete, powerful and flexible solution, a lot of different technologies had to be combined in an innovative way. One of the things I find very attractive, but which normally takes place entirely hidden 'behind the curtains' is our use of load-balancing for the CPU intensive task of analyzing network packets and the extraction of zero-day signatures.
Before joining Esphion, I worked in a company, which specialized in server load-balancing solutions. I was an early employee there, and developed much of their core technology. Some of the largest web-sites in the world used our load-balancing systems to keep their business running, and manage massive amounts of hits on their servers. Therefore, I am excited that at Esphion we were able to put load-balancing again to good use.
Extraction of fine-grained signatures
After netDeFlect's neural networks detect an anomaly, the system then aims to provide signatures of this anomaly to the network operators. This has to happen fully automatically, and within seconds of the onset of the anomaly. Having those signatures then allows the creation of filter instructions for various network devices. I talked here about how our solution, in combination with already existing network infrastructure, can result in a network security architecture, which is surprisingly resilient even against zero-day attacks.
It is important to note that we are talking about true zero-day signatures, which in no way rely on prior knowledge or a signature database. Using several sophisticated algorithms, we can rapidly extract those signatures out of live packet samples, thereby enabling network operators to instantly get a handle on such anomalies, and to instantly proceed with the mitigation of the anomaly.
The signatures we provide need to be fine-grained to the extent that it must be possible to filter out the bad traffic with only minimal or no impact on the legitimate traffic (see here for more information on this requirement). Since anomalous traffic may at first look exactly like ordinary traffic, this is not a simple task: The algorithms to perform this analysis are quite CPU intensive.
Balancing the signature extraction work-load
When our solution is installed in a customer's network, there is usually a centralized controller, as well as a number of agents (sensors, in effect), which are distributed across the network.
After an anomaly has been detected, and after all the necessary data and packet samples have been correlated, it would be easiest to simply run the signature-extraction algorithms on the central controller. However, such a design would not be scalable.
Esphion always has had a focus on innovation, which should be reflected in the practicality and usefulness, but also elegance of our solutions. Therefore, in order to accommodate the often CPU intensive extraction of anomaly signatures, our engineering team has implemented an underlying load-balancing mechanism. So, when there are traffic samples that need to be analyzed for signatures, the controller will properly distribute the work among the installed agents in the network. Thus, the overall work-load is split among the agents, who all collaborate under the supervision of the controller to arrive at the required fine-grained signatures.
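For readers who like to see things in code, the following Python sketch shows one simple way a controller could split such work among agents. It is only an illustration under stated assumptions, not a description of our actual mechanism: the function distribute, the batch and agent names, and the notion of an estimated cost per batch are all invented here. The strategy shown is a common greedy heuristic, where each batch goes to the agent with the least work assigned so far.

```python
import heapq

def distribute(sample_batches, agents):
    """Assign each (batch_id, estimated_cost) to the currently least-loaded agent."""
    # Min-heap of (assigned_load, agent_name); the lightest agent is on top.
    heap = [(0, name) for name in agents]
    heapq.heapify(heap)
    assignment = {name: [] for name in agents}
    # Handing out the largest batches first gives a more even split.
    for batch_id, cost in sorted(sample_batches, key=lambda b: -b[1]):
        load, name = heapq.heappop(heap)
        assignment[name].append(batch_id)
        heapq.heappush(heap, (load + cost, name))
    return assignment

# Invented work items: batch name and estimated analysis cost.
batches = [("b1", 5), ("b2", 3), ("b3", 3), ("b4", 2), ("b5", 1)]
plan = distribute(batches, ["agent-a", "agent-b"])
print(plan)  # {'agent-a': ['b1', 'b4'], 'agent-b': ['b2', 'b3', 'b5']}
```

In this example both agents end up with a total estimated cost of 7. Adding more agents to the list spreads the same work over more machines, which is the sense in which such a design grows with the network.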
As a result, our solution has a built-in scalability, since it grows with the network in which it is installed. I think we have a powerful, elegant and intelligent architecture, which benefits the network operators, because they always get zero-day signatures within just seconds of the onset of an anomaly, no matter how big their network is. This scalability allows us to analyze anomalies faster and in much greater detail, thereby discovering information about anomalies that otherwise would remain hidden.
Recently, as outlined in this CNET article, several research teams have published papers in which they explain that intelligent worms can avoid being detected by the large worm-detection and early warning networks. Those networks are run by various security organizations, such as SANS, but also by some commercial security companies.
The details are explained in that article. In essence, those networks use honeypots and the monitoring of activity on unused IP addresses to detect worm activity or capture worm samples. Both approaches can be detected with different techniques.
So, as a consequence, it should be possible to write a worm, which escapes detection of those networks long enough, to make sure that no signatures for that worm can be produced and published before it has already reached critical mass. This is yet another example of why signature-based systems are leaving an organization vulnerable for too long, in the face of potential zero day threats.
Worse yet: A fast spreading worm, such as SQL/Slammer, for example, does not even need to bother with avoiding the worm detection networks, since it will have reached all possible targets faster than any signature can be published anyway.
This just goes to show that for proper worm protection, one cannot rely on third parties to publish signatures in time. If you want to keep your network safe, you need to deploy your own worm protection, right in the middle of it. An intelligent system, such as Esphion's netDeFlect, will be able to detect a worm outbreak immediately and provide signatures and filter instructions, even for zero-day worms, that can be used to stop the worm before it gains momentum in your network.
The Worm Blog recently pointed to an interesting paper that discusses the detection and incident handling of an SQL/Slammer outbreak in an enterprise. Remarkably, the enterprise actually had a reasonably secure setup. They knew about the new worm, thought they had a handle on it, and were still compromised.
This real-life incident report is a great example of why prior assumptions can lead to massive security failure. Note that all signature-based systems rely on prior assumptions (the signatures). Also note that many security architectures are built on prior assumptions. When you read the paper, you can see that the assumptions made by the security staff are actually quite reasonable. Yet these assumptions eventually led to failure, which just goes to show that additional intelligence in the network is needed.
The paper contains a lot of background information about the Slammer worm, as well as the network setup of that enterprise. If you don't want to read the whole thing, I would like to point you to the chapter 'Identification' which starts on page 19 and ends on page 21. If you just read those three pages, you can take away some very important points:
All in all, it took the business some four hours to conclude with certainty that they had been compromised and were fighting a worm outbreak. Those were four hours in which servers were slow to respond and had to be rebooted frequently, and during which their tech support had to field numerous calls from customers complaining that the services were slow.
Contrast this with the situation as it would have presented itself with an intelligent anomaly detection system in place:
Only seconds have elapsed from outbreak to detection, analysis and start of containment. Is this fast enough? It is, as I discussed here.
So, in conclusion, the paper makes an excellent case for why any organization that really wants to protect its uptime and assets should consider deploying an intelligent anomaly detection solution, right in the middle of its network.