A few weeks ago, I wrote about the need for, and the advantages of, being able to generate fine-grained signatures of network anomalies. In that article, I also mentioned that anomaly detection solutions based on flows (for example, Cisco's NetFlow) are quite powerless when it comes to generating signatures, since they get no insight into the actual packet payload or even the full packet headers. I concluded that effective signature generation requires a packet-based solution, which can see all the data all the time, not just some meta-summary of the network traffic.
However, there are additional significant disadvantages of a flow-based approach to anomaly detection, which I would like to elaborate on today.
Firstly, where do those flows come from? They are generated by third-party network devices, such as routers or switches. This has several interesting consequences:
- Routers and switches need to perform CPU-intensive calculations for every packet they see, in order to properly accumulate the flow information.
- Routers and switches need to set aside a significant amount of RAM, in order to store the flow table.
- Routers and switches may suffer heavily under certain attacks. What does that mean for flow generation?
- Exported flows are sent over the network to some collector. How much traffic is that?
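To make the first two points concrete, here is a minimal sketch of the bookkeeping a flow-generating device has to do. It is only an illustration, not how any particular router implements it: the `FlowKey` 5-tuple, the `FlowTable` class and the two traffic generators are hypothetical names made up for this example.

```python
from collections import namedtuple

# Hypothetical 5-tuple flow key; real exporters track more fields than this.
FlowKey = namedtuple("FlowKey", "src dst sport dport proto")


class FlowTable:
    """Toy flow cache: one entry per distinct 5-tuple."""

    def __init__(self):
        self.flows = {}  # FlowKey -> [packet_count, byte_count]

    def observe(self, key, length):
        # This lookup-and-update runs once for every single packet (CPU).
        entry = self.flows.get(key)
        if entry is None:
            # Every new flow costs a new table entry (RAM).
            self.flows[key] = [1, length]
        else:
            entry[0] += 1
            entry[1] += length


def normal_traffic(n_packets):
    # A handful of long-lived conversations between two hosts.
    for i in range(n_packets):
        yield FlowKey("10.0.0.1", "10.0.0.2", 1024 + i % 5, 80, 6), 500


def spoofed_flood(n_packets):
    # Each packet carries a different source address/port combination,
    # so each packet starts a new flow (unique for up to 65536 packets).
    for i in range(n_packets):
        src = "192.0.2.%d" % (i % 250)
        yield FlowKey(src, "10.0.0.2", i % 65536, 80, 6), 40


def table_size(packets):
    table = FlowTable()
    for key, length in packets:
        table.observe(key, length)
    return len(table.flows)


if __name__ == "__main__":
    print(table_size(normal_traffic(10000)))  # 5 entries
    print(table_size(spoofed_flood(10000)))   # 10000 entries
```

The same number of packets needs five table entries in the normal case and ten thousand in the flood case, which is exactly the situation in which the device can least afford the extra work.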
You observe, you change
In quantum mechanics, there is something called the uncertainty principle, which is popularly (and in drastically simplified terms) taken to mean that you cannot observe something without changing it. Strictly speaking, that idea is the closely related observer effect: the mere presence of an observer or test equipment changes what we observe.
Back in the world of network anomaly detection, we know that it is a non-trivial task for a switch or router to generate flow records. By consuming significant CPU cycles and memory, enabling flow-generation changes the behavior of the switch or router, usually by making it less responsive and pushing it somewhat closer to the edge of its capabilities.
For a flow-based anomaly detection solution, flow generation needs to be enabled on the routers or switches. Just as the uncertainty principle suggests, our attempt to observe the network itself has a significant impact on how the network performs and behaves. The mere fact that we are observing changes what we hope to see.
Not overloaded, yet?
The third point made above states that many anomalies or attacks, such as DDoS or worm outbreaks, can have a significant impact on network infrastructure elements, such as routers or switches. With the resource-intensive flow generation enabled, those elements are taxed even further, making them more likely to break in times of increased stress.
Of course, if the source of the flow records disappears into the tar pit of CPU overload or the sunset of frequent reboot cycles, where does a flow-based anomaly detection solution get the flow records it needs for any kind of visibility? As it turns out, flow generation taxes the most heavily used resource on your network even further. To add insult to injury, its eventual failure then leaves the anomaly detection solution entirely blind.
In general, it is not a good idea to rely on the availability of network infrastructure if one wishes to monitor possible attacks on that very infrastructure.
Increasing the severity of the anomaly
The fourth point made earlier refers to the mechanism by which a flow-based anomaly detection solution gets the flow-records: Over the network. This is usually not the high-capacity 'public' network, but instead an internal management network. Unfortunately, this network, or the management interfaces of the router or switch, may be maxed out by the additional load.
Consider that a typical flow record is a few dozen bytes in length (depending on the NetFlow version). Consider further that a DDoS attack may generate packets that are just 20 to 40 bytes in size, and that they can be crafted so that each one represents a new flow. Thus, the volume of flow-record data can be roughly as large as the attack traffic itself. And all of this gets unloaded over the management interface of the router. Clearly, this is not practical, since here again a resource is taxed more heavily when we are already under attack.
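As a rough back-of-the-envelope check of that claim, the arithmetic can be sketched as follows. The sizes are assumptions based on the NetFlow version 5 export format (24-byte header, 48-byte records, at most 30 records per datagram) plus IPv4 and UDP headers; other versions and templates will give different numbers, and the function names are made up for this example.

```python
import math

# Assumed NetFlow v5 export layout; other versions differ.
HEADER_BYTES = 24          # export datagram header
RECORD_BYTES = 48          # one flow record
RECORDS_PER_DATAGRAM = 30  # maximum records per export datagram
UDP_IP_BYTES = 28          # 20-byte IPv4 header + 8-byte UDP header


def export_bytes(n_flows):
    """Bytes sent to the collector in order to export n_flows records."""
    datagrams = math.ceil(n_flows / RECORDS_PER_DATAGRAM)
    return n_flows * RECORD_BYTES + datagrams * (HEADER_BYTES + UDP_IP_BYTES)


def export_ratio(n_packets, packet_bytes):
    """Export volume relative to attack volume, assuming one flow per packet."""
    return export_bytes(n_packets) / (n_packets * packet_bytes)


if __name__ == "__main__":
    # One million 40-byte attack packets, each one crafted as a new flow.
    print(round(export_ratio(1_000_000, 40), 2))  # 1.24
```

Under these assumptions, the export traffic is of the same order of magnitude as the attack itself, and for the smallest attack packets it can even exceed it.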
In short: With flow-generation enabled, the effect of a network anomaly on the network infrastructure is multiplied, making the job easier for an attacker.
Sampling as a solution? Not really...
Some vendors of flow-generating devices offer sampling to alleviate this problem. With sampling, only every nth packet is considered for the generation of flows. Clearly, this reduces CPU load and the number of flows that are exported. However, it also dramatically reduces accuracy. For example, if only every 100th packet is sampled, then flows that are a few dozen packets in length are likely to appear as only one packet. This makes them indistinguishable from flows that really consist of only a single packet. Average flow length, however, is an important indicator of the state of the network. With sampling, this data is lost. Besides, entire conversations between hosts may slip through the cracks if sampling is enabled.
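The loss of accuracy is easy to reproduce in a small simulation. The sketch below is only an approximation of what a device does: it uses random 1-in-n sampling rather than strictly every nth packet (deterministic sampling depends on how packets of different flows interleave), and the function names and the seed are arbitrary choices for this example.

```python
import random
from collections import Counter


def sample_flows(n_flows, packets_per_flow, rate, seed=1):
    """Sample each packet with probability 1/rate; return sampled count per flow."""
    rng = random.Random(seed)
    seen = Counter()
    for flow_id in range(n_flows):
        for _ in range(packets_per_flow):
            if rng.randrange(rate) == 0:
                seen[flow_id] += 1
    return seen


def summarize(n_flows, packets_per_flow, rate, seed=1):
    """Count flows that were missed, seen as one packet, or seen as more."""
    seen = sample_flows(n_flows, packets_per_flow, rate, seed)
    seen_once = sum(1 for count in seen.values() if count == 1)
    return {
        "missed": n_flows - len(seen),
        "seen_once": seen_once,
        "seen_more": len(seen) - seen_once,
    }


if __name__ == "__main__":
    # 10,000 flows of 30 packets each, sampled at 1 in 100.
    print(summarize(10000, 30, 100))
```

With these parameters, each flow has a chance of about 0.99^30, or roughly 74%, of not being sampled at all, and most of the flows that do show up appear as a single packet.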
Clearly, sampling is only a stop-gap measure, which does not truly address, let alone fix, the fundamental weakness of flows: a trade-off between accuracy on the one hand and massive resource impact on the network infrastructure on the other.
Packet-based anomaly detection
How does packet-based anomaly detection address those problems? For the most part, the problems simply do not occur at all.
For starters, a packet-based anomaly detection solution, such as Esphion's netDeFlect, can happily feed raw packets off a fiber tap. While the uncertainty principle still applies, of course, it does so on a much lower level: The signal in the fiber is slightly changed, but only to a degree that the interface hardware can easily deal with and compensate for. There is literally zero impact on the router or switch itself. No additional CPU cycles, no additional memory. In fact, the router or switch is entirely unaware that the traffic is listened to via a fiber tap.
A popular deployment alternative for a packet-based solution is to utilize a mirror or spanning port on the router or switch. This does have some impact on the device. However, mirroring or spanning is performed on the data plane (ASICs), rather than the control plane (CPU) of the device. Therefore, the impact is typically negligible compared to the generation of flow records, and normally does not involve the CPU of the router or switch at all.
Finally, no data needs to be exported across the management network from the infrastructure device to the anomaly detection solution. This means that the magnitude of the anomaly or attack is not inadvertently multiplied by the export of an excessive number of flow records.
Conclusion
A packet-based solution continues to see all packets, no matter how overloaded the network infrastructure devices are. As long as there are packets on the wire, they can be seen, counted, analyzed and reported on. Therefore, the accuracy of the packet-based approach is not only much higher, but is also accomplished with a greatly reduced overhead and impact on the network.
Juergen