<img height="1" width="1" style="display:none" src="https://www.facebook.com/tr?id=1078844192138377&amp;ev=PageView&amp;noscript=1">


VMware 101: Virtual Network Design

One of the most common questions we receive regarding VMware vSphere environments (and virtualization environments in general) is how to set up virtual networking.

While a misconfigured environment can still function, issues arise with scalability, security, and ease of configuration. (This applies to almost all hypervisor solutions, but this post focuses on VMware vSphere environments.) Here's my two cents on designing a virtual network using VMware vSphere.

The core choice for how to configure your environment revolves around virtual LAN (VLAN) tagging.

As a quick refresher, VLANs can take a single switch or group of switches and divide them into subgroups that are in completely separate broadcast domains. Each port on a switch can be configured as either an access port with a single VLAN ID (where the device connected to it doesn’t require any configuration), or a trunk port with one or multiple VLAN IDs (relying on the connected device to choose what VLAN ID a packet will traverse over).
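
If it helps to picture where the tag lives, here's a toy Python sketch (purely illustrative, not vendor code) of how an access port versus a trunk port hands a frame to the attached device:

    # Toy model of switchport egress behavior toward the attached device.
    def egress_to_device(port_mode, frame_vlan, access_vlan=None, allowed_vlans=()):
        if port_mode == "access":
            # Only the access VLAN is delivered, and the 802.1Q tag is stripped.
            return "untagged" if frame_vlan == access_vlan else "dropped"
        if port_mode == "trunk":
            # Allowed VLANs are delivered with their 802.1Q tag intact;
            # the attached device decides which VLAN each packet belongs to.
            return "tagged" if frame_vlan in allowed_vlans else "dropped"
        raise ValueError(port_mode)

    print(egress_to_device("access", 22, access_vlan=22))           # untagged
    print(egress_to_device("trunk", 22, allowed_vlans={5, 8, 22}))  # tagged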

There are three possible methods of configuring vSphere networking: External Switch Tagging (EST), Virtual Switch Tagging (VST), and Virtual Guest Tagging (VGT).

Each method moves the stage at which a packet has its VLAN tagged or untagged.

External Switch Tagging (EST)

With EST, the vmnics (physical NICs in the ESXi host) are configured as access ports on the switch. As packets enter the switch from the host NIC, they are tagged with the appropriate VLAN and sent to their appropriate destination for that VLAN (a gateway, for example). As traffic flows towards the host, packets are untagged by the switch and presented as standard untagged packets for the host.

The idea behind this setup is that each NIC (or pair of NICs, for redundancy) is assigned its own specific VLAN. EST is the simplest to configure because it only uses access ports on the physical switch, and no configuration is needed on the hosts (besides matching vmnic numbers to physical ports on the host and switch). The primary drawback to EST, however, is scalability: the more VLANs you require, the more physical NICs you need in each host. Switchports are expensive, and you can only add so many NICs to a given host.

Virtual Switch Tagging (VST)

VST solves this limitation by moving the tagging and untagging of traffic down to the ESXi host’s vSwitch. On the physical switch, ports are configured as trunk ports carrying all VLANs that need to be accessed from any VM on the host or cluster. On the vSwitch, a port group is created for every VLAN that is required. For example, if there were VMs needing access on VLANs 5, 8, and 22, then three port groups would be created on the same vSwitch, each specifying one of those VLANs. In this setup, all tagged traffic destined for the ESXi host remains tagged by the physical switch, and the vSwitch untags it as it proceeds down to the VM itself. Because the vSwitch handles the untagging of traffic, individual VMs do not need to be aware of what VLAN they are on, so VMs just work out of the box without any additional configuration.
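
As a rough sketch of what that looks like in practice, the snippet below uses pyVmomi to add one port group per VLAN to an existing vSwitch. The vCenter address, credentials, host name, and vSwitch name are placeholders, and connection options vary by pyVmomi version, so treat this as illustrative rather than copy-paste ready:

    # Illustrative pyVmomi sketch: one port group per VLAN on an existing vSwitch.
    import ssl
    from pyVim.connect import SmartConnect, Disconnect
    from pyVmomi import vim

    si = SmartConnect(host="vcenter.example.com",            # placeholder vCenter
                      user="administrator@vsphere.local",
                      pwd="********",
                      sslContext=ssl._create_unverified_context())
    host = si.content.searchIndex.FindByDnsName(dnsName="esxi01.example.com",
                                                vmSearch=False)

    for vlan in (5, 8, 22):
        spec = vim.host.PortGroup.Specification()
        spec.name = "VLAN%d" % vlan          # one port group per VLAN
        spec.vlanId = vlan                   # VST: the vSwitch tags/untags this VLAN
        spec.vswitchName = "vSwitch1"        # assumed existing vSwitch
        spec.policy = vim.host.NetworkPolicy()
        host.configManager.networkSystem.AddPortGroup(portgrp=spec)

    Disconnect(si)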

VST requires more configuration on the physical switch and on the ESXi host to ensure that the switchports for all vmnics on the same vSwitch are configured identically. If these configurations aren’t aligned, intermittent packet loss can occur, because traffic sometimes leaves a properly configured vmnic and sometimes an improperly configured one, depending on how the switch is set up. When properly configured, however, VST can make an environment incredibly flexible. VST also gives admins a clear picture of how traffic is flowing between the physical and virtual network. Unlike EST, adding port groups (and thus, VLANs) is incredibly easy and scalable. Distributed switches can support 10,000 port groups, which is well over the capacity that the majority of environments will ever use.

VST topologies are by far the most common in VMware deployments because they strike a good balance between scalability (no reliance on physical switchports and NICs to expand the number of VLANs in a network) and ease of use (no need to configure every VM to specify what VLAN it is a part of).

Virtual Guest Tagging (VGT)

VGT sends tagged traffic all the way down to the guest VM. In this instance, the physical switch is configured as a trunk port, similar to VST, and the vSwitch port groups are configured for the special VLAN ID 4095. In order for traffic to pass end-to-end, the guest OS needs to be able to support 802.1q VLAN tags and should be configured for the specific VLAN that the guest needs to communicate on.

VGT is best used in lab environments, where perhaps a server administrator needs to be able to change VLANs on a guest without the hassle of changing it in vSphere. VGT can also be used if there is not a dedicated or knowledgeable VMware administrator on staff and moving between VLANs is easier in-guest. Still, VGT can create an issue when it comes to visibility into the network. It can become difficult to determine which VMs are associated with which VLANs, and it even poses security concerns, as a VM can be set to a different VLAN without the virtualization administrator knowing about it.

ServerCentral's Approach

At ServerCentral, we standardize on VST for all internal cloud, Enterprise Cloud, and Private Cloud deployments. This gives us the flexibility and scalability we need to provide customers with a vast array of options for their deployments. Choosing VST over EST not only increases the speed at which we can deploy new VLANs, it keeps switchport costs down, allowing us to pass those savings on to customers. While VGT would give us similar speed and cost savings, it creates too much in-guest customization, has the potential to be less secure, and makes it very hard to perform root cause analyses should they be needed.

Topics: Networking Virtualization

How VMware Virtual SAN 6.1 Can Support Your Remote Applications And PoPs

With the ecommerce industry growing each year, international business is no longer an enterprise-only sport. With small and midsize companies entering the global footprint game, their IT infrastructure needs to follow suit as they seek to engage and keep customers around the world.

How Small and Midsize Companies Can Expand Globally

The issue many companies face, however, is in providing a redundant, reliable solution to house servers in their secondary locations. These locations are often considerably smaller than a main Point of Presence (PoP) and bring with them unwanted latency. Add to this the need for redundancy at the storage layer, which often revolves around NAS or SAN devices, and you’re looking at a potentially large upfront cost.

With VMware’s latest release of its VSAN platform, businesses now have a solid foundation to support their production-level remote applications, without the large CapEx cost of multiple servers and a SAN backend. 

VSAN is a hyper-converged infrastructure platform that allows professionals to use storage inside the ESXi hypervisors as shared storage across the cluster. As a quick refresher, it works by presenting all storage inside of two or more hypervisors as a single datastore that all hypervisors can mount. VSAN also stores copies of all data in multiple locations, providing redundancy during a total hypervisor failure. In more complex setups, VSAN can also be used with multiple fault domains, which can support the failure of entire cabinets or even entire sites with no loss of data availability. In short, many of the benefits which traditionally have been in the realm of dedicated SANs are now available for a much lower cost (especially when deployed as a managed service).
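
As a back-of-the-envelope illustration (my own arithmetic, not a VMware sizing tool): with a mirroring storage policy, VSAN keeps failures-to-tolerate + 1 copies of each object, so usable capacity is roughly raw capacity divided by FTT + 1. The host count and per-host capacity below are assumptions:

    # Rough VSAN capacity estimate under a mirroring storage policy.
    hosts = 2
    tb_per_host = 8        # assumed local capacity contributed by each hypervisor
    ftt = 1                # host failures to tolerate (mirroring keeps ftt + 1 copies)

    raw_tb = hosts * tb_per_host
    usable_tb = raw_tb / (ftt + 1)
    print("raw: %d TB, usable (approx., before overhead): %.1f TB" % (raw_tb, usable_tb))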

While VSAN has always been able to scale up into large clusters as primary storage for central data centers, VSAN 6.1 offers a couple of new features that allow it to also scale down to support a remote branch office or a small/emerging market PoP.

What You Couldn’t Do Before Virtual SAN 6.1

2-Node VSAN
Perhaps the biggest addition making this functionality possible is the option to deploy a 2-node VSAN cluster. With this new feature, VSAN can now scale down in parity with other important VMware technologies such as vMotion, HA, and DRS. In older versions of VSAN, 3-node clusters were the absolute minimum, which added size, complexity, and (most importantly) cost to a remote solution and typically prevented VSAN’s use in these scenarios.

While a 2-node VSAN requires a third virtual appliance to act as a witness in another data center (which prevents the possibility of a split-brain scenario should networking be cut between the two hosts), this remote site would most likely be connected to an existing, larger vCenter environment. It’s important to note that this virtual appliance is free, unlike an extra, unneeded hypervisor.

SMP-FT
With vSphere 6.0, VMware overhauled fault tolerance, making it possible to deploy a fault-tolerant VM (a VM running on two hypervisors simultaneously), resulting in zero downtime during a hardware failure on either hypervisor. With VSAN 6.1, VMware extended this feature to hypervisors running VSAN. Even with just two nodes, remote sites often end up with spare capacity; utilizing SMP-FT is an easy way to take advantage of these extra resources and increase uptime.

Windows Server Failover Clustering Support
WSFC has become a core tenet of any highly available Windows Server environment. With technologies such as Exchange, SQL Server, and DFS all utilizing aspects of failover clustering, many organizations found VSAN lacking in support for their primary applications. With VSAN 6.1, a remote SQL cluster can be supported with redundancy at the storage, hypervisor, service, and application levels.

All-Flash Support
While technically a VSAN 6.0 feature, this bears mentioning when discussing remote PoP VSAN environments. Traditionally, one major issue with storage in remote locations is the lack of performance. To get high-performance arrays, dozens of spinning disks would need to be deployed in a SAN and carefully maintained, increasing not only cost but also complexity and failure rates. With ever-faster flash-based disks arriving on the market at a blistering pace, VSAN can use all-flash arrays to get a very high level of performance out of a very small number of drives. For an even higher level of performance, VSAN 6.1 supports cutting-edge technologies such as ULLtraDIMM and NVMe, reducing or eliminating traditional SSD issues such as connectivity, controller, and bus bottlenecks to allow even lower latencies for critical applications.

The Verdict

With its support for a wide variety of applications, very high IOPS, low-latency performance, and a small, highly redundant environment that can grow as you need, VMware’s VSAN 6.1 platform is proving to be an excellent choice when requirements dictate an enterprise-grade solution without an enterprise-grade cost.

Topics: Networking Tips

Bringing Sanity to Routing Over IPsec

IPsec routing has a reputation for being unwieldy. This isn’t entirely undeserved. Of the two main ways IPsec tunnels are configured, policy-based configurations are the worst offenders: they completely eschew routing via a standard routing table, making packet flow harder to troubleshoot and adding excessive administrative overhead. Unfortunately, certain vendors don’t allow any other type. Even among vendors that support the other method, route-based tunnels, it’s not always smooth sailing.

Let’s take a look at the most common IPsec routing scheme, bare IPsec, using phase 2 selectors as routing rules. Phase 2 selectors are declarations by the devices on both sides of the tunnel indicating what traffic is allowed through. If they don’t match, the tunnel doesn’t come up.

For example, device A sits at the border of network 10.100.100.0/24 with its own internal IP address, 10.100.100.1. Device B’s internal side sits at 10.200.200.1/24. To keep things simple, the phase 2 selectors are set as 10.100.100.0/24 and 10.200.200.0/24. Simple enough. Sort of.

From a network engineer’s point of view, this is an uncomfortable proposition. Where exactly does 10.200.200.0/24 sit in relation to 10.100.100.0/24? Normally, there’d be some sort of link between them (usually a point-to-point link with IPs on both ends that a routing table could point to). Since this particular IPsec tunnel doesn’t contain any kind of logical link with IPs on the ends, the routing table simply has an entry for 10.200.200.0/24 as “over on the other side of the IPsec tunnel.”

This works, but isn’t very clean. Worse is when a policy-based IPsec configuration is active; in which case there's no routing table entry at all. The routing logic is instead derived from a policy that states “when a packet from 10.100.100.0/24 has a destination within 10.200.200.0/24, shove it through the encryption engine and down the IPsec tunnel, where it will be decrypted and forwarded on its way.” With a large number of routing policies active, this can quickly turn into a game of “find the policy that governed the packet flow," which is no one’s favorite.

Complicating matters is the fact that some vendors don’t even care what you put in the phase 2 selector, and instead rely on routing table entries and policies for routing logic. The phase 2 selector is simply a config line that's there to match with the other side, so there’s a good chance that it’s completely meaningless.

Finally, this is not a scalable configuration. What happens when we have multiple networks on each side? That’s right, we need a phase 2 selector for each possible combination. In our example, with one network on each side, we need a single selector per side. Two networks on each side of the link bump that up to 8 selectors (4 selectors per side—each network needs a selector to each network on the other side). In the figure below, 3 non-contiguous networks on each side need 18 selectors—9 for each side! And this is if you’re lucky enough to have CIDR-specified networks on each side. There are many times when the administrator in charge of the remote network just wants to allow individual IPs to traverse the tunnel. Can you imagine the number of selectors you’d need if they had a list of 10+ individual IPs?

 

[Figure: IPsec diagram — Exhibit A]
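
The arithmetic behind that explosion is straightforward; here's a quick sketch (assuming one selector per local/remote network pair, mirrored on both ends of the tunnel):

    # How phase 2 selector counts grow with the number of networks on each side.
    def selectors_needed(local_nets, remote_nets):
        per_side = local_nets * remote_nets   # one selector per network pair
        return per_side, per_side * 2         # mirrored on both endpoints

    for local, remote in ((1, 1), (2, 2), (3, 3), (3, 10)):
        per_side, total = selectors_needed(local, remote)
        print("%d local x %d remote -> %d per side, %d total"
              % (local, remote, per_side, total))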

It should be clear at this point that bare IPsec phase 2 selectors are a poor choice when it comes to routing logic. What we need instead is a standard “virtual” cross-connect, or something that acts like a point-to-point link. Not only that, but we need to separate routing logic from security and firewall logic instead of mashing them together in a horrible Frankenstein creation. Luckily, this kind of thing exists and is well supported across multiple vendors. I’m talking about Generic Routing Encapsulation, or GRE.

GRE is a relatively simple protocol that operates at layer 3 of the OSI model. Its underlying transport is layer 3 (IP), it accepts packets of various types (IP, IPv6, multicast), and it presents itself as a physical link with layer 2 (Ethernet) framing. What this means is that any protocol that can run on Ethernet should work fine over GRE, including useful routing protocols such as OSPF and BGP. And because it operates like a regular link, its behavior is predictable (unlike a bare IPsec tunnel, which sits in a gray area between a logical link and a policy route). GRE tunnels are completely stateless, making them very useful in situations where the underlying transport may or may not be available at any one time.

How does this help with IPsec?

By configuring a GRE tunnel between the endpoints of the IPsec tunnel, we can ignore all the odd constraints of routing over IPsec and simply deal with a bog-standard GRE link. In addition, we can treat the GRE link as a standard security plane and clearly define policies that allow traffic in and out, as opposed to a security policy that’s also handling routing, leading to an extremely confusing config. Our previous example with the 18 phase-2 selectors configured to handle full mesh network routing can be replaced with two selectors over a GRE link—one on each side. Routing is handled via traditional static or dynamic routing tools. If we ever need to add or remove networks from each side of the link, we simply add or remove routing entries. If you’re using dynamic routing, you don’t even have to do that, as the routing protocols do it for you. Compared to the IPsec method of maintaining matching phase-2 selectors, where you have to add/remove a quadratically increasing number of selectors as networks change, this is a huge win in terms of time savings and reduced configuration complexity.

Let’s take a look at an actual example.

[Figure: GRE over IPsec]

Here we have two IPsec endpoints that can reach each other over the internet using their public IPs, 1.1.1.1 and 2.2.2.2. If you recognize these example IPs, it’s because we use a lot of Juniper equipment here at ServerCentral. Behind 1.1.1.1 (we’ll call this the local network) are 3 subnets: 10.100.10.0/24, 10.100.100.0/24, and 10.100.111.0/24. Behind 2.2.2.2 (we’ll call this the remote network) are 3 more subnets: 10.200.20.0/24, 10.200.200.0/24, and 10.200.222.0/24. We’d like hosts on any of the local networks to reach hosts on any of the remote networks and vice versa. Rather than take the arduous route of 18 phase 2 selectors, we’ll use one set of phase 2 selectors plus a GRE tunnel, making future expansion (or contraction) trivial.

The first thing we’ll do is create a bare IPsec tunnel for the GRE tunnel to sit on top of. This is the only thing we’ll need a phase 2 selector for, and the IPs we assign to the endpoints are an unroutable island unto themselves; they are only needed to provide the endpoints for our GRE tunnel. In this example, we’ll use 10.0.0.2/31 on the local side and 10.0.0.3/31 on the remote side.

Once the IPsec tunnel is up, 10.0.0.2 and 10.0.0.3 should be able to ping each other, confirming that we have a secure link between 1.1.1.1 and 2.2.2.2.

Now we set up the GRE tunnel. A GRE tunnel is configured with a minimum of 4 parameters. The first two parameters are the source and destination IPs of the tunnel; in this case, they are the IPsec endpoints, 10.0.0.2 and 10.0.0.3. The third parameter is the IP of the GRE tunnel interface itself on the local side. In our example, we use 10.50.50.2/31. The final parameter is the far-side IP of the GRE tunnel on the remote side. Here we use 10.50.50.3/31.
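
Here's a small standard-library sanity check of those four parameters: it confirms the two GRE interface IPs pair up in the same /31 and derives the mirror-image configuration the remote side needs (the values are the ones from this example):

    # Sanity-check the GRE tunnel parameters and derive the remote side's view.
    import ipaddress

    local = {"ipsec_src": "10.0.0.2", "ipsec_dst": "10.0.0.3",
             "gre_addr": "10.50.50.2/31", "gre_peer": "10.50.50.3"}

    net = ipaddress.ip_interface(local["gre_addr"]).network
    assert ipaddress.ip_address(local["gre_peer"]) in net, "GRE peer not in the same /31"

    remote = {"ipsec_src": local["ipsec_dst"], "ipsec_dst": local["ipsec_src"],
              "gre_addr": local["gre_peer"] + "/31",
              "gre_peer": local["gre_addr"].split("/")[0]}
    print(remote)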

Here we again perform a test, this time for GRE connectivity. 10.50.50.2 should be able to ping 10.50.50.3 and vice versa.

Finally, we configure routing. If we’re using a dynamic protocol like OSPF, we simply configure it to advertise and accept routes from the far side of the tunnel. If you’re more traditional and prefer static routes, you would add routes to all the remote subnets via 10.50.50.3, and on the remote side, add all routes to the local networks via 10.50.50.2. Provided your security policies are set up correctly to allow traffic to flow properly between all those networks, you’re done.

You have a full mesh network without the complexity of a multitude of phase-2 IPsec selectors. As you add or remove networks, you just add or remove static routes as necessary (and, as mentioned before, this can be made even easier by fully automating routing with a dynamic routing protocol such as OSPF).
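
To make the scaling point concrete, here's an illustrative sketch using the subnets from this example: each side needs one static route per far-side subnet via the GRE peer, so growth is linear rather than the n x m selector explosion:

    # Static routes each side needs when routing over the GRE tunnel.
    local_nets = ["10.100.10.0/24", "10.100.100.0/24", "10.100.111.0/24"]
    remote_nets = ["10.200.20.0/24", "10.200.200.0/24", "10.200.222.0/24"]

    local_routes = [(net, "10.50.50.3") for net in remote_nets]   # configured locally
    remote_routes = [(net, "10.50.50.2") for net in local_nets]   # configured remotely

    for prefix, next_hop in local_routes + remote_routes:
        print("route %s via %s" % (prefix, next_hop))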

Let’s take a step back and look at why this works with a packet flow example:

  • Host 10.100.100.100 would like to send an HTTP syn packet to 10.200.200.200, so it sends it off to its router.
  • The router knows 10.200.200.200 is available via an IP at 10.50.50.3 (reachable via 10.50.50.2), so it forwards it on to that IP. This packet will traverse the GRE tunnel, colored black in the example.
  • The IPsec device sees a packet come in to the tunnel with a destination of 10.50.50.3 and encapsulates that packet within another, external packet. This external packet is an ESP packet traversing the IPsec tunnel itself, between 10.0.0.2 and 10.0.0.3. This is the actual IPsec encapsulation, shown as red in the example.
  • The ESP packet goes through yet another encapsulation, this time with the public IP of the local IPsec device, 1.1.1.1, as the source and the remote IPsec device, 2.2.2.2, as the destination. The packet is encrypted and hashed for authentication at this point. In the example, this is the green colored section.
  • The ESP packet traverses the internet, fully encrypted with an authentication header and arrives at 2.2.2.2.
  • 2.2.2.2 now begins the reverse process of all the previous encapsulations. First, it authenticates and decrypts the ESP packet, thus stripping off the outer-most layer of encapsulation. It sees that this is a packet destined for the IPsec endpoint of 10.0.0.3, sourced from 10.0.0.2. The packet is forwarded onto 10.0.0.3.
  • 10.0.0.3 sees that the packet is a GRE packet and strips the GRE encapsulation off. It now sees the inner packet was sourced from 10.100.100.100 with a destination of 10.200.200.200.
  • Consulting the routing table, it knows where that host is and forwards the now-original packet on to the destination host.
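
If you want to see the nesting without touching real hardware, here's a rough scapy sketch of the layers (no actual ESP encryption or keys; addresses are the ones from this example, and scapy must be installed to run it):

    # Layered view of the packet flow: TCP inside GRE inside ESP.
    from scapy.all import IP, TCP
    from scapy.layers.l2 import GRE
    from scapy.layers.ipsec import ESP

    inner = IP(src="10.100.100.100", dst="10.200.200.200") / TCP(dport=80, flags="S")
    gre_pkt = IP(src="10.50.50.2", dst="10.50.50.3") / GRE() / inner
    outer = IP(src="1.1.1.1", dst="2.2.2.2") / ESP(spi=0x100)  # payload would be the
                                                               # encrypted GRE packet
    print(gre_pkt.summary())
    print(outer.summary())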

And there you have it.

GRE is an efficient protocol and doesn’t add much processing overhead as it encapsulates and de-encapsulates packets. The IPsec ESP encapsulation, on the other hand, can be processor intensive, especially with a large number of packets per second, which is why choosing the right encryption and authentication schemes is vital to maintaining a responsive network and high throughput.

On a final note, there are some vendors out there, like Juniper, who have come to their senses on IPsec routing and have implemented a way to get this done even more cleanly, bypassing the whole GRE-tunnel-layered-over-IPsec approach. The configuration is similar, though it’s beyond the scope of this particular article. If you’re interested in discussing it, contact me.

It should be fairly clear why using standard routing tools is a far more scalable and efficient way to configure IPsec tunnels. Unfortunately, configuring tunnels this way is less common than brute forcing the matter with a slew of phase 2 selectors. To be honest, I’m not sure why this is, as any systems or network administrator worth his or her salt will always choose efficiency over chaos. I suspect much of it has to do with the way IPsec can sometimes come off as mysterious and arcane, and once it’s working, many wash their hands of it (until it breaks!). Fortunately, ServerCentral has wide experience with various vendor IPsec implementations over many years. While we have our favored IPsec vendor implementations (shout out to Juniper), we’ve worked with clients whose devices have ranged from DIY Linux endpoints to chassis-based security blades, and encourage IPsec tunnels to be deployed this way. With all that experience, we’re often asked why we implement our tunnels a certain way, and hopefully this series has shed some light on why we do it the way we do.

Read part 1, Optimizing IPsec Tunnels
Read part 2, IPsec Parameter Choice Rationales

Topics: Networking

IPsec Parameter Choice Rationales

On the previous episode of As The IPsec Tunnel Churns, we discussed how IPsec configurations running in tunnel mode are established. This week, let’s get into the nitty gritty of why those particular parameters were chosen.

The answer is simple: speed and security. Certain options increase security, and certain options increase speed. Note that something that decreases security doesn’t necessarily increase speed—these are two separate and independent metrics.

Hash Algorithm: HMAC-SHA1

Hashing algorithms are for verifying data integrity, not encryption. This is important for secure communications. You want to make sure the packet you sent is the packet that arrived.

Without hashing, a nefarious party could throw a bunch of garbage into your packets and there’d be no way of knowing—even if your data is encrypted. On the public internet, “nefarious parties” could be anyone that could theoretically view the data stream. These days that list could be endless; your transit providers, any state agencies with taps into transit networks, hackers who have secured access to routing and switching equipment, or even employees with or without ill intent.

A correctly implemented hash can negate this threat. A hash is simply a mathematical operation that runs on a dataset (in this case, a single IP packet) and generates a unique string. When it arrives at its destination, the hash is re-run. If the packet is changed in transit, the resulting hash will no longer match the computed value. Hashes are designed so that even tiny changes in a packet will radically alter the hash, and are extremely difficult (if not impossible) to reverse engineer.
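
A quick standard-library illustration of the idea (not IPsec itself, just the mechanism): flip a single bit in the "packet" and the HMAC-SHA1 no longer verifies:

    # HMAC-SHA1 tamper detection in miniature.
    import hmac, hashlib

    key = b"shared-secret-from-the-key-exchange"
    packet = b"GET / HTTP/1.1\r\nHost: example.com\r\n\r\n"
    tag = hmac.new(key, packet, hashlib.sha1).digest()

    tampered = bytearray(packet)
    tampered[0] ^= 0x01   # one flipped bit in transit

    print(hmac.compare_digest(tag, hmac.new(key, packet, hashlib.sha1).digest()))           # True
    print(hmac.compare_digest(tag, hmac.new(key, bytes(tampered), hashlib.sha1).digest()))  # False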

The two common hashing options for IPsec are MD5 and SHA1.

IPsec tunnels use keyed-hash message authentication code (HMAC) versions of these algorithms. While vanilla MD5 has been proven broken, HMAC-MD5 is still considered secure. SHA1 is considered even more secure, at the cost of some computational overhead (i.e., it’s slower than MD5). Given phase 1 is focused more on security, we opt for the slower but more secure SHA1.

Note that you can set hashing to NONE. Never use this; it’s only included in the IPsec standard as a testing mechanism. This disables hashing (and at that point you may as well not even bother with an IPsec tunnel).

Newer hashing options include SHA-2 variants with larger digests, such as SHA-256, SHA-384, and SHA-512. They’re also a viable option, though many vendors don’t yet support them.

Encryption Algorithm: AES256

There are a number of algorithms for encrypting traffic. Data Encryption Standard (DES) used to be the standard. While simple and fast, it was easily broken, and became obsolete in 1999. Triple DES (3DES) came in to replace it and is still in use today, but it’s terribly slow. Though these encryption algorithms can still be used, they are highly discouraged. Instead, more modern algorithms should be used, particularly the Advanced Encryption Standard (AES) suite. Not only is AES far faster than 3DES, it’s also considered more secure. Another bonus: AES is hardware-accelerated on a wide variety of processors, making it even quicker while using less processing power.

Remember: utilizing less processing per packet allows for more packets to be encrypted, which translates into increased throughput.

AES comes in a number of key size variants, starting at AES128. AES128 is considered to be secure today and for the foreseeable future.

Note: AES256 is even more secure, but the additional security afforded by jumping to 256 bits is like comparing the thickness of two pieces of paper; it’s there, but for all practical purposes there’s no difference. AES256 is also naturally slower than AES128.

Key lifetime: 86400 seconds

Key lifetime is an interesting parameter. For one, you can forgo timing altogether and set it as a byte count; once x amount of bytes has been processed through the tunnel, it will renegotiate. However, most people use timing instead of byte counts. To be honest, when utilizing a decent hashing and encryption method, any key life under 24 hours is more than adequate, as no one will be able to brute force your key in that amount of time. You could even set it as long as months (if your hardware supported it), but as we’re concerned with security and not performance in phase 1, 24 hours is acceptable.

Pre-shared key: k2;2.6TbYl+{/qa

Although certificates can be used and are more secure, no one ever wants to go through the hassle of setting up a key infrastructure dedicated to IPsec usage. Therefore, it’s important to pick a pre-shared key that’s relatively secure; this means no dictionary words, at least a few symbols, a few numbers, and a minimum of 15 characters.
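
One way to generate such a key (a standard-library sketch; the symbol set is my own choice, and 20 characters comfortably clears the 15-character minimum above):

    # Generate a random pre-shared key with letters, digits, and symbols.
    import secrets, string

    alphabet = string.ascii_letters + string.digits + "!#%+,-./:;=?@^_{}~"
    psk = "".join(secrets.choice(alphabet) for _ in range(20))
    print(psk)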

The Diffie-Hellman key exchange may be secure, but it’s not going to matter if your pre-shared key is 12345.

Another note: don’t use the same pre-shared key across different phase 1 configurations. That’s just bad practice.

Pre-shared keys, no matter their length, have no effect on performance. It should be obvious that you should never share the PSK with anyone. When using certificates you can verify the remote side via a third party, but with PSKs, your only source of verification is the fact that no one else should know the key. If anyone else is aware of the PSK, you are vulnerable to a man-in-the-middle attack.

Mode: main

Phase 1 can operate in two modes: main or aggressive. In main mode, IKE negotiations occur in three sets of packet exchanges, with the last verification exchange occurring over an encrypted channel. Aggressive mode reduces the amount of cross-talk between endpoints by cramming all the parts of the phase 1 negotiation, up until the final part of the DH key exchange, into one packet. This passes some normally encrypted parts in the clear; though these parts aren’t considered vital, we opt for main mode since phase 1 is focused more on security than on quick negotiations.

So there you have it – Alpha and Beta now have a phase 1 security association active and can move on to phase 2.

Phase 2

As mentioned before, phase 2 does the actual authentication and encryption of data across the tunnel. On modern hardware, certain options can shave off a few milliseconds per packet. While that may not sound like a lot, consider that a small to moderately used IPsec tunnel will be encrypting and decrypting 13,000 packets per second; tiny increases in hashing/encryption/decryption speed will have large effects on throughput. We therefore want the fastest IPsec algorithms we can implement without compromising security.
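
Some rough arithmetic (the per-packet saving is an assumed figure) shows why those microseconds matter at that packet rate:

    # Why small per-packet crypto savings add up at 13,000 packets per second.
    pps = 13_000
    saved_per_packet_us = 5                   # assume 5 microseconds shaved per packet

    cpu_saved_ms_per_s = pps * saved_per_packet_us / 1000
    print("%.0f ms of CPU freed per second (~%.1f%% of one core)"
          % (cpu_saved_ms_per_s, cpu_saved_ms_per_s / 10))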

Hash Algorithm: HMAC-MD5

HMAC-MD5 is quite a bit speedier than HMAC-SHA1 and still secure. NONE would be the fastest, but it completely defeats the purpose of an IPsec tunnel.

Encryption Algorithm: AES128

AES is faster and more secure than DES, 3DES, and Blowfish (and its updated variant, Twofish, neither of which is well supported in most vendor hardware). AES256 doesn’t add much in terms of security and comes with a speed penalty, so AES128 it is.

Key lifetime: 43200 seconds

This refers to the phase 2 security association key life, which is independent of the phase 1 security association key life. Because far more data is encrypted using the phase 2 parameters, a nefarious third party can gather a much larger corpus of your encrypted packets for analysis. That said, MD5/AES128 is still impossible to brute force, so setting this depends on the data you’re securing and how secure you want it to be. If you don’t care about the transiting data after it arrives, you can set the key life to something quite long (though best practices state you should make it shorter than the phase 1 key life). If you care about someone pulling your packets offline and attempting to brute force them, you may want to set the key life to something quite short, though they would still need a machine that is infeasible to build now and for the foreseeable future.

Keep in mind that every time the key life expires and phase 2 renegotiation starts, packets will be queued up and possibly dropped (if the renegotiation is slow) until it completes. Renegotiations are fairly quick (on the order of 1-2 seconds), but that's still something to be aware of. Therefore, for performance reasons and to keep potential interruptions to a minimum, we keep the key life relatively long.

Perfect Forward Secrecy: disabled

Perfect Forward Secrecy (PFS) is a phase 2 specific configuration option. It ensures that phase 2 security association keys are not derived from the same phase 1 keying material. While more secure for the very paranoid, this will cause phase 1 renegotiations whenever phase 2 keys expire. Not only are you subjected to phase 2 SA renegotiation delays, but those are compounded as the phase 1 SAs are also renegotiated. In certain other encryption situations, such as TLS/SSL, PFS is very important. For IPsec, the benefit is minimal unless you’re using poor pre-shared keys combined with bad encryption methods.

Results

On our Juniper SRX 650s with optimal IPsec configurations, we can saturate a 1-gigabit port. Switching to SHA1/3DES with extremely short key lives and Perfect Forward Secrecy enabled (group 14) drops throughput to half that, as the tunnels are constantly renegotiating with inefficient hashing and encryption algorithms. Torturing the processors by sending a large number of tiny packets cuts throughput to a quarter of what we can otherwise do.

So, when creating IPsec tunnels, don’t just accept the defaults, many of which are woefully bad choices. Understand what the parameters are and make informed decisions to maximize your existing infrastructure’s performance.

Next up, the final part of our IPsec overview: why using IPsec configurations to handle your routing is a terrible idea, and the proper way to do it.

Read part 1, Optimizing IPsec Tunnels
Read part 3, Bringing Sanity to Routing Over IPsec

Topics: Networking

Optimizing IPsec Tunnels

Metro Ethernet and MPLS circuits are often the first things people think of when trying to connect disparate networks across long distances or public networks. Those transport technologies work great, but the venerable IPsec has a number of points in its favor:

  • It’s been around forever, so it’s very standardized.
  • It's pre-built into most routers and security devices, so it’s cost-effective.
  • It typically runs over the Internet, so there are no providers or vendors to deal with (other than the administrators controlling the remote end). If you control both ends, you don’t have to work with anyone at all.

Even though IPsec is well established and available across numerous devices, most people don’t optimize their setup to maximize their throughput, decrease CPU processing, or streamline network configurations.

This is most likely a result of IPsec’s notoriously finicky nature, as connections between sites won’t establish without what appear to be a complex set of hashing algorithms, encryption methods, and configurations that only apply to certain parts of an IPsec session. Let’s fix that.

IPsec can operate in two modes: tunnel and transport.

By far the most common is tunneling, which we’ll focus on. Transport, however, is a better choice that’s often overlooked.

IPsec Transport Mode

Transport mode operates similarly to the way TLS/SSL works at layer 7: any communication between two endpoints using a particular layer 7 protocol is authenticated (using hashes) and/or encrypted (using encryption algorithms). In the case of the well-known HTTPS protocol, this is standard web traffic.

IPsec in transport mode operates almost exactly the same way. It encrypts traffic between two endpoints with one important advantage: it encrypts ANY layer 7 protocol, and that protocol doesn’t have to know anything about encryption. This is because IPsec operates at layer 3, a much lower level in the networking stack. The benefit of this is two-fold:

  • The layer 7 protocol doesn’t have to know anything about authentication and encryption for it to be utilized.
  • It will automatically function for any protocol, whether it’s expected or not.

For example, no one in their right mind would operate a telnet daemon on a public interface in this day and age. But establish an IPsec transport link between two points, limit the telnet daemon to respond only to traffic that arrived via the secure IPsec transport, and that telnet traffic is fully secure. Everything is encrypted, from mundane pings to SIP traffic. And while you may think, “Well that’s nice. Encryption between two points is great but useless for connecting two networks together like tunnel mode does,” you have to realize that you can layer on all sorts of protocols you’d normally never run over public IPs, such as IP-in-IP or Generic Routing Encapsulation (GRE) tunnels. GRE tunnels in particular are far easier to configure and scale out than using IPsec in tunnel mode to handle routing.

This was actually to be the basis for something called opportunistic encryption, which was originally going to be required in IPv6; yes, everything on IPv6 was going to be automatically encrypted, whether the layer 7 protocol supported it or not. Due to certain technical hurdles, it never made it into the final IPv6 RFC. That said, transport mode is not often used because implementing it with certain vendors is either impossible or the documentation is buried or nonexistent; when it is available, it’s usually only configurable via the command line, while tunnel mode is exposed in clicky user interfaces. That’s unfortunate, as it has a number of very interesting uses. Today’s focus will be on bog-standard IPsec tunnel mode.

IPsec Tunnel Mode

Authenticated and encrypted tunnel mode is what most people think of when IPsec is discussed. In this mode, two endpoints sitting on the edge of two separate networks establish a secure link across a hostile network (usually the internet), and packets flow between the two networks via that link as if they were directly connected without the terrifying internet in between. Tunnel mode accomplishes this by accepting IP packets from the internal network with their original headers, encrypting the entire packet, then encapsulating it within another packet. A hash of the packet is generated to ensure authenticity when it arrives. That new packet contains the original inside of it, encrypted along with the hash, as its payload. This final packet is created using the Encapsulating Security Payload protocol (ESP), also known as IP protocol 50. ESP operates at layer 4 of the networking stack, on the same level as protocols such as TCP and UDP. This is important to understand: packets between the two endpoints don’t use TCP, UDP, ICMP, or anything like that; ESP is its own beast.

This packaged packet is forwarded to the other endpoint over the public and/or hostile network. After it’s received, the encapsulated packet within is extracted, decrypted, authenticated for validity using the hash, then forwarded on as determined by the original packet’s IP headers and the device’s routing policies. The sending and receiving machines have no idea that the packet traversed the hostile network.

Easy enough. But the devil’s in the details. So let’s break down how the link is formed and what we can do to streamline things.

IPsec Phases

IPsec links are established in two phases, appropriately referred to as phase 1 and phase 2. Once each phase is established and verified as authentic by both sides, the endpoints consider it an active security association (SA). Each phase has its own set of SAs, which use a key to both authenticate and encrypt packets. After a specified expiration limit (usually a timer, such as 24 hours), the SAs are rotated out for new ones containing new keys, and the phases negotiate again. Though these are technically new tunnels, the process is mostly (we’ll go over this later) transparent to end users.

Phase 2 does the heavy lifting. All traffic is authenticated and encrypted based on the policies established in this phase. Phase 1, on the other hand, is relatively quiet: once it creates the secure channel for phase 2, not much data passes through the phase 1 policies. Because of this, it’s important for phase 2 to be focused on performance, as it’s constantly hashing/encrypting/decrypting, while phase 1 should be focused on security. Once phase 2 is up, phase 1 is only used for metadata transfer during subsequent re-negotiations, which are infrequent.

Phase 1

Let’s start with phase 1, aka the Internet Key Exchange (IKE) phase. When endpoint Alpha decides to use a tunnel to send a packet to endpoint Beta, it looks at its own configuration and sees:

Phase 1 Configuration

  • Hash Algorithm: HMAC-SHA1
  • Encryption Algorithm: AES256
  • Diffie-Hellman Key Exchange Group: 2
  • Key life: 86400 seconds (can also be specified as number of bytes)
  • Pre-shared key: k2;2.6TbYl+{/qa
  • Endpoint Alpha IP address: 172.16.0.1
  • Endpoint Beta IP address: 172.31.255.1
  • Mode: main

Alpha wants to bring the tunnel up, so it will send an IKE protocol packet to Beta containing a set of acceptable hashing and encryption methods to use. In this case, we would make an offer to use SHA1/AES256.

The IKE protocol is just a layer 7 protocol, operating over UDP port 500. Beta, if configured to use those methods, will ack the proposal. Alpha will then send a nonce of the pre-shared key utilizing the Diffie-Hellman method with group 2, or 1024 bits.

Diffie-Hellman Key Exchange

If you don’t know about Diffie-Hellman, it’s a very interesting, effective, yet simple way to securely exchange keys. Despite a lot of hand waving recently, Diffie-Hellman group 2 is still considered secure, as there is no computing power available that would be able to brute force it. And even if there were a computer that could brute force DH group 2, it would have to do it within 24 hours. See the key life parameter? The key is rotated every 24 hours. If you want to be even more paranoid about this theoretical, nonexistent number-crunching computer, you could lower that to something like every few hours, as well as bump up the DH group to the next one up, which is 5 (1536 bits).

Interestingly enough, certain vendors consider DH group 14 (2048 bits) to be the minimum, though it’s unclear why, and the conspiratorial part of me thinks it may be because they’re just trying to sell beefier hardware. I do recommend against using DH group 1, as it’s right on the cusp of being brute forced, though even if you have to use it, it can still be considered acceptable. Again, key lives in the IPsec world are exceedingly short (24 hours seems to be the max people set their phase 1 key lives to), and group 1 could be considered vulnerable only if the same key is used over the span of multiple years.
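
For the curious, here's a toy sketch of the Diffie-Hellman mechanics with deliberately tiny numbers (real IKE group 2 uses a 1024-bit prime; this only shows how both sides land on the same secret without ever sending it):

    # Toy Diffie-Hellman exchange -- illustrative numbers only.
    p, g = 2087, 5            # small prime and generator, nothing like group 2's size

    a = 15                    # Alpha's private value (never transmitted)
    b = 27                    # Beta's private value (never transmitted)

    A = pow(g, a, p)          # Alpha sends this to Beta
    B = pow(g, b, p)          # Beta sends this to Alpha

    assert pow(B, a, p) == pow(A, b, p)   # both derive the same shared secret
    print("shared secret:", pow(B, a, p))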

Both endpoints complete the last part of the DH key exchange by sending nonces to each other – this time, however, they use the hashing and encryption algorithms that were previously agreed upon to secure the nonce in flight. This is a final test to verify that the endpoints are in agreement on phase 1 parameters. If it checks out, the phase 1 portion is considered successful, and you now have a phase 1 security association, or SA. Most devices will issue a log record indicating something to this effect. The SA used in this example will be valid for 86400 seconds, or 24 hours.

Phase 2

Phase 2 is a bit more straightforward. Using the newly created authenticated and encrypted channel established in phase 1, the following phase 2 parameters are confirmed between both endpoints, again, using IKE:

  • Hash Algorithm: HMAC-MD5
  • Encryption Algorithm: AES128
  • Key life: 43200 seconds
  • Perfect Forward Secrecy: disabled

That’s all well and good, but this brings up one of the main issues with IPsec parameters. Why are we using these particular settings over others? For encryption, we can use all sorts of methods, including DES, triple DES (3DES), Blowfish, Twofish, AES128, AES192, AES256, RC4, RSA, and so on. For hashing, we can use MD5, SHA1, or even none! And why are we using 24-hour key lifetimes?

Read part 2, IPsec Parameter Choice Rationales
Read part 3, Bringing Sanity to Routing Over IPsec

Topics: Networking

4 Considerations for Picking A Data Center Transit Provider

When you're selecting a transit provider for your data center environment, you're selecting who brings customers to your doorstep, who controls a portion of your user experience, and of course, who controls a huge part of your uptime. There are many things to consider when making this decision:

#1 Network Capacity


One of the first things you'll want to know about is the capacity of the network. It not only needs enough capacity to deal with your normal traffic, it needs enough capacity to deal with peak loads during things like product launches or promotions. Of course, there's the capacity needed to deal with traffic you don't want, too, like DoS attacks. It's important to ask questions and make sure they're able to provide the service you need.

#2 Reliability 


Another important factor is the reliability and redundancy of the network. Is it designed to be fault tolerant? If portions of the network represent single points of failure, then each one is a potential liability for your business. Is the network designed and built in a way where scheduled maintenance causes downtime? You may not want to go down on someone else's schedule.

#3 Routing Flexibility 


Find out how much control they have over their routing policies. Problems are bound to come up delivering traffic to your customers, so it's helpful to find a provider that can route around outages and bottlenecks.

#4 Support 


If you call in, are you able to talk to someone who can help you, or do they pass you off into a phone tree? Do they let you talk to the person fixing the problem? How long does it take to reach them?

At ServerCentral, we are always willing to answer these questions and talk about how we can help you.

Topics: Networking Data Center

Increase Uptime in Your Data Center Network

As networkers, we’re constantly thinking about redundancy and uptime. We’re taught that multiple links and devices mean resiliency, which can be true, but a complex network can be equally complex to troubleshoot when things go wrong.

The classic Layer 1-2 redundancy model calls for multiple uplinks between hosts, switches, and routers. By making multiple paths available, you can create a fault-tolerant architecture. We usually need a protocol to protect against forwarding loops, which in Ethernet switching has traditionally been Spanning Tree (STP) and its variants. STP is one of those protocols that you can set and forget—until it’s time to troubleshoot.

So how do we as networkers continue to offer multiple links between devices and reduce the vulnerabilities of loop-free protocols like STP?

One option is to buy a single chassis-based Ethernet switch and load it up with redundant supervisors, line cards, and power supplies. This can be costly and often requires more cabinet space and power than is necessary.

Another option is switch stacking. A few years ago, switch vendors introduced stacking technology into certain switches. These switches have a stacking fabric that allows multiple independent devices to be configured as one virtual chassis. By connecting two standalone top-of-rack switches as one virtual chassis, we get the benefit of multiple line cards, redundant power, and a redundant control plane all in a smaller rack footprint than a bigger, chassis-based switch. What’s more, we get the added benefit of being able to fully utilize the extra capacity that we once stranded with STP and other active/standby techniques.

When host redundancy is a networking goal, the most common deployment that we see is switch stacking.

Stacking enables link aggregation between systems, which is very powerful in the network. Link aggregation gives you the ability to bond two or more physical links and present them as one logical aggregate link with the combined bandwidth of all members. On the server side of things, this can be called NIC teaming or bonding (bond mode 4 if you are a Linux user). On the routing and switching side, we specifically use Link Aggregation (IEEE 802.3ad), and often the LACP (Link Aggregation Control Protocol) option with it.

By deploying in a stacked environment with link aggregation, your server with 2x1G NICs can have 2 Gbps of active uplink capacity across two physically diverse switches. Your switches can have that same redundancy for their uplink capacity. Your network can take full advantage of the extra capacity, and in the event of a failure, it automatically diverts traffic to the remaining members.

Taking this a step further, we have server virtualization. It's a fantastic tool, and redundancy is an important aspect.

As we combine more virtual hosts onto less hardware, the impact of an individual outage grows.

What we typically do here is follow the idea of link aggregation across multiple switches. We use the hypervisor’s virtual switch to again create an 802.3ad-bonded link. We like the idea of network separation by function, so we use VLAN trunking over the bundle between the switch fabric and hypervisor. That way, we can move individual subnets and virtual machines (VMs) as needed.

There are different ways to bond links on server hardware.

I often prefer to configure LACP when using link aggregation because an individual link member won’t begin to forward until LACP has negotiated its state. If one of your switches has link but isn’t ready to forward, and your server detects link on the individual member, the server might forward to an unready switch member. LACP can prevent this.

Despite its benefits, LACP isn’t appropriate for every aggregated Ethernet scenario.

If your link is saturated or your CPU resources prevent the LACP control packets from being generated, your physical link might drop because one end didn't receive its PDUs within the configured interval.

Some hypervisors include LACP support, some make you pay for it, and some don't support it at all.

Remember that LACP is an option inside of the 802.3ad protocol (the terms are often incorrectly used synonymously), so even if your equipment can’t support LACP, link aggregation may still be a great option for you.

It's possible to mix switch types in order to create a custom stack to fit your particular needs.

For example, you might have two 10-Gbps switches and two 1-Gbps switches in a single 4-member stack:

[Figure: Virtual Chassis with Link Aggregation]

Here, your server has 20 Gbps of capacity for production traffic and 2 Gbps of capacity for back-end management connectivity. From a switch management standpoint, it’s one device with two logical links to configure.

Stackable switches with link aggregation provide a nice solution in the data center, where space is valuable and uptime is crucial.

Your specific configuration will depend on your hardware vendor, OS, and link capacity. Remember, we're always here to talk through your options (whether you're a managed network customer or not). If you need help, just ask!

Topics: Redundancy Networking Data Center