1. Pad your move time.
Give yourself more time than you think you need at every step of the migration. No matter how good your team is, the truck will have a mechanical problem, a tech will catch whooping cough from an unvaccinated kid, and a screw will get stuck because dissimilar metals caused a chemical reaction, welding it in place. It's okay to expect delays.
2. Document your network.
Write down the physical cabling configuration before a migration. No matter how well you know your network, without a written record you'll still end up spending time confirming cable placement at the back of the rack.
3. Use physical port names in cabling maps.
When making a cabling map, use the physical (not logical) port names that the manufacturer printed on the device. We sometimes get cabling maps with instructions to connect a virtual bonded interface on a switch to “eth0” on a server. The work goes much faster when the names on the map match the physical port labels on the device.
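As a rough illustration, a cabling map can be kept as structured data and checked for logical names before it goes to the techs. The sketch below is only a heuristic: the `Cable` record, the device names, and the list of "logical-looking" prefixes are assumptions made for the example, and the pattern will need adjusting if your hardware happens to print a similar name on the chassis.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Cable:
    """One row of a cabling map (hypothetical layout for this example)."""
    src_device: str
    src_port: str
    dst_device: str
    dst_port: str


# Assumed heuristic: common Linux-style logical interface names.
LOGICAL_NAME = re.compile(r"^(eth|bond|team|br|vlan|lo)\d*$")


def logical_port_names(cables):
    """Return (device, port) pairs whose port looks like a logical name."""
    flagged = []
    for cable in cables:
        ends = (
            (cable.src_device, cable.src_port),
            (cable.dst_device, cable.dst_port),
        )
        for device, port in ends:
            if LOGICAL_NAME.match(port):
                flagged.append((device, port))
    return flagged


cabling_map = [
    Cable("sw-01", "bond0", "srv-01", "eth0"),    # logical names: needs fixing
    Cable("sw-01", "Port 12", "srv-02", "NIC 1"),  # labels printed on the device
]

for device, port in logical_port_names(cabling_map):
    print(f"{device}: replace '{port}' with the label printed on the device")
```

Anything the check flags should be replaced with the label a tech can read off the hardware.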
4. Label every device with a unique identifier.
Some servers have neat LCD screens that display the device name, but at migration time the server is powered down and that screen is off. If that screen is the only way to tell your servers apart, it’s time to break out the label maker. Label the front and back of each device while you're at it.
5. Run a cold test.
Failing hard drives, unsaved configurations, and bad connections on the circuit board tend to rear their ugly heads during a migration. Run a cold test ahead of time, with a fresh backup in hand, to find these problems before moving day:
- Back up the server or network device configuration.
- Power it down and let it cool off for five minutes.
- Power it back up and confirm everything is working properly.
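One way to make the last step concrete is to compare the configuration you backed up before power-down with the configuration the device reports after it boots. Lines that vanish are a likely sign of changes that were never saved. The sketch below assumes you already have both configurations as text; how you export them depends on the device, and the sample VLAN lines are made up for illustration.

```python
import difflib


def config_drift(before: str, after: str) -> list[str]:
    """Return lines that differ between the pre-shutdown backup and the
    post-boot configuration ("- " means lost, "+ " means new)."""
    diff = difflib.ndiff(before.splitlines(), after.splitlines())
    return [line for line in diff if line.startswith(("- ", "+ "))]


# Made-up sample data: "vlan 20" was configured but never saved.
before_power_down = "hostname sw-01\nvlan 10\nvlan 20\n"
after_power_up = "hostname sw-01\nvlan 10\n"

for line in config_drift(before_power_down, after_power_up):
    print(line)
```

An empty result only shows that the configuration text survived the power cycle. Drives and connections still need to be checked separately.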
6. Do a dry run.
Grab a server, pack it up, and act out all the steps of the migration. This small time investment helps you set a realistic timeline and correct any surprises you discover.
7. Factor in time to troubleshoot.
After the migration, it takes time to get everything booted back up and running. Plan for some devices not to work and give yourself plenty of time to fix them.
8. Remember, you're not a robot.
If you're moving a large amount of equipment, staff the move in shifts rather than having everyone do everything together. No one can troubleshoot well after 24+ hours of strain. Demand naps if you have to.
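A shift plan can be as simple as splitting the team into crews and rotating them through the move window. The following is a minimal sketch, not a staffing recommendation: the eight-hour shift length, the three crews, and the names are all assumptions chosen for the example.

```python
def shift_rota(staff, total_hours, shift_hours=8, crews=3):
    """Split staff into crews and rotate them through the move window.

    Returns a list of (start_hour, end_hour, crew_members) tuples.
    """
    # Deal staff out round-robin so crews end up roughly equal in size.
    teams = [staff[i::crews] for i in range(crews)]
    rota = []
    start = 0
    while start < total_hours:
        end = min(start + shift_hours, total_hours)
        crew = teams[(start // shift_hours) % crews]
        rota.append((start, end, crew))
        start = end
    return rota


team = ["Ana", "Ben", "Cy", "Dee", "Eli", "Fay"]  # hypothetical names
for start, end, crew in shift_rota(team, total_hours=20):
    print(f"hours {start:02d}-{end:02d}: {', '.join(crew)}")
```

With three crews on eight-hour shifts, each crew gets sixteen hours off before its next turn, which leaves room for the naps.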