Is your IT team Chaos Monkey ready?
How does your IT team deal with pressure? In an ideal world, every IT team would be filled with those guardian angels in action movies who speak words of techy wisdom into the leader’s ear, neutralise enemy attacks, retrieve compromised systems and save the day!
However, keeping a cool head when disaster strikes is not easy for anyone, and it is impossible to prepare for every eventuality… or is it? In 2011, engineers at Netflix came up with a resiliency tool called Chaos Monkey to test the robustness of the company's IT infrastructure, and we think this approach to worst-case-scenario management is a good culture to foster in the modern era of cybercrime.
What is Chaos Monkey?
“Imagine a monkey entering a “data center”, these “farms” of servers that host all the critical functions of our online activities. The monkey randomly rips cables, destroys devices and returns everything that passes by the hand. The challenge for IT managers is to design the information system they are responsible for so that it can work despite these monkeys, which no one ever knows when they arrive and what they will destroy.”
Antonio Garcia Martinez, Chaos Monkeys
As you can gather, the concept underpinning Chaos Monkey is that even a correctly functioning system will run into unpredictable problems. This is simply a by-product of existing in the real world, which can be very chaotic.
However, rather than throwing in the towel because of all the disasters that may arise, Chaos Monkey was created so organisations could prepare their IT infrastructure for these situations. By willingly unleashing the Chaos Monkey – under careful monitoring and at a time that suits the business – a lot can be learnt. Now IT teams can identify weaknesses in their infrastructure before they cause real-world damage to the business.
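To make the idea concrete, here is a minimal sketch of a controlled "monkey" in Python. It is a toy simulation, not Netflix's implementation: the fleet is a hypothetical in-memory dictionary, the instance names are invented, and the weekday 09:00 to 15:00 window is an assumed stand-in for "a time that suits the business".

```python
import random
from datetime import datetime, time

def in_window(now, start=time(9, 0), end=time(15, 0)):
    """True on weekdays between start and end (assumed hours when staff can respond)."""
    return now.weekday() < 5 and start <= now.time() < end

def unleash(instances, now, rng):
    """Mark one randomly chosen running instance as terminated.

    `instances` is a hypothetical dict of name -> state; a real tool would
    call a cloud provider's API instead. Returns the victim's name, or None
    if we are outside the window or nothing is running.
    """
    if not in_window(now):
        return None
    running = sorted(name for name, state in instances.items() if state == "running")
    if not running:
        return None
    victim = rng.choice(running)
    instances[victim] = "terminated"
    return victim

# Hypothetical fleet and a Wednesday mid-morning timestamp.
fleet = {"web-1": "running", "web-2": "running", "db-1": "running"}
victim = unleash(fleet, datetime(2024, 1, 10, 10, 30), random.Random(42))
survivors = [name for name, state in fleet.items() if state == "running"]
print(f"Terminated {victim}; still running: {survivors}")
```

The time-window check is the important design choice: the failure is random, but it only ever happens when people are watching and ready to learn from it.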
Today, Chaos Monkey is just one member of a suite of resiliency tools called the Simian Army, which includes Chaos Gorilla, Security Monkey, Doctor Monkey and other monkeys, all ready to wreak havoc.
Preparing for failure
Not every IT team has the time or the resources to let an army of monkeys loose on their IT infrastructure; however, a valuable lesson can be taken from the Netflix approach. Failure will occur. In fact, it occurs at all levels of business. Whether your business plans on growing, scaling up, expanding or diversifying, every strategy for growth requires some degree of experimentation. As Jeff Bezos says,
“… if they’re experiments, you don’t know ahead of time if they’re going to work. Experiments are by their very nature prone to failure.”
Therefore, the key is not to avoid failure, but to be prepared for the outcomes of failure.
Do you have a cyberattack recovery strategy?
This idea of preparing for things going wrong is especially pertinent for IT teams today because of the prevalence of cybercrime. Unfortunately, cybercriminals are the real chaos monkeys of the world, and they cannot simply be switched off. In 2016 and 2017, cybercrime proved itself to be a very serious threat to businesses of all sizes, with headlines across the globe reporting on the issue.
In fact, according to the Online Trust Alliance (OTA), the number of cyberattacks doubled in 2017, making it “the worst year ever in data breaches and cyber-incidents around the world.” Cyberattacks are the ‘new normal’, and this is why you need a robust recovery plan.
The recovery plan
Backup
When deciding what level of backup your business requires, it is helpful to ask ‘How much downtime can my business afford?’ Different sizes of business require different levels of security, and this applies to backup solutions.
It is essential to find a backup solution that meets your specific business needs with regard to scaling, ease of access, ease of use, disaster recovery, compliance, encryption, quality and customer service. Compared with maintaining a costly secondary infrastructure, cloud technology typically offers a more cost-effective and sustainable backup solution that can scale as your data grows.
Monitoring
According to ZDNet, “It takes most business about 197 days to detect a breach on their network.” For many organisations and their clients, this is simply not good enough. A sophisticated 24/7 monitoring solution allows cyberattacks to be detected rapidly, before they can do serious damage.
Essential to this is the ability to recognise suspicious patterns within the IT network. That means regularly scanning the whole network, whether mobile or on-premise, analysing network traffic to spot suspicious third-party log-ins, and checking servers for changes, all supported by real-time analysis and reporting.
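One of the monitoring tasks above, checking servers for changes, can be illustrated with a short Python sketch. This is only an assumed, simplified approach (comparing SHA-256 hashes of files against a known-good baseline), not a description of any particular monitoring product; the file names and the temporary directory are hypothetical stand-ins for a real server.

```python
import hashlib
import tempfile
from pathlib import Path

def snapshot(root):
    """Map each file under root (by relative path) to the SHA-256 of its contents."""
    root = Path(root)
    return {
        str(p.relative_to(root)): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }

def diff(baseline, current):
    """Compare two snapshots and list added, removed and modified files."""
    return {
        "added": sorted(set(current) - set(baseline)),
        "removed": sorted(set(baseline) - set(current)),
        "modified": sorted(
            path for path in set(baseline) & set(current)
            if baseline[path] != current[path]
        ),
    }

# Demonstration in a throwaway directory standing in for a server's file system.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "app.conf").write_text("port=443\n")
    (root / "users.txt").write_text("alice\n")
    baseline = snapshot(root)                      # known-good state

    (root / "app.conf").write_text("port=8080\n")  # a file is tampered with
    (root / "users.txt").unlink()                  # a file disappears
    (root / "dropper.sh").write_text("echo hi\n")  # an unexpected file appears

    report = diff(baseline, snapshot(root))

print(report)
```

In practice the baseline would be stored somewhere an attacker cannot modify, and the comparison would run on a schedule and feed into alerting.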
Checklist
Having an action-oriented plan ready to go in the event of a cyberattack can be very beneficial. It can be comforting and calming to have a checklist to follow while under intense pressure. An investigative team should be appointed, with each member assigned a specific duty oriented around identifying, containing and neutralising the cyberattack, or communicating with the necessary parties.
A step-by-step guide to the technical processes involved in effectively acting on each of these roles should be drawn up. What should be checked first? Who needs to be contacted? What decision-making privileges does each role have? How easily can the team access the backup to the compromised system? While creating this checklist, you should find that many questions that may arise during a cyberattack can be answered in advance.
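A checklist like this can also be kept as structured data, so that gaps are spotted before an incident rather than during one. The Python sketch below is purely illustrative: the four duties come from the description above, while the `Role` fields, the team members and their tasks are hypothetical examples.

```python
from dataclasses import dataclass, field

# Duties the response team must cover, taken from the checklist description.
REQUIRED_DUTIES = ("identify", "contain", "neutralise", "communicate")

@dataclass
class Role:
    owner: str                                    # who holds the role
    duty: str                                     # one of REQUIRED_DUTIES
    first_check: str                              # what this person checks first
    contacts: list = field(default_factory=list)  # who they need to contact
    may_restore_backup: bool = False              # a decision-making privilege

def uncovered_duties(team):
    """Return the required duties that nobody on the team has been assigned."""
    covered = {role.duty for role in team}
    return [duty for duty in REQUIRED_DUTIES if duty not in covered]

# Hypothetical team with one deliberate gap.
team = [
    Role("Asha", "identify", "Review alerts from the monitoring system"),
    Role("Ben", "contain", "Isolate affected machines from the network",
         may_restore_backup=True),
    Role("Chloe", "communicate", "Confirm who must be told",
         contacts=["management", "affected clients"]),
]

gaps = uncovered_duties(team)
print("Duties with no owner:", gaps)
```

Here nobody has been assigned to neutralise the attack, which is exactly the kind of question that is better answered in advance.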
Crisis management – communicate clearly and stay positive
This one is easier said than done, but it starts with truly accepting that things can and will go wrong every now and again. Although we joke about the IT team being the ones to ‘save the day’, in truth they are human beings who are just as susceptible to stress, fatigue and human error as everyone else, especially when resources are limited.
When it does feel as though everything has gone wrong, it is important to stay calm and communicate clearly, with positive reinforcement at every step on the way to system recovery. For more great material on this subject, visit HumanOps, who actively “highlight the importance of the health of teams running systems, not just the systems themselves”.
Preparing for the next cyberattack – education and practice drills
Everyone in the organisation with access to the IT infrastructure, no matter how minor, should receive basic cybersecurity training. Regular briefings from the IT team, by presentation or email (ideally both), can help keep everyone aware that cyber threats are real and persistent.
Want to free up your IT team for bigger projects? We design and implement complete security solutions for our customers, providing the right levels of safety to match your business. Get in contact with us today, and we’ll tell you all about it!