Cybersecurity is now a top issue in corporate boardrooms. An attack on an enterprise’s systems and databases can cause a huge loss of revenue and put the company’s brand reputation at stake. According to IBM’s annual data breach report, the average cost per data breach rose from US$3.86 million in 2020 to US$4.24 million in 2021.
Did you know?
- In 2021, a large data set covering over 500 million Facebook users was made public. The exposed data ranged from user phone numbers and email addresses to other identifying information.
- In 2021, a company that forms a critical part of the global telecommunications infrastructure used by AT&T, T-Mobile, Verizon, and several other carriers around the world revealed that hackers had been inside its systems. The breach affected more than 200 of its clients and possibly millions of mobile phone users worldwide, exposing 500 million data records.
- In 2019, Capital One revealed a massive data breach in which 100 million records were lost, exposing personal information from credit card applications over a 14-year period.
- In 2018, the design website Houzz suffered a data breach. Almost 50 million email addresses were exposed along with names, geographic locations, IP addresses, and password hashes or links to social media profiles used for logins.
How to protect your organization from Cybersecurity attacks?
Implementing an Intrusion Detection System (IDS) and an Intrusion Prevention System (IPS) is a proven technique for protecting your organization from a cybersecurity attack. An IDS detects malicious attempts, including policy violations. An IPS not only detects but also blocks malicious attacks. Depending on their physical location in an enterprise’s infrastructure and their defined scope of protection, IDSs and IPSs come in two basic types: network-based and host-based.
However, ensuring IDS and IPS work effectively in detecting and preventing cyberattacks comes with its own challenges.
- Most IDSs and IPSs in production are signature-based systems. These systems must be updated with new signatures every time a new threat is discovered or an old one is modified even slightly. New attacks are usually discovered and studied using honeypots, which are servers or platforms built to resemble production platforms (but without production data) so that external intruders attack them. The attacker profiles captured this way are used to update the signature profiles in the IDS and IPS.
- Since present-day attacks are diverse, real-time, and severe, a purely honeypot-based system is not good enough to detect intrusions, let alone prevent them. With the ever-growing usage of digital platforms, the attack surface is also increasing, and so is the variety of attacks. Finding enough skilled people with knowledge of all these areas to study each attack and create rules is therefore very challenging.
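To make the signature limitation concrete, here is a minimal sketch of signature matching in Python. The two signatures and the sample requests are hypothetical and written only for this illustration; they are not taken from any real IDS ruleset. The point is that a slightly modified attack slips past a signature that catches the original.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Signature:
    name: str
    pattern: re.Pattern


# Hypothetical signatures, for illustration only (not from any real ruleset).
SIGNATURES = [
    Signature("sql-injection-union", re.compile(r"union\s+select", re.IGNORECASE)),
    Signature("path-traversal", re.compile(r"\.\./\.\./")),
]


def match_signatures(payload: str) -> list:
    """Return the names of all signatures whose pattern occurs in the payload."""
    return [sig.name for sig in SIGNATURES if sig.pattern.search(payload)]


# The known attack matches; the variant hides the whitespace behind an
# SQL comment, so the same signature no longer fires.
known_attack = "GET /items?id=1 UNION SELECT password FROM users"
modified_attack = "GET /items?id=1 UNION/**/SELECT password FROM users"

print(match_signatures(known_attack))     # ['sql-injection-union']
print(match_signatures(modified_attack))  # []
```

Catching the variant requires someone to notice it and write a new pattern, which is the manual update cycle described above.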
How does AI help IDSs and IPSs work effectively?
Essential Elements of the Cybersecurity Framework Using AI
- The NIST Cyber Security Framework
- Focuses on 5 functions, Identify, Protect, Detect, Respond and Recover
- The Center for Internet Security (CIS) Critical Security Controls
- Allows a company to start small and incrementally grow its cybersecurity setup. Comprises 20 controls, which are regularly updated by a community drawn from academia, government, industry, etc.
- The International Organization for Standardization (ISO) frameworks ISO/IEC 27001 and 27002
- Also called ISO 27K, these assume that an Information Security Management System exists, demand exhaustive management, and recommend 114 different controls.
- Other frameworks like SOC2, GDPR, COBIT, etc.
- Capture intrusion profiles using Honeypots
- Implement Honeypots to manufacture hacking targets like databases, networks, and servers.
- This would help create a decoy to lure hackers away from actual/legitimate target systems.
- Capture intrusion profiles using Honeynet
- Another good way to capture cybersecurity hacker profiles is to establish a network of honeypots called “Honeynets”.
- Since honeynets can be designed to look like real networks, they attract unsuspecting cybercriminals and engage them for a longer period of time, so that their profiles can be captured better.
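As a rough illustration of what capturing an intrusion profile can mean in practice, the sketch below accepts connections on a decoy port and records a few facts about each one. It is a toy under stated assumptions, not a production honeypot: the function names, the recorded fields, and the port number 2222 are all hypothetical choices made for this example.

```python
import socket
from datetime import datetime, timezone


def capture_profile(conn: socket.socket, peer: tuple, max_bytes: int = 1024) -> dict:
    """Read the first bytes a client sends and return a simple profile record."""
    conn.settimeout(2.0)  # do not wait forever on a silent client
    try:
        payload = conn.recv(max_bytes)
    except socket.timeout:
        payload = b""
    return {
        "seen_at": datetime.now(timezone.utc).isoformat(),
        "source_ip": peer[0],
        "source_port": peer[1],
        "payload_size": len(payload),
        # latin-1 maps every byte to a character, so decoding never fails
        "payload_preview": payload[:64].decode("latin-1"),
    }


def run_honeypot(host: str = "127.0.0.1", port: int = 2222, max_connections: int = 1) -> list:
    """Accept a fixed number of connections on a decoy port and profile each one."""
    profiles = []
    with socket.create_server((host, port)) as server:
        for _ in range(max_connections):
            conn, peer = server.accept()
            with conn:
                profiles.append(capture_profile(conn, peer))
    return profiles
```

Calling `run_honeypot()` blocks until a client connects. A honeynet would run several such decoys and pool their profile records in one place.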
- To create and manage intrusion detection and prevention workflows based on pre-set rules
- Enable better monitoring of intrusions and prevention
- IDS and IPS applications include Snort, Check Point IPS, Cisco Firepower Next-Generation IPS, IPCop Firewall, pfSense, COMODO Firewall Pro, Sophos UTM, Smoothwall Express, Trellix Network Security, FireEye Network Security, Hillstone S-Series NIPS, NSFOCUS Next-Generation IPS, Palo Alto Networks Threat Prevention, etc.
- Cloud-based Data Platform (Data Lake) can store IPS and IDS system transactional data
- Store history of past intrusion data
- Store Honeypot data
- Analyze and explore intrusion data captured by IDSs in honeypots
- Cloud-based AI layer can be used for training AI on the honeypot intrusion data as well as the rules set by the IPS and IDS systems
- Expose trained AI models as API that can be integrated with IDS and IPS systems
- MLOps and Model Ops – Continuously monitor AI process performance to effectively recognize intrusion patterns that are not captured by IDS/IPS Rules
- Decide on AI Techniques for automatic Intrusion detection and classification:
- Transfer Learning Process: This process envisages a combination of 2 types of machine learning techniques (Unsupervised and Supervised), wherein the output of the unsupervised learning technique will be transferred as input to the supervised learning technique. Let’s understand what happens while using these techniques:
- Unsupervised Machine Learning: The data in the honeypot will be clustered (grouped) into sets of similar intrusion profiles by unsupervised learning algorithms such as clustering. These profiles may be grouped by intrusion type, intrusion channel, frequency, time of day, mechanism, etc. The clustering results will then be analyzed to understand which factors placed specific intrusions in each cluster, which helps label the clusters with categories such as “High/Medium/Low” or with intrusion types (log4j, mobile, web, etc.).
- Supervised Learning and Artificial Neural Networks: Supervised learning algorithms such as Random Forest and Decision Tree will be trained on this data together with the newly defined labels from above. More importantly, neural network algorithms such as Artificial Neural Networks will also be trained on it.
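The two steps above can be sketched with the Python standard library alone. This is a simplified illustration under stated assumptions: the profile features (requests per minute, distinct ports probed) and the sample values are made up, k-means stands in for "clustering", and a k-nearest-neighbours vote stands in for the Random Forest or neural network named above. A real system would more likely use a machine learning library.

```python
import math


def nearest(point, centroids):
    """Index of the centroid closest to the point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def kmeans(points, k, iterations=20):
    """Plain k-means with deterministic farthest-point initialisation."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # Move each centroid to the mean of its group; keep it if the group is empty.
        centroids = [
            tuple(sum(dim) / len(g) for dim in zip(*g)) if g else centroids[i]
            for i, g in enumerate(groups)
        ]
    return centroids


def knn_predict(train, labels, point, k=3):
    """Supervised step: majority vote among the k nearest labelled profiles."""
    ranked = sorted(range(len(train)), key=lambda i: math.dist(point, train[i]))
    votes = [labels[i] for i in ranked[:k]]
    return max(set(votes), key=votes.count)


# Hypothetical honeypot profiles: (requests per minute, distinct ports probed).
profiles = [(2, 1), (3, 1), (4, 2), (2, 2), (90, 40), (110, 55), (95, 48), (105, 60)]

# Step 1 (unsupervised): group similar profiles.
centroids = kmeans(profiles, k=2)
cluster_ids = [nearest(p, centroids) for p in profiles]

# Step 2 (analysis): name the clusters. Here the cluster with the higher
# mean request rate is labelled "High", the other "Low".
high_cluster = max(range(2), key=lambda i: centroids[i][0])
labels = ["High" if c == high_cluster else "Low" for c in cluster_ids]

# Step 3 (supervised): classify unseen profiles using the new labels.
print(knn_predict(profiles, labels, (100, 50)))  # High
print(knn_predict(profiles, labels, (3, 2)))     # Low
```

The output of the unsupervised step (cluster membership turned into labels) is exactly what the supervised step trains on, which is the hand-off this process describes.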
- Reinforcement Learning is a branch of AI that works on rewards and penalties. The AI agent can learn the profile of every incoming connection to the honeypot. Pre-written rules that identify previously detected intrusions can be set as policies for the AI. Every time the AI acts in line with the policy (meaning it detects the intrusion), it is rewarded. Every time it misses, it is penalized. This process continues for a few months, so that the AI learns what earns it more rewards and fewer penalties.
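The reward-and-penalty loop can be illustrated with a very small one-step (bandit-style) value update, the simplest form of reinforcement learning. Everything specific here is an assumption for the sake of the example: the two profile features, the rule in `rule_says_intrusion`, the reward values of +1 and -1, and the training parameters.

```python
import random

ACTIONS = ("allow", "flag")


def rule_says_intrusion(state):
    """Hypothetical pre-written rule that serves as the policy to reward against."""
    failed_logins, off_hours = state
    return failed_logins == "many" and off_hours


def train(episodes=2000, epsilon=0.1, learning_rate=0.1, seed=0):
    """Learn a value for every (connection profile, action) pair from rewards."""
    rng = random.Random(seed)
    states = [(f, o) for f in ("few", "many") for o in (False, True)]
    q = {(s, a): 0.0 for s in states for a in ACTIONS}
    for _ in range(episodes):
        state = rng.choice(states)  # a simulated incoming connection
        if rng.random() < epsilon:
            action = rng.choice(ACTIONS)  # explore
        else:
            action = max(ACTIONS, key=lambda a: q[(state, a)])  # exploit
        # Reward agreement with the rule, penalize disagreement.
        agrees = (action == "flag") == rule_says_intrusion(state)
        reward = 1.0 if agrees else -1.0
        q[(state, action)] += learning_rate * (reward - q[(state, action)])
    return q


def best_action(q, state):
    """Action with the highest learned value for this connection profile."""
    return max(ACTIONS, key=lambda a: q[(state, a)])


q_values = train()
print(best_action(q_values, ("many", True)))   # flag
print(best_action(q_values, ("few", False)))   # allow
```

After training, the agent flags the profile the rule treats as an intrusion and allows the others, having learned this only from the rewards it received.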
- Predict Intrusions: The combined methodology described above allows the IDS to automatically recognize and learn from the honeypot data, which contains intruder profiles. New types of attacks or intrusions can be tested against this learned model to help detect and classify the intrusion type.
- Integrate AI process with the Existing IPS/IDS Systems: Whenever a new inbound request hits the honeypot, the IDS calls the Transfer Learning API described above. The API detects or predicts an intrusion by comparing the new request with the profiles it has already learned. A threshold can be set to define what is considered an intrusion. Based on that threshold, the Transfer Learning API predicts the probability that a given inbound request is an intrusion and, if so, which intrusion type it is. This outcome is returned to the IDS/IPS as the response from the Transfer Learning API and can then be used for further actions by the IDS/IPS.
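A minimal sketch of the response such an API might return is shown below. It is an assumption-laden illustration, not the actual API: the function name `predict`, the response fields, the learned profiles, and the use of a k-nearest-neighbours vote to estimate the probability are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical learned profiles: (requests per minute, distinct ports probed) -> label.
LEARNED = [
    ((2, 1), "benign"), ((3, 1), "benign"), ((4, 2), "benign"),
    ((90, 40), "port-scan"), ((110, 55), "port-scan"),
    ((60, 1), "brute-force"), ((70, 2), "brute-force"),
]


def predict(request, k=3, threshold=0.5):
    """Compare a request with learned profiles and build a response for the IDS/IPS."""
    neighbours = sorted(LEARNED, key=lambda item: math.dist(request, item[0]))[:k]
    votes = Counter(label for _, label in neighbours if label != "benign")
    # Share of the k most similar learned profiles that were intrusions.
    probability = sum(votes.values()) / k
    is_intrusion = probability >= threshold
    return {
        "intrusion_probability": probability,
        "is_intrusion": is_intrusion,
        "intrusion_type": votes.most_common(1)[0][0] if is_intrusion and votes else None,
    }


print(predict((100, 50)))                # classified as a port-scan intrusion
print(predict((3, 2)))                   # not an intrusion
print(predict((35, 2), threshold=0.3))   # a lower threshold flags a borderline request
```

The threshold is the tuning knob: the request `(35, 2)` has one intrusion among its three nearest profiles, so it passes at the default threshold of 0.5 but is flagged as brute-force when the threshold is lowered to 0.3.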
Author
Anand Subramaniam
Anand Subramaniam is the Chief Solutions Officer, leading Data Analytics & AI service line at KANINI. He is passionate about data science and has championed data analytics practice across start-ups to enterprises in various verticals. As a thought leader, start-up mentor, and data architect, Anand brings over two decades of techno-functional leadership in envisaging, planning, and building high-performance, state-of-the-art technology teams.