WEBVTT

00:00.000 --> 00:16.320
Say you get your house broken into. You feel vulnerable and really paranoid after that. Or maybe you get your identity stolen, and the same thing applies: you feel paranoid and just creeped out in general, like you're really on edge.

00:17.360 --> 00:24.240
If this video accomplishes what I hope it accomplishes, then you'll feel that way by the end of it.

00:24.240 --> 00:42.500
So that's kind of the goal. Stick around. By the end I'll explain countermeasures, so hopefully you don't actually leave feeling freaked out, but I want you to have an understanding of acoustic side-channel attacks in general, because I think it's important.

00:42.500 --> 01:08.280
So we'll start off with a comparison, or really a scenario. Imagine someone steals your credit card number without ever touching your computer or your credit card, right? They never hack your network or exploit any software vulnerability you have.

01:08.780 --> 01:16.080
Instead, they simply listen to sounds your device makes while processing information.

01:17.400 --> 01:34.640
Welcome to the world of acoustic side-channel attacks, my friend, where hackers or adversaries can extract sensitive data just by analyzing the sounds electronic devices produce during normal operation.

01:34.640 --> 01:46.160
So think about typing your PIN at an ATM, right? Each button produces a distinct beep, just like on a telephone, and they all have different frequencies.

01:46.480 --> 01:57.820
When you press these buttons, the beeps might sound a lot alike to our ears, but each one is actually acoustically distinct.

01:57.820 --> 02:17.220
So someone could record those beeps, analyze their unique characteristics, and reconstruct your PIN without any traditional hacking techniques. And your devices are producing information-rich sounds like these constantly, right?

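NOTE
The keypad example works because telephone-style keypads use DTMF (dual-tone multi-frequency) signaling: each key plays one low "row" tone plus one high "column" tone. The sketch below assumes, for illustration only, that the keypad emits standard DTMF tones; if it plays the same beep for every key, as some do, this particular trick does not apply. It synthesizes each key's tone and recovers the key with the Goertzel algorithm, which measures signal power at one chosen frequency. The helper names (`dtmf_tone`, `decode_key`) are hypothetical.
```python
import math
# Standard DTMF grid: low (row) tones and high (column) tones in Hz.
ROWS = [697, 770, 852, 941]
COLS = [1209, 1336, 1477, 1633]
KEYS = ["123A", "456B", "789C", "*0#D"]
def dtmf_tone(key, fs=8000, seconds=0.05):
    """Synthesize the two-tone signal a DTMF keypad plays for `key`."""
    r = next(i for i, row in enumerate(KEYS) if key in row)
    c = KEYS[r].index(key)
    n = int(fs * seconds)
    return [math.sin(2 * math.pi * ROWS[r] * t / fs) + math.sin(2 * math.pi * COLS[c] * t / fs) for t in range(n)]
def goertzel_power(samples, freq, fs):
    """Power of `samples` at `freq` via the Goertzel recurrence."""
    coeff = 2 * math.cos(2 * math.pi * freq / fs)
    s1 = s2 = 0.0
    for x in samples:
        s1, s2 = x + coeff * s1 - s2, s1
    return s1 * s1 + s2 * s2 - coeff * s1 * s2
def decode_key(samples, fs=8000):
    """Pick the strongest row tone and the strongest column tone, then look up the key."""
    r = max(range(4), key=lambda i: goertzel_power(samples, ROWS[i], fs))
    c = max(range(4), key=lambda i: goertzel_power(samples, COLS[i], fs))
    return KEYS[r][c]
pin = "4096"
recovered = "".join(decode_key(dtmf_tone(k)) for k in pin)
print(recovered)  # 4096
```
Because each key is identified by its strongest row tone plus its strongest column tone, a clean recording of the beeps is enough to read the PIN back.
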
02:17.220 --> 02:28.880
At this very moment, your laptop fan, for example, changes speed in patterns that reveal something about what your processor is doing, right?

02:29.060 --> 02:41.840
Like how hard it's working. Your keyboard creates a unique acoustic signature for every key that's pressed: the space bar sounds different from the Enter key, which sounds different from the Escape key, right?

02:41.840 --> 03:11.820
Even the electronic circuits inside your phone emit tiny vibrations that carry information about their operations. With the right equipment and knowledge, an adversary could capture and decode these sounds to extract passwords, encryption keys, and other sensitive data, simply by listening to the acoustic emissions your device naturally produces.

03:12.440 --> 03:19.940
So let's look at how electronic devices actually make sound, at a really basic, fundamental level.

03:20.220 --> 03:35.840
One of the fundamental principles most people never learn about electronics is that when electric current flows through a component, it creates physical vibrations, right down at the molecular level.

03:35.840 --> 03:41.960
This is really just basic physics, and it can't be prevented or eliminated altogether.

03:42.380 --> 03:48.960
Basically every electronic component in your computer vibrates when electric current passes through it.

03:48.960 --> 03:52.080
And these vibrations disturb the surrounding air molecules.

03:52.540 --> 04:03.180
I told you, you're going to be very worried by the end of this. But these vibrations disturb the surrounding air molecules and create pressure waves.

04:03.180 --> 04:12.720
Those waves propagate outward as sound. That's the best way I can describe it, if that makes sense.

04:13.320 --> 04:23.980
Some of these sounds fall within the range of human hearing, like the whir of a cooling fan, for example, right?

04:24.340 --> 04:32.720
But a lot of them sit at frequencies outside our hearing range or at volumes below our perception threshold.

04:33.000 --> 04:36.680
And I'm half deaf, so for me they're way below that threshold.

04:37.320 --> 04:38.940
I actually can't hear in this ear.

04:39.040 --> 04:40.660
But that's a side issue.

04:41.120 --> 04:51.180
The critical insight here, though, is that different computational operations create distinct vibrational patterns, right?

04:51.180 --> 05:01.180
When your computer calculates your tax return, for example, the processor executes one specific sequence of operations that creates one acoustic pattern.

05:01.420 --> 05:10.860
When it encrypts your banking password, it performs a different calculation that produces an entirely different acoustic signature altogether.

05:11.400 --> 05:18.880
Opening your email creates yet another sound pattern, one that is distinguishable at that tiny scale.

05:18.880 --> 05:30.740
Think of it like the old dial-up internet modems from the 1990s. Remember the strange screeching and buzzing sounds they emitted while you were connecting?

05:30.900 --> 05:42.340
That was literally digital data being converted into audio tones for transmission over your phone line, and in a sense your modern devices are doing the same thing.

05:43.210 --> 05:55.460
They're converting their digital operations into physical vibrations and sounds, just at much lower volumes and different frequencies.

05:55.860 --> 06:01.380
But quiet doesn't mean silent, right?

06:01.580 --> 06:05.940
And it certainly doesn't mean secure from acoustic surveillance.

06:06.520 --> 06:08.000
Oh, yeah, that's a thing.

06:08.600 --> 06:16.440
The science behind sound-based hacking basically comes down to sound waves, right?

06:16.580 --> 06:21.320
And they aren't just random noise filling the air around us.

06:21.500 --> 06:27.920
They contain structured patterns that encode information about their source.

06:28.340 --> 06:35.900
These patterns can be captured, analyzed, and decoded to reveal some really surprising details.

06:35.900 --> 06:38.620
That is, when you actually look at it.

06:38.660 --> 06:44.360
As an analogy, consider how your voice works.

06:44.860 --> 06:53.200
When you say the word "hello," your vocal cords vibrate in a specific pattern that creates a unique sound wave.

06:53.280 --> 06:57.380
When you say "goodbye," that pattern is completely different.

06:57.720 --> 07:04.560
And a person listening can decode these patterns to understand the actual words.

07:05.090 --> 07:16.980
Electronic devices operate on that same kind of principle: every computational action produces its own distinctive acoustic fingerprint, right?

07:17.220 --> 07:19.640
I hope my analogies don't suck here.

07:19.960 --> 07:27.480
The really frightening development, though, is how sophisticated recording equipment has become nowadays.

07:27.950 --> 07:30.400
Take professional-grade microphones, right?

07:30.780 --> 07:43.520
Costing just a few hundred dollars, they can detect vibrations so minute that you would otherwise need laboratory-grade instruments to observe them.

07:43.680 --> 07:55.480
And once these nearly imperceptible sounds are captured, modern computer analysis can identify patterns that would be completely invisible to human perception.

07:55.480 --> 07:59.820
And machine learning has revolutionized acoustic attack capabilities.

08:00.320 --> 08:10.400
Researchers can train neural networks on thousands of keyboard recordings: male and female typists, different keyboard models and production years, all that stuff.

08:10.600 --> 08:21.720
That teaches the algorithm that a specific sound pattern means someone hit the letter A on a given kind of keyboard, whether the typist is a man or a woman.

08:21.720 --> 08:35.460
Once you feed this trained system a new recording of someone typing, it can reconstruct entire documents with pretty scary accuracy.

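NOTE
The training idea above can be sketched in a few lines. This is a minimal stand-in, not any published system: it swaps the neural network for a nearest-centroid classifier over band energies, and it replaces real recordings with synthetic "keystrokes" modeled, hypothetically, as decaying tones at made-up per-key resonances (`KEY_RESONANCE`). Real keystrokes are broadband clicks, so a practical attack needs far richer features and much more data.
```python
import math
import random
FS = 8000
BANDS = list(range(250, 4000, 250))  # analysis frequencies in Hz (an arbitrary choice)
def goertzel_power(samples, freq, fs=FS):
    """Power of `samples` at `freq` via the Goertzel recurrence."""
    coeff = 2 * math.cos(2 * math.pi * freq / fs)
    s1 = s2 = 0.0
    for x in samples:
        s1, s2 = x + coeff * s1 - s2, s1
    return s1 * s1 + s2 * s2 - coeff * s1 * s2
def features(clip):
    """Band energies scaled to sum to 1, so overall loudness doesn't matter."""
    p = [goertzel_power(clip, f) for f in BANDS]
    total = sum(p)
    return [v / total for v in p]
def fake_keystroke(resonance_hz, rng, n=400):
    """HYPOTHETICAL stand-in for a recorded key press: a decaying resonance plus noise."""
    return [math.exp(-t / 80) * math.sin(2 * math.pi * resonance_hz * t / FS) + rng.gauss(0, 0.02) for t in range(n)]
def train(labeled_clips):
    """Average each key's training feature vectors into one centroid per key."""
    sums, counts = {}, {}
    for key, clip in labeled_clips:
        f = features(clip)
        acc = sums.setdefault(key, [0.0] * len(f))
        for i, v in enumerate(f):
            acc[i] += v
        counts[key] = counts.get(key, 0) + 1
    return {k: [v / counts[k] for v in acc] for k, acc in sums.items()}
def classify(model, clip):
    """Return the key whose centroid is nearest (squared Euclidean distance) to the clip."""
    f = features(clip)
    return min(model, key=lambda k: sum((a - b) ** 2 for a, b in zip(f, model[k])))
rng = random.Random(0)
KEY_RESONANCE = {"a": 900, "s": 1400, "d": 2100, " ": 600}  # made-up per-key resonances
training = [(k, fake_keystroke(hz, rng)) for k, hz in KEY_RESONANCE.items() for _ in range(5)]
model = train(training)
typed = "sad dad"
guess = "".join(classify(model, fake_keystroke(KEY_RESONANCE[c], rng)) for c in typed)
print(guess)
```
Swapping `features` for spectrogram-like inputs and the centroid model for a neural network moves toward the kind of system described in the transcript, but the train-then-predict loop stays the same.
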
08:36.040 --> 08:44.540
And what we're describing is essentially superhuman hearing capability combined with perfect memory, right?

08:44.540 --> 08:52.200
Add infinite patience and the ability to detect patterns across massive datasets, right?

08:52.660 --> 08:58.980
And these are capabilities that acoustic attackers can now deploy against your devices.

08:59.320 --> 09:03.440
A common attack method, for example, is keyboard sound analysis.

09:04.000 --> 09:10.560
So let's look at one of the most accessible acoustic attacks out there.

09:10.560 --> 09:12.320
And we'll check this out in detail.

09:12.540 --> 09:17.040
To you, your keyboard probably seems like just an I/O device, right?

09:17.160 --> 09:18.040
Like an input device.

09:18.080 --> 09:22.200
But it's actually broadcasting every character you type through sound.

09:22.380 --> 09:29.980
Every key on your keyboard produces a subtly different sound, an acoustic signature, if you will, when pressed.

09:30.220 --> 09:36.060
And the spacebar at the bottom of the keyboard, like I was talking about earlier, creates a different vibration.

09:36.060 --> 09:43.680
That's different from the Q key, or the keys along the top, or whatever else the manufacturer

09:44.320 --> 09:45.420
put in there, right?

09:45.800 --> 09:50.120
And there are all the other manufacturing variations as well.

09:50.400 --> 09:55.020
How the keyboard is positioned on your desk can also affect it.

09:56.100 --> 09:59.020
Here's how an actual attack unfolds though.

09:59.360 --> 10:04.740
An attacker first needs to record your typing, which they might accomplish through a ton of different

10:04.740 --> 10:07.060
means, including just a phone call to you while you're busy.

10:07.280 --> 10:11.340
They could compromise your video conferencing software, for example,

10:11.420 --> 10:13.580
to access the microphone during calls.

10:13.840 --> 10:16.340
They might hide a small recording device in your office.

10:16.720 --> 10:22.500
Sometimes even a smartphone sitting innocently on your desk can serve as a recording device,

10:22.780 --> 10:26.540
especially if it's running malicious software.

10:26.960 --> 10:32.460
Now, once they have the recording, they can feed it to their trained machine learning model

10:32.460 --> 10:39.480
and this AI system analyzes each keystroke, comparing it against its training data to determine

10:40.260 --> 10:44.800
which key produced which sound.

10:45.040 --> 10:50.040
The system actually reconstructs everything you typed during the recorded period.

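NOTE
Before any classifier can run, the recording has to be cut into individual key presses. One simple approach, assumed here rather than taken from any specific study, is energy-based onset detection: measure the energy of short frames, estimate the noise floor from the quietest frames, and start a new keystroke whenever energy jumps well above that floor after a quiet gap. The thresholds (`ratio`, `min_gap_frames`) are illustrative guesses.
```python
import math
import random
def frame_energy(signal, frame=80):
    """Mean squared amplitude of consecutive, non-overlapping frames."""
    return [sum(x * x for x in signal[i:i + frame]) / frame for i in range(0, len(signal) - frame + 1, frame)]
def find_keystrokes(signal, frame=80, ratio=10.0, min_gap_frames=5):
    """Return sample offsets where a loud burst begins.
    A frame is 'loud' if its energy exceeds `ratio` times the noise floor; a new
    keystroke starts only after at least `min_gap_frames` quiet frames."""
    energy = frame_energy(signal, frame)
    floor = sorted(energy)[len(energy) // 10] + 1e-12  # 10th-percentile frame energy
    onsets, last_loud = [], -10**9
    for i, e in enumerate(energy):
        if e > ratio * floor:
            if i - last_loud > min_gap_frames:
                onsets.append(i * frame)
            last_loud = i
    return onsets
# Synthetic test recording: 2 s of faint noise at 8 kHz with three simulated key presses.
rng = random.Random(1)
fs = 8000
signal = [rng.gauss(0, 0.01) for _ in range(2 * fs)]
for start in (2000, 6000, 10000):
    for t in range(400):
        signal[start + t] += math.exp(-t / 60) * math.sin(2 * math.pi * 1500 * t / fs)
print(find_keystrokes(signal))  # [2000, 6000, 10000]
```
Each onset marks where a fixed-length clip can be cut out and handed to a per-key classifier.
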
10:50.240 --> 10:55.980
The accuracy rates achieved in academic research are genuinely kind of scary.

10:55.980 --> 11:02.580
Recent studies have actually demonstrated that reconstruction accuracy exceeds 95%

11:02.580 --> 11:06.180
even when the recordings are made from across the room.

11:06.420 --> 11:12.300
Some experiments have successfully captured keystrokes through closed doors or from neighboring

11:12.300 --> 11:14.640
offices. Think about that.

11:14.980 --> 11:19.020
You could mail someone a package, send it express, with a phone inside running,

11:19.120 --> 11:22.600
plus an extra battery pack, and it's recording everything along the way.

11:22.600 --> 11:25.380
Then it sits in an office, maybe for a couple of hours.

11:25.620 --> 11:29.700
What information could they gather from it just sitting there?

11:30.020 --> 11:33.180
I mean, obviously, when you open the package and find a phone, it's going to be weird,

11:33.260 --> 11:35.320
but whatever, they've already gotten the information they need.

11:35.540 --> 11:42.800
Side issue: ironically, professional mechanical keyboards are favored by security-conscious

11:43.320 --> 11:48.460
users and often produce louder and more distinctive sounds that make these attacks

11:48.460 --> 11:54.200
even easier. So let's pivot to computer fan attacks.

11:55.040 --> 11:57.200
Yeah, yeah, I know, I know.

11:57.260 --> 12:02.020
Granted, this attack method absolutely sounds like something from a spy movie,

12:02.040 --> 12:06.680
but it's been repeatedly demonstrated under real laboratory conditions.

12:07.000 --> 12:11.280
Your computer's processor generates heat roughly proportional to its computational load, right?

12:11.500 --> 12:15.060
When the processor works harder, it produces more heat, which makes sense.

12:15.060 --> 12:17.940
That causes the cooling fan to spin faster.

12:18.100 --> 12:20.240
And you've got people asking, oh, what if I have water cooling?

12:20.460 --> 12:23.700
Yeah, okay, let's keep it simple for now, right?

12:24.500 --> 12:25.980
We can get into that another time.

12:26.240 --> 12:31.440
But when computational demands decrease, the processor cools down

12:31.440 --> 12:33.920
and the fan slows down accordingly.

12:34.300 --> 12:39.160
Now, the key insight that makes this attack actually possible is that cryptographic

12:39.160 --> 12:45.620
operations like encryption and decryption involve very specific mathematical calculations.

12:45.880 --> 12:52.080
These calculations cause the processor to work in pretty predictable, alternating patterns

12:52.080 --> 12:56.620
between intensive computation and brief pauses.

12:56.940 --> 12:57.920
I hope that makes sense.

12:58.160 --> 13:01.740
I might have just done a word salad, but I think we're good.

13:02.220 --> 13:08.540
Think of it like a very specific rhythm, or a heartbeat,

13:08.540 --> 13:14.240
I guess, one that corresponds to the particular cryptographic operation

13:14.240 --> 13:17.040
actually being performed, right?

13:17.760 --> 13:24.140
So the cooling fan faithfully follows this computational heartbeat, speeding up, slowing down,

13:24.460 --> 13:29.780
roughly in sync with the processor's workload.

13:30.120 --> 13:34.920
So by recording the fan noise and analyzing how its pitch changes over time,

13:34.920 --> 13:39.340
attackers can, in a way, reverse engineer the computational patterns

13:39.340 --> 13:47.340
and the workload. With enough analysis, they can actually extract cryptographic keys

13:47.340 --> 13:49.520
being used for encryption.

13:49.940 --> 13:55.100
To understand how this works, imagine trying to crack a safe combination by

13:55.100 --> 13:59.520
listening to someone exercising, right?

13:59.860 --> 14:04.240
When they grunt with more effort, they might be lifting heavier weights,

14:04.240 --> 14:11.320
which could obviously correspond to higher numbers in a combination.

14:11.960 --> 14:16.340
When their breathing is easier, they might be at a lower number.

14:16.740 --> 14:22.440
This analogy is obviously very simple, but it captures the essence of

14:22.440 --> 14:27.460
how fan noise can leak information about the computational effort that's actually required

14:27.460 --> 14:29.760
for different parts of the encryption key.

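NOTE
The exercise analogy has a textbook counterpart in how RSA-style modular exponentiation can be implemented. In naive left-to-right square-and-multiply, every exponent bit costs a squaring and each 1 bit costs an extra multiplication, so the pattern of work mirrors the secret key. The sketch below only illustrates that dependency by logging the operations. The transcript does not say which leakage the cited studies exploited, and real acoustic measurements are far noisier than a clean operation log. Function names are hypothetical.
```python
def modexp_with_trace(base, exponent, modulus):
    """Left-to-right square-and-multiply that also logs each operation it performs."""
    result, trace = 1, []
    for bit in bin(exponent)[2:]:
        result = (result * result) % modulus
        trace.append("S")  # every exponent bit costs a squaring
        if bit == "1":
            result = (result * base) % modulus
            trace.append("M")  # only 1 bits cost an extra multiplication
    return result, "".join(trace)
def exponent_from_trace(trace):
    """An observer who can tell S from M reads the exponent back: 'SM' is a 1, a lone 'S' is a 0."""
    bits, i = [], 0
    while i < len(trace):
        if trace.startswith("SM", i):
            bits.append("1")
            i += 2
        else:
            bits.append("0")
            i += 1
    return int("".join(bits), 2)
secret = 0b1011001  # toy exponent standing in for a private key
value, trace = modexp_with_trace(7, secret, 1009)
print(value == pow(7, secret, 1009))  # True
print(trace)  # SMSSMSMSSSM
print(exponent_from_trace(trace) == secret)  # True
```
This is why cryptographic libraries favor implementations whose sequence of operations does not depend on secret bits.
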
14:29.760 --> 14:38.060
Researchers have reportedly demonstrated extracting 2048-bit RSA keys and 384-bit

14:38.060 --> 14:40.920
ECDSA keys using this method.

14:41.140 --> 14:44.100
There are studies saying it's not theoretical, that they have actually done it.

14:44.220 --> 14:49.520
So it's not theoretical, but I feel like it's theoretical, if that makes sense.

14:49.860 --> 14:56.200
Anyway, what they claimed is that this often required only a few minutes of fan noise to

14:56.200 --> 15:03.760
get enough information to make that determination, which is insane, right?

15:04.020 --> 15:11.400
The attack works best in quiet environments, but it's been shown to work even with moderate

15:11.400 --> 15:12.460
background noise.

15:12.620 --> 15:14.680
We have keyboard noise, fan noise.

15:15.480 --> 15:16.680
What else makes noise?

15:18.360 --> 15:20.180
How about hard drives, right?

15:20.500 --> 15:22.560
So traditional hard drive sounds, right?

15:22.800 --> 15:23.820
They're mechanical hard drives.

15:23.820 --> 15:30.080
They operate kind of like high-tech record players, with spinning magnetic platters and a head that can read and

15:30.080 --> 15:34.000
write and moves across the surface to access data.

15:34.160 --> 15:39.540
This mechanical operation creates yet another rich acoustic environment, which is full of

15:39.540 --> 15:43.480
goodies, including information about things like disk activity.

15:44.080 --> 15:49.280
So different files are stored in different physical locations on the actual disk platters.

15:49.280 --> 15:56.640
And when your computer needs to, say, open a file, the read/write head must physically move

15:56.640 --> 16:00.700
to that location, creating a specific pattern of mechanical sounds.

16:01.020 --> 16:03.780
The sequence might be click, click, whir.

16:05.820 --> 16:08.560
I totally, whatever, you get my point, right?

16:08.920 --> 16:10.960
I might look like an idiot, but now at least you understand.

16:11.580 --> 16:17.560
So it might be click, click, whir for one particular file.

16:17.560 --> 16:25.200
Then maybe whir, click, click, whir for another file stored in a different location.

16:25.540 --> 16:31.180
Now each file access creates its own unique acoustic signature based on the physical distance

16:31.180 --> 16:36.500
the head must travel and the sectors it must read.

16:36.760 --> 16:41.280
By recording and analyzing these mechanical sounds, attackers can determine which files

16:41.280 --> 16:49.340
you're accessing, what programs you're running, or even what database queries you're

16:49.340 --> 16:54.740
executing. The technique is sophisticated enough to distinguish between opening

16:54.740 --> 17:00.180
different documents, accessing different parts of a database or running different applications.

17:00.520 --> 17:04.240
It's comparable to being able to determine which book someone is actually reading

17:04.240 --> 17:08.280
by carefully listening to the sound of pages turning,

17:08.280 --> 17:15.500
where each book's unique thickness of paper, paper quality, and binding create distinctive

17:15.500 --> 17:22.300
acoustic patterns. Even solid-state drives, which have no moving parts and were once

17:22.300 --> 17:27.880
considered immune to acoustic attacks, have been shown to be vulnerable. While SSDs don't

17:27.880 --> 17:33.840
actually produce the obvious mechanical sounds, they do generate electromagnetic interference

17:33.840 --> 17:41.020
that basically can then couple with nearby components to produce faint acoustic emissions.

17:41.420 --> 17:45.780
Like these sounds are obviously much, much quieter than mechanical drives,

17:45.880 --> 17:51.720
but they can still leak information about data access patterns when records are opened or accessed

17:52.660 --> 17:55.040
if you have sensitive enough equipment. So

17:57.220 --> 18:01.800
that's insane, but let's move on and we'll look at another one,

18:01.800 --> 18:07.540
which is electronic component whine. Deep inside every electronic device, right,

18:07.800 --> 18:14.140
there are components like capacitors, inductors, and transformers, and all of them physically

18:14.140 --> 18:18.220
vibrate when electric current flows through them, like we talked about at the beginning.

18:18.580 --> 18:23.000
These vibrations often occur at frequencies above human hearing or

18:23.000 --> 18:28.580
at volumes below our perception threshold, but they contain valuable information

18:28.580 --> 18:34.880
about the device's operations. Different computational tasks cause different patterns

18:34.880 --> 18:40.140
of electrical current flow, which in turn create different vibrational frequencies

18:40.760 --> 18:46.700
in these specific components. Now, the phenomenon is similar to how a guitar

18:46.700 --> 18:51.660
string produces different notes, depending on how it's

18:51.660 --> 18:57.400
plucked and where it's held. Your computer's components are essentially playing an

18:57.400 --> 19:04.760
inaudible song that reflects the calculations they're performing. And many users

19:04.760 --> 19:11.660
have experienced this phenomenon directly through what's called coil whine in graphics cards.

19:11.960 --> 19:17.820
Gamers often complain about high-pitched sounds coming out of their GPUs when the GPU is under

19:17.820 --> 19:23.760
heavy load. What they're actually hearing is their graphics card inadvertently

19:23.760 --> 19:30.700
broadcasting information about its computational load through acoustic emissions. Researchers

19:30.700 --> 19:38.660
have basically demonstrated that by analyzing these sounds in detail, they can determine what

19:38.660 --> 19:45.460
specific calculations the GPU is performing and when, potentially

19:45.460 --> 19:54.740
revealing sensitive information about the rendered content or processed data. The attack

19:54.740 --> 20:00.180
becomes particularly powerful when targeting cryptographic operations.

20:00.180 --> 20:06.520
Different encryption algorithms create distinct electrical patterns that translate into unique

20:07.060 --> 20:13.400
acoustic signatures. By recording and analyzing component whine

20:13.400 --> 20:19.440
during encryption operations, attackers can potentially identify which algorithm is

20:19.440 --> 20:26.840
being used and even extract key material in some cases. After the first one,

20:26.860 --> 20:29.780
you can kind of guess the other ones, because you've heard them before. You've probably heard your

20:29.780 --> 20:36.220
GPU whine, heard your fan, your keyboard, right? All kind of obvious ones. But

20:36.280 --> 20:42.100
what are some more advanced attack scenarios? Let's get into smartphone

20:42.100 --> 20:47.900
vulnerabilities. Your smartphone represents a pretty perfect storm of

20:47.900 --> 20:54.780
acoustic vulnerabilities. Modern phones contain multiple really high quality microphones

20:54.780 --> 20:59.980
that are designed to capture sound from various directions. They're equipped with powerful

20:59.980 --> 21:07.340
processors that are capable of real time audio analysis. And most critically, users routinely

21:07.340 --> 21:12.780
grant microphone permissions to dozens of applications without considering security

21:12.780 --> 21:20.140
implications. I'm sure you will now. But a single malicious application with microphone access,

21:20.560 --> 21:25.700
right, can suddenly transform your phone into a really dangerous and sophisticated

21:25.700 --> 21:31.820
acoustic spy device. This compromised phone can record not just

21:31.820 --> 21:35.580
your conversations, whatever, who cares about that, but all the acoustic

21:35.580 --> 21:41.840
emissions from nearby devices. So place your phone on the same desk where your keyboard is.

21:42.120 --> 21:48.220
Great. While you're working? Even better. You've inadvertently provided attackers with

21:48.220 --> 21:52.560
front-row acoustic access to every keystroke you type. Who needs a keylogger? Who needs

21:52.560 --> 21:56.980
malware at that point for PCs or whatever. You just leave it near your computer and it'll

21:56.980 --> 22:03.080
capture fan patterns, potentially hard drive sounds, and component whine. And the

22:03.080 --> 22:10.160
threat extends beyond simple recording, right? Smartphones can produce and detect

22:10.160 --> 22:16.380
near-ultrasonic frequencies, at or above the upper limit of human hearing, typically in the 18 to

22:16.380 --> 22:22.600
22 kHz range. Attackers have developed sophisticated systems that use these inaudible

22:22.600 --> 22:27.580
sounds for things like tracking and data exfiltration. For example, in an advertisement

22:27.580 --> 22:32.720
playing on your smart TV, the TV might emit an ultrasonic beacon that your phone's microphone

22:32.720 --> 22:38.860
detects, linking your TV viewing habits to your mobile identity without your knowledge.

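NOTE
The 18 to 22 kHz band sits just below the Nyquist limit (half the sample rate) of common 44.1 kHz and 48 kHz audio hardware, 22.05 kHz and 24 kHz respectively, so ordinary speakers and phone microphones can, in principle, emit and capture it. The sketch below assumes a hypothetical beacon that sends a 4-bit ID with binary frequency-shift keying at 18.5 and 19.5 kHz in 20 ms symbols; real tracking SDKs use their own, often proprietary, schemes. The names `emit_beacon` and `decode_beacon` are illustrative.
```python
import math
FS = 48000  # a common audio sample rate; the Nyquist limit is FS / 2 = 24 kHz
F0, F1 = 18500, 19500  # HYPOTHETICAL beacon tones for bit 0 and bit 1 (Hz)
SYMBOL = 960  # samples per bit: 20 ms at 48 kHz
assert max(F0, F1) < FS / 2, "tones must sit below Nyquist to be captured"
def goertzel_power(samples, freq, fs=FS):
    """Power of `samples` at `freq` via the Goertzel recurrence."""
    coeff = 2 * math.cos(2 * math.pi * freq / fs)
    s1 = s2 = 0.0
    for x in samples:
        s1, s2 = x + coeff * s1 - s2, s1
    return s1 * s1 + s2 * s2 - coeff * s1 * s2
def emit_beacon(bits, amplitude=0.05):
    """Quiet binary-FSK beacon: one SYMBOL-long tone per bit."""
    out = []
    for b in bits:
        f = F1 if b else F0
        out.extend(amplitude * math.sin(2 * math.pi * f * t / FS) for t in range(SYMBOL))
    return out
def decode_beacon(samples):
    """For each symbol, report whichever beacon tone carries more energy."""
    bits = []
    for i in range(0, len(samples) - SYMBOL + 1, SYMBOL):
        chunk = samples[i:i + SYMBOL]
        bits.append(1 if goertzel_power(chunk, F1) > goertzel_power(chunk, F0) else 0)
    return bits
beacon_id = [1, 0, 1, 1]
# Loud audible program audio (a 440 Hz tone here) with the quiet beacon mixed in.
mixed = [s + 0.5 * math.sin(2 * math.pi * 440 * t / FS) for t, s in enumerate(emit_beacon(beacon_id))]
print(decode_beacon(mixed))  # [1, 0, 1, 1]
```
The loud audible tone barely disturbs decoding, because the detector only measures energy at the two beacon frequencies.
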
22:39.100 --> 22:45.000
Cross-device tracking through ultrasonic beacons has been documented in real-world advertising,

22:45.140 --> 22:49.560
by the way. It's not something I'm making up; it's actually been documented in

22:49.560 --> 22:55.620
real-world advertising networks. Anyway, retail stores use things like

22:55.620 --> 23:02.140
ultrasonic beacons to track customer movements, and malicious actors or adversaries can

23:03.290 --> 23:08.420
exploit the same technology to create things like covert communication channels between

23:08.420 --> 23:15.280
infected devices, coordinate attacks, or exfiltrate data from air-gapped systems, right?

23:15.540 --> 23:22.560
Air-gapped systems are computers that are physically isolated from all

23:22.560 --> 23:27.620
networks: no internet connection, no Wi-Fi, and no wired network connection.

23:27.920 --> 23:33.860
Organizations use air gapping for their most serious, sensitive

23:33.860 --> 23:38.820
information, believing that physical isolation provides perfect security. Well, now we know

23:38.820 --> 23:44.600
otherwise, right? Acoustic attacks kind of shatter that illusion of perfect isolation. And this is

23:44.600 --> 23:49.260
why I said you're going to get really paranoid. Researchers have developed multiple methods

23:49.260 --> 23:56.700
for extracting data from air-gapped systems using acoustic channels. The most elegant involves malware

23:56.700 --> 24:04.460
that manipulates cooling fan speeds to encode data, speeding up and slowing

24:04.460 --> 24:10.640
down the fan in specific patterns. And the malware can transmit binary data through acoustic signals.

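NOTE
The fan-speed channel can be sketched as binary frequency-shift keying on the fan's blade-pass tone (revolutions per second times the number of blades). Everything here is a hypothetical toy: a 7-blade fan, 2000 RPM for a 0 bit and 3000 RPM for a 1 bit, half a second per bit, and a fan that changes speed instantly. The receiver compares energy at the two expected blade-pass frequencies in each bit period; `transmit` and `receive` are illustrative names.
```python
import math
import random
FS = 8000
BLADES = 7  # HYPOTHETICAL fan with 7 blades
RPM = {0: 2000, 1: 3000}  # HYPOTHETICAL fan speeds used to signal bit 0 and bit 1
BIT_SECONDS = 0.5  # real fans need seconds to change speed; this toy ignores spin-up
def blade_pass_hz(rpm):
    """Dominant fan tone: revolutions per second times the number of blades."""
    return rpm / 60 * BLADES
def goertzel_power(samples, freq, fs=FS):
    """Power of `samples` at `freq` via the Goertzel recurrence."""
    coeff = 2 * math.cos(2 * math.pi * freq / fs)
    s1 = s2 = 0.0
    for x in samples:
        s1, s2 = x + coeff * s1 - s2, s1
    return s1 * s1 + s2 * s2 - coeff * s1 * s2
def transmit(bits, rng):
    """Simulated fan noise: a weak blade-pass tone that follows the bits, buried in broadband noise."""
    n = int(FS * BIT_SECONDS)
    out = []
    for b in bits:
        f = blade_pass_hz(RPM[b])
        out.extend(0.2 * math.sin(2 * math.pi * f * t / FS) + rng.gauss(0, 0.5) for t in range(n))
    return out
def receive(samples):
    """Per bit period, compare energy at the two expected blade-pass frequencies."""
    n = int(FS * BIT_SECONDS)
    f0, f1 = blade_pass_hz(RPM[0]), blade_pass_hz(RPM[1])
    return [int(goertzel_power(samples[i:i + n], f1) > goertzel_power(samples[i:i + n], f0)) for i in range(0, len(samples) - n + 1, n)]
secret_bits = [1, 0, 0, 1, 1, 0, 1, 0]
print(receive(transmit(secret_bits, random.Random(42))))  # [1, 0, 0, 1, 1, 0, 1, 0]
```
At 0.5 s per bit this toy signals 2 bits per second; a physical fan needs far longer to change speed, which is one reason real fan-based channels are much slower.
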
24:10.880 --> 24:15.860
It's essentially technological Morse code, I guess you'd say, right? Transmitted through

24:15.860 --> 24:22.140
fan noise. Obviously you won't be watching YouTube over this channel, right? But

24:22.140 --> 24:26.900
the data rates are modest, though still sufficient for stealing

24:26.900 --> 24:31.580
encryption keys, passwords, and small documents. Researchers have actually achieved

24:31.580 --> 24:38.760
transmission using fan noise alone. An attacker basically needs only to place

24:38.760 --> 24:43.360
a smartphone or small recording device within the acoustic range of the target system.

24:43.360 --> 24:50.440
The recording device can be hidden in the same room, placed in an adjacent office, or tucked in a bag

24:50.440 --> 24:57.840
in that room, or even operated from outside in some cases, if windows allow sound

24:57.840 --> 25:03.260
transmission. Other acoustic exfiltration methods for air-gapped systems

25:03.260 --> 25:07.900
include manipulating hard drive activity, kind of like how we were talking

25:07.900 --> 25:12.720
about before, to create specific sound sequences, using the computer's built-in speaker to

25:12.720 --> 25:20.780
generate ultrasonic signals, or even modulating capacitor whine to do things like

25:20.780 --> 25:27.740
carry data. Each method offers different tradeoffs between data rate, transmission distance, and

25:27.740 --> 25:34.060
detectability. Understanding the effective range of acoustic attacks is obviously going to

25:34.060 --> 25:41.020
help in developing appropriate defenses. Most keyboard acoustic attacks work reliably

25:41.020 --> 25:48.140
within 10 to 20 feet of the target device under typical conditions, and "typical" is kind of a

25:48.140 --> 25:52.640
dangerous assumption in security planning. Researchers have successfully

25:52.640 --> 25:58.800
demonstrated reconstructing keystrokes from recordings made in adjacent rooms through things

25:58.800 --> 26:04.420
like ventilation systems and even from outside the building through things like windows. In one

26:04.420 --> 26:11.720
experiment that's kind of notable, researchers actually reconstructed typed text from a recording

26:11.720 --> 26:18.340
made with a smartphone placed in a bag on a conference room floor more than 15 feet

26:18.340 --> 26:24.840
from the target. High-quality directional microphones can extend that range even

26:24.840 --> 26:31.600
further, potentially capturing usable keyboard sounds from across large open offices.

26:31.600 --> 26:37.700
Environmental factors significantly affect acoustic attack effectiveness as well. Background

26:37.700 --> 26:45.180
noise provides some natural protection, since it makes isolating target sounds

26:46.150 --> 26:53.520
a bit more challenging. A busy coffee shop offers more acoustic cover than, say, a

26:53.520 --> 26:58.240
quiet house, right? Working in a city or trying to eavesdrop on someone working in a

26:58.240 --> 27:02.800
city or in a busy office would be more challenging than eavesdropping on someone in a quiet

27:02.800 --> 27:09.280
countryside house, working remotely with no music playing. However, the

27:09.280 --> 27:16.020
modern signal processing techniques can filter out steady background noise and machine

27:16.020 --> 27:22.540
learning models can be trained to work in noisy environments. So you're kind of screwed no matter

27:22.540 --> 27:30.580
what. Building construction also affects sound propagation in pretty complex ways.

27:30.580 --> 27:36.240
Glass windows act as acoustic membranes that vibrate with sound waves, potentially

27:36.240 --> 27:42.380
allowing things like laser microphones to read indoor sounds from hundreds of feet away.

27:42.560 --> 27:46.340
I always thought the whole laser microphone thing, back when I was a kid reading Popular

27:46.340 --> 27:49.120
Science and Popular Mechanics, was a myth. I always thought it was

27:49.120 --> 27:56.720
a scam. Nope. So, thin walls: most of us at one point have been in a

27:56.720 --> 28:02.480
cheap apartment, and thin walls, common in modern construction, transmit sound

28:02.480 --> 28:09.260
really easily, while thick concrete provides much better isolation. Hard surfaces

28:09.260 --> 28:17.940
reflect sound, potentially carrying acoustic information around corners, and if you've ever

28:17.940 --> 28:25.080
noticed, things like furnishings, like these pads behind me, absorb it, right? So

28:25.080 --> 28:30.140
what are some of the actual tools that are needed? Well, there's a growing number of

28:30.140 --> 28:35.900
acoustic attack tools, and they represent a growing security concern, right?

28:36.420 --> 28:41.500
Professional audio equipment that once cost thousands of dollars is now available for a

28:41.500 --> 28:47.160
couple hundred. The specific equipment needed varies by attack type, but a basic acoustic attack

28:47.160 --> 28:53.780
kit is pretty affordable like high quality usb microphones that are designed for podcasting

28:54.330 --> 28:59.980
can capture their suitable sounds needed for things like obviously keyboard acoustic analysis but

28:59.980 --> 29:04.900
even a crappy cell phone would do that models caught a sting under two hundred dollars offer

29:04.900 --> 29:11.960
frequency responses and sensitivity specifications basically that would have required professional

29:11.960 --> 29:18.840
studio equipment just a decade ago parabolic microphones for long range recordings are available

29:18.840 --> 29:24.980
online for like under 500 bucks you can find them through solid services for like less than 50 bucks

29:24.980 --> 29:31.020
Software-defined radio (SDR) devices also enable attackers to detect electromagnetic

29:31.020 --> 29:38.880
emissions that couple into acoustic signals. Popular SDR platforms cost between 20 and

29:38.880 --> 29:45.400
300 bucks, depending on the frequency range and sensitivity requirements that you

29:45.400 --> 29:51.460
actually have. These devices can detect electromagnetic interference from electronic

29:51.460 --> 29:58.480
components that manifests as acoustic emissions. The analysis software required for acoustic attacks

29:58.480 --> 30:03.900
is predominantly open source and freely available. Machine learning frameworks

30:03.900 --> 30:10.500
like TensorFlow and PyTorch provide the tools you need to build keyboard acoustic

30:10.500 --> 30:16.940
recognition systems, and audio processing libraries handle signal filtering and frequency analysis.
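
To make "frequency analysis" and "recognition" concrete, here is a minimal sketch of a keystroke recognizer using only Python's standard library. It deliberately swaps the deep learning frameworks mentioned above for a much simpler nearest-centroid classifier over DFT magnitude spectra, and the keystrokes are synthetic: each hypothetical key is modeled as a fast-decaying tone at its own resonant frequency. Real keystroke audio is far messier, so treat this as an illustration of the pipeline shape (transform, build per-key templates, compare), not a working attack.

```python
import cmath
import math
import random

RATE = 16000  # samples per second (assumed)
N = 256       # samples per keystroke window (assumed)

def synth_keystroke(freq, rng):
    """Hypothetical keystroke: a fast-decaying tone at a key-specific resonance plus noise."""
    return [math.exp(-n / 40) * math.sin(2 * math.pi * freq * n / RATE) + rng.gauss(0.0, 0.05)
            for n in range(N)]

def spectrum(samples):
    """Magnitudes of the positive-frequency DFT bins (naive O(N^2) DFT)."""
    n = len(samples)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(samples)))
            for k in range(n // 2)]

def unit(vec):
    """Scale a feature vector to unit length so loudness differences are ignored."""
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def train(examples):
    """Nearest-centroid training: average each key's normalized spectra."""
    centroids = {}
    for key, clips in examples.items():
        feats = [unit(spectrum(c)) for c in clips]
        centroids[key] = [sum(col) / len(feats) for col in zip(*feats)]
    return centroids

def classify(clip, centroids):
    """Return the key whose centroid is closest (Euclidean) to this clip's spectrum."""
    feat = unit(spectrum(clip))
    return min(centroids, key=lambda k: math.dist(feat, centroids[k]))

rng = random.Random(1)
KEYS = {"a": 1500, "s": 2500, "d": 3500}  # hypothetical per-key resonances in Hz
training = {k: [synth_keystroke(f, rng) for _ in range(5)] for k, f in KEYS.items()}
centroids = train(training)
print(classify(synth_keystroke(KEYS["s"], rng), centroids))
```

Nearest-centroid was chosen only because it fits in a few lines; the frameworks named in the talk would replace `train` and `classify` with a learned model over richer features.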

30:17.520 --> 30:22.360
Are we just screwed? Is there nothing that we can do? There are some defense

30:22.360 --> 30:27.880
strategies, physical countermeasures, that are effective

30:27.880 --> 30:34.020
against acoustic attacks, and it begins with understanding that sound is a physical phenomenon

30:34.590 --> 30:40.540
that follows predictable rules. Sound-absorbing material can dramatically reduce acoustic leakage

30:40.540 --> 30:47.020
from sensitive areas. Professional acoustic panels designed for recording studios work well, but

30:47.020 --> 30:52.960
even simple solutions like heavy curtains, carpeting, or upholstered furniture help

30:52.960 --> 30:58.580
absorb sound energy. These panels that you see behind me were really cheap, like a dollar each

30:58.580 --> 31:04.700
or maybe even 50 cents each on Amazon, I think. White noise generators are always fun: they create

31:04.700 --> 31:12.520
acoustic masking that makes it much, much harder to isolate device sounds from background

31:12.520 --> 31:17.960
noise. I saw an interview one time with an NSA agent talking about how, when you

31:17.960 --> 31:21.240
walk down the hallways at the NSA, they have this white noise, and then they talk about brown

31:21.240 --> 31:26.440
noise and pink noise. It gets really nuts, but it's played constantly. Now, the key is using

31:26.440 --> 31:32.440
pink or brown noise rather than pure white noise, as these lower-frequency-weighted

31:32.440 --> 31:40.200
sounds better mask the typical frequency ranges of device emissions. Position these generators

31:40.760 --> 31:45.500
strategically between potential recording locations and sensitive devices.
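
A minimal sketch of generating brown noise for masking, assuming a standard-library-only Python setup. Brown noise is produced here by leaky integration of white Gaussian noise, which tilts the energy toward low frequencies; pink noise (about 3 dB per octave) needs a different filter and is not shown. The leak factor, duration, and output file name are arbitrary choices for illustration.

```python
import os
import random
import struct
import tempfile
import wave

def brown_noise(num_samples, leak=0.995, seed=None):
    """Brown (red) noise by leaky integration of white noise. Above a low corner
    frequency set by `leak`, power falls by about 6 dB per octave, so energy is
    concentrated at low frequencies. The leak stops the running sum drifting."""
    rng = random.Random(seed)
    out, acc = [], 0.0
    for _ in range(num_samples):
        acc = leak * acc + rng.gauss(0.0, 1.0)
        out.append(acc)
    peak = max(abs(x) for x in out) or 1.0
    return [x / peak for x in out]  # normalize into [-1, 1]

def write_wav(path, samples, rate=44100):
    """Write mono 16-bit PCM with the standard-library wave module."""
    frames = b"".join(struct.pack("<h", int(s * 32767)) for s in samples)
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(frames)

def lag1_autocorrelation(xs):
    """Close to 0 for white noise, close to 1 for low-frequency-heavy noise."""
    mean = sum(xs) / len(xs)
    num = sum((a - mean) * (b - mean) for a, b in zip(xs, xs[1:]))
    den = sum((a - mean) ** 2 for a in xs)
    return num / den

samples = brown_noise(44100 * 5, seed=42)  # five seconds at 44.1 kHz
out_path = os.path.join(tempfile.gettempdir(), "brown_noise.wav")
write_wav(out_path, samples)
print(out_path, round(lag1_autocorrelation(samples), 3))
```

The lag-1 autocorrelation is a quick sanity check that the output really is dominated by slow, low-frequency variation rather than the sample-to-sample hiss of white noise. Looping the file on a small speaker is one cheap way to try the idea.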

31:45.810 --> 31:52.120
Some organizations install permanent sound masking systems, like the NSA's hallway

31:52.120 --> 31:58.640
speakers, that provide consistent acoustic cover throughout sensitive areas. Physical distance

31:58.640 --> 32:06.520
remains one of the most effective defenses: even doubling the distance reduces the sound level

32:06.520 --> 32:13.160
by about six decibels. Arranging offices so that sensitive systems are far from publicly

32:13.160 --> 32:20.880
accessible areas, exterior windows, and shared walls provides real, meaningful protection.
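
The six-decibel figure follows from the inverse-square law for a point source in open space: intensity falls as 1/r^2, so the level drop between distances r1 and r2 is 20 * log10(r2 / r1) dB, about 6.02 dB per doubling. A quick check, assuming idealized free-field conditions:

```python
import math

def level_drop_db(near_m, far_m):
    """Free-field point-source attenuation (inverse-square law): intensity scales
    as 1/r**2, so the level drops by 10*log10((far/near)**2) = 20*log10(far/near) dB."""
    return 20 * math.log10(far_m / near_m)

for far in (2, 4, 8, 16):
    print(f"1 m to {far} m: {level_drop_db(1, far):.1f} dB quieter")
# prints 6.0, 12.0, 18.1 and 24.1 dB for 2, 4, 8 and 16 m
```

Each doubling adds roughly another 6 dB, so moving from 1 m to 16 m buys about 24 dB in an idealized open space. Indoors, reflections from walls add a reverberant component, so the real drop is usually smaller than this free-field figure.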

32:20.880 --> 32:27.100
While processing highly sensitive information, you should consider using interior

32:27.100 --> 32:33.780
rooms without windows or exterior walls, or rooms with thick walls. Traditional physical security

32:33.780 --> 32:43.140
measures gain new importance in the context of these kinds of acoustic attacks. Restrictions on

32:43.140 --> 32:51.540
recording devices, restricted access near sensitive systems, and visitor

32:51.540 --> 32:57.440
management all help prevent attackers from positioning recording equipment

32:57.440 --> 33:04.560
effectively. Packages should be kept in an isolated area, for example, in case someone mails in a phone,

33:04.560 --> 33:10.080
like with the attack vector we talked about before. Software-based technical solutions

33:10.080 --> 33:17.140
also exist as defenses against acoustic attacks, but those are evolving rapidly, just

33:17.140 --> 33:23.540
like the attack vectors. Modern operating systems can implement fan speed randomization

33:23.540 --> 33:30.720
that disrupts the patterns attackers rely on for cryptographic key extraction.
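
How any particular operating system randomizes fan speed isn't described here, so the following is a hypothetical sketch of the idea: add a bounded random offset to the fan's load-driven target so the fan's sound stops tracking the workload one-to-one. The fan curve, jitter size, RPM limits, and function names are all assumptions, and the workload is a random 0/1 sequence standing in for secret-dependent activity.

```python
import math
import random

def jittered_fan_rpm(target_rpm, jitter_rpm, rng, min_rpm=800, max_rpm=5000):
    """Hypothetical controller hook: add a bounded random offset to the thermal
    target so fan speed (and its acoustic tone) no longer tracks load one-to-one."""
    rpm = target_rpm + rng.uniform(-jitter_rpm, jitter_rpm)
    return max(min_rpm, min(max_rpm, rpm))

def correlation(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))

rng = random.Random(7)
# Toy workload: 1 = busy (e.g. a secret-dependent step), 0 = idle.
load = [rng.randint(0, 1) for _ in range(2000)]
target = [1500 + 1000 * busy for busy in load]  # naive load-following fan curve
plain = [jittered_fan_rpm(t, 0, rng) for t in target]
jittered = [jittered_fan_rpm(t, 1000, rng) for t in target]
print(round(correlation(load, plain), 2), round(correlation(load, jittered), 2))
```

The load-following fan correlates perfectly with the workload, while the jittered fan correlates much less. Jitter only blurs the pattern: an attacker who can observe many repetitions of the same operation can average some of it back out, which is one reason such measures are layered with others.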

33:30.720 --> 33:37.220
Some security software adds random computational delays during sensitive operations to prevent

33:37.220 --> 33:43.940
timing-based acoustic analysis, and hardware manufacturers are beginning to address

33:43.940 --> 33:50.700
acoustic security in their designs. Newer keyboard designs use dampening materials and more

33:50.700 --> 33:57.240
uniform key mechanisms to reduce acoustic variation between keys. Some high-security systems

33:57.240 --> 34:03.240
use keyboards filled with foam or gel to muffle the keystrokes,

34:03.240 --> 34:10.040
and mechanical keyboards with O-ring dampeners can reduce acoustic emissions while still maintaining

34:10.040 --> 34:16.820
tactile feedback, which you kind of need if that's what you've used your whole life.

34:16.820 --> 34:24.100
Component-level shielding also reduces the electromagnetic emissions

34:24.420 --> 34:32.160
that can couple into acoustic signals, and better power supply designs with improved filtering

34:32.160 --> 34:38.980
reduce coil whine. Some security systems use specially designed cases with

34:38.980 --> 34:44.280
acoustic dampening materials that muffle all internal sounds. These cases often include sealed

34:44.280 --> 34:50.200
designs that also provide electromagnetic shielding. Detection systems, meanwhile, can

34:50.200 --> 34:56.440
alert security teams to potential acoustic surveillance. Ultrasonic detectors can identify

34:56.440 --> 35:02.860
covert communication attempts above the human hearing range. Acoustic anomaly detection systems use

35:02.860 --> 35:08.600
machine learning to identify unusual sound patterns that might indicate recording devices or

35:08.600 --> 35:16.880
acoustic data exfiltration. Some advanced systems can even detect the presence of (my favorite, because

35:16.880 --> 35:24.480
it's super nerdy) laser microphones by identifying the infrared beams they use.
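
A minimal sketch of one way an ultrasonic detector might decide that something is transmitting above the audible range: measure what fraction of a recording window's spectral energy sits above a cutoff and raise an alarm when it crosses a threshold. The 18 kHz cutoff, 5% threshold, 48 kHz sample rate, and function names are assumptions; it also presumes a microphone and sound card that can actually capture ultrasound, which many consumer devices may not.

```python
import cmath
import math

def band_energy_fraction(samples, rate, cutoff_hz):
    """Share of spectral energy at or above cutoff_hz, from a naive DFT over the
    positive-frequency bins (DC and the Nyquist bin are skipped)."""
    n = len(samples)
    total = high = 0.0
    for k in range(1, n // 2):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(samples))
        energy = abs(coeff) ** 2
        total += energy
        if k * rate / n >= cutoff_hz:
            high += energy
    return high / total if total else 0.0

def looks_like_ultrasonic_beacon(samples, rate=48000, cutoff_hz=18000, threshold=0.05):
    """Hypothetical alarm rule: flag a window when more than `threshold` of its
    energy lies above the cutoff, near the upper limit of adult hearing."""
    return band_energy_fraction(samples, rate, cutoff_hz) > threshold

RATE, WINDOW = 48000, 480  # 10 ms window, so each DFT bin is 100 Hz wide
audible = [math.sin(2 * math.pi * 1000 * t / RATE) for t in range(WINDOW)]
with_beacon = [s + 0.5 * math.sin(2 * math.pi * 20000 * t / RATE)
               for t, s in enumerate(audible)]
print(looks_like_ultrasonic_beacon(audible), looks_like_ultrasonic_beacon(with_beacon))
# False True
```

Both test tones sit exactly on DFT bin centers, which keeps the example clean; real recordings would need windowing and a threshold tuned against the room's normal background.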

35:25.380 --> 35:33.080
So this brings us to an interesting question: who actually uses these attacks?

35:34.000 --> 35:40.580
Who is it, Sam, that's going to be using laser microphones against me? Who is it, Sam, that's

35:40.580 --> 35:49.040
going to be listening to my fan speeds and hard drive speeds? Well, actually, a lot of people.

35:49.040 --> 35:56.540
Threat actors, for one. Nation-state intelligence agencies, for another, have most

35:56.540 --> 36:02.360
likely employed acoustic attack techniques for decades, long before the public actually

36:02.360 --> 36:07.340
knew about them. It's only thanks to research that these methods have been brought to

36:07.340 --> 36:14.000
light. These agencies have absolutely developed the resources for custom recording equipment,

36:14.000 --> 36:22.060
the expertise to analyze complex acoustic signals, and the operational capability to position

36:22.060 --> 36:27.820
recording devices near high-value targets that they

36:27.820 --> 36:34.200
want to spy on. For intelligence agencies, acoustic attacks offer the perfect combination

36:34.200 --> 36:39.760
of passive collection, difficulty proving who is behind them, and

36:39.760 --> 36:44.780
the ability to bypass even strong encryption. Corporate espionage, which by the way is a billion

36:44.780 --> 36:51.260
dollar industry, is another significant threat vector for acoustic attacks.

36:51.260 --> 36:56.600
Industrial spies can use acoustic techniques to steal trade secrets during business meetings,

36:57.120 --> 37:03.120
capture passwords during video conferences, or monitor competitor activities. The

37:03.680 --> 37:09.840
passive nature of acoustic recording makes it ideal for long-term intelligence

37:09.840 --> 37:15.820
gathering in corporate environments. A single compromised smartphone in a boardroom could

37:15.820 --> 37:20.880
provide months of valuable intelligence, or millions, possibly even billions, of dollars

37:20.880 --> 37:25.920
in information. Sophisticated criminal organizations are beginning to adopt acoustic

37:25.920 --> 37:31.000
attack techniques as traditional digital attacks become more and more difficult. So

37:31.000 --> 37:37.040
as organizations improve their network security by implementing strong encryption

37:37.040 --> 37:42.900
and training employees to recognize phishing attempts, criminals seek alternate attack

37:42.900 --> 37:48.420
vectors. Acoustic attacks provide a way to bypass many traditional security measures,

37:48.420 --> 37:54.060
potentially capturing passwords and sensitive data despite strong digital defenses that are

37:54.060 --> 38:00.760
in place. Security researchers, meanwhile, continue to advance this field

38:00.760 --> 38:05.960
through things like responsible disclosure, which is awesome because it allows me to

38:05.960 --> 38:12.660
talk to you about this. Academic institutions and corporate research labs regularly

38:12.660 --> 38:18.900
publish papers that detail novel acoustic vulnerabilities and attack

38:18.900 --> 38:23.980
techniques, which no doubt sound like they come from a spy movie,

38:23.980 --> 38:31.260
but they don't. This research aims to improve security by identifying vulnerabilities, but it

38:31.260 --> 38:38.480
also provides a roadmap for malicious actors and threat actors to develop their own acoustic

38:38.480 --> 38:45.060
attack capabilities, unfortunately. That's just the world we're in. As for target selection,

38:45.060 --> 38:51.740
high-value individuals make prime targets for acoustic surveillance due to the focused nature

38:52.380 --> 38:58.600
of these attacks. CEOs typing passwords or discussing confidential strategies,

38:59.040 --> 39:04.320
government officials handling classified information, and celebrities

39:04.320 --> 39:09.480
with private communications all represent really valuable targets where the

39:10.080 --> 39:16.520
effort of acoustic surveillance actually pays off. In the end, the targeted nature

39:17.100 --> 39:22.960
of acoustic attacks makes them more suitable for focused operations than for

39:22.960 --> 39:28.660
mass deployment. Government facilities and corporate headquarters often implement

39:28.660 --> 39:35.560
sophisticated digital security but overlook acoustic attacks. Many secure

39:35.560 --> 39:42.500
facilities were designed and built before acoustic attacks were well understood, leaving them vulnerable

39:42.500 --> 39:47.880
to these techniques. Sensitive areas like situation rooms, executive offices, and

39:47.880 --> 39:53.900
secure communication facilities require special attention to

39:53.900 --> 39:59.120
acoustic security. Obviously, there are a ton of places that have accounted for this. I'm not saying

39:59.120 --> 40:06.240
they haven't; I'm speaking in general. Critical infrastructure

40:06.240 --> 40:13.280
especially faces unique challenges from acoustic attacks, because industrial control

40:13.280 --> 40:19.780
systems often rely on older equipment with distinct acoustic signatures. Power

40:19.780 --> 40:25.200
generation facilities, water treatment plants, manufacturing systems, and transportation

40:25.200 --> 40:32.100
are all great examples of things that rely on mechanical systems, which create

40:32.100 --> 40:38.700
information-rich acoustic environments. These sounds might reveal operational schedules,

40:38.700 --> 40:44.740
equipment status, or processing parameters that could be valuable for planning physical or

40:44.740 --> 40:49.960
cyber attacks. So all this brings us to the future of acoustic security:

40:49.960 --> 40:56.040
where are we going to be in 20 years? Well, I would point out some of the emerging

40:56.040 --> 41:01.400
threats. Artificial intelligence continues to revolutionize the threat

41:01.400 --> 41:07.380
landscape, and acoustic attack capabilities evolve with each passing year. Deep learning models

41:07.380 --> 41:14.320
are becoming increasingly sophisticated at isolating signals from noise, recognizing subtle patterns

41:14.320 --> 41:22.260
in acoustic data, and reconstructing information from very minimal input. Current research explores

41:22.260 --> 41:29.700
using generative AI to enhance poor-quality recordings, potentially making

41:29.700 --> 41:37.340
previously unusable acoustic data valuable for attacks. Meanwhile, miniaturization of recording

41:37.340 --> 41:46.540
devices proceeds rapidly, with MEMS microphones becoming smaller, more sensitive, and more

41:46.540 --> 41:51.840
power efficient. Researchers have developed recording devices smaller than a grain of rice

41:51.840 --> 41:58.520
that can operate for weeks on tiny batteries. These devices can be hidden almost anywhere,

41:58.520 --> 42:04.240
making physical detection difficult, to say the least, if not impossible. Smart dust

42:04.240 --> 42:10.920
concepts propose microscopic recording devices that could be dispersed like powder, creating a

42:10.920 --> 42:17.000
pervasive acoustic surveillance network. New attack vectors emerge regularly as research

42:17.000 --> 42:22.380
explores previously unconsidered acoustic channels. Recent discoveries include

42:22.380 --> 42:29.120
extracting data from the sound of 3D printers to reconstruct printed objects, using monitor

42:29.120 --> 42:35.960
brightness variations to create acoustic signals through coil whine, and determining GPS coordinates

42:35.960 --> 42:43.200
from acoustic signatures of power grid interference. Each new discovery expands the

42:43.200 --> 42:49.900
attack surface that security professionals have to know about and defend. So defense

42:49.900 --> 42:55.480
evolution is another topic worth talking about. The security industry responds

43:02.140 --> 43:08.500
to acoustic threats with increasingly sophisticated countermeasures. Acoustic security assessments

43:08.500 --> 43:13.900
are becoming standard practice, with high-security facilities using the same tools

43:13.900 --> 43:19.880
and techniques as attackers to identify vulnerabilities before they can be exploited.

43:19.880 --> 43:27.000
Security standards are beginning to include acoustic considerations, requiring

43:27.000 --> 43:31.920
organizations to address these threats in their security planning. Manufacturers are showing

43:31.920 --> 43:38.380
increased awareness of acoustic security, too. Future devices may include

43:38.380 --> 43:44.500
built-in acoustic countermeasures, such as active noise cancellation specifically designed to mask

43:44.500 --> 43:52.760
information-bearing sounds, randomized acoustic signatures that change with each device

43:53.090 --> 44:00.240
to prevent training effective recognition models, and acoustic isolation built into component design

44:00.240 --> 44:06.280
from the ground up. Then we get to education and awareness. As more security

44:06.280 --> 44:13.500
professionals understand acoustic threats, they can better defend against them. Training programs

44:13.500 --> 44:17.980
increasingly include acoustic security modules, teaching IT staff to recognize

44:17.980 --> 44:23.780
vulnerable configurations and implement appropriate countermeasures.

44:17.980 --> 44:23.780
User awareness helps too, as employees who understand that their devices leak acoustic

44:23.780 --> 44:29.280
information are more likely to follow security procedures designed to mitigate those specific

44:29.280 --> 44:35.080
risks. Some practical takeaways that I found useful: your devices

44:35.080 --> 44:40.720
are constantly generating acoustic emissions that contain information about their operations

44:40.720 --> 44:49.580
and your data. Every keystroke you type, every calculation your processor performs, every

44:49.580 --> 44:54.860
file your hard drive accesses produces unique sounds that could potentially be

44:54.860 --> 45:02.220
captured and analyzed by an attacker. In our modern world of ubiquitous microphones

45:02.220 --> 45:09.440
and powerful analysis tools, these acoustic emissions represent a real, present

45:10.310 --> 45:16.520
threat. The traditional separation between physical security and

45:16.520 --> 45:22.280
digital security no longer exists in any meaningful way with this attack vector. The

45:22.280 --> 45:29.520
acoustic environment of your workspace directly impacts your information security:

45:29.520 --> 45:36.480
background noise levels, wall construction, window placement, and the proximity

45:36.480 --> 45:44.180
of potential recording devices all affect your vulnerability to acoustic attacks.

45:44.180 --> 45:51.940
So security planning has to consider those physical factors right alongside traditional

45:51.940 --> 45:58.240
digital concerns. Simple precautions can significantly reduce your acoustic attack surface

45:59.580 --> 46:05.040
without requiring crazy expensive modifications; these things here are an example

46:05.040 --> 46:09.640
of that. Use white noise or pink noise generators when handling sensitive

46:09.640 --> 46:14.200
information. They're really cheap, and you can get them on Amazon too, or just play white noise on your

46:14.200 --> 46:19.060
phone. Either way, if you're going to handle

46:19.060 --> 46:24.860
sensitive data, it will provide some acoustic masking that makes attacks

46:24.860 --> 46:29.680
more difficult, if you think your threat model requires it. Being aware of your

46:29.680 --> 46:36.660
environment during important calls or when typing passwords definitely helps you identify

46:36.660 --> 46:43.360
potential recording threats. Positioning sensitive systems away from windows

46:43.360 --> 46:48.780
and public areas reduces opportunities for acoustic surveillance, and for visual surveillance,

46:48.780 --> 46:56.520
by the way. Understanding that air-gapped systems are not perfectly isolated helps you implement

46:56.520 --> 47:05.440
appropriate additional protections. Knowledge remains your most powerful

47:05.440 --> 47:10.540
defense, which is why I'm doing this crazy long video. Data exists not just in

47:10.540 --> 47:17.440
digital form but also in physical phenomena: our devices make sounds that

47:17.440 --> 47:22.700
create signatures, and as our world fills with more and more powerful devices and more sensitive

47:22.700 --> 47:28.180
microphones, these threats are not going away. The acoustic environment becomes an increasingly

47:28.180 --> 47:34.560
rich source of potential information, and protecting against these attacks requires

47:34.560 --> 47:40.740
expanding our security thinking beyond firewalls and passwords to include the

47:40.740 --> 47:48.400
physical properties of sound itself. The devices around us are always talking, and in

47:48.400 --> 47:53.220
the world of acoustic security, someone is always listening. Thank you for watching to the end.

47:53.220 --> 47:57.500
If you liked this video and got some use out of it, please like and subscribe, and I will

47:57.500 --> 47:58.500
see you in the next video.