Over-the-air (OTA) Attack Accuracy (Table 5)

Environment

We use the following ASRs, each running on the listed device:
- Amazon Alexa - Amazon Echo Dot (3rd Gen)
- Google Assistant - Google Pixel 4
- Microsoft Cortana - Lenovo ThinkPad X1 Carbon
We perform experiments in three environments:
- Household (noise: 15-20 dB) – the ASR sits on a table, and a Logitech Z200 speaker plays the adversarial audio
- Teleconference (noise ≈ Household) – the ASR is beside the victim's laptop, and the laptop plays the adversarial audio
- Vehicle (noise: 60 dB) – the ASR (Android Auto / Alexa Auto) is near the center console, and the adversarial audio is played through the car speaker (Kenwood KFC-1666S)
We select 6 commands for each environment (6 × 3 = 18 commands in total):
- Two short, two medium, and two long commands
For each of the 3 wake-up words (Okay Google, Alexa, and Hey Cortana), we test all 18 commands against the corresponding ASR.

We construct 3 adversarial wake-up audio files.
- Adversarial Wake-up Word
For each command, we perform 10 attack trials with the adversarial audio (18 × 3 × 10 = 540 trials in total). A single adversarial command may yield multiple adversarial audio files with different playback speeds.
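
The trial count above can be checked by enumerating the experiment grid. The following is a minimal sketch for illustration only, not the script used in the experiments. The constants come from the setup described above, while the names (`WAKE_WORD_TO_ASR`, `build_trials`, the record fields) are hypothetical, and commands are represented by placeholder indices because the actual command text is not listed here.

```python
from itertools import product

# Values taken from the setup described above.
WAKE_WORD_TO_ASR = {
    "Okay Google": "Google Assistant",
    "Alexa": "Amazon Alexa",
    "Hey Cortana": "Microsoft Cortana",
}
ENVIRONMENTS = ["Household", "Teleconference", "Vehicle"]
LENGTHS = ["short", "medium", "long"]
COMMANDS_PER_LENGTH = 2   # two short, two medium, two long per environment
TRIALS_PER_COMMAND = 10


def build_trials():
    """Enumerate every (ASR, command, trial) combination (hypothetical helper)."""
    # 3 environments x 3 lengths x 2 commands = 18 commands.
    # Commands are placeholder indices, not the real command text.
    commands = list(product(ENVIRONMENTS, LENGTHS, range(COMMANDS_PER_LENGTH)))

    trials = []
    for (wake_word, asr), (env, length, idx), trial in product(
        WAKE_WORD_TO_ASR.items(), commands, range(1, TRIALS_PER_COMMAND + 1)
    ):
        trials.append(
            {
                "wake_word": wake_word,
                "asr": asr,
                "environment": env,
                "length": length,
                "command_index": idx,
                "trial": trial,
            }
        )
    return commands, trials


commands, trials = build_trials()
print(len(commands), len(trials))  # 18 540
```

Keeping one record per trial makes it straightforward to attach a success flag to each record later and aggregate accuracy by ASR, environment, or command length.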