Human Comprehensibility Test (Section 6.3)
Environment
- We recruit 28 volunteers
- We prepare the following audio files
  - 3 adversarial wake-up words
  - 6 adversarial commands for each environment
  - 3 normal commands
- The order in which the audio files are played to the participants is randomized
- To mitigate any subjective effects, no voice command is disclosed to the participants
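The randomized play order described above could be produced along the following lines. This is only a minimal sketch: the file names, the `build_playlist` helper, and the `n_environments` parameter are hypothetical (the page does not state how many environments were used or how files were named), and whether the order was reshuffled for each participant is an assumption.

```python
import random
from typing import List, Optional


def build_playlist(n_environments: int, seed: Optional[int] = None) -> List[str]:
    """Return the audio files in a randomized play order.

    File names and n_environments are hypothetical placeholders.
    """
    # 3 adversarial wake-up words
    files = [f"wake_{i}.wav" for i in range(1, 4)]
    # 6 adversarial commands for each environment
    for env in range(1, n_environments + 1):
        files += [f"env{env}_cmd_{i}.wav" for i in range(1, 7)]
    # 3 normal commands
    files += [f"normal_{i}.wav" for i in range(1, 4)]

    # A dedicated generator keeps the shuffle reproducible for a given seed.
    rng = random.Random(seed)
    rng.shuffle(files)
    return files


# Example: one playlist per participant, seeded by a participant index.
playlists = [build_playlist(n_environments=2, seed=p) for p in range(28)]
```

Seeding by participant index is one way to make each session reproducible while still varying the order across participants.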
Experiment Procedure
- Step 1. Play adversarial audio files to each participant
- Step 2. Ask each participant to indicate whether they identified any meaning in the audio
- Step 3. Measure Word Error Rate
\(\mathrm{WER} = \frac{S + D + I}{N},\)
where N denotes the total number of words in the command, and S, D, and I represent the numbers of word substitutions, deletions, and insertions, respectively
- Step 4. Measure Phoneme Error Rate (PER), defined as the phonological distance between the recognized and target commands divided by the length of the phoneme sequence of the target command
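Steps 3 and 4 can be illustrated with a short sketch. It counts substitutions, deletions, and insertions using a standard Levenshtein alignment and divides by the length of the target. The function names (`edit_ops`, `wer`, `per`) are hypothetical, and using plain edit distance over phoneme symbols as the "phonological distance" is an assumption; the original experiment may use a weighted distance between phonemes.

```python
from typing import Sequence, Tuple


def edit_ops(ref: Sequence[str], hyp: Sequence[str]) -> Tuple[int, int, int]:
    """Return (S, D, I) for a minimum-cost alignment of hyp against ref."""
    n, m = len(ref), len(hyp)
    # cost[i][j] = (total, S, D, I) for aligning ref[:i] with hyp[:j]
    cost = [[(0, 0, 0, 0)] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = (i, 0, i, 0)  # delete every reference token
    for j in range(1, m + 1):
        cost[0][j] = (j, 0, 0, j)  # insert every hypothesis token
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            t, s, d, ins = cost[i - 1][j - 1]
            if ref[i - 1] == hyp[j - 1]:
                diag = (t, s, d, ins)          # match, no cost
            else:
                diag = (t + 1, s + 1, d, ins)  # substitution
            t, s, d, ins = cost[i - 1][j]
            delete = (t + 1, s, d + 1, ins)
            t, s, d, ins = cost[i][j - 1]
            insert = (t + 1, s, d, ins + 1)
            # Tuples compare by total cost first, so min() picks a cheapest path.
            cost[i][j] = min(diag, delete, insert)
    _, s, d, ins = cost[n][m]
    return s, d, ins


def wer(target: str, recognized: str) -> float:
    """Word Error Rate: (S + D + I) / N, with N the word count of the target."""
    ref = target.lower().split()
    if not ref:
        raise ValueError("target command must contain at least one word")
    s, d, i = edit_ops(ref, recognized.lower().split())
    return (s + d + i) / len(ref)


def per(target_phonemes: Sequence[str], recognized_phonemes: Sequence[str]) -> float:
    """Phoneme Error Rate, assuming unweighted edit distance over phonemes."""
    if not target_phonemes:
        raise ValueError("target phoneme sequence must not be empty")
    s, d, i = edit_ops(target_phonemes, recognized_phonemes)
    return (s + d + i) / len(target_phonemes)
```

Because insertions are counted in the numerator but not in N, both rates can exceed 1: `wer("turn on", "please turn it on now")` has three insertions against a two-word target and evaluates to 1.5.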
Result
- None of the participants could comprehend any adversarial audio file
- The experimental results in Section 6.3 show that the WERs and PERs are consistently above 0.5, and more than half are greater than or equal to 1