American Hardwood said:
I don't know how this software works, but wouldn't the longer a clip be, the less likely it would say it is a match?
No, it is likely the opposite. The longer the audio sample, the more accurately the model should be able to predict whether it is AI-generated. It is similar to any other prediction made from data: generally, the more data one has, the more certainty one can have in any inferences drawn from that data.
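To make that concrete, here is a toy simulation, which has nothing to do with ElevenLabs' actual internals: suppose a detector scores each one-second frame of audio with some noise and reports the average frame score for the clip. The average over a longer clip fluctuates less, so the clip-level score is more stable:

```python
import random
import statistics

# Toy model (not ElevenLabs' method): a detector scores each 1-second
# frame independently with noise, and the clip-level score is the mean
# of the frame scores. Longer clips average over more frames.

random.seed(0)

TRUE_SCORE = 0.8   # hypothetical "true" probability the audio is AI-generated
FRAME_NOISE = 0.3  # standard deviation of per-frame measurement noise

def clip_score(num_frames: int) -> float:
    """Mean of noisy per-frame scores for a clip num_frames seconds long."""
    frames = [random.gauss(TRUE_SCORE, FRAME_NOISE) for _ in range(num_frames)]
    return statistics.fmean(frames)

for seconds in (2, 10, 60):
    scores = [clip_score(seconds) for _ in range(1000)]
    print(f"{seconds:>3}s clip: score spread (stdev) = {statistics.stdev(scores):.3f}")
```

The spread shrinks roughly as one over the square root of the clip length, which is the usual statistical reason to trust estimates from larger samples more.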
American Hardwood said:
Also, could you run an audio clip through a filter to change it up slightly, not enough to hear much difference, but enough to throw off the match detection of the analytical software and make it report that it is unlikely to have been faked? Could the same be done in reverse to make it appear more likely to be faked?
This is the link to ElevenLabs' page on the AI Speech Classifier (https://elevenlabs.io/blog/ai-speech-classifier). They claim the following about the accuracy of their model:
Quote:
Audio generated by our models has certain detectable characteristics. When you upload an audio sample to the AI Speech Classifier, our algorithm will scan for them to assess whether the content was indeed generated by our platform, currently maintaining >99% accuracy if the input was unmodified. If it underwent Codec or reverb transformations, the Classifier is over 90% accurate. This figure drops the more the content has been post-processed. Adding more audio tracks will also affect the result.
The model is a proprietary black box, and they provide no further information on their validation methods, so I would take the above statement with a huge grain of salt. Further, the tool only claims to estimate the probability that the speech was generated by their own speech-generation model, so who knows how it would perform for more general detection of AI-generated speech. I think the fact that the model produces two wildly different predictions for different lengths of the same clip also indicates the tool is probably not as reliable as they claim.
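If you want to probe that inconsistency yourself, a cheap sanity check is to cut the same recording down to several lengths and submit each cut to the detector separately. Here is a minimal sketch using only the Python standard library; the file names are hypothetical, and the uploads to the classifier are done by hand since there is no public API call I can vouch for:

```python
import wave

def truncate_wav(src: str, dst: str, seconds: float) -> None:
    """Write the first `seconds` of a WAV file to a new file (stdlib only)."""
    with wave.open(src, "rb") as r:
        params = r.getparams()
        frames = r.readframes(int(seconds * r.getframerate()))
    with wave.open(dst, "wb") as w:
        w.setparams(params)  # nframes in the header is corrected on close
        w.writeframes(frames)

# "sample.wav" is a hypothetical input recording.
for seconds in (5, 15, 30, 60):
    truncate_wav("sample.wav", f"sample_{seconds}s.wav", seconds)
    print(f"wrote sample_{seconds}s.wav -- upload it to the classifier "
          f"and compare the reported probabilities")
```

A detector worth trusting should give broadly consistent scores across these cuts of the same underlying audio; wildly different answers are a red flag.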