For audio and video transcripts, the matching algorithm could, as an optimization, discard any part of the transcript that is not within roughly 1 minute of the specified time. This would avoid false matches from earlier in the transcript and allow the system to use less restrictive matching to tolerate poorly transcribed text.
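The time-window idea above might be sketched as follows. This is only an illustration: the `(start_seconds, text)` segment format, the function name `window_transcript`, and the 60-second default radius are assumptions, not part of any existing implementation.

```python
from typing import List, Tuple

# Assumed segment format: (start time in seconds, caption text).
Segment = Tuple[float, str]


def window_transcript(
    segments: List[Segment],
    target_seconds: float,
    radius_seconds: float = 60.0,
) -> List[Segment]:
    """Keep only the segments that start within radius_seconds of the target time."""
    return [
        (start, text)
        for start, text in segments
        if abs(start - target_seconds) <= radius_seconds
    ]


# Hypothetical usage: keep captions within one minute of the 2:00 mark.
segments = [(0.0, "intro"), (100.0, "crazy it's a"), (150.0, "wizard of oddz moment"), (300.0, "outro")]
excerpt = window_transcript(segments, target_seconds=120.0)
```

The full transcript stays saved elsewhere; only the matching step sees the narrowed excerpt.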
Citing Text Example:
- a Wizard of Oz moment
But it needs to match the actual transcript:
- a wizard of oddz moment
Match Sensitivity
In this case, the system matches the quote properly. One can imagine a case, however, where the quote fails to match against the full transcript but would match against a shorter excerpt of the transcript using looser matching rules.
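As a rough sketch of what looser matching on a short excerpt could look like, the snippet below slides a word window over the excerpt and scores each candidate against the quote with Python's standard `difflib.SequenceMatcher`. The helper names `normalize` and `best_fuzzy_match` and the idea of a single similarity threshold are assumptions for illustration; the actual system may define sensitivity differently.

```python
import re
from difflib import SequenceMatcher
from typing import List, Tuple


def normalize(text: str) -> List[str]:
    """Lowercase, unify curly apostrophes, drop punctuation, and split into words."""
    text = text.lower().replace("\u2019", "'")
    return re.sub(r"[^a-z0-9' ]+", " ", text).split()


def best_fuzzy_match(quote: str, transcript: str) -> Tuple[float, str]:
    """Return (similarity, candidate) for the transcript word window closest to the quote."""
    quote_words = normalize(quote)
    words = normalize(transcript)
    target = " ".join(quote_words)
    best = (0.0, "")
    # Slide a window the same length (in words) as the quote across the transcript.
    for i in range(max(1, len(words) - len(quote_words) + 1)):
        candidate = " ".join(words[i:i + len(quote_words)])
        score = SequenceMatcher(None, target, candidate).ratio()
        if score > best[0]:
            best = (score, candidate)
    return best


score, candidate = best_fuzzy_match(
    "a Wizard of Oz moment",
    "yeah yeah crazy it's a wizard of oddz moment it really is yep",
)
```

Here the best candidate is "a wizard of oddz moment" with a similarity of roughly 0.95, so an exact-match rule would reject it while a hypothetical threshold of 0.9 would accept it. A threshold that loose is safer on a short excerpt than on a full transcript, where more text means more chances of a false match.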
Transcript: 30 seconds of proximity
Because the full transcript (which is saved) is narrowed down to a short excerpt, the sensitivity settings can be loosened to match:
- oddz ~ Oz
also confirmed what has been supposedly one of Donald Trump’s most outrageous uh you know lies or
exaggerations it’s fake news okay well if you are willing to be objective and
look at the evidence in this case and not give them a pass because our democracy is threatened you have to say
oh now I know not only that it’s fake but the exact methods by which they fake
it yeah yeah crazy it’s a wizard of oddz moment it really is yep you know the
curtain was pulled back absolutely so our story for this week uh
is a classic one of the last stories that Tolstoy ever wrote called after the ball written in 1903 and published
posthumously in 1911 and it’s a short but very
powerful uh beautiful dark story