-----------------------------------------------------------------------------
Real-World Video Dubbing Test Set

Version 1.0
August 30, 2022
-----------------------------------------------------------------------------


1. OVERVIEW

Considering the scarcity of real-world video dubbing dataset (i.e., motion pictures with golden cross-lingual
source and target speech), we construct a test set collected from dubbed films to provide comprehensive evaluations 
of video dubbing systems.

We select nine popular films translated from English to Chinese, which are of high manual translation
and dubbing quality, and contain rich genres such as love, action, scientific fiction, etc.


2. PROCESS

With the aim of preventing copyright infringement, we provide the processing pipeline of test set, instead of the original data set. 
The following is the processing pipeline for the test set.

2.1 Download Films
    In this work, we select nine popular films translated from English to Chinese, which are of high manual translation
    and dubbing quality, and contain rich genres such as love, action, scientific fiction, etc.

    Here is the list of films:
    ===============
    |
    .- Iron Man 1
    |
    .- Iron Man 2
    |
    .- Captain America 3
    |
    .- The Notebook
    |
    .- Flipped
    |
    .- About Time
    |
    .- Fast & Furious 1
    |
    .- Fast & Furious 2
    |
    .- A Walk in the Clouds

2.2 Cut conversation clips from films
    Following these criteria: 
        1) The clip duration is around 1 ∼ 3 minutes. 
        2) More than 10 sentences are involved in each clip, which contains both long and short sentences.
        3) The face of speaker is visible mostly during his or her talks, especially visible lips at the end of speech.
    Here are clips we cut from fils:
    ===============
    |
    .- Iron Man 1
        |
        .- 65:55 ~ 66:30
        |
        .- 69:55 ~ 70:38
    .- Iron Man 2
        |
        .- 12:12 ~ 13:04
        |
        .- 39:15 ~ 41:20
        |
        .- 43:32 ~ 34:19
        |
        .- 64:20 ~ 65:57
        |
        .- 110:42 ~ 111:28
    .- Captain America 3
        |
        .- 16:42 ~ 18:12
        |
        .- 19:02 ~ 20:16
        |
        .- 20:14 ~ 21:18
        |
        .- 22:08 ~ 23:35
        |
        .- 27:07 ~ 30:38
        |
        .- 29:59 ~ 30:40
        |
        .- 50: 43 ~ 52:37
        |
        .- 74:26 ~ 75:07
        |
        .- 77:15 ~ 78:23
        |
        .- 122:19 ~ 124:14
    .- The Notebook
        |
        .- 04:08 ~ 05:08
        |
        .- 08:52 ~ 10:31
    .- Flipped
        |
        .- 10:35 ~ 11:04
        |
        .- 12:05 ~ 13:14
        |
        .- 28:41 ~ 29:59
        |
        .- 34:30 ~ 36: 08
        |
        .- 36:58 ~ 38:00
        |
        .- 59:25 ~ 61:11
    .- About Time
        |
        .- 04:37 ~ 06:26
        |
        .- 09:06 ~ 10:06
        |
        .- 13:52 ~ 14:47
        |
        .- 15:17 ~ 16:15
        |
        .- 36:31 ~ 38:00
        |
        .- 62:06 ~ 63:15
    .- Fast & Furious 1
        |
        .- 35:08 ~ 36:02
        |
        .- 36:14 ~ 36:30
        |
        .- 37:15 ~ 37:49
        |
        .- 55:03 ~ 56:02
    .- Fast & Furious 2
        |
        .- 15:45 ~ 17:17
        |
        .- 36:07 ~ 37:22
        |
        .- 66:33 ~ 67:30
        |
        .- 69:40 ~ 70:30
    .- A Walk in the Clouds
        |
        .- 17:10 ~ 18:52
        |
        .- 30:55 ~ 32:17
        |
        .- 61:14 ~ 62:26

2.3 Automatic Speech Recognition (ASR)
    Perform speech recognition and manual correction to obtain text transcripts in source and target languages.
    We recommend using Microsoft Azure's ASR service (https://azure.microsoft.com/en-us/services/cognitive-services/speech-to-text/?cdn=disable#features), 
    which is highly accurate and efficient.

2.4 Split clips into sentences 
    Clips are split into sentences based on semantics and speakers. 
    Please discard silence frames between sentences to make sure there is no 
    more than 0.5s of silence at the beginning and end of each split.