POST /v1/media/translate
cURL
curl --request POST \
  --url https://api.vozo.ai/v1/media/translate \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "media_type": "video",
  "media_url": "<string>",
  "source_language": "<string>",
  "target_language": "<string>",
  "export_type": "auto",
  "subtitle_url": "<string>",
  "subtitle_usage_type": "original",
  "ocr_text_box": {
    "x": 0.5,
    "y": 0.5,
    "width": 0.5,
    "height": 0.5
  },
  "speaker_number": {
    "min": 2,
    "max": 34
  },
  "user_prompt": "<string>",
  "callback_url": "<string>"
}
'
Example response
{
  "task_id": "<string>"
}
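The same request can be assembled in Python with only the standard library. This is a minimal sketch, not an official client: build_translate_request is a hypothetical helper, the API key and media URL are placeholders, and the field values are illustrative. Nothing is sent until urlopen is called.

```python
import json
import urllib.request

API_URL = "https://api.vozo.ai/v1/media/translate"


def build_translate_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request for /v1/media/translate."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",  # Bearer <api_key>
            "Content-Type": "application/json",
        },
    )


payload = {
    "media_type": "video",
    "media_url": "https://example.com/input.mp4",  # placeholder URL
    "source_language": "auto",
    "target_language": "zh-CN",
    "export_type": "auto",
}
req = build_translate_request("YOUR_API_KEY", payload)

# To actually submit the job (needs a valid key and network access):
# with urllib.request.urlopen(req) as resp:
#     task_id = json.load(resp)["task_id"]
```

The submission step is left commented out so the snippet runs offline; the response body carries the task_id described under Response.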

Authorizations

Authorization
string
header
required

Bearer authentication header of the form Bearer <api_key>, where <api_key> is your API key. The <token> placeholder in the sample request refers to this same API key.

Body

application/json
media_type
enum<string>
required

Type of the input media: video or audio.

Available options:
video,
audio
media_url
string
required

Publicly accessible URL of the input media (video or audio) for the server to download. The maximum supported input duration is 2 hours.

source_language
string
required

Source language code, such as en-US, or 'auto' to detect the source language automatically.

target_language
string
required

Target language code. Locale is optional (e.g., en, en-US, zh, zh-CN). Only the language part is validated against the supported list.
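A client can mirror this rule by comparing only the primary language subtag before submitting. A sketch, assuming codes use a hyphen separator as in the examples above; SUPPORTED_LANGUAGES is a hypothetical stand-in for the service's actual list, which this page does not give.

```python
# Hypothetical subset for illustration; the real supported list is not on this page.
SUPPORTED_LANGUAGES = {"en", "zh", "es", "ja"}


def language_part(code: str) -> str:
    """Return the language subtag of a code such as 'en-US' (gives 'en')."""
    return code.split("-", 1)[0]


def is_supported_target(code: str) -> bool:
    """Check a target_language value on its language part only, ignoring locale."""
    return language_part(code) in SUPPORTED_LANGUAGES
```

Under this rule en and en-US pass or fail together, since both reduce to en.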

export_type
enum<string>
required

Output type: auto matches the input type; video outputs video; audio outputs audio; all outputs both video and audio. When media_type is audio, export_type cannot be video or all.

Available options:
auto,
video,
audio,
all
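The media_type and export_type combination can be checked client-side before a request is sent. A minimal sketch; check_export_type and resolved_outputs are hypothetical helpers that encode only the rules stated above.

```python
VALID_MEDIA_TYPES = {"video", "audio"}
VALID_EXPORT_TYPES = {"auto", "video", "audio", "all"}


def check_export_type(media_type: str, export_type: str) -> None:
    """Raise ValueError for values or combinations documented as invalid."""
    if media_type not in VALID_MEDIA_TYPES:
        raise ValueError(f"unknown media_type: {media_type!r}")
    if export_type not in VALID_EXPORT_TYPES:
        raise ValueError(f"unknown export_type: {export_type!r}")
    # Audio input cannot produce a video output.
    if media_type == "audio" and export_type in {"video", "all"}:
        raise ValueError("export_type must be 'auto' or 'audio' when media_type is 'audio'")


def resolved_outputs(media_type: str, export_type: str) -> set:
    """Return the set of output kinds a valid combination should produce."""
    check_export_type(media_type, export_type)
    if export_type == "auto":
        return {media_type}  # auto matches the input type
    if export_type == "all":
        return {"video", "audio"}
    return {export_type}
```

For example, resolved_outputs("video", "all") yields both kinds, while check_export_type("audio", "video") raises.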
subtitle_url
string

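Publicly accessible URL of a subtitle file (SRT format only). The server downloads the file and validates its timestamps: cues must not overlap, each cue's start must be less than or equal to its end, all cues must fall within the media duration, and the total duration must not exceed 2 hours. Mutually exclusive with ocr_text_box.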
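Checking the file locally before upload can avoid a rejected job. The sketch below applies the timestamp rules as this page states them, under assumptions: cues are listed in chronological order, a cue starting exactly when the previous one ends does not count as overlapping, and the 2-hour limit is read as the last cue ending within 2 hours. srt_problems is a hypothetical helper, and the caller supplies the media duration in seconds.

```python
import re

# Matches an SRT timing line such as "00:00:01,000 --> 00:00:03,500".
TIMING = re.compile(
    r"(\d{2}):(\d{2}):(\d{2}),(\d{3})\s*-->\s*(\d{2}):(\d{2}):(\d{2}),(\d{3})"
)
MAX_SECONDS = 2 * 60 * 60  # 2-hour limit stated above


def _seconds(h: str, m: str, s: str, ms: str) -> float:
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def srt_problems(srt_text: str, media_duration: float) -> list:
    """Return human-readable problems; an empty list means all checks pass."""
    cues = []
    for match in TIMING.finditer(srt_text):
        g = match.groups()
        cues.append((_seconds(*g[:4]), _seconds(*g[4:])))

    problems = []
    prev_end = None
    for i, (start, end) in enumerate(cues, 1):
        if start > end:
            problems.append(f"cue {i}: start is after end")
        if end > media_duration:
            problems.append(f"cue {i}: ends after the media")
        # Assumes chronological order, so only the previous cue is compared.
        if prev_end is not None and start < prev_end:
            problems.append(f"cue {i}: overlaps the previous cue")
        prev_end = end
    if cues and cues[-1][1] > MAX_SECONDS:
        problems.append("subtitles extend past 2 hours")
    return problems
```

The server's own validation remains authoritative; this only catches the obvious cases early.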

subtitle_usage_type
enum<string>

Whether the file at subtitle_url contains the original (source-language) subtitles or the final translated subtitles.

Available options:
original,
final_translated
ocr_text_box
object

OCR text region in normalized coordinates [0–1], with the origin at the top-left corner of the frame. The object has four numeric fields: x, y, width, and height. Only applicable when media_type is video. Mutually exclusive with subtitle_url.
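Text regions are usually measured in pixels, so they need converting to the normalized form. A sketch under a loud assumption: this page does not say whether x and y mark the box's top-left corner or its center; the code assumes the top-left corner, and it also assumes the box must fit inside the frame. normalize_box and check_ocr_fields are hypothetical helpers.

```python
from dataclasses import asdict, dataclass

EPS = 1e-9  # tolerance for floating-point rounding at the frame edge


@dataclass
class OcrTextBox:
    x: float       # assumed: left edge of the box, as a fraction of frame width
    y: float       # assumed: top edge of the box, as a fraction of frame height
    width: float   # fraction of frame width
    height: float  # fraction of frame height


def normalize_box(left, top, box_w, box_h, frame_w, frame_h) -> OcrTextBox:
    """Convert a pixel rectangle to [0, 1] coordinates, origin at the top-left."""
    box = OcrTextBox(left / frame_w, top / frame_h, box_w / frame_w, box_h / frame_h)
    # Assumed constraint: the region must lie entirely inside the frame.
    if min(box.x, box.y) < 0 or box.x + box.width > 1 + EPS or box.y + box.height > 1 + EPS:
        raise ValueError("text box extends outside the frame")
    return box


def check_ocr_fields(payload: dict) -> None:
    """Enforce the two documented rules for ocr_text_box."""
    if "ocr_text_box" not in payload:
        return
    if payload.get("media_type") != "video":
        raise ValueError("ocr_text_box requires media_type 'video'")
    if "subtitle_url" in payload:
        raise ValueError("ocr_text_box and subtitle_url are mutually exclusive")
```

For a bottom-fifth subtitle band in a 1920x1080 frame, payload["ocr_text_box"] = asdict(normalize_box(0, 864, 1920, 216, 1920, 1080)) gives x 0.0, y 0.8, width 1.0, height 0.2 (under the top-left-corner assumption).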

speaker_number
object

Expected range of speaker counts, given as an object with min and max fields. Leave it empty to detect the number of speakers automatically.
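One way to handle the auto case is to omit the field from the body. A sketch, assuming that "leave empty" means omitting speaker_number entirely, that both bounds are sent together, and that min should be at least 1 and not exceed max; none of these bounds is confirmed on this page. speaker_number and add_speaker_number are hypothetical helpers.

```python
from typing import Optional


def speaker_number(min_count: Optional[int] = None,
                   max_count: Optional[int] = None) -> Optional[dict]:
    """Return a speaker_number object, or None to leave the field out (auto)."""
    if min_count is None and max_count is None:
        return None  # omit the field and let the service detect speakers
    if min_count is None or max_count is None:
        raise ValueError("give both min and max, or neither")
    # Assumed sanity checks; the service's real bounds are not documented here.
    if min_count < 1 or min_count > max_count:
        raise ValueError("expected 1 <= min <= max")
    return {"min": min_count, "max": max_count}


def add_speaker_number(payload: dict, min_count=None, max_count=None) -> dict:
    """Attach speaker_number to the request body only when a range is given."""
    value = speaker_number(min_count, max_count)
    if value is not None:
        payload["speaker_number"] = value
    return payload
```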

user_prompt
string

Optional free-text prompt that guides the style of the translation or dubbing.

callback_url
string

Webhook URL that the service calls when the task status changes.

Response

Job queued

task_id
string
required

ID of the task