How it works
1. Host your assets at a publicly accessible URL. Upload your video, photo, and audio files so our servers can retrieve them.
2. Send an API request with the appropriate parameters. Reference your hosted assets and specify your desired mode (Standard or Precision).
3. Wait or query status. Use our webhook callback or poll the API with your job ID until processing is complete.
4. Download video output. Retrieve the finished talking photo or lip‑synced video from the provided URL.
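The polling option in the steps above can be sketched as follows. This is a minimal illustration, not the official client: the status values `"done"` and `"failed"` and the response fields `status`, `error_code`, and `output_url` are assumptions, and `fetch_status` is a hypothetical callable that you would implement with your own HTTP call to the status endpoint.

```python
import time
from typing import Any, Callable, Dict

# Rough budget from the documented limits: up to 120 min in the queue plus
# about 20 min of Precision Mode processing.
DEFAULT_TIMEOUT_S = (120 + 20) * 60


def wait_for_job(
    job_id: str,
    fetch_status: Callable[[str], Dict[str, Any]],
    interval_s: float = 30.0,
    timeout_s: float = DEFAULT_TIMEOUT_S,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Dict[str, Any]:
    """Poll until the job finishes, fails, or the timeout elapses.

    `fetch_status` is a hypothetical helper that returns the job record as a
    dict; the field names and status values used here are assumptions.
    """
    deadline = clock() + timeout_s
    while True:
        job = fetch_status(job_id)
        status = job.get("status")
        if status == "done":
            return job  # assumed to carry the output URL
        if status == "failed":
            raise RuntimeError(
                f"job {job_id} failed with code {job.get('error_code')}"
            )
        if clock() >= deadline:
            raise TimeoutError(f"job {job_id} did not finish in {timeout_s} s")
        sleep(interval_s)


if __name__ == "__main__":
    # Stand-in for a real status call, so the sketch runs without network access.
    fake = iter([
        {"status": "queued"},
        {"status": "processing"},
        {"status": "done", "output_url": "https://example.com/out.mp4"},
    ])
    job = wait_for_job("job-1", lambda _id: next(fake), sleep=lambda s: None)
    print(job["output_url"])
```

Passing `sleep` and `clock` as parameters keeps the loop testable without real delays. If you use the webhook callback instead, no polling loop is needed.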
Usage Limitations:
- You may have up to 10 concurrent jobs (including queued requests).
- Only single‑face videos or photos are supported.
- Estimated queue time: 1–120 minutes, depending on system load.
- Standard Mode processing time: ~10 minutes.
- Precision Mode processing time: ~20 minutes.
If a video or photo contains multiple faces, only the largest detected face
will be lip‑synced.
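As a rough client-side aid, the limits above can be turned into a worst-case wait estimate and a local guard on the number of jobs in flight. This is a sketch under assumptions: the mode keys `"standard"` and `"precision"` and the `JobSlots` helper are illustrative names, and the processing times are the approximate figures listed above, not guarantees.

```python
import threading

MAX_CONCURRENT_JOBS = 10          # includes queued requests
MAX_QUEUE_MINUTES = 120           # upper end of the estimated queue time
PROCESSING_MINUTES = {"standard": 10, "precision": 20}  # approximate


def worst_case_wait_minutes(mode: str) -> int:
    """Longest documented queue time plus approximate processing time."""
    return MAX_QUEUE_MINUTES + PROCESSING_MINUTES[mode.lower()]


class JobSlots:
    """Client-side counter that refuses to exceed the concurrent-job limit."""

    def __init__(self, limit: int = MAX_CONCURRENT_JOBS) -> None:
        self._sem = threading.BoundedSemaphore(limit)

    def try_acquire(self) -> bool:
        # Non-blocking: returns False when all slots are in use.
        return self._sem.acquire(blocking=False)

    def release(self) -> None:
        # Call once a job has finished or failed.
        self._sem.release()


if __name__ == "__main__":
    print(worst_case_wait_minutes("Standard"))   # 130
    print(worst_case_wait_minutes("Precision"))  # 140
```

Acquire a slot before submitting a job and release it when the job completes, so that submissions beyond the limit are held back locally rather than rejected by the API.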
API Error Codes
| Code | Description |
|---|---|
| 5 | Invalid request parameters. |
| 7 | No permission to request. |
| 104 | Insufficient credits. |
| 814 | Your account is not a member and is not allowed to call the API. |
| 1000 | Internal Server Error. |
| 1301 | Callback Challenge failed. |
| 1302 | API key has been revoked. |
| 1304 | API key has reached the maximum number of concurrent requests. |
| 1305 | Only business plan is allowed. |
| 1502 | Your audio driver is either invalid or cannot be downloaded. |
| 1503 | Your account is not authorized to call the API. |
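One way a client might use the table above is to map codes to their descriptions and decide whether a retry is worthwhile. In this sketch the descriptions are copied from the table, while the choice of retryable codes (1000 and 1304) is an assumption based on their wording, not documented behavior; `describe_api_error` and `should_retry` are hypothetical helper names.

```python
API_ERRORS = {
    5: "Invalid request parameters.",
    7: "No permission to request.",
    104: "Insufficient credits.",
    814: "Your account is not a member and is not allowed to call the API.",
    1000: "Internal Server Error.",
    1301: "Callback Challenge failed.",
    1302: "API key has been revoked.",
    1304: "API key has reached the maximum number of concurrent requests.",
    1305: "Only business plan is allowed.",
    1502: "Your audio driver is either invalid or cannot be downloaded.",
    1503: "Your account is not authorized to call the API.",
}

# Assumption: a server error or a full concurrency quota may clear on its own;
# every other code needs a change to the request or the account first.
RETRYABLE_CODES = {1000, 1304}


def describe_api_error(code: int) -> str:
    """Return the documented description, or a fallback for unknown codes."""
    return API_ERRORS.get(code, f"Unknown API error code {code}.")


def should_retry(code: int) -> bool:
    """True when retrying the same request later might succeed."""
    return code in RETRYABLE_CODES


if __name__ == "__main__":
    for code in (104, 1304):
        print(code, describe_api_error(code), "retry:", should_retry(code))
```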
Job Error Codes
| Code | Description |
|---|---|
| 999 | Failed to download the file. |
| 20403 | Not enough faces. |
| 20407 | Too many face tracks were detected. |
| 20408 | The input did not pass image-to-video face detection. |
| 20601 | There are no faces in the picture. |
| 20602 | Unknown image format. |
| 20611 | Video generation triggered the rate (flow) limit. |
| 20613 | The input image for video generation was flagged as sensitive. |