Translating Multilingual Audio into Simplified Chinese and Saving to a Text File with Python

Introduction

In many real-world scenarios, you might have an audio file containing multiple languages (English, Japanese, Chinese, etc.) and want to translate the entire content into Simplified Chinese.

In this article, we’ll walk through how to build a Python script that automatically:

  • Transcribes multilingual audio using OpenAI’s Whisper model
  • Translates the transcribed text into Simplified Chinese
  • Saves the final translation into a .txt file

This tutorial targets:

  • Developers or researchers working with multilingual datasets
  • Anyone who wants an automated, script-based solution for audio translation

Tool             Purpose
Whisper          Transcribes audio into text
Deep Translator  Translates the transcribed text into Simplified Chinese
ffmpeg           Decodes and converts audio files
Python           Scripting and automation

Install the required Python packages with:

# Install Whisper and its dependencies
pip install -U openai-whisper

# Install PyTorch with MPS support (for Apple Silicon GPUs)
pip install torch torchvision torchaudio

# Install Deep Translator (for text translation)
pip install deep-translator

# (Optional but recommended) Install setuptools-rust
pip install setuptools-rust

# Also make sure ffmpeg is installed (system package, not pip).
# The command below is for macOS with Homebrew; use your own package manager elsewhere.
brew install ffmpeg
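
Whisper calls the ffmpeg command-line tool to decode audio, so a missing binary only shows up when you first try to transcribe. A minimal sketch of checking for it up front, using only the standard library (the function name ffmpeg_available is illustrative, not part of Whisper):

```python
import shutil


def ffmpeg_available() -> bool:
    """Return True if an ffmpeg executable can be found on PATH."""
    return shutil.which("ffmpeg") is not None


if __name__ == "__main__":
    if ffmpeg_available():
        print("ffmpeg found:", shutil.which("ffmpeg"))
    else:
        print("ffmpeg not found; install it before running Whisper.")
```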

✅ How to check if MPS is available

import whisper
import torch

# Load model
model = whisper.load_model("turbo")

# Move model to MPS (Mac GPU)
if torch.backends.mps.is_available():
    model = model.to('mps')

# Transcribe in the spoken language; the turbo model is not trained for task="translate"
result = model.transcribe("audio.mp3", task="transcribe")
print(result["text"])
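
The snippet above quietly stays on the CPU when MPS is not available. If you want the choice to be explicit, one option is a small helper like the one below. This is a sketch under assumptions: pick_device is a hypothetical name, and the fallback order (MPS, then CUDA, then CPU) is just one reasonable choice.

```python
def pick_device() -> str:
    """Return "mps" or "cuda" when PyTorch reports one, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # Without PyTorch there is nothing to accelerate.
        return "cpu"
    # Older PyTorch builds may not expose torch.backends.mps at all.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    if torch.cuda.is_available():
        return "cuda"
    return "cpu"


if __name__ == "__main__":
    print("Selected device:", pick_device())
```

You could then write model = whisper.load_model("turbo").to(pick_device()). Depending on your PyTorch and Whisper versions, some operations may not be supported on MPS, so falling back to "cpu" is worth trying if transcription fails.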

Process Overview

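The overall approach follows two main steps:

  1. Transcribe the multilingual audio into text in its original language(s) using Whisper.
  2. Translate the transcribed text into Simplified Chinese using Deep Translator.

Keeping the two steps separate lets you inspect the transcript before translating it and swap the translation service later. Note that, unless you pass a language argument, Whisper detects the language from the first 30 seconds of audio, so heavily mixed recordings may be transcribed less reliably than single-language ones.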

import whisper
from deep_translator import GoogleTranslator

# 1. Load the Whisper turbo model
model = whisper.load_model("turbo")

# 2. Specify your audio file
audio_file = "audio.mp3"

# 3. Transcribe the audio to text in its original language
result = model.transcribe(
    audio_file,
    task="transcribe"  # "transcribe" instead of "translate" to keep original text
)

# 4. Extract the recognized text
original_text = result["text"]
print("🎧 Original transcribed text:")
print(original_text)

# 5. Translate the transcript to Simplified Chinese
#    (source='auto' because the transcript is not necessarily English)
translated_text = GoogleTranslator(source='auto', target='zh-CN').translate(original_text)

# 6. Save the translated text to a file
with open("outputfile.txt", "w", encoding="utf-8") as f:
    f.write(translated_text)

print("✅ Done! The output has been saved to outputfile.txt.")
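
One practical caveat: to my knowledge, deep-translator's GoogleTranslator rejects inputs longer than about 5,000 characters per call, so a long transcript may need to be translated in pieces. The sketch below splits text at sentence boundaries and translates chunk by chunk. The names split_into_chunks and translate_long_text and the 4,500-character default are assumptions for illustration, not part of either library.

```python
import re
from typing import Callable, List


def split_into_chunks(text: str, max_chars: int = 4500) -> List[str]:
    """Split text into chunks of at most max_chars, preferring sentence ends."""
    # Split on whitespace after Latin sentence punctuation, or directly
    # after CJK sentence punctuation (which is usually not followed by a space).
    sentences = re.split(r"(?<=[.!?。！？])\s+|(?<=[。！？])", text.strip())
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        # Hard-split any single sentence that is itself too long.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        # Sentences are rejoined with a single space inside a chunk.
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


def translate_long_text(
    text: str, translate: Callable[[str], str], max_chars: int = 4500
) -> str:
    """Translate text chunk by chunk, one translated chunk per output line."""
    return "\n".join(translate(chunk) for chunk in split_into_chunks(text, max_chars))


if __name__ == "__main__":
    # str.upper stands in for a real translator so the demo runs offline.
    # With deep-translator you would pass something like:
    #   GoogleTranslator(source="auto", target="zh-CN").translate
    print(translate_long_text("First sentence. Second one!", str.upper, max_chars=16))
```

Passing the translator as a plain callable keeps the chunking logic independent of the translation service, so the same helper works if you later switch to another back end.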

Summary

By combining Whisper and Deep Translator,
you can automate the entire process of transcribing multilingual audio and translating it into Simplified Chinese,
all with just a small Python script.

On Apple Silicon Macs (M1/M2/M3), Whisper can use PyTorch's MPS backend for GPU acceleration,
which can make transcription noticeably faster than running on the CPU.

For even higher accuracy or more complex projects, you might consider:

  • Using the Whisper large model
  • Switching to DeepL API or OpenAI GPT for professional-level translation