ROS Package bob_q3tts
A ROS 2 wrapper for the Qwen3-TTS model that provides high-fidelity, low-latency text-to-speech with streaming text aggregation and voice cloning. It also includes a Qt-based GUI for tuning parameters in real time.
Quick Start
Launch the TTS node:
ros2 run bob_q3tts tts
Open the Parameter GUI:
ros2 run bob_q3tts gui
Docker Usage
Using Docker Compose (Recommended)
docker-compose build
docker-compose up
Using Docker CLI
Build the Image:
docker build -t bob_q3tts .
Run the Node (with GPU and Audio):
docker run -it --rm \
  --gpus all \
  --device /dev/snd \
  -e Q3TTS_MODEL_DIR=/models \
  -e ROS_DOMAIN_ID=99 \
  -v /blue/dev/q3tts/models:/models \
  --network host \
  --ipc host \
  bob_q3tts:latest
Troubleshooting Audio
If you hear no sound or see “Invalid sample rate” errors (common with HDMI/GPU audio):
List Devices (inside container):
python3 -c "import sounddevice as sd; print(sd.query_devices())"
Set Device: Find the index or name (e.g., `HDA NVidia: HDMI 0 (hw:2,3)`) and set the `audio_device` parameter.
Force Resampling: If your hardware only supports 48 kHz, set `target_sample_rate` to `48000`.
Example:
ros2 param set /tts audio_device "HDA NVidia: HDMI 0 (hw:2,3)"
ros2 param set /tts target_sample_rate 48000
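Setting `target_sample_rate` means the synthesized audio has to be converted to the device's rate before playback. The sketch below illustrates the idea with plain linear interpolation over mono float samples. The `resample_linear` helper is hypothetical and the 24 kHz source rate in the usage lines is only an example; the package's own resampler is not shown in this README and may use a better method.

```python
def resample_linear(samples, src_rate, dst_rate):
    """Resample mono float audio from src_rate to dst_rate by linear interpolation."""
    if src_rate <= 0 or dst_rate <= 0:
        raise ValueError("sample rates must be positive")
    if src_rate == dst_rate or not samples:
        return list(samples)
    # Output length that covers the same duration at the new rate.
    n_out = round(len(samples) * dst_rate / src_rate)
    step = src_rate / dst_rate  # input samples advanced per output sample
    last = len(samples) - 1
    out = []
    for i in range(n_out):
        pos = i * step
        left = int(pos)
        if left >= last:
            # Past the final input sample: hold the last value.
            out.append(float(samples[last]))
            continue
        frac = pos - left
        out.append(samples[left] * (1.0 - frac) + samples[left + 1] * frac)
    return out


chunk = [0.0, 1.0, 0.0, -1.0]
print(resample_linear(chunk, 24000, 48000))
# [0.0, 0.5, 1.0, 0.5, 0.0, -0.5, -1.0, -1.0]
```

Linear interpolation is cheap but applies no low-pass filter, so downsampling with it can alias; a polyphase resampler such as `scipy.signal.resample_poly` is the safer choice when quality matters.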
ROS API
Topics
| Name | Type | Direction | Description |
|---|---|---|---|
|  |  | Subscriber | Incoming text. Aggregated and synthesized at sentence boundaries. |
|  |  | Publisher | The text currently being spoken. Published right before playback. |
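The subscriber aggregates incoming text and synthesizes it at sentence boundaries. A minimal sketch of that buffering, assuming text arrives in arbitrary fragments and that `.`, `!` and `?` are the delimiters (the node reads its delimiter set from configuration). The `SentenceAggregator` class is hypothetical and not part of the package.

```python
class SentenceAggregator:
    """Buffer streamed text fragments and release complete sentences."""

    def __init__(self, delimiters: str = ".!?"):
        # Assumed default delimiters, for illustration only.
        self.delimiters = set(delimiters)
        self.buffer = ""

    def feed(self, fragment: str) -> list:
        """Append a fragment and return every sentence it completes, in order."""
        self.buffer += fragment
        sentences = []
        start = 0
        for i, ch in enumerate(self.buffer):
            if ch in self.delimiters:
                sentence = self.buffer[start:i + 1].strip()
                if sentence:
                    sentences.append(sentence)
                start = i + 1
        # Keep the unterminated tail until more text arrives.
        self.buffer = self.buffer[start:]
        return sentences


agg = SentenceAggregator()
print(agg.feed("Hello wor"))    # []
print(agg.feed("ld. How are"))  # ['Hello world.']
print(agg.feed(" you? Fine"))   # ['How are you?']
```

Checking every character like this also splits on decimals ("3.14") and ellipses, so a production aggregator would need smarter boundary detection.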
Parameters
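The node uses static parameters, read once at startup, for initialization, and dynamic parameters, which can be changed at runtime (for example with `ros2 param set` or the GUI), for per-sentence tuning.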
Core Configuration (Static)
| Parameter | Type | Description |
|---|---|---|
|  |  | The Hugging Face model ID. Env: |
|  |  | Local directory for model caching. Env: |
|  |  | Characters that trigger synthesis. Env: |
|  |  | Timeout in ms to flush buffer without delimiter. Env: |
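The timeout covers text that never reaches a delimiter, such as a trailing phrase without punctuation. One way to implement it, sketched with a hypothetical `TimeoutBuffer` class: remember when text was last appended and release the buffer once nothing new has arrived for the configured number of milliseconds. The injectable clock is an assumption that keeps the example testable; the node's own timing rule may differ (for example, measuring from the first fragment instead of the last).

```python
import time
from typing import Callable, Optional


class TimeoutBuffer:
    """Hold unterminated text and release it after timeout_ms of inactivity."""

    def __init__(self, timeout_ms: int, clock: Callable[[], float] = time.monotonic):
        self.timeout_s = timeout_ms / 1000.0
        self.clock = clock
        self.text = ""
        self.last_update: Optional[float] = None

    def append(self, fragment: str) -> None:
        self.text += fragment
        self.last_update = self.clock()

    def poll(self) -> Optional[str]:
        """Call periodically; return text that is due for synthesis, else None."""
        if self.last_update is None or not self.text.strip():
            return None
        if self.clock() - self.last_update >= self.timeout_s:
            pending, self.text = self.text.strip(), ""
            self.last_update = None
            return pending
        return None


now = [0.0]  # fake clock in seconds
buf = TimeoutBuffer(500, clock=lambda: now[0])
buf.append("no punctuation here")
now[0] = 0.3
print(buf.poll())  # None
now[0] = 0.6
print(buf.poll())  # no punctuation here
```

In a ROS 2 node, `poll()` would typically be driven by a periodic timer such as the one created by rclpy's `Node.create_timer`.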
Generation Settings (Dynamic)
| Parameter | Type | Description |
|---|---|---|
|  |  | Speech language. Env: |
|  |  | Enable sampling for Stage 1. Env: |
|  |  | Sampling temperature for Stage 1. Env: |
|  |  | Nucleus sampling threshold. Env: |
|  |  | Top-k sampling limit. Env: |
|  |  | Penalty for repeated sounds. Env: |
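For readers new to these knobs, the sketch below shows how temperature and a repetition penalty are commonly applied to logits before sampling, following the convention of Hugging Face `transformers` (a penalty above 1 discourages tokens that already appeared). The `adjust_logits` helper is hypothetical and works on plain lists; Qwen3-TTS's Stage 1 code may differ in detail.

```python
import math


def adjust_logits(logits, previous_tokens, temperature=1.0, repetition_penalty=1.0):
    """Apply a repetition penalty to already-generated tokens, then temperature scaling."""
    if temperature <= 0:
        raise ValueError("temperature must be > 0; disable sampling for greedy decoding")
    out = [float(x) for x in logits]
    for tok in set(previous_tokens):
        # Lower the logit of repeated tokens: divide positives, multiply negatives.
        if out[tok] > 0:
            out[tok] /= repetition_penalty
        else:
            out[tok] *= repetition_penalty
    # Temperature below 1 sharpens the distribution; above 1 flattens it.
    return [x / temperature for x in out]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, -1.0]
print(adjust_logits(logits, previous_tokens=[0, 2], repetition_penalty=2.0))  # [1.0, 1.0, -2.0]
print(softmax(adjust_logits(logits, [], temperature=0.5))[0] > softmax(logits)[0])  # True
```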
Subtalker Settings (Dynamic)
| Parameter | Type | Description |
|---|---|---|
|  |  | Enable sampling for Stage 2. Env: |
|  |  | Temperature for acoustic texture. Env: |
|  |  | Nucleus sampling for Stage 2. Env: |
|  |  | Top-k for Stage 2. Env: |
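Stage 2 exposes the same sampling switches. The sketch below shows a common way top-k and nucleus (top-p) filtering are combined, and what disabling sampling usually means (greedy argmax). `choose_token` is a hypothetical helper over a list of probabilities; the actual subtalker implementation is not documented here.

```python
import random
from typing import Optional


def choose_token(probs, do_sample=True, top_k=0, top_p=1.0, rng: Optional[random.Random] = None):
    """Pick a token index greedily, or by sampling after top-k and top-p filtering."""
    if not do_sample:
        # Sampling disabled: deterministic argmax.
        return max(range(len(probs)), key=lambda i: probs[i])
    # Rank candidates from most to least likely.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if top_k > 0:
        ranked = ranked[:top_k]
    # Nucleus filtering: keep the smallest prefix whose probability mass reaches top_p.
    kept, mass = [], 0.0
    for i in ranked:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    rng = rng or random.Random()
    # random.choices treats weights as relative, so no renormalization is needed.
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]


probs = [0.1, 0.6, 0.3]
print(choose_token(probs, do_sample=False))  # 1
print(choose_token(probs, top_p=0.5))        # 1 (only index 1 survives nucleus filtering)
```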
Voice Clone / ICL (Dynamic)
| Parameter | Type | Description |
|---|---|---|
|  |  | Path to reference |
|  |  | Transcript or path to transcript file. Reading from file enables dynamic updates. Env: |
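The transcript parameter accepts either literal text or a file path, and reading from a file lets you edit the transcript without restarting the node. A minimal sketch of that behavior, assuming a hypothetical `resolve_ref_text` helper called before each synthesis; the node's real rule for deciding "this is a path" may differ.

```python
import os


def resolve_ref_text(value: str) -> str:
    """Return the transcript: file contents if value names an existing file, else value itself."""
    if value and os.path.isfile(value):
        # Re-read on every call so edits to the file apply to the next sentence.
        with open(value, encoding="utf-8") as f:
            return f.read().strip()
    return value.strip()


print(resolve_ref_text("Hello, this is my voice."))  # Hello, this is my voice.
```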
Output & Storage (Dynamic)
| Parameter | Type | Description |
|---|---|---|
|  |  | Enable/disable audio playback. Env: |
|  |  | Player: |
|  |  | Device ID or name for sounddevice. Env: |
|  |  | Prefix for saving audio files. Env: |
|  |  | Starting index for file naming. Env: |
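The device parameter takes either a numeric ID or a name as reported by `sounddevice`. The sketch below maps such a value to an output device index using the list returned by `sounddevice.query_devices()`, whose entries carry `name` and `max_output_channels` keys. The `resolve_output_device` helper and its case-insensitive substring rule are assumptions for illustration; `sounddevice` also does its own substring matching when given a string. A test list stands in for real hardware, and the resolved index could then be passed as `device=` to `sd.play`.

```python
from typing import Optional


def resolve_output_device(setting: str, devices) -> Optional[int]:
    """Map an audio device setting (index or name substring) to an output device index.

    Returns None for an empty setting, meaning "use the system default".
    """
    setting = setting.strip()
    if not setting:
        return None
    if setting.isdigit():
        index = int(setting)
        if index >= len(devices) or devices[index]["max_output_channels"] < 1:
            raise ValueError(f"device {index} is not a valid output device")
        return index
    # Case-insensitive substring match, restricted to devices that can play audio.
    matches = [
        i for i, dev in enumerate(devices)
        if setting.lower() in dev["name"].lower() and dev["max_output_channels"] > 0
    ]
    if len(matches) != 1:
        raise ValueError(f"expected one output device matching {setting!r}, found {len(matches)}")
    return matches[0]


devices = [
    {"name": "HDA Intel PCH: ALC887 Analog (hw:0,0)", "max_output_channels": 2},
    {"name": "HDA NVidia: HDMI 0 (hw:2,3)", "max_output_channels": 8},
    {"name": "USB Microphone (hw:3,0)", "max_output_channels": 0},
]
print(resolve_output_device("HDA NVidia: HDMI 0 (hw:2,3)", devices))  # 1
```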