ROS Package bob_sd35

A ROS2 package for image generation using Stable Diffusion 3.5 Large.

This package provides two nodes:

  • itti - Image & Text to Image generation (img2img)

  • tti - Text to Image generation (text2img)

Both nodes automatically download the SD3.5 model from HuggingFace on first run and support CPU offloading for systems with limited VRAM.

Note: The SD3.5 Large model is approximately 67 GB in size. The initial download can take a long time, depending on your connection speed.

Dependencies

System Dependencies

  • ROS2 Humble

  • ros-humble-cv-bridge

  • ros-humble-image-transport

Python Dependencies

Install via pip from the package directory, which contains requirements.txt:

pip install -r requirements.txt

Installation and Building

Note: Use a Python virtual environment for the Python dependencies to avoid conflicts with system packages.

# Clone the repository
cd ~/ros2_ws/src
git clone https://github.com/bob-ros2/bob_sd35.git

# Install Python dependencies (preferably in a virtual environment)
pip install -r bob_sd35/requirements.txt

# Build the package
cd ~/ros2_ws
colcon build --packages-select bob_sd35

# Source the workspace
source install/setup.bash

Nodes

This package provides two nodes for different use cases:

| Node | Description |
| --- | --- |
| itti | Image & Text to Image - Transforms an input image based on a text prompt using the img2img pipeline. Requires input_image. |
| tti | Text to Image - Generates images purely from text prompts using the text2img pipeline. No input image required. |

Usage

ITTI Node (Image-to-Image)

One-shot generation:

ros2 run bob_sd35 itti --ros-args \
  -p model_path:=/path/to/models/stable-diffusion-3.5-large \
  -p input_prompt:="a beautiful sunset" \
  -p input_image:=/path/to/input.jpg \
  -p output_image:=auto \
  -p once:=true

Continuous mode (topic-based):

ros2 run bob_sd35 itti --ros-args \
  -p model_path:=/path/to/models/stable-diffusion-3.5-large \
  -p input_image:=/path/to/default_input.jpg \
  -p output_image:=auto

Then publish prompts:

# Plain text prompt (uses input_image parameter)
ros2 topic pub --once --keep-alive 1.0 /input_prompt std_msgs/msg/String "data: 'forest background'"

# JSON with image_url (ITTI only - overrides input_image parameter)
ros2 topic pub --once --keep-alive 1.0 /input_prompt std_msgs/msg/String \
  "data: '{\"role\": \"user\", \"content\": \"cosmic background\", \"image_url\": \"file:///path/to/image.jpg\"}'"

TTI Node (Text-to-Image)

One-shot generation:

ros2 run bob_sd35 tti --ros-args \
  -p model_path:=/path/to/models/stable-diffusion-3.5-large \
  -p input_prompt:="a spacecraft in deep space" \
  -p output_image:=spacecraft.png \
  -p once:=true

Continuous mode (topic-based):

ros2 run bob_sd35 tti --ros-args \
  -p model_path:=/path/to/models/stable-diffusion-3.5-large \
  -p output_image:=auto

Then publish prompts:

ros2 topic pub --once --keep-alive 1.0 /input_prompt std_msgs/msg/String "data: 'a beautiful mountain landscape'"

Subscribed Topics

| Topic | Type | Description |
| --- | --- | --- |
| input_prompt | std_msgs/String | Text prompt to trigger image generation. For itti: accepts plain text or JSON with role, content, and image_url keys. For tti: plain text only. |
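The two message forms accepted on input_prompt can be sketched in Python as follows. This is a minimal illustration using only the standard library, not the package's actual code: `build_itti_prompt` and `parse_prompt` are hypothetical helper names, and the fallback to plain text for anything that is not a JSON object with a content key is an assumption about how a node could tell the two forms apart.

```python
import json
from pathlib import PurePosixPath
from typing import Optional, Tuple


def build_itti_prompt(content: str, image_path: str, role: str = "user") -> str:
    """Return a JSON string with the role, content, and image_url keys."""
    # as_uri() needs an absolute POSIX path and returns a file:// URL.
    image_url = PurePosixPath(image_path).as_uri()
    return json.dumps({"role": role, "content": content, "image_url": image_url})


def parse_prompt(data: str) -> Tuple[str, Optional[str]]:
    """Split message data into (prompt, image_url); image_url is None for plain text."""
    try:
        obj = json.loads(data)
    except json.JSONDecodeError:
        # Not JSON at all: treat the whole string as the prompt.
        return data, None
    if isinstance(obj, dict) and "content" in obj:
        return str(obj["content"]), obj.get("image_url")
    # Valid JSON without a content key (for example a bare number) stays plain text.
    return data, None


payload = build_itti_prompt("cosmic background", "/path/to/image.jpg")
print(payload)
print(parse_prompt(payload))
print(parse_prompt("forest background"))
```

Building the payload with `json.dumps` avoids the manual quote escaping needed on the command line. The resulting string would go into the data field of the std_msgs String message.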

Published Topics

| Topic | Type | Description |
| --- | --- | --- |
| generated_image | sensor_msgs/Image | Generated image published when subscribers are connected. |

ROS Parameters

Both nodes share most parameters. Parameters marked with (itti only) are only available in the itti node.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| model_repo | string | stabilityai/stable-diffusion-3.5-large | HuggingFace model repository ID. |
| model_path | string | ./models/stable-diffusion-3.5-large | Path to the local model directory. |
| input_prompt | string | "" | Initial prompt for generation. |
| input_image | string | "" | (itti only) Initial input image path. |
| output_image | string | output.png | Output image path. If ending with auto, generates unique filenames. |
| strength | double | 0.6 | (itti only) Denoising strength (0.0-1.0). Low=keep original, high=more change. |
| guidance_scale | double | 4.5 | Classifier-free guidance scale. |
| negative_prompt | string | "" | Negative prompt to exclude unwanted elements from generation. |
| num_inference_steps | int | 40 | Number of inference steps. |
| once | bool | false | If true, shuts down after first generation. |
| cpu_offload | bool | true | Enable model CPU offload to save VRAM. |
| keep_model_loaded | bool | false | Keep model loaded between generations. Set to true to save loading time. |
| keep_alive | double | 1.0 | Seconds to wait before shutdown when once is true, allowing consumers to receive messages. |
| seed | int | -1 | Random seed for reproducibility. Use -1 for random seed each generation. |
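For the itti node, strength and num_inference_steps interact. The sketch below assumes the node runs the Hugging Face diffusers SD3 img2img pipeline, which this README does not confirm. In that pipeline only the last part of the noise schedule is executed, so the number of denoising steps actually run is smaller than num_inference_steps. `effective_steps` is a hypothetical helper that mirrors that arithmetic.

```python
def effective_steps(num_inference_steps: int, strength: float) -> int:
    """Denoising steps actually run by an img2img pass (assumed diffusers behaviour)."""
    # Steps that belong to the noised portion of the schedule.
    init_timestep = min(num_inference_steps * strength, num_inference_steps)
    # Index of the first scheduled step that is executed.
    t_start = int(max(num_inference_steps - init_timestep, 0))
    return num_inference_steps - t_start


# Package defaults: 40 steps at strength 0.6.
print(effective_steps(40, 0.6))  # 24
```

Under this assumption the defaults run 24 of the 40 scheduled steps, and a strength of 1.0 runs all of them.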

Environment Variables

All parameters can also be set via environment variables with prefix ITTI_ for the itti node and TTI_ for the tti node (e.g., ITTI_MODEL_PATH, TTI_GUIDANCE_SCALE).
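The naming rule can be sketched as a small helper. This assumes the variable name is simply the node prefix plus the upper-cased parameter name, as in the two examples above. `env_var_name` and `env_for_node` are hypothetical names, and how the nodes parse the values or resolve conflicts with ROS parameters is not covered here.

```python
from typing import Dict, Mapping


def env_var_name(node: str, parameter: str) -> str:
    """Map a node name and a parameter name to the assumed environment variable name."""
    return f"{node.upper()}_{parameter.upper()}"


def env_for_node(node: str, params: Mapping[str, str]) -> Dict[str, str]:
    """Build an environment mapping for one node from parameter name/value strings."""
    return {env_var_name(node, name): value for name, value in params.items()}


print(env_var_name("itti", "model_path"))
print(env_for_node("tti", {"guidance_scale": "4.5", "output_image": "auto"}))
```

The returned mapping can be merged into `os.environ` or passed as the `env` argument of `subprocess.run` when launching a node from a script.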