# ROS Package [bob_sd35](https://github.com/bob-ros2/bob_sd35)

A ROS2 package for image generation using Stable Diffusion 3.5 Large.

This package provides two nodes:
- **itti** - Image & Text to Image generation (img2img)
- **tti** - Text to Image generation (text2img)

Both nodes automatically download the SD3.5 model from Hugging Face on first run and support CPU offloading for systems with limited VRAM.

> **Note:** The SD3.5 Large model is approximately **67 GB** in size, so the initial download can take a long time depending on your internet connection.
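
The download-on-first-run and CPU offload behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the package's actual code: `ensure_model` and `load_pipeline` are hypothetical helper names, and it assumes the nodes are built on `huggingface_hub` and `diffusers`, which this README does not confirm.

```python
from pathlib import Path
from typing import Callable


def _hf_download(repo_id: str, local_dir: str) -> None:
    # Imported lazily so the helper below can be used without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    snapshot_download(repo_id=repo_id, local_dir=local_dir)


def ensure_model(
    model_path: str,
    model_repo: str,
    download: Callable[[str, str], None] = _hf_download,
) -> Path:
    """Return the local model directory, downloading the model only if needed."""
    path = Path(model_path)
    # Assumption: a non-empty directory means the model is already present.
    if not (path.is_dir() and any(path.iterdir())):
        path.mkdir(parents=True, exist_ok=True)
        download(model_repo, str(path))
    return path


def load_pipeline(model_dir: Path, cpu_offload: bool = True):
    """Load the text2img pipeline (img2img would use StableDiffusion3Img2ImgPipeline)."""
    import torch
    from diffusers import StableDiffusion3Pipeline

    pipe = StableDiffusion3Pipeline.from_pretrained(
        str(model_dir), torch_dtype=torch.bfloat16
    )
    if cpu_offload:
        # Moves sub-models to the GPU only while they are in use (requires accelerate).
        pipe.enable_model_cpu_offload()
    else:
        pipe.to("cuda")
    return pipe
```

With the default parameters this would correspond to `ensure_model("./models/stable-diffusion-3.5-large", "stabilityai/stable-diffusion-3.5-large")` followed by `load_pipeline(...)`.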

## Dependencies

### System Dependencies
- ROS2 Humble
- ros-humble-cv-bridge
- ros-humble-image-transport

### Python Dependencies
Install via pip from the repository root:
```bash
pip install -r requirements.txt
```

## Installation and Building

> **Note:** It is recommended to use a Python virtual environment for installing the Python dependencies to avoid conflicts with system packages.

```bash
# Clone the repository
cd ~/ros2_ws/src
git clone https://github.com/bob-ros2/bob_sd35.git

# Install Python dependencies (preferably in a virtual environment)
pip install -r bob_sd35/requirements.txt

# Build the package
cd ~/ros2_ws
colcon build --packages-select bob_sd35

# Source the workspace
source install/setup.bash
```

## Nodes

This package provides two nodes for different use cases:

| Node | Description |
|------|-------------|
| `itti` | **Image & Text to Image** - Transforms an input image based on a text prompt using the img2img pipeline. Requires `input_image`. |
| `tti` | **Text to Image** - Generates images purely from text prompts using the text2img pipeline. No input image required. |

## Usage

### ITTI Node (Image-to-Image)

**One-shot generation:**
```bash
ros2 run bob_sd35 itti --ros-args \
  -p model_path:=/path/to/models/stable-diffusion-3.5-large \
  -p input_prompt:="a beautiful sunset" \
  -p input_image:=/path/to/input.jpg \
  -p output_image:=auto \
  -p once:=true
```

**Continuous mode (topic-based):**
```bash
ros2 run bob_sd35 itti --ros-args \
  -p model_path:=/path/to/models/stable-diffusion-3.5-large \
  -p input_image:=/path/to/default_input.jpg \
  -p output_image:=auto
```

Then publish prompts:
```bash
# Plain text prompt (uses input_image parameter)
ros2 topic pub --once --keep-alive 1.0 /input_prompt std_msgs/msg/String "data: 'forest background'"

# JSON with image_url (ITTI only - overrides input_image parameter)
ros2 topic pub --once --keep-alive 1.0 /input_prompt std_msgs/msg/String \
  "data: '{\"role\": \"user\", \"content\": \"cosmic background\", \"image_url\": \"file:///path/to/image.jpg\"}'"
```
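
To make the two message formats concrete, here is a minimal sketch of how a subscriber callback could separate the prompt from the image path. `parse_prompt` is a hypothetical helper, not taken from the package; the fallback to plain text on malformed JSON and the handling of `file://` URLs are assumptions.

```python
import json
from typing import Tuple
from urllib.parse import unquote, urlparse


def parse_prompt(data: str, default_image: str = "") -> Tuple[str, str]:
    """Return (prompt, image_path) for a plain-text or JSON prompt message."""
    text = data.strip()
    if text.startswith("{"):
        try:
            msg = json.loads(text)
        except json.JSONDecodeError:
            # Assumption: malformed JSON is treated as a plain-text prompt.
            return text, default_image
        if isinstance(msg, dict) and "content" in msg:
            # image_url, when present, overrides the input_image parameter.
            image = msg.get("image_url") or default_image
            parsed = urlparse(image)
            if parsed.scheme == "file":
                image = unquote(parsed.path)
            return str(msg["content"]), image
    return text, default_image
```

For the two messages published above, this would yield the prompt `forest background` with the default input image, and `cosmic background` with `/path/to/image.jpg`.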

### TTI Node (Text-to-Image)

**One-shot generation:**
```bash
ros2 run bob_sd35 tti --ros-args \
  -p model_path:=/path/to/models/stable-diffusion-3.5-large \
  -p input_prompt:="a spacecraft in deep space" \
  -p output_image:=spacecraft.png \
  -p once:=true
```

**Continuous mode (topic-based):**
```bash
ros2 run bob_sd35 tti --ros-args \
  -p model_path:=/path/to/models/stable-diffusion-3.5-large \
  -p output_image:=auto
```

Then publish prompts:
```bash
ros2 topic pub --once --keep-alive 1.0 /input_prompt std_msgs/msg/String "data: 'a beautiful mountain landscape'"
```

## Subscribed Topics

| Topic | Type | Description |
|-------|------|-------------|
| `input_prompt` | `std_msgs/msg/String` | Text prompt that triggers image generation. For **itti**: accepts plain text or JSON with `role`, `content`, and `image_url` keys. For **tti**: plain text only. |

## Published Topics

| Topic | Type | Description |
|-------|------|-------------|
| `generated_image` | `sensor_msgs/msg/Image` | The generated image. Published only when at least one subscriber is connected. |

## ROS Parameters

Both nodes share most parameters. Parameters marked with **(itti only)** are only available in the `itti` node.

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `model_repo` | string | `stabilityai/stable-diffusion-3.5-large` | Hugging Face model repository ID. |
| `model_path` | string | `./models/stable-diffusion-3.5-large` | Path to the local model directory. |
| `input_prompt` | string | `""` | Initial prompt for generation. |
| `input_image` | string | `""` | **(itti only)** Initial input image path. |
| `output_image` | string | `output.png` | Output image path. If the value ends with `auto`, a unique filename is generated for each image. |
| `strength` | double | `0.6` | **(itti only)** Denoising strength (0.0-1.0). Lower values stay closer to the input image; higher values allow more change. |
| `guidance_scale` | double | `4.5` | Classifier-free guidance scale. |
| `negative_prompt` | string | `""` | Negative prompt to exclude unwanted elements from generation. |
| `num_inference_steps` | int | `40` | Number of inference steps. |
| `once` | bool | `false` | If true, the node shuts down after the first generation. |
| `cpu_offload` | bool | `true` | Enable model CPU offload to save VRAM. |
| `keep_model_loaded` | bool | `false` | Keep the model loaded between generations. Set to true to avoid reloading it for every prompt. |
| `keep_alive` | double | `1.0` | Seconds to wait before shutdown when `once` is true, allowing consumers to receive messages. |
| `seed` | int | `-1` | Random seed for reproducibility. Use `-1` to draw a new random seed for each generation. |
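
The two sentinel values in the table, `seed` set to `-1` and an `output_image` ending with `auto`, could be resolved along the following lines. This is only a sketch: `resolve_seed` and `resolve_output_path` are hypothetical helpers, and the 32-bit seed range and the `image_<timestamp>_<id>.png` naming scheme are assumptions, since the README does not document the actual filename format.

```python
import random
import time
import uuid
from pathlib import Path

MAX_SEED = 2**32 - 1  # assumption: seeds are limited to 32 bits


def resolve_seed(seed: int) -> int:
    """Return the seed to use; -1 means draw a fresh random seed."""
    if seed == -1:
        return random.randint(0, MAX_SEED)
    return seed


def resolve_output_path(output_image: str) -> Path:
    """Return the path to write to; a value ending in 'auto' gets a unique name."""
    if not output_image.endswith("auto"):
        return Path(output_image)
    # Assumption: whatever precedes 'auto' is the target directory.
    parent = Path(output_image[: -len("auto")] or ".")
    stamp = time.strftime("%Y%m%d_%H%M%S")
    unique = uuid.uuid4().hex[:8]
    # Assumption: hypothetical naming scheme, not the package's real one.
    return parent / f"image_{stamp}_{unique}.png"
```

Logging the resolved seed alongside the output path makes it possible to reproduce an image later by passing that seed explicitly.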

## Environment Variables

All parameters can also be set via environment variables, using the prefix `ITTI_` for the itti node and `TTI_` for the tti node (e.g., `ITTI_MODEL_PATH`, `TTI_GUIDANCE_SCALE`).
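
The naming rule above can be illustrated with a short sketch. `env_var_name` and `env_default` are hypothetical helpers; how the nodes actually parse the values, and whether an explicit ROS parameter takes precedence over the environment, is not stated here and is assumed.

```python
import os
from typing import Any, Mapping


def env_var_name(node: str, parameter: str) -> str:
    """Map a node and parameter name to its variable, e.g. ITTI_MODEL_PATH."""
    return f"{node.upper()}_{parameter.upper()}"


def env_default(
    node: str,
    parameter: str,
    default: Any,
    environ: Mapping[str, str] = os.environ,
) -> Any:
    """Return the environment value coerced to the default's type, else the default."""
    raw = environ.get(env_var_name(node, parameter))
    if raw is None:
        return default
    # bool must be checked first because bool is a subclass of int.
    if isinstance(default, bool):
        # Assumption: these spellings count as true; everything else is false.
        return raw.strip().lower() in ("1", "true", "yes", "on")
    return type(default)(raw)
```

For example, with `TTI_GUIDANCE_SCALE=7.0` exported, `env_default("tti", "guidance_scale", 4.5)` would return `7.0`, which could then be used as the default when declaring the ROS parameter.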