ROS Package bob_vector_db

This package integrates a Vector DB into ROS 2 and makes it accessible via ROS topics.

Available Features

  • Simple ROS String topic JSON interface

  • Embed text into a Chroma or Qdrant Vector DB together with additional payload data

  • Embed ROS sensor_msgs/msg/Image messages along with payload data into a Qdrant Vector DB

  • Configurable embedding model

  • Multimodal vector embedding (Qdrant)

  • Query the DB and return the results

ROS Node EMBEDDER

This ROS node subscribes to the /embed String topic to receive JSON-encoded embedding requests and embeds the delivered data into the configured Vector DB. Qdrant is the default Vector DB; optionally, a Chroma DB can be used instead.

By default, the Sentence Transformers all-MiniLM-L6-v2 model is used to create the embeddings.

Embedding data format

The JSON data received on the String topic must contain the following fields (for Qdrant, the ids are optional):

Embed text

{
  "collection": "xfiles", 
  "documents": ["some story text","text about something strange"], 
  "metadatas": [{"title":"The end"}, {"title":"Dark star"}], 
  "ids": ["id1","id2"]
}
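Before publishing a request, it can help to check it against the fields listed above. The following is a minimal sketch of a hypothetical client-side validator (`validate_embed_request` is not part of bob_vector_db). It assumes that `metadatas` and `ids` are parallel to `documents`, one entry per document, which the examples suggest but the node's exact checks are not documented here.

```python
import json


def validate_embed_request(raw: str, use_chroma: bool = False) -> dict:
    """Parse an /embed JSON string and raise ValueError if a documented field is missing or malformed.

    Hypothetical helper, not part of bob_vector_db.
    """
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("the request must be a JSON object")
    if not isinstance(data.get("collection"), str) or not data["collection"]:
        raise ValueError("'collection' must be a non-empty string")
    documents = data.get("documents")
    if not isinstance(documents, list) or not documents:
        raise ValueError("'documents' must be a non-empty list")
    if not isinstance(data.get("metadatas"), list):
        raise ValueError("'metadatas' must be a list")
    # ids are optional for Qdrant only, so require them when targeting Chroma.
    if use_chroma and "ids" not in data:
        raise ValueError("'ids' are required for Chroma")
    # Assumption: metadatas and ids are parallel to documents (one entry each).
    for key in ("metadatas", "ids"):
        if key in data and len(data[key]) != len(documents):
            raise ValueError(f"'{key}' must have one entry per document")
    return data
```

Validating on the client side only catches malformed requests early; the embedder node still decides what it accepts.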

Embed text + image (Qdrant only)

{
  "collection": "movie_cover", 
  "documents": ["some story text","text about something strange"], 
  "metadatas": [{"title":"The end"}, {"title":"Dark star"}], 
  "images": ["/path/to/image_id1.jpg","/path/to/image_id2.jpg"],
  "ids": ["id1","id2"]
}
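A text + image request can be assembled programmatically as well. This is a sketch of a hypothetical helper (`build_image_request` is illustrative, not a package API). It assumes one document, one metadata entry and one image path per item, and it refuses Chroma because image embedding is documented as Qdrant only.

```python
import json
from typing import Dict, List, Optional


def build_image_request(collection: str, documents: List[str], metadatas: List[Dict],
                        images: List[str], ids: Optional[List[str]] = None,
                        use_chroma: bool = False) -> str:
    """Build the JSON string for a text + image embed request (hypothetical helper)."""
    if use_chroma:
        raise ValueError("text + image embedding is only available with Qdrant")
    # Assumption: one document, metadata entry and image path per item.
    if not (len(documents) == len(metadatas) == len(images)):
        raise ValueError("documents, metadatas and images must have the same length")
    request = {"collection": collection, "documents": documents,
               "metadatas": metadatas, "images": images}
    if ids is not None:  # ids are optional for Qdrant
        if len(ids) != len(documents):
            raise ValueError("ids must have one entry per document")
        request["ids"] = ids
    return json.dumps(request)
```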

Image embedding environment variables

Enable or disable automatic storage of the image in Base64 format in the payload data:

EMBED_IMAGES_BASE64 Default: 0

Image format used when embedding from sensor_msgs/msg/Image or bob_msgs/msg/STTImage:

EMBED_PIL_B64FORMAT Default: JPEG
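The effect of these two variables can be sketched as follows. This is an assumption about the behaviour, not the node's code: the hypothetical helpers read the variables with the documented defaults and, when enabled, store the image as Base64 under the payload key `image_base64`. Re-encoding the image in the configured format (for example with PIL's `Image.save(buf, format=...)`) is assumed to have happened already and is not shown.

```python
import base64
import os
from typing import Mapping, Optional, Tuple


def image_b64_settings(env: Optional[Mapping[str, str]] = None) -> Tuple[bool, str]:
    """Read EMBED_IMAGES_BASE64 (default 0) and EMBED_PIL_B64FORMAT (default JPEG)."""
    env = os.environ if env is None else env
    # Assumption: "1" or "true" enables Base64 storage.
    enabled = env.get("EMBED_IMAGES_BASE64", "0").strip().lower() in ("1", "true")
    image_format = env.get("EMBED_PIL_B64FORMAT", "JPEG")
    return enabled, image_format


def add_image_base64(payload: dict, encoded_image: bytes,
                     env: Optional[Mapping[str, str]] = None) -> dict:
    """Store the image as Base64 in payload['image_base64'] when enabled (hypothetical helper).

    Assumption: encoded_image is already encoded in the EMBED_PIL_B64FORMAT format.
    """
    enabled, _image_format = image_b64_settings(env)
    if enabled:
        payload["image_base64"] = base64.b64encode(encoded_image).decode("ascii")
    return payload
```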

Usage

# some of the examples below expect a running Qdrant or Chroma Vector DB

# start the embedder node; by default it uses a Qdrant Vector DB running on localhost
ros2 run bob_vector_db embedder

# start the embedder node using a Chroma Vector DB running on localhost
EMBED_USE_CHROMA=1 ros2 run bob_vector_db embedder

# run in debug mode, this will also set Python logging to level DEBUG
ros2 run bob_vector_db embedder --ros-args --log-level DEBUG

# use local persistent storage at the given path; this works without a Qdrant server
ros2 run bob_vector_db embedder --ros-args -p path:=/home/ros/qdrant_data

# use local persistent Chroma storage at the given path; this works without a Chroma server
ros2 run bob_vector_db embedder --ros-args -p path:=/home/ros/chroma_data -p use_chroma:=true

# publish an embed request with a single item from the shell
ros2 topic pub --once embed std_msgs/msg/String 'data: "{\"collection\":\"xfiles\", \"documents\":[\"Bobs ROS nodes are a collection of NLP and LLM tools for ROS\"], \"metadatas\": [{\"author\":\"bob\"}], \"ids\":[\"id1\"]}"'

# embed one or more images together with their text representation
# when embedding images, vectors are produced for both the text and the image and stored in the Qdrant DB
ros2 topic pub --once embed std_msgs/msg/String 'data: "{\"collection\":\"image_data\", \"documents\":[\"An animal from the animal farm in pink\"], \"images\": [\"/home/ros/ros2_ws/img/images/piglet.jpg\"], \"metadatas\": [{\"author\":\"bob\"}]}"'

# start the image/text embedder together with a topic terminal to enter image embed messages manually
# with EMBED_IMAGES_BASE64=1, the image is stored in Base64 format in the payload key `image_base64` in addition to its path
# this works only with a Qdrant database
EMBED_IMAGES_BASE64=1 ros2 launch bob_vector_db embedder.launch.py terminal:=true
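The `ros2 topic pub` commands above can also be issued from a Python program. The sketch below is an assumption-laden example, not part of the package: `make_embed_data` and `publish_once` are hypothetical names, the client node name `embed_client` is made up, and the topic is assumed to resolve to `embed` as in the shell example. It uses only standard rclpy calls and waits briefly for the embedder's subscription so the single message is not lost during discovery.

```python
import json
import time
from typing import Dict, List, Optional


def make_embed_data(collection: str, documents: List[str], metadatas: List[Dict],
                    ids: Optional[List[str]] = None) -> str:
    """Serialize an embed request into the JSON string carried in std_msgs/msg/String.data."""
    request = {"collection": collection, "documents": documents, "metadatas": metadatas}
    if ids is not None:  # ids may be omitted when the node uses Qdrant
        request["ids"] = ids
    return json.dumps(request)


def publish_once(data: str, topic: str = "embed", wait_sec: float = 5.0) -> None:
    """Publish one String message on the embed topic (requires a sourced ROS 2 environment)."""
    # Imported here so make_embed_data() is usable without ROS 2 installed.
    import rclpy
    from std_msgs.msg import String

    rclpy.init()
    node = rclpy.create_node("embed_client")  # hypothetical node name
    publisher = node.create_publisher(String, topic, 10)
    # Wait until a subscriber (the embedder) is discovered, otherwise the message may be dropped.
    deadline = time.monotonic() + wait_sec
    while publisher.get_subscription_count() == 0 and time.monotonic() < deadline:
        rclpy.spin_once(node, timeout_sec=0.1)
    msg = String()
    msg.data = data
    publisher.publish(msg)
    node.destroy_node()
    rclpy.shutdown()
```

For example, `publish_once(make_embed_data("xfiles", ["Bobs ROS nodes are a collection of NLP and LLM tools for ROS"], [{"author": "bob"}], ["id1"]))` should send the same request as the first `ros2 topic pub` command above. Building the JSON with `json.dumps` avoids the nested shell and YAML escaping.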

Node Parameters

Parameter name: default_collection
Type: string
Description: Default collection used when embedding from the embed_raw or embed_image topic. Environment variable EMBED_DEFAULT_COLLECTION. Default: embed_raw

Parameter name: host
Type: string
Description: Vector DB host name or ip address. Environment variable EMBED_HOST. Default: localhost

Parameter name: location
Type: string
Description: Vector DB location. Can be empty or a URL. This parameter is only used by Qdrant. Environment variable EMBED_LOCATION. Default: '' (empty)

Parameter name: model
Type: string
Description: Embedding model to use. For Chroma, only text embedding is currently available. When using Qdrant to embed images, a text model and an image model can be provided, separated by a space. Supported models: Qdrant/clip-ViT-B-32-text Qdrant/clip-ViT-B-32-vision. Environment variable EMBED_MODEL. Default: '' (empty)

Parameter name: path
Type: string
Description: Local path of the Vector DB when using persistent storage. Environment variable EMBED_PATH. Default: '' (empty)

Parameter name: port
Type: integer
Description: Vector DB port. Environment variable EMBED_PORT. Default: 6333 (Qdrant) or 8000 (Chroma)

Parameter name: use_chroma
Type: boolean
Description: Use a Chroma Vector DB instead of a Qdrant Vector DB. Environment variable EMBED_USE_CHROMA. Default: 0
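The parameter list pairs each parameter with an environment variable and a default. A common ROS 2 pattern is to use the environment variable as the parameter's default value, so that `--ros-args -p name:=value` still overrides it; the sketch below assumes that precedence and only covers the environment-variable and default layer. `resolve_config` and `split_model` are hypothetical helpers; `split_model` shows how a "text model, space, image model" string could be split.

```python
import os
from typing import Mapping, Optional, Tuple


def env_bool(value: str) -> bool:
    """Interpret '1', 'true' or 'yes' (any case) as True."""
    return value.strip().lower() in ("1", "true", "yes")


def resolve_config(env: Optional[Mapping[str, str]] = None) -> dict:
    """Resolve node settings from EMBED_* variables, falling back to the documented defaults."""
    env = os.environ if env is None else env
    use_chroma = env_bool(env.get("EMBED_USE_CHROMA", "0"))
    return {
        "default_collection": env.get("EMBED_DEFAULT_COLLECTION", "embed_raw"),
        "host": env.get("EMBED_HOST", "localhost"),
        "location": env.get("EMBED_LOCATION", ""),  # only used by Qdrant
        "model": env.get("EMBED_MODEL", ""),
        "path": env.get("EMBED_PATH", ""),
        # Default port depends on the selected Vector DB.
        "port": int(env.get("EMBED_PORT", "8000" if use_chroma else "6333")),
        "use_chroma": use_chroma,
    }


def split_model(model: str) -> Tuple[Optional[str], Optional[str]]:
    """Split 'text_model image_model' into its parts; None means 'use the default'."""
    parts = model.split()
    text_model = parts[0] if parts else None
    image_model = parts[1] if len(parts) > 1 else None
    return text_model, image_model
```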

Subscribed Topics

~embed (std_msgs/msg/String)
Incoming JSON string with the embedding data.

~embed_raw (std_msgs/msg/String)
Incoming string with raw text to embed into the default collection.

~embed_image (sensor_msgs/msg/Image)
Incoming ROS Image to embed.

~embed_ttimage (bob_msgs/msg/TTImage)
Incoming TTImage with payload to embed.
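Unlike ~embed, the ~embed_raw topic carries plain text, and the default_collection parameter supplies the target collection. A minimal sketch of how such a message could be turned into a regular embed request, assuming raw messages carry no payload (so an empty metadata dict is used); `raw_to_request` is a hypothetical helper, not the node's implementation:

```python
import json


def raw_to_request(text: str, default_collection: str = "embed_raw") -> str:
    """Wrap raw text from ~embed_raw into an embed request for the default collection (hypothetical)."""
    request = {
        "collection": default_collection,
        "documents": [text],
        # Assumption: raw messages carry no payload, so an empty metadata dict is used.
        "metadatas": [{}],
    }
    return json.dumps(request)
```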