Building a RAG Pipeline with SQL Server Vector Search

Recently, I worked on setting up a RAG pipeline using SQL Server as the vector database. The main goal was to test how practical SQL Server is for storing embeddings, searching similar content, and integrating AI features into an existing production-style database.

The most interesting part of the experience was not the AI model itself, but how SQL Server handled the vector side of the system.

Running SQL Server in Docker

I started by running SQL Server 2025 inside Docker. The setup was mostly straightforward, but I faced an issue where the container kept exiting during startup.

After checking the logs, the problem was related to CPU topology. SQL Server was crashing with an assertion error. The fix was to limit the CPUs exposed to the container:

--cpuset-cpus="0-3"

I also configured the container with a persistent Docker volume:

-v sql2025data:/var/opt/mssql

This is important because without a volume, removing the container can remove the database files. For any real setup, persistence should be configured from the start.

Using SQL Server as a Vector Store

The core table I created was a simple RagChunks table:

CREATE TABLE dbo.RagChunks
(
    id BIGINT IDENTITY(1,1) PRIMARY KEY CLUSTERED,

    source_table NVARCHAR(128) NOT NULL,
    source_pk NVARCHAR(128) NOT NULL,

    title NVARCHAR(500) NULL,
    chunk_text NVARCHAR(MAX) NOT NULL,

    metadata_json NVARCHAR(MAX) NULL,

    embedding VECTOR(1024) NOT NULL,

    source_updated_at DATETIME2 NULL,
    indexed_at DATETIME2 NOT NULL DEFAULT SYSUTCDATETIME()
);

Matching Vector Dimensions

A key SQL Server detail is that the VECTOR column dimension must match the embedding model output.

At first, I tested with 384-dimensional vectors:

embedding VECTOR(384)

Later, I switched to a model that returns 1024-dimensional vectors, so the table had to change to:

embedding VECTOR(1024)

If the dimensions do not match, inserts and searches will fail. This means the embedding model choice affects the database schema directly.

I also learned that vectors from different embedding models should not be mixed in the same column. Even if two models both return 1024 dimensions, the vectors are not necessarily compatible. If the model changes, the safest approach is to truncate or rebuild the indexed content and re-index everything using the new model.

Inserting Vectors into SQL Server

When inserting vectors from code, the embedding is usually sent as a JSON array. One issue I faced was that SQL Server rejected the conversion because the ODBC driver treated the value as ntext.

The fix was to explicitly cast the parameter to NVARCHAR(MAX) before converting it to VECTOR:

DECLARE @embedding_json NVARCHAR(MAX) = CAST(? AS NVARCHAR(MAX));

INSERT INTO dbo.RagChunks
    (source_table, source_pk, title, chunk_text, metadata_json, embedding, source_updated_at)
VALUES
    (?, ?, ?, ?, ?, CAST(@embedding_json AS VECTOR(1024)), ?);

Searching with VECTOR_DISTANCE

For the first version, I used VECTOR_DISTANCE():

DECLARE @query_json NVARCHAR(MAX) = CAST(? AS NVARCHAR(MAX));
DECLARE @q VECTOR(1024) = CAST(@query_json AS VECTOR(1024));

SELECT TOP (5)
    id,
    source_table,
    source_pk,
    title,
    chunk_text,
    metadata_json,
    VECTOR_DISTANCE('cosine', embedding, @q) AS distance
FROM dbo.RagChunks
ORDER BY distance;

This performs exact similarity search. It is simple, predictable, and very useful during development.

VECTOR_DISTANCE vs VECTOR_SEARCH

I also looked into VECTOR_SEARCH() and vector indexes.

The difference is simple:

VECTOR_DISTANCE() = exact search, scans and ranks vectors
VECTOR_SEARCH()   = approximate indexed search, better for larger datasets

For larger production datasets, VECTOR_SEARCH() with a vector index becomes more important. However, SQL Server requires enough vector rows before creating a vector index, and there are extra considerations around index maintenance and filtering.

Keeping RAG in the Same Production Database

One decision I made was to keep the RAG table in the same SQL Server database as the existing application, instead of creating a separate database.

For this use case, that felt practical because:

- the application already uses SQL Server
- backups and deployment are already handled
- source records and RAG chunks are close together
- integration with the existing CMS is simpler

The tradeoff is that vector search and indexing jobs can add load to the production database. So the API should be careful with permissions, batching, and query performance.

I would not use sa or a highly privileged user for the RAG API. A restricted SQL user should only have the permissions it actually needs.

Updating Content

For content updates, the simplest approach is:

DELETE FROM dbo.RagChunks
WHERE source_table = @source_table
  AND source_pk = @source_pk;

Then insert the new chunks for that source.

This keeps the index clean and avoids old versions of the same content staying in the table.

For unpublished or deleted CMS content, the same delete approach can be used to remove it from the RAG index.

Final Takeaway

This experience showed me that SQL Server can be used as more than a traditional relational database. With the new vector support, it can also act as the storage and retrieval layer for AI-powered features.

The biggest SQL Server lessons were:

- VECTOR dimension must match the embedding model
- store original text and vector together
- vectors cannot be converted back to text
- cast JSON embeddings carefully when inserting through ODBC
- VECTOR_DISTANCE is best for starting and debugging
- VECTOR_SEARCH is better later for larger datasets
- persistence matters when running SQL Server in Docker

Overall, the most valuable part was learning how to connect modern AI workflows with an existing SQL Server-based system without introducing a completely separate vector database.

Building a RAG Pipeline on SQL Server: What I Learned