Building a Personal RAG Chatbot with Laravel and pgvector

I’ve been diving into RAG (Retrieval-Augmented Generation) lately—a technique that ensures an AI model doesn't just guess based on its training data, but actually "looks up" information from a specific knowledge base before answering. By retrieving relevant text and passing it as context, the model provides grounded, factual responses based on documents I control.

The Stack

To get this running, I built a pipeline using:

Laravel — Handles the heavy lifting for routing, form processing, and the admin panel where I manage my documents.
PostgreSQL + pgvector — The heart of the retrieval system, storing text chunks alongside their vector embeddings to enable high-speed similarity searches.
Cloudflare Workers AI — Provides both the embedding model (bge-base-en-v1.5) and the final text generation.
Custom Text Chunker — A dedicated script that splits my documents into overlapping chunks to ensure no context is lost during indexing.

How It Works

Ingestion: I paste a document (skills, projects, or background) into the admin panel.
Embedding: The text is split into chunks; each chunk is converted into a vector and stored in PostgreSQL.
Querying: When you ask a question, that query is embedded using the same model.
Retrieval: pgvector runs a cosine similarity search to find the most relevant chunks in the database.
Generation: Those chunks are sent to the LLM as context, which then generates a factual answer.
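
To make the Generation step concrete, here is a minimal sketch of how retrieved chunks could be folded into a prompt. The function name buildPrompt, the instruction wording, and the array shape are assumptions for illustration, not the code running on the site.

```php
<?php

/**
 * Build a grounded prompt from retrieved chunks.
 * Hypothetical helper: the name, the prompt wording, and the array shape are assumptions.
 *
 * @param array<int, array{document_title: string, content: string}> $chunks
 */
function buildPrompt(string $question, array $chunks): string
{
    $context = [];
    foreach (array_values($chunks) as $i => $chunk) {
        // Number each excerpt so the model can tell the sources apart.
        $context[] = sprintf('[%d] %s: %s', $i + 1, $chunk['document_title'], $chunk['content']);
    }

    return "Answer using only the context below. "
        . "If the context does not contain the answer, say you don't know.\n\n"
        . "Context:\n" . implode("\n", $context) . "\n\n"
        . "Question: " . $question;
}

echo buildPrompt('Which framework does the chatbot use?', [
    ['document_title' => 'Projects', 'content' => 'The chatbot is built with Laravel.'],
]);
```

Telling the model to admit when the context has no answer is one common way to discourage guessing, though the exact wording usually needs some trial and error.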

Storing Vectors in PostgreSQL

pgvector adds a native vector column type to PostgreSQL. Inserting an embedding requires casting the value to the vector type explicitly. Standard Eloquent won't handle this, so I used a raw insert:

  DB::insert(
      'INSERT INTO document_chunks
          (document_id, chunk_index, content, embedding, created_at, updated_at)
       VALUES (?, ?, ?, ?::vector, now(), now())',
      [
          $document->id,
          $chunk['index'],
          $chunk['content'],
          '[' . implode(',', $embedding) . ']',
      ]
  );

The embedding is bound as a plain string such as [0.12,-0.03,...], and the ?::vector cast is what tells PostgreSQL to parse that string as a vector. Miss it and you get a type error.
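
A small helper can build that string and catch dimension mismatches before they reach the database. This is a sketch under assumptions: toVectorLiteral is a hypothetical name, and the default of 768 assumes the column was sized for bge-base-en-v1.5 embeddings.

```php
<?php

/**
 * Format a PHP array as a pgvector text literal, e.g. "[0.5,-1,2.25]".
 * Hypothetical helper; the 768 default is an assumption about the column size.
 */
function toVectorLiteral(array $embedding, int $dimensions = 768): string
{
    if (count($embedding) !== $dimensions) {
        throw new InvalidArgumentException(sprintf(
            'Expected %d dimensions, got %d.',
            $dimensions,
            count($embedding)
        ));
    }

    foreach ($embedding as $value) {
        // Reject strings, nulls, NaN and INF so they never reach the SQL cast.
        if (!is_int($value) && !(is_float($value) && is_finite($value))) {
            throw new InvalidArgumentException('Embedding values must be finite numbers.');
        }
    }

    return '[' . implode(',', $embedding) . ']';
}

echo toVectorLiteral([0.5, -1, 2.25], 3);
```

The returned string can be bound to the ?::vector placeholder in place of the inline implode call.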

The Similarity Search Query

At query time, I embed the user's question and run a cosine similarity search using the <=> operator (pgvector's cosine distance). The score is 1 - distance, so higher is better:

  $results = DB::select('
      SELECT
          document_chunks.content,
          documents.title AS document_title,
          1 - (document_chunks.embedding <=> ?::vector) AS score
      FROM document_chunks
      INNER JOIN documents ON documents.id = document_chunks.document_id
      WHERE documents.status = ?
      ORDER BY document_chunks.embedding <=> ?::vector
      LIMIT ?
  ', [$vector, 'indexed', $vector, 5]);

The <=> operator is specific to pgvector and is not something the standard Laravel query builder knows about, which is another reason raw SQL is the right call here.
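
For intuition about what the score column holds, cosine similarity can be computed by hand in plain PHP. The sketch below is only an illustration (cosineSimilarity is a hypothetical helper); in the real pipeline the database does this work.

```php
<?php

/**
 * Cosine similarity of two equal-length vectors: dot(a, b) / (|a| * |b|).
 * Cosine distance, which pgvector's <=> returns, is 1 minus this value.
 */
function cosineSimilarity(array $a, array $b): float
{
    $a = array_values($a);
    $b = array_values($b);

    if (count($a) === 0 || count($a) !== count($b)) {
        throw new InvalidArgumentException('Vectors must be non-empty and the same length.');
    }

    $dot = 0.0;
    $normA = 0.0;
    $normB = 0.0;
    foreach ($a as $i => $value) {
        $dot += $value * $b[$i];
        $normA += $value * $value;
        $normB += $b[$i] * $b[$i];
    }

    if ($normA == 0.0 || $normB == 0.0) {
        throw new InvalidArgumentException('Cosine similarity is undefined for a zero vector.');
    }

    return $dot / (sqrt($normA) * sqrt($normB));
}

// Same direction gives 1, orthogonal gives 0, opposite gives -1.
printf("%.2f %.2f %.2f\n",
    cosineSimilarity([1, 0], [1, 0]),
    cosineSimilarity([1, 0], [0, 1]),
    cosineSimilarity([1, 0], [-1, 0])
);
```

Because the measure ignores vector length, [1, 2, 3] and [2, 4, 6] score as identical, which is why it suits comparing embeddings.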

The Chunking Strategy

Before storing anything, text goes through a chunker that splits it into overlapping segments. Here is the core logic:

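public function chunk(string $text): array
{
    // Split on any run of whitespace; PREG_SPLIT_NO_EMPTY drops empty tokens.
    $words = preg_split('/\s+/', trim($text), -1, PREG_SPLIT_NO_EMPTY);
    $total = count($words);
    // Guard against a zero or negative stride, which would loop forever.
    $step = max(1, $this->chunkSize - $this->overlap);
    $chunks = [];

    for ($i = 0; $i < $total; $i += $step) {
        $slice = array_slice($words, $i, $this->chunkSize);
        $chunks[] = [
            'index' => count($chunks),
            'content' => implode(' ', $slice),
            // Word count, used as a rough stand-in for the token count.
            'token_estimate' => count($slice),
        ];

        // Stop once a chunk reaches the end of the text; otherwise the next
        // pass would emit a short chunk made only of already-indexed overlap.
        if ($i + $this->chunkSize >= $total) {
            break;
        }
    }

    return $chunks;
}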

The overlap means that a sentence straddling a chunk boundary still appears whole in at least one chunk, as long as the sentence is shorter than the overlap. That matters a lot for retrieval accuracy.
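
One way to sanity-check the overlap is to compare the tail of each chunk with the head of the next. The helper below is a hypothetical sketch (chunksOverlap is not part of the chunker) that takes two chunk strings and the configured overlap.

```php
<?php

/**
 * Check that the last words of one chunk match the first words of the next.
 * Hypothetical test helper, not part of the chunker itself.
 */
function chunksOverlap(string $previous, string $next, int $overlap): bool
{
    $prevWords = preg_split('/\s+/', trim($previous), -1, PREG_SPLIT_NO_EMPTY);
    $nextWords = preg_split('/\s+/', trim($next), -1, PREG_SPLIT_NO_EMPTY);

    // The final chunk can be shorter than the overlap, so compare only what exists.
    $k = min($overlap, count($prevWords), count($nextWords));
    if ($k < 1) {
        return false;
    }

    return array_slice($prevWords, -$k) === array_slice($nextWords, 0, $k);
}

var_dump(chunksOverlap('a b c d e', 'd e f g h', 2)); // shared words: "d e"
var_dump(chunksOverlap('a b c d e', 'c d e f', 2));   // misaligned boundary
```

Running a check like this over consecutive chunks in a test catches off-by-one mistakes in the stride before they quietly hurt retrieval.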

Lessons from the Trenches

The biggest takeaway? The hardest part wasn’t the AI—it was the plumbing.

Matching embedding dimensions between queries and stored vectors, formatting pgvector columns in raw SQL, and gracefully handling cases where no relevant context is found took significantly more time than expected.
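
For the "no relevant context" case, one option is to drop low-scoring rows and fall back to an honest reply when nothing is left. The sketch below assumes rows shaped like the similarity query's output; relevantChunks and the 0.5 threshold are illustrative assumptions, not tuned values from this project.

```php
<?php

/**
 * Keep only rows whose similarity score clears a threshold.
 * Hypothetical helper; the 0.5 default is an illustrative assumption.
 *
 * @param array<int, object> $results Rows with a ->score property, as DB::select() returns
 */
function relevantChunks(array $results, float $minScore = 0.5): array
{
    return array_values(array_filter(
        $results,
        static fn (object $row): bool => (float) $row->score >= $minScore
    ));
}

// Made-up rows for illustration only.
$results = [
    (object) ['content' => 'Built a RAG chatbot with Laravel.', 'score' => 0.82],
    (object) ['content' => 'Unrelated note.', 'score' => 0.31],
];

$relevant = relevantChunks($results);

$reply = $relevant === []
    ? "I don't have information about that in my documents."
    : 'Send ' . count($relevant) . ' chunk(s) to the model as context.';

echo $reply;
```

Skipping the model call entirely when nothing clears the threshold is one simple way to avoid a confident answer built on irrelevant context.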

Chunking strategy also makes or breaks the experience. If chunks are too large, you lose precision; if they're too small, they lose meaning. I eventually settled on overlapping chunks of roughly 400 tokens (estimated by word count, as the chunker's token_estimate does), which seems to be the "sweet spot" for conversational retrieval.

Try It Out

The chatbot is now live at https://rag.kurdibuilds.dev/ and fully indexed with my background and project history. Give it a spin! It’s designed to answer confidently when it finds a match and to tell you honestly when it lacks the information, rather than hallucinating an answer.
