~/krishna_dhakal
#AI#AWS#RAG#Qdrant#Teams

Building a Real-Time Video Assistant with RAG on AWS

> April 15, 2025

One of the most interesting projects I have shipped is a Video Assistant that autonomously joins Microsoft Teams meetings and delivers real-time news briefings relevant to the ongoing conversation. This post explains the architecture from ingestion to delivery.


> The Problem


Enterprise teams spend hours manually researching context before meetings. The goal was to build a system that automatically surfaces the most relevant news and internal documents during a live meeting — without any human curation.


> Architecture Overview


The system has four main stages:


  • **Ingestion** — scrape news sources on a schedule, chunk and embed documents
  • **Indexing** — store embeddings in Qdrant with rich metadata
  • **Retrieval** — semantic search triggered by meeting transcript segments
  • **Delivery** — a bot that joins Teams calls and posts briefings in the chat
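The ingestion stage chunks documents before embedding, a step the code below doesn't show. A minimal sketch of a fixed-size word-window chunker with overlap (the `chunk_text` helper and its window sizes are illustrative, not the production code):

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into windows of chunk_size words, each sharing
    `overlap` words with the previous chunk for context continuity."""
    words = text.split()
    if not words:
        return []
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # last window already covers the tail
    return chunks
```

Overlap matters because a sentence split across a chunk boundary would otherwise be unretrievable from either side.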

> Stage 1: News Ingestion with AWS Lambda


import feedparser
from qdrant_client import QdrantClient
from qdrant_client.models import PointStruct

FEEDS = [
    "https://feeds.reuters.com/reuters/topNews",
    "https://rss.nytimes.com/services/xml/rss/nyt/HomePage.xml",
]

def handler(event, context):
    """Scheduled Lambda: pull the latest entries from each feed and index them."""
    articles = []
    for feed_url in FEEDS:
        feed = feedparser.parse(feed_url)
        for entry in feed.entries[:20]:
            # Not every feed entry carries every field, so fall back to "".
            articles.append({
                "title": entry.get("title", ""),
                "summary": entry.get("summary", ""),
                "published": entry.get("published", ""),
            })
    embed_and_index(articles)
    return {"indexed": len(articles)}

A CloudWatch Events (now Amazon EventBridge) rule triggers this Lambda every 15 minutes, keeping the index fresh.
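Wiring that schedule up programmatically might look like the following sketch; the rule and target names are made up, and the Lambda ARN is a placeholder you would supply:

```python
def schedule_expression(rate_minutes: int) -> str:
    """Build an EventBridge rate() expression; the singular unit is
    required when the value is 1."""
    unit = "minute" if rate_minutes == 1 else "minutes"
    return f"rate({rate_minutes} {unit})"

def schedule_ingestion(lambda_arn: str, rate_minutes: int = 15) -> None:
    """Create an EventBridge rule that invokes the ingestion Lambda."""
    import boto3  # deferred so the pure helper above has no dependencies
    events = boto3.client("events")
    events.put_rule(
        Name="news-ingestion-schedule",  # hypothetical rule name
        ScheduleExpression=schedule_expression(rate_minutes),
        State="ENABLED",
    )
    events.put_targets(
        Rule="news-ingestion-schedule",
        Targets=[{"Id": "news-ingestion-lambda", "Arn": lambda_arn}],
    )
```

In practice you would also grant EventBridge permission to invoke the function (`lambda add-permission`), which is omitted here.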


> Stage 2: Embedding and Indexing with Qdrant


from langchain_aws import BedrockEmbeddings
from qdrant_client.models import Distance, VectorParams

embeddings = BedrockEmbeddings(model_id="amazon.titan-embed-text-v1", region_name="us-east-1")
client = QdrantClient(url=QDRANT_URL, api_key=QDRANT_API_KEY)

# Titan v1 embeddings are 1536-dimensional. Create the collection once rather
# than recreating it on every cold start, which would wipe the existing index.
if not client.collection_exists("news"):
    client.create_collection(
        collection_name="news",
        vectors_config=VectorParams(size=1536, distance=Distance.COSINE),
    )

import uuid

def embed_and_index(articles):
    points = []
    for article in articles:
        text = f"{article['title']}. {article['summary']}"
        vector = embeddings.embed_query(text)
        # Deterministic IDs keyed on the title dedupe re-fetched articles,
        # instead of blindly overwriting IDs 0..N on every run.
        point_id = str(uuid.uuid5(uuid.NAMESPACE_URL, article["title"]))
        points.append(PointStruct(id=point_id, vector=vector, payload=article))
    client.upsert(collection_name="news", points=points)

> Stage 3: Real-Time Retrieval


During a meeting, the Teams bot captures transcript segments and triggers retrieval:


def retrieve_relevant_news(transcript_segment: str, top_k: int = 5):
    query_vector = embeddings.embed_query(transcript_segment)
    results = client.search(
        collection_name="news",
        query_vector=query_vector,
        limit=top_k,
        score_threshold=0.72,
    )
    return [r.payload for r in results]

The score threshold (0.72) was tuned empirically to filter noise while keeping high-signal results.
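That tuning can be made systematic with a small sweep over labeled query results. The harness below is a sketch; the labeled pairs and helper names are hypothetical, not the evaluation set used in the project:

```python
def precision_at_threshold(scored: list[tuple[float, bool]], threshold: float) -> float:
    """scored is a list of (similarity, is_relevant) pairs for one or more
    labeled queries; returns precision among the results the threshold keeps."""
    kept = [relevant for score, relevant in scored if score >= threshold]
    return sum(kept) / len(kept) if kept else 0.0

def sweep(scored: list[tuple[float, bool]], thresholds: list[float]) -> dict:
    """Precision at each candidate threshold, for picking the knee point."""
    return {t: round(precision_at_threshold(scored, t), 3) for t in thresholds}
```

Pairing this with a recall count (how many relevant items the threshold drops) makes the noise-versus-coverage trade-off explicit instead of eyeballed.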


> Stage 4: Teams Bot Delivery


The bot uses the Microsoft Bot Framework SDK to join meetings and post briefings:


from botbuilder.core import MessageFactory, TurnContext

async def on_message_activity(self, turn_context: TurnContext):
    transcript = turn_context.activity.text
    news_items = retrieve_relevant_news(transcript)
    if news_items:
        summary = format_briefing(news_items)
        await turn_context.send_activity(MessageFactory.text(summary))
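`format_briefing` is referenced above but not shown; a plausible minimal version just numbers the headlines, using the same payload fields the ingestion stage stores:

```python
def format_briefing(news_items: list[dict]) -> str:
    """Render retrieved articles as a short numbered briefing for the chat.
    This is an assumed implementation, not the production formatter."""
    lines = ["Relevant news for this discussion:"]
    for i, item in enumerate(news_items, start=1):
        lines.append(f"{i}. {item['title']} ({item['published']})")
    return "\n".join(lines)
```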

> Lessons Learned


  1. **Rate-limit your ingestion** — news APIs and RSS feeds have quotas; use SQS to buffer bursts.
  2. **Tune your score threshold carefully** — too low gives noise, too high misses useful context.
  3. **Async delivery matters** — posting synchronously in the bot handler causes timeouts in Teams; always use background tasks.
  4. **Monitor your vector store** — Qdrant collection sizes grow fast; implement TTL-based cleanup for stale news.
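The TTL cleanup from lesson 4 can run as a second scheduled job. The sketch below assumes each payload carries a numeric `published_ts` epoch-seconds field; the ingestion code above stores only a string date, so that field is an assumed extension:

```python
import time

SECONDS_PER_DAY = 86_400

def stale_cutoff(now: float, max_age_days: int) -> float:
    """Epoch seconds before which an article counts as stale."""
    return now - max_age_days * SECONDS_PER_DAY

def purge_stale_news(client, max_age_days: int = 7) -> None:
    """Delete every point whose (assumed) published_ts field is too old."""
    # Deferred import so the cutoff helper above stays dependency-free.
    from qdrant_client.models import FieldCondition, Filter, FilterSelector, Range
    client.delete(
        collection_name="news",
        points_selector=FilterSelector(
            filter=Filter(must=[
                FieldCondition(
                    key="published_ts",  # assumed epoch-seconds payload field
                    range=Range(lt=stale_cutoff(time.time(), max_age_days)),
                )
            ])
        ),
    )
```

Deleting by filter keeps the job stateless: no bookkeeping of which IDs were written when.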

This project showed me that the hardest part of a RAG system is not the LLM — it's data freshness and retrieval quality tuning.