Zapdos Labs

How It Works

Learn how Zapdos intelligently indexes video content using agentic exploration

Zapdos employs an agentic indexing approach, similar to how advanced AI assistants like Claude Code explore codebases. Rather than processing every frame sequentially, Zapdos samples frames in parallel via random access and analyzes key moments in your video to build a comprehensive understanding.

Agentic Video Indexing

The Zapdos SDK functions as an intelligent agent that explores your video content in three phases:

  1. Sampling:
    The SDK performs a first-pass exploration by seeking (random access) to frames at fixed time intervals. This provides a lightweight overview of the video without processing every frame.

  2. Signal-Based Analysis:
    Each sampled frame from the initial pass is analyzed using computer vision models to extract:

    • Visual hashes for duplicate detection
    • Object detection to identify important visual elements
    • Scene change detection to focus on meaningful transitions

  3. Adaptive Exploration:
    Based on the initial signals, the server may request denser sampling of specific segments to answer natural questions like “Do these two people actually shake hands?”.
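The sampling and adaptive-exploration steps above can be sketched as follows. This is a minimal illustration, not the actual Zapdos SDK API: the function names, the fixed densification factor, and the idea of flagging segments by index are all assumptions made for the example.

```python
def sample_timestamps(duration_s: float, interval_s: float) -> list[float]:
    """First pass: timestamps at fixed intervals across the video."""
    n = int(duration_s // interval_s) + 1
    return [i * interval_s for i in range(n)]

def densify(timestamps: list[float], flagged: set[int], factor: int = 4) -> list[float]:
    """Adaptive pass: subdivide each segment whose start index was flagged.

    `flagged` holds indices of first-pass timestamps whose following
    segment showed an interesting signal (e.g. a scene change).
    """
    out = []
    for i, t in enumerate(timestamps[:-1]):
        out.append(t)
        if i in flagged:
            # Insert (factor - 1) evenly spaced samples inside the segment.
            step = (timestamps[i + 1] - t) / factor
            out.extend(t + k * step for k in range(1, factor))
    out.append(timestamps[-1])
    return out
```

For a 60-second video sampled every 10 seconds, flagging the segment starting at index 2 (the 20–30 s window) would add samples at 22.5 s, 25 s, and 27.5 s, which is where a question like "do these two people shake hands?" could then be answered.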

Intelligent Processing Pipeline

The client-side SDK sends a sparse collection of strategically selected frames (the first pass) to our server infrastructure, where they are processed by state-of-the-art vision-language models (VLMs) and computer-vision (CV) models. If the server identifies promising leads, it guides the client to sample more densely in specific regions of the video.
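The client-server feedback loop can be sketched roughly like this. All names here are illustrative assumptions, not the real SDK surface: `frames_at` stands in for random-access frame decoding, and `analyze_batch` stands in for the server call that returns time windows worth exploring further.

```python
def index_video(frames_at, initial_ts, analyze_batch, max_rounds=3):
    """Iteratively send sparse frames; densify wherever the server flags a lead.

    frames_at(t)        -> frame data at timestamp t (hypothetical decoder)
    analyze_batch(...)  -> list of (start, end) windows to explore, or []
    """
    ts = list(initial_ts)
    for _ in range(max_rounds):
        leads = analyze_batch([frames_at(t) for t in ts])  # server round-trip
        if not leads:
            break  # nothing promising left; indexing converges
        for start, end in leads:
            # Add three extra samples inside each flagged window.
            step = (end - start) / 4
            ts.extend(start + k * step for k in range(1, 4))
        ts = sorted(set(ts))
    return ts
```

The `max_rounds` cap bounds the number of client-server round trips, so even a video full of promising leads terminates with a predictable cost.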

By avoiding full-frame, sequential analysis and instead combining interval-based exploration with adaptive deeper dives, Zapdos dramatically reduces both processing time and compute cost—while still capturing the most meaningful details for indexing.
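A back-of-envelope comparison shows the scale of the savings. The figures below (30 fps, a one-hour video, a 5-second first-pass interval) are illustrative assumptions, not measured Zapdos defaults:

```python
fps, duration_s, interval_s = 30, 3600, 5

total_frames = fps * duration_s          # 108,000 frames in a sequential pass
first_pass = duration_s // interval_s + 1  # 721 sparse samples
reduction = total_frames / first_pass      # roughly a 150x reduction
```

Even after the adaptive pass densifies a handful of segments, the number of frames actually decoded and analyzed stays orders of magnitude below a full sequential scan.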