Structured output
Understand the structured output format returned by the Zapdos indexing process
When you index a video with Zapdos, the system processes the video's frames and audio track and returns a structured output that describes the content in detail.
Return Format
The indexing process returns a single flat array whose elements are of three kinds: transcription segments, per-frame analyses, and a final text summary. The schema below defines each of them.
VideoAnalysis Schema
The structured output from Zapdos follows a strictly defined schema to ensure consistency and reliability. Custom schema support is coming soon.
The output is an array that contains objects of different types:
type IndexResult = Array<
  // Transcription segments
  | {
      start_sec: number;
      end_sec: number;
      transcription: string;
    }
  // Frame analysis
  | {
      at_sec: number;
      description?: string;
      detections?: {
        label: string;
        score: number;
        box: number[];
      }[];
    }
  // Summary
  | TextSummary
>;
interface TextSummary {
  metadata: {
    title: string;
    location: string;
  };
  observations: {
    visual: {
      people: Array<{
        id: string;
        appearance: string;
        clothing: string;
        gestures: string;
        facial_expressions: string;
        role: string;
      }>;
      objects: Array<{
        id: string;
        type: string;
        description: string;
        position: string;
      }>;
      setting: string;
      text_graphics: string[];
      scene_transitions: string[];
    };
    technical: {
      camera_angles: string[];
      camera_movements: string[];
      lighting: string;
      editing: string[];
      special_effects: string[];
      production_quality: "professional" | "amateur" | "surveillance" | "unknown";
    };
  };
  timeline: Array<{
    start_ms: string;
    end_ms: string;
    description: string;
    actions: string[];
    participants: string[];
  }>;
  narrative_analysis?: {
    structure: string;
    conflict: string;
    resolution: string;
    themes: string[];
    messages: string[];
    bias_perspective: string;
  };
  contextual_analysis?: {
    time_setting: string;
    environment: string;
    weather_conditions: string;
    social_dynamics: string;
    activities: string[];
    emotions: string[];
    mood_atmosphere: string;
    cultural_references: string[];
    situational_significance: string;
  };
  interpretation?: {
    summary: string;
    key_points: string[];
    importance: string;
    intended_audience: string;
    purpose: "educational" | "promotional" | "documentary" | "entertainment" | "personal" | "other";
    open_questions: string[];
    missing_context: string;
  };
}
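The union members share no common discriminant field, so the reliable way to tell them apart is to test for a key unique to each variant. Here is a minimal sketch of type guards built on the types above; the guard names and type aliases are illustrative, not part of the SDK:

// Illustrative aliases for the two inline union members defined above.
type TranscriptionSegment = {
  start_sec: number;
  end_sec: number;
  transcription: string;
};
type FrameAnalysis = {
  at_sec: number;
  description?: string;
  detections?: { label: string; score: number; box: number[] }[];
};

// Each guard checks a key that only its variant carries.
function isTranscription(item: IndexResult[number]): item is TranscriptionSegment {
  return 'transcription' in item;
}

function isFrame(item: IndexResult[number]): item is FrameAnalysis {
  return 'at_sec' in item;
}

function isSummary(item: IndexResult[number]): item is TextSummary {
  return 'metadata' in item;
}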
Example Output
Here's a shortened example of the structured output produced by processing an actual video:
[
  {
    "at_sec": 0,
    "description": "Two men are seated in front of a fireplace, engaged in a conversation. The man on the left is wearing a gray blazer over a blue shirt, while the man on the right is dressed in a black suit with a white shirt. They are seated on wooden chairs, with the man on the left gesturing with his hands as he speaks. The fireplace, which is black and ornate, is adorned with a vase of pink flowers and a painting on the wall behind it.",
    "detections": [
      {
        "label": "person",
        "score": 0.96044921875,
        "box": [975.0, 324.25, 1669.0, 1071.0]
      },
      {
        "label": "person",
        "score": 0.9599609375,
        "box": [128.88, 352.75, 724.0, 1071.0]
      },
      {
        "label": "chair",
        "score": 0.89794921875,
        "box": [122.12, 829.0, 849.5, 1077.0]
      }
    ]
  },
  {
    "start_sec": 4079.4,
    "end_sec": 6000,
    "transcription": "In this scene, we see two men engaged in a conversation in a room with a fireplace. One man is wearing a gray suit jacket over a blue shirt, while the other is dressed in a black suit with a white shirt. They are seated on wooden chairs, with the man on the left gesturing with his hands as he speaks."
  },
  {
    "metadata": {
      "title": "Conversation between two men",
      "location": "Indoor setting, possibly a study or living room"
    },
    "observations": {
      "visual": {
        "people": [
          {
            "id": "person1",
            "appearance": "Man on the left, wearing glasses.",
            "clothing": "Gray suit jacket/blazer over a blue shirt with white patterns.",
            "gestures": "Gesturing with hands, hands clasped together",
            "facial_expressions": "Serious, thoughtful",
            "role": "Speaker/participant in conversation"
          }
        ]
      }
    },
    "timeline": [
      {
        "start_ms": "4079400",
        "end_ms": "6000000",
        "description": "Two men are engaged in a conversation in a room with a fireplace.",
        "actions": ["Conversing", "Listening", "Gesturing"],
        "participants": ["person1", "person2"]
      }
    ]
  }
]
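Note the unit mismatch across item types: frame and transcription items use numeric seconds (at_sec, start_sec, end_sec), while timeline entries store millisecond offsets as strings (start_ms, end_ms). A one-line helper (hypothetical, not part of the SDK) normalizes them:

// Timeline boundaries are millisecond strings; convert them to numeric seconds.
function msStringToSec(ms: string): number {
  return Number(ms) / 1000;
}

// msStringToSec("4079400") === 4079.4, matching the transcription segment's start_sec above.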
Accessing the Output
Programmatic Usage
When using the SDK programmatically, you can access the structured output as follows:
import ZapdosClient from 'zapdos-js';

const client = new ZapdosClient();

async function indexVideo() {
  try {
    // index() yields progress events; the 'done' event carries the result array
    for await (const event of client.index("path/to/your/video.mp4", { interval: 30 })) {
      if (event.type === 'done') {
        const result = event.value;
        console.log(`Indexed ${result.length} items`);

        // Distinguish the three item types by their unique keys
        result.forEach((item) => {
          if ('at_sec' in item) {
            console.log(`Frame at ${item.at_sec}sec: ${item.description}`);
            if (item.detections) {
              console.log(`  Objects detected: ${item.detections.length}`);
            }
          } else if ('start_sec' in item && 'transcription' in item) {
            console.log(`Transcription segment from ${item.start_sec}sec to ${item.end_sec}sec`);
          } else if ('metadata' in item) {
            console.log(`Summary: ${item.metadata.title}`);
          }
        });
      }
    }
  } catch (error) {
    console.error('Indexing failed:', error);
  }
}

indexVideo();
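Building on the same result array, here is a sketch of a typical post-processing step: collecting high-confidence detections across all frames. The function name and the 0.9 threshold are illustrative; it assumes the IndexResult shape from the schema above.

// Gather detections above a score threshold, keyed by the frame they appear in.
// The cutoff is arbitrary; tune it to your use case.
function highConfidenceDetections(result: IndexResult, threshold = 0.9) {
  return result
    .filter((item): item is Extract<IndexResult[number], { at_sec: number }> => 'at_sec' in item)
    .flatMap((frame) =>
      (frame.detections ?? [])
        .filter((d) => d.score >= threshold)
        .map((d) => ({ at_sec: frame.at_sec, label: d.label, score: d.score }))
    );
}

// With the example output above, this returns the two "person" detections at 0s
// and drops the "chair" (score 0.897 < 0.9).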
CLI Usage
When using the CLI, you'll see progress updates and a final summary:
$ npx zapdos index video.mp4 --interval 60
Indexing video file: /home/user/video.mp4
📤 Preparing media for upload...
Processing at 60s intervals, saved to '/tmp/tmpnstvlfna/items'
Uploading items...
✓ Created video file record with ID: 03dfba34-9449-4c94-b0af-1fbe275671c8
💾 Indexed 34 items into database
🔍 Created object detection job with ID: fbf6b26c-926c-4595-9454-6286bbcf8831
✅ Completed object detection job with ID: fbf6b26c-926c-4595-9454-6286bbcf8831
⚙️ Created image description job with ID: e800825f-bbcf-4b16-a1dd-f0b9a1cdc3fe
✅ Completed image description job with ID: e800825f-bbcf-4b16-a1dd-f0b9a1cdc3fe
📝 Created summary job with ID: e4a13f53-4273-4845-8d38-2d59f3e1538b
✅ Completed summary job with ID: e4a13f53-4273-4845-8d38-2d59f3e1538b
🎉 Video indexing completed successfully!