Converts videos into AI-digestible data by extracting timestamped frames and transcripts. You'd reach for this when you need an AI agent to analyze video content without manually scrubbing through footage or transcribing speech. It processes local or remote video files and outputs structured data that language models can consume directly. It's useful for building agents that summarize meetings, analyze tutorials, or search video libraries by content. The extraction happens locally through the CLI, so you're not sending videos to third-party services. Think of it as a preprocessing layer that turns opaque video files into queryable, timestamp-indexed frames and text that Claude can reason about.
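To make "timestamp-indexed frames and text" concrete, here is a minimal Python sketch of how such output might be aligned and queried once it has been extracted. It is not the tool's actual output format or API: the `Frame` and `Segment` classes, the `align` and `search` helpers, the file names, and the sample data are all hypothetical, chosen only to illustrate the idea.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class Frame:
    timestamp: float  # seconds from the start of the video
    path: str         # hypothetical path to the extracted image


@dataclass
class Segment:
    start: float      # seconds at which the speech begins
    end: float        # seconds at which the speech ends
    text: str         # transcribed text for this span


def align(frames, segments):
    """Pair each frame with the transcript segment spoken at its timestamp."""
    segments = sorted(segments, key=lambda s: s.start)
    starts = [s.start for s in segments]
    timeline = []
    for frame in sorted(frames, key=lambda f: f.timestamp):
        # Index of the last segment that starts at or before this frame.
        i = bisect_right(starts, frame.timestamp) - 1
        spoken = None
        if i >= 0 and frame.timestamp < segments[i].end:
            spoken = segments[i].text
        timeline.append({"t": frame.timestamp, "frame": frame.path, "text": spoken})
    return timeline


def search(timeline, keyword):
    """Return timestamps of frames whose aligned text mentions the keyword."""
    needle = keyword.lower()
    return [e["t"] for e in timeline if e["text"] and needle in e["text"].lower()]


# Hypothetical sample data standing in for extracted output.
frames = [
    Frame(0.0, "frame_0000.jpg"),
    Frame(5.0, "frame_0005.jpg"),
    Frame(12.0, "frame_0012.jpg"),
]
segments = [
    Segment(0.0, 4.0, "Welcome to the meeting."),
    Segment(4.0, 9.5, "First, the budget review."),
]

timeline = align(frames, segments)
hits = search(timeline, "budget")  # [5.0]
```

A structure like this is what lets an agent answer "where in the video is the budget discussed?" by looking up text and jumping to the matching frame, rather than scanning the raw video. Frames with no overlapping speech (the one at 12.0 seconds here) simply carry `None` as their text.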