This MCP server turns a social video into agent-readable material: evenly spaced JPG frames plus a full transcript. Under the hood, it chains yt-dlp, ffmpeg, and Kyma's Whisper-class ASR behind a single stdio call. It works across YouTube, X, LinkedIn, TikTok, Reddit, Vimeo, and Facebook, and falls back automatically to browser cookies for login-walled posts. You get the raw frames and transcript text rather than a pre-digested summary, so your agent can reason over what was actually shown and said. This is useful when you want Claude to extract architecture diagrams from conference talks, clone a UI demo into working React, or convert a coding walkthrough into runnable project files.
claude mcp add --transport stdio sonpiaz-watch-cli uvx watch-cli
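
To make "evenly spaced JPG frames" concrete, here is a minimal Python sketch of one way such sampling could work. It is not taken from this server's source: the function names, the midpoint sampling rule, the output file pattern, and the JPEG quality setting are all assumptions chosen for illustration. It only builds the ffmpeg argument lists and does not run them.

```python
from typing import List


def frame_timestamps(duration_s: float, count: int) -> List[float]:
    """Return `count` timestamps spread evenly over the video.

    Assumption: each timestamp is the midpoint of one of `count`
    equal-length segments, which avoids the very first and last instants.
    """
    if duration_s <= 0 or count <= 0:
        raise ValueError("duration_s and count must be positive")
    return [duration_s * (i + 0.5) / count for i in range(count)]


def ffmpeg_frame_command(video_path: str, timestamp_s: float, out_path: str) -> List[str]:
    """Build an ffmpeg argument list that grabs one JPG frame at a timestamp.

    Hypothetical helper: the real server may invoke ffmpeg differently.
    """
    return [
        "ffmpeg",
        "-y",                          # overwrite the output file if it exists
        "-ss", f"{timestamp_s:.3f}",   # seek to the timestamp before decoding
        "-i", video_path,              # input video
        "-frames:v", "1",              # emit a single video frame
        "-q:v", "2",                   # JPEG quality (lower is better); assumed value
        out_path,
    ]


if __name__ == "__main__":
    # Example: 4 frames from a 60-second clip named talk.mp4 (hypothetical file).
    for n, t in enumerate(frame_timestamps(60.0, 4), start=1):
        print(" ".join(ffmpeg_frame_command("talk.mp4", t, f"frame_{n:03d}.jpg")))
```

Each printed command could be passed to `subprocess.run` as a list once ffmpeg is installed. Sampling at segment midpoints is only one reasonable choice; sampling at fixed intervals from the start would work as well.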