A feed-forward 3D reconstruction model that turns streaming video or image sequences into point clouds at around 20 FPS. It uses a Geometric Context Transformer with paged KV cache attention to handle sequences of more than 10,000 frames without exhausting memory. The CLI is straightforward: point it at images or a video and get an interactive 3D viewer in your browser. Sky masking for outdoor scenes is a nice touch, since sky points are usually noise. Keyframe intervals and a windowed mode keep long sequences tractable. Because the model is feed-forward and streaming, there are no iterative optimization loops, which is the main selling point here. Built for real reconstruction work, not just demos, and the example scenes suggest it handles both tourist snapshots and structured captures reasonably well.
npx skills add https://github.com/aradotso/trending-skills --skill lingbot-map-3d-reconstruction
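
To picture how a keyframe interval and a windowed mode can keep a long sequence tractable, here is a toy sketch in plain Python. It is not the tool's code or API: the function names `select_keyframes` and `windowed_context`, the fixed-stride keyframe rule, and the window size are all assumptions made for illustration, and frame indices stand in for the cached attention state that the real model would hold.

```python
from collections import deque
from typing import Iterable, Iterator, List, Tuple


def select_keyframes(num_frames: int, interval: int) -> List[int]:
    """Return the index of every `interval`-th frame (hypothetical keyframe rule)."""
    if interval < 1:
        raise ValueError("interval must be >= 1")
    return list(range(0, num_frames, interval))


def windowed_context(
    keyframes: Iterable[int], window: int
) -> Iterator[Tuple[int, List[int]]]:
    """Yield (current keyframe, cached keyframes it could attend to).

    The deque discards its oldest entry once it holds `window` items,
    so the cached context never grows beyond `window` entries no matter
    how long the sequence is.
    """
    cache = deque(maxlen=window)
    for idx in keyframes:
        cache.append(idx)
        yield idx, list(cache)


if __name__ == "__main__":
    # Assumed numbers: 10,000 frames, every 5th frame kept, window of 64.
    keys = select_keyframes(num_frames=10_000, interval=5)
    steps = list(windowed_context(keys, window=64))
    print(len(keys), max(len(ctx) for _, ctx in steps))
```

With these assumed settings the script prints `2000 64`: the interval cuts the number of frames that enter the cache, and the window caps how much context is kept at any one time.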