TwelveLabs video understanding models are now available in Amazon Bedrock
AWS has added native video embedding and video understanding models to Amazon Bedrock. This opens up many use cases for which I previously reached for Gemini models. One example is an educational system that watches a learner perform a task and provides feedback based on the course materials.
Bedrock already had workflows for video understanding, but they were exactly that: workflows, not native models. You can imagine what they looked like: take a video, split it into frames, feed the frames to a VLM, try to maintain temporal consistency, despair, come to terms with the system’s performance, and go on vacation.
Now, however, there are not one, but two different native video models:
- TwelveLabs Marengo, for creating video embeddings;
- TwelveLabs Pegasus, for video-based text generation.
Pricing depends on whether your video has an audio track, but you should expect roughly $2.50-$3.00 per hour of video for Marengo and $1.80 per hour for Pegasus.
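As a rough sketch of what a Pegasus call looks like through the Bedrock runtime API: the video lives in S3 and the request references it rather than inlining it. Note that the model ID, the request schema (`inputPrompt`, `mediaSource`), and the shape of the response are assumptions here; check the Bedrock model catalog and the TwelveLabs request/response reference for the exact values in your region.

```python
import json

# Assumed model ID -- verify against the Bedrock model catalog for your region.
PEGASUS_MODEL_ID = "us.twelvelabs.pegasus-1-2-v1:0"


def build_pegasus_request(prompt: str, s3_uri: str, bucket_owner: str) -> str:
    """Build the JSON body for a Pegasus video-understanding request.

    The video is referenced from S3 rather than embedded in the payload,
    since videos are far too large to inline. Field names are assumed.
    """
    return json.dumps({
        "inputPrompt": prompt,
        "mediaSource": {
            "s3Location": {"uri": s3_uri, "bucketOwner": bucket_owner},
        },
    })


def describe_video(prompt: str, s3_uri: str, bucket_owner: str) -> str:
    """Ask Pegasus a question about a video stored in S3."""
    # boto3 is imported lazily so the payload helper stays usable offline.
    import boto3

    client = boto3.client("bedrock-runtime")
    response = client.invoke_model(
        modelId=PEGASUS_MODEL_ID,
        body=build_pegasus_request(prompt, s3_uri, bucket_owner),
    )
    # Assumed response shape: a JSON document with a "message" field.
    return json.loads(response["body"].read())["message"]
```

A call would then look like `describe_video("What steps does the learner perform?", "s3://my-bucket/lesson-04.mp4", "123456789012")`, which maps nicely onto the educational-feedback scenario above.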