Depending on whether you want to annotate a video with subtitles and cute emoji (e.g., to put on YouTube), or to annotate objects or events in the video in order to train machine learning algorithms on the annotated data, you have to choose between the two types of tools below.
For machine learning purposes, these annotations are usually extremely time consuming. (For example, a self-driving startup mentioned in a podcast that for every hour of driving, they invest over 600 hours in annotating the collected data.) Your choice of annotation tool can therefore have a huge effect on time and cost. Annotating images is far simpler than annotating video – video is a complex data structure, and everything about it is harder. To annotate video, you have to handle incompatible codecs and video players, synchronize annotations between frames, track multiple annotated objects with varying states across frames, and design a UX that supports complex operations. To top it all off, every minute of 30fps video contains 1800 frames that need to be annotated.
This is a guide to annotating videos for machine learning. Here are the highlights:
If you're looking to annotate videos for an ML training dataset, what are your options? Here they are, from worst to best.
- Pick a sample from your videos, extract all the frames, and annotate them as images. We recommend against this: you lose all the benefits inherent to the video format while incurring the cost of annotating a huge number of images. Even with a team of annotators, this approach is inefficient.
- Take some videos, get a video annotation tool, and make a personal effort over a few days to annotate them (as videos). This likely won't work: even one short video can take many hours to annotate.
- Use an available relevant dataset for training. Depending on your specific problem and how similar your data is to the available training dataset, this is a great shortcut to take. If you have this option, go for it.
- Pick a sample from your videos, get a video annotation tool, hire an in-house/remote team, and annotate them (as videos). This can work. Keep reading to learn about annotation tools.
If you decided to use a video annotation tool, here are the important features to consider in the tools you evaluate:
- Annotating key frames. A 30fps video has 1800 frames per minute, but subsequent frames are usually highly correlated: you don't want to (and don't have to) annotate each and every frame from scratch. At a minimum, the tool should let you annotate key frames and interpolate between them.
- Native video format. You don’t want to extract all the frames from the video to be able to annotate them – if your tool needs this step, it is a sure sign this tool is, in fact, annotating images and not videos.
- Tracking and ML integration. Automated tracking of annotated objects (e.g., using optical flow) can save a lot of annotation time. The ability to use predictions from machine learning models as initial annotations (to be corrected by experts) is another time saver. Tools that utilize active learning, where the annotator is essentially teaching the ML model, can be especially useful.
- Consistent IDs. When there is more than one object annotated, objects should have consistent IDs for the duration of the video. This is helpful any time you want to track objects throughout a video, and becomes crucial if objects move in and out of the frame during the video.
- Distributed annotation. Every minute of video can take hours to annotate. The option to share the annotation workload among a team of workers is extremely beneficial. Per-worker setup and configuration, if any, should be minimal.
- Segmenting long videos. Another aspect of distributing a large workload is the ability to split long videos into shorter segments (each segment can be annotated by different workers), and then merge the resulting annotations, with consistent quality.
- Multiple annotators. Compare and merge multiple annotations from multiple workers on the same video segment, to reduce annotation errors and improve quality.
- Customized labels. Can you customize the annotation tool to add your own labels?
- Customized attributes. Can you customize the annotation tool to attach custom attributes to objects (e.g., traffic light color)?
- Annotation types. Does the tool support the shapes you need: bounding boxes (bbox), lines, circles, points, 3D boxes?
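The key-frame idea above can be sketched in a few lines of Python. This is a minimal illustration of linear interpolation between two annotated key frames, not any particular tool's implementation; the (x, y, w, h) box format and frame numbering are assumptions:

```python
def interpolate_box(kf_a, kf_b, frame):
    """Linearly interpolate a bounding box between two key frames.

    kf_a, kf_b: (frame_index, (x, y, w, h)) key-frame annotations.
    frame: the in-between frame to estimate a box for.
    """
    (fa, box_a), (fb, box_b) = kf_a, kf_b
    t = (frame - fa) / (fb - fa)  # 0.0 at kf_a, 1.0 at kf_b
    return tuple(a + t * (b - a) for a, b in zip(box_a, box_b))

# Annotate only frames 0 and 30; frames 1..29 are filled in automatically.
key_a = (0, (100.0, 50.0, 40.0, 40.0))
key_b = (30, (160.0, 80.0, 40.0, 40.0))
mid_box = interpolate_box(key_a, key_b, 15)  # box halfway between the two
```

Real tools refine this with tracking or spline interpolation, but even plain linear interpolation turns 30 annotations into 2.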
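The segment-splitting workflow above boils down to re-offsetting each worker's frame-local annotations back onto the full video's timeline. A minimal sketch, assuming annotations keyed by frame index (the data shapes are illustrative, not from any specific tool):

```python
def merge_segments(segments):
    """Merge per-segment annotations back into one video timeline.

    segments: list of (start_frame, annotations), where annotations is
    a dict {local_frame: payload} produced by the worker who annotated
    that segment. Frames are re-offset to global video time.
    """
    merged = {}
    for start, annotations in segments:
        for local_frame, payload in annotations.items():
            merged[start + local_frame] = payload
    return merged

# Two workers annotated frames 0-1 and frames 2+ of the same video.
timeline = merge_segments([(0, {0: "car", 1: "car"}), (2, {0: "car, pedestrian"})])
```

The hard part a real tool must add on top of this is reconciling object IDs across the segment boundary, e.g. with the IoU matching discussed earlier, so one physical object doesn't get two IDs.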
We have reviewed some of the available video annotation tools, and here is what we’ve found:
ReClipped is a neat tool that lets you take notes, share notes, and share clips from educational videos. ReClipped blends the best aspects of TurboNote, VideoNot.es, and Pinterest into one slick system. With a ReClipped account you can clip sections of videos that you find on YouTube, Vimeo, DailyMotion, Coursera, and TED. In addition to clipping you can create time-stamped notes about the videos that you clip. The Pinterest-like aspect of ReClipped appears if you choose to share your clips and notes on a board. ReClipped boards can be shared publicly or kept private. See ReClipped in action in this video.
TurboNote is a Chrome extension you can use to take notes while watching any video. To take notes, you just click the TurboNote extension icon in your browser and start writing in the menu that appears on the right side of the screen. Any notes you type are automatically time-stamped. Notes can be edited while the video is playing or stopped, and all notes can be shared via social media and email.
Timelinely is the newest tool of this type for annotating videos hosted on YouTube. Getting started is easy: just copy a YouTube URL into the Timelinely homepage. Once you have entered the URL for a video, a new screen appears that lets you add tags or annotations to the video's timeline. You can do this while the video plays, or jump to any point in the video to add annotations. Annotations can include text or images.