The goal of the Kinetics dataset is to help the computer vision and machine learning communities advance models for video understanding. Given this large human action classification dataset, it may be possible to learn powerful video representations that transfer to different video tasks.

For information related to this task, please contact:

Dataset

The Kinetics-700-2020 dataset will be used for this challenge. Kinetics-700-2020 is a large-scale, high-quality dataset of YouTube video URLs that cover a diverse range of human-focused actions. It is an approximate superset of Kinetics-400 (released in 2017), Kinetics-600 (released in 2018), and Kinetics-700 (released in 2019).

The dataset consists of approximately 650,000 video clips and covers 700 human action classes, with at least 700 video clips per class. Each clip lasts around 10 seconds and is labeled with a single class. All of the clips have been through multiple rounds of human annotation, and each is taken from a unique YouTube video. The actions cover a broad range of classes, including human-object interactions such as playing instruments and human-human interactions such as shaking hands and hugging.

More information about how to download the Kinetics dataset is available here.
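After downloading the annotations, it can be useful to sanity-check them against the properties described above (at least 700 clips per class, clips of roughly 10 seconds, one clip per YouTube video). The sketch below is illustrative only: it assumes the annotation file is a CSV with the columns `label`, `youtube_id`, `time_start`, `time_end`, and `split`, which you should confirm against the header of the file you actually receive. The function name `summarize_annotations` is hypothetical.

```python
import csv
import io
from collections import Counter

# Assumption: columns are label, youtube_id, time_start, time_end, split.
# Check this against the real header before relying on the results.

def summarize_annotations(csv_text: str, min_per_class: int = 700):
    """Hypothetical helper: per-class clip counts, classes below the minimum,
    mean clip duration in seconds, and YouTube IDs that appear more than once."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = Counter()
    ids = Counter()
    durations = []
    for row in reader:
        counts[row["label"]] += 1
        ids[row["youtube_id"]] += 1
        durations.append(float(row["time_end"]) - float(row["time_start"]))
    # Classes with fewer clips than the stated per-class minimum.
    under = sorted(label for label, n in counts.items() if n < min_per_class)
    mean_duration = sum(durations) / len(durations) if durations else 0.0
    # Each clip should come from a unique video, so this list should be empty.
    duplicate_ids = sorted(yid for yid, n in ids.items() if n > 1)
    return counts, under, mean_duration, duplicate_ids


# Tiny made-up sample (IDs and rows are placeholders, not real annotations).
sample = """label,youtube_id,time_start,time_end,split
hugging,abc123,10,20,train
hugging,def456,0,10,train
shaking hands,ghi789,5,15,train
"""
counts, under, mean_duration, duplicate_ids = summarize_annotations(sample, min_per_class=2)
print(dict(counts), under, mean_duration, duplicate_ids)
```

On the full training split you would keep the default `min_per_class=700`; the small value here only makes the toy sample exercise the check.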

FAQ

1. Possible to use ImageNet checkpoints?
We allow finetuning from public ImageNet checkpoints for the supervised track -- but a link to the specific checkpoint should be provided with each submission.

2. Possible to use optical flow?
Optical flow can be used as long as the flow method was not trained on external datasets, unless those datasets are synthetic.

3. Can we train on test data without labels (e.g. transductive)?
No.

4. Can we use semantic class label information?
Yes, for the supervised track.

5. Will there be special tracks for methods using fewer FLOPs / small models or just RGB vs RGB+Audio in the self-supervised track?
We will ask participants to provide the total number of model parameters and the modalities used and plan to create special mentions for those doing well in each setting, but not specific tracks.
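Reporting the total number of model parameters amounts to summing the element counts of every weight tensor. In PyTorch this is commonly done with `sum(p.numel() for p in model.parameters())`. The framework-free sketch below shows the same arithmetic from a list of tensor shapes. The function name `count_parameters` and the example layer shapes are hypothetical and are not taken from any reference model.

```python
from math import prod

def count_parameters(shapes):
    """Hypothetical helper: total scalar parameters across all tensor shapes.
    prod(()) == 1, so a scalar parameter with shape () counts once."""
    return sum(prod(shape) for shape in shapes)


# Made-up shapes for illustration: a 3D conv stem and a 700-way classifier head.
example_shapes = [
    (64, 3, 3, 7, 7),  # conv weight: out_ch, in_ch, t, h, w  -> 28,224
    (64,),             # conv bias                            -> 64
    (700, 2048),       # linear weight: classes x features    -> 1,433,600
    (700,),            # linear bias                          -> 700
]
total = count_parameters(example_shapes)
print(f"{total:,} parameters")
```

Counting only trainable tensors (for example, excluding frozen or buffer tensors) is a separate choice; state which convention you used alongside the number you submit.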