Deepfake Detection Challenge (DFDC) on Kaggle


  • Deepfake context: GANs, the dataset-size challenge, etc.
  • Processing the videos
  • Frame extraction
  • Face extraction
  • ResNeXt-50 training
  • Inference pipeline: video -> frames -> faces -> prediction for each face/frame -> average -> output
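
The inference pipeline in the last bullet can be sketched roughly as follows. This is only an illustration of the averaging step, not the code used in the project: the names predict_video, detect_faces and predict_face are hypothetical, and the face detector and the trained model are passed in as plain callables so that the sketch stays independent of any particular library.

```python
from statistics import mean
from typing import Any, Callable, Iterable, Optional, Sequence


def predict_video(
    frames: Iterable[Any],
    detect_faces: Callable[[Any], Sequence[Any]],  # hypothetical: frame -> face crops
    predict_face: Callable[[Any], float],          # hypothetical: face crop -> P(fake)
) -> Optional[float]:
    """Average the per-face fake probabilities over all sampled frames.

    Returns None when no face was found in any frame, so the caller can
    decide on a fallback (for example a neutral score of 0.5).
    """
    scores = [
        predict_face(face)
        for frame in frames
        for face in detect_faces(frame)
    ]
    if not scores:
        return None
    return mean(scores)
```

Frames without a detected face simply contribute nothing to the average, which is one possible design choice; another would be to count them with a neutral score.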

The task was to classify videos as genuine or fake (generated by a GAN) from a 500 GB dataset. To reduce the dataset size, I first extracted a few frames per video, then used image segmentation to extract the faces contained in those frames, which is where the glitches generated by the GAN showed up. I used fastai to train a CNN (a ResNeXt-50) on those faces and reached 75% accuracy by averaging the predictions over the frames of each video.
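
One simple way to pick "a few frames per video" is to take them at evenly spaced positions. The sketch below is an assumption about how this could be done, since the write-up does not say which frames were chosen. The function name sample_frame_indices and its parameters are hypothetical.

```python
from typing import List


def sample_frame_indices(total_frames: int, num_samples: int) -> List[int]:
    """Return up to num_samples evenly spaced frame indices.

    Each index is the midpoint of one of num_samples equal segments of the
    video, which avoids always picking the very first frame.
    """
    if total_frames <= 0 or num_samples <= 0:
        return []
    # Never ask for more frames than the video contains.
    num_samples = min(num_samples, total_frames)
    step = total_frames / num_samples
    return [int(step * i + step / 2) for i in range(num_samples)]
```

The returned indices can then be handed to whatever video reader is used to decode only those frames, which is what keeps the extracted dataset small compared with the original videos.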