data-preprocessing-pipelines

Here are 3 public repositories matching this topic...

bohyy / Video-Processing-Pipeline

Video quality assessment and filtering pipeline for ML training data. Automatically handles format conversion, scene segmentation, face detection, text detection, and audio-video sync checking. Supports 127 concurrent processes with checkpoint recovery.

opencv machine-learning ffmpeg pipelines video-processing face-detection video-streaming training-data training-project data-preprocessing-pipelines

Updated Feb 12, 2026
Python

shamspias / gpt3-data-preprocessing

This repository contains code for preprocessing text data from PDF and DOCX files for use with GPT-3. It includes steps such as tokenization, removal of stop words and punctuation, and formatting for GPT-3 input.

data-science machine-learning artificial-intelligence data-preprocessing gpt-3 data-preprocessing-pipelines
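
The steps named in the gpt3-data-preprocessing description (tokenization, removal of stop words and punctuation, formatting for model input) can be illustrated with a minimal sketch. This is not the repository's code: the function names `preprocess` and `format_prompt`, the tiny `STOP_WORDS` set, and the `max_tokens` cutoff are all assumptions made for illustration, and PDF/DOCX extraction is left out entirely.

```python
import string

# Hypothetical, deliberately small stop-word list; a real pipeline would
# likely use a fuller list from an NLP library.
STOP_WORDS = frozenset({"a", "an", "the", "is", "of", "and", "to", "in", "for"})


def preprocess(text: str) -> list[str]:
    """Strip ASCII punctuation, lowercase, split on whitespace, drop stop words."""
    # Translation table that deletes every character in string.punctuation.
    table = str.maketrans("", "", string.punctuation)
    tokens = text.translate(table).lower().split()
    return [tok for tok in tokens if tok not in STOP_WORDS]


def format_prompt(tokens: list[str], max_tokens: int = 50) -> str:
    """Join the first max_tokens tokens into one space-separated string.

    The cutoff is an assumed stand-in for whatever input limit applies.
    """
    return " ".join(tokens[:max_tokens])


if __name__ == "__main__":
    cleaned = preprocess("The cat, and the dog!")
    print(format_prompt(cleaned))
```

Whitespace splitting is the simplest possible tokenizer; it is used here only to keep the sketch free of third-party dependencies.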