TalonProbeite/PureFile

PureFile 🛡️

A high-performance Python service that extracts and strips sensitive metadata from your files. Stop leaking your GPS location, device serial numbers, and personal information when you share documents or photos.

🌟 Key Features

  • Metadata Extraction: View all hidden technical tags before they are wiped.
  • Deep Sanitization: Strips EXIF, XMP, and other metadata from images, PDFs, and Word documents.
  • Privacy-First: Files are processed in-memory and never stored on the server.
  • Modern Stack: Built with FastAPI and powered by the ultra-fast uv package manager.
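
To make the extraction and in-memory points concrete, here is a minimal sketch that reads the core properties of a DOCX file held entirely in a bytes buffer. It is not PureFile's own code: the project lists python-docx for Word files, while this sketch uses only the standard library, and the names `read_docx_core_properties` and `make_sample_docx` are hypothetical.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def read_docx_core_properties(data: bytes) -> dict:
    """Return the non-empty core properties of a DOCX file held in memory."""
    # A DOCX file is a ZIP archive; author-style metadata lives in docProps/core.xml.
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        if "docProps/core.xml" not in archive.namelist():
            return {}
        root = ET.fromstring(archive.read("docProps/core.xml"))

    properties = {}
    for element in root:
        # Tags look like "{namespace}creator"; keep only the local name.
        name = element.tag.rsplit("}", 1)[-1]
        if element.text and element.text.strip():
            properties[name] = element.text.strip()
    return properties


def make_sample_docx(author: str) -> bytes:
    """Build a tiny DOCX-like archive for illustration (not a complete document)."""
    core = (
        '<?xml version="1.0" encoding="UTF-8"?>'
        "<cp:coreProperties "
        'xmlns:cp="http://schemas.openxmlformats.org/package/2006/metadata/core-properties" '
        'xmlns:dc="http://purl.org/dc/elements/1.1/">'
        f"<dc:creator>{author}</dc:creator>"
        "<cp:lastModifiedBy>Office PC</cp:lastModifiedBy>"
        "</cp:coreProperties>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("docProps/core.xml", core)
    return buffer.getvalue()


if __name__ == "__main__":
    print(read_docx_core_properties(make_sample_docx("Alice")))
```

Nothing touches the disk here: the upload would arrive as bytes, be inspected through `io.BytesIO`, and be discarded when the request ends.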

🛠️ Tech Stack

  • Framework: FastAPI (Asynchronous API)
  • Package Manager: uv
  • Libraries: Pillow (Images), PyMuPDF (PDF), python-docx (Word)
  • Containerization: Docker

🚀 Getting Started

Option 1: Docker (Recommended)

  1. Build the image: docker build -t purefile-app .

  2. Run the container: docker run -d -p 8000:8000 --name purefile-container purefile-app

Option 2: Local Development (using uv)

  1. Install dependencies: uv pip install -r pyproject.toml

  2. Run the application: python run.py


🖥️ Usage

Once the service is running, open your browser at: http://localhost:8000/docs

You will see the interactive Swagger UI where you can upload a file and get a "purified" version back instantly.


📂 Supported Formats

  • Images: .jpg, .jpeg, .png, .webp
  • PDF: .pdf
  • Documents: .docx
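
As an illustration of how a service might route an upload by extension, the sketch below maps the formats listed above to a handler kind. The table mirrors this list only; the `SUPPORTED` mapping and `detect_kind` function are hypothetical names, and PureFile's actual dispatch logic may differ.

```python
from pathlib import PurePath
from typing import Optional

# Extension groups copied from the supported formats list above.
SUPPORTED = {
    "image": {".jpg", ".jpeg", ".png", ".webp"},
    "pdf": {".pdf"},
    "document": {".docx"},
}


def detect_kind(filename: str) -> Optional[str]:
    """Return "image", "pdf", or "document", or None for unsupported files."""
    suffix = PurePath(filename).suffix.lower()  # case-insensitive match
    for kind, suffixes in SUPPORTED.items():
        if suffix in suffixes:
            return kind
    return None


if __name__ == "__main__":
    for name in ("Photo.JPG", "report.pdf", "notes.txt"):
        print(name, "->", detect_kind(name))
```

An extension check is only a first filter; a stricter service would also inspect the file's leading bytes before parsing it.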

⚙️ How it works

PureFile acts as a digital filter. It parses the binary structure of your file, identifies metadata segments (like EXIF in photos or Author properties in DOCX), and re-saves the file while intentionally omitting these segments. The result is a visually identical file with a clean "digital history".
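
The idea of walking the binary structure and omitting metadata segments can be sketched for JPEG files using only the standard library. This is an assumption-laden illustration, not PureFile's implementation (the project lists Pillow for images): the function name `strip_jpeg_metadata` is hypothetical, and the sketch drops only APP1 (EXIF/XMP), APP13 (Photoshop/IPTC), and comment segments while ignoring rarer cases such as padding bytes between markers.

```python
import struct

SOI = b"\xff\xd8"  # start-of-image marker
SOS = 0xDA         # start-of-scan marker: compressed pixel data follows
# Segments that carry metadata: APP1 (EXIF/XMP), APP13 (Photoshop/IPTC), COM.
METADATA_MARKERS = {0xE1, 0xED, 0xFE}


def strip_jpeg_metadata(data: bytes) -> bytes:
    """Copy a JPEG stream, skipping segments that hold metadata."""
    if not data.startswith(SOI):
        raise ValueError("not a JPEG stream")

    out = bytearray(SOI)
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = data[pos + 1]
        if marker == SOS:
            # Keep everything from the scan header to the end unchanged.
            out += data[pos:]
            return bytes(out)
        # The 2-byte big-endian length counts itself plus the payload.
        (length,) = struct.unpack(">H", data[pos + 2 : pos + 4])
        end = pos + 2 + length
        if marker not in METADATA_MARKERS:
            out += data[pos:end]
        pos = end
    raise ValueError("no start-of-scan marker found")
```

Because the compressed scan data is copied byte for byte, the pixels are untouched. A library-based approach that decodes and re-encodes the image trades that guarantee for broader format support.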

About

An API for reading and removing metadata from PNG, JPG, DOCX, and PDF files.
