Personal Archive Infrastructure

Archive System, OCR, RAG, Self-Hosted

BackIt — Transform Your Memories into a Personal Knowledge Archive

The problem

Hundreds of thousands of digital files piled up over decades. Photos across dozens of folders, documents scattered through drives, videos with no labels, health records buried in downloads. Years of data with no structure, no connections, and no way to find anything.

This isn't a client project. It's my own data — and no existing app could handle the scale or the mess. So I built the system myself.

The challenge

This wasn't a simple "upload and tag" problem. People accumulate massive amounts of data across every part of their lives — personal and professional — with no consistent naming, no metadata, and no organization:

Family photos and home videos scattered across phones, drives, and old backups
Emails, chat histories, and message threads across apps and platforms
Tax returns, insurance policies, contracts, and legal documents
Medical records, lab results, prescriptions
Diplomas, transcripts, certifications, and training materials
Work projects, presentations, reports, and client files
Receipts, warranties, manuals, and purchase records
Personal writing, journals, correspondence
Collections, inventories, catalogs

BackIt crawls apps, inboxes, drives, and local folders — pulling in all of it. It also handles digitized physical media like scanned letters, old prints, and handwritten documents in any language. And everything stays private: no cloud, no third-party services, nothing leaves your network.

[Image: BackIt timeline view on mobile]

[Image: BackIt stories view on mobile]

What BackIt does

BackIt crawls everything — files, folders, apps, inboxes — and builds a fully organized, searchable archive. Every item is categorized, tagged with metadata, and archived with the care of a museum collection. Documents get read automatically, even handwritten ones, in any language. Photos are analyzed for locations, faces, and scenes. Nothing gets lost.
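As an illustration only (this is not BackIt's actual code), the crawl-and-categorize step for local folders could be sketched like this. The category names, the `crawl` and `categorize` helpers, and the record fields are all assumptions made for the example; OCR, face detection, and app or inbox connectors are left out.

```python
import hashlib
import mimetypes
from pathlib import Path

# Hypothetical mapping from MIME top-level type to an archive category.
CATEGORY_BY_TYPE = {
    "image": "photo",
    "video": "video",
    "audio": "audio",
    "text": "document",
}


def categorize(path: Path) -> str:
    """Guess a coarse category from the file name's MIME type."""
    mime, _ = mimetypes.guess_type(path.name)
    if mime is None:
        return "other"
    if mime == "application/pdf":
        return "document"
    return CATEGORY_BY_TYPE.get(mime.split("/")[0], "other")


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large videos are not loaded into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def crawl(root: Path):
    """Yield one metadata record per regular file under root."""
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        yield {
            "path": str(path),
            "category": categorize(path),
            "size": path.stat().st_size,
            "modified": path.stat().st_mtime,
            # A content hash makes exact duplicates across backups easy to spot.
            "sha256": sha256_of(path),
        }
```

Two records with the same `sha256` are byte-identical copies, which is one simple way to collapse the same photo found on a phone, a drive, and an old backup.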

The Archive

The foundation. Every file you own — photos, documents, videos, emails, chats, records — properly categorized, enriched with metadata, and stored in a structured archive. Think of it as a personal museum for your entire digital life. Browse by category, search by content, filter by date. Everything in one place, everything findable.
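To make "browse by category, search by content, filter by date" concrete, here is a minimal sketch on top of SQLite, which is part of the stack. The table layout, column names, and the `open_archive`, `add_item`, and `find` helpers are assumptions for illustration, not the real schema.

```python
import sqlite3


def open_archive(db_path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a small catalog database. Schema is hypothetical."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS items (
            id       INTEGER PRIMARY KEY,
            path     TEXT NOT NULL,
            category TEXT NOT NULL,
            taken_on TEXT,            -- ISO 8601 date, so text order is date order
            text     TEXT DEFAULT ''  -- OCR output or extracted document text
        )
        """
    )
    return conn


def add_item(conn, path, category, taken_on, text=""):
    conn.execute(
        "INSERT INTO items (path, category, taken_on, text) VALUES (?, ?, ?, ?)",
        (path, category, taken_on, text),
    )


def find(conn, category=None, contains=None, start=None, end=None):
    """Return matching paths, oldest first. All filters are optional."""
    clauses, params = [], []
    if category is not None:
        clauses.append("category = ?")
        params.append(category)
    if contains is not None:
        clauses.append("text LIKE ?")  # substring match on extracted text
        params.append(f"%{contains}%")
    if start is not None:
        clauses.append("taken_on >= ?")
        params.append(start)
    if end is not None:
        clauses.append("taken_on <= ?")
        params.append(end)
    where = (" WHERE " + " AND ".join(clauses)) if clauses else ""
    rows = conn.execute(
        "SELECT path FROM items" + where + " ORDER BY taken_on, id", params
    )
    return [row[0] for row in rows]
```

A plain `LIKE` match keeps the sketch dependency-free; at hundreds of thousands of items a full-text index would be the more realistic choice.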

Stories

BackIt doesn't just store your data — it reads it and creates stories from it. By connecting photos, documents, dates, and context, it automatically generates narrative posts about moments in your life. "The day Sam was born." "John's graduation." "Summer in Paris, 2003." Stories you never had time to write, assembled from data you already had.
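One plausible first step toward a story is deciding which items belong to the same moment. The sketch below groups items whose timestamps fall close together; the 12-hour gap and the `group_into_events` name are assumptions, and writing the narrative itself (the LLM step) is not shown.

```python
from datetime import datetime, timedelta


def group_into_events(items, max_gap=timedelta(hours=12)):
    """Split (timestamp, label) pairs into events.

    A new event starts whenever the gap to the previous item exceeds max_gap.
    """
    events = []
    for timestamp, label in sorted(items, key=lambda item: item[0]):
        if events and timestamp - events[-1][-1][0] <= max_gap:
            events[-1].append((timestamp, label))  # continue the current event
        else:
            events.append([(timestamp, label)])  # start a new event
    return events


if __name__ == "__main__":
    photos = [
        (datetime(2003, 7, 14, 10, 0), "eiffel.jpg"),
        (datetime(2003, 7, 14, 18, 30), "dinner.jpg"),
        (datetime(2004, 1, 2, 9, 0), "snow.jpg"),
    ]
    for event in group_into_events(photos):
        print([label for _, label in event])
```

Each resulting group, together with its dates and any extracted text, is the kind of bundle that could then be handed to a language model to draft the post.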

Auto-edited Videos

Like stories, but in motion. BackIt pulls together photos, video clips, documents, and audio from across the archive and assembles edited videos automatically. A highlight reel of a vacation. A tribute for a birthday. A year in review. No editing software, no manual work — just the finished product, ready to share.
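For the assembly step, a minimal sketch using FFmpeg's concat demuxer might look like the following. It only joins clips that already share the same codec and parameters (that is what `-c copy` requires); transitions, music, and still photos would need re-encoding and are out of scope here. The helper names are hypothetical.

```python
from pathlib import Path


def write_concat_list(clips, list_path):
    """Write the text file that FFmpeg's concat demuxer reads."""
    lines = []
    for clip in clips:
        # Inside single quotes, a literal quote is written as '\''.
        escaped = str(clip).replace("'", "'\\''")
        lines.append(f"file '{escaped}'")
    Path(list_path).write_text("\n".join(lines) + "\n", encoding="utf-8")


def concat_command(list_path, output_path):
    """Build the ffmpeg argument list; running it is left to the caller."""
    return [
        "ffmpeg",
        "-f", "concat",  # use the concat demuxer
        "-safe", "0",    # allow absolute paths in the list file
        "-i", str(list_path),
        "-c", "copy",    # no re-encoding: clips must share codec settings
        str(output_path),
    ]
```

The command can then be run with `subprocess.run(concat_command("clips.txt", "highlights.mp4"), check=True)` on a machine where `ffmpeg` is installed.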

Chat with the Archive

This is where it gets personal. BackIt includes a conversational chatbot that draws on the entire archive through retrieval (RAG). Ask it anything: "When did we move to the new house?" "What did Dad do for work in the 80s?" "Show me everything from our trip to Italy." It's like talking to someone who remembers everything, because it can pull up every photo, every document, every record. A digital version of the archive's owner, available to anyone in the family.
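The retrieval half of that chat loop can be sketched in a few lines. To stay dependency-free, this example swaps in plain word-count cosine similarity where the real system would presumably use embeddings stored in ChromaDB; the function names and the prompt wording are assumptions, and the call to the language model is omitted.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, documents, k=2):
    """Return the ids of the k documents most similar to the question."""
    query = Counter(tokenize(question))
    scored = [
        (cosine(query, Counter(tokenize(text))), doc_id)
        for doc_id, text in documents.items()
    ]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for score, doc_id in scored[:k] if score > 0]


def build_prompt(question, documents, k=2):
    """Assemble the retrieved passages and the question for a language model."""
    context = "\n".join(
        f"[{doc_id}] {documents[doc_id]}" for doc_id in retrieve(question, documents, k)
    )
    return f"Answer using only these archive excerpts:\n{context}\n\nQuestion: {question}"
```

The design point is that the model only sees excerpts fetched for the current question, so new material becomes answerable as soon as it is indexed.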

Timeline

A visual, filterable timeline of an entire life. Every event, every document, every photo — laid out chronologically. Filter by person, by category, by date range. Zoom into a single week or zoom out to see decades. It turns a messy pile of data into a clear, navigable history.
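Zooming in and out amounts to grouping the same events under coarser or finer keys. A minimal sketch, assuming each event is a `(date, title, people)` tuple and that the zoom levels are called `week`, `year`, and `decade` (all hypothetical names):

```python
from collections import defaultdict
from datetime import date


def bucket_key(day: date, zoom: str) -> str:
    """Map a date to the label of the timeline bucket it falls into."""
    if zoom == "week":
        iso = day.isocalendar()  # (ISO year, ISO week number, weekday)
        return f"{iso[0]}-W{iso[1]:02d}"
    if zoom == "year":
        return str(day.year)
    if zoom == "decade":
        return f"{day.year // 10 * 10}s"
    raise ValueError(f"unknown zoom level: {zoom}")


def build_timeline(events, zoom="year", person=None):
    """Group event titles chronologically, optionally for a single person."""
    buckets = defaultdict(list)
    for day, title, people in sorted(events, key=lambda event: event[0]):
        if person is not None and person not in people:
            continue
        buckets[bucket_key(day, zoom)].append(title)
    return dict(buckets)
```

Calling `build_timeline(events, zoom="decade")` and then `build_timeline(events, zoom="week", person="Sam")` reuses the same data for both the wide view and the close-up.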

Stack

Python, Tesseract OCR, GPT-4 / local LLMs, ChromaDB, FastAPI, SQLite, FFmpeg, Linux / Nginx

Where it stands

Hundreds of thousands of items — indexed, searchable, and connected
Stories and videos generated automatically from archived data
Conversational chatbot that knows the entire archive
Full timeline of a life, filterable by person, category, or date
Reads documents in any language, including faded handwriting
Completely private — runs on local hardware, nothing in the cloud
Always growing — new material is automatically sorted and processed

Why this matters beyond my family

Every organization sitting on years of unstructured material has this same problem. Law firms with boxes of case files. Medical practices with decades of patient records. Museums with uncataloged collections. Media companies with vast photo and video libraries. Family offices managing generations of documents.

The specific material changes, but the need is the same: make it findable, make it searchable, make it useful. That's what this system does — and it's the kind of system I build.

Sitting on a collection that needs to become searchable?

Let's talk
