Files

Understanding the file lifecycle from upload to searchable content

What are Files?

Files are the core content in George AI. Each file belongs to a Library and goes through automated processing to extract text, generate embeddings, and make content searchable.

Files can be added manually via upload or automatically through Crawlers that collect documents from external sources like SharePoint, file shares, or email.

Manual Upload

Upload files into a Library directly through the web interface

Automated Crawling

Configure Crawlers to automatically collect files from external systems

File Lifecycle

Every file in George AI goes through a processing pipeline to make it searchable and usable for AI assistants:

Step 1: Upload or Crawl
The file is added to the Library (manually uploaded or collected by a Crawler).

Step 2: Validation
The file format and integrity are checked.

Step 3: Extraction
Text and images are extracted from the document (supports PDF, Office documents, images with OCR, and more).

Step 4: Embedding
The extracted text is split into chunks, and each chunk is converted into a vector embedding for semantic search.

Step 5: Completed
The file is now searchable and available to AI assistants.
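The chunking in Step 4 can be illustrated with a minimal sketch. George AI's actual chunk size, overlap, and splitting rules are not documented on this page: the function name `split_into_chunks` and the defaults `chunk_size=1000` and `overlap=200` are hypothetical, and production splitters often break on sentence or heading boundaries rather than fixed character counts.

```python
from typing import List


def split_into_chunks(text: str, chunk_size: int = 1000, overlap: int = 200) -> List[str]:
    """Split text into fixed-size character windows that overlap.

    Hypothetical sketch: chunk_size and overlap are illustrative values,
    not George AI's configuration.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window reaches the end of the text, so the tail is not
        # emitted again as a shorter, redundant chunk.
        if start + chunk_size >= len(text):
            break
    return chunks


print(split_into_chunks("abcdefghij", chunk_size=4, overlap=1))  # ['abcd', 'defg', 'ghij']
```

The overlap keeps a sentence that straddles a boundary partly visible in both neighbouring chunks. In this sketch, each returned chunk would be embedded separately, which is roughly what a file's `chunksCount` counts.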

Processing Can Fail

A file can fail at any of three stages: Validation (for example, an unsupported format), Extraction (for example, a corrupted file), or Embedding (for example, a timeout). To retry, use Reprocess or Re-embed in the file menu.

File Processing Status

Files have three status indicators that track their progress:

Status Type	Values	Description
Processing Status	none, pending, validating, extracting, embedding, completed, failed	Overall state of the file across the entire pipeline
Extraction Status	none, pending, running, completed, failed	State of the text and image extraction stage
Embedding Status	none, pending, running, completed, failed	State of the vector embedding stage
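To make the Processing Status values concrete, here is a hypothetical state-machine sketch. The value strings come from the table above and the order follows the File Lifecycle steps; the enum, the `advance` function, and the exact transitions are assumptions for illustration, not George AI's internal model. Per the "Processing Can Fail" note, only the validating, extracting, and embedding stages can lead to `failed`.

```python
from enum import Enum


class ProcessingStatus(Enum):
    """Processing Status values listed in the table above."""
    NONE = "none"
    PENDING = "pending"
    VALIDATING = "validating"
    EXTRACTING = "extracting"
    EMBEDDING = "embedding"
    COMPLETED = "completed"
    FAILED = "failed"


# Assumed happy-path order, following the File Lifecycle steps.
PIPELINE = [
    ProcessingStatus.NONE,
    ProcessingStatus.PENDING,
    ProcessingStatus.VALIDATING,
    ProcessingStatus.EXTRACTING,
    ProcessingStatus.EMBEDDING,
    ProcessingStatus.COMPLETED,
]

# Stages at which processing can fail (Validation, Extraction, Embedding).
FAILABLE = {
    ProcessingStatus.VALIDATING,
    ProcessingStatus.EXTRACTING,
    ProcessingStatus.EMBEDDING,
}


def advance(status: ProcessingStatus, succeeded: bool = True) -> ProcessingStatus:
    """Return the status that follows once the current stage finishes."""
    if status in (ProcessingStatus.COMPLETED, ProcessingStatus.FAILED):
        # Terminal states: only a retry from the file menu restarts work.
        return status
    if not succeeded:
        if status not in FAILABLE:
            raise ValueError(f"{status.value} is not a stage that can fail")
        return ProcessingStatus.FAILED
    return PIPELINE[PIPELINE.index(status) + 1]
```

Starting from `ProcessingStatus.NONE` and calling `advance` repeatedly reaches `COMPLETED` after five successful steps, while `advance(ProcessingStatus.EMBEDDING, succeeded=False)` yields `FAILED`.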

Status Badges in the UI

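Examples of badges shown in the file list:

• Extraction 2025-01-15
• Embedding 2025-01-15
• Unsupported Format
• Legacy File

These badges show either when a processing stage completed (for example, Extraction 2025-01-15) or that an error occurred (for example, Unsupported Format).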

File Metadata

Each file stores metadata that can be used for filtering, sorting, and enrichment:

Property	Description	Source
`name`	File name with extension	From upload or crawler
`mimeType`	MIME type of the file (e.g., application/pdf, image/png)	Detected automatically
`size`	File size in bytes	Actual file size
`originUri`	Original location (file path, SharePoint URL, etc.)	From upload or crawler
`originModificationDate`	When the file was last modified at its source	From file system or crawler
`uploadedAt`	When the file was added to George AI	Set at creation time
`createdAt`	When the file record was created in the database	Set at creation time
`archivedAt`	When the file was archived (if applicable)	Set when file is archived
`taskCount`	Number of processing tasks associated with this file	Counted from processing queue
`chunksCount`	Number of vector embedding chunks generated	From embedding process

Using Metadata in Lists

You can create List fields with `sourceType: file_property` to display file metadata (name, size, modified date, source) without any AI processing.
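The point of `file_property` fields is that the value is a plain lookup on stored metadata rather than a model call. The sketch below illustrates that idea only: the helper `resolve_file_property`, the dictionary shape of the file record, and the date formatting are hypothetical; the property names are taken from the File Metadata table.

```python
from datetime import datetime, timezone

# Property names taken from the File Metadata table.
FILE_PROPERTIES = {
    "name", "mimeType", "size", "originUri", "originModificationDate",
    "uploadedAt", "createdAt", "archivedAt", "taskCount", "chunksCount",
}


def resolve_file_property(file_record: dict, property_name: str) -> str:
    """Return a display string for one metadata property, with no AI call.

    Hypothetical helper: the record format and formatting rules are
    illustrative assumptions.
    """
    if property_name not in FILE_PROPERTIES:
        raise KeyError(f"unknown file property: {property_name}")
    value = file_record.get(property_name)
    if value is None:
        return ""  # e.g. archivedAt on a file that was never archived
    if isinstance(value, datetime):
        return value.date().isoformat()  # show dates as YYYY-MM-DD
    return str(value)


record = {
    "name": "report.pdf",
    "size": 48213,
    "originModificationDate": datetime(2025, 1, 15, 9, 30, tzinfo=timezone.utc),
}
print(resolve_file_property(record, "originModificationDate"))  # 2025-01-15
```

Because nothing here depends on the file's content, such fields can be filled immediately, even before extraction or embedding has finished.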

File Actions

You can perform several actions on files through the file menu:

Reprocess (Re-extract)

Triggers a new extraction task to re-extract text and images from the file

Use when:

Extraction failed or timed out
Library extraction settings changed (e.g., updated OCR prompt)
File content was updated at the source

Re-embed

Triggers a new embedding task to regenerate vector embeddings

Use when:

Embedding failed or timed out
Library embedding model changed
Extraction was re-run with new content
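The "Use when" guidance for Reprocess and Re-embed can be summarised as a small decision helper. This is a hypothetical sketch: the function `suggest_action` and its flag parameters are illustrative names, status strings come from the File Processing Status table, and a timeout is assumed to surface as `failed`.

```python
from typing import Optional


def suggest_action(
    extraction_status: str,
    embedding_status: str,
    extraction_settings_changed: bool = False,
    source_updated: bool = False,
    embedding_model_changed: bool = False,
    extraction_rerun: bool = False,
) -> Optional[str]:
    """Pick a file-menu action following the "Use when" lists above.

    Assumption: a timed-out stage is reported with status "failed".
    """
    # Extraction issues come first: re-embedding stale or missing text
    # would not fix the underlying problem.
    if extraction_status == "failed" or extraction_settings_changed or source_updated:
        return "Reprocess"
    if embedding_status == "failed" or embedding_model_changed or extraction_rerun:
        return "Re-embed"
    return None  # nothing to retry
```

For example, `suggest_action("completed", "failed")` returns `"Re-embed"`, and `suggest_action("completed", "completed")` returns `None`.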

View Info

Shows detailed file metadata and processing information

Displays:

File size and format
Processing status
Number of chunks generated
Number of processing tasks
Crawler source (if applicable)
Origin modification date

View Extraction

Shows the extracted markdown content from the file

Use for:

Verifying extraction quality
Debugging enrichment issues
Understanding what content AI assistants see

Supported File Types

George AI supports a wide range of file formats for automatic text extraction:

Documents

• PDF (.pdf)
• Word (.docx, .doc)
• PowerPoint (.pptx, .ppt)
• Excel (.xlsx, .xls)
• Text (.txt, .md, .csv)
• HTML (.html, .htm)

Images (with OCR)

• JPEG (.jpg, .jpeg)
• PNG (.png)
• TIFF (.tiff, .tif)
• BMP (.bmp)
• GIF (.gif)

Videos

• MP4 (.mp4)
• WebM (.webm)
• AVI (.avi)
• MOV (.mov)
• MKV (.mkv)

Videos are processed with audio transcription and visual content extraction.
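As a quick reference, the extension lists above can be expressed as a lookup table. This is a simplified, hypothetical sketch: George AI detects `mimeType` automatically (see File Metadata), so the extension check and the helper name `file_category` are illustrative only. A file that matches no category would be expected to fail at Validation as an unsupported format.

```python
from pathlib import Path
from typing import Optional

# Extensions copied from the Supported File Types lists above.
SUPPORTED_EXTENSIONS = {
    "document": {".pdf", ".docx", ".doc", ".pptx", ".ppt", ".xlsx", ".xls",
                 ".txt", ".md", ".csv", ".html", ".htm"},
    "image": {".jpg", ".jpeg", ".png", ".tiff", ".tif", ".bmp", ".gif"},
    "video": {".mp4", ".webm", ".avi", ".mov", ".mkv"},
}


def file_category(filename: str) -> Optional[str]:
    """Return 'document', 'image', 'video', or None if the extension is not listed."""
    ext = Path(filename).suffix.lower()  # case-insensitive: "Scan.TIF" -> ".tif"
    for category, extensions in SUPPORTED_EXTENSIONS.items():
        if ext in extensions:
            return category
    return None
```

For example, `file_category("Scan.TIF")` returns `"image"`, while `file_category("archive.tar.gz")` returns `None` because only the final suffix (`.gz`) is checked.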

Google Drive Upload (Optional)

If you store documents in Google Drive, you can upload them directly through a file picker that supports search, batch selection, and automatic conversion of Google Docs, Sheets, and Slides to PDF.

Features & Capabilities

Search Across All Files

Search your entire Google Drive by file name with instant results.

Batch Selection

Select multiple files at once with checkboxes.

Folder Navigation

Browse your Drive with folder navigation and breadcrumbs.

Auto PDF Conversion

Google Docs, Sheets, and Slides are automatically converted to PDF.

How to Upload from Google Drive

1. Open the Library → Navigate to the Library where you want to upload files.
2. Click "Upload from Google Drive" → The button is in the Files section.
3. Sign in to Google (first time, or after your access expires) → Grant George AI read-only access to your Drive.
4. Browse or search → Navigate folders or use search to find files.
5. Select files → Check the boxes next to the files you want to upload.
6. Click "Upload" → The selected files are downloaded and processed automatically.

View Modes

Switch between list view (detailed) and grid view (visual icons) using the toggle buttons.

Automatic PDF Conversion

When uploading Google Workspace files, George AI automatically exports them as PDF to ensure compatibility:

Google File Type	Converted To	Result
Google Docs	PDF	Text, formatting, images preserved
Google Sheets	PDF	Tables, charts, formatting preserved
Google Slides	PDF	Slides, images, layouts preserved
Other files (PDF, JPG, etc.)	No conversion	Downloaded as-is in original format

Automatic Detection: George AI detects each file's type and chooses the conversion method automatically, so no configuration is needed.
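The conversion table above maps onto two real Google Drive API v3 operations: `files.export` renders a Google Workspace document in a requested format, and `files.get` with `alt=media` downloads a stored file's bytes unchanged. Whether George AI calls these endpoints exactly this way is an assumption; the helper `download_request` is hypothetical, and authentication headers are omitted.

```python
from typing import Tuple
from urllib.parse import quote, urlencode

DRIVE_API = "https://www.googleapis.com/drive/v3/files"

# Google Workspace MIME types that must be exported rather than downloaded.
WORKSPACE_TYPES = {
    "application/vnd.google-apps.document",      # Google Docs
    "application/vnd.google-apps.spreadsheet",   # Google Sheets
    "application/vnd.google-apps.presentation",  # Google Slides
}


def download_request(file_id: str, mime_type: str) -> Tuple[str, bool]:
    """Return (request URL, converted) for a Drive file.

    Other native Google types (e.g. Forms, Drawings) are not handled here.
    """
    fid = quote(file_id, safe="")
    if mime_type in WORKSPACE_TYPES:
        # files.export: Drive renders the document as PDF.
        query = urlencode({"mimeType": "application/pdf"})
        return f"{DRIVE_API}/{fid}/export?{query}", True
    # files.get with alt=media: the original bytes, no conversion.
    return f"{DRIVE_API}/{fid}?alt=media", False
```

The returned `converted` flag is what decides, in this sketch, whether a `.pdf` extension should be added to the stored file name.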

Metadata Preservation

Files uploaded from Google Drive preserve important metadata:

File name: Original Google Drive file name (with .pdf extension added if converted)
Modified date: Last modification date from Google Drive
Origin URI: Link back to the original file in Google Drive (e.g., https://drive.google.com/file/d/...)
File size: Actual file size from Google Drive
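A sketch of how these fields could be filled from a Drive v3 file resource, whose `name` and `modifiedTime` fields are real parts of that resource. Using `webViewLink` as the origin URI and measuring size from the downloaded bytes are assumptions, as is the helper name `to_library_metadata`. For converted files, Drive's own `size` would describe the original document, not the exported PDF.

```python
from typing import Any, Dict


def to_library_metadata(drive_file: Dict[str, Any], converted: bool,
                        content: bytes) -> Dict[str, Any]:
    """Map a Drive v3 file resource to the metadata listed above (hypothetical)."""
    name = drive_file["name"]
    if converted:
        name += ".pdf"  # exported Docs/Sheets/Slides gain a .pdf extension
    return {
        "name": name,
        # RFC 3339 timestamp string as returned by Drive.
        "originModificationDate": drive_file["modifiedTime"],
        # Assumption: the browser link to the file serves as the origin URI.
        "originUri": drive_file["webViewLink"],
        # Assumption: size of the bytes actually downloaded or exported.
        "size": len(content),
    }
```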

Traceability

The Origin URI allows you to trace back to the source file in Google Drive for verification or updates.

Pagination & Large Drives

The Google Drive picker handles large drives efficiently:

50 files per page: Fast loading even with thousands of files
Forward/backward navigation: Browse through pages with previous/next buttons
Search resets pagination: Searching automatically returns to the first page of results
Folder navigation resets pagination: Entering a folder starts from page 1
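The Drive API's `files.list` pages forward only, using `pageSize`, `pageToken`, and the `nextPageToken` it returns. Backward navigation can therefore be built by remembering the token for each visited page. The class below is a hypothetical sketch of that technique, not George AI's picker code; the query strings use real Drive search syntax (`'<id>' in parents`, `name contains '...'`, `trashed = false`).

```python
from dataclasses import dataclass, field
from typing import List, Optional

PAGE_SIZE = 50  # files per page, as described above


@dataclass
class DrivePager:
    """Tracks page tokens so a picker can move forward and back (hypothetical)."""
    folder_id: str = "root"
    search: str = ""
    # Token used to fetch each visited page; page 1 needs no token (None).
    tokens: List[Optional[str]] = field(default_factory=lambda: [None])
    next_token: Optional[str] = None  # nextPageToken from the latest response

    def reset(self) -> None:
        self.tokens = [None]
        self.next_token = None

    def set_search(self, term: str) -> None:
        self.search = term
        self.reset()  # searching returns to the first page

    def open_folder(self, folder_id: str) -> None:
        self.folder_id = folder_id
        self.search = ""  # leave search mode when browsing a folder
        self.reset()  # entering a folder starts from page 1

    def current_params(self) -> dict:
        """Query parameters for a files.list request for the current page."""
        if self.search:
            # Drive query strings escape backslashes and single quotes.
            escaped = self.search.replace("\\", "\\\\").replace("'", "\\'")
            q = f"name contains '{escaped}' and trashed = false"  # whole Drive
        else:
            q = f"'{self.folder_id}' in parents and trashed = false"
        params = {"q": q, "pageSize": PAGE_SIZE}
        if self.tokens[-1] is not None:
            params["pageToken"] = self.tokens[-1]
        return params

    def record_response(self, next_page_token: Optional[str]) -> None:
        self.next_token = next_page_token

    def next_page(self) -> bool:
        if self.next_token is None:
            return False  # no further pages
        self.tokens.append(self.next_token)
        self.next_token = None  # unknown until the new page is fetched
        return True

    def previous_page(self) -> bool:
        if len(self.tokens) == 1:
            return False  # already on page 1
        self.tokens.pop()
        self.next_token = None
        return True

    @property
    def page_number(self) -> int:
        return len(self.tokens)
```

Storing tokens rather than results keeps memory small even for drives with thousands of files, at the cost of re-fetching a page when the user goes back.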

Authentication

Sign-in is per browser: you do not need to sign in each time you open the picker. After you sign in, your access token is stored in that browser's local storage and expires after 1 hour, after which you are asked to sign in to Google again.
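The expiry rule can be sketched as follows. The sketch is written in Python for consistency with the other examples, although in the browser this logic would run in JavaScript against local storage. The record layout (`access_token`, `expires_at`), the helper names, and the 60-second safety margin are hypothetical; only the 1-hour lifetime comes from the text above.

```python
import time
from typing import Optional

TOKEN_LIFETIME_SECONDS = 3600  # 1 hour, as stated above


def store_token(access_token: str, now: Optional[float] = None) -> dict:
    """Build the record a client might keep after sign-in (hypothetical layout)."""
    issued = time.time() if now is None else now
    return {"access_token": access_token, "expires_at": issued + TOKEN_LIFETIME_SECONDS}


def needs_sign_in(record: Optional[dict], now: Optional[float] = None,
                  skew: float = 60.0) -> bool:
    """True when no token is stored or it expires within `skew` seconds."""
    if not record:
        return True
    current = time.time() if now is None else now
    # Treat a token that is about to expire as already expired, so a request
    # started just before the deadline does not fail midway.
    return current >= record["expires_at"] - skew
```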