Files
Understanding the file lifecycle from upload to searchable content
What are Files?
Files are the core content in George AI. Each file belongs to a Library and goes through automated processing to extract text, generate embeddings, and make content searchable.
Files can be added manually via upload or automatically through Crawlers that collect documents from external sources like SharePoint, file shares, or email.
Manual Upload
Upload files directly through the web interface into a Library
Automated Crawling
Configure Crawlers to automatically collect files from external systems
File Lifecycle
Every file in George AI goes through a processing pipeline of validation, extraction, and embedding to make it searchable and usable for AI assistants.
Processing Can Fail
A file can fail during Validation (for example, an unsupported format), Extraction (for example, a corrupted file), or Embedding (for example, a timeout). You can retry processing from the file menu.
File Processing Status
Files have three status indicators that track their progress:
| Status Type | Values | Description |
|---|---|---|
| Processing Status | none, pending, validating, extracting, embedding, completed, failed | Overall processing state through the entire pipeline |
| Extraction Status | none, pending, running, completed, failed | Text and image extraction stage |
| Embedding Status | none, pending, running, completed, failed | Vector embedding generation stage |
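The three indicators in the table can be modeled as plain enumerations. The following is a minimal sketch, not George AI's actual data model: the class names, `is_fully_processed`, and `failed_stage` are hypothetical, and only the status values themselves come from the table above. It also assumes that a file counts as fully processed only when all three indicators read completed.

```python
from enum import Enum
from typing import Optional


class ProcessingStatus(str, Enum):
    """Overall pipeline state (values taken from the table above)."""
    NONE = "none"
    PENDING = "pending"
    VALIDATING = "validating"
    EXTRACTING = "extracting"
    EMBEDDING = "embedding"
    COMPLETED = "completed"
    FAILED = "failed"


class StageStatus(str, Enum):
    """State of a single stage; used for both extraction and embedding."""
    NONE = "none"
    PENDING = "pending"
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"


def is_fully_processed(
    processing: ProcessingStatus,
    extraction: StageStatus,
    embedding: StageStatus,
) -> bool:
    # Assumption: all three indicators must read "completed".
    return (
        processing is ProcessingStatus.COMPLETED
        and extraction is StageStatus.COMPLETED
        and embedding is StageStatus.COMPLETED
    )


def failed_stage(extraction: StageStatus, embedding: StageStatus) -> Optional[str]:
    # Extraction runs before embedding, so it is checked first.
    if extraction is StageStatus.FAILED:
        return "extraction"
    if embedding is StageStatus.FAILED:
        return "embedding"
    return None
```

Because the enums subclass `str`, a raw value such as `"validating"` converts directly with `ProcessingStatus("validating")`, which is convenient if status values arrive as plain strings.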
Status Badges in the UI
These badges appear in the file list and indicate processing completion times or errors.
File Metadata
Each file stores metadata that can be used for filtering, sorting, and enrichment:
| Property | Description | Source |
|---|---|---|
| name | File name with extension | From upload or crawler |
| mimeType | File type (e.g., application/pdf, image/png) | Detected automatically |
| size | File size in bytes | Actual file size |
| originUri | Original location (file path, SharePoint URL, etc.) | From upload or crawler |
| originModificationDate | When the file was last modified at its source | From file system or crawler |
| uploadedAt | When the file was added to George AI | Set at creation time |
| createdAt | When the file record was created in the database | Set at creation time |
| archivedAt | When the file was archived (if applicable) | Set when file is archived |
| taskCount | Number of processing tasks associated with this file | Counted from processing queue |
| chunksCount | Number of vector embedding chunks generated | From embedding process |
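To illustrate how these properties lend themselves to filtering and sorting, here is a rough sketch that mirrors the table as a local record type. The property names come from the table; the Python types, the `FileRecord` class, and the `active_pdfs_largest_first` helper are assumptions made for illustration and do not describe George AI's API.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class FileRecord:
    # Property names follow the metadata table; the types are assumed.
    name: str
    mimeType: str
    size: int  # bytes
    originUri: str
    originModificationDate: Optional[datetime]
    uploadedAt: datetime
    createdAt: datetime
    archivedAt: Optional[datetime] = None
    taskCount: int = 0
    chunksCount: int = 0


def active_pdfs_largest_first(files: List[FileRecord]) -> List[FileRecord]:
    """Keep non-archived PDFs and sort them by size, largest first."""
    pdfs = [
        f for f in files
        if f.mimeType == "application/pdf" and f.archivedAt is None
    ]
    return sorted(pdfs, key=lambda f: f.size, reverse=True)
```

The same pattern applies to any other property, for example sorting by `originModificationDate` to find files that have not changed at their source for a long time.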
Using Metadata in Lists
You can create List fields with sourceType: file_property to display file metadata (name, size, modified date, source) without AI processing.
File Actions
You can perform several actions on files through the file menu:
Reprocess (Re-extract)
Triggers a new extraction task to re-extract text and images from the file
Use when:
- Extraction failed or timed out
- Library extraction settings changed (e.g., updated OCR prompt)
- File content was updated at the source
Re-embed
Triggers a new embedding task to regenerate vector embeddings
Use when:
- Embedding failed or timed out
- Library embedding model changed
- Extraction was re-run with new content
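The guidance for Reprocess and Re-embed can be condensed into a small decision helper. This is only a sketch of the rules listed above: `suggest_action` and its parameters are hypothetical names, the status strings are taken from the status table, and the helper assumes that an extraction problem should be handled before an embedding problem because extraction runs first.

```python
from typing import Optional


def suggest_action(
    extraction_status: str,
    embedding_status: str,
    *,
    extraction_settings_changed: bool = False,
    embedding_model_changed: bool = False,
) -> Optional[str]:
    """Return "reprocess", "re-embed", or None if no action seems needed."""
    # Extraction comes first in the pipeline, so its problems take priority.
    if extraction_status == "failed" or extraction_settings_changed:
        return "reprocess"
    if embedding_status == "failed" or embedding_model_changed:
        return "re-embed"
    return None
```

Since new extraction output is itself a reason to re-embed, a Re-embed may still be needed after a successful Reprocess; the sketch does not model that follow-up step.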
View Info
Shows detailed file metadata and processing information
Displays:
- File size and format
- Processing status
- Number of chunks generated
- Number of processing tasks
- Crawler source (if applicable)
- Origin modification date
View Extraction
Shows the extracted markdown content from the file
Use for:
- Verifying extraction quality
- Debugging enrichment issues
- Understanding what content AI assistants see
Supported File Types
George AI supports a wide range of file formats for automatic text extraction:
Documents
- PDF (.pdf)
- Word (.docx, .doc)
- PowerPoint (.pptx, .ppt)
- Excel (.xlsx, .xls)
- Text (.txt, .md, .csv)
- HTML (.html, .htm)
Images (with OCR)
- JPEG (.jpg, .jpeg)
- PNG (.png)
- TIFF (.tiff, .tif)
- BMP (.bmp)
- GIF (.gif)
Videos
- MP4 (.mp4)
- WebM (.webm)
- AVI (.avi)
- MOV (.mov)
- MKV (.mkv)
Audio transcription and visual content extraction
Archives
- ZIP (.zip)
- 7-Zip (.7z)
- TAR (.tar, .tar.gz)
Archives are extracted and files inside are processed individually
Unsupported Formats
If a file format is not supported, the file is marked with an "Unsupported Format" badge and no extraction is performed. The file metadata is still stored and searchable.
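If you want to check files before uploading them, the lists above translate into a simple lookup. The sketch below is a client-side pre-check based purely on file extensions. It is an assumption for illustration: George AI detects the MIME type automatically, so its own validation may classify a file differently. The names `SUPPORTED_EXTENSIONS` and `classify` are hypothetical.

```python
from typing import Optional

# Extensions copied from the supported file type lists above.
SUPPORTED_EXTENSIONS = {
    "document": (
        ".pdf", ".docx", ".doc", ".pptx", ".ppt", ".xlsx", ".xls",
        ".txt", ".md", ".csv", ".html", ".htm",
    ),
    "image": (".jpg", ".jpeg", ".png", ".tiff", ".tif", ".bmp", ".gif"),
    "video": (".mp4", ".webm", ".avi", ".mov", ".mkv"),
    "archive": (".zip", ".7z", ".tar", ".tar.gz"),
}


def classify(filename: str) -> Optional[str]:
    """Return the category for a file name, or None if it is not listed."""
    lowered = filename.lower()
    for category, extensions in SUPPORTED_EXTENSIONS.items():
        # str.endswith accepts a tuple, which also covers ".tar.gz".
        if lowered.endswith(extensions):
            return category
    return None
```

A result of `None` corresponds to the case in which a file would likely receive the "Unsupported Format" badge.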
Need Additional Format Support?
For paying customers, we can add support for any file format. Alternatively, you can build your own automation workflows that convert files into a supported format and ingest them into George AI. Contact us to discuss your specific requirements.