CSV File Splitting
Handle massive spreadsheets with automatic row-by-row splitting for perfect search results
What is CSV File Splitting?
When you upload large CSV or Excel files, George AI automatically splits them into individual markdown files, one per row. This ensures that a search returns exactly the row you're looking for, not an entire massive file.
No configuration needed. Upload a 100,000-row product catalog, and George AI handles everything automatically.
How It Works
- Upload CSV: Any size file
- Auto-Split: One file per row
- Embed: Vector search enabled
- Search: Find exact rows
Example: Product Catalog (10,000 Products)
Input:
    SKU,Name,Price,Stock
    P-001,Widget A,29.99,150
    P-002,Widget B,39.99,75
    ...
    P-10000,Widget Z,19.99,200

Output (File Structure):

    products.md          (summary)
    parts/
      0/                 (rows 1-100)
        1.md, 2.md, ..., 100.md
      1/                 (rows 101-200)
      2/                 (rows 201-300)
      ...

Benefits
Semantic Search Precision
Each row becomes one semantic chunk. When you search "red t-shirt size M", you get that exact product row, not a 50,000-row file.
Memory Efficiency
A streaming architecture processes files row by row with roughly constant memory usage (~1 KB), regardless of file size. Files with 700K+ rows are handled without performance degradation.
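The streaming idea can be illustrated with a minimal sketch. This is not George AI's implementation; the function name `iter_rows` is hypothetical, and the example only assumes that rows are read one at a time instead of loading the whole file.

```python
import csv
import io
from typing import Dict, Iterator, TextIO, Tuple


def iter_rows(source: TextIO) -> Iterator[Tuple[int, Dict[str, str]]]:
    """Yield (row_number, row) pairs one at a time.

    Hypothetical helper: only the current row is held in memory,
    so memory use does not grow with the number of rows.
    """
    reader = csv.DictReader(source)  # first line is treated as the header
    for row_number, row in enumerate(reader, start=1):
        yield row_number, row


# Usage with an in-memory sample; a real file handle works the same way.
sample = io.StringIO(
    "SKU,Name,Price,Stock\n"
    "P-001,Widget A,29.99,150\n"
    "P-002,Widget B,39.99,75\n"
)
for row_number, row in iter_rows(sample):
    print(row_number, row["SKU"])
```

Because the function is a generator, a consumer can write each row out and discard it before the next one is read.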
Fast Pagination
Bucketed storage (100 files per directory) enables fast UI navigation. Metadata caching makes browsing split files instant.
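As a rough illustration of bucketing, the sketch below maps a 1-based row number to a relative path with 100 files per directory, matching the example layout where rows 1-100 live in bucket `0`. The function name `part_path` and the constant `BUCKET_SIZE` are hypothetical and not part of George AI's API.

```python
from pathlib import PurePosixPath

BUCKET_SIZE = 100  # files per directory, as described above


def part_path(row_number: int) -> PurePosixPath:
    """Return the relative path of the markdown file for a 1-based row number."""
    if row_number < 1:
        raise ValueError("row numbers start at 1")
    bucket = (row_number - 1) // BUCKET_SIZE  # rows 1-100 -> 0, 101-200 -> 1
    return PurePosixPath("parts") / str(bucket) / f"{row_number}.md"


print(part_path(1))
print(part_path(100))
print(part_path(101))
```

Keeping each directory small means a listing for one page of rows touches a single bucket instead of scanning every file.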
Enrichment-Ready
Each row is a list item. Add enrichment fields to extract additional data (e.g., "Product Category" from description). Perfect for data cleaning.
Viewing Split Files
After upload, navigate to the file in your library:
- Open the library containing your CSV file
- Click on the file name
- Use the Markdown File Selector dropdown to choose which row to view
- Dropdown shows: "Summary", "Row 1", "Row 2", etc.
For files with many rows, pagination controls appear automatically:
- Navigate between rows using previous/next buttons
- Jump to specific row numbers
- View summary file to see total row count and column names
Configuration
Automatic - No configuration needed!
Advanced: Library Settings
For power users, this behavior is controlled by a single setting in the library configuration:
| Setting | Default Value | Description |
|---|---|---|
| splitByCsvRows | Enabled | Automatically split CSV/Excel files by rows |
Technical Details
Storage Layout:

    /storage/libraries/{libraryId}/files/{fileId}/
      main.md          # Summary file
      parts/0/1.md     # Row 1
      parts/0/2.md     # Row 2
      ...
      parts/0/100.md   # Row 100
      parts/1/101.md   # Row 101 (new bucket)
      ...

Each row becomes a structured markdown file:

    # Row 1
    **SKU:** P-001
    **Name:** Widget A
    **Price:** 29.99
    **Stock:** 150

- One Chunk per Row: Each markdown file = one semantic chunk
- Batch Processing: Embeddings generated in parallel batches for speed
- Part Number Tracking: Each embedding stores its row number for retrieval
- Summary Embedding: Main file (column headers + stats) also embedded
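The points above can be sketched in a few lines. This is an illustrative sketch only: `render_row` and `batched` are hypothetical names, the exact line breaks in the generated markdown are an assumption, and no real embedding call is shown. Each batch keeps the row number next to its text, so an embedding can be traced back to its part.

```python
from itertools import islice
from typing import Dict, Iterable, Iterator, List, Tuple


def render_row(row_number: int, row: Dict[str, str]) -> str:
    """Render one CSV row as a small markdown document (assumed layout)."""
    lines = [f"# Row {row_number}", ""]
    lines += [f"**{column}:** {value}" for column, value in row.items()]
    return "\n".join(lines) + "\n"


def batched(
    items: Iterable[Tuple[int, str]], size: int
) -> Iterator[List[Tuple[int, str]]]:
    """Group (row_number, markdown) pairs into lists of at most `size` items."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


rows = [
    (1, {"SKU": "P-001", "Name": "Widget A", "Price": "29.99", "Stock": "150"}),
    (2, {"SKU": "P-002", "Name": "Widget B", "Price": "39.99", "Stock": "75"}),
]
parts = [(number, render_row(number, row)) for number, row in rows]
for batch in batched(parts, size=2):
    # A real pipeline would send the texts in `batch` to an embedding model here.
    print([number for number, _ in batch])
```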
Common Use Cases
Product Catalogs
Upload supplier product lists (50,000+ products). Search finds exact SKUs. Enrich to extract missing data (category, brand). Export to e-commerce platform via automations.
Inventory Lists
Process warehouse inventory spreadsheets. Search by location, product code, or description. Track stock levels across multiple warehouses.
Customer & Contact Lists
Import CRM exports. Search by name, company, or email. Enrich with additional data from web APIs. Clean and deduplicate records.
Transaction & Order Logs
Process order history CSVs (100K+ transactions). Search by order number, customer, or date. Analyze patterns with AI enrichments.
Related Documentation
Learn more about working with files and data: