© 2026 Sarudo. All rights reserved.


Document Ingestion

Uploading PDFs, DOCX files, spreadsheets, and presentations for automatic chunking, embedding, and knowledge extraction.

Last updated: April 14, 2026
Tags: documents, upload, pdf, docx, ingestion, chunking, embedding, spreadsheet

How Document Ingestion Works

When you send a document to your AI employee, it goes through a multi-step ingestion process. First, the document is parsed to extract all text content. Then the text is split into meaningful chunks — sections, paragraphs, or logical blocks. Each chunk is converted into a vector embedding that captures its semantic meaning. Finally, the chunks are stored in your knowledge base and become searchable. This process enables the AI to reference specific parts of your documents when answering questions.
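The four steps above (parse, chunk, embed, store) can be sketched in a few lines of Python. This is a minimal illustration, not Sarudo's implementation. It assumes the text has already been extracted (step 1), splits chunks on blank lines, and replaces the real embedding model with a hashed bag-of-words vector, which only captures word overlap rather than true semantic meaning. `KnowledgeBase`, `chunk`, and `embed` are hypothetical names.

```python
import hashlib
import math
import re
from dataclasses import dataclass, field

DIM = 256  # hypothetical vector size for this toy example


def embed(text: str) -> list[float]:
    """Toy stand-in for an embedding model: hashed bag-of-words, L2-normalized.

    Assumption: a real system uses a learned embedding model; this version
    only reflects shared words, not meaning.
    """
    vec = [0.0] * DIM
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        # md5 gives a stable bucket across runs (built-in hash() is randomized).
        bucket = int(hashlib.md5(token.encode()).hexdigest(), 16) % DIM
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def chunk(text: str) -> list[str]:
    """Split already-extracted text at paragraph breaks (blank lines)."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]


@dataclass
class KnowledgeBase:
    entries: list[tuple[str, list[float]]] = field(default_factory=list)

    def ingest(self, raw_text: str) -> int:
        """Chunk, embed, and store; returns the number of chunks stored."""
        pieces = chunk(raw_text)
        for piece in pieces:
            self.entries.append((piece, embed(piece)))
        return len(pieces)

    def search(self, query: str, k: int = 1) -> list[str]:
        """Return the k stored chunks most similar to the query."""
        q = embed(query)
        # All vectors are unit length, so the dot product is cosine similarity.
        ranked = sorted(
            self.entries,
            key=lambda entry: sum(a * b for a, b in zip(q, entry[1])),
            reverse=True,
        )
        return [text for text, _ in ranked[:k]]


if __name__ == "__main__":
    kb = KnowledgeBase()
    kb.ingest("PTO policy: employees get 20 days per year.\n\nExpense reports are due monthly.")
    print(kb.search("how many PTO days do employees get"))
```

Storing each chunk separately is what lets a question be answered from the one relevant passage instead of the whole document.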

Supported Document Types

The ingestion system supports a wide range of document formats: PDF files (including scanned documents), Microsoft Word documents (DOCX, DOC), plain text files (TXT, MD), spreadsheets (XLSX, CSV), PowerPoint presentations (PPTX), and HTML pages. For scanned PDFs and images containing text, the system uses OCR (optical character recognition) to extract the text before processing. Each format is handled by a dedicated parser to maximize extraction accuracy.
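Routing each upload to a format-specific parser can be pictured as a lookup on the file extension, with scanned PDFs diverted to OCR. The sketch below is hypothetical: the parser labels, the `choose_parser` function, and the `has_text_layer` flag are illustrative assumptions, since this page does not document how Sarudo detects scans or names its parsers.

```python
from pathlib import Path

# Hypothetical extension-to-parser table covering the formats listed above.
PARSERS = {
    ".pdf": "pdf",
    ".docx": "word",
    ".doc": "word",
    ".txt": "plain_text",
    ".md": "plain_text",
    ".xlsx": "spreadsheet",
    ".csv": "spreadsheet",
    ".pptx": "presentation",
    ".html": "html",
}


def choose_parser(filename: str, has_text_layer: bool = True) -> str:
    """Pick a parser label for an upload (illustrative only)."""
    ext = Path(filename).suffix.lower()  # case-insensitive: "Report.PDF" -> ".pdf"
    if ext not in PARSERS:
        raise ValueError(f"unsupported format: {ext or filename}")
    parser = PARSERS[ext]
    # Assumption: a scanned PDF is one with no embedded text layer,
    # so its pages must go through OCR before chunking.
    if parser == "pdf" and not has_text_layer:
        return "ocr"
    return parser


if __name__ == "__main__":
    print(choose_parser("employee-handbook.pdf"))
    print(choose_parser("scan.pdf", has_text_layer=False))
```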

ℹ️

For best results with scanned documents, scan at a resolution of at least 300 DPI. Blurry or low-resolution scans may produce less accurate OCR results.
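To check an existing scan against the 300 DPI recommendation, divide the image's pixel width by the physical page width in inches. This is a small helper sketch; the function names are hypothetical, and the default page width of 8.5 inches assumes US Letter paper (A4 is about 8.27 inches wide).

```python
MIN_DPI = 300  # recommendation from the note on scan resolution


def effective_dpi(pixel_width: int, page_width_inches: float) -> float:
    """Horizontal resolution of a scan: pixels per inch of page width."""
    if page_width_inches <= 0:
        raise ValueError("page width must be positive")
    return pixel_width / page_width_inches


def scan_is_ocr_ready(pixel_width: int, page_width_inches: float = 8.5) -> bool:
    """True if the scan meets MIN_DPI. Default width assumes US Letter."""
    return effective_dpi(pixel_width, page_width_inches) >= MIN_DPI


if __name__ == "__main__":
    print(effective_dpi(2550, 8.5))  # prints 300.0
    print(scan_is_ocr_ready(1700))   # prints False (200 DPI)
```

In other words, a full US Letter page scanned at 300 DPI should be at least 2,550 pixels wide.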

Uploading Documents

To ingest a document, simply send it to your AI employee through Telegram and tell it to learn from the document. You can say "learn this," "add this to your knowledge base," or "ingest this document." The AI will process the file and confirm when ingestion is complete, including a summary of what it learned. You can also send multiple documents at once — the AI will process them in sequence.
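The upload flow described above (recognize a "learn this" request, then work through the attached files one at a time) can be sketched as follows. Everything here is a simplification under stated assumptions: the AI interprets free-form requests, so the fixed phrase list is only illustrative, and `ingest_batch` and its `ingest_one` callback are hypothetical names standing in for the real per-file pipeline.

```python
from collections.abc import Callable

# Example phrases from this page; the real assistant understands free-form
# requests, so a fixed list like this is a simplification.
TRIGGER_PHRASES = (
    "learn this",
    "add this to your knowledge base",
    "ingest this document",
)


def wants_ingestion(message: str) -> bool:
    """True if the message asks the assistant to learn the attachments."""
    text = message.lower()
    return any(phrase in text for phrase in TRIGGER_PHRASES)


def ingest_batch(
    filenames: list[str],
    message: str,
    ingest_one: Callable[[str], str],
) -> list[str]:
    """Process attachments in sequence, returning one confirmation per file."""
    if not wants_ingestion(message):
        return []
    confirmations = []
    for name in filenames:  # one at a time, in the order they were sent
        summary = ingest_one(name)  # hypothetical per-file ingestion step
        confirmations.append(f"Ingested {name}: {summary}")
    return confirmations


def fake_ingest(name: str) -> str:
    """Stand-in for the real pipeline, used only for this demo."""
    return "stored"


if __name__ == "__main__":
    print(ingest_batch(["handbook.pdf", "pricing.xlsx"], "Learn this, please", fake_ingest))
```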

Ingesting a company handbook

Upload a document for the AI to learn from.

You say:
[Attaches employee-handbook.pdf] Learn this document. It's our company handbook.
Sarudo responds:
I've ingested the employee handbook (47 pages, 128 sections). Key topics stored include: company policies, PTO rules, expense reporting procedures, code of conduct, benefits overview, and organizational structure. I can now answer questions about any of these topics. For example, I know your PTO policy allows 20 days per year with a 5-day carryover maximum. Anything specific you'd like me to look up?

Auto-Chunking & Embedding

The chunking strategy is optimized for retrieval accuracy. Documents are split at natural boundaries: section headings, paragraph breaks, and topic changes. Each chunk is sized to contain enough context to be useful on its own (typically 500 to 1,000 tokens). Consecutive chunks share some overlapping text, so information that falls near a boundary is not lost. Each chunk is embedded using a high-quality embedding model, enabling semantic search that matches on meaning rather than exact keywords.
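A boundary-aware chunker with overlap might look like the sketch below. It is an illustration, not Sarudo's chunker. It approximates tokens as whitespace-separated words (real tokenizers count differently), treats only blank-line paragraph breaks as natural boundaries, and uses hypothetical defaults of 800 tokens per chunk with a 100-token overlap. Paragraphs are packed greedily into a chunk until the next one would exceed the limit; each new chunk then starts with the tail of the previous one.

```python
def chunk_text(text: str, max_tokens: int = 800, overlap: int = 100) -> list[str]:
    """Pack paragraphs into chunks of at most max_tokens "tokens" (words here).

    Assumptions: a token is a whitespace-separated word, and the only natural
    boundary recognized is a blank line between paragraphs.
    """
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must be non-negative and smaller than max_tokens")
    paragraphs = [p.split() for p in text.split("\n\n") if p.strip()]
    # Oversized paragraphs are cut into windows small enough that
    # overlap + window never exceeds max_tokens.
    window = max_tokens - overlap
    chunks: list[list[str]] = []
    current: list[str] = []
    for words in paragraphs:
        for start in range(0, len(words), window):
            piece = words[start:start + window]
            if current and len(current) + len(piece) > max_tokens:
                chunks.append(current)
                # Carry the tail of the finished chunk into the next one.
                current = current[-overlap:] if overlap else []
            current = current + piece
    if current:
        chunks.append(current)
    return [" ".join(c) for c in chunks]


if __name__ == "__main__":
    long_paragraph = " ".join(f"w{i}" for i in range(1200))
    sizes = [len(c.split()) for c in chunk_text(long_paragraph, max_tokens=500, overlap=50)]
    print(sizes)  # prints [450, 500, 350]
```

The overlap is why the second and third chunks in the demo are longer than the raw text they add: each begins with the last 50 words of the chunk before it.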

Related Articles

How Sarudo Learns
The automatic knowledge extraction pipeline, how learning happens continuously, and how your AI employee improves over time.
Semantic Search
How vector search and hybrid search work to find information by meaning, not just keywords.
File Sharing
How to send files to your AI employee and receive generated files back, including supported formats and download links.