Getting started
KatCore is a browser-based workspace. There's nothing to install and no API to wire up — sign up, upload a file, and start asking questions.
Ingesting data
Drag and drop up to 50 files per batch, with upload progress shown for each file. Each file can be up to 100 MB and is automatically normalized, cleaned, and stored.
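To see how the two limits interact when you're preparing a large folder, here is a small planning sketch. It is hypothetical: `LocalFile` and `plan_batches` are illustrative names, the UI enforces the limits itself, and treating 100 MB as 100 × 1024 × 1024 bytes is an assumption.

```python
from dataclasses import dataclass

# Limits from the docs: 50 files per batch, 100 MB per file.
# Reading "MB" as 1024 * 1024 bytes is an assumption.
MAX_FILES_PER_BATCH = 50
MAX_FILE_BYTES = 100 * 1024 * 1024


@dataclass
class LocalFile:
    """Hypothetical stand-in for a file you plan to upload."""
    name: str
    size_bytes: int


def plan_batches(files):
    """Set aside oversized files, then group the rest into batches of at most 50."""
    rejected = [f.name for f in files if f.size_bytes > MAX_FILE_BYTES]
    accepted = [f for f in files if f.size_bytes <= MAX_FILE_BYTES]
    batches = [
        accepted[i:i + MAX_FILES_PER_BATCH]
        for i in range(0, len(accepted), MAX_FILES_PER_BATCH)
    ]
    return batches, rejected
```

With 120 small files plus one 200 MB file, this yields batches of 50, 50, and 20 files and flags the large file as over the limit.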
Supported formats
For multi-sheet XLSX workbooks, you pick which sheet to ingest. Documents (PDF, DOCX, MD, TXT) are split into chunks and embedded for retrieval.
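KatCore doesn't document its chunking strategy. As a rough illustration of what "chunked" means, the sketch below uses fixed-size character windows with overlap, a common baseline for retrieval; the window sizes are arbitrary assumptions and the embedding step is left out.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list:
    """Split text into fixed-size character windows that overlap by `overlap` chars."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window reaches the end, so the tail isn't repeated.
        if start + chunk_size >= len(text):
            break
    return chunks
```

For example, `chunk_text("abcdefghij", chunk_size=4, overlap=1)` returns `["abcd", "defg", "ghij"]`: each window shares one character with the previous one, so a sentence cut at a boundary still appears whole in at least one neighboring chunk when the overlap is large enough.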
Four ways to bring data in
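File upload: drag and drop or browse, up to 50 files per batch.
URL: pull from a public or authenticated URL.
API: call a REST endpoint with GET or POST, authenticate with a Bearer token, an API key, or Basic auth, and extract records from the JSON response with a record path.
Database: connect to PostgreSQL, MySQL, or SQL Server. MongoDB support is coming soon.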
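The API connector has two moving parts: the auth header and the JSON record path. The sketch below illustrates both under assumptions: the dot-separated path syntax, the function names, and the `X-API-Key` header name are hypothetical, not KatCore's documented configuration format.

```python
import base64


def auth_headers(kind, secret, username="", api_key_header="X-API-Key"):
    """Build the HTTP header for Bearer, API-key, or Basic auth."""
    if kind == "bearer":
        return {"Authorization": f"Bearer {secret}"}
    if kind == "api_key":
        # Header names vary by API; X-API-Key is only a common convention.
        return {api_key_header: secret}
    if kind == "basic":
        # Basic auth is base64("username:password").
        token = base64.b64encode(f"{username}:{secret}".encode("utf-8")).decode("ascii")
        return {"Authorization": f"Basic {token}"}
    raise ValueError(f"unknown auth kind: {kind!r}")


def extract_records(payload, record_path):
    """Walk a dot-separated path such as 'data.items' down to the records."""
    node = payload
    keys = record_path.split(".") if record_path else []
    for key in keys:
        # Numeric segments index into lists, e.g. 'results.0.rows'.
        node = node[int(key)] if isinstance(node, list) else node[key]
    # A single object is treated as a one-record list.
    return [node] if isinstance(node, dict) else list(node)
```

Given a response like `{"data": {"items": [...]}}`, the path `data.items` returns the list of item objects, which then become rows.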
Every file is versioned. Re-ingesting the same source creates a new version with full lineage — your history is never overwritten.
Understanding your data
On ingest, KatCore labels every column (id, email, date, currency, region, PII…) with a deterministic 5-step pipeline, then writes a natural-language description of each column in the background. These descriptions ground both the audit and the chat.
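The 5-step pipeline itself isn't spelled out here. To illustrate what "deterministic" labeling can look like, the sketch below applies an ordered list of rules and takes the first one that matches at least 90% of the non-blank values, so the same column always gets the same label. The rules, the threshold, and the label names are all assumptions, and only a few of the labels are covered.

```python
import re
from datetime import datetime

EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
CURRENCY_RE = re.compile(r"^[$€£]-?[\d,]+(\.\d{2})?$")


def is_iso_date(value):
    try:
        datetime.strptime(value, "%Y-%m-%d")
        return True
    except ValueError:
        return False


# Ordered rules: the first one matching enough values wins, so results are deterministic.
RULES = [
    ("email", lambda v: EMAIL_RE.match(v) is not None),
    ("date", is_iso_date),
    ("currency", lambda v: CURRENCY_RE.match(v) is not None),
]


def label_column(name, values, threshold=0.9):
    non_blank = [v.strip() for v in values if v is not None and v.strip()]
    if not non_blank:
        return "empty"
    for label, matches in RULES:
        share = sum(1 for v in non_blank if matches(v)) / len(non_blank)
        if share >= threshold:
            return label
    # Fall back to the column name for identifiers.
    lowered = name.lower()
    if lowered == "id" or lowered.endswith("_id"):
        return "id"
    return "text"
```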
The Readiness Audit
The audit produces a fully explainable AI-Readiness Score from 0 to 100, with a letter grade. The score is signal-based, not a vibe: every deducted point traces back to a specific issue.
The six dimensions
| Dimension | Weight | What it measures |
|---|---|---|
| Completeness | 25% | How much data is actually present — missing (null) and blank/whitespace values. |
| Validity | 25% | Whether values are well-formed and plausible — IQR outliers, malformed emails/phones, unparseable dates, business-rule violations. |
| Uniqueness | 15% | Freedom from unintended duplicates — repeated identifiers and exact duplicate rows. |
| PII Exposure | 15% | Sensitive personal data left in the clear — unmasked emails, phones, names, addresses. |
| Consistency | 10% | One canonical convention — variant spellings ("USA" vs "U.S.A.") and non-standard column names. |
| Semantic Completeness | 10% | Every column documented so AI and analysts understand it. |
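The weights above sum to 100%, so one natural reading is that the overall score is a weighted average of six per-dimension scores. The sketch below assumes exactly that (KatCore may aggregate differently, and its fix-list is finer-grained than per-dimension) and shows why lost points can be decomposed: because the weights sum to 100, the per-dimension losses weight × (100 − dimension score) / 100 always add up to 100 − overall score.

```python
# Weights from the table above, in percent (they sum to 100).
WEIGHTS = {
    "Completeness": 25,
    "Validity": 25,
    "Uniqueness": 15,
    "PII Exposure": 15,
    "Consistency": 10,
    "Semantic Completeness": 10,
}


def readiness_score(dim_scores):
    """Weighted average of six per-dimension scores, each on a 0-100 scale."""
    if dim_scores.keys() != WEIGHTS.keys():
        raise ValueError("need a score for every dimension")
    return sum(WEIGHTS[d] * s for d, s in dim_scores.items()) / 100


def points_lost(dim_scores):
    """Points each dimension costs: weight * (100 - score) / 100."""
    return {d: WEIGHTS[d] * (100 - s) / 100 for d, s in dim_scores.items()}


example = {
    "Completeness": 80, "Validity": 90, "Uniqueness": 100,
    "PII Exposure": 60, "Consistency": 70, "Semantic Completeness": 50,
}
score = readiness_score(example)              # 78.5
deficit = sum(points_lost(example).values())  # 21.5 == 100 - 78.5
```

The per-dimension scores in `example` are made up for illustration. Letter-grade thresholds aren't documented, so the sketch doesn't compute a grade.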
The fix-list
Each audit returns a fix-list ranked by severity (critical, high, medium, low). Every fix shows exactly how many points it will recover, and those points provably sum to 100 − your score, so the checklist literally is your score, decomposed. Each fix also carries sample evidence: the actual offending values and their 0-based row indices (up to 20), IQR fences for outliers, and duplicate groups.
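IQR fences are a standard outlier rule: values below Q1 − 1.5·IQR or above Q3 + 1.5·IQR are flagged. The sketch below computes the fences and caps the reported row indices at 20 to mirror the evidence described above. The 1.5 multiplier and the quartile method (Python's default "exclusive" method) are assumptions, since KatCore doesn't document which it uses.

```python
import statistics


def iqr_outliers(values, k=1.5, max_rows=20):
    """Return Tukey fences and up to `max_rows` 0-based indices of outliers."""
    # statistics.quantiles needs at least two values; default method is 'exclusive'.
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    rows = [i for i, v in enumerate(values) if v < low or v > high]
    return (low, high), rows[:max_rows]


fences, rows = iqr_outliers([10, 12, 11, 13, 12, 95, 11, 10])
# fences == (6.5, 16.5), rows == [5]
```

Here Q1 = 10.25 and Q3 = 12.75, so IQR = 2.5 and the fences sit 3.75 beyond each quartile; only the 95 at row 5 falls outside.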
One-click cleaning
Each issue maps to a suggested action: entity resolution, schema standardization, PII masking, imputation, anomaly quarantine, or smart date parsing. You can also fix everything at once. Preview the before/after rows and impact counts, then apply the changes to produce a new, cleaned version of the file.
Unmasked PII is always scored critical. The UI shows a projected score after fixes so you know the payoff before you commit.
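PII masking and its before/after preview can be pictured with email addresses. In the sketch below, the masking style (keep the first character and the domain) and the function names are illustrative assumptions, not KatCore's actual masking rules.

```python
import re

# Captures the first character of the local part and the whole domain.
EMAIL_RE = re.compile(r"\b([A-Za-z0-9._%+-])[A-Za-z0-9._%+-]*@([A-Za-z0-9.-]+\.[A-Za-z]{2,})\b")


def mask_emails(text):
    """Replace each email with its first character, '***', and its domain."""
    return EMAIL_RE.sub(lambda m: f"{m.group(1)}***@{m.group(2)}", text)


def preview(rows):
    """Return the masked rows plus how many rows the fix would change."""
    after = [mask_emails(r) for r in rows]
    changed = sum(1 for before, masked in zip(rows, after) if before != masked)
    return after, changed
```

The `changed` count plays the role of an impact count: you see how many rows a fix touches before committing to it.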
Asking questions (Kat)
Ask a question in natural language. An intent classifier routes it, semantic search finds the right files (grounded by the auto-descriptions), and a plan → execute → synthesize loop over DuckDB returns a written answer with the numbers and the source file cited. No SQL required.
You ask: "What was the growth trend of SaaS subscriptions in Q3 vs Q2?"
Kat answers: "SaaS subscriptions grew 14.2% in Q3, driven by the Enterprise tier (+22% seats)." Source: sales_report.csv
Phrase questions the way you'd ask a colleague. Every answer cites the file it came from, so you can trace the number back to its source.
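The execute and synthesize steps can be sketched for a growth question like the one above. This is a toy under stated assumptions: Python's built-in sqlite3 stands in for DuckDB so the example runs without extra packages, the "plan" is a hardcoded query template rather than anything a classifier or model would produce, and the data is made up.

```python
import sqlite3


def answer_growth(conn, source_name, q_from, q_to):
    """Plan, execute, and synthesize a quarter-over-quarter growth answer."""
    # Plan: a fixed query template stands in for a generated plan.
    sql = (
        "SELECT quarter, SUM(subscriptions) FROM sales "
        "WHERE quarter IN (?, ?) GROUP BY quarter"
    )
    # Execute: run the query and collect per-quarter totals.
    totals = dict(conn.execute(sql, (q_from, q_to)).fetchall())
    growth = (totals[q_to] - totals[q_from]) / totals[q_from] * 100
    # Synthesize: turn the number into a sentence that cites its source.
    return f"Subscriptions changed {growth:+.1f}% from {q_from} to {q_to}. Source: {source_name}"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (quarter TEXT, tier TEXT, subscriptions INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("Q2", "Team", 400), ("Q2", "Enterprise", 600),
     ("Q3", "Team", 450), ("Q3", "Enterprise", 700)],
)
print(answer_growth(conn, "sales_report.csv", "Q2", "Q3"))
# Subscriptions changed +15.0% from Q2 to Q3. Source: sales_report.csv
```

Keeping the source file name alongside the query result is what lets the final sentence cite where the number came from.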
Schedules
Set up cron-driven recurring ingestion from a URL or API. Schedules are timezone-aware (IANA), can be paused and resumed, track failures, and auto-disable after repeated failures.
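The pause, resume, and auto-disable behavior amounts to a small state machine. The sketch below is hypothetical: the `Schedule` class is illustrative, and both the threshold of 5 and the reading of "repeated failures" as consecutive failures are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Schedule:
    cron: str                  # e.g. "0 6 * * *"
    timezone: str              # IANA name, e.g. "Europe/Berlin"
    max_failures: int = 5      # hypothetical threshold
    paused: bool = False
    disabled: bool = False
    consecutive_failures: int = 0

    def pause(self):
        self.paused = True

    def resume(self):
        self.paused = False

    def should_run(self):
        return not (self.paused or self.disabled)

    def record_result(self, ok):
        """Reset the failure streak on success; disable after too many failures in a row."""
        if ok:
            self.consecutive_failures = 0
            return
        self.consecutive_failures += 1
        if self.consecutive_failures >= self.max_failures:
            self.disabled = True
```

In this sketch, resuming a paused schedule does not re-enable one that was auto-disabled; whether KatCore separates the two states this way isn't documented.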
Smart polling
KatCore caches each source's ETag and Last-Modified values. If the upstream source hasn't changed, the run is a no-op, so no duplicate data is ingested. Because lineage is versioned, each refresh that does bring new data becomes a new version of the same file.
A schedule like 0 6 * * * in UTC runs daily at 06:00 UTC — and silently skips when the source is unchanged.
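The docs don't say exactly how the cached values are used. A standard HTTP approach is a conditional GET: the client echoes the cached validators in `If-None-Match` and `If-Modified-Since`, and the server replies 304 Not Modified when nothing changed. A minimal standard-library sketch assuming that mechanism (the `cache` dict is a hypothetical stand-in for wherever the validators are stored):

```python
import urllib.error
import urllib.request


def conditional_request(url, etag=None, last_modified=None):
    """Build a GET that echoes the cached validators back to the server."""
    headers = {}
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return urllib.request.Request(url, headers=headers)


def poll(url, cache):
    """Return the new body, or None if the server says 304 Not Modified."""
    req = conditional_request(url, cache.get("etag"), cache.get("last_modified"))
    try:
        with urllib.request.urlopen(req) as resp:
            body = resp.read()
            # Remember the validators for the next scheduled run.
            cache["etag"] = resp.headers.get("ETag")
            cache["last_modified"] = resp.headers.get("Last-Modified")
            return body
    except urllib.error.HTTPError as err:
        if err.code == 304:
            return None  # unchanged upstream: skip the run, no new version
        raise
```

urllib raises `HTTPError` for a 304, which is why the unchanged case is handled in the `except` branch.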
Notebooks & Artifacts
An artifact is a Jupyter notebook (.ipynb) that lives inside KatCore. You can view it, edit it cell by cell (markdown and code cells, with GitHub-flavored Markdown and syntax highlighting), and download it as a real .ipynb file to share or open anywhere.
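Because artifacts are plain .ipynb files, it helps to know how small a valid notebook can be. The sketch below builds a minimal nbformat 4 notebook using only the standard `json` module. It's illustrative, not how KatCore generates its reports, and `make_notebook` is a hypothetical helper.

```python
import json


def make_notebook(cells):
    """Build a minimal nbformat 4 notebook dict from (cell_type, source) pairs."""
    nb_cells = []
    for cell_type, source in cells:
        cell = {"cell_type": cell_type, "metadata": {}, "source": source}
        if cell_type == "code":
            # Code cells additionally require these two fields.
            cell["execution_count"] = None
            cell["outputs"] = []
        nb_cells.append(cell)
    # nbformat_minor 4 avoids the per-cell "id" field that 4.5 requires.
    return {"nbformat": 4, "nbformat_minor": 4, "metadata": {}, "cells": nb_cells}


nb = make_notebook([("markdown", "# Data Quality Report"), ("code", "print('hello')")])
ipynb_text = json.dumps(nb, indent=1)  # save this text as report.ipynb
```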
The auto-generated Data Quality Report
Every time you run an audit, KatCore generates a Quality Report notebook — no setup. It contains:
Grade, per-dimension breakdown, and prioritized fixes.
Natural-language per-column findings with evidence — null row indices, sample values, outlier ranges.
Entity mappings, schema renames, PII masking rules, imputation strategies, date patterns.
A single ready-to-run block that applies every fix.
It's interactive, not a dead PDF. Edit cells in place, tweak the remediation, re-run the audit to watch the score climb, then download to share with your team.
Bring your own
Upload your own notebooks (up to 10 MB each, validated as nbformat 4) and attach them to any dataset file.
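You can catch the most common upload rejections locally before attaching a notebook. The sketch below checks only size, JSON parsing, and the nbformat major version. Full validation against the nbformat 4 JSON schema (for example with the `nbformat` package's `validate` function) is stricter, and KatCore's exact checks aren't documented. `check_notebook` is a hypothetical helper.

```python
import json

# "≤ 10 MB" from the docs; reading MB as 1024 * 1024 bytes is an assumption.
MAX_NOTEBOOK_BYTES = 10 * 1024 * 1024


def check_notebook(raw):
    """Return a list of problems found in raw .ipynb bytes; empty means the basic checks passed."""
    problems = []
    if len(raw) > MAX_NOTEBOOK_BYTES:
        problems.append("file is larger than 10 MB")
    try:
        nb = json.loads(raw)
    except (UnicodeDecodeError, json.JSONDecodeError):
        return problems + ["not valid JSON"]
    if not isinstance(nb, dict):
        return problems + ["top level is not a JSON object"]
    if nb.get("nbformat") != 4:
        problems.append("nbformat is not 4")
    if not isinstance(nb.get("cells"), list):
        problems.append("missing 'cells' list")
    return problems
```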