The cloud workspace for your data

Understand your data the moment you upload it.

Upload a spreadsheet, or connect a database or API. KatCore labels every column, scores its quality, and answers your questions in natural language, all in your browser. No SQL, no setup.

Get Started Free See how it works

app.katcore.io / sources

customers_q3.csv · 12,402 rows · 2.4 MB · CSV · v3 · Ready · Grade B · 87 / 100

Column	Type	Auto description
email	PII	Customer email address — unmasked, flagged for exposure
signup_date	Date	Account creation date, parsed as ISO-8601
mrr	Currency	Monthly recurring revenue, USD
region	Region	Sales territory — 6 distinct values

What was the growth trend of SaaS subscriptions in Q3 vs Q2?

Kat

SaaS subscriptions grew 14.2% in Q3, driven by the Enterprise tier (+22% seats). Source: sales_report.csv

Drop in any format

CSV · JSON · XLSX · PDF · TXT · DOCX · MD · Parquet · HTML

The flow

From messy file to trusted answers — in minutes.

Bring data in, understand and trust it, keep it fresh, and walk away with a report. Four steps, one workspace, zero pipelines to build.

01 Bring any data in

Drop it in. KatCore handles the rest.

Drag and drop your files with real per-file progress — or pull straight from a public URL, a REST API, or a SQL database. Parsing, cleaning, and storage happen for you.


File URL API Database

Drop files to upload

CSV · JSON · XLSX · PDF · TXT · DOCX · MD · Parquet · HTML
up to 50 files · 100 MB each

orders_2025.xlsx · 100%

support_logs.pdf · 68%

events.parquet · 34%

02 Understand & trust it · the centerpiece

A 0–100 trust score for any dataset — and the exact fixes.

On ingest, every column is labeled and described automatically. Then the Readiness Audit scores your data across six weighted dimensions — fully explainable, no black box — and hands you a prioritized fix-list where every fix shows the points it recovers. The checklist literally is your score, decomposed.

KatCore reads every column, scores how trustworthy your data is, and answers your questions in natural language — so you can act, not audit.

GRADE B

AI-Readiness Score

Good

+9 projected after fixes → 96

Dimension	Weight	Score
Completeness	25%	92
Validity	25%	78
Uniqueness	15%	95
PII Exposure	15%	60
Consistency	10%	84
Semantic	10%	100
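To make "six weighted dimensions" concrete, here is a minimal sketch of how weights and per-dimension scores could roll up into one number, and how each dimension's shortfall translates into lost points. The plain weighted mean and the names `weighted_score` and `points_lost` are assumptions for illustration; this page does not spell out KatCore's formula. With the sample values this plain mean comes to 84.15 rather than the 87 on the card, so the production formula presumably includes something this sketch does not.

```python
# Weights and per-dimension scores as shown on the sample scorecard.
# Combining them as a plain weighted mean is an assumption of this sketch.
DIMENSIONS = {
    "Completeness": (0.25, 92),
    "Validity": (0.25, 78),
    "Uniqueness": (0.15, 95),
    "PII Exposure": (0.15, 60),
    "Consistency": (0.10, 84),
    "Semantic": (0.10, 100),
}


def weighted_score(dimensions):
    """Return the weighted mean of the dimension scores (0-100)."""
    total_weight = sum(weight for weight, _ in dimensions.values())
    weighted_sum = sum(weight * score for weight, score in dimensions.values())
    return weighted_sum / total_weight


def points_lost(dimensions):
    """Return (dimension, points lost) pairs, biggest loss first."""
    losses = {
        name: weight * (100 - score)
        for name, (weight, score) in dimensions.items()
    }
    return sorted(losses.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    print(f"score: {weighted_score(DIMENSIONS):.2f}")
    for name, lost in points_lost(DIMENSIONS):
        print(f"{name}: -{lost:.2f}")
```

Under this assumption the lost points always add up to 100 minus the score, which is one way a fix-list can be read as the score, decomposed.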

Fix these to improve your score · 4 issues

Issue	Detail	Points recovered
Unmasked PII in email	312 unmasked addresses · always critical	+15.0 pts
Outliers detected in mrr	IQR fences 12–840 · 47 rows outside	+8.5 pts
Unparseable dates in signup_date	rows 14, 89, 203 … (+19 more)	+4.0 pts
Inconsistent values in region	"USA" vs "U.S.A." · 2 variants	+1.5 pts
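The outlier finding above reports IQR fences. As a rough sketch of what such a check involves, the snippet below computes conventional Tukey fences (Q1 - 1.5 × IQR and Q3 + 1.5 × IQR) with Python's standard library. The 1.5 multiplier, the quantile method, and the function names are assumptions; the page does not say which variant KatCore uses, and the sample values are made up.

```python
from statistics import quantiles


def iqr_fences(values, k=1.5):
    """Return (low, high) Tukey fences: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def outliers(values, k=1.5):
    """Return the values that fall outside the fences."""
    low, high = iqr_fences(values, k)
    return [v for v in values if v < low or v > high]


if __name__ == "__main__":
    # Hypothetical values, not taken from customers_q3.csv.
    sample = [10, 12, 14, 16, 18, 20, 22, 24, 500]
    print(iqr_fences(sample))
    print(outliers(sample))
```

Because the fences come from quartiles rather than a model, the same input always yields the same flagged rows.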

Suggested cleaning actions · preview → apply

Before

jordan.lee@acme.io

After · PII masking

j••••••@acme.io
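For a sense of what the masking preview does, here is a small sketch that reproduces the Before and After pair above: keep the first character and the domain, and replace the rest of the local part with a fixed run of bullets. This Python `mask_email` is a hypothetical stand-in written for illustration, not KatCore's own function, and the fixed mask length is inferred from the single example shown.

```python
MASK = "\u2022" * 6  # six bullets, matching the preview


def mask_email(address):
    """Keep the first character and the domain; hide the rest of the local part."""
    local, sep, domain = address.rpartition("@")
    if not sep or not local:
        return address  # not email-shaped; leave the value untouched
    return f"{local[0]}{MASK}@{domain}"


if __name__ == "__main__":
    print(mask_email("jordan.lee@acme.io"))
```

A fixed-length mask also hides how long the original local part was, which a character-for-character mask would leak.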

Which region had the highest churn last quarter?

Kat

EMEA had the highest churn at 6.8%, nearly double the 3.5% global average — concentrated in the SMB segment. Source: customers_q3.csv

03 Keep it fresh

Refresh on a schedule. Skip the pull when nothing changed.

Point KatCore at a URL or API and it re-ingests on a timezone-aware cron schedule. Smart polling caches the ETag and Last-Modified headers, so an unchanged source is a no-op and data is never duplicated.


exchange_rates · daily 06:00 · 0 6 * * * · UTC · next run in 4h · Unchanged, skipped

inventory_api · every 15 min · */15 * * * * · pulled 2 min ago · New version v8

crm_contacts · hourly · 0 * * * * · Bearer auth · lineage tracked · v23
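Skipping the pull when nothing changed relies on standard HTTP conditional requests. The sketch below shows the general pattern with Python's standard library: send the cached ETag and Last-Modified values back as `If-None-Match` and `If-Modified-Since`, and treat a 304 response as "no new version". The `cache` dictionary and the function names are assumptions for illustration, not KatCore's implementation.

```python
import urllib.error
import urllib.request


def conditional_headers(cache):
    """Build request headers from the validators saved on the last pull."""
    headers = {}
    if cache.get("etag"):
        headers["If-None-Match"] = cache["etag"]
    if cache.get("last_modified"):
        headers["If-Modified-Since"] = cache["last_modified"]
    return headers


def poll(url, cache):
    """Return the new body, or None when the server reports no change."""
    request = urllib.request.Request(url, headers=conditional_headers(cache))
    try:
        with urllib.request.urlopen(request, timeout=30) as response:
            body = response.read()
            # Remember the validators for the next scheduled run.
            cache["etag"] = response.headers.get("ETag")
            cache["last_modified"] = response.headers.get("Last-Modified")
            return body
    except urllib.error.HTTPError as error:
        if error.code == 304:  # Not Modified: nothing to re-ingest
            return None
        raise
```

A scheduler would call `poll` on each tick and only create a new dataset version when it returns a body.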

04 Walk away with a report

Every audit becomes a notebook you can edit, re-run, and download.

The audit produces a real Jupyter notebook inside KatCore — an AI-Readiness scorecard, a natural-language narrative of findings with evidence, and a single DuckDB SQL block that applies every fix. Edit cells in place, re-run the audit to watch the score climb, or download the notebook to open anywhere.

Score, findings, evidence, and the exact SQL to fix your data — as a live notebook, not a dead PDF. Bring your own notebooks too.

Quality Report Upload Notebook

Data_Quality_Report.ipynb

Markdown

AI-Readiness Scorecard

87 / 100 · Grade B · Good

Completeness 92 · Validity 78 · Uniqueness 95 · PII 60 · Consistency 84 · Semantic 100

Markdown

Findings. Column email holds 312 unmasked addresses (critical). mrr shows 47 outliers beyond the IQR fence [12, 840]. signup_date has 22 unparseable values at rows 14, 89, 203…

DuckDB SQL · remediation

-- one block applies every fix
CREATE TABLE customers_clean AS
SELECT
  mask_email(email) AS email,
  try_strptime(signup_date, '%Y-%m-%d') AS signup_date,
  CASE WHEN region IN ('U.S.A.') THEN 'USA' ELSE region END AS region
FROM customers_q3
WHERE mrr BETWEEN 12 AND 840;

Cross-dataset chat

Ask across many datasets.

Kat doesn't stop at a single file. It pulls in every dataset your question touches, registers them together, and writes one query that joins them — CTEs, window functions, and all — then answers in natural language with every source cited.

app.katcore.io / chat

sales_2025.csv + returns_2025.csv · 2 datasets · 1 query

Which regions are losing the most to returns, relative to sales this year?

Kat

EMEA has the steepest return rate at 9.4% — $84.2K refunded against $897K in sales, more than double the 4.1% company average. APAC follows at 6.7%. Sources: sales_2025.csv · returns_2025.csv

DuckDB SQL · one query, two datasets

-- one query spans both datasets
WITH sales AS (
  SELECT region, SUM(amount) AS revenue
  FROM sales_2025
  GROUP BY region
),
returns AS (
  SELECT region, SUM(amount) AS refunded
  FROM returns_2025
  GROUP BY region
)
SELECT
  s.region,
  round(r.refunded / s.revenue * 100, 1) AS return_rate
FROM sales s
JOIN returns r USING (region)
ORDER BY return_rate DESC;
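To try the "register both datasets, run one query" idea locally, the sketch below loads two small tables into a single in-memory database and runs an equivalent join. It uses Python's built-in `sqlite3` as a stand-in so it runs with no installs; KatCore itself is described as using DuckDB. The rows are made-up illustrative values (chosen so EMEA matches the $84.2K against $897K figures quoted above), and the CTE names are changed to `sales_by_region` and `returns_by_region`.

```python
import sqlite3

QUERY = """
WITH sales_by_region AS (
    SELECT region, SUM(amount) AS revenue FROM sales_2025 GROUP BY region
),
returns_by_region AS (
    SELECT region, SUM(amount) AS refunded FROM returns_2025 GROUP BY region
)
SELECT s.region,
       round(r.refunded / s.revenue * 100, 1) AS return_rate
FROM sales_by_region AS s
JOIN returns_by_region AS r USING (region)
ORDER BY return_rate DESC;
"""


def return_rates(sales_rows, return_rows):
    """Load both datasets into one in-memory database and run a single query."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE sales_2025 (region TEXT, amount REAL)")
        con.execute("CREATE TABLE returns_2025 (region TEXT, amount REAL)")
        con.executemany("INSERT INTO sales_2025 VALUES (?, ?)", sales_rows)
        con.executemany("INSERT INTO returns_2025 VALUES (?, ?)", return_rows)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    # Hypothetical rows for illustration only.
    sales = [("EMEA", 500000.0), ("EMEA", 397000.0), ("APAC", 300000.0)]
    returns = [("EMEA", 84200.0), ("APAC", 20100.0)]
    for region, rate in return_rates(sales, returns):
        print(region, rate)
```

Aggregating each dataset before the join keeps one row per region on both sides, so the join cannot multiply rows and inflate the totals.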

Real AI, exactly where it helps.

KatCore puts large language models and embeddings to work where they earn it — understanding your data, answering your questions, and cleaning the mess. The numbers you rely on stay deterministic.

It understands your columns

On ingest, an LLM reads and describes every column, and embeddings map what your data means — so search and answers are grounded in your real schema, not guesswork.

LLM + embeddings

It answers in natural language

Ask Kat a question. It reads your intent, finds the right files by meaning, writes and runs the query, then explains the result — with the source file cited.

Intent · retrieval · synthesis

It cleans with context

AI reconciles "USA" vs "U.S.A.", flags values that don't belong, parses messy dates, and detects PII — then writes the exact fix. You preview before anything changes.

LLM + embeddings + NER
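The "USA" vs "U.S.A." case can be illustrated without any model at all. The sketch below is a deliberately simple, deterministic stand-in: it groups spellings that become identical once case, punctuation, and spacing are removed. It is not the LLM and embedding approach described above, the function names are hypothetical, and it would miss variants such as "United States" that only meaning-based matching can catch.

```python
from collections import defaultdict


def canonical_key(value):
    """Collapse case, punctuation, and spacing so 'U.S.A.' and 'USA' collide."""
    return "".join(ch for ch in value.casefold() if ch.isalnum())


def find_variants(values):
    """Group distinct spellings that share a canonical key."""
    groups = defaultdict(set)
    for value in values:
        groups[canonical_key(value)].add(value)
    return {
        key: sorted(spellings)
        for key, spellings in groups.items()
        if len(spellings) > 1
    }


if __name__ == "__main__":
    print(find_variants(["USA", "U.S.A.", "EMEA", "USA"]))
```

Each group it returns is a candidate for a preview-then-apply fix, with one spelling chosen as the canonical value.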

Grounded, not guessed

Kat never invents a number. Every answer is computed from your actual data and traces back to the rows and the source file it came from — so you can verify it, not just trust it.

Computed · traceable

Scores, statistics, and duplicate detection stay deterministic and fully explainable — no model guesswork in the numbers you trust.

Why KatCore

Built to be trusted with your data.

Automatic understanding

Every column is labeled and described on ingest. KatCore understands your data before you even ask.

Natural-language answers

Ask a question, get a written answer with the numbers and the source file cited. No SQL required.

Trust score & quality

A 0–100 readiness score with a point-by-point fix-list. Know exactly what's wrong and how to fix it.

Isolated & encrypted

Every workspace is logically separated and encrypted at rest. Your data stays yours.

Built on infrastructure you already trust

Polars DuckDB Cloudflare R2 Railway Postgres

Pricing

Pricing that scales.

Start free and upload your first dataset in minutes. Upgrade when your team grows.

Individual

Free

$0/mo

100 AI credits / mo
1 project · 100 MB storage
File upload, semantic & chat
Community support

Get Started Free

Go from messy file to trusted answers — today.

Get Started Free Read the docs