
data morph

Local CSV / JSON / TXT file conversion with a distilled Gemma model.


data morph distills a file-format-conversion capability from Claude Opus into a 2.0 GB Gemma student that runs locally, so you can convert between CSV, JSON, and TXT for free instead of paying for frontier-LLM API calls. AI Builders 2026.

from datamorph import convert_file

result = convert_file("contacts.csv", "contacts.json")
print(result.accepted, result.scores)

How it works, in one picture

The model never sees the full source file, only a small metadata envelope. From that envelope it writes a Python script that performs the conversion; the script is then run in a sandbox and its output is validated.
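The "run, then validate" half of that flow can be sketched roughly as follows. This is an illustration under stated assumptions, not data morph's implementation: a subprocess with a timeout stands in for the sandbox, a row-count comparison stands in for the validator, and `GENERATED_SCRIPT`, `run_generated_script`, and `validate_row_count` are hypothetical names.

```python
import csv
import json
import subprocess
import sys
import tempfile
from pathlib import Path

# Stand-in for a model-written conversion script (hypothetical example).
GENERATED_SCRIPT = '''
import csv, json, sys
src, dst = sys.argv[1], sys.argv[2]
with open(src, newline="", encoding="utf-8") as f:
    rows = list(csv.DictReader(f))
with open(dst, "w", encoding="utf-8") as f:
    json.dump(rows, f, indent=2)
'''


def run_generated_script(script: str, src: Path, dst: Path, timeout_s: float = 10.0) -> bool:
    """Run the script in a separate interpreter with a timeout.

    Assumption: a subprocess is only a weak stand-in for a real sandbox.
    """
    script_path = dst.parent / "convert_generated.py"
    script_path.write_text(script, encoding="utf-8")
    try:
        proc = subprocess.run(
            [sys.executable, "-I", str(script_path), str(src), str(dst)],
            capture_output=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return False
    return proc.returncode == 0 and dst.exists()


def validate_row_count(src: Path, dst: Path) -> bool:
    """Accept only if the JSON output is a list with one record per CSV data row."""
    with src.open(newline="", encoding="utf-8") as f:
        expected = sum(1 for _ in csv.DictReader(f))
    try:
        records = json.loads(dst.read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError):
        return False
    return isinstance(records, list) and len(records) == expected


def demo() -> bool:
    """Convert a tiny CSV with the stand-in script and validate the result."""
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "contacts.csv"
        dst = Path(tmp) / "contacts.json"
        src.write_text(
            "name,email\nAda,ada@example.com\nLin,lin@example.com\n",
            encoding="utf-8",
        )
        ran = run_generated_script(GENERATED_SCRIPT, src, dst)
        return ran and validate_row_count(src, dst)


if __name__ == "__main__":
    print(demo())
```

Running the script in a separate process keeps a crash or hang in generated code from taking down the caller. A production sandbox would also need to restrict filesystem and network access, which this sketch does not attempt.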

Pipeline architecture

See How it works for the full five-stage pipeline.

Why a small local model?

Rule-based parsers can't handle messy, context-dependent conversions. Frontier LLMs can, but they're expensive at scale. data morph narrows the task from "transform a whole file" to "read metadata, write a script", which is realistic for a 2 B model and scales to arbitrary file sizes because the model never reads the full file content.
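To see why the prompt stays small no matter how large the file is, here is a minimal sketch of a metadata envelope built from the first few lines of a CSV. The function name `build_envelope` and the envelope fields are assumptions made for illustration; the actual envelope schema is not described on this page.

```python
import csv
import itertools
import json
import tempfile
from pathlib import Path


def build_envelope(path: Path, sample_rows: int = 3) -> dict:
    """Summarize a CSV file from its header and first few rows only.

    The field names here are illustrative, not data morph's schema.
    """
    with path.open(newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader, [])
        # islice stops after `sample_rows` rows; the rest of the file is never read.
        sample = list(itertools.islice(reader, sample_rows))
    return {
        "filename": path.name,
        "size_bytes": path.stat().st_size,
        "columns": header,
        "sample_rows": sample,
    }


def write_csv(path: Path, n_rows: int) -> None:
    """Write a synthetic two-column CSV with `n_rows` data rows."""
    with path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["name", "email"])
        for i in range(n_rows):
            writer.writerow([f"user{i}", f"user{i}@example.com"])


with tempfile.TemporaryDirectory() as tmp:
    small_path = Path(tmp) / "small.csv"
    large_path = Path(tmp) / "large.csv"
    write_csv(small_path, 10)
    write_csv(large_path, 100_000)
    small = build_envelope(small_path)
    large = build_envelope(large_path)

# The serialized envelopes are nearly the same length despite the size gap.
print(len(json.dumps(small)), len(json.dumps(large)))
```

Apart from the filename and the byte count, the two envelopes are identical, so the model's input is bounded by the sample size and not by the file size.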

Results

A three-step model surgery (fuse the LoRA adapter → strip the vision/audio towers → prune the vocabulary from 262 k to 16 k tokens) followed by 8-bit quantization shrinks the student from 9.6 GB to 2.0 GB (−79 %) while staying at ~96 % of teacher accuracy.
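The vocabulary-pruning idea can be illustrated on a toy embedding table. This is a generic sketch, not the project's pruning code: `prune_vocab` and `remap_ids` are hypothetical helpers, and the 8-token vocabulary stands in for the real 262 k one.

```python
from typing import Dict, List, Sequence, Tuple


def prune_vocab(
    embeddings: Sequence[Sequence[float]], keep_ids: Sequence[int]
) -> Tuple[List[Sequence[float]], Dict[int, int]]:
    """Keep only the embedding rows for `keep_ids`; return rows and an old-to-new id map."""
    kept = sorted(set(keep_ids))
    remap = {old: new for new, old in enumerate(kept)}
    pruned = [embeddings[old] for old in kept]
    return pruned, remap


def remap_ids(token_ids: Sequence[int], remap: Dict[int, int]) -> List[int]:
    """Translate old token ids to pruned ids; raises KeyError for a pruned token."""
    return [remap[t] for t in token_ids]


# Toy 8-token vocabulary with 2-dimensional embeddings.
toy_embeddings = [[float(i), float(i) * 0.5] for i in range(8)]
pruned, remap = prune_vocab(toy_embeddings, keep_ids=[5, 0, 3])

print(len(toy_embeddings), "->", len(pruned))  # 8 -> 3
print(remap_ids([3, 5, 0], remap))             # [1, 2, 0]
```

Going from 262 k to 16 k tokens keeps roughly 6 % of the embedding rows. A real pruning pass would also have to update the tokenizer and any output projection that shares the vocabulary dimension, which this sketch leaves out.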

| Artifact | params | size | retry≤3 | % of teacher |
|---|---|---|---|---|
| fused + text-only + vocab-16k, bf16 | 2.05 B | 3.8 GB | 69/70 (0.986) | ~99 % |
| + 8-bit (shipped) | 2.05 B | 2.0 GB | 67/70 (0.957) | ~96 % |
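The headline figures follow directly from the sizes and pass counts quoted above. A quick arithmetic check, using only those numbers:

```python
# Sizes in GB, as quoted in the text and table.
original_gb, bf16_gb, int8_gb = 9.6, 3.8, 2.0

# Overall reduction from the original student to the shipped 8-bit artifact.
total_reduction = (original_gb - int8_gb) / original_gb

# Pass rates within three retries, out of 70 cases.
bf16_pass = 69 / 70
int8_pass = 67 / 70

print(f"{total_reduction:.1%}")                      # 79.2%
print(round(bf16_pass, 3), round(int8_pass, 3))      # 0.986 0.957
```

Quantizing to 8-bit costs two of the 70 cases relative to the bf16 artifact while cutting the size from 3.8 GB to 2.0 GB.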

Model size shrink

Next steps

  • Quickstart: install and run your first conversion.
  • Showcase: real, messy files converted end to end.
  • API reference: convert_file, ConversionResult, resolve_model.