Transmutation Lite

Simplified TypeScript document converter for common formats to Markdown

Version: 0.6.1
Status: ✅ Production Ready - 177/177 Tests Passing (100%)

Overview

Transmutation Lite is a lightweight, TypeScript-based document conversion library that converts common document formats (PDF, DOCX, XLSX, PPTX, HTML, TXT) to Markdown. Designed specifically for integration with the HiveLLM Classify project, it provides fast, reliable conversions without the complexity of OCR, audio, video, or archive processing.

⚠️ Important: This is a simplified version optimized for use cases that do not require high precision, such as document classification. For production applications requiring high-quality conversion, advanced features, and superior performance, we strongly recommend using the full Transmutation Rust library, which offers:

98x faster performance than Docling

80%+ conversion quality with precision mode

Advanced OCR, audio, and video processing

Enterprise-grade reliability and accuracy

Use Transmutation Lite for: Document classification, quick previews, prototyping
Use Transmutation (Rust) for: RAG systems, production pipelines, high-quality document processing

Features

Core Conversion

✅ PDF to Markdown - Using pdf-parse-new for reliable text extraction
✅ DOCX to Markdown - Using mammoth for Word documents
✅ XLSX to Markdown - Using xlsx for Excel spreadsheets (tables)
✅ PPTX to Markdown - Basic text extraction from PowerPoint
✅ HTML to Markdown - Using turndown for clean conversion
✅ TXT to Markdown - Plain text normalization
✅ CLI & Library - Use as command-line tool or Node.js library
✅ TypeScript - Full type safety and IntelliSense support

Performance & Reliability

✅ Result Caching - LRU cache with SHA-256 content hashing
✅ Input Validation - Comprehensive validation with security checks
✅ Error Handling - Clear, actionable error messages
✅ Logging System - Configurable logging (DEBUG, INFO, WARN, ERROR)
✅ Metrics Collection - Production monitoring and analytics
✅ Batch Processing - Parallel file conversion
✅ Fast & Lightweight - No heavy dependencies or external tools

Developer Experience

✅ 177 Tests Passing - 100% success rate, full coverage
✅ CI/CD Ready - GitHub Actions workflows
✅ 5 Comprehensive Examples - Complete usage demonstrations
✅ Security - Path traversal protection, buffer limits (500MB)
✅ Complete Documentation - API docs, guides, examples

Quick Start

Installation

npm installation (recommended):

npm install @hivellm/transmutation-lite

Local usage (within monorepo):

# From classify project
npm install
# Uses: "file:../transmutation-lite" dependency

Library Usage

import { Converter, Logger, LogLevel } from '@hivellm/transmutation-lite';

// Simple conversion
const converter = new Converter();
const result = await converter.convertFile('./document.pdf');
console.log(result.markdown);

// With caching and logging
const converter = new Converter({
  enableCache: true,
  cacheSize: 100,
  logger: new Logger({ level: LogLevel.INFO }),
  collectMetrics: true,
});

const result = await converter.convertFile('./document.docx', {
  preserveFormatting: true,
  maxPages: 10,
});

console.log('Format:', result.metadata.format);
console.log('Pages:', result.metadata.pageCount);
console.log('Time:', result.conversionTimeMs, 'ms');

// Get metrics
const metrics = converter.getMetricsSummary();
console.log('Success Rate:', metrics.successRate);
console.log('Cache Hit Rate:', metrics.cacheHitRate);

CLI Usage

# Convert single file
npx transmutation-lite convert document.pdf -o output.md

# Convert with options
npx transmutation-lite convert report.docx -o report.md --max-pages 5

# Batch conversion
npx transmutation-lite batch ./documents -o ./output --parallel 4

# Recursive batch conversion
npx transmutation-lite batch ./docs -o ./markdown --recursive

# List supported formats
npx transmutation-lite formats

API Reference

`convert(filePath, options?)`

Convenience function to convert a single file.

Parameters:

filePath (string): Path to the file to convert
options (ConversionOptions): Optional conversion options

Returns: Promise<ConversionResult>

const result = await convert('./document.pdf', {
  preserveFormatting: true,
  maxPages: 10,
});

`Converter` Class

Main converter class that manages all format converters.

Methods

`convertFile(filePath, options?)`

Convert a file to Markdown.

const converter = new Converter();
const result = await converter.convertFile('./document.docx');

`convertBuffer(buffer, format, options?)`

Convert a buffer directly.

const buffer = await readFile('./document.pdf');
const result = await converter.convertBuffer(buffer, DocumentFormat.PDF);

`detectFormat(filePath)`

Detect document format from file extension.

const format = converter.detectFormat('./document.pdf');
// Returns: DocumentFormat.PDF

`isSupported(filePath)`

Check if a file format is supported.

const supported = converter.isSupported('./document.pdf');
// Returns: true

`getSupportedFormats()`

Get list of supported formats.

const formats = converter.getSupportedFormats();
// Returns: [DocumentFormat.PDF, DocumentFormat.DOCX, ...]

Types

`ConversionOptions`

interface ConversionOptions {
  preserveFormatting?: boolean;  // Default: true
  extractImages?: boolean;        // Default: false (not implemented)
  maxPages?: number;              // Default: 0 (all pages)
  formatOptions?: Record<string, any>;
}

`ConversionResult`

interface ConversionResult {
  markdown: string;               // Converted markdown content
  metadata: DocumentMetadata;     // Document metadata
  conversionTimeMs: number;       // Conversion time in milliseconds
  warnings?: string[];            // Any warnings during conversion
}

`DocumentMetadata`

interface DocumentMetadata {
  format: DocumentFormat;         // Original format
  fileSize: number;               // File size in bytes
  pageCount?: number;             // Number of pages/sheets/slides
  title?: string;                 // Document title
  author?: string;                // Document author
  createdAt?: Date;               // Creation date
  extra?: Record<string, any>;    // Format-specific metadata
}

`DocumentFormat` Enum

enum DocumentFormat {
  PDF = 'pdf',
  DOCX = 'docx',
  XLSX = 'xlsx',
  PPTX = 'pptx',
  TXT = 'txt',
  HTML = 'html',
  UNKNOWN = 'unknown',
}

Supported Formats

Format	Extension	Library	Quality	Notes
PDF	`.pdf`	pdf-parse	Basic	Text extraction only, no images, limited formatting
DOCX	`.docx`	mammoth	Good	Formatting support, quality depends on mammoth
XLSX	`.xlsx`, `.xls`	xlsx	Good	Converts to Markdown tables
PPTX	`.pptx`, `.ppt`	jszip (basic)	⚠️ Limited	Basic text extraction only, experimental
HTML	`.html`, `.htm`	turndown	Good	Clean Markdown conversion
TXT	`.txt`, `.md`	native	Good	Direct text handling

Integration with Classify

Transmutation Lite is designed to integrate seamlessly with the HiveLLM Classify project:

import { convert } from '@hivellm/transmutation-lite';
import { ClassifyClient } from '@hivellm/classify';

// Convert document to Markdown
const conversionResult = await convert('./contract.pdf');

// Classify the markdown content
const classifier = new ClassifyClient({
  provider: 'deepseek',
  apiKey: process.env.DEEPSEEK_API_KEY,
});

const classificationResult = await classifier.classifyText(
  conversionResult.markdown
);

console.log('Domain:', classificationResult.classification.domain);
console.log('Type:', classificationResult.classification.doc_type);

CLI Commands

`convert <file>`

Convert a single file to Markdown.

Options:

-o, --output <path> - Output file path (default: <filename>.md)
-m, --max-pages <number> - Maximum pages/sheets to process
--no-preserve-formatting - Disable formatting preservation

Examples:

# Basic conversion
transmutation-lite convert document.pdf

# Specify output
transmutation-lite convert document.pdf -o output/document.md

# Limit pages
transmutation-lite convert large.pdf --max-pages 10

# Disable formatting
transmutation-lite convert document.docx --no-preserve-formatting

`batch <directory>`

Convert all supported files in a directory.

Options:

-o, --output <path> - Output directory (default: <directory>/output)
-r, --recursive - Process subdirectories recursively
-m, --max-pages <number> - Maximum pages/sheets to process
--parallel <number> - Number of parallel conversions (default: 4)
--no-preserve-formatting - Disable formatting preservation

Examples:

# Basic batch conversion
transmutation-lite batch ./documents

# Recursive with custom output
transmutation-lite batch ./docs -o ./markdown --recursive

# High parallelism
transmutation-lite batch ./files --parallel 8

# Limit pages for all files
transmutation-lite batch ./pdfs --max-pages 5

`formats`

List all supported file formats.

transmutation-lite formats

Examples

Convert PDF with Metadata

import { Converter } from '@hivellm/transmutation-lite';

const converter = new Converter();
const result = await converter.convertFile('./research-paper.pdf');

console.log('Title:', result.metadata.title);
console.log('Author:', result.metadata.author);
console.log('Pages:', result.metadata.pageCount);
console.log('Created:', result.metadata.createdAt);
console.log('Producer:', result.metadata.extra?.producer);

Batch Processing with Progress

import { Converter } from '@hivellm/transmutation-lite';
import { readdir } from 'fs/promises';
import { join } from 'path';

const converter = new Converter();
const files = await readdir('./documents');

let completed = 0;
for (const file of files) {
  const filePath = join('./documents', file);
  
  if (converter.isSupported(filePath)) {
    try {
      const result = await converter.convertFile(filePath);
      await writeFile(
        `./output/${file}.md`,
        result.markdown
      );
      completed++;
      console.log(`✅ [${completed}/${files.length}] ${file}`);
    } catch (error) {
      console.error(`❌ Failed: ${file}`, error);
    }
  }
}

Excel to Markdown Tables

import { convert } from '@hivellm/transmutation-lite';

const result = await convert('./data.xlsx');

// Output will be Markdown tables:
// # Sheet1
//
// | Column A | Column B | Column C |
// | --- | --- | --- |
// | Value 1 | Value 2 | Value 3 |
// | Value 4 | Value 5 | Value 6 |

console.log(result.markdown);

Error Handling

import { convert, ConversionError } from '@hivellm/transmutation-lite';

try {
  const result = await convert('./document.pdf');
  console.log(result.markdown);
} catch (error) {
  if (error instanceof ConversionError) {
    console.error('Conversion failed:', error.message);
    console.error('Format:', error.format);
    console.error('Cause:', error.cause);
  } else {
    console.error('Unexpected error:', error);
  }
}

Performance

Note: Performance estimates below are theoretical. Real-world performance may vary based on document complexity and system specifications.

Estimated conversion times:

Format	File Size	Pages	Estimated Time	Notes
PDF	2 MB	15	~200-500ms	Text extraction only, depends on pdf-parse
DOCX	500 KB	20	~150-300ms	Depends on document complexity
XLSX	1 MB	10 sheets	~100-200ms	Table conversion
PPTX	3 MB	30 slides	~300-600ms	Basic text extraction via jszip
HTML	200 KB	-	~50-100ms	Turndown conversion

Important: For production use requiring guaranteed performance and quality, use the full Transmutation Rust library.

Limitations

No OCR - PDF images are not extracted or processed
No Audio/Video - Audio and video files are not supported
No Archives - ZIP, TAR, and other archives are not supported
PPTX Quality - PowerPoint conversion is basic (text-only)
Images - Image extraction is not implemented in this lite version
Lower Precision - Text extraction quality is lower than the Rust version

When to Use Transmutation Lite vs Full Transmutation

Feature	Transmutation Lite (TypeScript)	Transmutation (Rust)
Best For	Document classification, quick previews	RAG systems, production pipelines
Precision	⚠️ Basic (untested, depends on libraries)	✅ Excellent (80%+ tested)
Performance	Moderate (Node.js)	✅ 98x faster than Docling (tested)
Memory	Moderate (Node.js overhead)	✅ Very Low (~20MB tested)
OCR Support	❌ No	✅ Yes (Tesseract)
Audio/Video	❌ No	✅ Yes (Whisper)
Archives	❌ No	✅ Yes (ZIP, TAR, etc.)
Setup	✅ npm install (local)	Requires Rust/binary
Integration	✅ Easy (Node.js)	Moderate (CLI/FFI)
Status	✅ Production Ready	✅ Production Ready

Recommendation: For production use cases requiring high-quality document conversion, we strongly recommend using the full Transmutation Rust library. Transmutation Lite is ideal for:

Quick document classification (like HiveLLM Classify)
Prototyping and development
Simple text extraction where precision is not critical
Node.js-only environments without Rust toolchain

Development

Build

npm run build

Test

npm test

Lint

npm run lint
npm run lint:fix

Type Check

npm run type-check

Project Structure

transmutation-lite/
├── src/
│   ├── converters/
│   │   ├── base.ts       # Base converter interface
│   │   ├── pdf.ts        # PDF converter
│   │   ├── docx.ts       # DOCX converter
│   │   ├── xlsx.ts       # XLSX converter
│   │   ├── pptx.ts       # PPTX converter
│   │   ├── html.ts       # HTML converter
│   │   ├── txt.ts        # TXT converter
│   │   └── index.ts      # Converter exports
│   ├── types.ts          # Type definitions
│   ├── index.ts          # Main library export
│   └── cli.ts            # CLI implementation
├── tests/                # Test files
├── dist/                 # Compiled output
├── package.json
├── tsconfig.json
└── README.md

Related Projects

Classify - Intelligent document classification using LLMs (uses Transmutation Lite)
Transmutation ⭐ - Recommended: Full-featured Rust document converter with 98x performance and 80%+ quality
Vectorizer - Vector database and search engine (uses full Transmutation)
Nexus - Graph database with vector search

License

MIT

Contributing

This project follows the HiveLLM ecosystem standards:

TypeScript 5.x with strict mode
Tests with vitest (currently basic coverage)
Clear documentation
Semantic versioning
Conventional commits

Current Status: Production ready - 177/177 tests passing, fully documented, ready for npm publication.

Changelog

See CHANGELOG.md for version history and release notes.

Contact: HiveLLM Development Team

Name		Name	Last commit message	Last commit date
Latest commit History 49 Commits
.cursor/commands		.cursor/commands
.github/workflows		.github/workflows
benchmarks		benchmarks
docs		docs
examples		examples
openspec		openspec
scripts		scripts
src		src
tests		tests
.codespellignore		.codespellignore
.cursorrules		.cursorrules
.gitignore		.gitignore
.npmignore		.npmignore
.prettierrc		.prettierrc
.rulebook		.rulebook
AGENTS.md		AGENTS.md
CHANGELOG.md		CHANGELOG.md
LICENSE		LICENSE
README.md		README.md
debug-pdf.cjs		debug-pdf.cjs
eslint.config.js		eslint.config.js
package-lock.json		package-lock.json
package.json		package.json
tsconfig.json		tsconfig.json
tsup.config.ts		tsup.config.ts
vitest.config.ts		vitest.config.ts

License

hivellm/transmutation-lite

Folders and files

Latest commit

History

Repository files navigation

Transmutation Lite

Overview

Features

Core Conversion

Performance & Reliability

Developer Experience

Quick Start

Installation

Library Usage

CLI Usage

API Reference

convert(filePath, options?)

Converter Class

Methods

convertFile(filePath, options?)

convertBuffer(buffer, format, options?)

detectFormat(filePath)

isSupported(filePath)

getSupportedFormats()

Types

ConversionOptions

ConversionResult

DocumentMetadata

DocumentFormat Enum

Supported Formats

Integration with Classify

CLI Commands

convert <file>

batch <directory>

formats

Examples

Convert PDF with Metadata

Batch Processing with Progress

Excel to Markdown Tables

Error Handling

Performance

Limitations

When to Use Transmutation Lite vs Full Transmutation

Development

Build

Test

Lint

Type Check

Project Structure

Related Projects

License

Contributing

Changelog

About

Resources

License

Code of conduct

Contributing

Security policy

Uh oh!

Stars

Watchers

Forks

Releases 1

Packages 0

Languages

`convert(filePath, options?)`

`Converter` Class

`convertFile(filePath, options?)`

`convertBuffer(buffer, format, options?)`

`detectFormat(filePath)`

`isSupported(filePath)`

`getSupportedFormats()`

`ConversionOptions`

`ConversionResult`

`DocumentMetadata`

`DocumentFormat` Enum

`convert <file>`

`batch <directory>`

`formats`

Packages