Transform Literature into Graphic Novels

An open-source AI pipeline that generates 100+ page graphic novels with consistent characters, dynamic layouts, and proper speech bubbles. Batch-first architecture designed for long-form text adaptation.

Python · Gemini 2M Context · OpenAI gpt-image-1 · 7-Pass Pipeline · PDF / EPUB Export · MIT License

How It Works

Three steps from plain text to a complete graphic novel.

1. Upload

Upload any public domain book as a .txt file. Grab one from Project Gutenberg.

2. Customize

Choose from 14 art styles (watercolor, manga, noir, botanical...), 14 narrative tones, page count, and era constraints.

3. Generate

The pipeline writes scripts, generates character references, illustrates panels, composes pages, and exports to PDF/EPUB.

Architecture

A 3-agent pipeline with a 7-pass scripting enrichment system.

1. Scripting Agent

7-pass enrichment pipeline

Pass 0: Global Context
Era constraints, technology level, visual constants

Pass 1: Beat Analysis
Intensity scores, visual potential, page-turn hooks

Pass 1.5: Adaptation Filter
Essential / condensable / cuttable scenes

Pass 2: Director Pass
Page blueprint, spread awareness, cliffhangers

Pass 3: Character Deep Dive
Arc typing, voice profiles, scene-specific gear

Pass 4: Asset Manifest
Color signatures, location palettes, color script

Pass 5: Scriptwriter
Parallel per-page scripts with shot types, panel sizes

Pass 6: Validation + Auto-Fix
Anachronism detection, dialogue length enforcement
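
The pass sequence above can be sketched as an ordered fold over a shared context dict. The pass names mirror the list, but the function signatures are illustrative assumptions, not the actual API in `agents/scripting/passes/`:

```python
# Illustrative sketch of the scripting enrichment flow. Each pass reads the
# accumulated context and contributes its own layer; names and signatures
# are assumptions, not the project's real module API.
PASSES = [
    "global_context",       # Pass 0: era constraints, visual constants
    "beat_analysis",        # Pass 1: intensity scores, page-turn hooks
    "adaptation_filter",    # Pass 1.5: essential / condensable / cuttable
    "director",             # Pass 2: page blueprint, cliffhangers
    "character_deep_dive",  # Pass 3: arc typing, voice profiles
    "asset_manifest",       # Pass 4: color signatures, palettes
    "scriptwriter",         # Pass 5: parallel per-page scripts
    "validation",           # Pass 6: anachronism detection, auto-fix
]

def run_scripting_pipeline(book_text, pass_fns):
    """Run each pass in order, accumulating results into one context dict."""
    context = {"book": book_text}
    for name in PASSES:
        # Each pass sees everything produced by the passes before it,
        # which is what makes the handoffs described below possible.
        context[name] = pass_fns[name](context)
    return context
```

Because every pass receives the full accumulated context, downstream passes (e.g. the Scriptwriter) can read upstream output (e.g. voice profiles) without extra plumbing.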

2. Illustrator Agent

Reference sheets + panel generation

Character Reference Sheets

Generates visual reference images for every character before panel generation. Multiple candidates scored by an LLM judge. Ensures consistent appearance across all panels.
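
The best-of-N selection described above reduces to a generate-then-judge loop. In this sketch, `generate_fn` and `judge_fn` stand in for the real image-model and LLM-judge calls:

```python
# Sketch of best-of-N reference selection: generate several candidates and
# keep the one the judge scores highest. generate_fn and judge_fn are
# placeholders for the actual image-model and LLM-judge calls.
def pick_best_reference(character, generate_fn, judge_fn, n_candidates=3):
    candidates = [generate_fn(character) for _ in range(n_candidates)]
    # Score each candidate; higher means a better match to the character.
    scored = [(judge_fn(character, img), img) for img in candidates]
    best_score, best_img = max(scored, key=lambda pair: pair[0])
    return best_img, best_score
```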

Panel Generation

Each panel is generated with character references passed as PIL image objects for visual consistency. 3-tier model fallback ensures generation completes even during outages.

3-Tier Model Fallback

Primary: gemini-3-pro-image (4096px)
Fallback: gemini-2.5-flash-image
Last resort: gemini-2.5-flash-image (1024px)
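
The fallback chain is a simple try-in-order loop. Model names follow the tiers above; `call_model` stands in for the real image client, and the mid-tier resolution is left unspecified because the source does not state it:

```python
# Sketch of the 3-tier fallback: try each model in priority order and move
# to the next tier when a call raises (outage, quota exhaustion, etc.).
TIERS = [
    ("gemini-3-pro-image", 4096),
    ("gemini-2.5-flash-image", None),   # default resolution
    ("gemini-2.5-flash-image", 1024),   # last resort, reduced resolution
]

def generate_with_fallback(prompt, call_model):
    last_error = None
    for model, resolution in TIERS:
        try:
            return call_model(model, prompt, resolution)
        except Exception as exc:  # fall through to the next tier
            last_error = exc
    raise RuntimeError("all image tiers failed") from last_error
```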

3. Compositor Agent

Layout, composition, and export

Story-Aware Layouts

Panel grids dynamically sized based on shot type and scene importance. Supports establishing shots (full width), medium shots, close-ups, and multi-panel sequences.
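
One way to implement shot-driven sizing is to map each shot type to a column span and greedily pack panels into rows. The 12-column grid and the specific spans here are assumptions for illustration, not the compositor's actual values:

```python
# Sketch of story-aware panel layout: shot types map to column spans on an
# assumed 12-column page grid, and panels pack greedily into rows.
SHOT_SPANS = {
    "establishing": 12,  # full width
    "medium": 6,
    "close_up": 4,
}

def layout_rows(shots, grid_cols=12):
    """Pack panels into rows of at most grid_cols columns, in reading order."""
    rows, current, used = [], [], 0
    for shot in shots:
        cols = SHOT_SPANS.get(shot, 6)  # unknown shot types default to medium
        if used + cols > grid_cols:     # row full: start a new one
            rows.append(current)
            current, used = [], 0
        current.append((shot, cols))
        used += cols
    if current:
        rows.append(current)
    return rows
```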

Text Overlays

Speech bubbles and captions rendered via PIL (not baked into AI images). Bubble placement uses negative space analysis to avoid covering key visual elements. Multi-speaker dialogue with color-coded bubbles.
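
A simplified version of the negative-space search: score each candidate bubble position by summing an "importance map" under it and place the bubble where the total is lowest. The real compositor works on rendered panel pixels via PIL; here the map is a plain 2D list for illustration:

```python
# Simplified negative-space search for bubble placement. importance is a
# 2D grid where higher values mark key visual elements; (bw, bh) is the
# bubble footprint in grid cells. Returns the top-left (x, y) with the
# lowest total importance, i.e. the emptiest region.
def best_bubble_spot(importance, bw, bh):
    rows, cols = len(importance), len(importance[0])
    best, best_cost = None, float("inf")
    for y in range(rows - bh + 1):
        for x in range(cols - bw + 1):
            cost = sum(importance[y + dy][x + dx]
                       for dy in range(bh) for dx in range(bw))
            if cost < best_cost:
                best, best_cost = (x, y), cost
    return best
```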

Export

PDF output via ReportLab with proper page dimensions. EPUB output with embedded images for e-reader compatibility. Resume capability via manifest tracking.

Key Handoffs Between Passes

Beat Analysis → Director: Visual potential scores inform page allocation and splash decisions

Adaptation Filter → Director: Essential/condensable guidance shapes pacing

Character Arcs → Scriptwriter: Voice profiles ensure dialogue consistency

Blueprint → Scriptwriter: Adjacent page context (prev/next) for continuity

Art Styles & Tones

14 art styles with cinematic prompt fragments, plus 14 narrative tones.

Art Styles

Lush Watercolor (Validated)

Soft color bleeds, visible paper texture, ethereal lighting, dreamlike quality

Classic Comic Book (Validated)

Bold outlines, flat colors, Ben-Day dots, dynamic poses, classic 4-color printing aesthetic

Ligne Claire (Validated)

Uniform line weight, flat colors, clean precise linework, Hergé/Moebius influence

Manga / Anime (Validated)

Manga panel conventions, clean black linework, dynamic angles, expressive eyes, speed lines

Gritty Noir (Validated)

Heavy shadows, high contrast, limited palette, chiaroscuro lighting, Frank Miller influence

Botanical Illustration (Validated)

Precise detail, scientific accuracy, natural color palette, fine line work, naturalist style

Ukiyo-e Woodblock (Experimental)

Flat areas of color, bold outlines, woodblock print texture, Japanese composition

Cyberpunk Neon (Experimental)

Neon lighting, dark backgrounds, holographic effects, rain-slicked surfaces

Art Nouveau (Experimental)

Flowing organic lines, decorative borders, natural motifs, Mucha-inspired

Indian Miniature (Experimental)

Flat perspective, rich colors, gold detailing, intricate patterns

Vintage Pulp (Experimental)

Dramatic lighting, saturated colors, action-oriented composition, retro aesthetic

Sketch / Pencil (Experimental)

Graphite texture, cross-hatching, visible pencil strokes, tonal gradation

Minimalist Line Art (Experimental)

Single weight lines, minimal detail, negative space, clean composition

Chiaroscuro (Experimental)

Dramatic light-dark contrast, Caravaggio-inspired, volumetric lighting

Narrative Tones

Heroic, Suspenseful, Melancholic, Whimsical, Dark Fantasy, Educational, Philosophical, Satirical, Romantic, Contemplative, Tragic, Noir Detective, Cosmic Horror, Gothic

Paths Explored: Gemini vs OpenAI

Two backends, different tradeoffs. Here's an honest breakdown with real numbers.

Google Gemini (Primary)

Vertex AI production backend

Context Caching

The full book (up to 2M tokens) is cached once. Subsequent passes get a 90% discount on input tokens. For a 50K-word novel, this saves ~$0.13 per run.

3-Tier Image Fallback

If the primary model (gemini-3-pro-image) is down or quota-exhausted, the pipeline automatically falls back to flash models. No manual intervention.

Native Multimodal

Character reference images are passed directly as PIL objects in the same API call as the prompt. No file upload step.

Cost (50K-word novel, 50 pages)

Scripting (7 passes): $0.34
Cache savings: -$0.13
Images (200 panels): $26.80
Character refs (5 chars): $2.01
Total: ~$29
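
The total above is simple arithmetic, which a pricing engine like cost_calculator.py presumably performs. In this sketch, the per-unit prices are back-derived from the totals shown, not taken from an official price list:

```python
# Per-unit prices derived from the cost table above (an approximation,
# not official Gemini pricing).
PANEL_COST = 26.80 / 200  # ~$0.134 per panel image
REF_COST = 2.01 / 5       # ~$0.402 per character reference sheet

def estimate_run_cost(panels, characters,
                      scripting=0.34, cache_savings=0.13):
    """Estimate a full run's cost in USD for the Gemini backend."""
    images = panels * PANEL_COST
    refs = characters * REF_COST
    return round(scripting - cache_savings + images + refs, 2)
```

For the reference run (200 panels, 5 characters) this reproduces the ~$29 total; larger books scale roughly linearly with panel count, since images dominate the cost.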

OpenAI (Alternative)

gpt-image-1 backend

Batch API

OpenAI's Batch API offers a 50% cost discount with a 24-hour SLA. Ideal for non-interactive production runs where latency doesn't matter.

File-Based References

Character references are uploaded via OpenAI's Files API and referenced by ID. Different from Gemini's inline approach, but achieves the same consistency goal.

Drop-In Replacement

A factory pattern (get_image_agents()) swaps the entire image pipeline by changing one config value. Scripting still runs on Gemini, since OpenAI offers no equivalent of its 2M-token context caching.
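
The factory might look like the following sketch; only get_image_agents() is named in the source, so the class names and registry shape are placeholders:

```python
# Sketch of the backend factory: one config value selects the whole image
# pipeline. Class names are placeholders, not the project's actual classes.
class GeminiReferenceAgent: backend = "gemini"
class GeminiPanelAgent: backend = "gemini"
class OpenAIReferenceAgent: backend = "openai"
class OpenAIPanelAgent: backend = "openai"

def get_image_agents(backend="gemini"):
    """Return (reference_agent, panel_agent) for the configured backend."""
    registry = {
        "gemini": (GeminiReferenceAgent, GeminiPanelAgent),
        "openai": (OpenAIReferenceAgent, OpenAIPanelAgent),
    }
    ref_cls, panel_cls = registry[backend]
    return ref_cls(), panel_cls()
```

Because callers only ever see the returned agent pair, switching backends is a config change with no call-site edits, which is what keeps the switching cost at zero.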

Tradeoffs

+ 50% cheaper via Batch API
+ Different model = different failure modes (redundancy)
- 24h latency for batch mode
- No context caching (scripting still on Gemini)
- File upload overhead for reference images

Why Dual Backends?

Image generation APIs have outages, quota limits, and regional availability issues. During development, Gemini 3 Pro Image was often unavailable outside us-central1. Having OpenAI as a fallback meant production runs could continue regardless. The factory pattern keeps the switching cost at zero — one config change, no code changes.

Repository Structure

17K+ lines of Python across a clean, modular architecture.

illustrative/
agents/                  # Core pipeline agents
  scripting/             # 7-pass enrichment pipeline
    passes/              # Individual pass implementations
  openai/                # OpenAI alternative backend
  panel_agent.py         # Panel image generation
  reference_agent.py     # Character reference sheets
  compositor_agent.py    # Page layout + text overlay
  layout_agent.py        # Dynamic panel grid generation
  export_agent.py        # PDF/EPUB export
validators/              # Quality assurance
  pre_validators.py      # Prompt validation (saves tokens)
  post_validators.py     # Output quality checking
  consistency.py         # Character consistency audit
  composition.py         # Bubble placement, cropping
ui/                      # Streamlit interface
  pages/                 # Home, auth, dashboard, generate
models/                  # SQLAlchemy ORM models
migrations/              # Alembic database migrations
storage/                 # S3/bucket storage utilities
tests/                   # 12 test files, 4K+ lines
app.py                   # Streamlit entry point
production_run.py        # CLI for full production runs
config.py                # Centralized configuration
cost_calculator.py       # Pay-per-use pricing engine
utils.py                 # Retry logic, rate limiting, manifest

Getting Started

Clone, configure, and generate your first graphic novel.

Installation
git clone https://github.com/arvindang/illustrative.git
cd illustrative
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
API Configuration (choose one)

Option A: AI Studio (simple, free tier)

# .env
GEMINI_API_KEY=your_key_here

Get a key at aistudio.google.com

Option B: Vertex AI (production, no daily limits)

# .env
GOOGLE_CLOUD_PROJECT=your-project
GOOGLE_CLOUD_LOCATION=us-central1
GOOGLE_GENAI_USE_VERTEXAI=true

Then run: gcloud auth application-default login

Run

Test Mode (Streamlit UI)

streamlit run app.py

Production Mode (full book)

python production_run.py