An open-source AI pipeline that generates 100+ page graphic novels with consistent characters, dynamic layouts, and proper speech bubbles. Batch-first architecture designed for long-form text adaptation.
20,000 Leagues Under the Sea by Jules Verne — Botanical Illustration style — 10 pages
Maintaining consistent diving gear and period-accurate equipment across underwater scenes remains one of the harder consistency challenges.
Three steps from plain text to a complete graphic novel.
Choose from 14 art styles (watercolor, manga, noir, botanical...), 14 narrative tones, page count, and era constraints.
The pipeline writes scripts, generates character references, illustrates panels, composes pages, and exports to PDF/EPUB.
A 3-agent pipeline with a 7-pass scripting enrichment system.
7-pass enrichment pipeline
Era constraints, technology level, visual constants
Intensity scores, visual potential, page-turn hooks
Essential / condensable / cuttable scenes
Page blueprint, spread awareness, cliffhangers
Arc typing, voice profiles, scene-specific gear
Color signatures, location palettes, color script
Parallel per-page scripts with shot types, panel sizes
Anachronism detection, dialogue length enforcement
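The passes above can be sketched as a sequential chain over a shared script-state dict, where each pass reads what earlier passes produced. All names and the state shape here are illustrative assumptions, not the repo's actual API:

```python
from typing import Callable

ScriptState = dict  # hypothetical shared state threaded through all passes
EnrichmentPass = Callable[[ScriptState], ScriptState]

def worldbuilding(state: ScriptState) -> ScriptState:
    # Early pass: pin down era constraints before any scripting happens
    state["era_constraints"] = ["no electric light", "hand-cranked equipment"]
    return state

def beat_analysis(state: ScriptState) -> ScriptState:
    # Later pass: score scenes so page allocation can use intensity data
    state["beats"] = [{"scene": 1, "intensity": 0.8, "page_turn_hook": True}]
    return state

def run_enrichment(state: ScriptState, passes: list[EnrichmentPass]) -> ScriptState:
    for enrichment_pass in passes:  # each pass sees all earlier output
        state = enrichment_pass(state)
    return state

state = run_enrichment({"source_text": "..."}, [worldbuilding, beat_analysis])
```

The key property this sketch captures: later passes (pacing, per-page scripts, validation) can rely on everything earlier passes added to the shared state.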
Reference sheets + panel generation
Generates visual reference images for every character before panel generation. Multiple candidates scored by an LLM judge. Ensures consistent appearance across all panels.
Each panel is generated with character references passed as PIL image objects for visual consistency. 3-tier model fallback ensures generation completes even during outages.
gemini-3-pro-image (4096px)
gemini-2.5-flash-image
gemini-2.5-flash-image (1024px)
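Because references travel inline, the request payload is just a mixed list of text and images. A minimal sketch of assembling it (the google-genai SDK accepts PIL `Image` objects directly in `contents`; the function name and prompt wording are assumptions):

```python
def build_panel_contents(panel_prompt: str, character_refs: list) -> list:
    """Interleave character reference images with the text prompt.

    `character_refs` would be PIL.Image objects in the real pipeline;
    no separate upload step is needed because they travel with the request.
    """
    contents = ["Match these character reference sheets exactly:"]
    contents.extend(character_refs)
    contents.append(panel_prompt)
    return contents

# Hypothetical call shape (not runnable without credentials):
# client.models.generate_content(
#     model="gemini-3-pro-image",
#     contents=build_panel_contents(prompt, [nemo_ref, aronnax_ref]),
# )
```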
Layout, composition, and export
Panel grids dynamically sized based on shot type and scene importance. Supports establishing shots (full width), medium shots, close-ups, and multi-panel sequences.
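One way to realize "dynamically sized" grids is to give each shot type a width budget (in twelfths of the page) and greedily pack panels into rows. The widths and function below are an illustrative sketch, not the repo's layout logic:

```python
# Hypothetical width budget per shot type, in twelfths of the page.
SHOT_WIDTHS = {
    "establishing": 12,  # full width
    "medium": 6,         # half width
    "close_up": 4,       # third width
}

def pack_rows(shots: list[str], row_width: int = 12) -> list[list[str]]:
    """Greedily pack panels into rows; a full-width shot gets its own row."""
    rows, current, used = [], [], 0
    for shot in shots:
        width = SHOT_WIDTHS[shot]
        if used + width > row_width and current:
            rows.append(current)  # row is full, start a new one
            current, used = [], 0
        current.append(shot)
        used += width
    if current:
        rows.append(current)
    return rows
```

An establishing shot fills its row; two mediums or three close-ups share one.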
Speech bubbles and captions rendered via PIL (not baked into AI images). Bubble placement uses negative space analysis to avoid covering key visual elements. Multi-speaker dialogue with color-coded bubbles.
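A toy version of the negative-space analysis: score candidate corner regions of a luminance grid and place the bubble in the brightest (emptiest) one. The real pipeline works on rendered panels; this sketch uses a plain nested list as a stand-in:

```python
def region_mean(grid, top, left, height, width):
    """Mean luminance of a rectangular region of the grid."""
    values = [grid[r][c]
              for r in range(top, top + height)
              for c in range(left, left + width)]
    return sum(values) / len(values)

def place_bubble(grid, bubble_h, bubble_w):
    """Pick the corner whose pixels are brightest, i.e. the emptiest
    negative space, so the bubble avoids covering key visual elements."""
    H, W = len(grid), len(grid[0])
    candidates = {
        "top_left": (0, 0),
        "top_right": (0, W - bubble_w),
        "bottom_left": (H - bubble_h, 0),
        "bottom_right": (H - bubble_h, W - bubble_w),
    }
    return max(candidates,
               key=lambda k: region_mean(grid, *candidates[k], bubble_h, bubble_w))
```

In practice you would downsample the panel to a small luminance map first, and weight candidates by reading order for multi-speaker scenes.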
PDF output via ReportLab with proper page dimensions. EPUB output with embedded images for e-reader compatibility. Resume capability via manifest tracking.
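Resume reduces to diffing the requested page range against a manifest of completed pages. The manifest shape below is an assumption, not the repo's actual format:

```python
import json
from pathlib import Path

def pages_to_generate(manifest: dict, total_pages: int) -> list[int]:
    """Return page numbers still missing from a run manifest.

    Assumed manifest shape: {"completed_pages": [1, 2, ...]}.
    """
    done = set(manifest.get("completed_pages", []))
    return [p for p in range(1, total_pages + 1) if p not in done]

def load_manifest(path: Path) -> dict:
    # An interrupted run leaves a manifest behind; a fresh run has none.
    return json.loads(path.read_text()) if path.exists() else {}
```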
Beat Analysis → Director: Visual potential scores inform page allocation and splash decisions
Adaptation Filter → Director: Essential/condensable guidance shapes pacing
Character Arcs → Scriptwriter: Voice profiles ensure dialogue consistency
Blueprint → Scriptwriter: Adjacent page context (prev/next) for continuity
14 art styles with cinematic prompt fragments, plus 14 narrative tones.
Soft color bleeds, visible paper texture, ethereal lighting, dreamlike quality
Bold outlines, flat colors, Ben-Day dots, dynamic poses, classic 4-color printing aesthetic
Uniform line weight, flat colors, clean precise linework, Hergé/Moebius influence
Manga panel conventions, clean black linework, dynamic angles, expressive eyes, speed lines
Heavy shadows, high contrast, limited palette, chiaroscuro lighting, Frank Miller influence
Precise detail, scientific accuracy, natural color palette, fine line work, naturalist style
Flat areas of color, bold outlines, woodblock print texture, Japanese composition
Neon lighting, dark backgrounds, holographic effects, rain-slicked surfaces
Flowing organic lines, decorative borders, natural motifs, Mucha-inspired
Flat perspective, rich colors, gold detailing, intricate patterns
Dramatic lighting, saturated colors, action-oriented composition, retro aesthetic
Graphite texture, cross-hatching, visible pencil strokes, tonal gradation
Single weight lines, minimal detail, negative space, clean composition
Dramatic light-dark contrast, Caravaggio-inspired, volumetric lighting
Two backends, different tradeoffs. Here's an honest breakdown with real numbers.
Vertex AI production backend
The full book (up to 2M tokens) is cached once. Subsequent passes get a 90% discount on input tokens. For a 50K-word novel, this saves ~$0.13 per run.
If the primary model (gemini-3-pro) is down or quota-exhausted, the pipeline automatically falls back to flash models. No manual intervention.
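The fallback behaviour reduces to walking an ordered chain of (model, resolution) tiers until one call succeeds. A sketch using the tiers listed earlier (the `generate` callable stands in for the real API call and is assumed to raise on outage or quota errors; the middle tier's resolution is an assumption):

```python
FALLBACK_CHAIN = [
    ("gemini-3-pro-image", 4096),      # primary, full resolution
    ("gemini-2.5-flash-image", None),  # first fallback, backend default size
    ("gemini-2.5-flash-image", 1024),  # last resort, reduced resolution
]

def generate_with_fallback(generate, prompt: str):
    """Try each tier in order; return (model, image) from the first success."""
    last_error = None
    for model, size in FALLBACK_CHAIN:
        try:
            return model, generate(model, size, prompt)
        except Exception as exc:  # outage, quota, regional availability...
            last_error = exc
    raise RuntimeError("every model in the fallback chain failed") from last_error
```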
Character reference images are passed directly as PIL objects in the same API call as the prompt. No file upload step.
| Stage | Cost |
|---|---|
| Scripting (7 passes) | $0.34 |
| Cache savings | -$0.13 |
| Images (200 panels) | $26.80 |
| Character refs (5 chars) | $2.01 |
| **Total** | **~$29** |
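The total follows directly from the line items; the per-unit rates below are derived by simple division and are approximate:

```python
scripting = 0.34
cache_savings = -0.13
panels = 26.80           # 200 panels
character_refs = 2.01    # 5 characters, multiple scored candidates each

total = scripting + cache_savings + panels + character_refs
per_panel = panels / 200  # implied cost per generated panel

assert round(total, 2) == 29.02       # rounds to the "~$29" in the table
assert round(per_panel, 3) == 0.134
```

Image generation dominates: at roughly 13 cents per panel, panel count (not scripting) is the lever that moves total cost.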
gpt-image-1 backend
OpenAI's Batch API offers a 50% cost discount with a 24-hour SLA. Ideal for non-interactive production runs where latency doesn't matter.
Character references are uploaded via OpenAI's Files API and referenced by ID. Different from Gemini's inline approach, but achieves the same consistency goal.
A factory pattern (get_image_agents()) swaps the entire image pipeline by changing one config value.
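In sketch form, the factory is just a registry keyed by one config value. Class names and the config key here are illustrative, not the repo's actual code:

```python
class VertexImageAgents:
    """Gemini/Vertex panel + reference agents (stub)."""
    backend = "vertex"

class OpenAIImageAgents:
    """gpt-image-1 panel + reference agents via the Batch API (stub)."""
    backend = "openai"

_BACKENDS = {"vertex": VertexImageAgents, "openai": OpenAIImageAgents}

def get_image_agents(config: dict):
    """Swap the whole image pipeline by changing one config value."""
    backend = config.get("image_backend", "vertex")
    if backend not in _BACKENDS:
        raise ValueError(f"unknown image backend: {backend!r}")
    return _BACKENDS[backend]()
```

Because callers only ever see the returned agents, nothing downstream knows or cares which backend produced the images.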
Scripting still uses Gemini, since OpenAI has no equivalent of the 2M-token context cache.
Image generation APIs have outages, quota limits, and regional availability issues.
During development, Gemini 3 Pro Image was often unavailable outside us-central1.
Having OpenAI as a fallback meant production runs could continue regardless.
The factory pattern keeps the switching cost at zero — one config change, no code changes.
17K+ lines of Python across a clean, modular architecture.
```
illustrative/
    agents/                  # Core pipeline agents
        scripting/           # 7-pass enrichment pipeline
            passes/          # Individual pass implementations
        openai/              # OpenAI alternative backend
        panel_agent.py       # Panel image generation
        reference_agent.py   # Character reference sheets
        compositor_agent.py  # Page layout + text overlay
        layout_agent.py      # Dynamic panel grid generation
        export_agent.py      # PDF/EPUB export
    validators/              # Quality assurance
        pre_validators.py    # Prompt validation (saves tokens)
        post_validators.py   # Output quality checking
        consistency.py       # Character consistency audit
        composition.py       # Bubble placement, cropping
    ui/                      # Streamlit interface
        pages/               # Home, auth, dashboard, generate
    models/                  # SQLAlchemy ORM models
    migrations/              # Alembic database migrations
    storage/                 # S3/bucket storage utilities
    tests/                   # 12 test files, 4K+ lines
    app.py                   # Streamlit entry point
    production_run.py        # CLI for full production runs
    config.py                # Centralized configuration
    cost_calculator.py       # Pay-per-use pricing engine
    utils.py                 # Retry logic, rate limiting, manifest
```
Clone, configure, and generate your first graphic novel.
```bash
git clone https://github.com/arvindang/illustrative.git
cd illustrative
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```
```bash
# .env
GEMINI_API_KEY=your_key_here
```
Get a key at aistudio.google.com
```bash
# .env
GOOGLE_CLOUD_PROJECT=your-project
GOOGLE_CLOUD_LOCATION=us-central1
GOOGLE_GENAI_USE_VERTEXAI=true
```

Then run: `gcloud auth application-default login`
```bash
streamlit run app.py        # Streamlit UI
python production_run.py    # full production run from the CLI
```