LitScribe - Autonomous Academic Synthesis Engine
Multi-agent literature review system using the Model Context Protocol for deep cross-paper synthesis and gap analysis
Overview
LitScribe is an autonomous academic synthesis engine designed to transform how researchers conduct literature reviews. Using the Model Context Protocol (MCP) and a multi-agent architecture, it goes beyond simple summarization to provide deep cross-paper synthesis and gap analysis. The system acts as a “Digital Scribe” that faithfully organizes knowledge while minimizing the hallucinations common in standard LLM outputs.
Problem Statement
Traditional literature reviews are time-consuming and mentally exhausting. Researchers must manually search multiple databases, download papers, extract key findings, identify conflicts between studies, and synthesize insights across dozens or hundreds of papers. Existing AI tools often produce hallucinated citations or superficial summaries that lack the depth needed for serious academic work.
Architecture
Multi-Agent Design
Agent Roles (Planned):
- Discovery Agent: Multi-source literature search and intelligent deduplication
- Critical Reading Agent: Deep analysis of individual papers with citation extraction
- Synthesis Agent: Cross-paper analysis, conflict resolution, and gap identification
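To make the pipeline concrete, here is a minimal sketch of how these three roles could be wired together with LangGraph (the agent framework listed under Technical Stack). The state fields, node names, and placeholder bodies are illustrative assumptions, not the actual implementation:

```python
from typing import TypedDict
from langgraph.graph import StateGraph, START, END


class ReviewState(TypedDict):
    """Shared state passed between agents (illustrative fields)."""
    query: str          # research question driving the review
    papers: list[dict]  # deduplicated search results from the Discovery Agent
    notes: list[dict]   # per-paper analyses from the Critical Reading Agent
    synthesis: str      # cross-paper synthesis and gap analysis


def discovery_agent(state: ReviewState) -> dict:
    # Placeholder: would call the MCP search tools (arXiv, PubMed, Google Scholar)
    # and deduplicate the combined results.
    return {"papers": []}


def critical_reading_agent(state: ReviewState) -> dict:
    # Placeholder: would analyze each paper's PDF and extract claims with citations.
    return {"notes": []}


def synthesis_agent(state: ReviewState) -> dict:
    # Placeholder: would compare notes across papers, flag conflicts, and identify gaps.
    return {"synthesis": ""}


graph = StateGraph(ReviewState)
graph.add_node("discover", discovery_agent)
graph.add_node("read", critical_reading_agent)
graph.add_node("synthesize", synthesis_agent)
graph.add_edge(START, "discover")
graph.add_edge("discover", "read")
graph.add_edge("read", "synthesize")
graph.add_edge("synthesize", END)

pipeline = graph.compile()
result = pipeline.invoke({"query": "retrieval-augmented generation"})
```

The planned debate mechanism would presumably replace the linear `read → synthesize` edge with conditional edges between agents, which is beyond this sketch.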
Model Context Protocol Integration
MCP Servers:
- Academic database connectors (arXiv, PubMed, Google Scholar)
- Zotero library integration for reference management
- PDF processing pipeline with LaTeX support
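As an illustration of how one of these connectors could look with FastMCP 2.0, the sketch below exposes a single arXiv search tool. The server name, tool signature, and use of the third-party `arxiv` client are assumptions made for the example, not the project's actual connector:

```python
# Hypothetical arXiv connector exposed as an MCP server via FastMCP 2.0.
import arxiv                      # third-party arXiv API client (assumed dependency)
from fastmcp import FastMCP

mcp = FastMCP("litscribe-arxiv")  # server name is illustrative


@mcp.tool
def search_arxiv(query: str, max_results: int = 10) -> list[dict]:
    """Search arXiv and return lightweight paper metadata."""
    client = arxiv.Client()
    results = client.results(arxiv.Search(query=query, max_results=max_results))
    return [
        {
            "title": r.title,
            "authors": [a.name for a in r.authors],
            "published": r.published.isoformat(),
            "doi": r.doi,
            "pdf_url": r.pdf_url,
            "summary": r.summary,
        }
        for r in results
    ]


if __name__ == "__main__":
    mcp.run()  # defaults to the stdio transport expected by MCP clients
```

Additional databases (PubMed, Google Scholar) would follow the same pattern, each exposing its own search tool through the standardized MCP interface.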
Features
Current MVP
- Multi-source Literature Search: Query arXiv, PubMed, and Google Scholar simultaneously
- Intelligent Deduplication: Automatically identify and merge duplicate papers from different sources
- PDF-to-Markdown Conversion: Extract text with LaTeX equation support using marker-pdf
- Vector-based Semantic Search: Search within your Zotero library using embeddings
- MCP Integration: Standardized tool interface for academic databases
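As a simplified illustration of the deduplication step listed above, duplicates can be keyed on DOI when available and on a normalized title otherwise; the merging rule shown here is a sketch, not the production logic:

```python
import re


def _dedup_key(paper: dict) -> str:
    """Prefer the DOI as a stable identifier; otherwise fall back to a normalized title."""
    if paper.get("doi"):
        return paper["doi"].lower()
    # Normalize the title: lowercase, strip punctuation, collapse whitespace.
    title = re.sub(r"[^a-z0-9 ]", "", paper.get("title", "").lower())
    return re.sub(r"\s+", " ", title).strip()


def deduplicate(papers: list[dict]) -> list[dict]:
    """Merge records of the same paper returned by different sources."""
    merged: dict[str, dict] = {}
    for paper in papers:
        key = _dedup_key(paper)
        if key not in merged:
            merged[key] = dict(paper)
        else:
            # Fill in fields the earlier record was missing or left empty.
            for name, value in paper.items():
                if not merged[key].get(name):
                    merged[key][name] = value
    return list(merged.values())


# The same paper found on both arXiv and Google Scholar collapses to one record.
papers = [
    {"title": "Attention Is All You Need", "source": "arxiv", "doi": None},
    {"title": "Attention is all you need.", "source": "scholar", "doi": None},
]
print(len(deduplicate(papers)))  # -> 1
```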
Planned Features
- Multi-agent Debate Mechanism: Resolve conflicting claims through structured agent debate
- Citation Traceability: Every claim backed by source PDF evidence with page references
- Apple Silicon Optimization: Local inference acceleration via MLX on M4 chips
- Structured Output: Generate publication-ready literature review sections
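To make the planned citation-traceability feature concrete, one possible record shape is sketched below: every synthesized claim carries the source paper, page number, and supporting quote. The field names are hypothetical:

```python
from dataclasses import dataclass, field


@dataclass
class Evidence:
    """A single piece of supporting evidence extracted from a source PDF."""
    paper_id: str   # e.g. DOI or arXiv ID of the source paper
    page: int       # page number in the source PDF
    quote: str      # verbatim passage backing the claim


@dataclass
class TracedClaim:
    """A claim in the generated review, traceable back to its sources."""
    text: str
    evidence: list[Evidence] = field(default_factory=list)

    def is_supported(self) -> bool:
        # A claim without evidence should never reach the final review.
        return len(self.evidence) > 0
```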
Technical Stack
Core Technologies
| Component | Technology |
|---|---|
| Language | Python 3.12+ |
| Package Management | Mamba |
| Agent Framework | LangGraph |
| MCP Implementation | FastMCP 2.0 |
| Local LLM | Qwen 3 (32B/14B) |
| Cloud LLM | Claude Opus/Sonnet 4.5 |
| PDF Processing | marker-pdf |
| Vector Store | Zotero + embeddings |
Design Principles
- Hallucination Minimization: All claims traceable to source documents
- Hybrid Inference: Local models for routine tasks, cloud models for complex synthesis
- Extensible MCP: Easy addition of new academic database connectors
- Privacy-First: Option for fully local processing of sensitive research
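A minimal sketch of the hybrid-inference principle: route routine, well-scoped tasks to a local model and reserve the cloud model for cross-paper synthesis. The routing rule, the `run_local` stand-in, and the exact model id are assumptions; only the Anthropic client call reflects a real API:

```python
import anthropic

CLOUD_MODEL = "claude-sonnet-4-5"  # adjust to the deployed model version
client = anthropic.Anthropic()     # reads ANTHROPIC_API_KEY from the environment


def run_local(prompt: str) -> str:
    """Stand-in for local inference (e.g. a Qwen 3 model served via MLX)."""
    raise NotImplementedError("wire up the local model backend here")


def run_cloud(prompt: str) -> str:
    response = client.messages.create(
        model=CLOUD_MODEL,
        max_tokens=2048,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.content[0].text


# Illustrative routing rule: cheap, well-scoped tasks stay local;
# open-ended cross-paper synthesis goes to the cloud model.
LOCAL_TASKS = {"summarize_abstract", "extract_metadata", "classify_relevance"}


def run_task(task: str, prompt: str) -> str:
    return run_local(prompt) if task in LOCAL_TASKS else run_cloud(prompt)
```

With fully local processing enabled, `run_cloud` would simply be disabled and all tasks routed through the local backend.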
Use Cases
Academic Researchers:
- Rapid literature surveys for new research directions
- Systematic review preparation with comprehensive coverage
- Identification of research gaps and contradictions
Graduate Students:
- Thesis literature review chapters
- Understanding the state-of-the-art in a new field
- Finding seminal papers and citation networks
Industry R&D:
- Technical due diligence on emerging technologies
- Competitive landscape analysis from academic publications
- Patent prior art searches
Limitations & Future Work
Current Limitations:
- MVP phase with core features still in development
- Requires API keys for cloud LLM functionality
- PDF extraction quality varies with document formatting
Future Directions:
- Integration with more academic databases (Semantic Scholar, IEEE)
- Collaborative features for research teams
- Export to popular reference managers beyond Zotero
- Fine-tuned models for specific research domains
GitHub Repository
Timeline
Status: MVP Development (January 2025 - Present)