Production email automation platform using LangGraph multi-agent workflows for intelligent email classification, prioritization, and response generation with vector-based memory retrieval.
Multi-agent pipeline with conditional routing and shared state management
Pinecone-powered semantic search for contextual email responses
OpenAI GPT-4o + DeepSeek V3 on RunPod RTX 4090 for cost optimization
FastAPI with automatic documentation and JWT authentication
Multi-agent orchestration with state management and conditional routing
Dual inference setup with cost optimization and fallback
Semantic search and contextual retrieval pipeline
High-performance async backend with comprehensive security
Gmail API integration with robust error handling
Multi-modal data storage with real-time synchronization
Modern React application with real-time capabilities
Production deployment with comprehensive observability
Multi-class classification (urgent, inquiry, booking, complaint)
Average time from email ingestion to response generation
Peak processing capacity with concurrent workflows
Average rating from user feedback on response quality
Vector memory retrieval relevance scoring
6-month operational reliability
FastAPI provides automatic OpenAPI documentation, native async support for email processing, and better performance for I/O-heavy operations like API calls to Gmail/OpenAI. Django's ORM would be overkill since we're primarily doing API orchestration, not complex database relationships.
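A minimal sketch of the async endpoint pattern this enables; the route path, request model, and `run_pipeline` entry point are illustrative, not the project's actual API:

```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI(title="Email Automation API")  # OpenAPI docs auto-served at /docs

class EmailIn(BaseModel):
    subject: str
    body: str

async def run_pipeline(email: EmailIn) -> dict:
    # Stub standing in for the LangGraph workflow entry point.
    return {"category": "inquiry"}

@app.post("/emails/process")
async def process_email(email: EmailIn) -> dict:
    # Native async: awaited Gmail/OpenAI calls yield the event loop
    # instead of blocking a worker thread.
    result = await run_pipeline(email)
    return {"status": "processed", **result}
```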
We use SlowAPI for basic rate limiting (e.g., @limiter.limit('10/minute') on endpoints) and Google's credentials refresh mechanism. When tokens expire, the GmailService automatically refreshes them and updates Firestore. For bulk operations, we process emails in batches of 5 per user.
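Both mechanisms follow standard SlowAPI and google-auth usage; in the sketch below the endpoint name is illustrative and the Firestore write-back is only noted in a comment:

```python
from fastapi import FastAPI, Request
from google.auth.transport.requests import Request as GoogleRequest
from google.oauth2.credentials import Credentials
from slowapi import Limiter, _rate_limit_exceeded_handler
from slowapi.errors import RateLimitExceeded
from slowapi.util import get_remote_address

limiter = Limiter(key_func=get_remote_address)
app = FastAPI()
app.state.limiter = limiter
app.add_exception_handler(RateLimitExceeded, _rate_limit_exceeded_handler)

@app.get("/emails/fetch")
@limiter.limit("10/minute")  # returns HTTP 429 once the per-client quota is hit
async def fetch_emails(request: Request) -> dict:  # SlowAPI needs the Request arg
    return {"status": "queued"}

def ensure_fresh(creds: Credentials) -> Credentials:
    # google-auth refreshes an expired token in place; the real GmailService
    # would then persist the new token to Firestore (not shown here).
    if creds.expired and creds.refresh_token:
        creds.refresh(GoogleRequest())
    return creds
```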
Cost optimization was critical for sustainable scaling. OpenAI's GPT-4o costs $0.03/1K tokens while DeepSeek offers comparable performance at 70% lower cost. RunPod provides GPU instances optimized for inference with auto-scaling, reducing our monthly inference costs from $800+ to under $300 while maintaining response quality and <2.5s latency.
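The fallback pattern looks roughly like this, assuming the RunPod instance serves DeepSeek through an OpenAI-compatible API (e.g., via vLLM); the URL, keys, and model IDs are placeholders:

```python
from openai import OpenAI

# Placeholders: the pod URL and model IDs depend on how DeepSeek is served.
deepseek = OpenAI(base_url="https://my-pod.runpod.net/v1", api_key="RUNPOD_KEY")
openai_client = OpenAI(api_key="OPENAI_KEY")

def generate(prompt: str) -> str:
    messages = [{"role": "user", "content": prompt}]
    try:
        # Primary: self-hosted DeepSeek V3 at a fraction of the per-token cost.
        resp = deepseek.chat.completions.create(model="deepseek-v3", messages=messages)
    except Exception:
        # Fallback: GPT-4o when the pod is scaling up or returns an error.
        resp = openai_client.chat.completions.create(model="gpt-4o", messages=messages)
    return resp.choices[0].message.content
```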
The EmbeddingStore class uses Pinecone for vector similarity search. The memory-retrieval step in the LangGraph pipeline needs semantic search across prior email content, and a managed vector database avoids the infrastructure overhead that self-hosting pgvector would add.
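A sketch of that retrieval path; the index name, embedding model, and helper functions are assumptions rather than the actual EmbeddingStore interface:

```python
from openai import OpenAI
from pinecone import Pinecone

oai = OpenAI()
pc = Pinecone(api_key="PINECONE_KEY")
index = pc.Index("email-memory")  # hypothetical index name

def embed(text: str) -> list[float]:
    # Assumes OpenAI embeddings; any fixed-dimension model works here.
    return oai.embeddings.create(
        model="text-embedding-3-small", input=text
    ).data[0].embedding

def remember(email_id: str, text: str) -> None:
    # Store the email vector with its text as metadata for later retrieval.
    index.upsert(vectors=[(email_id, embed(text), {"text": text})])

def retrieve_similar(text: str, k: int = 5) -> list[str]:
    # Top-k nearest neighbors feed the LangGraph memory-retrieval node.
    res = index.query(vector=embed(text), top_k=k, include_metadata=True)
    return [m.metadata["text"] for m in res.matches]
```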
We use a ThreadPoolExecutor (max_workers=10) in EmailProcessor to handle concurrent email processing. Each email is processed independently through the LangGraph pipeline, and because user-data updates go through Firestore's atomic operations, basic concurrency control is handled at the database level.
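The batch loop has roughly this shape; `process_email` below is a stub standing in for a full LangGraph run:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def process_email(email: dict) -> dict:
    # Stub: one independent pass through the LangGraph pipeline.
    return {"id": email["id"], "status": "done"}

def process_batch(emails: list[dict]) -> list[dict]:
    results = []
    # 10 workers bounds concurrent Gmail/OpenAI usage per process;
    # Firestore's atomic writes handle cross-thread contention.
    with ThreadPoolExecutor(max_workers=10) as pool:
        futures = [pool.submit(process_email, e) for e in emails]
        for fut in as_completed(futures):
            results.append(fut.result())
    return results
```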
LangGraph manages the email processing pipeline: classify → prioritize → retrieve_similar → generate_response. Each node reads and writes shared state, and conditional routing lets us handle different email types differently. The EmailAgentState tracks progress through the workflow, and errors are handled at each node.
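A compact sketch of that graph; the state fields, node bodies, and routing rule are illustrative stand-ins for the real EmailAgentState and agents:

```python
from typing import TypedDict
from langgraph.graph import StateGraph, START, END

class EmailAgentState(TypedDict, total=False):
    email: str
    category: str
    priority: str
    similar: list[str]
    response: str

# Each node returns a partial update that LangGraph merges into shared state.
def classify(state: EmailAgentState) -> dict:
    return {"category": "inquiry"}  # stub classifier

def prioritize(state: EmailAgentState) -> dict:
    return {"priority": "high" if state["category"] == "urgent" else "normal"}

def retrieve_similar(state: EmailAgentState) -> dict:
    return {"similar": []}  # vector-memory lookup in the real pipeline

def generate_response(state: EmailAgentState) -> dict:
    return {"response": f"Re: {state['email']}"}

def route(state: EmailAgentState) -> str:
    # Illustrative conditional edge: complaints skip memory retrieval.
    return "generate_response" if state["category"] == "complaint" else "retrieve_similar"

g = StateGraph(EmailAgentState)
g.add_node("classify", classify)
g.add_node("prioritize", prioritize)
g.add_node("retrieve_similar", retrieve_similar)
g.add_node("generate_response", generate_response)
g.add_edge(START, "classify")
g.add_edge("classify", "prioritize")
g.add_conditional_edges("prioritize", route)
g.add_edge("retrieve_similar", "generate_response")
g.add_edge("generate_response", END)

pipeline = g.compile()
print(pipeline.invoke({"email": "Can I book a table for Friday?"}))
```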