
Mailix: AI Email Automation Platform

Production email automation platform using LangGraph multi-agent workflows for intelligent email classification, prioritization, and response generation with vector-based memory retrieval.

Business Impact

1,000+ active users: professionals using the automation daily
$2K+ monthly revenue: subscription-based growth
60% time reduction in manual email handling
150 emails/min peak processing throughput

System Architecture

LangGraph Multi-Agent Architecture: Production Implementation

[Architecture diagram] Gmail API (email ingestion, OAuth 2.0) → Classifier (LangGraph agent, 94.2% accuracy) → Prioritizer (business rules, 1-10 priority scale) → Responder (DeepSeek + GPT-4, <2.5s response) → Email Sender (Gmail Send API, thread management). Supporting services: Monitoring (real-time tracking, performance metrics), Pinecone Vector DB (semantic memory, <85ms P99), AI infrastructure (OpenAI + RunPod, 70% cost reduction), FastAPI async backend, LangGraph StateGraph runtime, Next.js frontend app, Firebase user database, Kubernetes auto-scaling.

Architecture Highlights

LangGraph Workflow

Multi-agent pipeline with conditional routing and shared state management

Vector Memory

Pinecone-powered semantic search for contextual email responses

Dual Inference Setup

OpenAI GPT-4o + DeepSeek V3 on RunPod RTX 4090 for cost optimization

Real-time API

FastAPI with automatic documentation and JWT authentication

System Architecture Components

LangGraph Workflow Engine

Multi-agent orchestration with state management and conditional routing

StateGraph: directed acyclic graph execution with checkpointing
AgentState: shared memory across nodes with Pydantic schemas
Conditional edges: priority-based routing with business logic
Node execution: async processing with timeout handling
Error recovery: exponential backoff with dead letter queues
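
A minimal sketch of how a pipeline like this could be assembled with LangGraph's StateGraph; the state fields, node bodies, and routing threshold below are illustrative assumptions rather than the production schema.

```python
# Illustrative sketch: state fields, node bodies, and the routing threshold
# are assumptions, not the production schema.
from typing import List, TypedDict

from langgraph.checkpoint.memory import MemorySaver
from langgraph.graph import END, StateGraph


class EmailAgentState(TypedDict):
    email_body: str
    category: str             # "urgent" | "inquiry" | "booking" | "complaint"
    priority: int             # 1-10 business-rule priority
    similar_emails: List[str]
    draft_response: str


# Each node returns only the keys it updates; LangGraph merges them into state.
def classify(state: EmailAgentState) -> dict:
    return {"category": "inquiry"}        # model call elided

def prioritize(state: EmailAgentState) -> dict:
    return {"priority": 7}                # business rules elided

def retrieve_similar(state: EmailAgentState) -> dict:
    return {"similar_emails": []}         # vector lookup elided

def generate_response(state: EmailAgentState) -> dict:
    return {"draft_response": "..."}      # LLM call elided

def route_by_priority(state: EmailAgentState) -> str:
    # Conditional edge: only higher-priority emails get a drafted response.
    return "respond" if state["priority"] >= 5 else "skip"


graph = StateGraph(EmailAgentState)
graph.add_node("classify", classify)
graph.add_node("prioritize", prioritize)
graph.add_node("retrieve_similar", retrieve_similar)
graph.add_node("generate_response", generate_response)

graph.set_entry_point("classify")
graph.add_edge("classify", "prioritize")
graph.add_conditional_edges(
    "prioritize", route_by_priority,
    {"respond": "retrieve_similar", "skip": END},
)
graph.add_edge("retrieve_similar", "generate_response")
graph.add_edge("generate_response", END)

# Checkpointing lets an interrupted run resume from the last completed node.
workflow = graph.compile(checkpointer=MemorySaver())
```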

AI Model Infrastructure

Dual inference setup with cost optimization and fallback

OpenAI GPT-4o: 128k context, JSON mode, function calling
DeepSeek V3: vLLM server, FP16 precision, 64k context
RunPod RTX 4090: auto-scaling pods, 4K tokens/sec throughput
Load balancing: weighted routing, health checks every 30s
Cost optimization: 70% reduction via smart model selection
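
Since vLLM exposes an OpenAI-compatible endpoint, both models can be called through the same client library. The sketch below illustrates routing with a GPT-4o fallback; the endpoint URL, model names, and routing heuristic are assumptions, not the production configuration.

```python
# Illustrative sketch: the RunPod URL, model names, and routing heuristic
# are assumptions, not the production configuration.
from openai import OpenAI

openai_client = OpenAI()  # GPT-4o via the OpenAI API
deepseek_client = OpenAI(
    base_url="https://<runpod-endpoint>/v1",  # vLLM's OpenAI-compatible server
    api_key="EMPTY",
)

def generate(prompt: str, *, high_stakes: bool) -> str:
    """Send routine traffic to the self-hosted model; fall back to GPT-4o."""
    client, model = (
        (openai_client, "gpt-4o") if high_stakes else (deepseek_client, "deepseek-v3")
    )
    messages = [{"role": "user", "content": prompt}]
    try:
        resp = client.chat.completions.create(model=model, messages=messages, timeout=10)
    except Exception:
        # Timeout or failed health check on the self-hosted pod: retry on OpenAI.
        resp = openai_client.chat.completions.create(model="gpt-4o", messages=messages)
    return resp.choices[0].message.content
```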

Vector Memory System

Semantic search and contextual retrieval pipeline

Pinecone p2.x1 pods: 1536-dimensional vectors, cosine similarity
Embedding generation: text-embedding-3-large, batch size 100
Metadata filtering: user_id, timestamp, email_type indexing
Query optimization: top-k=5 retrieval, relevance thresholding
Cache strategy: Redis TTL 300s for frequent queries
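
A sketch of the retrieval step under these settings; the index name, score threshold, and metadata field names are assumptions.

```python
# Illustrative sketch: index name, score threshold, and metadata fields are assumptions.
from openai import OpenAI
from pinecone import Pinecone

oai = OpenAI()
pc = Pinecone(api_key="...")
index = pc.Index("email-memory")

def retrieve_similar(user_id: str, email_text: str, top_k: int = 5) -> list:
    # text-embedding-3-large reduced to 1536 dims to match the index.
    vector = oai.embeddings.create(
        model="text-embedding-3-large",
        input=email_text,
        dimensions=1536,
    ).data[0].embedding

    results = index.query(
        vector=vector,
        top_k=top_k,
        filter={"user_id": {"$eq": user_id}},  # per-user metadata filter
        include_metadata=True,
    )
    # Relevance thresholding: drop weak matches before they reach the prompt.
    return [m for m in results.matches if m.score >= 0.75]
```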

API Gateway & Authentication

High-performance async backend with comprehensive security

FastAPI: uvicorn ASGI server, asyncio + uvloop event loop
JWT authentication: HS256 signing, 15min access + 7d refresh
Rate limiting: sliding window, 100 req/min per user
Request validation: Pydantic v2 models with custom validators
CORS policies: strict origin checking, credential handling
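
The token lifetimes above translate into a small issuance/verification helper along these lines (shown with PyJWT; claim names and secret handling are assumptions).

```python
# Illustrative sketch: claim names and secret handling are assumptions.
from datetime import datetime, timedelta, timezone

import jwt  # PyJWT

SECRET = "change-me"  # loaded from a secret manager in practice

def issue_tokens(user_id: str) -> dict:
    now = datetime.now(timezone.utc)
    access = jwt.encode(
        {"sub": user_id, "type": "access", "exp": now + timedelta(minutes=15)},
        SECRET, algorithm="HS256",
    )
    refresh = jwt.encode(
        {"sub": user_id, "type": "refresh", "exp": now + timedelta(days=7)},
        SECRET, algorithm="HS256",
    )
    return {"access_token": access, "refresh_token": refresh}

def verify(token: str) -> dict:
    # Raises jwt.ExpiredSignatureError / jwt.InvalidTokenError on bad tokens.
    return jwt.decode(token, SECRET, algorithms=["HS256"])
```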

Email Integration Layer

Gmail API integration with robust error handling

OAuth 2.0: service account delegation, token refresh automation
Push notifications: real-time ingestion via the Gmail watch API and Cloud Pub/Sub
Rate limiting: 10 req/sec/user with exponential backoff
Thread management: conversation tracking, reply-to handling
Delivery confirmation: SMTP status codes, bounce handling
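
A sketch of the fetch path with exponential backoff on Gmail API quota errors; the query, page size, and retry policy here are illustrative.

```python
# Illustrative sketch: query, page size, and retry policy are assumptions.
import random
import time

from googleapiclient.discovery import build
from googleapiclient.errors import HttpError

def fetch_unread(creds, max_retries: int = 5) -> list:
    service = build("gmail", "v1", credentials=creds)
    for attempt in range(max_retries):
        try:
            resp = service.users().messages().list(
                userId="me", q="is:unread", maxResults=25
            ).execute()
            return resp.get("messages", [])
        except HttpError as err:
            # Back off exponentially on rate-limit and transient server errors.
            if err.resp.status in (429, 500, 503) and attempt < max_retries - 1:
                time.sleep(2 ** attempt + random.random())
            else:
                raise
    return []
```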

Data Storage & Persistence

Multi-modal data storage with real-time synchronization

Firestore: NoSQL document store, real-time listeners
Security rules: field-level access control, query validation
Multi-region backup: automatic failover, 99.99% availability
Connection pooling: 20 concurrent connections per service
Transaction management: atomic operations, optimistic locking
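
A sketch of an atomic counter update using the Firestore client's transaction decorator; the collection and field names are assumptions.

```python
# Illustrative sketch: collection and field names are assumptions.
from google.cloud import firestore

db = firestore.Client()

@firestore.transactional
def increment_processed_count(transaction, user_id: str) -> None:
    user_ref = db.collection("users").document(user_id)
    snapshot = user_ref.get(transaction=transaction)
    current = (snapshot.to_dict() or {}).get("emails_processed", 0)
    # Read-then-write inside one transaction keeps the counter consistent
    # even when several workers update the same user concurrently.
    transaction.set(user_ref, {"emails_processed": current + 1}, merge=True)

increment_processed_count(db.transaction(), "user_123")
```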

Frontend & User Experience

Modern React application with real-time capabilities

Next.js 14: SSR + SSG, app router, React 18 concurrent features
TypeScript: strict mode, path mapping, ESLint + Prettier
State management: SWR for server state, Zustand for client state
Real-time updates: WebSocket connections, optimistic UI updates
Performance optimization: code splitting, lazy loading, CDN

DevOps & Monitoring

Production deployment with comprehensive observability

Kubernetes: HPA based on CPU/memory, rolling deployments
Docker: multi-stage builds, distroless base images, layer caching
Monitoring: Prometheus metrics, Grafana dashboards, alerting
Logging: structured JSON logs, ELK stack, log aggregation
CI/CD: GitHub Actions, automated testing, security scanning

Machine Learning Results

Email classification accuracy: 94.2% (multi-class: urgent, inquiry, booking, complaint)
Response generation time: <2.5s average from email ingestion to generated response
Processing throughput: 150 emails/min peak capacity with concurrent workflows
User satisfaction: 4.1/5.0 average rating from user feedback on response quality
Context relevance: 4.3/5.0 relevance score for vector memory retrieval
System uptime: 99.2% over 6 months of operation

System Design Decisions

Why FastAPI over Django/Flask?

FastAPI provides automatic OpenAPI documentation, native async support for email processing, and better performance for I/O-heavy operations like API calls to Gmail/OpenAI. Django's ORM would be overkill since we're primarily doing API orchestration, not complex database relationships.

How do you handle Gmail API rate limits?

We use SlowAPI for basic rate limiting (e.g., @limiter.limit('10/minute') on endpoints) and Google's credentials refresh mechanism. When tokens expire, the GmailService automatically refreshes them and updates Firestore. For bulk operations, we process emails in batches of 5 per user.
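
For reference, the SlowAPI pattern looks roughly like this (the route and limit string here are illustrative):

```python
# Illustrative sketch: the route and limit string are assumptions.
from fastapi import FastAPI, Request
from slowapi import Limiter, _rate_limit_exceeded_handler
from slowapi.errors import RateLimitExceeded
from slowapi.util import get_remote_address

limiter = Limiter(key_func=get_remote_address)
app = FastAPI()
app.state.limiter = limiter
app.add_exception_handler(RateLimitExceeded, _rate_limit_exceeded_handler)

@app.post("/emails/process")
@limiter.limit("10/minute")  # mirrors the per-user quota described above
async def process_emails(request: Request):
    # SlowAPI needs the Request parameter to key the sliding window.
    return {"status": "queued"}
```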

Why host DeepSeek on RunPod instead of using OpenAI directly?

Cost optimization was critical for sustainable scaling. OpenAI's GPT-4o costs $0.03/1K tokens while DeepSeek offers comparable performance at 70% lower cost. RunPod provides GPU instances optimized for inference with auto-scaling, reducing our monthly inference costs from $800+ to under $300 while maintaining response quality and <2.5s latency.

Why Pinecone instead of embedding search in Postgres?

The EmbeddingStore class supports Pinecone for vector similarity search. We need semantic search across email content for the memory retrieval step in LangGraph. Setting up pgvector would require more infrastructure management compared to managed vector databases.

How do you prevent race conditions when multiple emails arrive simultaneously?

We use a ThreadPoolExecutor (max 10 workers) in EmailProcessor to handle concurrent email processing. Each email gets processed independently through the LangGraph pipeline. Since we're using Firestore's atomic operations for user data updates, basic concurrency is handled at the database level.
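
A simplified sketch of that fan-out, assuming a process_email function that runs the full pipeline for one message:

```python
# Illustrative sketch: process_email stands in for the full LangGraph pipeline.
from concurrent.futures import ThreadPoolExecutor, as_completed

def process_email(email: dict) -> dict:
    ...  # classify -> prioritize -> retrieve_similar -> generate_response

def process_batch(emails: list) -> list:
    results = []
    with ThreadPoolExecutor(max_workers=10) as pool:
        futures = {pool.submit(process_email, e): e["id"] for e in emails}
        for future in as_completed(futures):
            try:
                results.append(future.result())
            except Exception as exc:
                # One failed email must not take down the rest of the batch.
                print(f"email {futures[future]} failed: {exc}")
    return results
```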

Why LangGraph for the email workflow?

LangGraph manages the email processing pipeline: classify → prioritize → retrieve_similar → generate_response. Each step can access shared state and the conditional routing lets us handle different email types. The EmailAgentState tracks progress through the workflow and handles errors at each node.

Technology Stack

AI & ML

LangGraph, OpenAI GPT-4o, DeepSeek, Pinecone, LlamaIndex

Backend & Infrastructure

FastAPI, Python, Firebase, RunPod, JWT Auth

Frontend

Next.js, TypeScript, Tailwind CSS, Stripe

Key Engineering Learnings

  • Cost optimization through model selection: Hosting DeepSeek on RunPod reduced inference costs by 70% compared to OpenAI GPT-4o while maintaining comparable response quality and achieving target <2.5s response times.
  • Vector database selection matters: Pinecone's managed infrastructure eliminated the complexity of self-hosting embeddings while providing sub-100ms semantic search.
  • LangGraph state management: Shared state across workflow nodes enabled sophisticated routing and error recovery without complex external orchestration.
  • Async processing architecture: ThreadPoolExecutor with careful rate limiting balanced throughput with API constraints, achieving 150 emails/min sustained processing.