
Mailix: AI Email Automation Platform

Production email automation platform using LangGraph multi-agent workflows for intelligent email classification, prioritization, and response generation with vector-based memory retrieval.

Business Impact

1,000+ active users: professionals using the automation daily
$2K+ monthly revenue: subscription-based growth
60% time reduction in manual email handling
150 emails/min peak processing throughput

System Architecture

LangGraph Multi-Agent Architecture: Production Implementation

[Architecture diagram] Gmail API (email ingestion, OAuth 2.0) → Classifier (LangGraph agent, 94.2% accuracy) → Prioritizer (business rules, 1-10 priority scale) → Responder (DeepSeek + GPT-4, <2.5s response) → Email Sender (Gmail Send API, thread management). Supporting services: Monitoring (real-time tracking, performance metrics), Pinecone Vector DB (semantic memory, <85ms P99), AI infrastructure (OpenAI + RunPod, 70% cost reduction), FastAPI async backend, LangGraph StateGraph runtime, Next.js frontend app, Firebase user database, Kubernetes auto-scaling.

Architecture Highlights

LangGraph Workflow

Multi-agent pipeline with conditional routing and shared state management

Vector Memory

Pinecone-powered semantic search for contextual email responses

Dual Inference Setup

OpenAI GPT-4o + DeepSeek V3 on RunPod RTX 4090 for cost optimization

Real-time API

FastAPI with automatic documentation and JWT authentication

System Architecture Components

LangGraph Workflow Engine

Multi-agent orchestration with state management and conditional routing

StateGraph: directed acyclic graph execution with checkpointing
AgentState: shared memory across nodes with Pydantic schemas
Conditional edges: priority-based routing with business logic
Node execution: async processing with timeout handling
Error recovery: exponential backoff with dead letter queues
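
A minimal sketch of how a pipeline like this could be assembled with LangGraph's StateGraph; the state fields, node bodies, and routing threshold below are illustrative assumptions rather than the production schema.

```python
# Illustrative sketch: state fields, node bodies, and the routing threshold
# are assumptions, not the production schema.
from typing import List, TypedDict

from langgraph.checkpoint.memory import MemorySaver
from langgraph.graph import END, StateGraph


class EmailAgentState(TypedDict):
    email_body: str
    category: str             # "urgent" | "inquiry" | "booking" | "complaint"
    priority: int             # 1-10 business-rule priority
    similar_emails: List[str]
    draft_response: str


# Each node returns only the keys it updates; LangGraph merges them into state.
def classify(state: EmailAgentState) -> dict:
    return {"category": "inquiry"}        # model call elided

def prioritize(state: EmailAgentState) -> dict:
    return {"priority": 7}                # business rules elided

def retrieve_similar(state: EmailAgentState) -> dict:
    return {"similar_emails": []}         # vector lookup elided

def generate_response(state: EmailAgentState) -> dict:
    return {"draft_response": "..."}      # LLM call elided

def route_by_priority(state: EmailAgentState) -> str:
    # Conditional edge: only higher-priority emails get a drafted response.
    return "respond" if state["priority"] >= 5 else "skip"


graph = StateGraph(EmailAgentState)
graph.add_node("classify", classify)
graph.add_node("prioritize", prioritize)
graph.add_node("retrieve_similar", retrieve_similar)
graph.add_node("generate_response", generate_response)

graph.set_entry_point("classify")
graph.add_edge("classify", "prioritize")
graph.add_conditional_edges(
    "prioritize", route_by_priority,
    {"respond": "retrieve_similar", "skip": END},
)
graph.add_edge("retrieve_similar", "generate_response")
graph.add_edge("generate_response", END)

# Checkpointing lets an interrupted run resume from the last completed node.
workflow = graph.compile(checkpointer=MemorySaver())
```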

AI Model Infrastructure

Dual inference setup with cost optimization and fallback

OpenAI GPT-4o: 128k context, JSON mode, function calling
DeepSeek V3: vLLM server, FP16 precision, 64k context
RunPod RTX 4090: auto-scaling pods, 4K tokens/sec throughput
Load balancing: weighted routing, health checks every 30s
Cost optimization: 70% reduction via smart model selection
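
Since vLLM exposes an OpenAI-compatible endpoint, both models can be called through the same client library. The sketch below illustrates routing with a GPT-4o fallback; the endpoint URL, model names, and routing heuristic are assumptions, not the production configuration.

```python
# Illustrative sketch: the RunPod URL, model names, and routing heuristic
# are assumptions, not the production configuration.
from openai import OpenAI

openai_client = OpenAI()  # GPT-4o via the OpenAI API
deepseek_client = OpenAI(
    base_url="https://<runpod-endpoint>/v1",  # vLLM's OpenAI-compatible server
    api_key="EMPTY",
)

def generate(prompt: str, *, high_stakes: bool) -> str:
    """Send routine traffic to the self-hosted model; fall back to GPT-4o."""
    client, model = (
        (openai_client, "gpt-4o") if high_stakes else (deepseek_client, "deepseek-v3")
    )
    messages = [{"role": "user", "content": prompt}]
    try:
        resp = client.chat.completions.create(model=model, messages=messages, timeout=10)
    except Exception:
        # Timeout or failed health check on the self-hosted pod: retry on OpenAI.
        resp = openai_client.chat.completions.create(model="gpt-4o", messages=messages)
    return resp.choices[0].message.content
```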

Vector Memory System

Semantic search and contextual retrieval pipeline

Pinecone p2.x1 pods: 1536-dimensional vectors, cosine similarity
Embedding generation: text-embedding-3-large, batch size 100
Metadata filtering: user_id, timestamp, email_type indexing
Query optimization: top-k=5 retrieval, relevance thresholding
Cache strategy: Redis TTL 300s for frequent queries
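
A sketch of the retrieval step under these settings; the index name, score threshold, and metadata field names are assumptions.

```python
# Illustrative sketch: index name, score threshold, and metadata fields are assumptions.
from openai import OpenAI
from pinecone import Pinecone

oai = OpenAI()
pc = Pinecone(api_key="...")
index = pc.Index("email-memory")

def retrieve_similar(user_id: str, email_text: str, top_k: int = 5) -> list:
    # text-embedding-3-large reduced to 1536 dims to match the index.
    vector = oai.embeddings.create(
        model="text-embedding-3-large",
        input=email_text,
        dimensions=1536,
    ).data[0].embedding

    results = index.query(
        vector=vector,
        top_k=top_k,
        filter={"user_id": {"$eq": user_id}},  # per-user metadata filter
        include_metadata=True,
    )
    # Relevance thresholding: drop weak matches before they reach the prompt.
    return [m for m in results.matches if m.score >= 0.75]
```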

API Gateway & Authentication

High-performance async backend with comprehensive security

FastAPI: uvicorn ASGI server, asyncio + uvloop event loop
JWT authentication: HS256 signing, 15min access + 7d refresh
Rate limiting: sliding window, 100 req/min per user
Request validation: Pydantic v2 models with custom validators
CORS policies: strict origin checking, credential handling
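
The token lifetimes above translate into a small issuance/verification helper along these lines (shown with PyJWT; claim names and secret handling are assumptions).

```python
# Illustrative sketch: claim names and secret handling are assumptions.
from datetime import datetime, timedelta, timezone

import jwt  # PyJWT

SECRET = "change-me"  # loaded from a secret manager in practice

def issue_tokens(user_id: str) -> dict:
    now = datetime.now(timezone.utc)
    access = jwt.encode(
        {"sub": user_id, "type": "access", "exp": now + timedelta(minutes=15)},
        SECRET, algorithm="HS256",
    )
    refresh = jwt.encode(
        {"sub": user_id, "type": "refresh", "exp": now + timedelta(days=7)},
        SECRET, algorithm="HS256",
    )
    return {"access_token": access, "refresh_token": refresh}

def verify(token: str) -> dict:
    # Raises jwt.ExpiredSignatureError / jwt.InvalidTokenError on bad tokens.
    return jwt.decode(token, SECRET, algorithms=["HS256"])
```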

Email Integration Layer

Gmail API integration with robust error handling

OAuth 2.0: service account delegation, token refresh automation
Push notifications: real-time ingestion via the Gmail watch API and Cloud Pub/Sub
Rate limiting: 10 req/sec/user with exponential backoff
Thread management: conversation tracking, reply-to handling
Delivery confirmation: SMTP status codes, bounce handling
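
A sketch of the fetch path with exponential backoff on Gmail API quota errors; the query, page size, and retry policy here are illustrative.

```python
# Illustrative sketch: query, page size, and retry policy are assumptions.
import random
import time

from googleapiclient.discovery import build
from googleapiclient.errors import HttpError

def fetch_unread(creds, max_retries: int = 5) -> list:
    service = build("gmail", "v1", credentials=creds)
    for attempt in range(max_retries):
        try:
            resp = service.users().messages().list(
                userId="me", q="is:unread", maxResults=25
            ).execute()
            return resp.get("messages", [])
        except HttpError as err:
            # Back off exponentially on rate-limit and transient server errors.
            if err.resp.status in (429, 500, 503) and attempt < max_retries - 1:
                time.sleep(2 ** attempt + random.random())
            else:
                raise
    return []
```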

Data Storage & Persistence

Multi-modal data storage with real-time synchronization

Firestore: NoSQL document store, real-time listeners
Security rules: field-level access control, query validation
Multi-region backup: automatic failover, 99.99% availability
Connection pooling: 20 concurrent connections per service
Transaction management: atomic operations, optimistic locking
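
A sketch of an atomic counter update using the Firestore client's transaction decorator; the collection and field names are assumptions.

```python
# Illustrative sketch: collection and field names are assumptions.
from google.cloud import firestore

db = firestore.Client()

@firestore.transactional
def increment_processed_count(transaction, user_id: str) -> None:
    user_ref = db.collection("users").document(user_id)
    snapshot = user_ref.get(transaction=transaction)
    current = (snapshot.to_dict() or {}).get("emails_processed", 0)
    # Read-then-write inside one transaction keeps the counter consistent
    # even when several workers update the same user concurrently.
    transaction.set(user_ref, {"emails_processed": current + 1}, merge=True)

increment_processed_count(db.transaction(), "user_123")
```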

Frontend & User Experience

Modern React application with real-time capabilities

Next.js 14: SSR + SSG, app router, React 18 concurrent features
TypeScript: strict mode, path mapping, ESLint + Prettier
State management: SWR for server state, Zustand for client state
Real-time updates: WebSocket connections, optimistic UI updates
Performance optimization: code splitting, lazy loading, CDN

DevOps & Monitoring

Production deployment with comprehensive observability

Kubernetes: HPA based on CPU/memory, rolling deployments
Docker: multi-stage builds, distroless base images, layer caching
Monitoring: Prometheus metrics, Grafana dashboards, alerting
Logging: structured JSON logs, ELK stack, log aggregation
CI/CD: GitHub Actions, automated testing, security scanning

Machine Learning Results

Email classification accuracy: 94.2% (multi-class: urgent, inquiry, booking, complaint)
Response generation time: <2.5s average from email ingestion to generated response
Processing throughput: 150 emails/min peak capacity with concurrent workflows
User satisfaction: 4.1/5.0 average rating from user feedback on response quality
Context relevance: 4.3/5.0 relevance score for vector memory retrieval
System uptime: 99.2% over 6 months of operation

System Design Decisions

Why FastAPI over Django/Flask?

FastAPI provides automatic OpenAPI documentation, native async support for email processing, and better performance for I/O-heavy operations like API calls to Gmail/OpenAI. Django's ORM would be overkill since we're primarily doing API orchestration, not complex database relationships.

How do you handle Gmail API rate limits?

We use SlowAPI for basic rate limiting (e.g., @limiter.limit('10/minute') on endpoints) and Google's credentials refresh mechanism. When tokens expire, the GmailService automatically refreshes them and updates Firestore. For bulk operations, we process emails in batches of 5 per user.
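
For reference, the SlowAPI pattern looks roughly like this (the route and limit string here are illustrative):

```python
# Illustrative sketch: the route and limit string are assumptions.
from fastapi import FastAPI, Request
from slowapi import Limiter, _rate_limit_exceeded_handler
from slowapi.errors import RateLimitExceeded
from slowapi.util import get_remote_address

limiter = Limiter(key_func=get_remote_address)
app = FastAPI()
app.state.limiter = limiter
app.add_exception_handler(RateLimitExceeded, _rate_limit_exceeded_handler)

@app.post("/emails/process")
@limiter.limit("10/minute")  # mirrors the per-user quota described above
async def process_emails(request: Request):
    # SlowAPI needs the Request parameter to key the sliding window.
    return {"status": "queued"}
```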

Why host DeepSeek on RunPod instead of using OpenAI directly?

Cost optimization was critical for sustainable scaling. OpenAI's GPT-4o costs $0.03/1K tokens while DeepSeek offers comparable performance at 70% lower cost. RunPod provides GPU instances optimized for inference with auto-scaling, reducing our monthly inference costs from $800+ to under $300 while maintaining response quality and <2.5s latency.

Why Pinecone instead of embedding search in Postgres?

The EmbeddingStore class supports Pinecone for vector similarity search. We need semantic search across email content for the memory retrieval step in LangGraph. Setting up pgvector would require more infrastructure management compared to managed vector databases.

How do you prevent race conditions when multiple emails arrive simultaneously?

We use a ThreadPoolExecutor (max 10 workers) in EmailProcessor to handle concurrent email processing. Each email gets processed independently through the LangGraph pipeline. Since we're using Firestore's atomic operations for user data updates, basic concurrency is handled at the database level.
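
A simplified sketch of that fan-out, assuming a process_email function that runs the full pipeline for one message:

```python
# Illustrative sketch: process_email stands in for the full LangGraph pipeline.
from concurrent.futures import ThreadPoolExecutor, as_completed

def process_email(email: dict) -> dict:
    ...  # classify -> prioritize -> retrieve_similar -> generate_response

def process_batch(emails: list) -> list:
    results = []
    with ThreadPoolExecutor(max_workers=10) as pool:
        futures = {pool.submit(process_email, e): e["id"] for e in emails}
        for future in as_completed(futures):
            try:
                results.append(future.result())
            except Exception as exc:
                # One failed email must not take down the rest of the batch.
                print(f"email {futures[future]} failed: {exc}")
    return results
```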

Why LangGraph for the email workflow?

LangGraph manages the email processing pipeline: classify → prioritize → retrieve_similar → generate_response. Each step can access shared state and the conditional routing lets us handle different email types. The EmailAgentState tracks progress through the workflow and handles errors at each node.

Technology Stack

AI & ML

LangGraph, OpenAI GPT-4o, DeepSeek, Pinecone, LlamaIndex

Backend & Infrastructure

FastAPI, Python, Firebase, RunPod, JWT Auth

Frontend

Next.js, TypeScript, Tailwind CSS, Stripe

Key Engineering Learnings

  • Cost optimization through model selection: Hosting DeepSeek on RunPod reduced inference costs by 70% compared to OpenAI GPT-4o while maintaining comparable response quality and achieving target <2.5s response times.
  • Vector database selection matters: Pinecone's managed infrastructure eliminated the complexity of self-hosting embeddings while providing sub-100ms semantic search.
  • LangGraph state management: Shared state across workflow nodes enabled sophisticated routing and error recovery without complex external orchestration.
  • Async processing architecture: ThreadPoolExecutor with careful rate limiting balanced throughput with API constraints, achieving 150 emails/min sustained processing.