Jeb: Pubky AI Bot
Jeb is an AI-powered bot for the Pubky decentralized social network. Affectionately named “Jeb,” this bot automatically responds to mentions with intelligent summaries, fact-checking, and other AI-powered capabilities, demonstrating how AI can enhance decentralized social experiences without compromising user sovereignty.
Overview
Section titled “Overview”Jeb operates as an autonomous agent on the Pubky network, monitoring mentions and providing helpful AI-generated responses. Unlike centralized social media bots that collect user data, Jeb operates transparently on the decentralized Pubky infrastructure, with all interactions stored on Homeservers under user control.
Key Characteristics
Section titled “Key Characteristics”- AI-Powered: Supports multiple AI providers (Groq, OpenAI, Anthropic, OpenRouter)
- Event-Driven: Built on Redis Streams for scalable, asynchronous processing
- Production-Ready: Full observability with health checks, metrics, and structured logging
- Privacy-Preserving: No data collection; all interactions occur via public Pubky protocol
- Open Source: MIT licensed, fully transparent implementation
- Horizontally Scalable: Worker-based architecture for handling high volumes
Capabilities
Section titled “Capabilities”Summary Generation
Section titled “Summary Generation”Jeb can summarize long threads or conversations to help users quickly understand complex discussions.
Trigger Keywords: “summary”, “tl;dr”, “recap”, “summarize”
How It Works:
- User mentions Jeb in a post with a summary request
- Jeb fetches the full thread context from the Homeserver
- AI generates a concise summary with key points
- Response includes up to 3 bullet points highlighting main topics
- Safety check ensures appropriate content before posting
Example Interaction:
User: "@jeb can you summarize this thread? tl;dr"Jeb: "This thread discusses X, Y, and Z. Key points: • Point 1 about topic A • Point 2 about topic B • Point 3 about topic C"Token Management: Jeb respects configurable token budgets to avoid overwhelming AI APIs with massive threads.
Fact-Checking
Section titled “Fact-Checking”Jeb can verify factual claims by searching for supporting evidence using web search APIs.
Trigger Keywords: “verify”, “fact check”, “source?”, “is this true?”
How It Works:
- User mentions Jeb with a fact-check request
- Jeb extracts factual claims from the content
- Uses MCP (Model Context Protocol) tools to search Brave API
- AI analyzes search results and assigns verdicts per claim
- Returns overall assessment with confidence level and sources
Verdict Types:
- Accurate: Claim supported by reliable sources
- Mixed: Some accurate elements, some not
- Inaccurate: Contradicted by reliable sources
- Unverifiable: Insufficient evidence to determine truth
Example Interaction:
User: "The moon landing happened in 1969. @jeb fact check"Jeb: "Verdict: ACCURATE (High Confidence) Claim verified by multiple reliable sources: • NASA official records • Independent analysis from [source] • Historical documentation from [source]"Intelligent Routing
Section titled “Intelligent Routing”Jeb uses a hybrid classification system to understand user intent:
Heuristic Classification: Fast keyword matching for obvious cases
- “summary” → Summary action
- “fact check” → Factcheck action
- “verify” → Factcheck action
LLM Classification: Fallback to AI when keywords are ambiguous
- Analyzes full context and conversational intent
- Handles natural language requests without exact keywords
- More flexible but slightly higher latency
Architecture
Section titled “Architecture”System Design
Section titled “System Design”┌─────────────────────────────────────────────────────────┐│ Pubky Network ││ (Homeservers + Nexus API) │└───────────────────┬─────────────────────────────────────┘ │ ▼ ┌──────────────────────┐ │ Mention Poller │ ← Polls for @jeb mentions └──────────┬───────────┘ │ ▼ ┌──────────────────────┐ │ Event Bus (Redis) │ ← mention.received.v1 └──────────┬───────────┘ │ ▼ ┌──────────────────────┐ │ Router │ ← Classifies intent │ (Heuristics + LLM) │ └──────────┬───────────┘ │ ┌──────────┴──────────┐ ▼ ▼┌─────────────────┐ ┌─────────────────┐│ Summary Workers │ │ Factcheck Workers││ (Horizontal │ │ (Horizontal ││ Scaling) │ │ Scaling) │└────────┬────────┘ └────────┬─────────┘ │ │ └──────────┬──────────┘ ▼ ┌──────────────────────┐ │ Reply Publisher │ → Posts to Pubky └──────────────────────┘Components
Section titled “Components”Mention Poller:
- Queries Pubky Nexus API for mentions
- Tracks last processed mention to avoid duplicates
- Emits
mention.received.v1events to Redis Streams - Configurable polling interval (default: 30s)
Router:
- Consumes mention events from Redis Streams
- Applies heuristic keyword matching first
- Falls back to LLM classification if needed
- Emits action-specific events (e.g.,
action.summary.requested.v1) - Stores routing decisions for audit trail
Action Workers:
- Summary Workers: Generate thread summaries using AI
- Factcheck Workers: Verify claims using web search + AI
- Horizontally scalable for load distribution
- Each worker type runs in its own consumer group
Reply Publisher:
- Receives completed responses from workers
- Performs final safety checks (wordlist filtering)
- Publishes replies to Pubky via Pubky SDK
- Stores published replies for auditability
Data Flow
Section titled “Data Flow”- Mention Detection: Poller finds new @jeb mentions
- Event Emission: Mention stored in DB, event sent to Redis
- Intent Classification: Router determines user intent
- Action Processing: Workers execute appropriate logic
- Response Generation: AI creates human-readable response
- Safety Validation: Content checked against wordlist
- Publication: Reply posted to Pubky network
Technology Stack
Section titled “Technology Stack”- Language: TypeScript/Node.js 20+
- Database: PostgreSQL (mention tracking, audit logs)
- Event Bus: Redis Streams (event-driven architecture)
- AI Providers: Groq (default), OpenAI, Anthropic, OpenRouter
- Web Search: Brave Search API (via MCP protocol)
- Pubky Integration:
@synonymdev/pubkySDK - Data Validation:
pubky-app-specsfor schema compliance - Observability: Winston logging, Prometheus metrics, health endpoints
Deployment
Section titled “Deployment”Quick Start with Docker Compose
Section titled “Quick Start with Docker Compose”The fastest way to run Jeb:
# Clone repositorygit clone https://github.com/pubky/pubky-ai-botcd pubky-ai-bot
# Configure environmentcp .env.example .env# Edit .env with your configuration (see below)
# Start all services (bot, PostgreSQL, Redis)docker compose up -d
# Check logsdocker compose logs -f pubky-ai-bot
# Stop servicesdocker compose downConfiguration
Section titled “Configuration”Required Environment Variables
Section titled “Required Environment Variables”# Bot Identity (generate at https://iancoleman.io/bip39/)PUBKY_BOT_MNEMONIC="word1 word2 ... word12"
# AI Provider (Groq is free for development)AI_PRIMARY_PROVIDER=groqGROQ_API_KEY=gsk_your_groq_api_key
# Pubky NetworkPUBKY_NETWORK=testnet # or: mainnet (when ready)Optional Configuration
Section titled “Optional Configuration”# AI Models (per action)AI_MODEL_SUMMARY=llama-3.1-8b-instantAI_MODEL_FACTCHECK=llama-3.1-8b-instantAI_MODEL_CLASSIFIER=llama-3.1-8b-instant
# Brave Search for fact-checkingBRAVE_API_KEY=your_brave_api_key
# Database (auto-configured by Docker Compose)DATABASE_URL=postgres://user:pass@localhost:5432/pubkybotREDIS_URL=redis://localhost:6379/0
# PollingMENTION_POLL_INTERVAL_MS=30000 # 30 seconds
# PerformanceWORKER_CONCURRENCY_SUMMARY=2WORKER_CONCURRENCY_FACTCHECK=2AI Provider Setup
Section titled “AI Provider Setup”Groq (Free for Development)
Section titled “Groq (Free for Development)”- Sign up at console.groq.com
- Create an API key
- Set
GROQ_API_KEY=gsk_...in.env - Uses fast Llama 3.1 models
OpenAI
Section titled “OpenAI”- Get API key from platform.openai.com
- Set
AI_PRIMARY_PROVIDER=openai - Set
OPENAI_API_KEY=sk-... - Configure
AI_MODEL_SUMMARY=gpt-4o-mini(or other model)
Anthropic
Section titled “Anthropic”- Get API key from console.anthropic.com
- Set
AI_PRIMARY_PROVIDER=anthropic - Set
ANTHROPIC_API_KEY=sk-ant-... - Configure
AI_MODEL_SUMMARY=claude-3-5-sonnet-latest
OpenRouter
Section titled “OpenRouter”- Get API key from openrouter.ai
- Set
AI_PRIMARY_PROVIDER=openrouter - Set
OPENROUTER_API_KEY=sk-or-... - Configure models using OpenRouter model names
Production Deployment
Section titled “Production Deployment”Horizontal Scaling
Section titled “Horizontal Scaling”Run multiple worker instances for high load:
# Single instance for poller + router (stateful polling)npm start
# Scale summary workers horizontallyNODE_ENV=production WORKER_TYPE=summary npm start &NODE_ENV=production WORKER_TYPE=summary npm start &NODE_ENV=production WORKER_TYPE=summary npm start &
# Scale factcheck workers horizontallyNODE_ENV=production WORKER_TYPE=factcheck npm start &NODE_ENV=production WORKER_TYPE=factcheck npm start &Kubernetes Deployment
Section titled “Kubernetes Deployment”# Poller + Router (single replica)apiVersion: apps/v1kind: Deploymentmetadata: name: jeb-pollerspec: replicas: 1 # Must be 1 (stateful polling) template: spec: containers: - name: jeb-poller image: pubky/jeb-bot:latest env: - name: NODE_ENV value: production
---
# Summary Workers (scale as needed)apiVersion: apps/v1kind: Deploymentmetadata: name: jeb-summary-workersspec: replicas: 3 # Scale based on load template: spec: containers: - name: jeb-summary image: pubky/jeb-bot:latest env: - name: WORKER_TYPE value: summary
---
# Factcheck Workers (scale as needed)apiVersion: apps/v1kind: Deploymentmetadata: name: jeb-factcheck-workersspec: replicas: 3 # Scale based on load template: spec: containers: - name: jeb-factcheck image: pubky/jeb-bot:latest env: - name: WORKER_TYPE value: factcheckDatabase Migrations
Section titled “Database Migrations”On first deployment or after updates:
# Run migrationsnpm run db:migrate
# Or via Dockerdocker compose exec pubky-ai-bot npm run db:migrateObservability
Section titled “Observability”Health Endpoints
Section titled “Health Endpoints”Comprehensive Health Check:
GET /api/healthReturns detailed status of all services:
- Database connectivity
- Redis connectivity
- AI provider availability
- Last mention poll time
- Worker status
Kubernetes Readiness Probe:
GET /api/health/readyReturns 200 when service is ready to accept traffic.
Kubernetes Liveness Probe:
GET /api/health/liveReturns 200 when service is alive (restarts if fails).
Prometheus Metrics
Section titled “Prometheus Metrics”GET /metricsExported metrics include:
jeb_mentions_received_total- Total mentions processedjeb_actions_executed_total- Actions by type and statusjeb_action_duration_seconds- Action execution timejeb_replies_published_total- Successful repliesjeb_ai_api_calls_total- AI API usage by providerjeb_ai_tokens_used_total- Token consumption trackingjeb_redis_operations_total- Redis stream operationsjeb_db_query_duration_seconds- Database performance
Structured Logging
Section titled “Structured Logging”Winston-based logging with JSON output for production:
{ "timestamp": "2025-01-05T10:30:00.000Z", "level": "info", "message": "Action completed", "mentionId": "abc123", "actionType": "summary", "durationMs": 1234, "aiProvider": "groq", "tokensUsed": 450}Log levels: error, warn, info, debug
Safety & Moderation
Section titled “Safety & Moderation”Wordlist Filtering
Section titled “Wordlist Filtering”Jeb includes configurable banned term lists to prevent inappropriate responses:
Configuration (config/default.json):
{ "safety": { "wordlist": { "enabled": true, "blockOnMatch": true, "lists": ["offensive", "spam", "political"] } }}Behavior:
- Checks generated responses before publishing
- Blocks replies containing banned terms
- Logs blocked content for audit
- Customizable per deployment
Injection Prevention
Section titled “Injection Prevention”Protection against prompt injection attacks:
- Input validation on all user content
- Structured prompts with clear boundaries
- Regex-based injection pattern detection
- AI content safety checks
Rate Limiting
Section titled “Rate Limiting”Per-user rate limiting prevents abuse:
- Configurable limits per time window
- Tracked in Redis for distributed rate limiting
- Graceful degradation when limits exceeded
Database Schema
Section titled “Database Schema”Core Tables
Section titled “Core Tables”mentions: Tracks incoming mentions
id- Unique mention identifierauthor_pubky- User who mentioned Jebcontent- Mention textpost_id- Pubky post IDprocessed_at- Processing timestampstatus-pending,processing,completed,failed,skipped_old
action_executions: Audit trail for actions
id- Execution identifiermention_id- Reference to mentionaction_type-summary,factcheck, etc.status-pending,success,failedduration_ms- Execution timetokens_used- AI token consumptionerror_details- Failure context (if failed)
artifacts: Stores generated content
id- Artifact identifieraction_execution_id- Reference to executionartifact_type-summary,evidence,sourcescontent- Generated content (JSONB)
replies: Published responses
id- Reply identifiermention_id- Original mentionpost_id- Published Pubky post IDcontent- Reply textpublished_at- Publication timestamp
routing_decisions: Classification audit
id- Decision identifiermention_id- Mention being classifiedmethod-heuristicorllmaction_type- Determined actionconfidence- Classification confidence (LLM only)reasoning- Why this action was chosen
Event Sourcing
Section titled “Event Sourcing”All state changes are event-driven:
- Events stored in Redis Streams
- Database reflects event-derived state
- Enables replay and debugging
- Supports horizontal scaling
Advanced Features
Section titled “Advanced Features”MCP Integration
Section titled “MCP Integration”Jeb uses Model Context Protocol for tool integration:
Brave Search Tool:
- Queries Brave Search API for factual verification
- Configurable result limits and timeouts
- Source credibility scoring
- Automatic retry with exponential backoff
Docker MCP Server:
- Separate container for MCP tools (
Dockerfile.brave-mcp) - Communicates with main bot via stdio protocol
- Isolated tool execution environment
Idempotency
Section titled “Idempotency”All operations are idempotent:
Idempotency Keys:
- Mention ingestion:
mention:{mentionId} - Routing decisions:
route:{mentionId} - Action executions:
action:{actionType}:{mentionId}
TTL: 24 hours (prevents duplicate processing)
Benefits:
- Safe to retry failed operations
- Prevents duplicate replies
- Handles network failures gracefully
Dead Letter Queue
Section titled “Dead Letter Queue”Failed messages moved to DLQ for investigation:
# Check DLQ lengthredis-cli xlen pubky:dlq
# Read failed messagesredis-cli xread STREAMS pubky:dlq 0
# Reprocess from DLQnpm run reprocess -- --from-dlqReprocessing
Section titled “Reprocessing”Manually reprocess mentions:
# Reprocess specific mentionnpm run reprocess -- --mention-id abc123
# Reprocess recent failuresnpm run reprocess:recent
# Reprocess with limitsnpm run reprocess:limited -- --limit 10Development
Section titled “Development”Local Development Setup
Section titled “Local Development Setup”# Install dependenciesnpm ci
# Start PostgreSQL and Redis (via Docker Compose)docker compose up -d postgres redis
# Run migrationsnpm run db:migrate
# Start development server with hot reloadnpm run devTesting
Section titled “Testing”# Run all testsnpm test
# Run tests with coveragenpm test:coverage
# Run tests in watch modenpm test:watch
# Run tests in Docker (includes DB and Redis)npm run test:dockerTest categories:
- Unit tests: Individual service logic
- Integration tests: Database operations
- E2E tests: Full workflow simulations
Code Quality
Section titled “Code Quality”# Lint TypeScriptnpm run lintnpm run lint:fix
# Format codenpm run format
# Type checknpm run typecheckUse Cases
Section titled “Use Cases”Community Moderation
Section titled “Community Moderation”Use Jeb to help moderators:
- Summarize long complaint threads
- Fact-check claims in disputes
- Provide context for moderation decisions
News Verification
Section titled “News Verification”Combat misinformation:
- Fact-check breaking news claims
- Provide source citations
- Assess claim credibility
Thread Digests
Section titled “Thread Digests”Help users catch up:
- Summarize active discussions
- Extract key points from long threads
- Highlight consensus or disagreements
Educational Support
Section titled “Educational Support”Assist learning:
- Verify historical or scientific claims
- Provide sources for further reading
- Summarize complex explanations
Research Assistance
Section titled “Research Assistance”Aid researchers:
- Quick literature verification
- Source finding
- Claim cross-referencing
Limitations & Considerations
Section titled “Limitations & Considerations”AI Model Limitations
Section titled “AI Model Limitations”- Hallucinations: AI may generate plausible but incorrect information
- Bias: Models reflect training data biases
- Context Window: Limited thread history can be processed
- Cost: AI API calls incur costs at scale
Network Dependencies
Section titled “Network Dependencies”- Homeserver Availability: Relies on Homeserver uptime
- Nexus API: Depends on Pubky Nexus for mention polling
- Web Search: Fact-checking requires Brave API access
- Redis/PostgreSQL: Infrastructure dependencies
Factchecking Constraints
Section titled “Factchecking Constraints”- Source Quality: Limited by search engine results
- Real-time Events: May not have recent information
- Subjective Topics: Works best for objective facts
- Language: Primarily English (model-dependent)
Privacy Considerations
Section titled “Privacy Considerations”- Public Interactions: All mentions and replies are public
- AI Provider: Content sent to third-party AI APIs
- Search Queries: Fact-check queries sent to Brave
- Audit Logs: All interactions logged for debugging
Operational Considerations
Section titled “Operational Considerations”- Polling Latency: 30s default delay before responding
- AI Rate Limits: Provider-specific throughput limits
- Token Budgets: Large threads may be truncated
- Safety Trade-offs: Wordlist may block legitimate content
Future Enhancements
Section titled “Future Enhancements”Potential improvements for Jeb:
- Multi-language Support: Automatic translation and localization
- Image Analysis: OCR and image fact-checking
- Sentiment Analysis: Detect and respond to emotional tone
- Thread Visualization: Generate visual summaries (graphs, timelines)
- Collaborative Filtering: User feedback on response quality
- Custom Actions: Plugin system for extensible capabilities
- Real-time Streaming: WebSocket-based instant responses
- Fine-tuned Models: Domain-specific model training
- Federated Learning: Privacy-preserving model improvement
- Voice Responses: Audio summary generation
Resources
Section titled “Resources”- Repository: https://github.com/pubky/pubky-ai-bot
- Docker Images: Docker Hub (if published)
- Configuration Examples: config/
- Migration Files: src/infrastructure/database/migrations/
AI Providers
Section titled “AI Providers”- Groq: https://console.groq.com
- OpenAI: https://platform.openai.com
- Anthropic: https://console.anthropic.com
- OpenRouter: https://openrouter.ai
Related Tools
Section titled “Related Tools”- BIP39 Mnemonic Generator: https://iancoleman.io/bip39/
- Brave Search API: https://brave.com/search/api/
- Model Context Protocol: https://modelcontextprotocol.io
See Also
Section titled “See Also”- Pubky App - Social network where Jeb operates
- Pubky Nexus - Backend API for mention polling
- Pubky SDK - SDK used for posting replies
- Homeserver - Where Jeb’s posts are stored
- Pubky Ring - Identity management (for bot account)