npm - @adminide-stack/yantra-help-browser - Versions diffs - 12.0.16-alpha.27 → 12.0.16-alpha.29 - Mend

@adminide-stack/yantra-help-browser 12.0.16-alpha.27 → 12.0.16-alpha.29

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (163)

package/lib/templates/content/technical-questions/data-processing.md CHANGED

@@ -1,83 +1,102 @@
-# Data Processing
+# Data Processing Pipeline
-How Yantra processes and analyzes data.
+How Yantra ingests, transforms, analyzes, and stores your data — from raw input to search-ready content.
-## Data Pipeline
+---
-### Data Ingestion
+## Data Ingestion
-- **Web Crawling**: Automated web content crawling
-- **API Integration**: Third-party API data ingestion
-- **User Input**: User-generated content processing
-- **File Uploads**: Document and file processing
+Yantra supports multiple ingestion pathways to accommodate diverse data sources:
-### Data Processing
+| Source       | Method                  | Details                                                                 |
+| ------------ | ----------------------- | ----------------------------------------------------------------------- |
+| Web content  | Automated crawling      | Configurable crawl schedules, depth limits, and domain allowlists       |
+| APIs         | REST/GraphQL connectors | Pre-built connectors for 50+ services (Slack, Notion, Confluence, etc.) |
+| File uploads | Drag-and-drop or API    | Supports PDF, DOCX, PPTX, CSV, Markdown, HTML, and plain text           |
+| Databases    | Direct connectors       | PostgreSQL, MySQL, MongoDB read-only connectors with incremental sync   |
-- **Content Extraction**: Text and content extraction
-- **Data Cleaning**: Data cleaning and normalization
-- **Data Enrichment**: Data enrichment and enhancement
-- **Data Validation**: Data quality validation
+### Ingestion guarantees
+- **Exactly-once processing** — Deduplication ensures no content is indexed twice.
+- **Incremental updates** — Only new or modified content is re-processed, minimizing compute costs.
+- **Schema validation** — Incoming data is validated against expected schemas before entering the pipeline.
+---
 ## Processing Architecture
-### Stream Processing
+### Stream processing
+Yantra uses an event-driven architecture built on **Apache Kafka** for real-time data flow:
+1. **Producers** publish raw content events to topic partitions.
+2. **Stream processors** (Kafka Streams) consume events, apply transformations, and emit enriched records.
+3. **Consumers** write processed data to the appropriate storage layer (PostgreSQL, Elasticsearch, Redis).
+### Batch processing
-- **Real-time Processing**: Real-time data processing
-- **Event Streaming**: Event-driven processing
-- **Message Queues**: Asynchronous processing
-- **Batch Processing**: Batch data processing
+For large-volume imports or periodic re-indexing, Yantra runs batch jobs using distributed task queues:
-### Data Transformation
+- Jobs are broken into chunks of 1,000 records.
+- Each chunk is processed in parallel across worker nodes.
+- Progress is tracked in real-time and visible in the Admin Dashboard.
-- **ETL Processes**: Extract, transform, load
-- **Data Mapping**: Data field mapping
-- **Format Conversion**: Data format conversion
-- **Schema Evolution**: Schema management
+---
 ## Content Analysis
-### Text Processing
+Every piece of content passes through a multi-stage analysis pipeline:
-- **Natural Language Processing**: NLP processing
-- **Sentiment Analysis**: Sentiment detection
-- **Topic Modeling**: Topic extraction
-- **Entity Recognition**: Named entity recognition
+### Text processing
-### Content Classification
+- **Language detection** — Automatically identifies 50+ languages.
+- **Tokenization** — Text is split into meaningful tokens using language-specific tokenizers.
+- **Normalization** — Unicode normalization, lowercasing, and stop-word removal.
-- **Content Categorization**: Automatic categorization
-- **Quality Assessment**: Content quality scoring
-- **Relevance Scoring**: Relevance assessment
-- **Duplicate Detection**: Duplicate content detection
+### Semantic analysis
-## Data Storage
+- **Embedding generation** — Each document is converted to a high-dimensional vector using state-of-the-art embedding models.
+- **Topic modeling** — Latent topics are extracted using LDA and transformer-based approaches.
+- **Entity recognition** — People, organizations, dates, locations, and custom entities are identified and tagged.
+- **Sentiment analysis** — Content sentiment (positive, negative, neutral) is scored for analytics.
+### Quality scoring
+| Signal           | Weight | Description                                      |
+| ---------------- | ------ | ------------------------------------------------ |
+| Completeness     | 25%    | Does the content cover its topic thoroughly?     |
+| Freshness        | 20%    | How recently was the content created or updated? |
+| Source authority | 30%    | How trustworthy is the source domain?            |
+| Readability      | 15%    | Flesch-Kincaid score and structural quality      |
+| Uniqueness       | 10%    | Duplicate and near-duplicate detection           |
-### Storage Systems
+---
+## Data Storage
-- **Primary Storage**: PostgreSQL database
-- **Search Storage**: Elasticsearch index
-- **Cache Storage**: Redis cache
-- **File Storage**: Object storage system
+### Multi-tier storage architecture
-### Data Management
+| Tier | Technology                   | Purpose                                    | Retention            |
+| ---- | ---------------------------- | ------------------------------------------ | -------------------- |
+| Hot  | PostgreSQL + Redis           | Active queries, user data, real-time cache | Indefinite           |
+| Warm | Elasticsearch                | Full-text search index, vector search      | Indefinite           |
+| Cold | S3-compatible object storage | Archived content, raw uploads, backups     | Per retention policy |
-- **Data Lifecycle**: Data lifecycle management
-- **Data Retention**: Data retention policies
-- **Data Archival**: Data archival processes
-- **Data Deletion**: Secure data deletion
+### Data lifecycle
-## Performance Optimization
+1. **Ingest** — Raw data lands in the processing queue.
+2. **Process** — Content is analyzed, enriched, and indexed.
+3. **Serve** — Processed data is available for search and retrieval.
+4. **Archive** — Older content is compressed and moved to cold storage based on access patterns.
+5. **Delete** — Data past its retention period is securely purged (overwritten + cryptographic erasure).
-### Processing Performance
+---
-- **Parallel Processing**: Multi-threaded processing
-- **Distributed Processing**: Distributed computing
-- **Caching**: Intelligent caching strategies
-- **Optimization**: Performance optimization
+## Performance & Scalability
-### Scalability
+- **Throughput** — The pipeline processes 10,000+ documents per minute at steady state.
+- **Latency** — End-to-end ingestion-to-searchable time is under 30 seconds for real-time sources.
+- **Horizontal scaling** — Worker nodes auto-scale based on queue depth. During peak loads, the system scales from 4 to 32 workers automatically.
+- **Backpressure handling** — If downstream systems slow down, the pipeline applies backpressure to producers rather than dropping data.
-- **Horizontal Scaling**: Scale-out architecture
-- **Load Distribution**: Load balancing
-- **Resource Management**: Resource optimization
-- **Auto-scaling**: Automatic scaling
+> **Enterprise customers** can configure custom processing rules, retention policies, and data routing via the Admin Dashboard.
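
Note on the "Quality scoring" table added in data-processing.md: the five weights (25%, 20%, 30%, 15%, 10%) sum to 100%, which suggests a weighted sum of per-signal scores. The diff does not say how each signal is measured or how the signals are combined, so the following is only a minimal sketch under that assumption. The function name `quality_score`, the signal keys, and the [0, 1] scale for each signal are hypothetical.

```python
# Weights copied from the "Quality scoring" table in the diff.
# The signal key names and the 0..1 scale are assumptions for illustration.
QUALITY_WEIGHTS = {
    "completeness": 0.25,
    "freshness": 0.20,
    "source_authority": 0.30,
    "readability": 0.15,
    "uniqueness": 0.10,
}


def quality_score(signals: dict[str, float]) -> float:
    """Combine per-signal scores in [0, 1] into one weighted score in [0, 1]."""
    missing = QUALITY_WEIGHTS.keys() - signals.keys()
    if missing:
        raise ValueError(f"missing signals: {sorted(missing)}")
    for name in QUALITY_WEIGHTS:
        if not 0.0 <= signals[name] <= 1.0:
            raise ValueError(f"signal {name!r} must be in [0, 1]")
    # Plain weighted sum; extra keys in `signals` are ignored.
    return sum(weight * signals[name] for name, weight in QUALITY_WEIGHTS.items())


example = {
    "completeness": 0.8,
    "freshness": 0.5,
    "source_authority": 1.0,
    "readability": 0.6,
    "uniqueness": 1.0,
}
print(round(quality_score(example), 2))
```

Because source authority carries the largest weight, a document from a low-authority domain loses up to 0.30 of the total under this scheme, regardless of how fresh or complete it is.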

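Note on the "Batch processing" section added in data-processing.md: the new text says jobs are split into chunks of 1,000 records and that chunks are processed in parallel across worker nodes. The diff does not name the task queue, so this sketch stands in a local thread pool for the worker nodes. `chunked`, `process_chunk`, and `run_batch` are hypothetical names, and `process_chunk` is a placeholder for real work.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")

CHUNK_SIZE = 1_000  # chunk size stated in the diff


def chunked(records: Iterable[T], size: int = CHUNK_SIZE) -> Iterator[list[T]]:
    """Yield consecutive lists of at most `size` records."""
    if size <= 0:
        raise ValueError("size must be positive")
    iterator = iter(records)
    while chunk := list(islice(iterator, size)):
        yield chunk


def process_chunk(chunk: list[T]) -> int:
    """Placeholder for real per-chunk work; returns the number of records handled."""
    return len(chunk)


def run_batch(records: Iterable[T], workers: int = 4) -> int:
    """Process all chunks on a local thread pool and return the total record count."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(process_chunk, chunked(records)))


print([len(c) for c in chunked(range(2_500))])
print(run_batch(range(2_500)))
```

The running total returned by `run_batch` is the kind of figure a progress display could be built on, although the diff does not describe how the Admin Dashboard tracks progress.
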
package/lib/templates/content/technical-questions/database-architecture.md CHANGED

@@ -1,83 +1,120 @@
 # Database Architecture
-Technical details about data storage and retrieval.
+How Yantra stores, indexes, and retrieves data across multiple database systems optimized for different access patterns.
-## Database Design
+---
-### Primary Database
+## Database Design Philosophy
-- **PostgreSQL**: Primary relational database
-- **Schema Design**: Optimized database schema
-- **Indexing Strategy**: Comprehensive indexing
-- **Partitioning**: Database partitioning
+Yantra follows a **polyglot persistence** strategy — each data type is stored in the database engine best suited for its access pattern:
-### Data Models
+| Data type                      | Engine                | Why                                                     |
+| ------------------------------ | --------------------- | ------------------------------------------------------- |
+| User accounts, billing, config | PostgreSQL 16         | ACID transactions, relational integrity, mature tooling |
+| Full-text + vector search      | Elasticsearch 8.x     | Inverted indexes, BM25 ranking, kNN vector search       |
+| Sessions, cache, rate limits   | Redis 7               | Sub-millisecond reads, TTL support, pub/sub             |
+| File uploads, backups          | S3-compatible storage | Virtually unlimited capacity, 11 nines durability       |
-- **User Data**: User account and profile data
-- **Content Data**: Search content and metadata
-- **Usage Data**: User interaction and analytics data
-- **System Data**: System configuration and logs
+---
-## Storage Systems
+## PostgreSQL — Primary Database
-### Multi-tier Storage
+### Schema design principles
-- **Hot Storage**: Frequently accessed data
-- **Warm Storage**: Moderately accessed data
-- **Cold Storage**: Archive and backup data
-- **Cache Layer**: High-speed cache storage
+- **Normalized core tables** — Users, organizations, subscriptions, and permissions follow 3NF to avoid data anomalies.
+- **JSONB for flexibility** — Metadata, user preferences, and integration configs use JSONB columns, combining schema flexibility with indexing support.
+- **Timestamped everything** — Every table includes `created_at` and `updated_at` columns with timezone-aware timestamps.
+- **Soft deletes** — Records are marked as deleted rather than physically removed, enabling audit trails and data recovery.
-### Data Distribution
+### Indexing strategy
-- **Sharding**: Horizontal data partitioning
-- **Replication**: Data replication for availability
-- **Backup**: Automated backup systems
-- **Recovery**: Disaster recovery procedures
+| Index type | Use case                   | Example                                            |
+| ---------- | -------------------------- | -------------------------------------------------- |
+| B-tree     | Equality and range queries | `WHERE created_at > '2026-01-01'`                  |
+| GIN        | JSONB containment queries  | `WHERE metadata @> '{"type": "pdf"}'`              |
+| Partial    | Hot data subsets           | `WHERE status = 'active'` (index only active rows) |
+| Covering   | Avoid table lookups        | Include all `SELECT` columns in the index          |
-## Query Optimization
+### High availability
-### Performance Tuning
+- **Streaming replication** — One synchronous standby + two async replicas.
+- **Automatic failover** — Patroni manages leader election; failover completes in < 10 seconds.
+- **Point-in-time recovery** — WAL archiving enables recovery to any second within the retention window (30 days).
-- **Query Analysis**: Query performance analysis
-- **Index Optimization**: Database index optimization
-- **Execution Plans**: Query execution optimization
-- **Connection Pooling**: Database connection pooling
+---
-### Caching Strategy
+## Elasticsearch — Search Index
-- **Query Caching**: Database query caching
-- **Result Caching**: Application-level caching
-- **CDN Caching**: Content delivery network caching
-- **Distributed Caching**: Distributed cache systems
+### Index architecture
+Each content type has its own Elasticsearch index with optimized mappings:
+- **Text fields** use `text` type with custom analyzers (language-specific stemming, synonym expansion).
+- **Vector fields** use `dense_vector` type for kNN semantic search.
+- **Keyword fields** for exact-match filtering (tags, content type, source).
+- **Date fields** for time-range queries and recency boosting.
+### Cluster topology
+- **3 dedicated master nodes** for cluster coordination.
+- **6+ data nodes** with SSD storage for search performance.
+- **2 coordinating nodes** for query routing and result aggregation.
+---
+## Redis — Cache & Real-Time
+### Cache patterns
+| Pattern             | Use case                     | TTL                     |
+| ------------------- | ---------------------------- | ----------------------- |
+| Query result cache  | Identical search queries     | 5 minutes               |
+| Session store       | User authentication sessions | 1 hour                  |
+| Rate limit counters | API rate limiting            | Rolling 1-minute window |
+| Pub/Sub channels    | Real-time notifications      | N/A (ephemeral)         |
+### Memory management
+- **Maxmemory policy** set to `allkeys-lru` — least recently used keys are evicted when memory limits are reached.
+- **Key namespacing** — All keys are prefixed by service name to avoid collisions.
+- **Cluster mode** — Redis Cluster with 6 nodes (3 primary + 3 replica) for horizontal scaling.
+---
 ## Data Management
-### Data Lifecycle
+### Backup strategy
+| What          | Frequency                   | Retention  | Method                          |
+| ------------- | --------------------------- | ---------- | ------------------------------- |
+| PostgreSQL    | Continuous WAL + daily full | 30 days    | pg_basebackup + WAL archiving   |
+| Elasticsearch | Daily snapshots             | 14 days    | Snapshot to S3                  |
+| Redis         | RDB snapshots + AOF         | 7 days     | Automated via Redis persistence |
+| File storage  | Cross-region replication    | Indefinite | S3 cross-region replication     |
-- **Data Ingestion**: Data collection and ingestion
-- **Data Processing**: Data transformation and processing
-- **Data Storage**: Data storage and organization
-- **Data Archival**: Data archival and cleanup
+### Data governance
-### Data Quality
+- **Encryption at rest** — AES-256 for all database storage volumes.
+- **Encryption in transit** — TLS 1.3 for all database connections.
+- **Access control** — Database credentials are rotated monthly via HashiCorp Vault.
+- **Audit logging** — All schema changes and administrative queries are logged.
-- **Data Validation**: Data quality validation
-- **Data Cleaning**: Data cleaning and normalization
-- **Data Monitoring**: Data quality monitoring
-- **Data Governance**: Data governance policies
+---
 ## Scalability
-### Horizontal Scaling
+### Current capacity
-- **Database Sharding**: Horizontal database scaling
-- **Read Replicas**: Read-only database replicas
-- **Load Distribution**: Database load distribution
-- **Auto-scaling**: Automatic database scaling
+| Metric                            | Value  |
+| --------------------------------- | ------ |
+| Total indexed documents           | 500M+  |
+| Database size (PostgreSQL)        | 2.4 TB |
+| Search index size (Elasticsearch) | 8.7 TB |
+| Peak queries per second           | 12,000 |
+| Average query latency             | 45 ms  |
-### Performance Monitoring
+### Scaling strategy
-- **Database Metrics**: Database performance metrics
-- **Query Monitoring**: Query performance monitoring
-- **Resource Monitoring**: Database resource monitoring
-- **Alerting**: Database performance alerting
+- **Vertical** — Increase instance sizes for immediate capacity (database-level).
+- **Horizontal** — Add read replicas (PostgreSQL), data nodes (Elasticsearch), or shard nodes (Redis) for linear scaling.
+- **Partitioning** — Time-based partitioning for PostgreSQL tables with high write volume.
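
Note on the "Cache patterns" table added in database-architecture.md: rate limit counters are listed with a "Rolling 1-minute window". The diff does not show how those counters are kept in Redis, so the sketch below illustrates only the rolling-window idea, in process and without any Redis calls. The class name `RollingWindowLimiter` and the per-client limit are hypothetical.

```python
from collections import deque


class RollingWindowLimiter:
    """Allow at most `limit` requests within any rolling `window_seconds` span."""

    def __init__(self, limit: int, window_seconds: float = 60.0) -> None:
        self.limit = limit
        self.window = window_seconds
        self._hits: deque[float] = deque()  # timestamps of accepted requests

    def allow(self, now: float) -> bool:
        # Discard accepted requests that have aged out of the window.
        while self._hits and now - self._hits[0] >= self.window:
            self._hits.popleft()
        if len(self._hits) >= self.limit:
            return False  # rejected requests are not recorded
        self._hits.append(now)
        return True


limiter = RollingWindowLimiter(limit=2)
# Timestamps are passed in explicitly so the behaviour is deterministic.
print([limiter.allow(t) for t in (0.0, 10.0, 20.0, 60.0, 61.0)])
```

Unlike a fixed per-minute counter that resets on the minute boundary, a rolling window never admits more than `limit` requests in any 60-second span, which is presumably why the table specifies it.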

package/lib/templates/content/technical-questions/infrastructure.md CHANGED

@@ -1,83 +1,132 @@
-# Infrastructure
+# Infrastructure & Reliability
-Yantra's technical infrastructure and scalability.
+Yantra's cloud infrastructure is designed for high availability, global performance, and security at every layer.
-## Cloud Infrastructure
+---
-### Cloud Architecture
+## Cloud Architecture
-- **Multi-cloud**: Multi-cloud deployment strategy
-- **Microservices**: Microservices architecture
-- **Containerization**: Docker containerization
-- **Orchestration**: Kubernetes orchestration
+### Multi-region deployment
-### Infrastructure Components
+Yantra runs across **3 AWS regions** (US East, EU West, AP Southeast) with active-active configuration. User traffic is routed to the nearest region via latency-based DNS routing.
-- **Compute**: Scalable compute resources
-- **Storage**: Distributed storage systems
-- **Networking**: High-performance networking
-- **Security**: Infrastructure security
+### Container orchestration
-## Scalability Design
+All services run as Docker containers orchestrated by **Kubernetes (EKS)**:
-### Horizontal Scaling
+- **Namespaces** isolate production, staging, and development environments.
+- **Resource quotas** prevent any single service from consuming excessive cluster resources.
+- **Rolling deployments** ensure zero-downtime updates with automatic rollback on health check failures.
+- **Horizontal Pod Autoscaler** adjusts replica counts based on CPU, memory, and custom metrics.
-- **Auto-scaling**: Automatic scaling based on demand
-- **Load Balancing**: Advanced load balancing
-- **Distributed Systems**: Distributed architecture
-- **Resource Pooling**: Resource pooling strategies
+### Service mesh
-### Performance Optimization
+An Istio-based service mesh provides:
-- **Caching**: Multi-layer caching
-- **CDN**: Content delivery network
-- **Database Optimization**: Database performance tuning
-- **Network Optimization**: Network performance optimization
+- **Mutual TLS** between all services (zero-trust networking).
+- **Traffic management** — Canary deployments, circuit breaking, retry policies.
+- **Observability** — Distributed tracing with Jaeger, metrics with Prometheus.
+---
 ## High Availability
-### Availability Design
+### Availability targets
+| Component      | Target SLA | Actual (trailing 12 months) |
+| -------------- | ---------- | --------------------------- |
+| API Gateway    | 99.99%     | 99.995%                     |
+| Search Service | 99.95%     | 99.98%                      |
+| AI Service     | 99.9%      | 99.94%                      |
+| Data Pipeline  | 99.9%      | 99.92%                      |
+### Redundancy design
+- **No single point of failure** — Every component has at least 2 replicas across different availability zones.
+- **Database failover** — Automated failover with < 10-second recovery for PostgreSQL and Redis.
+- **Cross-region replication** — Critical data is replicated across regions for disaster recovery.
+- **Graceful degradation** — If the AI service is unavailable, search still returns results without AI-generated summaries.
+### Disaster recovery
+| Metric                         | Target    | Actual     |
+| ------------------------------ | --------- | ---------- |
+| Recovery Time Objective (RTO)  | < 4 hours | 2.1 hours  |
+| Recovery Point Objective (RPO) | < 1 hour  | 15 minutes |
+| DR test frequency              | Quarterly | Monthly    |
+---
+## Monitoring & Observability
+### The three pillars
-- **Redundancy**: System redundancy
-- **Failover**: Automatic failover
-- **Disaster Recovery**: Disaster recovery procedures
-- **Backup Systems**: Comprehensive backup systems
+| Pillar  | Tools                            | Details                                          |
+| ------- | -------------------------------- | ------------------------------------------------ |
+| Metrics | Prometheus + Grafana             | 2,000+ custom metrics, 15-second scrape interval |
+| Logs    | Fluentd + Elasticsearch + Kibana | Structured JSON logs, 30-day retention           |
+| Traces  | Jaeger + OpenTelemetry           | End-to-end request tracing across all services   |
-### Monitoring
+### Alerting
-- **Health Monitoring**: System health monitoring
-- **Performance Monitoring**: Performance metrics
-- **Alerting**: Automated alerting systems
-- **Logging**: Comprehensive logging
+- **PagerDuty integration** for critical alerts (P1/P2) with automatic escalation.
+- **Slack notifications** for warnings and informational alerts.
+- **Anomaly detection** — ML-based alerting detects unusual patterns before they become incidents.
+- **Runbooks** — Every alert links to a runbook with diagnosis steps and remediation procedures.
+---
 ## Security Infrastructure
-### Security Measures
+### Network security
+- **VPC isolation** — All services run in private subnets with no direct internet access.
+- **WAF (Web Application Firewall)** — Protects against OWASP Top 10 threats.
+- **DDoS protection** — AWS Shield Advanced with automatic traffic scrubbing.
+- **Egress filtering** — Outbound traffic is restricted to known-good destinations.
+### Secrets management
+- **HashiCorp Vault** for all secrets, API keys, and database credentials.
+- **Automatic rotation** — Secrets are rotated on configurable schedules (default: 30 days).
+- **Just-in-time access** — Engineers request temporary elevated access via an approval workflow.
+---
+## CI/CD Pipeline
+### Deployment flow
+1. **Code push** — Developer pushes to a feature branch on GitHub.
+2. **CI checks** — Automated linting, type checking, unit tests, and integration tests run in GitHub Actions.
+3. **Build** — Docker images are built, scanned for vulnerabilities, and pushed to ECR.
+4. **Staging deploy** — ArgoCD deploys to the staging environment automatically.
+5. **QA validation** — Automated end-to-end tests + manual spot checks.
+6. **Production deploy** — Canary deployment to 5% of traffic, then gradual rollout to 100%.
+7. **Post-deploy monitoring** — Automated health checks verify error rates and latency for 30 minutes.
-- **Network Security**: Network-level security
-- **Application Security**: Application security
-- **Data Security**: Data protection measures
-- **Access Control**: Access control systems
+### Deployment frequency
-### Compliance
+- **Production deploys** — 8-12 per day across all services.
+- **Rollback time** — Under 60 seconds to previous version.
+- **Feature flags** — LaunchDarkly for gradual feature rollouts and instant kill switches.
-- **Security Standards**: Industry security standards
-- **Compliance Monitoring**: Compliance monitoring
-- **Audit Logging**: Comprehensive audit logs
-- **Security Testing**: Regular security testing
+---
-## Global Infrastructure
+## Global Performance
-### Geographic Distribution
+### Content delivery
-- **Multi-region**: Multi-region deployment
-- **Edge Computing**: Edge computing capabilities
-- **Latency Optimization**: Low-latency optimization
-- **Data Residency**: Data residency compliance
+- **200+ edge locations** via CloudFront CDN.
+- **Edge caching** for static assets with 85%+ cache hit rate.
+- **Dynamic content acceleration** — Optimized TCP connections and HTTP/3 support.
-### Network Architecture
+### Latency optimization
-- **Global Network**: Global network infrastructure
-- **Peering**: Network peering agreements
-- **Traffic Management**: Intelligent traffic management
-- **Bandwidth**: High-bandwidth connectivity
+| Region         | Average API latency |
+| -------------- | ------------------- |
+| US East        | 35 ms               |
+| US West        | 52 ms               |
+| EU West        | 41 ms               |
+| AP Southeast   | 68 ms               |
+| Global average | 48 ms               |
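
Note on the "Availability targets" table added in infrastructure.md: an SLA percentage is easier to compare once converted into a downtime budget. The sketch below does that arithmetic for the four target values in the diff, assuming a 365-day year. The function name `allowed_downtime_minutes` is hypothetical, and the diff does not state the measurement period Yantra actually uses.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, assuming a 365-day year

# Target SLA values copied from the "Availability targets" table in the diff.
SLA_TARGETS = {
    "API Gateway": 99.99,
    "Search Service": 99.95,
    "AI Service": 99.9,
    "Data Pipeline": 99.9,
}


def allowed_downtime_minutes(
    sla_percent: float, period_minutes: float = MINUTES_PER_YEAR
) -> float:
    """Return the downtime budget, in minutes, implied by an availability target."""
    if not 0.0 <= sla_percent <= 100.0:
        raise ValueError("sla_percent must be between 0 and 100")
    return period_minutes * (100.0 - sla_percent) / 100.0


for component, target in SLA_TARGETS.items():
    budget = allowed_downtime_minutes(target)
    print(f"{component}: {target}% allows about {budget:.1f} minutes of downtime per year")
```

Under these assumptions a 99.99% target leaves roughly 52.6 minutes of downtime per year, while 99.9% leaves roughly 525.6 minutes (about 8.8 hours).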