IKS Cognitive Research Platform
IKS Cognitive Research Platform - Technical Design
Version: 1.0
Date: July 16, 2025
Author(s): Shankar Santhamoorthy
Status: Approved for Implementation
Table of Contents
1. Introduction
1.1. Project Overview & Goals
1.2. Scope
1.3. Key Terminology
1.4. Reference Documents
2. System Architecture
2.1. High-Level Reference Architecture Blueprint
2.2. Architectural Tiers & Components
3. Detailed Design & Data Flows
3.1. Authentication Flow (OIDC)
3.2. RAG Query & Concordance Flow
3.3. Data Contracts
4. Deployment & Infrastructure (GKE)
5. DevSecOps Pipeline
8. Integration Touchpoint Details
9. Scalability, Resilience, and Security
10. Future Considerations (v2.0)
11. Infrastructure Architecture
1. Introduction
1.1. Project Overview & Goals
This document outlines the technical design for "Concorde," a cloud-native, microservice-based application designed to provide a conversational AI agent. The agent can answer questions by retrieving, comparing, and synthesizing information from multiple, independent knowledge sources (a local file system and a Google Drive folder). The primary goal is to create a scalable, secure, and extensible platform for agentic information retrieval and analysis.
1.2. Scope
- In Scope:
- A web-based chat interface for user interaction.
- Ingestion of documents from a designated local folder and a Google Drive folder.
- A core "Concordance Engine" that uses LangChain to orchestrate multiple retrieval agents.
- Logic to compare and reason about the retrieved information from different sources.
- A secure authentication layer using OIDC/OAuth 2.0.
- Deployment of the entire system as containerized microservices on Google Kubernetes Engine (GKE).
- A full DevSecOps pipeline for CI/CD, security scanning, and monitoring.
- Out of Scope:
- Support for data sources other than the local file system and Google Drive in v1.0.
- Advanced, role-based access control (RBAC) within the application itself.
- Real-time, collaborative document editing.
1.3. Key Terminology
- RAG: Retrieval-Augmented Generation.
- MCP: Model Context Protocol - The structured JSON object used for communication between the API tier and the backend.
- A2A: Agent-to-Agent - Communication between the Orchestrator and specialized Retriever agents.
- GKE: Google Kubernetes Engine.
- IdP: Identity Provider (e.g., Google, Facebook).
1.4. Reference Documents
# | Reference Document | URL |
1 | IKS Cognitive Research Platform - Business Requirements | |
2. System Architecture
The system is designed as a set of decoupled microservices, orchestrated by Kubernetes, and integrated with managed Google Cloud services.
2.1. High-Level Reference Architecture Blueprint
(High-level reference architecture blueprint diagram; see the linked high-resolution view.)
2.2. Architectural Tiers & Components
Tier Name | Brief Description | Core Technology | Deployment |
Frontend Web-Client | The user's browser running the Angular SPA. | Angular, TypeScript | User's Local Browser |
Security Gateway | The secure entry point to the cluster, handling authentication. | GKE Ingress, API Gateway, OIDC | GKE Ingress Resource |
Frontend Web-Service | Serves the static files for the Angular application. | Nginx, Docker | GKE Deployment & Service |
API Tier | Manages business logic, state, and assembles the MCP. | Node.js, Express, MongoDB, Docker | GKE Deployment & Service |
Concordance Engine | The central orchestrator that manages the RAG workflow. | Python, Flask, LangChain, Docker | GKE Deployment & Service |
RAG Retriever Agents | Specialized microservices for data retrieval from each source. | Python, Flask, Docker | GKE Deployments & Services |
Data Sources | Persistent knowledge stores. | Google Drive API, GCP Persistent Disk | GCP Managed Service / GKE PV |
Core AI Services | Managed Google AI services for embeddings, search, and generation. | Vertex AI, Gemini Models | GCP Managed Services |
Technology Stacks
This table breaks down each architectural tier, detailing its core functionality and the technology choices behind it.
# | Tier / Module Name | Brief Description | Core Functionality | Technology Stack |
1 | Frontend Web-Client | The user's browser running the dynamic, single-page chat application. Interacts with the backend via the secure gateway. | - Renders the chat UI. - Manages real-time UI state. - Handles OIDC redirects. - Attaches session tokens to API requests. | Angular Framework |
2 | Security Gateway | The single, hardened entry point to the GKE cluster, enforcing authentication before allowing traffic to internal services. | - Intercepts all incoming traffic. - Manages the OIDC/OAuth 2.0 login flow. - Validates session tokens (JWTs). - Routes authenticated traffic. | GKE Ingress Controller with Identity-Aware Proxy (IAP) or a dedicated API Gateway (e.g., Kong) |
3 | Frontend Web-Service | A lightweight, containerized web server that serves the static files of the Angular application. | - Serves the initial index.html. - Serves compiled JS, CSS, and assets. | Nginx |
4 | API Tier | The central backend microservice that manages business logic, state, and event publishing. | - Provides REST & WebSocket endpoints. - Manages chat history in MongoDB. - Checks cache before processing. - Assembles & publishes MCP events to Pub/Sub. | - Web Server: Node.js / Express.js - Database: MongoDB - Cache: Redis - Message Queue Client: Google Cloud Pub/Sub SDK |
5 | Concordance Engine (Orchestrator) | The 'brain' of the RAG system, orchestrating multiple retrieval agents and LLMs via an event-driven flow. | - Subscribes to 'QUERY_RECEIVED' events. - Publishes 'RETRIEVAL_JOB_DISPATCHED' events. - Subscribes to 'CONTEXT_RETRIEVED' events. - Performs concordance via LangChain. - Calls multiple LLMs via an aggregator. - Publishes 'FINAL_ANSWER_READY' events. | - Web Framework/Runtime: Python / Flask - Orchestration: LangChain - Message Queue Client: Google Cloud Pub/Sub SDK - Cache: Redis |
6 | RAG Retriever Agents | Specialized microservices that retrieve context from one specific data source and manage their own long-term memory. | - Subscribes to 'RETRIEVAL_JOB_DISPATCHED' events. - Fetches context from its data source (Local Volume or Drive API). - Interacts with Vertex AI for embedding/search. - Updates its own Agentic Memory. - Publishes 'CONTEXT_RETRIEVED' events. | - Web Framework/Runtime: Python / Flask - Database: MongoDB (for Agentic Memory) - Data Sources: Google Drive API, File System I/O - AI Services: Vertex AI SDK |
7 | Core LLM & AI Tier | The suite of fully managed Google and third-party AI services providing foundational AI capabilities. | - Embedding: Converts text to vectors. - Vector Search: Stores and searches vectors. - Generation: Creates text answers. | - Vertex AI Vector Search - Vertex AI Model Garden (Gemini) - Third-Party LLM APIs (OpenAI, Anthropic) |
8 | MCP & A2A Protocols | The logical data contracts that define how the microservices communicate, both for user context and inter-agent tasks. | - MCP: Carries user query and chat history. - A2A: Carries retrieval job instructions and results. | JSON (Data Format) |
Additional Information
Additional information on each of these tiers is available via the linked URL.
3. Detailed Design & Data Flows
3.1. Authentication Flow (OIDC)
The OIDC Authorization Code Flow is managed by the Security Gateway. It redirects unauthenticated users to an external IdP (e.g., Google). Upon successful login, it exchanges an authorization code for a JWT, validates it, and establishes a secure session for the user before forwarding requests to the application.
- An unauthenticated user request hits the Security Gateway.
- The Gateway redirects the user's browser to the configured Identity Provider (IdP).
- The user authenticates with the IdP.
- The IdP redirects the user back to the Gateway with a one-time authorization code.
- The Gateway performs a back-channel exchange of the code for a JWT ID Token.
- The Gateway validates the JWT, creates a session, and forwards the request to the internal services, injecting the user's identity into an HTTP header (e.g., X-Authenticated-User-Email).
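To make the token-validation step concrete, here is a minimal, illustrative Python sketch of what happens to the JWT in step 6, assuming Google as the IdP and the PyJWT library. In this design the Security Gateway (IAP or Kong) performs this itself, so this is a sketch of the mechanism rather than platform code; the audience value is a placeholder.

```python
# Minimal sketch of the gateway's ID-token validation (illustrative only;
# IAP or Kong performs this in the actual design). Assumes Google as the
# IdP and the PyJWT library. EXPECTED_AUDIENCE is a placeholder.
import jwt  # pip install "pyjwt[crypto]"
from jwt import PyJWKClient

GOOGLE_JWKS_URL = "https://www.googleapis.com/oauth2/v3/certs"
EXPECTED_AUDIENCE = "your-oauth-client-id"  # placeholder

jwks_client = PyJWKClient(GOOGLE_JWKS_URL)

def validate_id_token(id_token: str) -> dict:
    """Verify the signature, issuer, audience, and expiry of an OIDC ID token."""
    signing_key = jwks_client.get_signing_key_from_jwt(id_token)
    claims = jwt.decode(
        id_token,
        signing_key.key,
        algorithms=["RS256"],
        audience=EXPECTED_AUDIENCE,
        issuer="https://accounts.google.com",
    )
    # The gateway would now create a session and forward the request with,
    # e.g., X-Authenticated-User-Email set to claims["email"].
    return claims
```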
3.2. RAG Query & Concordance Flow
- MCP Assembly (API Tier): Upon receiving a query, the ApiPod fetches the user's chat history from MongoDB (to support UC-03) and assembles the formal MCP object, which now includes a sources array (e.g., ['local_files', 'gemini_ai']).
- Orchestration (Concordance Engine): The ConcordancePod receives the MCP. Its core routing logic inspects the sources array.
a. For RAG sources (local_files, google_drive): It dispatches an A2A retrieval job to the corresponding specialized retriever microservice.
b. For Direct LLM sources (gemini_ai, chat_gpt): It passes the query directly to its internal LLM Aggregator component.
- Parallel Execution: All dispatched jobs run in parallel.
- Aggregation & Concordance: The Orchestrator waits for all jobs to return their results (RAG contexts and/or direct LLM answers). It then uses a powerful reasoning LLM (e.g., Gemini 1.5 Pro) to perform the final concordance analysis and synthesize the final answer as required by UC-02.
- Response & History: The final answer is passed back to the ApiPod, which saves the exchange to MongoDB (fulfilling UC-04) and returns the result to the user.
The same flow, broken down step by step at the component level:
- Request Initiation (MCP Client):
- The Angular Client sends a simple query { "query": "..." } to the API Tier.
- The Node.js API Pod receives the query, fetches the conversation history from MongoDB, and assembles the formal MCP Object.
- Orchestration (MCP Server & A2A):
- The API Pod publishes the MCP Object to the 'QUERY_RECEIVED' Pub/Sub topic, from which the Concordance Engine Pod consumes it.
- The Concordance Engine parses the MCP, extracts the query, and dispatches parallel A2A retrieval jobs (via the 'RETRIEVAL_JOB_DISPATCHED' topic) to the Local File Retriever and the Google Drive Retriever.
- Context Retrieval:
- Each Retriever Agent receives the query, generates an embedding using the Vertex AI Embedding Model, and queries the Vertex AI Vector Search to find the top-K relevant text chunks from its specific data source.
- Each agent publishes its retrieved context back to the Concordance Engine (via the 'CONTEXT_RETRIEVED' topic).
- Concordance & Synthesis:
- The Concordance Engine's LangChain logic receives context from all agents.
- It constructs a detailed analytical prompt containing the RAG contexts, the conversation history, and the user's query.
- It sends this rich prompt to the Gemini LLM for analysis and synthesis.
- Response Delivery:
- The Gemini LLM returns the final, synthesized answer.
- The answer is passed back up the chain: the Concordance Engine publishes it to the 'FINAL_ANSWER_READY' topic, and the API Tier consumes it and pushes it to the Angular Client over WebSocket.
- The API Tier saves the final bot response to MongoDB for history.
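The fan-out/fan-in shape of this flow can be sketched as follows. This is a simplified, synchronous illustration (the production flow is event-driven via Pub/Sub); the retriever URLs and the call_reasoning_llm() helper are hypothetical placeholders, not part of the design above.

```python
# Simplified, synchronous sketch of the Concordance Engine's fan-out/fan-in
# logic. The production flow is event-driven via Pub/Sub; the retriever URLs
# and call_reasoning_llm() below are hypothetical placeholders.
from concurrent.futures import ThreadPoolExecutor
import requests

RETRIEVER_URLS = {  # hypothetical in-cluster service endpoints
    "local_files": "http://local-file-retriever/retrieve",
    "google_drive": "http://drive-retriever/retrieve",
}

def dispatch_job(source: str, query: str) -> dict:
    """Send an A2A Retrieval Request and return the A2A Retrieval Response."""
    resp = requests.post(RETRIEVER_URLS[source], json={"query": query}, timeout=30)
    resp.raise_for_status()
    return resp.json()

def call_reasoning_llm(prompt: str) -> str:
    """Hypothetical wrapper around the reasoning LLM (e.g., Gemini on Vertex AI)."""
    raise NotImplementedError

def run_concordance(mcp: dict) -> str:
    rag_sources = [s for s in mcp["sources"] if s in RETRIEVER_URLS]
    with ThreadPoolExecutor() as pool:  # parallel A2A dispatch
        responses = list(pool.map(lambda s: dispatch_job(s, mcp["query"]), rag_sources))
    # Assemble the analytical prompt from contexts, history, and the query.
    context_block = "\n\n".join(
        f"[{r['source_name']}] {c['text']}"
        for r in responses
        for c in r["retrieved_contexts"]
    )
    prompt = (
        "Compare and synthesize an answer from the sources below.\n\n"
        f"{context_block}\n\nHistory: {mcp['conversation_history']}\n"
        f"Question: {mcp['query']}"
    )
    return call_reasoning_llm(prompt)
```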
The graphical representation of this process flow is provided via the linked high-resolution view. The color key below maps each stage of the flow to the diagram:
# | Component | Color Code |
1 | Request Initiation | Blue |
2 | Orchestration | Orange |
3 | Context Retrieval | Green |
4 | Concordance & Synthesis | Blue |
5 | Response Delivery | Yellow |
3.3. Data Contracts
1. Model Context Protocol (MCP) v2.0: the JSON object sent from the API Tier to the Concordance Engine.
```json
{
  "query": "string",
  "sources": ["string"],
  "conversation_history": [
    { "role": "user | model", "content": "string" }
  ],
  "metadata": {
    "user_id": "string",
    "session_id": "string"
  }
}
```
2. A2A Retrieval Request v1.0: the JSON object sent from the Concordance Engine to a Retriever Agent.
```json
{ "query": "string" }
```
3. A2A Retrieval Response v1.0: the JSON object returned from a Retriever Agent, where source_name is e.g. "local_files" or "google_drive".
```json
{
  "source_name": "string",
  "retrieved_contexts": [
    { "text": "string", "score": "float", "source_document": "string" }
  ]
}
```
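For illustration, the same contracts can be expressed as Python dataclasses; this is a non-normative sketch whose class and field names simply mirror the schemas above.

```python
# Non-normative Python typing of the contracts above; names mirror the
# MCP v2.0 and A2A v1.0 schemas.
from dataclasses import dataclass

@dataclass
class ChatTurn:
    role: str      # "user" | "model"
    content: str

@dataclass
class MCP:
    query: str
    sources: list[str]
    conversation_history: list[ChatTurn]
    metadata: dict  # user_id, session_id

@dataclass
class RetrievedContext:
    text: str
    score: float
    source_document: str

@dataclass
class A2AResponse:
    source_name: str  # e.g., "local_files" or "google_drive"
    retrieved_contexts: list[RetrievedContext]
```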
4. Deployment & Infrastructure (GKE)
4.1. Kubernetes Manifests
The system will be defined by a set of YAML manifests, stored in a dedicated Git repository, including:
- Namespace: A dedicated namespace (e.g., rag-app) to logically isolate all components.
- Deployments: One for each microservice pod (Frontend, API, Concordance, Local Retriever, Drive Retriever, MongoDB).
- Services: A ClusterIP service for each backend deployment to enable internal communication.
- PersistentVolumeClaim: To request storage for the Local Files RAG source and for MongoDB data.
- Ingress: A single Ingress resource to manage external traffic, routing to the Frontend and API services.
- Secrets: To store the Gemini API key, database credentials, and OAuth client secret.
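To give a flavor of these manifests, here is a minimal sketch of a Deployment and its ClusterIP Service for the Concordance Engine; the image path, port, labels, and replica count are illustrative placeholders.

```yaml
# Minimal sketch only; image, labels, port, and replicas are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: concordance-engine
  namespace: rag-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: concordance-engine
  template:
    metadata:
      labels:
        app: concordance-engine
    spec:
      containers:
        - name: concordance-engine
          image: us-docker.pkg.dev/your-project/rag-app/concordance-engine:latest
          ports:
            - containerPort: 5000
          envFrom:
            - secretRef:
                name: rag-app-secrets
---
apiVersion: v1
kind: Service
metadata:
  name: concordance-engine
  namespace: rag-app
spec:
  type: ClusterIP
  selector:
    app: concordance-engine
  ports:
    - port: 80
      targetPort: 5000
```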
4.2. Containerization
Each microservice will have its own Dockerfile.
- Frontend: A multi-stage Docker build that first uses a Node image to run ng build, then copies the resulting /dist folder into a lightweight Nginx image.
- Backend Services: Will use official Node.js and Python slim base images.
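A sketch of the multi-stage frontend Dockerfile described above; the Angular dist output path varies by project name, so the COPY source is a placeholder.

```dockerfile
# Multi-stage build sketch for the Angular frontend; the dist output path
# varies by project name, so the COPY source below is a placeholder.
FROM node:20 AS build
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npx ng build --configuration production

FROM nginx:alpine
COPY --from=build /app/dist/ /usr/share/nginx/html/
EXPOSE 80
```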
5. DevSecOps Pipeline
The development lifecycle will be managed by a CI/CD pipeline.
Stage | Key Process | Tools |
Plan & Design | Define user stories, design APIs. | Jira, Git, Mermaid |
Code & Develop | Write feature code and unit tests. | VS Code, Jest, PyTest |
Build & Integrate (CI) | On Git push, auto-build and run tests. | Cloud Build, GitHub Actions |
Secure & Push | Scan Docker images for vulnerabilities before pushing. | Artifact Analysis, Snyk |
Deploy (CD) | On successful push, auto-deploy to GKE via kubectl apply. | Cloud Build, Argo CD |
Operate & Monitor | Collect logs, metrics, and traces. Set up alerts. | Google Cloud Logging/Monitoring |
Feedback & Iterate | Analyze data to create new stories. | Analytics, User Feedback |
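As an illustration of how the Build, Secure, and Deploy stages wire together, a minimal cloudbuild.yaml sketch follows; the image path, cluster name, region, and test command are placeholders rather than project values.

```yaml
# Illustrative cloudbuild.yaml sketch; image path, cluster, region, and
# test commands are placeholders.
steps:
  - id: unit-tests
    name: python:3.12-slim
    entrypoint: bash
    args: ["-c", "pip install -r requirements.txt && pytest"]
  - id: build-image
    name: gcr.io/cloud-builders/docker
    args: ["build", "-t", "us-docker.pkg.dev/$PROJECT_ID/rag-app/api:$SHORT_SHA", "."]
  - id: push-image
    name: gcr.io/cloud-builders/docker
    args: ["push", "us-docker.pkg.dev/$PROJECT_ID/rag-app/api:$SHORT_SHA"]
  - id: deploy
    name: gcr.io/cloud-builders/kubectl
    args: ["apply", "-f", "k8s/"]
    env:
      - CLOUDSDK_COMPUTE_REGION=us-central1
      - CLOUDSDK_CONTAINER_CLUSTER=rag-app-cluster
```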
This table outlines a mature toolchain for implementing the DevSecOps pipeline described above.
# | Category | Description | Recommended Tools | Primary Vendor / OSS |
1 | CI/CD Integration | The central platform that orchestrates the entire pipeline, from code commit to deployment. | Google Cloud Build, GitHub Actions | Google / Microsoft |
2 | Software Composition Analysis (SCA) | Scans application dependencies (e.g., from package.json, requirements.txt) for known vulnerabilities. | Snyk, Dependabot (GitHub), Google Artifact Analysis (on container push) | Snyk / Microsoft / Google |
3 | Static Application Security Testing (SAST) | Analyzes the application's source code without executing it to find security flaws like SQL injection, hardcoded secrets, etc. | CodeQL (GitHub), SonarQube, Snyk Code | Microsoft / SonarSource / Snyk |
4 | Dynamic Application Security Testing (DAST) | Tests the *running* application by sending malicious-looking requests to find vulnerabilities like Cross-Site Scripting (XSS). Often run in a staging environment. | OWASP ZAP, Burp Suite, Invicti | OWASP (OSS) / PortSwigger / Invicti |
5 | Container Security Scanning | Scans the final Docker images for vulnerabilities in the base OS layers and system libraries. | Google Artifact Analysis, Trivy, Clair | Google / Aqua Security (OSS) / Quay (OSS) |
6 | Infrastructure as Code (IaC) Security | Scans Terraform or other IaC files for security misconfigurations before they are applied to the cloud environment. | Checkov, tfsec, KICS | Palo Alto Networks (OSS) / Aqua Security (OSS) / Checkmarx (OSS) |
7 | Vulnerability Management | A centralized dashboard for tracking, triaging, and managing all vulnerabilities found across the various scanning stages. | Google Security Command Center, DefectDojo, Kenna Security | Google / OWASP (OSS) / Cisco |
8 | Threat Modeling | A procedural process, not a single tool, for proactively identifying and mitigating potential security threats during the design phase. | Diagramming Tools (Mermaid, Lucidchart), STRIDE methodology, OWASP Threat Dragon | N/A (Process) |
9 | Observability and Monitoring | The collection, analysis, and visualization of logs, metrics, and traces from the live application to detect issues and understand performance. | Google Cloud's operations suite (Cloud Logging, Monitoring, Trace), Prometheus, Grafana, Datadog | Google / CNCF (OSS) / Datadog |
8. Integration Touchpoint Details
This table provides a granular view of every connection ("arrow") in the architecture diagram, detailing the nature of each integration.
# | Source Touchpoint Name | Source Touchpoint Description | Source Hosting Location | Destination Touchpoint Name | Destination Touchpoint Description | Destination Hosting Location | Integration Mode | Average Frequency | Trigger Direction | Integration Channel / Protocol | Message Format | Typical Peak Message Length | Message Acknowledgement (Y/N) |
1 | Angular Frontend | User's browser making an API call | User's PC | Security Gateway | The GKE Ingress/API Gateway endpoint | GKE Cluster | On-Demand | Per User Action | Push | HTTPS / REST | JSON | < 5 KB (query text) | Y (HTTP 200 OK) |
2 | Security Gateway | The gateway initiating an OIDC login flow | GKE Cluster | External Identity Provider | The login page of Google, Facebook, etc. | Third-Party SaaS | On-Demand | Per User Login | Push (Redirect) | HTTPS / OIDC | HTTP Redirects | N/A | Y (via redirect with auth code) |
3 | API Pod | Node.js server checking for a cached response | GKE Cluster | Memorystore for Redis | The managed Redis cache instance | GCP Managed Service | On-Demand | Per API Call | Pull | Redis Protocol | Binary | < 50 KB (cached JSON) | Y (protocol-level) |
4 | API Pod | Node.js server publishing the initial user query | GKE Cluster | Google Cloud Pub/Sub | The 'QUERY_RECEIVED' topic | GCP Managed Service | On-Demand | Per User Query | Push | gRPC / HTTP (via SDK) | JSON (MCP Object) | < 50 KB | Y (API call success) |
5 | Google Cloud Pub/Sub | Message queue delivering the user query | GCP Managed Service | Concordance Orchestrator Pod | The Python service subscribing to the topic | GKE Cluster | On-Demand (Event-Driven) | Per User Query | Push (Subscription) | gRPC / HTTP (via SDK) | JSON (MCP Object) | < 50 KB | Y (subscriber acknowledges message) |
6 | Concordance Orchestrator Pod | Orchestrator publishing retrieval jobs | GKE Cluster | Google Cloud Pub/Sub | The 'RETRIEVAL_JOB_DISPATCHED' topic | GCP Managed Service | On-Demand | Per User Query (x2) | Push | gRPC / HTTP (via SDK) | JSON (A2A Request) | < 5 KB | Y (API call success) |
7 | RAG Retriever Pods | Specialized agents fetching their assigned jobs | GKE Cluster | Google Cloud Pub/Sub | The 'RETRIEVAL_JOB_DISPATCHED' topic | GCP Managed Service | On-Demand (Event-Driven) | Per User Query | Pull (Subscription) | gRPC / HTTP (via SDK) | JSON (A2A Request) | < 5 KB | Y (subscriber acknowledges message) |
8 | Local File Retriever Pod | Agent reading from its mounted disk | GKE Cluster | Persistent Volume Claim | The mounted local file directory | GKE Cluster | On-Demand | Per Retrieval Job | Pull | File System I/O (POSIX) | Binary / Text | < 2 MB (per file read) | N/A |
9 | Google Drive Retriever Pod | Agent downloading a file from Drive | GKE Cluster | Google Drive API | The API endpoint for fetching file content | GCP Managed Service | On-Demand | Per Retrieval Job | Pull | HTTPS / REST (OAuth 2.0) | Binary / Text | < 2 MB (per file read) | Y (HTTP 200 OK) |
10 | RAG Retriever Pods | Agents generating embeddings for text chunks | GKE Cluster | Vertex AI Embedding Model | The managed embedding API endpoint | GCP Managed Service | On-Demand | On Data Ingestion & Per Query | Push | gRPC / REST | JSON | < 100 KB (batch of chunks) | Y (API call success) |
11 | RAG Retriever Pods | Agents querying for similar vectors | GKE Cluster | Vertex AI Vector Search | The managed vector search index endpoint | GCP Managed Service | On-Demand | Per Retrieval Job | Push | gRPC / REST | JSON / Vector Format | < 20 KB | Y (API call success) |
12 | RAG Retriever Pods | Agents updating their long-term memory | GKE Cluster | Agentic Memory DB (MongoDB) | The database storing retrieval metadata | GKE Cluster | On-Demand | Per Retrieval Job | Push | MongoDB Wire Protocol | BSON (Binary JSON) | < 10 KB | Y (write acknowledgement) |
13 | RAG Retriever Pods | Agents publishing their results | GKE Cluster | Google Cloud Pub/Sub | The 'CONTEXT_RETRIEVED' topic | GCP Managed Service | On-Demand | Per Retrieval Job | Push | gRPC / HTTP (via SDK) | JSON (A2A Response) | < 100 KB (context chunks) | Y (API call success) |
14 | Concordance Orchestrator Pod | Orchestrator making a final reasoning call | GKE Cluster | External LLM APIs (Gemini, OpenAI, etc.) | The API endpoints for the various LLMs | GCP / Third-Party SaaS | On-Demand | Per User Query | Push | HTTPS / REST / gRPC | JSON | < 200 KB (rich prompt) | Y (API call success) |
15 | API Pod | Node.js server subscribing for the final answer | GKE Cluster | Google Cloud Pub/Sub | The 'FINAL_ANSWER_READY' topic | GCP Managed Service | On-Demand (Event-Driven) | Per User Query | Pull (Subscription) | gRPC / HTTP (via SDK) | JSON | < 10 KB | Y (subscriber acknowledges message) |
16 | API Pod | Node.js server pushing the answer to the client | GKE Cluster | Angular Frontend | The user's active browser session | User's PC | On-Demand (Event-Driven) | Per Final Answer | Push | WebSockets | JSON | < 10 KB | N (fire-and-forget, though TCP ensures delivery) |
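Integration rows 4-5 above (the API Pod publishing the MCP to Pub/Sub and the Orchestrator consuming it with acknowledgement) can be sketched with the google-cloud-pubsub client library as follows; the project, topic, and subscription IDs are placeholders.

```python
# Sketch of integration rows 4-5: publish the MCP to Pub/Sub (API Pod) and
# consume it with acknowledgement (Orchestrator Pod). IDs are placeholders.
import json
from google.cloud import pubsub_v1

PROJECT_ID = "your-gcp-project"  # placeholder

# Publisher side (API Pod).
publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path(PROJECT_ID, "QUERY_RECEIVED")

def publish_mcp(mcp: dict) -> str:
    future = publisher.publish(topic_path, json.dumps(mcp).encode("utf-8"))
    return future.result()  # message ID; the "Y" acknowledgement in row 4

# Subscriber side (Concordance Orchestrator Pod).
subscriber = pubsub_v1.SubscriberClient()
subscription_path = subscriber.subscription_path(PROJECT_ID, "query-received-sub")

def on_message(message: pubsub_v1.subscriber.message.Message) -> None:
    mcp = json.loads(message.data)
    # ... hand the MCP to the orchestration logic ...
    message.ack()  # the subscriber acknowledgement in row 5

streaming_pull = subscriber.subscribe(subscription_path, callback=on_message)
# streaming_pull.result() would block the main thread to keep consuming.
```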
9. Scalability, Resilience, and Security
- Scalability: Each microservice deployment can be scaled independently by increasing its replica count in the Kubernetes manifest (kubectl scale deployment...).
- Resilience: GKE automatically handles pod failures by restarting them. The microservice architecture ensures that a failure in one component (e.g., the Drive Retriever) does not bring down the entire system.
- Security:
- Authentication is centralized at the Security Gateway.
- All sensitive data is stored in Kubernetes Secrets.
- Network Policies can be applied within GKE to restrict communication, ensuring, for example, that only the Concordance Engine can call the Retriever Agents (a sample policy sketch follows this list).
- Regular container scanning and dependency updates are enforced by the CI/CD pipeline.
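A sketch of the Network Policy mentioned above, restricting ingress to the Retriever Agents so that only the Concordance Engine can reach them; the pod labels and port are illustrative placeholders.

```yaml
# Sketch of a NetworkPolicy limiting retriever ingress to the Concordance
# Engine; pod labels and port are illustrative placeholders.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: retrievers-allow-concordance-only
  namespace: rag-app
spec:
  podSelector:
    matchLabels:
      role: retriever
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: concordance-engine
      ports:
        - protocol: TCP
          port: 5000
```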
10. Future Considerations (v2.0)
- Adding New RAG Agents: A new data source can be supported by creating a new retriever microservice and updating the Concordance Engine's configuration to dispatch jobs to it.
- Caching: A Redis cache could be added between the API Tier and the Concordance Engine to cache responses for common queries.
- Advanced State Management: For more complex, multi-turn interactions, the MongoDB schema could be enhanced to store a more detailed "agent state" for each session.
11. Infrastructure Architecture
This diagram visually depicts the target-state infrastructure architecture for a scalable production environment.