
Technical Design of a Cognitive Research Platform

IKS Cognitive Research Platform - Technical Design

Version: 1.0
Date: July 16, 2025
Author(s): Shankar Santhamoorthy
Status: Approved for Implementation


Contents

1. Introduction

1.1. Project Overview & Goals

1.2. Scope

1.3. Key Terminology

1.4. Reference Documents

2. System Architecture

2.1. High-Level Reference Architecture Blueprint

2.2. Architectural Tiers & Components

2.3. Technology Stacks

3. Detailed Design & Data Flows

3.1. Authentication Flow (OIDC)

3.2. RAG Query & Concordance Flow

3.3. Data Contracts

4. Deployment & Infrastructure (GKE)

4.1. Kubernetes Manifests

4.2. Containerization

5. DevSecOps Pipeline

6. Integration Touchpoint Details

7. Scalability, Resilience, and Security

8. Future Considerations (v2.0)

9. Infrastructure Architecture

1. Introduction

1.1. Project Overview & Goals

This document specifies the technical design for "Concorde," a cloud-native, microservice-based application that provides a conversational AI agent. The agent answers questions by retrieving, comparing, and synthesizing information from multiple independent knowledge sources (a local file system and a Google Drive folder). The primary goal is a scalable, secure, and extensible platform for agentic information retrieval and analysis.

1.2. Scope

In Scope:

  • A web-based chat interface for user interaction.
  • Ingestion of documents from a designated local folder and a Google Drive folder.
  • A core "Concordance Engine" that uses LangChain to orchestrate multiple retrieval agents.
  • Logic to compare and reason about the information retrieved from different sources.
  • A secure authentication layer using OIDC/OAuth 2.0.
  • Deployment of the entire system as containerized microservices on Google Kubernetes Engine (GKE).
  • A full DevSecOps pipeline for CI/CD, security scanning, and monitoring.

Out of Scope:

  • Support for data sources other than the local file system and Google Drive in v1.0.
  • Advanced, role-based access control (RBAC) within the application itself.
  • Real-time, collaborative document editing.

1.3. Key Terminology

  • RAG: Retrieval-Augmented Generation.
  • MCP: Model Context Protocol - The structured JSON object used for communication between the API tier and the backend.
  • A2A: Agent-to-Agent - Communication between the Orchestrator and specialized Retriever agents.
  • GKE: Google Kubernetes Engine.
  • IdP: Identity Provider (e.g., Google, Facebook).

1.4. Reference Documents

| # | Reference Type | URL |
|---|----------------|-----|
| 1 | IKS Cognitive Research Platform - Business Requirements | IKS Cognitive Research Platform - Business Requirements |


2. System Architecture

The system is designed as a set of decoupled microservices, orchestrated by Kubernetes, and integrated with managed Google Cloud services.

2.1. High-Level Reference Architecture Blueprint

For a higher-resolution view, see: New Reference Architecture Blueprint.png

2.2. Architectural Tiers & Components

| Tier Name | Brief Description | Core Technology | Deployment |
|-----------|-------------------|-----------------|------------|
| Frontend Web-Client | The user's browser running the Angular SPA. | Angular, TypeScript | User's local browser |
| Security Gateway | The secure entry point to the cluster, handling authentication. | GKE Ingress, API Gateway, OIDC | GKE Ingress resource |
| Frontend Web-Service | Serves the static files for the Angular application. | Nginx, Docker | GKE Deployment & Service |
| API Tier | Manages business logic, state, and assembles the MCP. | Node.js, Express, MongoDB, Docker | GKE Deployment & Service |
| Concordance Engine | The central orchestrator that manages the RAG workflow. | Python, Flask, LangChain, Docker | GKE Deployment & Service |
| RAG Retriever Agents | Specialized microservices for data retrieval from each source. | Python, Flask, Docker | GKE Deployments & Services |
| Data Sources | Persistent knowledge stores. | Google Drive API, GCP Persistent Disk | GCP Managed Service / GKE PV |
| Core AI Services | Managed Google AI services for embeddings, search, and generation. | Vertex AI, Gemini models | GCP Managed Services |


2.3. Technology Stacks

This table breaks down each architectural tier, covering not just the "what" but the "why" behind each technology and design-pattern choice.

| # | Tier / Module Name | Brief Description | Core Functionality | Technology Stack |
|---|--------------------|-------------------|--------------------|------------------|
| 1 | Frontend Web-Client | The user's browser running the dynamic, single-page chat application; interacts with the backend via the secure gateway. | Renders the chat UI; manages real-time UI state; handles OIDC redirects; attaches session tokens to API requests. | Angular Framework |
| 2 | Security Gateway | The single, hardened entry point to the GKE cluster, enforcing authentication before allowing traffic to internal services. | Intercepts all incoming traffic; manages the OIDC/OAuth 2.0 login flow; validates session tokens (JWTs); routes authenticated traffic. | GKE Ingress Controller with Identity-Aware Proxy (IAP), or a dedicated API Gateway (e.g., Kong) |
| 3 | Frontend Web-Service | A lightweight, containerized web server that serves the static files of the Angular application. | Serves the initial index.html; serves compiled JS, CSS, and assets. | Nginx |
| 4 | API Tier | The central backend microservice, managing business logic, state, and event publishing. | Provides REST & WebSocket endpoints; manages chat history in MongoDB; checks the cache before processing; assembles & publishes MCP events to Pub/Sub. | Node.js / Express.js (web server); MongoDB (database); Redis (cache); Google Cloud Pub/Sub SDK (message-queue client) |
| 5 | Concordance Engine (Orchestrator) | The "brain" of the RAG system, orchestrating multiple retrieval agents and LLMs via an event-driven flow. | Subscribes to QUERY_RECEIVED events; publishes RETRIEVAL_JOB events; subscribes to CONTEXT_READY events; performs concordance via LangChain; calls multiple LLMs via an aggregator; publishes ANSWER_READY events. | Python / Flask (runtime); LangChain (orchestration); Google Cloud Pub/Sub SDK (message-queue client); Redis (cache) |
| 6 | RAG Retriever Agents | Specialized microservices that retrieve context from one specific data source and manage their own long-term memory. | Subscribes to RETRIEVAL_JOB events; fetches context from its data source (local volume or Drive API); interacts with Vertex AI for embedding/search; updates its own agentic memory; publishes CONTEXT_READY events. | Python / Flask (runtime); MongoDB (agentic memory); Google Drive API & file-system I/O (data sources); Vertex AI SDK (AI services) |
| 7 | Core LLM & AI Tier | The suite of fully managed Google and third-party AI services providing foundational AI capabilities. | Embedding: converts text to vectors; Vector Search: stores and searches vectors; Generation: creates text answers. | Vertex AI Vector Search; Vertex AI Model Garden (Gemini); third-party LLM APIs (OpenAI, Anthropic) |
| 8 | MCP & A2A Protocols | The logical data contracts that define how the microservices communicate, for both user context and inter-agent tasks. | MCP carries the user query and chat history; A2A carries retrieval-job instructions and results. | JSON (data format) |

Additional Information

The following additional information on each of these tiers is available at the link below:

  • Technology vendors
  • Rationale for selecting the technology
  • Deployment hosting infrastructure
  • Integration touchpoints (protocols)
  • Design patterns for the tier
  • Rationale for selecting the design pattern

URL Link: My-IKS-CognitiveResearch

3. Detailed Design & Data Flows

3.1. Authentication Flow (OIDC)

The OIDC Authorization Code Flow is managed by the Security Gateway. It redirects unauthenticated users to an external IdP (e.g., Google). Upon successful login, it exchanges the authorization code for a JWT, validates it, and establishes a secure session for the user before forwarding requests to the application.

  1. An unauthenticated user request hits the Security Gateway.
  2. The Gateway redirects the user's browser to the configured Identity Provider (IdP).
  3. The user authenticates with the IdP.
  4. The IdP redirects the user back to the Gateway with a one-time authorization code.
  5. The Gateway performs a back-channel exchange of the code for a JWT ID Token.
  6. The Gateway validates the JWT, creates a session, and forwards the request to the internal services, injecting the user's identity into an HTTP header (e.g., X-Authenticated-User-Email).
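Internal services never re-validate credentials themselves; they trust the identity header injected in step 6. A minimal sketch of that trust boundary, assuming Flask and the X-Authenticated-User-Email header named above (the endpoint and handler names are illustrative):

```python
# Sketch only: internal services trust the identity header that the
# Security Gateway injects after validating the JWT.
from flask import Flask, abort, g, request

app = Flask(__name__)

@app.before_request
def load_authenticated_user():
    # The gateway strips this header from external traffic and injects it
    # only after a successful OIDC login, so inside the cluster its
    # presence can be trusted.
    user_email = request.headers.get("X-Authenticated-User-Email")
    if not user_email:
        abort(401)  # request bypassed the gateway or the session expired
    g.user_email = user_email

@app.get("/whoami")
def whoami():
    # Example endpoint showing how downstream handlers read the identity.
    return {"user": g.user_email}
```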

3.2. RAG Query & Concordance Flow

  1. MCP Assembly (API Tier): Upon receiving a query, the API Pod fetches the user's chat history from MongoDB (to support UC-03) and assembles the formal MCP object, which includes a sources array (e.g., ['local_files', 'gemini_ai']).
  2. Orchestration (Concordance Engine): The Concordance Pod receives the MCP. Its core routing logic inspects the sources array (see the sketch after this list).
    a. For RAG sources (local_files, google_drive): it dispatches an A2A retrieval job to the corresponding specialized retriever microservice.
    b. For direct LLM sources (gemini_ai, chat_gpt): it passes the query directly to its internal LLM Aggregator component.
  3. Parallel Execution: All dispatched jobs run in parallel.
  4. Aggregation & Concordance: The Orchestrator waits for all jobs to return their results (RAG contexts and/or direct LLM answers). It then uses a powerful reasoning LLM (e.g., Gemini 1.5 Pro) to perform the final concordance analysis and synthesize the final answer, as required by UC-02.
  5. Response & History: The final answer is passed back to the API Pod, which saves the exchange to MongoDB (fulfilling UC-04) and returns the result to the user.
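A minimal sketch of this routing-and-fan-out logic, using the source names from the MCP example above; the two dispatcher functions are stand-ins for the real A2A and aggregator calls:

```python
# Sketch of the Concordance Engine's routing logic (steps 2-3 above).
from concurrent.futures import ThreadPoolExecutor

RAG_SOURCES = {"local_files", "google_drive"}    # routed to retriever agents
DIRECT_LLM_SOURCES = {"gemini_ai", "chat_gpt"}   # routed to the LLM aggregator

def dispatch_retrieval_job(source: str, query: str) -> dict:
    # Placeholder for the A2A request to the retriever microservice.
    return {"source_name": source, "retrieved_contexts": []}

def ask_llm_directly(source: str, query: str) -> dict:
    # Placeholder for the internal LLM Aggregator component.
    return {"source_name": source, "answer": "..."}

def route_and_execute(mcp: dict) -> list:
    """Fan out per the MCP sources array and run all jobs in parallel."""
    with ThreadPoolExecutor() as pool:
        futures = []
        for source in mcp["sources"]:
            if source in RAG_SOURCES:
                futures.append(pool.submit(dispatch_retrieval_job, source, mcp["query"]))
            elif source in DIRECT_LLM_SOURCES:
                futures.append(pool.submit(ask_llm_directly, source, mcp["query"]))
            else:
                raise ValueError(f"Unknown source: {source}")
        return [f.result() for f in futures]  # step 4 aggregates these results
```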

Step by step, the end-to-end request path is as follows (see the prompt-construction sketch after this list):

  1. Request Initiation (MCP Client):
  • The Angular Client sends a simple query { "query": "..." } to the API Tier.
  • The Node.js API Pod receives the query, fetches the conversation history from MongoDB, and assembles the formal MCP object.
  2. Orchestration (MCP Server & A2A):
  • The API Pod sends the MCP object via HTTP POST to the Concordance Engine Pod.
  • The Concordance Engine parses the MCP, extracts the query, and makes parallel HTTP POST requests (A2A) to the Local File Retriever and the Google Drive Retriever.
  3. Context Retrieval:
  • Each Retriever Agent receives the query, generates an embedding using the Vertex AI Embedding Model, and queries Vertex AI Vector Search for the top-K relevant text chunks from its specific data source.
  • Each agent returns its retrieved context to the Concordance Engine.
  4. Concordance & Synthesis:
  • The Concordance Engine's LangChain logic receives context from all agents.
  • It constructs a detailed analytical prompt containing the RAG contexts, the conversation history, and the user's query.
  • It sends this rich prompt to the Gemini LLM for analysis and synthesis.
  5. Response Delivery:
  • The Gemini LLM returns the final, synthesized answer.
  • The answer is passed back up the chain: Concordance Engine -> API Tier -> Angular Client -> User.
  • The API Tier saves the final bot response to MongoDB for history.
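The analytical prompt in step 4 is, at its core, string assembly over the retrieved contexts and conversation history. A stdlib-only sketch of that construction (the template wording is an assumption; the production path builds the prompt through LangChain):

```python
# Sketch of the concordance prompt construction (step 4 above).
# The instruction text is illustrative, not the engine's actual template.
def build_concordance_prompt(query: str, history: list, contexts: list) -> str:
    history_text = "\n".join(f"{turn['role']}: {turn['content']}" for turn in history)
    context_text = "\n\n".join(
        f"--- Source: {c['source_name']} ---\n"
        + "\n".join(chunk["text"] for chunk in c["retrieved_contexts"])
        for c in contexts
    )
    return (
        "You are a concordance analyst. Compare the contexts below, note where "
        "the sources agree or disagree, and synthesize one answer.\n\n"
        f"Conversation history:\n{history_text}\n\n"
        f"Retrieved contexts:\n{context_text}\n\n"
        f"Question: {query}\n"
    )
```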

A graphical representation of this process flow is provided in RAG Query & Concordance Flow.png (click the link for a higher-resolution view).

| # | Component | Color Code | Steps |
|---|-----------|------------|-------|
| 1 | Request Initiation | Blue | Angular client starts the flow; API pod assembles the MCP object with history |
| 2 | Orchestration | Orange | Parallel A2A requests to the retrieval agents |
| 3 | Context Retrieval | Green | Embedding generation and vector search; both retrievers return context chunks |
| 4 | Concordance & Synthesis | Blue | LangChain constructs the analytical prompt; the Gemini LLM generates the final answer |
| 5 | Response Delivery | Yellow | Answer delivered to the client; response saved to MongoDB |
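The Context Retrieval step (green) pairs an embedding call with a vector-search lookup. A sketch using the Vertex AI Python SDK; the project, model name, and index identifiers are placeholder assumptions, and the SDK specifics should be verified against the current client library:

```python
# Retrieval-step sketch: embed the query, then query Vertex AI Vector Search.
# All identifiers below are placeholders.
from google.cloud import aiplatform
from vertexai.language_models import TextEmbeddingModel

aiplatform.init(project="my-gcp-project", location="us-central1")

def retrieve_top_k(query: str, k: int = 5) -> list:
    # 1. Convert the query text to a vector with the managed embedding model.
    model = TextEmbeddingModel.from_pretrained("textembedding-gecko")
    query_vector = model.get_embeddings([query])[0].values

    # 2. Find the k nearest chunks in this agent's Vector Search index.
    endpoint = aiplatform.MatchingEngineIndexEndpoint(
        index_endpoint_name="projects/my-gcp-project/locations/us-central1/indexEndpoints/1234"
    )
    neighbors = endpoint.find_neighbors(
        deployed_index_id="local_files_index",
        queries=[query_vector],
        num_neighbors=k,
    )
    return neighbors[0]  # matches for the single query vector
```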

3.3. Data Contracts

| # | Category | Description | Schema (JSON) |
|---|----------|-------------|---------------|
| 1 | Model Context Protocol (MCP) v2.0 | The JSON object sent from the API Tier to the Concordance Engine. | { "query": "string", "sources": ["string"], "conversation_history": [ { "role": "user" \| "model", "content": "string" } ], "metadata": { "user_id": "string", "session_id": "string" } } |
| 2 | A2A Retrieval Request v1.0 | The JSON object sent from the Concordance Engine to a Retriever Agent. | { "query": "string" } |
| 3 | A2A Retrieval Response v1.0 | The JSON object returned from a Retriever Agent; source_name is e.g. "local_files" or "google_drive". | { "source_name": "string", "retrieved_contexts": [ { "text": "string", "score": "float", "source_document": "string" } ] } |
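For implementation convenience, the same contracts can be mirrored as Python type definitions; a sketch using stdlib TypedDicts (nothing in the design mandates these classes, and the wire format remains plain JSON):

```python
# Typed mirrors of the data contracts above (illustrative convenience only).
from typing import List, Literal, TypedDict

class Turn(TypedDict):
    role: Literal["user", "model"]
    content: str

class Metadata(TypedDict):
    user_id: str
    session_id: str

class MCP(TypedDict):
    """Model Context Protocol v2.0: API Tier -> Concordance Engine."""
    query: str
    sources: List[str]
    conversation_history: List[Turn]
    metadata: Metadata

class A2ARequest(TypedDict):
    """A2A Retrieval Request v1.0: Concordance Engine -> Retriever Agent."""
    query: str

class RetrievedContext(TypedDict):
    text: str
    score: float
    source_document: str

class A2AResponse(TypedDict):
    """A2A Retrieval Response v1.0: Retriever Agent -> Concordance Engine."""
    source_name: str  # e.g., "local_files" or "google_drive"
    retrieved_contexts: List[RetrievedContext]
```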


4. Deployment & Infrastructure (GKE)

4.1. Kubernetes Manifests

The system will be defined by a set of YAML manifests, stored in a dedicated Git repository, including:

  • Namespace: A dedicated namespace (e.g., rag-app) to logically isolate all components.
  • Deployments: One for each microservice pod (Frontend, API, Concordance, Local Retriever, Drive Retriever, MongoDB).
  • Services: A ClusterIP service for each backend deployment to enable internal communication.
  • PersistentVolumeClaim: To request storage for the Local Files RAG source and for MongoDB data.
  • Ingress: A single Ingress resource to manage external traffic, routing to the Frontend and API services.
  • Secrets: To store the Gemini API key, database credentials, and OAuth client secret.

4.2. Containerization

Each microservice will have its own Dockerfile.

  • Frontend: A multi-stage Docker build that first uses a Node image to run ng build, then copies the resulting /dist folder into a lightweight Nginx image.
  • Backend Services: Built from official Node.js and Python slim base images.

5. DevSecOps Pipeline

The development lifecycle will be managed by a CI/CD pipeline.

| Stage | Key Process | Tools |
|-------|-------------|-------|
| Plan & Design | Define user stories, design APIs. | Jira, Git, Mermaid |
| Code & Develop | Write feature code and unit tests. | VS Code, Jest, PyTest |
| Build & Integrate (CI) | On Git push, auto-build and run tests. | Cloud Build, GitHub Actions |
| Secure & Push | Scan Docker images for vulnerabilities before pushing. | Artifact Analysis, Snyk |
| Deploy (CD) | On successful push, auto-deploy to GKE via kubectl apply. | Cloud Build, Argo CD |
| Operate & Monitor | Collect logs, metrics, and traces; set up alerts. | Google Cloud Logging/Monitoring |
| Feedback & Iterate | Analyze data to create new stories. | Analytics, User Feedback |


This table outlines a mature toolchain for implementing the DevSecOps pipeline described above.

| # | Category | Description | Recommended Tools | Primary Vendor / OSS |
|---|----------|-------------|-------------------|----------------------|
| 1 | CI/CD Integration | The central platform that orchestrates the entire pipeline, from code commit to deployment. | Google Cloud Build, GitHub Actions | Google / Microsoft |
| 2 | Software Composition Analysis (SCA) | Scans application dependencies (e.g., from package.json, requirements.txt) for known vulnerabilities. | Snyk, Dependabot (GitHub), Google Artifact Analysis (on container push) | Snyk / Microsoft / Google |
| 3 | Static Application Security Testing (SAST) | Analyzes the application's source code without executing it to find security flaws such as SQL injection and hardcoded secrets. | CodeQL (GitHub), SonarQube, Snyk Code | Microsoft / SonarSource / Snyk |
| 4 | Dynamic Application Security Testing (DAST) | Tests the *running* application by sending malicious-looking requests to find vulnerabilities such as Cross-Site Scripting (XSS); often run in a staging environment. | OWASP ZAP, Burp Suite, Invicti | OWASP (OSS) / PortSwigger / Invicti |
| 5 | Container Security Scanning | Scans the final Docker images for vulnerabilities in the base OS layers and system libraries. | Google Artifact Analysis, Trivy, Clair | Google / Aqua Security (OSS) / Quay (OSS) |
| 6 | Infrastructure as Code (IaC) Security | Scans Terraform and other IaC files for security misconfigurations before they are applied to the cloud environment. | Checkov, tfsec, KICS | Palo Alto Networks (OSS) / Aqua Security (OSS) / Checkmarx (OSS) |
| 7 | Vulnerability Management | A centralized dashboard for tracking, triaging, and managing all vulnerabilities found across the scanning stages. | Google Security Command Center, DefectDojo, Kenna Security | Google / OWASP (OSS) / Cisco |
| 8 | Threat Modeling | A procedural practice, not a single tool, for proactively identifying and mitigating potential security threats during the design phase. | Diagramming tools (Mermaid, Lucidchart), STRIDE methodology, OWASP Threat Dragon | N/A (process) |
| 9 | Observability and Monitoring | The collection, analysis, and visualization of logs, metrics, and traces from the live application to detect issues and understand performance. | Google Cloud's operations suite (Cloud Logging, Monitoring, Trace), Prometheus, Grafana, Datadog | Google / CNCF (OSS) / Datadog |

For a higher-resolution view of the pipeline, see: DevSecOps Pipeline.png

6. Integration Touchpoint Details

This table provides a granular view of every connection ("arrow") on the architecture diagram, detailing the nature of each integration.

| # | Source Touchpoint | Source Description | Source Hosting | Destination Touchpoint | Destination Description | Destination Hosting | Integration Mode | Average Frequency | Trigger Direction | Channel / Protocol | Message Format | Typical Peak Message Size | Ack (Y/N) |
|---|-------------------|--------------------|----------------|------------------------|--------------------------|---------------------|------------------|-------------------|-------------------|--------------------|----------------|---------------------------|-----------|
| 1 | Angular Frontend | User's browser making an API call | User's PC | Security Gateway | The GKE Ingress/API Gateway endpoint | GKE Cluster | On-Demand | Per user action | Push | HTTPS / REST | JSON | < 5 KB (query text) | Y (HTTP 200 OK) |
| 2 | Security Gateway | Gateway initiating an OIDC login flow | GKE Cluster | External Identity Provider | The login page of Google, Facebook, etc. | Third-Party SaaS | On-Demand | Per user login | Push (redirect) | HTTPS / OIDC | HTTP redirects | N/A | Y (via redirect with auth code) |
| 3 | API Pod | Node.js server checking for a cached response | GKE Cluster | Memorystore for Redis | The managed Redis cache instance | GCP Managed Service | On-Demand | Per API call | Pull | Redis protocol | Binary | < 50 KB (cached JSON) | Y (protocol-level) |
| 4 | API Pod | Node.js server publishing the initial user query | GKE Cluster | Google Cloud Pub/Sub | The QUERY_RECEIVED topic | GCP Managed Service | On-Demand | Per user query | Push | gRPC / HTTP (via SDK) | JSON (MCP object) | < 50 KB | Y (API call success) |
| 5 | Google Cloud Pub/Sub | Message queue delivering the user query | GCP Managed Service | Concordance Orchestrator Pod | The Python service subscribing to the topic | GKE Cluster | On-Demand (event-driven) | Per user query | Push (subscription) | gRPC / HTTP (via SDK) | JSON (MCP object) | < 50 KB | Y (subscriber acknowledges message) |
| 6 | Concordance Orchestrator Pod | Orchestrator publishing retrieval jobs | GKE Cluster | Google Cloud Pub/Sub | The RETRIEVAL_JOB_DISPATCHED topic | GCP Managed Service | On-Demand | Per user query (x2) | Push | gRPC / HTTP (via SDK) | JSON (A2A request) | < 5 KB | Y (API call success) |
| 7 | RAG Retriever Pods | Specialized agents fetching their assigned jobs | GKE Cluster | Google Cloud Pub/Sub | The RETRIEVAL_JOB_DISPATCHED topic | GCP Managed Service | On-Demand (event-driven) | Per user query | Pull (subscription) | gRPC / HTTP (via SDK) | JSON (A2A request) | < 5 KB | Y (subscriber acknowledges message) |
| 8 | Local File Retriever Pod | Agent reading from its mounted disk | GKE Cluster | Persistent Volume Claim | The mounted local file directory | GKE Cluster | On-Demand | Per retrieval job | Pull | File-system I/O (POSIX) | Binary / text | < 2 MB (per file read) | N/A |
| 9 | Google Drive Retriever Pod | Agent downloading a file from Drive | GKE Cluster | Google Drive API | The API endpoint for fetching file content | GCP Managed Service | On-Demand | Per retrieval job | Pull | HTTPS / REST (OAuth 2.0) | Binary / text | < 2 MB (per file read) | Y (HTTP 200 OK) |
| 10 | RAG Retriever Pods | Agents generating embeddings for text chunks | GKE Cluster | Vertex AI Embedding Model | The managed embedding API endpoint | GCP Managed Service | On-Demand | On data ingestion & per query | Push | gRPC / REST | JSON | < 100 KB (batch of chunks) | Y (API call success) |
| 11 | RAG Retriever Pods | Agents querying for similar vectors | GKE Cluster | Vertex AI Vector Search | The managed vector-search index endpoint | GCP Managed Service | On-Demand | Per retrieval job | Push | gRPC / REST | JSON / vector format | < 20 KB | Y (API call success) |
| 12 | RAG Retriever Pods | Agents updating their long-term memory | GKE Cluster | Agentic Memory DB (MongoDB) | The database storing retrieval metadata | GKE Cluster | On-Demand | Per retrieval job | Push | MongoDB wire protocol | BSON (binary JSON) | < 10 KB | Y (write acknowledgement) |
| 13 | RAG Retriever Pods | Agents publishing their results | GKE Cluster | Google Cloud Pub/Sub | The CONTEXT_RETRIEVED topic | GCP Managed Service | On-Demand | Per retrieval job | Push | gRPC / HTTP (via SDK) | JSON (A2A response) | < 100 KB (context chunks) | Y (API call success) |
| 14 | Concordance Orchestrator Pod | Orchestrator making a final reasoning call | GKE Cluster | External LLM APIs (Gemini, OpenAI, etc.) | The API endpoints for the various LLMs | GCP / Third-Party SaaS | On-Demand | Per user query | Push | HTTPS / REST / gRPC | JSON | < 200 KB (rich prompt) | Y (API call success) |
| 15 | API Pod | Node.js server subscribing for the final answer | GKE Cluster | Google Cloud Pub/Sub | The FINAL_ANSWER_READY topic | GCP Managed Service | On-Demand (event-driven) | Per user query | Pull (subscription) | gRPC / HTTP (via SDK) | JSON | < 10 KB | Y (subscriber acknowledges message) |
| 16 | API Pod | Node.js server pushing the answer to the client | GKE Cluster | Angular Frontend | The user's active browser session | User's PC | On-Demand (event-driven) | Per final answer | Push | WebSockets | JSON | < 10 KB | N (fire-and-forget, though TCP ensures delivery) |
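Several touchpoints above (rows 4, 6, and 13) are Pub/Sub publishes through the client SDK. A minimal publisher sketch, assuming the google-cloud-pubsub library; the project ID and payload values are illustrative:

```python
# Minimal Pub/Sub publish sketch for touchpoint 4 (API Pod -> QUERY_RECEIVED).
import json
from google.cloud import pubsub_v1

publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path("my-gcp-project", "QUERY_RECEIVED")

mcp_event = {
    "query": "Compare the two manuscripts on pranayama.",
    "sources": ["local_files", "google_drive"],
    "conversation_history": [],
    "metadata": {"user_id": "u-123", "session_id": "s-456"},
}

# Pub/Sub payloads are bytes; the table above specifies JSON-encoded MCP objects.
future = publisher.publish(topic_path, data=json.dumps(mcp_event).encode("utf-8"))
print("Published message ID:", future.result())  # blocks until the broker acks
```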

7. Scalability, Resilience, and Security

  • Scalability: Each microservice deployment can be scaled independently by increasing its replica count in the Kubernetes manifest (kubectl scale deployment ...).
  • Resilience: GKE automatically restarts failed pods. The microservice architecture ensures that a failure in one component (e.g., the Drive Retriever) does not bring down the entire system.
  • Security:
    • Authentication is centralized at the Security Gateway.
    • All sensitive data is stored in Kubernetes Secrets.
    • Network Policies can be applied within GKE to restrict communication, ensuring, for example, that only the Concordance Engine can call the Retriever Agents.
    • Regular container scanning and dependency updates are enforced by the CI/CD pipeline.

8. Future Considerations (v2.0)

  • Adding New RAG Agents: The architecture supports this by creating a new retriever microservice and updating the Concordance Engine's configuration to call it.
  • Caching: A Redis cache could be added between the API Tier and the Concordance Engine to cache responses for common queries (see the sketch below).
  • Advanced State Management: For more complex, multi-turn interactions, the MongoDB schema could be enhanced to store a more detailed "agent state" for each session.
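A minimal cache-aside sketch for the proposed response cache, assuming redis-py; the key scheme, host name, and TTL are illustrative choices:

```python
# Cache-aside sketch for the v2.0 response cache (all specifics illustrative).
import hashlib
import json
import redis

cache = redis.Redis(host="memorystore-redis", port=6379)

def answer_query(mcp: dict, compute_answer) -> str:
    # Key on a hash of the normalized query plus selected sources so that
    # identical questions against the same sources share a cache entry.
    raw = json.dumps({"q": mcp["query"].strip().lower(), "s": sorted(mcp["sources"])})
    key = "answer:" + hashlib.sha256(raw.encode()).hexdigest()

    cached = cache.get(key)
    if cached is not None:
        return cached.decode("utf-8")  # cache hit: skip the engine entirely

    answer = compute_answer(mcp)  # full Concordance Engine round trip
    cache.setex(key, 3600, answer)  # cache for one hour
    return answer
```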

9. Infrastructure Architecture

This diagram depicts the target-state infrastructure architecture for a scalable production environment.

Author: Shankar Santhamoorthy