The build behind NetPulse AI

Architecture, data, and what it took to ship — from the four-agent SequentialAgent down to the partition keys on the BigQuery table.

Phase: Top 100 · Refinement due: 2026-04-30 · Stack: ADK · Vertex · AlloyDB · BigQuery · Code: github / hackathon-telecom-ops
/ 01 — About

About NetPulse AI

NetPulse AI is a multi-agent telecom operations assistant built for the APAC GenAI Academy 2026 hackathon. A natural-language complaint goes in; a structured incident ticket comes out — complete with the related network events the operator should know about, the CDR findings that back up the customer’s account of what happened, and a recommended NOC action plan.

/ 02 — Architecture

Architecture in detail

The core ADK package telecom_ops exposes a SequentialAgent that runs four LlmAgent sub-agents in order: classifier, network investigator, CDR analyzer, response formatter. Each one reads the shared session state, queries its data store through the MCP toolbox, and enriches the state for the next. The fourth writes a structured ticket to AlloyDB and the run ends.

NetPulse AI architecture diagram: User → UI → SequentialAgent (4 LlmAgents on Gemini) → MCP Toolbox → BigQuery + AlloyDB

Why a SequentialAgent

The four steps have strict data dependencies — the network investigator can’t run until the classifier has tagged a region; the CDR analyzer joins on the time window the network investigator returned; the formatter needs all three to write a coherent ticket. A SequentialAgent models this as code rather than encoding it into a prompt, so the dependency stays correct under prompt churn.
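In code, that ordering looks roughly like the sketch below: a minimal illustration against ADK's Python API, where the model strings, instructions, and output_key names are placeholders rather than the actual telecom_ops definitions.

```python
from google.adk.agents import LlmAgent, SequentialAgent

# Illustrative only: names, models, instructions, and output_keys are placeholders.
classifier = LlmAgent(
    name="classifier",
    model="gemini-2.5-flash",
    instruction="Tag the complaint with a region, severity, and category.",
    output_key="classification",          # lands in shared session state
)
network_investigator = LlmAgent(
    name="network_investigator",
    model="gemini-2.5-flash",
    instruction="Query network events for the region in {classification}.",
    output_key="network_findings",
)
cdr_analyzer = LlmAgent(
    name="cdr_analyzer",
    model="gemini-2.5-flash",
    instruction="Pull CDRs for the time window in {network_findings}.",
    output_key="cdr_findings",
)
response_formatter = LlmAgent(
    name="response_formatter",
    model="gemini-2.5-flash",
    instruction="Write the incident ticket from all prior findings.",
    output_key="ticket",
)

# The sub_agents ordering encodes the data dependency in code, not in a prompt.
root_agent = SequentialAgent(
    name="telecom_ops_pipeline",
    sub_agents=[classifier, network_investigator, cdr_analyzer, response_formatter],
)
```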

Why MCP Toolbox in front of the data stores

Direct BigQuery MCP endpoints returned 403 and connection-closed errors on Cloud Run during early integration. The toolbox-as-intermediary pattern works reliably and gives one place to evolve tool definitions without redeploying the agent service.
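The wiring on the agent side is small. A hedged sketch with the toolbox-core Python client, where the service URL is a placeholder and the toolset name is the one from tools.yaml:

```python
from toolbox_core import ToolboxSyncClient
from google.adk.agents import LlmAgent

# Placeholder URL: the real service is the network-toolbox Cloud Run deployment.
toolbox = ToolboxSyncClient("https://network-toolbox-xxxxx.a.run.app")
network_tools = toolbox.load_toolset("telecom_network_toolset")

# The agent only ever sees toolbox tools; credentials for BigQuery and AlloyDB
# stay with the toolbox's service account.
network_investigator = LlmAgent(
    name="network_investigator",
    model="gemini-2.5-flash",
    instruction="Investigate network events for the classified region.",
    tools=network_tools,
)
```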

Why both BigQuery and AlloyDB

An all-BigQuery design makes ticket writes and updates painful: no transactions, and query latency is wrong for an interactive UI. An all-AlloyDB design makes event scans expensive, forces scaling decisions for telemetry that doesn't need a relational store, and gives up cheap historical depth. Showing two services used for the right reasons mirrors how telecom ops actually splits analytical investigation from operational record-keeping; the sketch after the two lists below contrasts the two access patterns.

BIGQUERY · WHY
Append-only telemetry, bursty diagnostic scans
  • Network telemetry is the textbook OLAP case — high-volume, append-only, bursty queries that scan wide time windows. Columnar storage + pay-per-query economics fit the “investigate the last N hours” pattern better than keeping AlloyDB hot enough to scan that volume.
  • Serverless — no scaling decisions during the hackathon.
  • Schema flexibility for evolving event shapes; new event types appear all the time in real telco data.
  • Investigation queries are read-only, so BigQuery’s batch-load, analytics-oriented model is fine.
ALLOYDB · WHY
Targeted reads, mutable rows, ACID transactions
  • CDR lookups are targeted — specific MSISDN, narrow time window, joinable with subscriber metadata. That’s point/range access on indexes, OLTP shape, not scans.
  • Tickets need real ACID + frequent state transitions (open → investigating → resolved). BigQuery is the wrong tool for mutable rows.
  • PostgreSQL compatibility gives rich SQL, foreign keys, and clean joins between CDRs and ticket records.
  • Low-latency reads to power the Call Records and Incident Tickets viewer tabs without query-cost anxiety per page load.
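What the split looks like at the query level, as a rough sketch: the table names and the partition/cluster columns match the schemas described here, but the other column names, the example values, and the connection details are assumptions, not the repo's actual queries.

```python
from datetime import datetime, timedelta, timezone

from google.cloud import bigquery
import psycopg  # AlloyDB speaks the PostgreSQL wire protocol

# BigQuery side: the OLAP shape. A wide time-window scan, pruned by the DAY
# partition on started_at and the (region, severity) clustering.
bq = bigquery.Client()
events = bq.query(
    """
    SELECT started_at, region, severity, event_type   -- event_type is an assumed column
    FROM `my-project.telecom.network_events`           -- placeholder project/dataset
    WHERE started_at >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 6 HOUR)
      AND region = @region
    ORDER BY started_at DESC
    """,
    job_config=bigquery.QueryJobConfig(
        query_parameters=[bigquery.ScalarQueryParameter("region", "STRING", "Jakarta")]
    ),
).result()

# AlloyDB side: the OLTP shape. An indexed point/range read for one subscriber
# over a narrow window; column names here are assumptions.
window_end = datetime.now(timezone.utc)
window_start = window_end - timedelta(hours=6)
with psycopg.connect("postgresql://readonly@alloydb-host/telecom") as conn:
    cdrs = conn.execute(
        """
        SELECT call_start, cell_tower_id, call_status
        FROM call_records
        WHERE msisdn = %s AND call_start BETWEEN %s AND %s
        ORDER BY call_start
        """,
        ("628123456789", window_start, window_end),
    ).fetchall()
```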
/ 03 — Tech stack

Tech stack on Google Cloud

The complete list of moving parts. Everything is Google Cloud (the hackathon track requires it); the LLM-side work is Vertex AI Gemini.

ORCHESTRATION
Google ADK · SequentialAgent
Coordinates four LlmAgents in order, threading session state forward through output_key handoffs.
MODELS
Vertex AI · Gemini
All four agents run on gemini-3.1-flash-lite-preview at the global endpoint, with a 4-attempt model ladder failing over via gemini-3-flash-preview to gemini-2.5-flash under quota pressure (a failover sketch follows this list).
NETWORK DATA
BigQuery
network_events table — DAY-partitioned on started_at, clustered by (region, severity). 50,000 events across 10 metros, 6-month rolling window.
CDR DATA
AlloyDB · SQL + NL2SQL
call_records is served in two tiers: the parameterized-SQL primary tier (query_cdr_summary, query_cdr_worst_towers) executes in <2 s; query_cdr_nl is the AlloyDB AI NL2SQL fallback for free-form prompts that fall outside the scripted queries. A structurally read-only role executes either path.
TICKET SINK
AlloyDB Postgres
incident_tickets — append-only writes via the native ADK tool save_incident_ticket. The connection pool is recycled every 5 minutes to avoid sockets that die silently.
TOOL TRANSPORT
MCP Toolbox
Three BigQuery tools (telecom_network_toolset) plus three CDR tools on AlloyDB (cdr_toolset: 2 parameterized SQL + 1 NL2SQL fallback) live in a separate Cloud Run service. Agents reach the data via the toolbox; the toolbox reaches the warehouses via service accounts.
FRONT END
Flask + SSE
netpulse-ui wraps the same root_agent in a hero landing + workspace timeline. Each request runs its own asyncio loop in a worker thread; events stream out incrementally via Server-Sent Events (sketched after this list).
DEPLOY
Cloud Run · 2 services
netpulse-ui serves the chat surface; network-toolbox hosts the MCP toolbox. Both built from a Dockerfile in the project root, deployed from main.
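The model ladder from the MODELS card, sketched with the google-genai client for illustration; how the four attempts map onto the three models, and the retry logic itself, are assumptions rather than the repo's implementation.

```python
from google import genai

# Illustrative failover order only; the real quota/error handling may differ.
MODEL_LADDER = [
    "gemini-3.1-flash-lite-preview",   # primary
    "gemini-3.1-flash-lite-preview",   # one retry on the primary (assumed mapping)
    "gemini-3-flash-preview",          # first fallback
    "gemini-2.5-flash",                # last resort under quota pressure
]

# Placeholder project; "global" is the global Vertex endpoint.
client = genai.Client(vertexai=True, project="my-project", location="global")

def generate_with_failover(prompt: str):
    last_error = None
    for model_name in MODEL_LADDER:
        try:
            return client.models.generate_content(model=model_name, contents=prompt)
        except Exception as exc:   # quota or availability errors fall through to the next rung
            last_error = exc
    raise last_error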
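And the FRONT END card's per-request streaming, as a minimal Flask sketch; the /chat route, the event shapes, and the stream_agent_events stand-in for the real ADK run are all placeholders.

```python
import asyncio, json, queue, threading
from flask import Flask, Response, request

app = Flask(__name__)

async def stream_agent_events(prompt):
    """Placeholder for the real ADK run; yields one event per agent."""
    for name in ("classifier", "network_investigator", "cdr_analyzer", "response_formatter"):
        await asyncio.sleep(0)
        yield {"agent": name, "status": "done"}

def run_agent_in_thread(prompt: str, out: queue.Queue):
    """Run one request's agent events on a private asyncio loop in a worker thread."""
    async def pump():
        try:
            async for event in stream_agent_events(prompt):
                out.put(event)
        finally:
            out.put(None)                  # sentinel: run finished (or failed)
    loop = asyncio.new_event_loop()
    try:
        loop.run_until_complete(pump())
    finally:
        loop.close()

@app.post("/chat")
def chat():
    prompt = request.json["message"]
    events: queue.Queue = queue.Queue()
    threading.Thread(target=run_agent_in_thread, args=(prompt, events), daemon=True).start()

    def sse():
        while (event := events.get()) is not None:
            yield f"data: {json.dumps(event)}\n\n"   # one SSE frame per agent event
    return Response(sse(), mimetype="text/event-stream")
```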
/ 04 — Data schema

Data schema: three surfaces

Three data surfaces feed the run. Each viewer page describes its schema and filter dimensions in detail.

/ 05 — Bring your own data

Bring your own data in three steps

NetPulse AI ships with seed data for ten Indonesian metros. Adapting to a different telecom dataset is a three-step exercise:

  1. Replace the seed CSVs in docs/seed-data/ with your own network_events.csv and call_records.csv. Keep the column shapes intact — see network events schema and call records schema for column-by-column descriptions.
  2. Re-run scripts/setup_bigquery.py --seed --recreate to drop and rebuild the BQ table with your data, then scripts/setup_alloydb.py --seed to load CDRs into AlloyDB. The --recreate flag is destructive but is the only way to reapply the partition + cluster spec (see the sketch after these steps).
  3. Adjust the region whitelist in telecom_ops/tools.py (VALID_REGIONS) and the toolbox config in tools.yaml if your cities diverge from the Indonesian-metro defaults. The agents will pick up the new vocabulary on next deploy.
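For step 2, the partition + cluster spec that --recreate reapplies looks roughly like this sketch with the google-cloud-bigquery client; the project, dataset, and column types are placeholders, and the real setup script may differ.

```python
from google.cloud import bigquery

client = bigquery.Client(project="my-project")          # placeholder project
table_id = "my-project.telecom.network_events"          # placeholder dataset

# Partitioning and clustering cannot be changed in place, which is why
# --recreate is destructive: drop, then rebuild with the full spec.
client.delete_table(table_id, not_found_ok=True)

table = bigquery.Table(table_id, schema=[
    bigquery.SchemaField("started_at", "TIMESTAMP"),
    bigquery.SchemaField("region", "STRING"),
    bigquery.SchemaField("severity", "STRING"),          # assumed type
    # ...the remaining network_events columns, per the schema page
])
table.time_partitioning = bigquery.TimePartitioning(
    type_=bigquery.TimePartitioningType.DAY, field="started_at")
table.clustering_fields = ["region", "severity"]
client.create_table(table)

# The --seed path then loads docs/seed-data/network_events.csv into this table.
```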

The agents themselves are dataset-agnostic — what changes is the data, the region whitelist, and the natural-language prompt examples in tools.yaml that ground AlloyDB AI’s NL-to-SQL translation.

/ 06 — Phase history

Phase history over time

NetPulse AI was built through a series of timeboxed phases. In most cases each phase landed in a single PR, and every phase was visually verified end-to-end before the next began.

/ 07 — Roadmap

Roadmap: what's next

The hackathon scope is a refined prototype, not a product. These are the next directions if the project continues past 2026-04-30.

NEXT
Custom-schema BYO data
Today, swapping in your own data needs CSV reshaping. Next: a schema-mapping config so any telecom-shaped feed (CDR + events + ticket sink) plugs in without touching code.
SOON
Hourly cron + drift detection
Re-poll BigQuery on an hourly cadence; surface schema drift, row-count anomalies, and unfamiliar regions to the operator before they reach an agent.
MAYBE
React rewrite of the workspace
The current workspace is server-rendered Flask + a single inline JS handler. A SPA would unlock streaming-state inspection (per-agent JSON view, replay, diffing).