
Building Multi-Agent Systems with Memory: A Developer's Guide to Local Testing


To support our mission of accelerating the developer journey on Google Cloud, we built Dev Signal: a multi-agent system that transforms raw community signals into reliable technical guidance by automating the path from discovery to expert content creation.

In part 1 and part 2 of this series, we standardized core capabilities through the Model Context Protocol (MCP) and built a multi-agent architecture integrated with the Vertex AI memory bank for long-term intelligence and persistence. Now we'll show you how to test your multi-agent system locally.

To explore the code at your own pace, clone the repository here.

Testing the Agent Locally

Before deploying your agentic system to Google Cloud Run, verify that its specialized components work together on your workstation. This testing phase validates trend discovery, technical grounding, and creative drafting within a local feedback loop, saving time and resources during development.

You'll configure local secrets, implement environment-aware utilities, and use a dedicated test runner to verify that Dev Signal correctly retrieves user preferences from the Vertex AI memory bank in the cloud. This local verification ensures your agent's "brain" and "hands" are properly synchronized before deployment.

Environment Setup

Create a .env file in your project root. These variables are used for local development and will be replaced by Terraform/Secret Manager in production.

Paste this code in dev-signal/.env and update with your own details.

Note: GOOGLE_CLOUD_LOCATION is set to global because that is where gemini-3-flash-preview is supported. The agent uses GOOGLE_CLOUD_LOCATION for the model location and GOOGLE_CLOUD_REGION for regional services such as the Vertex AI Agent Engine.

# Google Cloud Configuration
GOOGLE_CLOUD_PROJECT=your-project-id
GOOGLE_CLOUD_LOCATION=global
GOOGLE_CLOUD_REGION=us-central1
GOOGLE_GENAI_USE_VERTEXAI=True
AI_ASSETS_BUCKET=your_bucket_name

# Reddit API Credentials
REDDIT_CLIENT_ID=your_client_id
REDDIT_CLIENT_SECRET=your_client_secret
REDDIT_USER_AGENT=my-agent/0.1

# Developer Knowledge API Key
DK_API_KEY=your_api_key
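
Before running anything, it can help to confirm that every variable from the file above is set and that no template placeholder was left behind. The following is a minimal sketch and not part of the Dev Signal repository: `find_unset` and `PLACEHOLDER_PREFIXES` are hypothetical names, and the placeholder check assumes you started from the template values shown above.

```python
import os
from typing import Mapping

# Variable names taken from the .env template above.
REQUIRED_KEYS = (
    "GOOGLE_CLOUD_PROJECT",
    "GOOGLE_CLOUD_LOCATION",
    "GOOGLE_CLOUD_REGION",
    "GOOGLE_GENAI_USE_VERTEXAI",
    "AI_ASSETS_BUCKET",
    "REDDIT_CLIENT_ID",
    "REDDIT_CLIENT_SECRET",
    "REDDIT_USER_AGENT",
    "DK_API_KEY",
)

# Assumption: untouched template values start with "your-" or "your_".
PLACEHOLDER_PREFIXES = ("your-", "your_")


def find_unset(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return required keys that are missing, empty, or still placeholders."""
    problems = []
    for key in REQUIRED_KEYS:
        value = env.get(key, "").strip()
        if not value or value.startswith(PLACEHOLDER_PREFIXES):
            problems.append(key)
    return problems
```

Calling `find_unset()` after `load_dotenv()` and printing the result is one way to catch a forgotten key before the agent starts making API calls.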

Helper Utilities

Create a new directory for your application utils.

cd dev_signal_agent
mkdir app_utils
cd app_utils

Environment Configuration

This module standardizes how the agent discovers the active Google Cloud Project and Region, so the same code runs unchanged on a workstation and in the cloud. It first calls load_dotenv() to load any local .env file, then asks google.auth.default() for the Project ID and falls back to the GOOGLE_CLOUD_PROJECT environment variable if that lookup fails. As a result, the agent authenticates against the correct project without manual configuration changes.

The script also provides a secret management layer. It resolves sensitive credentials such as the Reddit API keys first from the local environment (for rapid development), then from the Google Cloud Secret Manager API for any keys that are still missing (the usual case in production). It returns the secrets as a dictionary rather than writing them into os.environ, which keeps them out of the process environment that child processes inherit.
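
The resolution order described above can be sketched without any cloud dependency. This is an illustrative sketch, not the module's code: `resolve_secrets` and the `fetch_remote` callable are hypothetical names, with `fetch_remote` standing in for a Secret Manager lookup.

```python
from typing import Callable, Mapping, Optional


def resolve_secrets(
    names: list[str],
    env: Mapping[str, str],
    fetch_remote: Callable[[str], Optional[str]],
) -> dict[str, str]:
    """Resolve each secret from env first, then from a remote fetcher."""
    resolved: dict[str, str] = {}
    for name in names:
        value = env.get(name)
        if not value:
            # Only contact the remote store for keys the environment lacks.
            value = fetch_remote(name)
        if value:
            # Kept in the returned dict only; os.environ is never mutated.
            resolved[name] = value
    return resolved
```

Injecting the fetcher as a parameter makes the ordering easy to unit test: a fake fetcher can record which keys were requested remotely.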

The script also separates global from regional requirements for different AI services. It assigns the "global" location to models, which gives access to preview models, and a regional location such as us-central1 to infrastructure such as the Vertex AI Agent Engine. Finally, it calls vertexai.init() once with the project and the regional location, so the rest of your application can reach the memory bank without repeatedly passing project or location parameters; the model location is returned separately for model calls.

Paste this code in dev_signal_agent/app_utils/env.py

import os
import google.auth
import vertexai
from google.cloud import secretmanager
from dotenv import load_dotenv

def _fetch_secrets(project_id: str):
    """Fetch secrets from Secret Manager and return them as a dictionary."""
    secrets_to_fetch = ["REDDIT_CLIENT_ID", "REDDIT_CLIENT_SECRET", "REDDIT_USER_AGENT", "DK_API_KEY"]
    fetched_secrets = {}

    # First, check local environment (for local development via .env)
    for s in secrets_to_fetch:
        val = os.getenv(s)
        if val:
            fetched_secrets[s] = val

    # If keys are missing (common in production), fetch from Secret Manager API
    if len(fetched_secrets) < len(secrets_to_fetch):
        client = secretmanager.SecretManagerServiceClient()
        for secret_id in secrets_to_fetch:
            if secret_id not in fetched_secrets:
                name = f"projects/{project_id}/secrets/{secret_id}/versions/latest"
                try:
                    response = client.access_secret_version(request={"name": name})
                    # DO NOT set os.environ[secret_id] here.
                    # Keep it in this dictionary only.
                    fetched_secrets[secret_id] = response.payload.data.decode("UTF-8")
                except Exception as e:
                    print(f"Warning: Could not fetch {secret_id} from Secret Manager: {e}")

    return fetched_secrets

def init_environment():
    """Consolidated environment discovery."""
    load_dotenv()
    try:
        _, project_id = google.auth.default()
    except Exception:
        project_id = None
    # google.auth.default() can succeed but return no project ID,
    # so fall back to the environment variable in that case too.
    project_id = project_id or os.getenv("GOOGLE_CLOUD_PROJECT")

    model_location = os.getenv("GOOGLE_CLOUD_LOCATION", "global")
    service_location = os.getenv("GOOGLE_CLOUD_REGION", "us-central1")

    secrets = {}
    if project_id:
        vertexai.init(project=project_id, location=service_location)
        # Fetch secrets into a local variable
        secrets = _fetch_secrets(project_id)

    return project_id, model_location, service_location, secrets
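
Because the Agent Engine needs a regional endpoint while models may use "global", swapping the two variables is an easy mistake to make. The guard below is a small sketch of a check a caller might run on the values returned by init_environment(); `check_locations` is a hypothetical helper, not part of the module above.

```python
def check_locations(model_location: str, service_location: str) -> None:
    """Raise ValueError if the service location is not a concrete region."""
    if not service_location or service_location.lower() == "global":
        raise ValueError(
            "GOOGLE_CLOUD_REGION must be a region such as 'us-central1', "
            f"got {service_location!r}"
        )
    if not model_location:
        raise ValueError("GOOGLE_CLOUD_LOCATION must not be empty")
```

Failing fast here produces a clearer message than an endpoint error surfacing later from inside the SDK.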

Local Testing Script

The Google ADK includes a built-in Web UI that's excellent for visualizing agent logic and tool composition.

Launch it by running in the project root:

uv run adk web

However, the default Web UI won't test the long-term memory integration described in this tutorial because it's not pre-connected to a Vertex AI memory session. By default, the generic UI relies on in-memory services that don't persist data across sessions. We use the dedicated test_local.py script to explicitly initialize the VertexAiMemoryBankService. This ensures that even in a local environment, your agent communicates with the real cloud-based memory bank to validate preference persistence.

The test_local.py script:

  1. Connects to the real Vertex AI Agent Engine in the cloud for memory storage.

  2. Uses an in-memory session service for local chat history, allowing you to clear it easily between tests.

  3. Runs a chat loop for interactive conversation with your agent.
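
The split between session history and long-term memory is the key idea being tested. The toy model below illustrates it in plain Python; `ToySessionStore` and `ToyMemoryBank` are hypothetical stand-ins written for this illustration and do not reflect the ADK or Vertex AI APIs.

```python
from collections import defaultdict


class ToySessionStore:
    """Per-session chat history; a stand-in for an in-memory session service."""

    def __init__(self) -> None:
        self._history: dict[str, list[str]] = defaultdict(list)

    def append(self, session_id: str, message: str) -> None:
        self._history[session_id].append(message)

    def history(self, session_id: str) -> list[str]:
        return list(self._history[session_id])


class ToyMemoryBank:
    """Per-user facts that outlive any single session."""

    def __init__(self) -> None:
        self._facts: dict[str, list[str]] = defaultdict(list)

    def remember(self, user_id: str, fact: str) -> None:
        self._facts[user_id].append(fact)

    def recall(self, user_id: str) -> list[str]:
        return list(self._facts[user_id])


sessions = ToySessionStore()
memory = ToyMemoryBank()

# Session 1: the user states a preference.
sessions.append("session-1", "Write my blogs as 90s rap.")
memory.remember("local-tester", "prefers 90s rap style")

# Session 2 (after typing "new"): history is empty, memory is still there.
fresh_history = sessions.history("session-2")
recalled = memory.recall("local-tester")
```

History is keyed by session ID, so a new ID starts from nothing; memory is keyed by user ID, so it survives the switch. That is the behavior the test scenario looks for in the real services.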

Navigate back to the root dev-signal folder:

cd ../..

Create a new file at dev-signal/test_local.py and add the following code:

import asyncio
import os
import google.auth
import vertexai
import uuid
from dotenv import load_dotenv
from google.adk.runners import Runner
from google.adk.memory.vertex_ai_memory_bank_service import VertexAiMemoryBankService
from google.adk.sessions import InMemorySessionService
from vertexai import agent_engines
from google.genai import types
from dev_signal_agent.agent import root_agent

# Load environment variables
load_dotenv()

async def main():
    # 1. Setup Configuration
    project_id = os.getenv("GOOGLE_CLOUD_PROJECT")
    # Agent Engine (Memory) MUST use a regional endpoint
    resource_location = "us-central1"
    agent_name = "dev-signal"
    
    print(f"--- Initializing Vertex AI in {resource_location} ---")
    vertexai.init(project=project_id, location=resource_location)

    # 2. Find the Agent Engine Resource for Memory
    existing_agents = list(agent_engines.list(filter=f"display_name={agent_name}"))
    if existing_agents:
        agent_engine = existing_agents[0]
        agent_engine_id = agent_engine.resource_name.split("/")[-1]
        print(f"✅ Using persistent Memory Bank from Agent: {agent_engine_id}")
    else:
        print(f"❌ Error: Agent Engine '{agent_name}' not found. Please deploy with Terraform first.")
        return

    # 3. Initialize Services
    # We use InMemorySessionService for easier local testing (IDs are flexible)
    # BUT we use VertexAiMemoryBankService for REAL cloud persistence
    session_service = InMemorySessionService()
    
    memory_service = VertexAiMemoryBankService(
        project=project_id,
        location=resource_location,
        agent_engine_id=agent_engine_id
    )

    # 4. Create a Runner
    runner = Runner(
        agent=root_agent,
        app_name="dev-signal",
        session_service=session_service,
        memory_service=memory_service 
    )

    # 5. Run a Test Loop
    user_id = "local-tester"
    
    print("\n--- TEST SCENARIO ---")
    print("1. Start a session, tell the agent your preference (e.g., 'write in rhymes').")
    print("2. Type 'new' to start a FRESH session (local state wiped).")
    print("3. Ask for a blog post. The agent should retrieve your preference from the CLOUD memory.")
    
    current_session_id = f"session-{str(uuid.uuid4())[:8]}"
    await session_service.create_session(
        app_name="dev-signal",
        user_id=user_id,
        session_id=current_session_id
    )
    print(f"\n--- Chat Session (ID: {current_session_id}) ---")

    while True:
        user_input = input("\nYou: ")
        
        if user_input.lower() in ["exit", "quit"]:
            break
        
        if user_input.lower() == "new":
            # Simulate starting a completely fresh session
            current_session_id = f"session-{str(uuid.uuid4())[:8]}"
            await session_service.create_session(
                app_name="dev-signal",
                user_id=user_id,
                session_id=current_session_id
            )
            print(f"\n--- Fresh Session Started (ID: {current_session_id}) ---")
            print("(Local history is empty, retrieval must come from Memory Bank)")
            continue

        print("Agent is thinking...")
        async for event in runner.run_async(
            user_id=user_id,
            session_id=current_session_id,
            new_message=types.Content(role="user", parts=[types.Part(text=user_input)])
        ):
            if event.content and event.content.parts:
                for part in event.content.parts:
                    if part.text:
                        print(f"Agent: {part.text}")
            
            if event.get_function_calls():
                for fc in event.get_function_calls():
                    print(f"🛠️ Tool Call: {fc.name}")

if __name__ == "__main__":
    asyncio.run(main())
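
The script derives the Agent Engine ID by taking the last path segment of resource_name. A slightly more defensive variant is sketched below. It assumes resource names of the form projects/{project}/locations/{location}/reasoningEngines/{id}; that format and the helper name `engine_id_from_resource_name` are assumptions for illustration, not taken from the script.

```python
import re

# Assumed resource name layout for an Agent Engine instance.
_RESOURCE_RE = re.compile(
    r"^projects/(?P<project>[^/]+)/locations/(?P<location>[^/]+)"
    r"/reasoningEngines/(?P<engine_id>[^/]+)$"
)


def engine_id_from_resource_name(resource_name: str) -> str:
    """Extract the trailing engine ID, rejecting unexpected formats."""
    match = _RESOURCE_RE.match(resource_name)
    if match is None:
        raise ValueError(f"Unexpected resource name: {resource_name!r}")
    return match.group("engine_id")
```

Validating the whole name means a malformed value raises immediately instead of passing an arbitrary last segment to the memory service.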

Running the Test

First, set up Application Default Credentials so the script can authenticate to Google Cloud:

gcloud auth application-default login
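
To check whether that login left credentials on disk, a small pre-flight sketch like the one below can help. It assumes the commonly used locations for the credentials file (the GOOGLE_APPLICATION_CREDENTIALS variable, %APPDATA%\gcloud on Windows, and ~/.config/gcloud elsewhere); `adc_candidates` and `has_local_adc` are hypothetical helper names. Environments that supply credentials another way, such as a metadata server, will not have this file.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

ADC_FILENAME = "application_default_credentials.json"


def adc_candidates(env: Mapping[str, str], home: Path) -> list[Path]:
    """List files where Application Default Credentials are commonly found."""
    candidates = []
    explicit = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if explicit:
        candidates.append(Path(explicit))
    appdata = env.get("APPDATA")
    if appdata:
        # Windows keeps the gcloud config directory under %APPDATA%.
        candidates.append(Path(appdata) / "gcloud" / ADC_FILENAME)
    candidates.append(home / ".config" / "gcloud" / ADC_FILENAME)
    return candidates


def has_local_adc(
    env: Mapping[str, str] = os.environ, home: Optional[Path] = None
) -> bool:
    """Return True if any candidate credentials file exists."""
    home = home or Path.home()
    return any(path.is_file() for path in adc_candidates(env, home))
```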

Then execute the test script:

uv run test_local.py

Test Scenario

This scenario validates the complete agent lifecycle: discovery and research through multimodal content creation and long-term memory retrieval.

Phase 1: Teaching & Multimodal Creation (Session 1)

Goal: Establish technical context and set a specific stylistic preference.

Discovery

Ask the agent to identify trending Cloud Run topics.

Input: "Find high-engagement questions about AI agents on Cloud Run from the last 21 days."


Research

Direct the agent to perform a deep dive on a specific result.

Input: "Use the GCP Expert to research topic #1."


Personalization

Request a blog post and explicitly define your style preference.

Input: "Draft a blog post based on this research. From now on, I want all my technical blogs written in the style of a 90s Rap Song."


Image Generation

Ask the agent to generate an image illustrating the blog's main concepts using the Nano Banana Pro tool. The image will be saved to your Google Cloud bucket, and you'll receive a path in the format: https://storage.mtls.cloud.google.com/...


Phase 2: Long-Term Memory Recall (Session 2)

Goal: Verify the agent recalls preferences across a completely fresh session.

  1. Type new in the console to clear local session history and start fresh.

  2. Query your stored preferences to test the Vertex AI memory bank.

     Input: "What are my current topics of interest and what is my preferred blogging style?"

  3. Confirm the agent successfully retrieves your "AI Agents on Cloud Run" interest and "Rap" style from cloud storage.


Final Test: Request a blog post on an unrelated topic—say, "GKE Autopilot"—and verify that the agent automatically generates it in rap format without any explicit prompt.
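
Because the chat loop reads from standard input, the scenario can also be replayed from a script instead of typed by hand. The sketch below encodes the inputs from both phases; the final blog request is a paraphrase written for this example, the image generation step is left out because the article gives no exact prompt for it, and `as_stdin` is a hypothetical helper.

```python
# Inputs from Phase 1 and Phase 2, in order. "new" and "exit" are the
# commands the chat loop recognizes.
SCENARIO = [
    "Find high-engagement questions about AI agents on Cloud Run from the last 21 days.",
    "Use the GCP Expert to research topic #1.",
    "Draft a blog post based on this research. From now on, I want all my "
    "technical blogs written in the style of a 90s Rap Song.",
    "new",
    "What are my current topics of interest and what is my preferred blogging style?",
    # Assumption: wording of the unrelated-topic request is illustrative.
    "Write a blog post about GKE Autopilot.",
    "exit",
]


def as_stdin(steps: list[str]) -> str:
    """Join steps into newline-terminated text suitable for piping to stdin."""
    return "".join(step + "\n" for step in steps)
```

Printing `as_stdin(SCENARIO)` and piping it into the test script is one way to rerun the scenario. A scripted run moves faster than a person typing, so if memories have not been stored by the time the fresh session starts, the recall step may need a pause before it.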

Summary

This installment focused on validating agent functionality in a local environment before moving to cloud deployment. By configuring local secrets and leveraging environment-aware utilities, we used a dedicated test runner to verify that core reasoning and tool integration work as expected. We confirmed the complete workflow—from Reddit discovery through expert content generation—and validated that the agent correctly retrieves user preferences from the Vertex AI memory bank, even in entirely new sessions.

Want to run the test yourself? Clone the repository and execute the test_local.py script to watch Dev Signal pull your preferences from the Vertex AI memory bank in real time. For a closer look at memory orchestration mechanics, consult this quickstart guide.

In the final part of this series, we'll deploy our prototype as a production service on Google Cloud Run using Terraform for secure infrastructure, and explore the path to production readiness through continuous evaluation and security hardening.

Special thanks to Remigiusz Samborski for his thoughtful review and feedback.
