Bridging the Knowledge Gap: How the Developer Knowledge MCP Server Keeps Your LLM Up-to-Date

Learn how the Developer Knowledge MCP server provides authoritative, up-to-date Google technical documentation (including Firebase, Android, and Google Cloud) to LLMs, ensuring your AI agents don't rely on outdated information for new features.

Imagine this: you just finished watching the latest Firebase release notes and thought to yourself, “That is a cool feature, I want to add it to my app!” You go to your favorite AI editing tool, ask it to add the feature to your app, and… it implements something outdated and very different from what you requested. You could prompt again and hope for a different result, but these LLMs are often trained on data that is months, if not years, old. The model simply will not have information about a feature that was just released. So, what can we do about this?

The Developer Knowledge MCP Server

We can use the Developer Knowledge MCP server to improve our results. The Developer Knowledge MCP server is a gateway to official Google documentation for tools and platforms like Firebase, Android, Google Cloud, and more. It serves authoritative, up-to-date information straight from Google’s official technical documentation, giving you the ability to:

  • Get technical guidance
  • Get help with coding and implementation
  • Troubleshoot and debug the current output
  • Compare features across different products
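Under the hood, MCP servers communicate using JSON-RPC 2.0 messages. As a rough, illustrative sketch (the payload shape below is generic MCP, not specific to this server, and the ADK toolset shown later in this post builds and sends these messages for you), a request to list the server's available tools looks like this:

```python
import json

# Minimal sketch of an MCP "tools/list" request body. MCP messages are
# JSON-RPC 2.0; the client library normally constructs and sends these
# for you, so this is for illustration only.
def build_tools_list_request(request_id: int = 1) -> str:
    payload = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/list",
    }
    return json.dumps(payload)

body = build_tools_list_request()
print(body)
```

In practice you never write these messages by hand; a toolset like the one we configure below handles the session and message framing.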

Let’s compare the results of an inference request with and without the Developer Knowledge MCP server.

The example

We are going to use the default ADK agent that is generated when you create a new Python agent. It looks something like this:

from google.adk.agents.llm_agent import Agent

root_agent = Agent(
    model="gemini-2.5-flash",
    name="root_agent",
    description="A helpful assistant for user questions.",
    instruction="Answer user questions to the best of your knowledge",
)

The knowledge cutoff for this agent’s model is January 2025. This means it likely does not have any information made public after that date, which includes the hybrid inference SDKs (released May 20, 2025). Using this default agent, a query asking about hybrid inference may produce results like this:

View Default Output
It seems there might be a slight misunderstanding regarding the specific name. As of my last update, there isn't an officially named, standalone SDK called the "Firebase AI Logic SDK" for public use.

However, the concept of building AI logic and performing hybrid inference within a Firebase ecosystem is absolutely possible and encouraged! You can achieve this powerful combination by leveraging various existing Firebase and Google Cloud services.

Let's break down what Hybrid Inference means and how you can implement it using Firebase and Google Cloud components.

What is Hybrid Inference?
Hybrid inference refers to a strategy where parts of a machine learning model's prediction process run on the client-side (e.g., on a mobile device or in a web browser) and other parts run on the server-side (in the cloud). This approach offers several benefits:

Reduced Latency: Critical or simple tasks can be processed instantly on the device without network delays.
Privacy: Sensitive user data can be processed on-device, reducing the need to send it to the cloud.
Offline Capability: Basic functionality can still work when there's no internet connection.
Resource Optimization: Heavy computational tasks or very large models can be offloaded to powerful cloud servers, saving device battery and processing power.
Dynamic Updates: Cloud models can be updated frequently without requiring users to update their app.
Cost Efficiency: Only necessary data is sent to the cloud, potentially reducing data transfer costs.
How to Achieve Hybrid Inference with Firebase & Google Cloud
You can build robust hybrid inference systems by combining the following Firebase and Google Cloud services:

1. On-Device (Client-Side) Inference: With Firebase ML Kit
What it is: Firebase ML Kit provides SDKs for common machine learning tasks on mobile devices (Android and iOS). It allows you to use pre-built APIs (e.g., text recognition, face detection, image labeling) or deploy your own custom TensorFlow Lite models directly to your app.
How to use for Hybrid Inference:
Pre-processing/Filtering: Run a lightweight model on the device to quickly filter data or perform initial processing. For example, check if an image contains a specific object before sending it to the cloud for more detailed analysis.
Basic Prediction: Perform simple, fast inferences that don't require high accuracy or large models.
User Experience: Provide immediate feedback to the user while more complex processing happens in the background.
Offline Support: Ensure core AI features work even without an internet connection.
2. Cloud-Side (Server-Side) Inference: With Cloud Functions for Firebase & Vertex AI
What it is:
Cloud Functions for Firebase: Serverless functions that run in response to events triggered by Firebase features (HTTP requests, database changes, authentication events, etc.).
Vertex AI (Google Cloud): Google's unified machine learning platform that allows you to build, deploy, and scale ML models. It supports various model types (custom models, AutoML models) and provides robust serving capabilities.
How to use for Hybrid Inference:
Complex Models: Deploy larger, more accurate, or computationally intensive models on Vertex AI.
Dynamic Updates: Update your cloud models independently of your app.
Centralized Logic: Manage and scale your AI logic on the server.
Calling Cloud APIs: Cloud Functions can easily call other powerful Google Cloud AI APIs (like Vision AI, Natural Language API, Speech-to-Text, etc.) or external APIs (e.g., large language models like GPT-4 via an API).
Orchestration: Cloud Functions act as the bridge. Your client app sends data to a Cloud Function (via an HTTP request, or by writing to Firestore), which then orchestrates the call to Vertex AI or other cloud services, processes the results, and sends them back to the client or stores them in Firestore.
3. Orchestration and Data Flow: Firebase Realtime Database / Cloud Firestore & Cloud Storage
Cloud Firestore / Realtime Database: Use these NoSQL databases to:
Store input data (e.g., text, metadata about an image) that triggers a Cloud Function.
Store the results of cloud inference, which the client app can then listen to in real-time.
Manage model metadata or configuration.
Cloud Storage for Firebase: For handling large binary files like images, videos, or audio that need to be processed by cloud models. The client uploads to Storage, and a Cloud Function triggered by the upload can then process it.
4. Development Workflow: GenKit
GenKit (a newer open-source framework by Google) is designed to help developers build production-ready AI applications, particularly those leveraging Large Language Models (LLMs). While GenKit itself runs locally for development and is deployed server-side (often as a Cloud Function), it helps you orchestrate the "AI logic" of your application, which could involve making decisions about when to use on-device vs. cloud models, defining tool use, and managing prompts.
Example Scenario: Hybrid Image Analysis App
Let's imagine you're building an app that analyzes user-uploaded photos:

User Interaction (Client-side with ML Kit):

A user takes a photo.
On-device ML Kit immediately runs a lightweight model to detect if a human face is present in the image and blurs it for privacy if desired (low latency, privacy).
It also quickly checks for basic content categories (e.g., "outdoor scene," "food") using a small on-device model.
Upload & Trigger (Firebase Storage & Cloud Functions):

If the user approves, the blurred image is uploaded to Firebase Cloud Storage.
An HTTP-triggered Cloud Function (or a function triggered by a Storage event) is invoked.
Advanced Analysis (Cloud Functions & Vertex AI):

The Cloud Function downloads the image from Storage.
It then calls a more powerful, custom model deployed on Vertex AI to perform detailed analysis:
Advanced object recognition (e.g., "brand of clothing," "breed of dog").
Sentiment analysis of any text detected in the image.
Image captioning.
The Cloud Function can also integrate with other Google Cloud AI APIs (e.g., Vision AI for landmark detection).
Result Delivery (Cloud Firestore):

The Cloud Function stores the rich analysis results (e.g., JSON object with identified objects, captions, etc.) in Cloud Firestore.
The client app, subscribed to changes in Firestore, instantly receives these detailed results and displays them to the user.
In summary, while there isn't a single "Firebase AI Logic SDK," you can absolutely build sophisticated AI-powered applications using a hybrid inference approach by combining Firebase ML Kit for on-device smarts with Cloud Functions for Firebase and Vertex AI for powerful, scalable cloud-based machine learning.

The response reflects the knowledge cutoff: it references older Firebase libraries like ML Kit and says nothing about the recent Firebase AI Logic hybrid inference SDKs.

Results with Developer Knowledge MCP

By adding the Developer Knowledge MCP server, we can get much better results. We can add a toolset that loads all of the tools provided by the MCP server, like so:

from google.adk.agents import Agent
from google.adk.tools.mcp_tool import McpToolset
from google.adk.tools.mcp_tool.mcp_session_manager import StreamableHTTPConnectionParams

DEVELOPER_KNOWLEDGE_API_KEY = "YOUR_DEVELOPER_KNOWLEDGE_API_KEY"

root_agent = Agent(
    model="gemini-2.5-flash",
    name="root_agent",
    instruction="Answer user questions to the best of your knowledge.",
    description="A helpful assistant for user questions.",
    tools=[
        McpToolset(
            connection_params=StreamableHTTPConnectionParams(
                url="https://developerknowledge.googleapis.com/mcp",
                headers={"X-Goog-Api-Key": DEVELOPER_KNOWLEDGE_API_KEY},
            ),
        )
    ],
)
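For context on what the connection parameters above represent: the toolset talks to the server over plain HTTPS, POSTing JSON-RPC messages to the MCP endpoint with the API key sent as a header. A minimal standard-library sketch of that transport (the request is built but not sent here, and the real toolset also handles the MCP session handshake, which this omits) looks like:

```python
import json
import urllib.request

DEVELOPER_KNOWLEDGE_API_KEY = "YOUR_DEVELOPER_KNOWLEDGE_API_KEY"

# Build (but do not send) the kind of HTTP request the toolset issues:
# a JSON-RPC 2.0 message POSTed to the MCP endpoint, authenticated with
# the X-Goog-Api-Key header. This only illustrates the transport shape.
payload = json.dumps({"jsonrpc": "2.0", "id": 1, "method": "tools/list"})
request = urllib.request.Request(
    url="https://developerknowledge.googleapis.com/mcp",
    data=payload.encode("utf-8"),
    headers={
        "Content-Type": "application/json",
        "X-Goog-Api-Key": DEVELOPER_KNOWLEDGE_API_KEY,
    },
    method="POST",
)
print(request.get_method(), request.full_url)
```

Again, McpToolset does all of this for you; the sketch only shows why the configuration needs exactly a URL and an API-key header.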

Once again, we send the exact same query to the service and inspect the results.

View Output
The Firebase AI Logic SDK enables you to build AI-powered applications with "hybrid inference," meaning your app can leverage on-device AI models when available and seamlessly fall back to cloud-hosted models otherwise. This offers a balance of performance, cost-efficiency, and availability.

Here's a breakdown of the Firebase AI Logic SDK and how to use hybrid inference:

What is the Firebase AI Logic SDK?

The Firebase AI Logic SDK is a tool designed to integrate AI capabilities into your web and Android applications. It's currently available as a "Preview" feature for Web and "Experimental" for Android, meaning it's still under development and may undergo changes.

Key Benefits of Hybrid Inference:

Enhanced Privacy: On-device models process sensitive data locally, keeping it off the cloud.
Local Context: Models can leverage local device data for more relevant responses.
Cost-Free Inference: On-device inference doesn't incur cloud costs.
Offline Functionality: AI features can still be used even without an internet connection.
Wider Audience Reach: By intelligently switching between on-device and cloud models, you can ensure your AI features are accessible to more users, regardless of device capabilities or internet connectivity.
How to Use Hybrid Inference:

The core concept of hybrid inference lies in configuring the SDK to prioritize either on-device or cloud inference, or to explicitly use only one.

Supported Platforms and Underlying APIs:

Web Apps: On-device inference is supported for web apps running on Chrome on Desktop, utilizing the [Prompt API from Chrome](https://developer.chrome.com/docs/extensions/ai/prompt-api). Cloud-hosted inference uses your chosen Gemini API provider (Gemini Developer API or Vertex AI Gemini API).
Android Apps: On-device inference is supported on [specific Android devices](https://developers.google.com/ml-kit/genai#prompt-device) and uses the [Prompt API from ML Kit](https://developers.google.com/ml-kit/genai/prompt/android). Cloud-hosted inference also uses a Gemini API provider.
Getting Started (General Steps):

Create a Firebase Project: Set up a Firebase project and register your web or Android application.
Install the SDK: Install the Firebase SDK in your project (e.g., npm install firebase for web).
Initialize Firebase: Initialize Firebase in your application.
Inference Modes:

The Firebase AI Logic SDK provides several inference modes to control how hybrid inference behaves:

PREFER_ON_DEVICE (Default in examples): This mode attempts to use the on-device model first. If the on-device model is unavailable or doesn't support the request type, it will log an error on the device and automatically fall back to the cloud-hosted model.
ONLY_ON_DEVICE: This mode strictly attempts to use only the on-device model. If the on-device model is not available or cannot handle the request, it will throw an exception.
PREFER_IN_CLOUD: This mode prioritizes the cloud-hosted model if the device is online and the model is available. If the device is offline, it will fall back to the on-device model. In other failure scenarios, it will throw an exception.
ONLY_IN_CLOUD: This mode attempts to use only the cloud-hosted model if the device is online and the model is available. Otherwise, it will throw an exception.
Supported Capabilities for On-Device Inference:

Single-turn text generation (not multi-turn chat).
Streaming or non-streaming output.
Generating text from text-only input.
Generating text from text-and-image input (JPEG and PNG for web, single Bitmap for Android).
Generating structured output (e.g., JSON, enums).
Features Not Yet Available for On-Device Inference (as of the preview/experimental release):

For many of these, if PREFER_ON_DEVICE is set, the SDK will automatically fall back to the cloud-hosted model. However, ONLY_ON_DEVICE will throw an error.

Generating text from image file types other than JPEG and PNG.
Generating text from audio, video, and document (e.g., PDF) inputs.
Generating images using Gemini or Imagen models.
Providing files using URLs in multimodal requests (files must be provided as inline data).
Multi-turn chat.
Bi-directional streaming with the Gemini Live API.
Providing the model with tools (e.g., function calling, code execution, URL context, grounding with Google Search).
Counting tokens (this will always throw an error as counts differ between cloud and on-device models).
AI monitoring in the Firebase console for on-device inference (cloud-based inference can be monitored).
It's important to review the latest documentation for any updates, especially as these features are in preview/experimental stages.

The results

We can see that this result has much higher fidelity, with more details about the service, and it even includes relevant links in its response. It surfaced far more information about how to use these modern, recently released features. The Developer Knowledge MCP server worked really well in this sample, and there are even more places where it adds value. For instance, since we used Gemini, we could have simply enabled the Google Search tool for Gemini, but not all model providers offer a search experience on top of their models. In those cases the MCP server really shines, giving any model access to highly relevant, authoritative information. There are many more ways to use the Developer Knowledge MCP server. If you have used it in a unique way, please let me know on LinkedIn or X.
