The Privacy Guardrail: How to Implement Warnings for On-Device LLM Failures

Leverage Chrome’s on-device Prompt API to deliver a private, infrastructure-free LLM experience. While on-device models like Gemini Nano offer improved privacy by keeping data local, hybrid AI experiences often require a fallback to a cloud model when local resources are unavailable. This post details how to implement essential transparency warnings and give users a choice to proceed, ensuring reliable functionality while maintaining user trust and preserving privacy as your core value proposition.

Chrome's on-device Prompt API lets you offer a private LLM experience without spinning up your own infrastructure. That privacy, however, is limited by the user's browser choice and by the device they happen to be using. In this post we look at how to warn users when a request may be routed to a cloud model instead of the local one.

On-device inference with Chrome’s Prompt API and React

My colleague Jeff Huleatt wrote an article about the caveats of using the on-device model. Despite those caveats, one undeniable advantage the on-device model has over any other is improved privacy: it runs on the device (it's in the name), so no network is required. This also benefits progressive web apps installed on a user's device that offer an offline-first experience. Paired with other PWA APIs like the File System API, this capability enables unique privacy-preserving experiences. In some cases files never leave the device, so you could, for example, build an entirely on-device file categorizer.
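Before relying on the private path, you need to know whether the local model can run at all. Here is a minimal sketch of that check using Chrome's Prompt API (the global `LanguageModel` object and its availability states come from that API; `shouldWarnUser` and `checkLocalModel` are hypothetical helper names, not part of any library):

```javascript
// Hypothetical helper: decide whether to warn, given a Prompt API
// availability value ("unavailable", "downloadable", "downloading",
// or "available").
function shouldWarnUser(availability) {
  // Anything other than "available" means the prompt may leave the
  // device (or fail entirely), so surface a warning before proceeding.
  return availability !== "available";
}

// Hypothetical helper: query the Prompt API, guarding against browsers
// that do not expose it at all.
async function checkLocalModel() {
  if (!("LanguageModel" in self)) return "unavailable"; // no Prompt API
  return LanguageModel.availability();
}
```

The guard on `"LanguageModel" in self` matters because the Prompt API is only present in supporting Chrome versions; every other browser takes the cloud path.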

Warnings when you leave the device

My other colleague Cynthia Wang had the clever idea that when you use hybrid inference and a user is on a device where the local model is unavailable, you should at least throw a warning. This is especially true if your value proposition is a privacy-preserving feature. In Cynthia's article she demonstrates that when you offer a hybrid AI experience (on-device plus provider model), you should warn the user whenever Gemini Nano is not available. You may also want to show a toggle, greyed out when no local model is available. This gives the user a reliable fallback if they are okay with sending their data off device, and it keeps the app from breaking entirely when there is no on-device model.
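The toggle behavior described above can be sketched as a small piece of derived UI state. The names here are illustrative and not from Cynthia's article:

```javascript
// Derive the privacy toggle's UI state from local-model availability.
// Illustrative helper, not from Cynthia's article or any library.
function toggleState(nanoAvailable, userPrefersLocal) {
  return {
    // Grey out the toggle when there is no local model to switch to.
    disabled: !nanoAvailable,
    // Force the cloud path when Gemini Nano is absent, regardless of
    // the user's stated preference.
    checked: nanoAvailable && userPrefersLocal,
    // Warn that data may leave the device.
    showCloudWarning: !nanoAvailable,
  };
}
```

Keeping this as a pure function makes the policy easy to test separately from whichever component library renders the toggle.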

Here is the code block in its original form from Cynthia’s blog article.

useEffect(() => {
  const checkNanoAvailability = async () => {
    // Check availability of the on-device model
    const { languageModelProvider, onDeviceParams } = metadataModel.chromeAdapter;
    if (await languageModelProvider?.availability(onDeviceParams.createOptions) !== "available") {
      console.warn("Gemini Nano is not available. Falling back to cloud model.");
      setShowNanoAlert(true); // IMPORTANT TO WARN USERS
    } else {
      console.log("Gemini Nano is available and ready.");
    }
  };
  checkNanoAvailability();
}, []);

<>
  {showNanoAlert && (
    <div className="modal-overlay">
      <div className="modal-content">
        {/* Omitted for brevity */}
      </div>
    </div>
  )}
</>

Conclusion

If you are using the local on-device model to offer a privacy-preserving feature, you should always consider throwing a warning when the user may be switched to a provider-hosted model. Giving users the choice of whether to proceed helps maintain trust and gives them transparency into how their data is being processed.
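That choice can be made explicit in the routing decision itself. The sketch below assumes a hypothetical `resolveBackend` helper that you would wire to your warning modal's proceed and cancel buttons:

```javascript
// Route a prompt only after the user has had a chance to consent.
// Hypothetical helper; "userConsentedToCloud" would be set by the
// proceed button of the warning modal shown earlier.
function resolveBackend(nanoAvailable, userConsentedToCloud) {
  if (nanoAvailable) return "on-device"; // private path, no warning needed
  // No local model: proceed to the cloud only with explicit consent,
  // otherwise block the request rather than silently sending data off device.
  return userConsentedToCloud ? "cloud" : "blocked";
}
```

Returning an explicit "blocked" state, rather than defaulting to the cloud, keeps the privacy promise intact when the user declines.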
