BigQuery AI Functions Explained: Run Gemini in SQL
A practitioner's guide to BigQuery AI functions: what AI.GENERATE, AI.GENERATE_TABLE, and AI.SIMILARITY do, when to use each, and how context caching cuts cost.
Caleb North is a fictional AI persona, not a real person. This article was written by AI and reviewed by a human editor before publishing. How we work →

Most teams that want to run a model over their data still treat it as a separate system. You export rows, push them to an inference service, get JSON back, and reload it. That round trip is where pipelines rot — the schema drifts, a batch silently fails, and nobody notices until a dashboard looks wrong three days later.
BigQuery's AI functions collapse that loop. The model call becomes a function in your SQL, the results land in a table, and the whole thing runs inside the warehouse you already trust. That's the real pitch here — not "AI in your database" as a slogan, but one fewer moving part to babysit at 3am.
Here's what each function actually does, when to reach for it, and the rough shape of the SQL.
The setup you do once
Before any of the AI functions work, you connect BigQuery to a Gemini model. You create a remote model that points at a Gemini endpoint through a Cloud resource connection. After that, the functions know which model to call.
CREATE OR REPLACE MODEL `myproject.mydataset.gemini`
REMOTE WITH CONNECTION `myproject.us.my_connection`
OPTIONS (endpoint = 'gemini-2.5-flash');
The connection is the part people forget. It needs a service account with permission to call the model, and that service account is exactly the kind of secret that quietly breaks a deploy when someone rotates it. Write down where it lives.
AI.GENERATE: one model call per row
AI.GENERATE is the workhorse. You pass it a prompt — which can reference columns — and it returns a STRUCT containing the generated value, the full model response, and a status field. The status field matters more than the result field on a bad day, because it's how you find the rows that failed without scanning logs.
SELECT
review_id,
AI.GENERATE(
('Summarize this review in one sentence: ', review_text),
connection_id => 'us.my_connection',
endpoint => 'gemini-2.5-flash'
).result AS summary
FROM `myproject.mydataset.reviews`;
Use this for summaries, translation, free-text extraction — anything where you want one generated string per row and you're fine with unstructured output. There are typed cousins (a boolean-returning variant, for example) when you want a clean column type back instead of parsing a string.
The honest caveat: this is a model call per row. Run it across ten million rows without thinking and you'll feel it in both latency and the bill. Filter first. Materialize the result. Don't call it live in a dashboard query.
AI.GENERATE_TABLE: structured output at scale
This is the one I'd point most data teams at. AI.GENERATE_TABLE takes a prompt and an output schema, and returns rows shaped to that schema instead of a blob you have to parse. It went GA recently, and it now accepts mixed inputs — text, images, video, and documents — not just text.
SELECT *
FROM AI.GENERATE_TABLE(
MODEL `myproject.mydataset.gemini`,
(SELECT ticket_text AS prompt FROM `myproject.mydataset.tickets`),
STRUCT('category STRING, urgency INT64, customer_sentiment STRING' AS output_schema)
);
The schema is the whole point. You define category STRING, urgency INT64, sentiment STRING, and you get typed columns back. That's classification, sentiment scoring, field extraction, image captioning, and transcription — all expressed as "here are my rows, here's the shape I want, fill it in."
Two things from the failure-mode side. First, the schema is a request, not a guarantee — the model can still hand you a null or an off-type value, so validate the output table before anything joins against it. Second, because it writes a real table, it slots cleanly into scheduled queries and incremental loads. That's the maintainable pattern: extract once, store, read many times.
AI.SIMILARITY: semantic search without the plumbing
AI.SIMILARITY is the most streamlined way to do semantic search in BigQuery. You hand it two inputs, it generates embeddings for both and returns the cosine similarity as a FLOAT64. Embedding generation and the similarity math happen in one step.
SELECT
doc_id,
AI.SIMILARITY(
doc_text,
'How do I reset my password?',
connection_id => 'us.my_connection'
) AS score
FROM `myproject.mydataset.help_docs`
ORDER BY score DESC
LIMIT 5;
When it's the right tool: ad hoc relevance ranking, dedup, quick "find me rows like this one" work. When it's the wrong tool: a high-traffic search feature over millions of rows. For that, precompute embeddings into a column once and use a vector index — generating embeddings on the fly for every query is the kind of convenience that's lovely in a notebook and ruinous in production.
Context caching: the cost lever you'll actually use
Here's the part that pays for itself. If your prompt includes a large static block — a long reference document, a taxonomy, a set of few-shot examples — you're paying for those input tokens on every single row. Across a big batch, that's most of your cost, and it's pure waste.
Gemini context caching lets you store that static context once and reference it by ID on subsequent calls instead of re-sending it. Google has described savings on the order of a 90% input-token discount on the cached portion for large batch jobs.
The pattern: create the cache, then pass its ID through model_params so each call points at the cached content instead of paying to re-process it. If you're doing few-shot prompting or grounding against a fixed reference set across thousands or millions of rows, this isn't an optimization to do later — it's the difference between a job you can afford to run nightly and one you run once and regret.
When this is the right tool — and when it isn't
Reach for BigQuery AI functions when the data already lives in BigQuery and you want enrichment to be just another query in the same governance, the same scheduling, the same access controls. Batch classification, extraction, summarization, semantic ranking — this is the sweet spot, and keeping it in-warehouse means one fewer service to secure and monitor.
Don't reach for it when you need sub-second interactive responses, tight control over retries and streaming, or model behavior the warehouse abstraction hides from you. Those belong in an application-layer service.
The take: these functions are genuinely good plumbing for the unglamorous middle of a data pipeline. Materialize your results instead of calling models live, validate the output schema before you trust it, and turn on context caching before you run anything at scale. Do those three things and this stays boring — which, for production, is the highest compliment I've got.
Frequently asked questions
Do I need a separate ML pipeline to call Gemini from BigQuery?
No. You create a remote model that points at a Gemini endpoint through a Cloud resource connection, then call the AI functions in normal SQL. The inference runs server-side and results land in your tables.
What's the difference between AI.GENERATE and AI.GENERATE_TABLE?
AI.GENERATE returns a single STRUCT per row with the generated value plus the full response and a status. AI.GENERATE_TABLE writes structured rows that match an output schema you define, so it's the better fit for classification or extraction across many rows.
How does context caching save money?
If you send the same large reference text or few-shot examples on every row, you pay for those input tokens every time. Caching that static context once lets repeated calls reference the cache instead of re-sending it, cutting input token cost substantially on large batch jobs.
Can these functions read images and documents, not just text?
Yes. The current generation of these functions accepts mixed inputs including text, images, video, and documents, so you can caption images or extract fields from scanned files in the same SQL you use for text.
The StackBrief weekly
New reviews and the AI-coding-tool news worth knowing — with our take. One email a week, unsubscribe anytime.
Keep reading

What's New in Google Workspace AI at Cloud Next '26
A practical roundup of Google Workspace AI changes from Cloud Next '26 — what actually changed for everyday Docs, Sheets, Gmail, and Meet users and teams.
June 4, 2026
Gemini Enterprise for Customer Experience, Explained
Gemini Enterprise for Customer Experience: Google's prebuilt Shopping and Food Ordering agents plus an Omnichannel Gateway that holds context across channels.
June 5, 2026
What Is Vertex AI Memory Bank? Agent Memory Explained
A practitioner's guide to Vertex AI Memory Bank: managed long-term memory that lets agents remember preferences and past sessions instead of starting fresh.
June 5, 2026