Tutorial · 9 min read · Updated May 15, 2026

How to Add OpenAI to Next.js (2026 Step-by-Step Guide)

Adding OpenAI to Next.js takes three things: an API key, a Route Handler that proxies your requests (never call OpenAI from the browser), and a streaming UI pattern that doesn't make users wait 20 seconds for the full response. This guide covers all three, plus the production gotchas — runtime selection, rate limiting, and cost control.

The architecture in one diagram

Browser → your Next.js Route Handler → OpenAI API → stream tokens back through your handler → render incrementally in the browser. The Route Handler exists for one reason: hiding your API key. If you call OpenAI directly from the client, your key ships in the JavaScript bundle and leaks to every visitor.

The streaming hop is what makes the UX feel fast. OpenAI sends Server-Sent Events (SSE); Next.js Route Handlers forward them with `new Response(stream)`; the browser reads tokens as they arrive. Users see the first word in ~200ms instead of waiting 20s for a complete answer.
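To make that hop concrete, here is a minimal sketch of the raw pattern using only the official SDK (Step 2 replaces this with the AI SDK's higher-level helpers; the model name is illustrative):

```ts
// app/api/chat/route.ts — manual forwarding, no AI SDK
import OpenAI from 'openai';

const client = new OpenAI(); // reads OPENAI_API_KEY from the environment

export async function POST(req: Request) {
  const { messages } = await req.json();

  const stream = await client.chat.completions.create({
    model: 'gpt-4o-mini',
    messages,
    stream: true,
  });

  // toReadableStream() re-serializes the SSE chunks as newline-delimited
  // JSON that the browser can read incrementally
  return new Response(stream.toReadableStream());
}
```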

Step 1 — Set up the API key and SDK

Get an API key from platform.openai.com → dashboard → API keys → 'Create new secret key'. Store it as `OPENAI_API_KEY` in `.env.local`. NEVER prefix with `NEXT_PUBLIC_` — that exposes it in client bundles.
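For reference, the env file is a single line (the key value below is a placeholder):

```bash
# .env.local — keep this file out of version control
OPENAI_API_KEY=sk-...
```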

Install the official SDK: `npm i openai`. For ergonomic streaming + React hooks, also install Vercel's AI SDK: `npm i ai @ai-sdk/openai`. Most production apps use the AI SDK as the wrapper; it abstracts the streaming protocol and handles error states.

Add the env var to Vercel: `vercel env add OPENAI_API_KEY production` (paste the key when prompted). Or use the Vercel dashboard → Settings → Environment Variables.

Step 2 — Create the Route Handler

Create `app/api/chat/route.ts`. Set `export const runtime = 'edge'` for the fastest cold start (Edge functions warm in ~50ms vs ~500ms for Node), unless you need Node-only APIs. Use `streamText()` from the AI SDK with the OpenAI provider and return `result.toDataStreamResponse()`.
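A minimal sketch of the handler (AI SDK v4-style API; the model choice is illustrative, and validation is added next):

```ts
// app/api/chat/route.ts
import { openai } from '@ai-sdk/openai';
import { streamText } from 'ai';

export const runtime = 'edge';

export async function POST(req: Request) {
  const { messages } = await req.json(); // validation added below

  const result = streamText({
    model: openai('gpt-4o-mini'), // swapping models is a one-line change
    messages,
  });

  // Streams tokens back to the browser in the AI SDK's data protocol
  return result.toDataStreamResponse();
}
```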

Pass through the messages from the request body. Validate the input with Zod — `messages: z.array(z.object({ role: z.enum(['user', 'assistant', 'system']), content: z.string().max(10000) }))`. Without validation, anyone can fire huge prompts at your billing.
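Wired into the handler, that validation might look like this sketch (the limits are the ones above; adjust to your app):

```ts
import { z } from 'zod';

const bodySchema = z.object({
  messages: z.array(
    z.object({
      role: z.enum(['user', 'assistant', 'system']),
      content: z.string().max(10_000), // reject oversized prompts outright
    })
  ),
});

export async function POST(req: Request) {
  const parsed = bodySchema.safeParse(await req.json());
  if (!parsed.success) {
    return new Response('Invalid request body', { status: 400 });
  }
  const { messages } = parsed.data;
  // ...hand off to streamText as above
}
```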

Set a system prompt that constrains the model to your domain. For a documentation chat, system = 'You only answer questions about our product. Refuse off-topic questions politely.' This is the single biggest lever for output quality — far more than the model choice.

Step 3 — Wire the frontend

Use `useChat()` from `@ai-sdk/react` — it handles streaming state, message history, input control, and submit handling in one hook. Or implement it manually with `fetch` and a `ReadableStream` reader.

Render messages with a 'thinking…' indicator on the most recent assistant message while it's streaming. The AI SDK's `useChat` exposes a `status` field (`submitted`, `streaming`, `ready`, `error`) for this.

Add a textarea + submit button that POSTs to your route handler. Disable the input while a response is streaming. Show the last assistant message at the top of the visible area; auto-scroll on new tokens.
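Putting Step 3 together, a minimal client component might look like the sketch below (AI SDK v4-style API; styling and auto-scroll omitted):

```tsx
'use client';

import { useChat } from '@ai-sdk/react';

export function Chat() {
  // Defaults to POSTing to /api/chat — the route handler from Step 2
  const { messages, input, handleInputChange, handleSubmit, status } = useChat();
  const busy = status === 'submitted' || status === 'streaming';

  return (
    <div>
      {messages.map((m) => (
        <p key={m.id}>
          <strong>{m.role}:</strong> {m.content}
        </p>
      ))}
      {status === 'submitted' && <p>thinking…</p>}

      <form onSubmit={handleSubmit}>
        <textarea
          value={input}
          onChange={handleInputChange}
          disabled={busy} // lock input while a response streams
        />
        <button type="submit" disabled={busy}>
          Send
        </button>
      </form>
    </div>
  );
}
```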

Step 4 — Rate limit aggressively

OpenAI charges by token. Without rate limiting, a malicious user can burn your monthly budget in minutes by hitting your endpoint in a loop with long prompts. Use Upstash Redis + `@upstash/ratelimit` for distributed rate limiting on serverless.

Sensible limits: 20 messages per IP per hour for anonymous users; 100 per hour for signed-in users; 1000 for paying customers. Return HTTP 429 with a `Retry-After` header when exceeded.
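A sketch of the anonymous tier with Upstash (assumes `UPSTASH_REDIS_REST_URL` and `UPSTASH_REDIS_REST_TOKEN` are set):

```ts
import { Ratelimit } from '@upstash/ratelimit';
import { Redis } from '@upstash/redis';

// 20 requests per rolling hour per identifier — the anonymous tier
const ratelimit = new Ratelimit({
  redis: Redis.fromEnv(),
  limiter: Ratelimit.slidingWindow(20, '1 h'),
});

export async function POST(req: Request) {
  const ip = req.headers.get('x-forwarded-for') ?? 'anonymous';
  const { success, reset } = await ratelimit.limit(ip);

  if (!success) {
    // reset is a unix timestamp (ms); convert to seconds for Retry-After
    const retryAfter = Math.max(0, Math.ceil((reset - Date.now()) / 1000));
    return new Response('Too many requests', {
      status: 429,
      headers: { 'Retry-After': String(retryAfter) },
    });
  }
  // ...validated chat handling continues here
}
```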

Also implement per-user spend caps in your own database — track the token count per user per day, refuse if over the cap. The Upstash rate limit guards against attacks; the spend cap guards against runaway costs from a paying user with a bug.
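A sketch of the cap check; `getDailyTokens` is a hypothetical helper over your own usage table, and the cap value is illustrative:

```ts
const DAILY_TOKEN_CAP = 200_000; // illustrative — tune to your margins

// getDailyTokens(userId) would run something like:
// SELECT SUM(prompt_tokens + completion_tokens) FROM usage
//   WHERE user_id = ? AND day = CURRENT_DATE
async function underDailyCap(
  userId: string,
  getDailyTokens: (userId: string) => Promise<number>
): Promise<boolean> {
  return (await getDailyTokens(userId)) < DAILY_TOKEN_CAP;
}

// In the route handler, before streaming:
// if (!(await underDailyCap(userId, getDailyTokens))) {
//   return new Response('Daily budget exceeded', { status: 402 });
// }
```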

Production gotchas

Timeouts: Vercel Edge functions cap at 30s; OpenAI requests can run longer with `gpt-4-turbo` on long contexts. Either switch to a Node runtime with `maxDuration: 60` or break long generations into chunks.
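The Node-runtime escape hatch is two route segment config exports:

```ts
// app/api/chat/route.ts — opt out of Edge when generations may exceed its cap
export const runtime = 'nodejs';
export const maxDuration = 60; // seconds; the ceiling depends on your Vercel plan
```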

Cost: log token usage for every request to your DB (`promptTokens`, `completionTokens`, `model`). Without this you can't diagnose which feature is burning your budget. The AI SDK exposes usage on the final stream event.
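With the AI SDK, the numbers arrive in the `onFinish` callback (v4 field names shown); `saveUsage` is a hypothetical insert into your own usage table:

```ts
import { openai } from '@ai-sdk/openai';
import { streamText } from 'ai';

const result = streamText({
  model: openai('gpt-4o-mini'),
  messages,
  onFinish: async ({ usage }) => {
    // usage carries promptTokens, completionTokens, totalTokens
    await saveUsage({
      model: 'gpt-4o-mini',
      promptTokens: usage.promptTokens,
      completionTokens: usage.completionTokens,
    });
  },
});
```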

Caching: identical prompts can be cached if your use case allows it. OpenAI's prompt caching kicks in automatically for prompt prefixes of 1024+ tokens (a long system prompt counts toward this); add your own response cache for deterministic queries.
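For the response-cache side, a hedged sketch reusing the Upstash client from the rate-limiting step (only safe for deterministic, non-personalized queries):

```ts
import { Redis } from '@upstash/redis';

const redis = Redis.fromEnv();

// Stable cache key from the prompt text (Web Crypto works on Edge and Node 18+)
async function promptKey(prompt: string): Promise<string> {
  const digest = await crypto.subtle.digest(
    'SHA-256',
    new TextEncoder().encode(prompt)
  );
  return (
    'chat:' +
    Array.from(new Uint8Array(digest))
      .map((b) => b.toString(16).padStart(2, '0'))
      .join('')
  );
}

async function cachedAnswer(prompt: string, generate: () => Promise<string>) {
  const key = await promptKey(prompt);
  const hit = await redis.get<string>(key);
  if (hit) return hit;
  const answer = await generate();
  await redis.set(key, answer, { ex: 60 * 60 * 24 }); // 24h TTL
  return answer;
}
```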

Streaming + Suspense: Next.js Suspense boundaries can interfere with streaming responses. Render the chat UI as a Client Component to avoid suspense boundaries swallowing intermediate tokens.

How to do it

  1. Get an OpenAI API key

    platform.openai.com → API keys → Create new secret key. Store as OPENAI_API_KEY in .env.local — never prefix with NEXT_PUBLIC_.

  2. Install the SDKs

    npm i openai ai @ai-sdk/openai @ai-sdk/react. The AI SDK from Vercel wraps OpenAI's streaming protocol and ships React hooks.

  3. Create app/api/chat/route.ts

    Set runtime = 'edge'. Use streamText() with the OpenAI provider. Return result.toDataStreamResponse(). Validate messages with Zod before forwarding.

  4. Build the chat UI client component

    Use useChat() from @ai-sdk/react — it returns messages, input, handleSubmit, status. Render messages with a 'streaming' indicator on the in-flight one.

  5. Add rate limiting

    Wrap your route handler with @upstash/ratelimit. 20 req/hour anonymous, 100 signed-in. Without this, abuse will burn your billing.

  6. Deploy + monitor token usage

    Push to Vercel. Log usage.promptTokens + usage.completionTokens per request to your DB. Set up a billing alert in the OpenAI dashboard at your budget threshold.

Frequently asked questions

Should I use the OpenAI SDK or Vercel's AI SDK?

Both. The OpenAI SDK is the official client. The AI SDK is Vercel's wrapper that adds streaming primitives + React hooks. Use AI SDK as the entry point in most apps; drop down to the OpenAI SDK directly only for features the AI SDK doesn't expose (e.g., specific moderation endpoints).

Edge runtime or Node runtime?

Edge is faster (50ms cold start vs 500ms) and cheaper, but limits you to 30s execution time and lacks Node APIs (fs, child_process). For pure OpenAI chat, Edge is the right default. Use Node when you need long-running generations (>30s) or to call other server-only libraries during the request.

Which model should I default to?

gpt-4o-mini for cost-sensitive use cases (~95% as good as gpt-4o at 1/10th the cost). gpt-4o for highest quality. gpt-4-turbo for very long context windows. Choose based on quality requirements + budget. The AI SDK makes swapping a one-line change.

How do I prevent abuse?

Three layers: (1) IP-based rate limit at the edge (Upstash); (2) per-user spend cap in your DB; (3) input length cap (reject prompts > 10k chars). Without all three you'll get burned the first time a bad actor finds your endpoint.

Ready to build?

Try InBuild for free — describe what you want, get a complete site in 30 seconds, export the code anytime.
