// AI NATIVE STACK


CRASH COURSE · AI-NATIVE · intermediate · 28 min read · v1.0

LangChain v1 — the standard framework for building agents.


TL;DR — LangChain is the glue between your code and any LLM. As of v1.0 (GA, Oct 2025) it stopped being a grab-bag of chains and became an agent framework: one function, create_agent(), builds a production agent on the LangGraph runtime, and a middleware system lets you control every step of the loop. Legacy chains moved to langchain-classic. This is the crash course for the v1 era.

What LangChain is

LangChain is an open-source framework for building applications on top of large language models. It gives you one consistent interface over every model provider (OpenAI, Anthropic, Google, AWS Bedrock, local models via Ollama) plus the building blocks an LLM app actually needs — messages, prompts, tools, structured output, retrieval, memory — and a runtime to run agents reliably.

In the AI Native landscape it sits in AI Agent › Agent Framework: the application layer that turns a raw model endpoint into something that retrieves context, calls tools, loops, and makes decisions.

The mental model is three layers:

Your app: create_agent(...) · invoke / stream · your tools, prompts, schemas
LangChain abstractions: Runnable interface · messages & content_blocks · @tool · middleware · structured output · retrieval · memory (built on the LangGraph runtime)
Providers: OpenAI · Anthropic · Google · Bedrock · Ollama

Fig 1 — You write against LangChain's interfaces; the provider underneath is swappable.

What changed in v1 (read this first)

If you learned LangChain in the 0.x days, almost everything you remember about chains is now legacy. The v1.0 release (October 2025) is the first stable major version — a commitment to no breaking changes until 2.0 — and it re-centered the whole framework on agents. The headlines:

  • create_agent() is the new front door. It replaces langgraph.prebuilt.create_react_agent and the old AgentExecutor. One call gives you a tool-calling agent on the LangGraph runtime.
  • Middleware. Six hook points around the agent loop let you inject retries, summarization, PII redaction, human-in-the-loop, guardrails — without rewriting the agent.
  • Standard content blocks. message.content_blocks gives a provider-agnostic view of text, reasoning, tool calls, and images.
  • Structured output in the main loop. The agent can return a typed object without a second LLM call.
  • Clean namespace. Core imports live under langchain.*; deprecated chains/retrievers/indexing moved to langchain-classic.

Migration note: Old tutorials import LLMChain, ConversationChain, RetrievalQA, initialize_agent. Those still exist, but in langchain-classic (pip install langchain-classic). For anything new, use create_agent + middleware instead.

Install & setup

Install the core package plus the provider integration you want. New model names work without upgrading LangChain because provider packages pass the name straight through.

# core framework + a provider
pip install langchain langchain-openai
# or: langchain-anthropic, langchain-google-genai, langchain-aws, langchain-ollama

export OPENAI_API_KEY=sk-...

Models — the core primitive

Everything bottoms out in a chat model. init_chat_model builds one for any provider from a single string:

from langchain.chat_models import init_chat_model

model = init_chat_model("gpt-5.4", model_provider="openai")
# shorthand also works: init_chat_model("openai:gpt-5.4")

resp = model.invoke("Why do parrots talk?")
print(resp.text)            # convenience accessor for plain text

Every model supports the same core verbs: invoke, stream, and batch, each with an async a* twin (ainvoke, astream, abatch):

for chunk in model.stream("Explain vector databases"):
    print(chunk.text, end="", flush=True)

answers = model.batch(["What is RAG?", "What is an embedding?"])

Pass conversation history as a list of role/content dicts (or LangChain message objects):

resp = model.invoke([
    {"role": "system", "content": "Translate English to French."},
    {"role": "user",   "content": "I love building applications."},
])

Common knobs: temperature, max_tokens, timeout, max_retries (default 6).

Content blocks

Different providers return text, reasoning traces, and images in different shapes. v1's .content_blocks normalizes them so your code is provider-agnostic:

resp = model.invoke("Think step by step, then answer.")
for block in resp.content_blocks:
    if block["type"] == "reasoning":
        print("THINKING:", block["reasoning"])
    elif block["type"] == "text":
        print("ANSWER:", block["text"])
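
To see why a normalized shape helps, here is a minimal sketch that works on plain dictionaries shaped like the blocks in the loop above. The split_blocks helper and the sample data are hypothetical, and real content_blocks may carry more block types and keys than the two handled here.

```python
from typing import Any


def split_blocks(blocks: list[dict[str, Any]]) -> tuple[str, str]:
    """Return (reasoning, answer) text gathered from normalized blocks."""
    reasoning: list[str] = []
    answer: list[str] = []
    for block in blocks:
        kind = block.get("type")
        if kind == "reasoning":
            # .get() keeps the helper safe if a block omits its text
            reasoning.append(block.get("reasoning", ""))
        elif kind == "text":
            answer.append(block.get("text", ""))
        # any other block type (images, tool calls, ...) is ignored here
    return "\n".join(reasoning), "".join(answer)


# Hypothetical sample data, not output from a real model
sample = [
    {"type": "reasoning", "reasoning": "2 + 2 is basic arithmetic."},
    {"type": "text", "text": "The answer is "},
    {"type": "text", "text": "4."},
]
thinking, final = split_blocks(sample)
```

Because the helper only looks at "type", the same code would work regardless of which provider produced the message, which is the point of the normalized view.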

LCEL — composing with the pipe

The LangChain Expression Language is still the backbone for non-agent pipelines. Every component — prompts, models, parsers, retrievers — implements the Runnable interface (invoke/batch/stream), and the | operator wires them into one Runnable that inherits sync, async, batch, and streaming for free.

from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

prompt = ChatPromptTemplate.from_template("Explain {topic} in one sentence.")
chain  = prompt | model | StrOutputParser()

print(chain.invoke({"topic": "vector databases"}))

Reach for LCEL when you have a fixed sequence of steps (prompt → model → parse, or a retrieval chain). Reach for create_agent when the model needs to decide what to do next.
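
If the pipe operator feels like magic, the idea can be sketched in a few lines of plain Python. This is not LangChain's implementation: the Step class and the fake stages below are hypothetical stand-ins that only show how | can build one composed object that still exposes invoke and batch.

```python
from typing import Any, Callable


class Step:
    """A tiny stand-in for a Runnable: wraps a function, composes with |."""

    def __init__(self, fn: Callable[[Any], Any]) -> None:
        self.fn = fn

    def invoke(self, value: Any) -> Any:
        return self.fn(value)

    def batch(self, values: list[Any]) -> list[Any]:
        # the composed pipeline gets batch() for free by reusing invoke()
        return [self.invoke(v) for v in values]

    def __or__(self, other: "Step") -> "Step":
        # a | b builds a new Step that runs a, then feeds its output to b
        return Step(lambda value: other.invoke(self.invoke(value)))


# Hypothetical stages standing in for prompt, model, and parser
prompt = Step(lambda d: f"Explain {d['topic']} in one sentence.")
fake_model = Step(lambda text: {"content": text.upper()})
parser = Step(lambda msg: msg["content"])

chain = prompt | fake_model | parser
result = chain.invoke({"topic": "vector databases"})
many = chain.batch([{"topic": "a"}, {"topic": "b"}])
```

The real Runnable interface adds streaming and async on top, but the composition idea is the same: each | returns another object with the same verbs.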

Structured output

Stop regex-parsing model text. with_structured_output binds a Pydantic schema and gives you a validated object back:

from pydantic import BaseModel, Field

class Movie(BaseModel):
    title: str  = Field(description="Movie title")
    year: int   = Field(description="Release year")
    rating: float = Field(description="Rating out of 10")

structured = model.with_structured_output(Movie)
movie = structured.invoke("Give me details about Inception")
print(movie.title, movie.year)   # -> Inception 2010

Under the hood this uses the provider's native structured-output / tool-calling support. In an agent you get the same thing via response_format (next section) — without a second model call.
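
For intuition about what "validated object" means, here is a stdlib-only sketch that checks a JSON reply against a schema before building the object. It uses a dataclass rather than Pydantic, and parse_movie is a hypothetical helper; Pydantic's real validation is considerably richer (coercion rules, nested models, error reports).

```python
import json
from dataclasses import dataclass, fields


@dataclass
class Movie:
    title: str
    year: int
    rating: float


def parse_movie(raw: str) -> Movie:
    """Parse a JSON string and check every field against the schema."""
    data = json.loads(raw)
    kwargs = {}
    for field in fields(Movie):
        if field.name not in data:
            raise ValueError(f"missing field: {field.name}")
        value = data[field.name]
        if field.type is float and isinstance(value, int) and not isinstance(value, bool):
            value = float(value)  # accept 9 where 9.0 is expected
        if isinstance(value, bool) or not isinstance(value, field.type):
            raise ValueError(f"{field.name}: expected {field.type.__name__}")
        kwargs[field.name] = value
    return Movie(**kwargs)


movie = parse_movie('{"title": "Inception", "year": 2010, "rating": 9}')
```

A reply with the year as a string, or with a field missing, raises ValueError instead of silently producing a half-valid object. That is the failure mode regex parsing tends to hide.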

Tools & tool calling

A tool is just a Python function the model can choose to call. Decorate it with @tool; the docstring becomes the description the model reads, and the type hints become the argument schema.

from langchain.tools import tool

@tool
def get_weather(location: str) -> str:
    """Get the current weather at a location."""
    return f"It's sunny in {location}."

# bind directly to a model for raw tool-calling:
model_with_tools = model.bind_tools([get_weather])
resp = model_with_tools.invoke("What's the weather in Boston?")
for call in resp.tool_calls:
    print(call["name"], call["args"])   # get_weather {'location': 'Boston'}

bind_tools only asks the model which tool to call — it doesn't run it. The agent loop (next) is what actually executes tools and feeds results back.
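
The claim that the docstring becomes the description and the type hints become the argument schema can be illustrated with the standard inspect module. This is a rough sketch, not what @tool actually emits: describe_tool, the type-name mapping, and the output shape are all assumptions made for illustration.

```python
import inspect
from typing import Any, Callable, get_type_hints

# Assumed mapping from Python types to JSON-schema-style type names
_TYPE_NAMES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def describe_tool(fn: Callable[..., Any]) -> dict[str, Any]:
    """Derive a simple tool description from a function's signature."""
    hints = get_type_hints(fn)
    hints.pop("return", None)  # the return type is not an argument
    params = inspect.signature(fn).parameters
    return {
        "name": fn.__name__,
        "description": inspect.getdoc(fn) or "",
        "parameters": {
            name: _TYPE_NAMES.get(hints.get(name), "string") for name in params
        },
        # parameters without a default value are treated as required
        "required": [
            name for name, p in params.items()
            if p.default is inspect.Parameter.empty
        ],
    }


def get_weather(location: str, days: int = 1) -> str:
    """Get the weather forecast for a location."""
    return f"Sunny in {location} for {days} day(s)."


spec = describe_tool(get_weather)
```

The practical takeaway is that a vague docstring or a missing type hint degrades what the model sees, so both are worth writing carefully.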

Agents — create_agent

This is the headline of v1. create_agent builds a complete tool-calling agent: it calls the model, runs any requested tools, feeds results back, and loops until the model is done — all on the durable LangGraph runtime.

from langchain.agents import create_agent
from langchain.tools import tool

@tool
def search(query: str) -> str:
    """Search the web for information."""
    return f"Results for: {query}"

agent = create_agent(
    model="openai:gpt-5.4",
    tools=[search],
    system_prompt="You are a concise research assistant.",
)

result = agent.invoke(
    {"messages": [{"role": "user", "content": "Find recent news on vLLM"}]}
)
print(result["messages"][-1].content)

before_agent · before_model · MODEL (wrap_model_call) · TOOLS (wrap_tool_call) · after_model · after_agent · loop until done

Fig 2 — The agent loop and the six middleware hook points around it.
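
The loop itself is simple enough to sketch in plain Python. The version below is a simplified illustration under stated assumptions, not the LangGraph runtime: run_agent, the message dictionaries, and scripted_model (a hard-coded stand-in for an LLM) are all hypothetical.

```python
from typing import Any, Callable

Message = dict[str, Any]


def run_agent(
    model: Callable[[list[Message]], Message],
    tools: dict[str, Callable[..., str]],
    messages: list[Message],
    max_steps: int = 5,
) -> list[Message]:
    """Call the model, run requested tools, feed results back, repeat."""
    history = list(messages)
    for _ in range(max_steps):
        reply = model(history)
        history.append(reply)
        calls = reply.get("tool_calls", [])
        if not calls:
            return history  # no tool requests means the model is done
        for call in calls:
            output = tools[call["name"]](**call["args"])
            history.append({"role": "tool", "name": call["name"], "content": output})
    raise RuntimeError("agent did not finish within max_steps")


def search(query: str) -> str:
    return f"Results for: {query}"


def scripted_model(history: list[Message]) -> Message:
    """Hypothetical stand-in for an LLM: search once, then answer."""
    if not any(m["role"] == "tool" for m in history):
        return {"role": "assistant", "content": "",
                "tool_calls": [{"name": "search", "args": {"query": "vLLM"}}]}
    return {"role": "assistant", "content": f"Summary of: {history[-1]['content']}"}


final = run_agent(scripted_model, {"search": search},
                  [{"role": "user", "content": "Find recent news on vLLM"}])
```

The max_steps guard is a design choice in this sketch: without some cap, a model that keeps requesting tools would loop forever.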

Typed responses with response_format

from pydantic import BaseModel

class Answer(BaseModel):
    summary: str
    confidence: float

agent = create_agent("openai:gpt-5.4", tools=[search], response_format=Answer)
result = agent.invoke({"messages": [{"role": "user", "content": "Summarize AI infra trends"}]})
print(result["structured_response"])   # Answer(summary=..., confidence=...)

Middleware — the v1 superpower

Middleware is how you control the agent loop without forking it. Each piece can hook six points: before_agent, before_model, wrap_model_call, wrap_tool_call, after_model, after_agent (see Fig 2). Write them as decorators for one-off hooks, or as a class when you need several hooks + async.

from langchain.agents.middleware import before_model, wrap_model_call, AgentState, ModelRequest, ModelResponse
from langgraph.runtime import Runtime
from typing import Callable, Any

@before_model
def log_turn(state: AgentState, runtime: Runtime) -> dict[str, Any] | None:
    print(f"about to call model with {len(state['messages'])} messages")
    return None

@wrap_model_call
def retry(request: ModelRequest, handler: Callable[[ModelRequest], ModelResponse]) -> ModelResponse:
    for attempt in range(3):
        try:
            return handler(request)
        except Exception:
            if attempt == 2:
                raise

agent = create_agent("openai:gpt-5.4", tools=[search], middleware=[log_turn, retry])
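
Wrap-style hooks nest like layers of an onion. The sketch below shows one way that nesting can work in plain Python. The compose helper, the string-based request type, and the assumption that the first wrapper in the list is the outermost layer are all illustrative choices here, not a statement about LangChain internals.

```python
from typing import Callable

Handler = Callable[[str], str]
Wrapper = Callable[[str, Handler], str]


def compose(wrappers: list[Wrapper], base: Handler) -> Handler:
    """Nest wrappers so the first one in the list is the outermost layer."""
    handler = base
    for wrapper in reversed(wrappers):
        # bind current values as defaults to avoid late-binding closures
        handler = lambda req, w=wrapper, nxt=handler: w(req, nxt)
    return handler


events: list[str] = []
attempts = {"count": 0}


def flaky_model(request: str) -> str:
    """Hypothetical model call that fails once, then succeeds."""
    attempts["count"] += 1
    if attempts["count"] == 1:
        raise TimeoutError("transient failure")
    return f"echo: {request}"


def log_call(request: str, handler: Handler) -> str:
    events.append("log:start")
    response = handler(request)
    events.append("log:end")
    return response


def retry(request: str, handler: Handler) -> str:
    last_error = None
    for attempt in range(3):
        try:
            return handler(request)
        except Exception as exc:
            events.append(f"retry:{attempt}")
            last_error = exc
    raise RuntimeError("all attempts failed") from last_error


call_model = compose([log_call, retry], flaky_model)
response = call_model("hi")
```

Because log_call sits outside retry, it logs one start and one end even though the inner call ran twice. Swapping the list order would log every attempt instead.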

And the batteries-included middleware you'll reach for constantly:

from langchain.agents.middleware import (
    SummarizationMiddleware,     # compress long histories to fit context
    HumanInTheLoopMiddleware,    # pause for approval before risky tools
    PIIMiddleware,               # redact emails / cards / secrets
)

agent = create_agent(
    "openai:gpt-5.4",
    tools=[search],
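    middleware=[
        SummarizationMiddleware(model="openai:gpt-5.4"),  # needs a model to write the summaries
        PIIMiddleware("email", strategy="redact"),        # one instance per PII type
    ],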
)

Why it matters: Pre-v1, "add retries / trim context / get human approval" each meant rewriting your agent graph. Middleware makes them composable, reusable plugins. This is the single biggest reason to be on v1.
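
To make the redaction idea concrete, here is a small sketch of the kind of transformation a PII step might apply to messages before they reach the model. The regex and the redact_emails helper are hypothetical and deliberately simple; they do not reflect how the built-in middleware detects PII.

```python
import re

# Deliberately simple pattern for illustration; real detectors are stricter
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def redact_emails(messages: list[dict[str, str]]) -> list[dict[str, str]]:
    """Return copies of the messages with email addresses masked."""
    return [
        {**m, "content": EMAIL_RE.sub("[REDACTED_EMAIL]", m["content"])}
        for m in messages
    ]


history = [{"role": "user", "content": "Mail me at ada@example.com please."}]
cleaned = redact_emails(history)
```

Returning copies rather than mutating in place keeps the original history intact, which matters if you want to store the unredacted text somewhere the model never sees.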

Memory & conversation state

Agents are stateless per call. To remember earlier turns, give the agent a checkpointer and pass a stable thread_id. The runtime persists the message history per thread.

from langgraph.checkpoint.memory import InMemorySaver

agent = create_agent("openai:gpt-5.4", tools=[], checkpointer=InMemorySaver())
config = {"configurable": {"thread_id": "user-42"}}

agent.invoke({"messages": [{"role": "user", "content": "My name is Shivam."}]}, config=config)
r = agent.invoke({"messages": [{"role": "user", "content": "What's my name?"}]}, config=config)
print(r["messages"][-1].content)   # -> "Your name is Shivam."

Swap InMemorySaver for a Postgres/Redis checkpointer in production so state survives restarts. For long-term, cross-session memory across threads, dedicated stores like Mem0 or LangGraph's store layer take over.
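
Conceptually, a checkpointer keyed by thread_id is a map from thread to history. The sketch below illustrates only that idea. ThreadMemory and chat are hypothetical names, and a real checkpointer persists full graph state rather than just a list of messages.

```python
from collections import defaultdict


class ThreadMemory:
    """Keeps a separate message history per thread_id, in process memory."""

    def __init__(self) -> None:
        self._threads: dict[str, list[dict[str, str]]] = defaultdict(list)

    def load(self, thread_id: str) -> list[dict[str, str]]:
        # return a copy so callers cannot mutate stored history by accident
        return list(self._threads[thread_id])

    def append(self, thread_id: str, message: dict[str, str]) -> None:
        self._threads[thread_id].append(message)


def chat(memory: ThreadMemory, thread_id: str, text: str) -> int:
    """Record a user turn and return how many turns the thread now holds."""
    memory.append(thread_id, {"role": "user", "content": text})
    return len(memory.load(thread_id))


memory = ThreadMemory()
chat(memory, "user-42", "My name is Shivam.")
turns_42 = chat(memory, "user-42", "What's my name?")
turns_7 = chat(memory, "user-7", "Hello")
```

Like InMemorySaver, this loses everything when the process exits, which is exactly why a database-backed store is the production choice.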

Retrieval & RAG

The classic recipe: embed your documents, store the vectors, fetch the relevant ones at query time, stuff them into the prompt.

from langchain.embeddings import init_embeddings
from langchain_core.vectorstores import InMemoryVectorStore

emb   = init_embeddings("openai:text-embedding-3-small")
store = InMemoryVectorStore.from_texts(
    ["LangChain standardizes LLM apps.",
     "LangGraph adds durable, stateful agent runtime."],
    embedding=emb,
)
retriever = store.as_retriever(search_kwargs={"k": 2})
docs = retriever.invoke("what runtime do agents use?")
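
What the retriever does at query time can be sketched without any vector database: embed the query, score it against each document, keep the top k. The toy embedding below is just word counts, an assumption made so the example runs with the standard library. Real embedding models produce dense vectors that capture meaning, not word overlap.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def top_k(query: str, docs: list[str], k: int = 1) -> list[str]:
    """Rank documents by cosine similarity to the query and keep the best k."""
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


docs = [
    "LangChain standardizes LLM apps.",
    "LangGraph adds durable, stateful agent runtime.",
]
best = top_k("which runtime is durable?", docs, k=1)
```

The k here plays the same role as search_kwargs={"k": 2} in the retriever above: it caps how many documents get stuffed into the prompt.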

In v1 the idiomatic pattern is agentic RAG — wrap retrieval as a tool and let the agent decide when to search, instead of hard-wiring a retrieve-then-answer chain:

from langchain.tools import tool
from langchain.agents import create_agent

@tool
def search_docs(query: str) -> str:
    """Search the internal knowledge base."""
    hits = retriever.invoke(query)
    return "\n\n".join(d.page_content for d in hits)

agent = create_agent("openai:gpt-5.4", tools=[search_docs],
                     system_prompt="Answer using the knowledge base. Cite what you used.")
print(agent.invoke({"messages":[{"role":"user","content":"What runtime do agents use?"}]})["messages"][-1].content)

For real corpora swap InMemoryVectorStore for a real engine — Milvus, Qdrant, pgvector, Weaviate — and add a text splitter to chunk documents before embedding.
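
Chunking is the step most often skipped in toy examples, so here is a minimal character-based sketch with overlap. The chunk_text function and its defaults are hypothetical. Production splitters usually break on sentence or paragraph boundaries rather than raw character offsets.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into fixed-size chunks that share `overlap` characters."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks: list[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this chunk already reached the end of the text
    return chunks


chunks = chunk_text("abcdefghij", chunk_size=4, overlap=1)
```

The overlap means a sentence that straddles a boundary still appears whole in at least one chunk, at the cost of embedding some text twice.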

Streaming

Agents stream too. stream_mode="values" emits the full state after each step; "updates" emits just the deltas; "messages" streams tokens as they generate.

from langchain.messages import AIMessage

for chunk in agent.stream(
    {"messages": [{"role": "user", "content": "Search vLLM news and summarize"}]},
    stream_mode="values",
):
    msg = chunk["messages"][-1]
    if isinstance(msg, AIMessage) and msg.tool_calls:
        print("calling:", [tc["name"] for tc in msg.tool_calls])
    elif isinstance(msg, AIMessage):
        print("agent:", msg.content)
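
The difference between "values" and "updates" is easiest to see side by side. The generator below is a simplified illustration only: stream_steps and the message dictionaries are hypothetical, and the real runtime's "updates" payloads are organized differently (keyed by the step that produced them) than the flat shape used here.

```python
from typing import Any, Iterator

Step = dict[str, Any]


def stream_steps(steps: list[Step], mode: str = "values") -> Iterator[dict[str, Any]]:
    """Yield either the full state or only the delta after each step."""
    state: list[Step] = []
    for step in steps:
        state.append(step)
        if mode == "values":
            yield {"messages": list(state)}  # everything accumulated so far
        elif mode == "updates":
            yield {"messages": [step]}       # only what this step added
        else:
            raise ValueError(f"unsupported mode: {mode}")


# Hypothetical three-step run: tool request, tool result, final answer
steps = [
    {"role": "assistant", "content": "", "tool_calls": [{"name": "search"}]},
    {"role": "tool", "content": "Results for: vLLM"},
    {"role": "assistant", "content": "Here is the summary."},
]
value_sizes = [len(c["messages"]) for c in stream_steps(steps, "values")]
update_sizes = [len(c["messages"]) for c in stream_steps(steps, "updates")]
```

"values" is convenient when you always want the latest message via [-1], as in the loop above. "updates" is cheaper when histories get long, because each chunk carries only the new piece.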

Observability — LangSmith

Agents are non-deterministic and multi-step, so "it gave a weird answer" is hard to debug blind. LangSmith traces every model call, tool call, token count, and latency. It's opt-in via env vars — no code change:

export LANGSMITH_TRACING=true
export LANGSMITH_API_KEY=ls-...

You get a full waterfall of each run, plus evaluation datasets and prompt versioning. It's a separate hosted product, but the tracing SDK is free for solo use and the single highest-leverage thing to turn on early.

The ecosystem around it

| Package / product | What it is |
| --- | --- |
| langchain | The framework: create_agent, models, tools, middleware. |
| langchain-core | Runnable interface, messages, base abstractions. Tiny, stable. |
| langchain-<provider> | Integrations (-openai, -anthropic, …). Versioned separately. |
| LangGraph | The low-level runtime under create_agent. Drop to it for custom graphs, branching, multi-agent. |
| langchain-classic | Legacy chains, old retrievers, indexing API. For migrations only. |
| LangSmith | Tracing, evals, monitoring. The observability layer. |

When to use, when to skip

Use it when you're building an agent — something that calls tools, loops, retrieves, and needs guardrails — and want provider portability plus a huge integration ecosystem. create_agent + middleware is the fastest path to a production-shaped agent.

Skip it for a single dumb completion (the provider SDK is enough). When you outgrow the linear agent loop — complex branching, multiple coordinating agents, explicit state machines — drop down to LangGraph directly. Teams wanting a smaller, strictly-typed surface sometimes prefer Pydantic AI; RAG-first apps sometimes prefer LlamaIndex.

Production gotchas: Pin versions, because provider packages move fast. If you only need LCEL pipelines, langchain-core plus a specific provider keeps your dependency surface small; create_agent and middleware live in the langchain package itself. Turn on LangSmith tracing on day one. Use a durable checkpointer (Postgres/Redis), not InMemorySaver, in prod. And don't reach for an agent when an LCEL chain (or a plain function) would do: the loop costs tokens and latency.

vs the alternatives

| Tool | Best for | Trade-off |
| --- | --- | --- |
| LangChain v1 | General agents, middleware, integrations, portability | Big surface; ecosystem churn |
| LangGraph | Custom graphs, branching, multi-agent, max control | More to wire by hand |
| LlamaIndex | RAG / data-indexing-first apps | Lighter agent tooling |
| Pydantic AI | Type-safe, minimal, Pythonic agents | Smaller ecosystem |
| CrewAI / AutoGen | Opinionated multi-agent collaboration | Less low-level control |
| Raw provider SDK | Single calls, total control | You build all the plumbing |

Verified against the official LangChain v1 docs (docs.langchain.com), May 2026. APIs shown target langchain >= 1.0.

© cvam — written in plaintext, served warm