Malecu | Custom AI Solutions for Business Growth

Designing Multi‑Agent Workflows with LangGraph and CrewAI: Patterns, Memory, and Tooling

16 min read

Designing multi-agent workflows is moving from experimental to essential. With business tasks spanning research, reasoning, data retrieval, decision-making, and execution, a single large language model (LLM) struggles to stay coherent, consistent, and cost-effective across long chains of work. Enter multi-agent systems: specialized, collaborating agents with scoped roles, shared state, and robust tooling.

This pillar guide gives you the end-to-end blueprint to design, build, and scale multi-agent workflows using LangGraph and CrewAI. You’ll learn proven patterns for coordination and memory, practical tooling strategies, and how to evaluate, secure, and operate your agents in production.

If you’re still surveying the agent framework landscape, start with our high-level overview: Agent Frameworks & Orchestration: A Complete Guide. For side-by-side tradeoffs across the most popular stacks, see our in-depth comparison, LangChain vs LangGraph vs AutoGen vs CrewAI: Which Agent Framework Should You Use in 2026?.

Table of contents

  • What are multi-agent workflows?
  • When to use multi-agent systems vs. single agents
  • Core orchestration patterns and communication topologies
  • LangGraph tutorial: Graph-driven agents and state machines
  • CrewAI agents: Roles, tasks, and collaborative processes
  • Memory architectures: Short-term, long-term, and shared context
  • Tooling strategies: Retrieval, function calling, and external systems
  • Reliability, evaluation, and observability
  • Safety, governance, and human-in-the-loop controls
  • Performance, cost, and scaling strategies
  • Example project: Research-to-content pipeline across LangGraph and CrewAI
  • Implementation checklist and next steps

What Are Multi‑Agent Workflows?

Multi-agent workflows are systems where two or more specialized agents coordinate to achieve a goal. Each agent is optimized for a role—planner, researcher, analyst, reviewer, or executor—while the workflow manages how they communicate, share memory, call tools, and decide when to stop.

In practice, multi-agent workflows help you:

  • Decompose complex tasks into focused roles with clearer prompts and smaller context windows.
  • Improve reliability through checks, reviews, and redundancy.
  • Reduce cost by routing subtasks to cheaper models and minimizing repeated reasoning.

Well-structured agent teams outperform single, monolithic prompts when the work spans multiple competencies, requires staged validation, or needs iterative refinement. They also give you more levers—like state control, retries, and breakpoints—to reach production-grade reliability.

When to Use Multi‑Agent Systems vs. Single Agents

Not every problem needs a team. Start simple and scale complexity as needed. Use multi-agent workflows when:

  • The task naturally decomposes into distinct steps (e.g., research → analysis → drafting → compliance review → publishing).
  • You need multiple validation passes (e.g., fact-checking, policy checks) without losing context or exhausting token limits.
  • You want to route different subtasks to different tools, models, or data sources.
  • The system must maintain a durable shared context over longer horizons (e.g., multi-day projects, customer journeys).

Prefer a single agent when the task is:

  • Short, well-bounded, and does not need cross-checks.
  • Purely generative without retrieval or external actions.
  • Latency-sensitive with minimal steps.

If you’re still assessing your options, our agent frameworks & orchestration guide explains the spectrum from single prompts to complex, graph-driven teams.

Core Orchestration Patterns and Communication Topologies

Choosing the right coordination strategy is the difference between a chatty, expensive swarm and a purposeful, efficient team. Below are the most common, production-proven patterns.

Planner–Executor

  • One agent plans steps; others execute. Great for decomposing broad objectives and mitigating hallucinations by grounding actions in explicit subtasks.

Router–Specialist

  • A router triages tasks to role-specific experts (e.g., legal, data, marketing). This reduces prompt complexity and token usage.
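
In practice the triage step is often a cheap LLM classification call. A minimal sketch, with keyword matching standing in for that call and hypothetical specialist stubs:

```python
# Router–Specialist sketch: the router inspects a task and dispatches it to
# a role-specific handler. Keyword matching stands in for an LLM router
# call; the specialist functions are illustrative stubs.
from typing import Callable, Dict

def legal_specialist(task: str) -> str:
    return f"[legal] reviewed: {task}"

def data_specialist(task: str) -> str:
    return f"[data] analyzed: {task}"

def marketing_specialist(task: str) -> str:
    return f"[marketing] drafted: {task}"

SPECIALISTS: Dict[str, Callable[[str], str]] = {
    "legal": legal_specialist,
    "data": data_specialist,
    "marketing": marketing_specialist,
}

def route_task(task: str) -> str:
    lowered = task.lower()
    if "contract" in lowered or "policy" in lowered:
        role = "legal"
    elif "metric" in lowered or "report" in lowered:
        role = "data"
    else:
        role = "marketing"
    return SPECIALISTS[role](task)
```

Because each specialist sees only its own narrow prompt, token usage stays low even as the number of roles grows.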

Debate and Critique

  • Two or more agents propose answers; a judge selects or synthesizes. Useful for high-stakes decisions, but be mindful of cost and latency.

Blackboard (Shared Workspace)

  • Agents read/write from a central state store. Promotes modularity and traceability, perfect for long-running workflows.

Round‑Robin Refinement

  • Agents iteratively improve a shared artifact (e.g., a draft). Add stop criteria to prevent endless loops.

Protocol-Driven Messaging

  • Agents adhere to typed messages or schemas so that outputs become valid inputs for the next step. This dramatically improves reliability.

Actionable takeaway: Start with Planner–Executor or Router–Specialist. Add Debate/Judge or Reviewers only for steps where error costs are high.

LangGraph Tutorial: Graph‑Driven Agents and State Machines

LangGraph is designed for building stateful, directed workflows where each node is a deterministic function (often LLM-backed) and edges encode control flow. It shines when you need fine-grained control of state, breakpoints, retries, and human-in-the-loop (HITL) interventions.

Core concepts

  • State: A typed object holding messages, artifacts, and metadata. Treat state as the single source of truth.
  • Nodes: Functions that read and mutate state (e.g., plan(), research(), draft()). Nodes can call tools and models.
  • Edges: Rules that decide which node to run next, often based on state flags, scores, or schema validation.
  • Checkpoints/Breakpoints: Pause execution for human review or external triggers.
  • Persistence: Store state in a database to resume long runs or branches.

Minimal pseudo-code example (planner → researcher → writer → reviewer → done):

state = {
  "messages": [],
  "plan": None,
  "research": None,
  "draft": None,
  "approved": False
}

@node
def planner(state):
    state["plan"] = plan_with_llm(state["messages"])  # produce structured steps
    return state

@node
def researcher(state):
    state["research"] = retrieve_and_summarize(state["plan"])  # tool calls + RAG
    return state

@node
def writer(state):
    state["draft"] = write_with_citations(state["plan"], state["research"])
    return state

@node
def reviewer(state):
    state["approved"] = policy_check(state["draft"])  # LLM or rules-based
    return state

@edge
def route(state):
    if state["plan"] is None:
        return planner
    if state["research"] is None:
        return researcher
    if state["draft"] is None:
        return writer
    if not state["approved"]:
        return reviewer
    return None  # done

run_graph(state, route)
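The @node and @edge decorators above are illustrative rather than a real API. A minimal sketch of a driver that executes such a routed graph, with the LLM and tool helpers stubbed so the control flow runs on its own:

```python
# Stubbed helpers: in a real system these would call models and tools.
def plan_with_llm(messages):
    return ["step 1", "step 2"]

def retrieve_and_summarize(plan):
    return "summary of findings"

def write_with_citations(plan, research):
    return "draft [1]"

def policy_check(draft):
    return True  # a failing check would route back through reviewer

def planner(state):
    state["plan"] = plan_with_llm(state["messages"])
    return state

def researcher(state):
    state["research"] = retrieve_and_summarize(state["plan"])
    return state

def writer(state):
    state["draft"] = write_with_citations(state["plan"], state["research"])
    return state

def reviewer(state):
    state["approved"] = policy_check(state["draft"])
    return state

def route(state):
    if state["plan"] is None:
        return planner
    if state["research"] is None:
        return researcher
    if state["draft"] is None:
        return writer
    if not state["approved"]:
        return reviewer
    return None  # done

def run_graph(state, route):
    # Ask the edge function which node runs next until it signals completion.
    while (node := route(state)) is not None:
        state = node(state)
    return state

final = run_graph(
    {"messages": [], "plan": None, "research": None, "draft": None, "approved": False},
    route,
)
```

In production you would add a step cap or explicit stop criteria so a reviewer that never approves cannot loop forever.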

Why teams choose LangGraph

  • Fine-grained control: Deterministic edges, typed state, and explicit transitions.
  • Transparency: Clear traces for observability and debugging.
  • HITL support: Native breakpoints make approvals and escalation straightforward.
  • Incremental complexity: Start with a simple graph and evolve.

When to be cautious

  • More boilerplate than “agentic chat” frameworks.
  • Requires design discipline: schemas, contracts, and state design.

Want a bigger-picture comparison with other frameworks? See our perspective in LangChain vs LangGraph vs AutoGen vs CrewAI: Which Agent Framework Should You Use in 2026?.

CrewAI Agents: Roles, Tasks, and Collaborative Processes

CrewAI focuses on collaborative agents with clear roles and tasks, organized into processes (e.g., sequential or hierarchical). It’s a fast way to move from “single agent” to “small team” without designing state graphs from scratch.

Core concepts

  • Agent: An LLM-powered role with a goal and tools (e.g., “Market Researcher”).
  • Task: A specific job assigned to an agent (e.g., “Summarize top 5 competitors”).
  • Crew: A collection of agents executing tasks, often with hand-offs.
  • Process: Orchestration strategy (sequential, parallel, hierarchical).

Pattern: Sequential Crew with a Reviewer

  • Define a Researcher to gather facts.
  • Define a Writer to produce a draft using the researcher’s findings.
  • Define a Reviewer to check citations and style.

In CrewAI, you’ll define agents with system prompts and tools, then create tasks with acceptance criteria. Processes orchestrate how outputs flow across tasks. Compared to LangGraph, you get faster time-to-value for common collaboration patterns, with fewer lines of code and strong defaults.
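As a structural sketch only, plain dataclasses can mimic the roles → tasks → process shape (this is not the CrewAI API; the names and the sequential hand-off are illustrative):

```python
# Dataclasses mimicking the roles → tasks → process mental model. In a
# sequential process, each task's output becomes context for the next.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Agent:
    role: str
    goal: str
    run: Callable[[str], str]  # stands in for an LLM-backed execution

@dataclass
class Task:
    description: str
    agent: Agent

@dataclass
class Crew:
    tasks: List[Task]

    def kickoff(self, initial_input: str) -> str:
        output = initial_input
        for task in self.tasks:
            output = task.agent.run(f"{task.description}\n\nContext: {output}")
        return output

researcher = Agent("Researcher", "Gather facts", lambda x: f"facts({x[:20]}...)")
writer = Agent("Writer", "Draft content", lambda x: f"draft({x[:20]}...)")
crew = Crew([
    Task("Summarize competitors", researcher),
    Task("Write the report", writer),
])
result = crew.kickoff("Topic: agent frameworks")
```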

Why teams choose CrewAI

  • Simple mental model: roles → tasks → processes.
  • Productive defaults for collaboration and hand-offs.
  • Easy to prototype and iterate towards value.

When to be cautious

  • Less explicit control over state transitions than a graph.
  • For complex branching logic or custom gating, you may need extensions or to hybridize with graph-based orchestration.

LangGraph vs. CrewAI at a Glance

Dimension | LangGraph | CrewAI
Orchestration Model | Explicit graph/state machine with nodes and edges | Role- and task-based collaboration with processes
State Management | Typed, centralized state; strong control over mutations and transitions | Implicit state via task outputs and hand-offs; simpler defaults
HITL | Native breakpoints and resume | Review steps via tasks; less granular pausing
Complexity | More boilerplate; high control | Less boilerplate; fast prototyping
Best For | Complex branching, compliance gates, long-running flows | Small-to-medium teams with straightforward hand-offs

For a deeper decision flow, read the framework comparison for 2026: LangChain vs LangGraph vs AutoGen vs CrewAI.

Memory Architectures: Short‑Term, Long‑Term, and Shared Context

Memory choices determine whether your agents remain coherent or drift over time. Treat memory as a layered system.

Short-term (Working Memory)

  • Purpose: Carry immediate context through a step or two.
  • Techniques: Summarized message buffers, scratchpads, function-call arguments, ephemeral variables.
  • Tip: Avoid dumping raw logs into prompts; curate and summarize.

Medium-term (Task Memory)

  • Purpose: Persist artifacts within a run (e.g., plan, findings, draft).
  • Techniques: Structured state (LangGraph), task outputs (CrewAI), JSON documents.
  • Tip: Use schemas to keep artifacts machine-readable and composable.

Long-term (Semantic Memory)

  • Purpose: Remember reusable knowledge across runs or sessions.
  • Techniques: Vector databases, key-value stores, document stores, graph databases.
  • Tip: Separate reference knowledge (RAG corpora) from run-specific artifacts (e.g., decisions, outcomes).

Shared Memory (Blackboard)

  • Purpose: A common workspace for all agents to read/write.
  • Techniques: Central state (graph), knowledge base with lineage, or document collections tagged by run and role.
  • Tip: Annotate entries with provenance (who/what/when/source) to enable verification and rollbacks.
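
A minimal sketch of a provenance-tagged blackboard entry (field names are illustrative):

```python
# Each write to the shared workspace carries who/what/when/source metadata
# so downstream agents can verify entries or roll them back.
from datetime import datetime, timezone

def blackboard_entry(agent: str, key: str, value, source: str) -> dict:
    return {
        "key": key,
        "value": value,
        "provenance": {
            "agent": agent,       # who wrote it
            "source": source,     # where the fact came from
            "written_at": datetime.now(timezone.utc).isoformat(),
        },
    }

blackboard = {}
entry = blackboard_entry(
    "researcher", "competitor_count", 5, "https://example.com/report"
)
blackboard[entry["key"]] = entry
```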

Reflection and Summarization

  • Introduce periodic consolidation steps (e.g., “reflect()”) that:
    • Summarize progress and decisions
    • Extract to structured notes (bullets, tables)
    • Purge low-value tokens to control cost

Actionable takeaway: Design memory first. Define what each agent must know, when they must know it, and how they’ll find it without inflating context windows.

Tooling Strategies: Retrieval, Function Calling, and External Systems

Tools give agents real-world leverage—search, databases, CRMs, spreadsheets, APIs, and code execution. But tools increase surface area and risk. Invest in robust, typed, idempotent tools.

Core principles

  • Schema-first design: Define JSON schemas for tool inputs/outputs. Provide examples. Validate strictly.
  • Idempotency: Design tools so retries don’t double-create or mutate irreversibly (include safe upserts, request IDs).
  • Timeouts, retries, and backoff: Handle flaky endpoints without stalling the workflow.
  • Role-based access: Scope tools per agent to narrow permissions and reduce misuse.
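
These principles fit in a small tool wrapper. A sketch, where the CRM tool, its payload fields, and the in-memory idempotency ledger are all hypothetical:

```python
# Schema-first validation before any side effect, an idempotency key so
# retries don't double-create, and bounded retries with exponential backoff.
import time

_processed: dict = {}  # request_id -> result (idempotency ledger)

def create_crm_note(payload: dict, request_id: str, max_retries: int = 3) -> dict:
    # Validate strictly before doing anything irreversible.
    if not isinstance(payload.get("contact_id"), str) or not payload.get("note"):
        raise ValueError("payload needs a string 'contact_id' and a non-empty 'note'")
    # A retried request_id returns the original result instead of re-creating.
    if request_id in _processed:
        return _processed[request_id]
    for attempt in range(max_retries):
        try:
            # Stands in for the external API call, which could raise TimeoutError.
            result = {"status": "created", "contact_id": payload["contact_id"]}
            _processed[request_id] = result
            return result
        except TimeoutError:
            time.sleep(2 ** attempt)  # exponential backoff before retrying
    raise RuntimeError("tool failed after retries")
```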

Retrieval (RAG)

  • Use retrieval for fresh, domain-specific facts. Index authoritative sources with metadata: title, URL, date, source type, and trust score.
  • Post-retrieval processing: Deduplicate, rank, and summarize before sending to the LLM to control costs.
  • Cite sources: Store citations in state to enable later verification and auditing.
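
The post-retrieval steps can be sketched as follows (document fields such as `url` and `trust_score` are assumptions about your index's metadata):

```python
# Deduplicate by URL, rank by trust score, and keep only the top results
# before they reach the LLM prompt.
def prepare_context(docs: list, top_k: int = 3) -> list:
    seen, unique = set(), []
    for doc in docs:
        if doc["url"] not in seen:
            seen.add(doc["url"])
            unique.append(doc)
    unique.sort(key=lambda d: d["trust_score"], reverse=True)
    return unique[:top_k]
```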

Function Calling and Action Plans

  • Encourage agents to decide on an action plan before tool calls, especially in Planner–Executor patterns.
  • Validate tool outputs: Don’t trust blindly—add a verify() step or a separate reviewer agent for critical actions.

Composable tooling architecture

  • Wrap external APIs with a small compatibility layer that enforces schemas, logs calls, and tracks latencies.
  • Keep business logic outside the LLM where possible; let the LLM choose actions and parameters, not implement policies.

Actionable takeaway: Treat tools as first-class, typed contracts. Measure per-tool error rates and latencies to target fixes that move reliability fastest.

Reliability, Evaluation, and Observability

Agents must be measurable. Move beyond “it worked once” to repeatably good results.

Define success metrics per step

  • Planning: Is the plan complete, non-duplicative, and actionable?
  • Retrieval: Are sources relevant and recent? Are citations valid?
  • Drafting: Does the draft meet tone, structure, and factuality criteria?
  • Review: Are the acceptance criteria satisfied?

Test types

  • Unit tests for tools: Validate schemas and boundary conditions.
  • Scenario tests for workflows: Provide seed inputs and assert on structured outputs.
  • Regression tests: Lock in good behaviors with snapshot comparisons of structured artifacts.

Observability

  • Tracing: Log each node/task, model, prompt, and tool call with timestamps and token counts.
  • Artifacts: Persist plans, retrieved facts, and drafts with provenance and hashes.
  • Error taxonomies: Mark failures as planning/tooling/policy/timeouts to focus remediation.

Quantifying costs (example math)

  • If a single run uses 5 LLM calls at 3K input + 1K output tokens each, and your model costs $1.00 per 1M input tokens and $4.00 per 1M output tokens, your per-run LLM cost is approximately: 5 × [(3,000 / 1,000,000 × $1.00) + (1,000 / 1,000,000 × $4.00)] = 5 × ($0.003 + $0.004) = 5 × $0.007 = $0.035. This excludes retrieval and API tools. Use this kind of breakdown to monitor unit economics.
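
The same arithmetic as a small helper, so per-run budgets can be checked programmatically (prices are per 1M tokens):

```python
# Per-run LLM cost: calls x (input tokens x input price + output tokens x output price).
def run_cost(calls: int, in_tokens: int, out_tokens: int,
             in_price_per_m: float, out_price_per_m: float) -> float:
    per_call = (in_tokens / 1_000_000 * in_price_per_m
                + out_tokens / 1_000_000 * out_price_per_m)
    return calls * per_call

cost = run_cost(5, 3_000, 1_000, 1.00, 4.00)  # approximately $0.035
```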

Actionable takeaway: Instrument from day one. Set budgets and SLAs (e.g., max tokens per run, max latency per step) and enforce them with guards.

Safety, Governance, and Human‑in‑the‑Loop Controls

Production systems must protect users, data, and your brand.

Data governance

  • Data minimization: Only include necessary PII or sensitive data in prompts.
  • Redaction: Mask sensitive fields before sending to third-party models.
  • Access controls: Scope tools and data per role; rotate keys regularly.

Policy and compliance

  • Policy-as-code checkers: Implement LLM or rule-based reviewers for compliance, tone, or restricted topics.
  • Source integrity: Prefer signed or first-party sources; store source metadata for audits.

Human-in-the-loop (HITL)

  • Breakpoints: Pause before high-impact actions (e.g., publishing, CRM updates) for approval.
  • Guided review UIs: Present key artifacts, citations, and diffs to reduce reviewer fatigue.

Red teaming and abuse resistance

  • Adversarial prompts: Test jailbreaks and prompt injection. Sanitize retrieved content.
  • Egress filters: Prevent agents from exfiltrating secrets in outputs.

Actionable takeaway: Codify gates where harm is likely. Use HITL sparingly but decisively at risk hot spots.

Performance, Cost, and Scaling Strategies

Multi-agent systems can sprawl if left unchecked. Design for speed and cost from the start.

Routing and model selection

  • Route lightweight subtasks (e.g., classification, extraction) to smaller, cheaper models.
  • Reserve top-tier models for synthesis or complex reasoning.

Context control

  • Summarize aggressively. Promote only high-value facts into prompts.
  • Use structured references (IDs, keys) instead of inlining long passages.

Parallelism and batching

  • Parallelize independent tasks (e.g., researching multiple sources). Cap concurrency to respect rate limits.
  • Batch tool calls (e.g., query multiple endpoints per request) where APIs support it.

Caching and deduplication

  • Cache retrieval results per query and day; avoid re-fetching stable content.
  • Memoize expensive intermediate steps (e.g., entity extraction) across runs.
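
A sketch of retrieval caching keyed by query plus calendar day (the fetch function is a stand-in for your retriever):

```python
# Memoize retrieval results per (query, day) so stable content isn't
# re-fetched within the same day.
from datetime import date

_cache: dict = {}

def cached_retrieve(query: str, fetch) -> list:
    key = (query, date.today().isoformat())
    if key not in _cache:
        _cache[key] = fetch(query)
    return _cache[key]

calls = []
def fake_fetch(q):
    calls.append(q)
    return [f"result for {q}"]

first = cached_retrieve("agent frameworks", fake_fetch)
second = cached_retrieve("agent frameworks", fake_fetch)  # served from cache
```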

Latency budget example

  • Suppose your SLO is p95 < 20s. Allocate: planning 2s, retrieval 8s (parallelized), drafting 8s, review 2s. If a step exceeds its budget, degrade gracefully (e.g., fewer sources, summary-only mode) rather than fail the whole run.
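
One way to sketch budget tracking with graceful degradation (the class, thresholds, and degraded behavior are illustrative):

```python
# Track a run-wide deadline; when too little time remains, callers switch
# to a degraded mode (e.g. fewer sources) instead of failing the run.
import time

class LatencyBudget:
    def __init__(self, total_s: float):
        self.deadline = time.monotonic() + total_s

    def remaining(self) -> float:
        return self.deadline - time.monotonic()

    def degraded(self, floor_s: float) -> bool:
        """True when less than floor_s remains; do less work downstream."""
        return self.remaining() < floor_s

budget = LatencyBudget(total_s=20.0)
num_sources = 3 if budget.degraded(floor_s=10.0) else 8
```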

Actionable takeaway: Establish a latency and cost budget per step, then optimize the heaviest steps first. Add caches and parallelism where they make the biggest dent.

Example Project: Research‑to‑Content Pipeline Across LangGraph and CrewAI

Let’s design a realistic mini-case: a content team needs a fact-checked blog post on an emerging AI topic. The workflow: plan → research → outline → draft → review → publish.

Business goals

  • Accuracy: Cite credible sources and avoid outdated claims.
  • Consistency: Match brand voice and readability standards.
  • Efficiency: Produce a strong first draft in one pass; revisions < 20%.

Data and tools

  • Web search + site-specific APIs for trusted publications.
  • Internal brand voice guide and style rules.
  • Vector index of prior approved posts for tone and structure references.

Option A: LangGraph implementation

  • Nodes
    • plan(): Create a structured outline and a list of research questions.
    • retrieve(): Query search and knowledge bases per question; store sources with metadata.
    • outline(): Propose a section-by-section outline grounded in retrieved facts.
    • draft(): Write a draft with inline citations and a bibliography.
    • review(): Check claims against sources; enforce style and tone policies.
    • publish(): Prepare CMS-ready markdown; pause for human approval (breakpoint).
  • Edges
    • If any claim fails verification, loop back to retrieve() with adjusted queries.
    • If tone/style fails, loop back to draft() with specific corrections.
  • Memory
    • state.plan: JSON with goals, target audience, and outline.
    • state.sources: Array of documents with URL, title, date, trust score.
    • state.draft: Markdown with citation keys.
    • state.issues: Array of verification failures to address.

Option B: CrewAI implementation

  • Agents
    • Researcher: Expert in AI news; tools: web search, site APIs.
    • Writer: Skilled technical writer; tools: vector index for voice.
    • Reviewer: Policy and fact checker; tools: citation validator.
  • Tasks (Sequential Process)
    • Task 1: Researcher gathers and summarizes 8–12 recent, credible sources.
    • Task 2: Writer creates a detailed outline and first draft with citations.
    • Task 3: Reviewer flags factual gaps or style violations; requests targeted fixes.
  • HITL
    • Final approval before publishing to CMS.

Tradeoffs

  • LangGraph offers stronger control of verification loops and traceability. Ideal if compliance is strict.
  • CrewAI accelerates from zero to value with minimal overhead. Ideal for teams prioritizing speed to prototype.

Results to aim for

  • A reproducible pipeline where drafts consistently include citations, align with brand voice, and reach approval with minimal rework.

For broader context on selecting your toolchain, revisit our analysis: Agent Frameworks & Orchestration: A Complete Guide and the side-by-side which agent framework should you use in 2026.

Implementation Checklist and Next Steps

A clear, staged plan keeps your project on track and your costs in check.

  • Define outcomes and acceptance criteria: What does “good” look like per agent and per artifact?
  • Choose orchestration: CrewAI for fast collaboration; LangGraph for complex branching or compliance.
  • Design memory: Short-term (scratchpads), task memory (structured state), long-term (vector index). Add provenance tags.
  • Scope tools per role: Schema-first; idempotent; with timeouts and retries. Add a verifier for critical actions.
  • Prompt engineering: Write role prompts with examples and JSON output schemas. Keep them short and directive.
  • Guardrails and HITL: Insert gates before high-impact actions. Add redaction and policy checks.
  • Observability: Trace steps, tokens, latencies; store artifacts and decisions. Set budgets and SLAs.
  • Testing: Unit-test tools; scenario-test workflows with seeded inputs and snapshot comparisons.
  • Pilot and iterate: Start with a narrow use case; collect feedback; expand roles and tools gradually.
  • Productionize: Add caching, parallelism, rate-limit handling, and resilience. Document runbooks and on-call procedures.

Summary: Build Multi‑Agent Systems That Ship Value, Not Chaos

Multi-agent workflows turn complex, cross-disciplinary tasks into reliable, repeatable pipelines. LangGraph gives you precision—typed state, explicit edges, and robust HITL. CrewAI gives you speed—clear roles, task hand-offs, and productive defaults. Combine sound orchestration patterns with layered memory, carefully designed tools, and strong governance, and you unlock systems that are accurate, auditable, and economical.

Key takeaways

  • Start simple with Planner–Executor or Router–Specialist; add reviewers only where errors are costly.
  • Design memory deliberately: short-term for working context, structured task memory for artifacts, and long-term semantic memory for reuse.
  • Treat tools as contracts: typed schemas, idempotency, and verification steps.
  • Measure relentlessly: trace steps, costs, and latencies; enforce budgets and SLAs.
  • Apply governance: data minimization, policy checks, and human approvals where they matter most.

When you’re ready to go deeper on framework tradeoffs and orchestration strategies, read our Agent Frameworks & Orchestration: A Complete Guide and the comprehensive LangChain vs LangGraph vs AutoGen vs CrewAI: Which Agent Framework Should You Use in 2026?.

Want hands-on help tailoring this to your stack and workflows? We design and deploy custom AI chatbots, autonomous agents, and intelligent automations. Schedule a friendly consultation and let’s build something your team will trust.

multi-agent workflows
LangGraph tutorial
CrewAI agents
AI automation
agent orchestration

Related Posts

Chatbot Analytics and Evaluation Case Study: KPIs, A/B Testing, and Conversation Quality

By Staff Writer

Tool Use for AI Agents: Actions, Retrievers, and Function Calling with OpenAI, Anthropic, and Google Models

By Staff Writer