How We Built a Workflow Engine That Orchestrates 15+ AI Calls (And Never Loses Track)

TL;DR: We built a step orchestrator that manages AI workflows with 15+ steps. It tracks dependencies, retries failures, reports progress, and runs independent steps in parallel. Built from scratch in TypeScript, it cut our generation failure rate from 20% to 0.1%.

The Problem: AI Workflows Are Complex

Generating a website isn’t one AI call—it’s 15+ calls in a specific order:

1. Research business
2. Search competitors
3. Generate strategy
4. Create brand guidelines
5. Design logo
6. Generate hero image
7. Create header
8. Create footer
9. Plan pages
10. Generate page 1 (sections 1-5)
11. Generate page 2 (sections 1-3)
… (15+ total steps)

Each step depends on previous steps:

Logo needs brand colors (from step 4)
Header needs logo (from step 5)
Pages need strategy (from step 3)

What happens when step 8 fails?

Do we restart from step 1? (expensive, slow)
Do we skip step 8? (incomplete website)
Do we retry step 8? (how many times?)

Traditional approaches:

Sequential scripts: Hard-coded order, no retry logic, brittle
Workflow engines (Temporal, Airflow): Overkill for our needs, complex setup
Hope and pray: Run steps, hope nothing fails (it will)

We needed something better: A workflow engine built for AI.

The Insight: Steps as First-Class Citizens

The breakthrough came when we stopped thinking about “functions” and started thinking about “steps.”

Bad: Functions that call other functions

async function generateWebsite(businessName: string) {
  const research = await researchBusiness(businessName);
  const strategy = await generateStrategy(research);
  const logo = await createLogo(strategy);
  const header = await createHeader(logo);
  // ... 10 more steps
}

Good: Steps that declare dependencies

class HeaderStep implements Step {
  id = 'header';
  requiredInputs = ['logo', 'brandStrategy'];
  
  async execute(context: StepContext) {
    const logo = context.get('logo');
    const brandStrategy = context.get('brandStrategy');
    return await createHeader(logo, brandStrategy);
  }
}

The difference? Steps are self-documenting, composable, and orchestratable.

How It Works: The Technical Architecture

1. Step Interface

Every step implements this interface:

interface Step {
  // Identity
  id: string;
  name: string;
  description: string;
  
  // Dependencies
  requiredInputs: string[];
  
  // Execution
  execute(context: StepContext): Promise<StepResult>;
  
  // Configuration
  timeout: number;        // Max execution time (ms)
  maxRetries: number;     // Max attempts, including the first try
  progressWeight: number; // Contribution to overall progress (0-100)
  
  // Validation
  validateInputs(context: StepContext): ValidationResult;
}

interface StepContext {
  // Get data from previous steps
  get<T>(key: string): T;
  
  // Store data for future steps
  set<T>(key: string, value: T): void;
  
  // Check if step completed
  isCompleted(stepId: string): boolean;
  
  // Business info
  businessId: string;
  businessName: string;
  versionId: string;
}

interface StepResult {
  success: boolean;
  data?: any;
  error?: string;
  shouldRetry?: boolean;
}
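
The article does not show what backs `StepContext`. A minimal in-memory sketch might look like the following. It assumes `get` throws on a missing key (the `validateInputs` example later relies on a try/catch around `get`), and `InMemoryStepContext` and `markCompleted` are hypothetical names, not part of the original system:

```typescript
// Hypothetical in-memory implementation of the StepContext interface above.
interface StepContext {
  get<T>(key: string): T;
  set<T>(key: string, value: T): void;
  isCompleted(stepId: string): boolean;
  businessId: string;
  businessName: string;
  versionId: string;
}

class InMemoryStepContext implements StepContext {
  private data = new Map<string, unknown>();
  private completed = new Set<string>();

  constructor(
    public businessId: string,
    public businessName: string,
    public versionId: string
  ) {}

  get<T>(key: string): T {
    if (!this.data.has(key)) {
      // Assumption: a missing key is an error rather than `undefined`.
      throw new Error(`Missing context key: ${key}`);
    }
    return this.data.get(key) as T;
  }

  set<T>(key: string, value: T): void {
    this.data.set(key, value);
  }

  isCompleted(stepId: string): boolean {
    return this.completed.has(stepId);
  }

  // Not part of the interface; assumed to be called by the orchestrator
  // after a step succeeds.
  markCompleted(stepId: string): void {
    this.completed.add(stepId);
  }
}
```

Throwing on a missing key surfaces a wiring mistake at the step that needs the data, instead of letting `undefined` travel further down the workflow.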

2. Step Orchestrator

The brain that manages step execution:

class StepOrchestrator {
  private steps: Step[];
  private context: StepContext;
  
  constructor(steps: Step[], context: StepContext) {
    this.steps = steps;
    this.context = context;
  }
  
  async execute(): Promise<WorkflowResult> {
    // Validate dependency graph
    this.validateDependencies();
    
    // Execute steps in order
    for (const step of this.steps) {
      console.log(`Executing step: ${step.name}`);
      
      // Validate inputs
      const validation = step.validateInputs(this.context);
      if (!validation.valid) {
        return {
          success: false,
          failedStep: step.id,
          error: validation.errorMessage
        };
      }
      
      // Execute with retries
      const result = await this.executeWithRetries(step);
      
      if (!result.success) {
        return {
          success: false,
          failedStep: step.id,
          error: result.error
        };
      }
      
      // Update progress
      await this.updateProgress(step);
    }
    
    return { success: true };
  }
  
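  private async executeWithRetries(step: Step): Promise<StepResult> {
    let lastError: string | undefined;

    for (let attempt = 1; attempt <= step.maxRetries; attempt++) {
      try {
        // Execute with timeout (the timeout rejects the race; it does not
        // cancel work the step has already started)
        const result = await Promise.race([
          step.execute(this.context),
          this.timeout(step.timeout)
        ]);

        if (result.success) {
          return result;
        }

        lastError = result.error;

        if (!result.shouldRetry) {
          break; // Don't retry
        }
      } catch (error) {
        // Thrown errors and timeouts are treated as retryable
        lastError = error instanceof Error ? error.message : String(error);
      }

      // Back off only if another attempt is coming
      if (attempt < step.maxRetries) {
        console.log(`Step ${step.name} failed, retrying (${attempt}/${step.maxRetries})`);

        // Exponential backoff: 2s, 4s, 8s, ...
        await this.sleep(Math.pow(2, attempt) * 1000);
      }
    }

    return {
      success: false,
      error: lastError || 'Step failed after max retries'
    };
  }

  private sleep(ms: number): Promise<void> {
    return new Promise(resolve => setTimeout(resolve, ms));
  }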
  
  private timeout(ms: number): Promise<never> {
    return new Promise((_, reject) => {
      setTimeout(() => reject(new Error('Step timeout')), ms);
    });
  }
  
  private async updateProgress(step: Step) {
    const completedWeight = this.steps
      .filter(s => this.context.isCompleted(s.id))
      .reduce((sum, s) => sum + s.progressWeight, 0);
    
    const totalWeight = this.steps
      .reduce((sum, s) => sum + s.progressWeight, 0);
    
    const progress = Math.round((completedWeight / totalWeight) * 100);
    
    await publishProgress(this.context.versionId, {
      step: step.id,
      progress,
      message: `Completed ${step.name}`
    });
  }
  
  private validateDependencies() {
    // Build dependency graph
    const graph = new Map<string, Set<string>>();
    
    for (const step of this.steps) {
      graph.set(step.id, new Set(step.requiredInputs));
    }
    
    // Detect circular dependencies
    const visited = new Set<string>();
    const recursionStack = new Set<string>();
    
    const hasCycle = (stepId: string): boolean => {
      visited.add(stepId);
      recursionStack.add(stepId);
      
      const deps = graph.get(stepId) || new Set<string>();
      for (const dep of deps) {
        if (!visited.has(dep)) {
          if (hasCycle(dep)) return true;
        } else if (recursionStack.has(dep)) {
          return true; // Cycle detected!
        }
      }
      
      recursionStack.delete(stepId);
      return false;
    };
    
    for (const stepId of graph.keys()) {
      if (!visited.has(stepId)) {
        if (hasCycle(stepId)) {
          throw new Error(`Circular dependency detected involving ${stepId}`);
        }
      }
    }
  }
}
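
One detail worth calling out: racing a step against a bare `setTimeout` leaves the timer pending after the step finishes. A possible refinement, sketched here with the hypothetical helper names `withTimeout` and `backoffDelayMs`, clears the timer either way. Like the original, it only stops waiting; it does not cancel the underlying AI call:

```typescript
// Hypothetical helper: race a promise against a timer and always clear the timer.
function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error('Step timeout')), ms);
  });
  // finally() runs whether the step or the timer wins the race.
  return Promise.race([work, timeout]).finally(() => {
    if (timer !== undefined) clearTimeout(timer);
  });
}

// Delay before the next attempt, matching the formula in the orchestrator:
// attempt 1 -> 2000 ms, attempt 2 -> 4000 ms, attempt 3 -> 8000 ms.
function backoffDelayMs(attempt: number): number {
  return Math.pow(2, attempt) * 1000;
}
```

With `maxRetries = 3`, a step that fails every time waits 2 s and then 4 s between attempts, so the worst case is three timeouts plus 6 s of backoff.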

3. Example Step Implementation

Here’s a real step from our system:

class LogoStep implements Step {
  id = 'logo';
  name = 'Logo Generation';
  description = 'Create a professional logo';
  requiredInputs = ['brandStrategy'];
  timeout = 60000; // 60 seconds
  maxRetries = 3;
  progressWeight = 10;
  
  validateInputs(context: StepContext): ValidationResult {
    try {
      const brandStrategy = context.get('brandStrategy');
      
      if (!brandStrategy.colors || brandStrategy.colors.length === 0) {
        return {
          valid: false,
          errorMessage: 'Brand strategy missing colors'
        };
      }
      
      return { valid: true };
    } catch (error) {
      return {
        valid: false,
        errorMessage: `Missing required input: brandStrategy`
      };
    }
  }
  
  async execute(context: StepContext): Promise<StepResult> {
    const brandStrategy = context.get('brandStrategy');
    const businessName = context.businessName;
    
    try {
      // Generate logo with AI
      const logoUrl = await generateLogo({
        businessName,
        colors: brandStrategy.colors,
        style: brandStrategy.visualStyle,
        industry: brandStrategy.industry
      });
      
      // Store in context
      context.set('logo', {
        url: logoUrl,
        colors: brandStrategy.colors,
        generatedAt: new Date()
      });
      
      return {
        success: true,
        data: { logoUrl }
      };
      
    } catch (error) {
      return {
        success: false,
        error: error instanceof Error ? error.message : String(error),
        shouldRetry: true // Retry on failure
      };
    }
  }
}

4. Parallel Execution

Some steps can run in parallel:

class StepOrchestrator {
  async execute(): Promise<WorkflowResult> {
    // Group steps by dependencies
    const groups = this.groupByDependencies();
    
    for (const group of groups) {
      // Execute group in parallel
      const results = await Promise.all(
        group.map(step => this.executeWithRetries(step))
      );
      
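      // Check for failures and report which step failed
      const failedIndex = results.findIndex(r => !r.success);
      if (failedIndex !== -1) {
        return {
          success: false,
          failedStep: group[failedIndex].id,
          error: results[failedIndex].error
        };
      }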
    }
    
    return { success: true };
  }
  
  private groupByDependencies(): Step[][] {
    const groups: Step[][] = [];
    const completed = new Set<string>();
    
    while (completed.size < this.steps.length) {
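      // Find steps whose dependencies are all completed, either in an earlier
      // group or in a previous run (needed when resuming with only the remaining steps)
      const ready = this.steps.filter(step => 
        !completed.has(step.id) &&
        step.requiredInputs.every(dep =>
          completed.has(dep) || this.context.isCompleted(dep)
        )
      );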
      
      if (ready.length === 0) {
        throw new Error('Dependency deadlock detected');
      }
      
      groups.push(ready);
      ready.forEach(step => completed.add(step.id));
    }
    
    return groups;
  }
}

Example execution:

Group 1: [research]
Group 2: [webSearch, strategy] // Parallel
Group 3: [brandStrategy]
Group 4: [logo, heroImage]     // Parallel
Group 5: [header, footer]      // Parallel
Group 6: [planning]
Group 7: [page1, page2, page3] // Parallel
Group 8: [assembly]
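
To see how groups like these fall out of the declared dependencies, here is a standalone sketch of the same layering logic on a trimmed-down graph. The dependency edges in `specs` are assumed for illustration; the article does not list every step's `requiredInputs`:

```typescript
// Standalone version of the layering logic, using plain objects instead of Step classes.
type StepSpec = { id: string; requiredInputs: string[] };

function groupByDependencies(steps: StepSpec[]): string[][] {
  const groups: string[][] = [];
  const completed = new Set<string>();

  while (completed.size < steps.length) {
    // A step is ready once every input it needs was produced by an earlier group.
    const ready = steps.filter(
      s => !completed.has(s.id) && s.requiredInputs.every(d => completed.has(d))
    );
    if (ready.length === 0) {
      throw new Error('Dependency deadlock detected');
    }
    groups.push(ready.map(s => s.id));
    ready.forEach(s => completed.add(s.id));
  }
  return groups;
}

// Assumed edges, for illustration only.
const specs: StepSpec[] = [
  { id: 'research', requiredInputs: [] },
  { id: 'strategy', requiredInputs: ['research'] },
  { id: 'brandStrategy', requiredInputs: ['strategy'] },
  { id: 'logo', requiredInputs: ['brandStrategy'] },
  { id: 'heroImage', requiredInputs: ['brandStrategy'] },
  { id: 'header', requiredInputs: ['logo', 'brandStrategy'] },
];

const groups = groupByDependencies(specs);
console.log(groups);
// → [['research'], ['strategy'], ['brandStrategy'], ['logo', 'heroImage'], ['header']]
```

One trade-off of layering: each group waits for its slowest member, so a step whose inputs are ready early still cannot start until the whole previous group finishes.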

5. Dynamic Step Enqueueing

Some steps create more steps:

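class PlanningStep implements Step {
  id = 'planning';
  name = 'Planning';
  requiredInputs = ['strategy'];
  // Other Step fields (description, timeout, maxRetries, ...) omitted for brevity

  // The orchestrator is injected so this step can enqueue follow-up steps
  constructor(private orchestrator: StepOrchestrator) {}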
  
  async execute(context: StepContext): Promise<StepResult> {
    const strategy = context.get('strategy');
    
    // Generate page plan
    const pagePlan = await generatePagePlan(strategy);
    
    // Store plan
    context.set('pagePlan', pagePlan);
    
    // Enqueue page generation steps
    for (const page of pagePlan.pages) {
      const pageStep = new PageStep(page.id, page.name, page.sections);
      this.orchestrator.enqueueStep(pageStep);
    }
    
    return {
      success: true,
      data: { pageCount: pagePlan.pages.length }
    };
  }
}
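
The article does not show `enqueueStep` itself. One way it could work, assuming the orchestrator keeps a mutable queue and drains it until empty, is sketched below. `QueueOrchestrator` and `SimpleStep` are hypothetical, simplified stand-ins, and the page names are made up:

```typescript
// Hypothetical sketch: a queue-draining loop that tolerates steps added mid-run.
type SimpleStep = {
  id: string;
  run: (orchestrator: QueueOrchestrator) => Promise<void>;
};

class QueueOrchestrator {
  private queue: SimpleStep[] = [];
  readonly executed: string[] = [];

  enqueueStep(step: SimpleStep): void {
    this.queue.push(step);
  }

  async execute(): Promise<void> {
    // Re-check the queue on every iteration so newly enqueued steps are not missed.
    while (this.queue.length > 0) {
      const step = this.queue.shift()!;
      await step.run(this);
      this.executed.push(step.id);
    }
  }
}

// A planning step that fans out into one step per page.
const planning: SimpleStep = {
  id: 'planning',
  run: async orchestrator => {
    for (const page of ['home', 'about']) {
      orchestrator.enqueueStep({ id: `page:${page}`, run: async () => {} });
    }
  },
};
```

The point of the loop is that the step list is not fixed up front: anything computed once before execution starts (grouping, total progress weight) has to be recomputed or adjusted when new steps arrive.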

The Challenges We Solved

Challenge 1: Progress Tracking

Problem: Users want to see progress, but steps take different amounts of time

Solution: Weighted progress

const steps = [
  { id: 'research', progressWeight: 10 },
  { id: 'strategy', progressWeight: 15 },
  { id: 'logo', progressWeight: 10 },
  { id: 'pages', progressWeight: 50 }, // Heaviest step
  { id: 'assembly', progressWeight: 15 }
];

// Progress = (completed weight / total weight) * 100
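
As a worked example of that formula, here is a small sketch using the weights above. The function name `progressPercent` is an assumption for illustration:

```typescript
type Weighted = { id: string; progressWeight: number };

function progressPercent(steps: Weighted[], completedIds: Set<string>): number {
  const total = steps.reduce((sum, s) => sum + s.progressWeight, 0);
  if (total === 0) return 0; // Avoid dividing by zero on an empty workflow.
  const done = steps
    .filter(s => completedIds.has(s.id))
    .reduce((sum, s) => sum + s.progressWeight, 0);
  return Math.round((done / total) * 100);
}

const weightedSteps: Weighted[] = [
  { id: 'research', progressWeight: 10 },
  { id: 'strategy', progressWeight: 15 },
  { id: 'logo', progressWeight: 10 },
  { id: 'pages', progressWeight: 50 },
  { id: 'assembly', progressWeight: 15 },
];

// research + strategy done: (10 + 15) / 100 = 25%
console.log(progressPercent(weightedSteps, new Set(['research', 'strategy']))); // → 25
```

Because the weights here sum to 100, each weight reads directly as a percentage, but the division by the total means they do not have to.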

Challenge 2: Partial Failures

Problem: Step 10 fails, but steps 1-9 succeeded. We don’t want to redo everything.

Solution: Resume from last successful step

async function resumeWorkflow(versionId: string) {
  // Load context from database
  const context = await loadContext(versionId);
  
  // Collect the IDs of every step that already completed
  const completedSteps = new Set(context.completedSteps);
  
  // Filter out completed steps
  const remainingSteps = allSteps.filter(step => 
    !completedSteps.has(step.id)
  );
  
  // Resume execution
  const orchestrator = new StepOrchestrator(remainingSteps, context);
  return await orchestrator.execute();
}
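
Resuming only works if the context was persisted after each successful step, which the article does not show. A rough sketch of that checkpointing follows, with a `Map` standing in for the database. The `Checkpoint` shape and the function names are assumptions:

```typescript
// Hypothetical checkpoint shape; the real schema is not shown in the article.
type Checkpoint = {
  completedSteps: string[];
  data: Record<string, unknown>;
};

// Stand-in for a database table keyed by versionId.
const checkpointStore = new Map<string, string>();

function saveCheckpoint(versionId: string, cp: Checkpoint): void {
  // Serialize so the stored snapshot cannot be mutated by later steps.
  checkpointStore.set(versionId, JSON.stringify(cp));
}

function loadCheckpoint(versionId: string): Checkpoint {
  const raw = checkpointStore.get(versionId);
  return raw ? (JSON.parse(raw) as Checkpoint) : { completedSteps: [], data: {} };
}

function remainingSteps(allStepIds: string[], cp: Checkpoint): string[] {
  const done = new Set(cp.completedSteps);
  return allStepIds.filter(id => !done.has(id));
}
```

Saving after every step costs one write per step, but it bounds the rework after a crash to a single step. Note that JSON serialization turns values such as `Date` into strings, so stored step outputs should be plain data.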

Challenge 3: Debugging Failures

Problem: When a step fails, it’s hard to know why

Solution: Detailed logging + step history

class StepOrchestrator {
  // Retry loop omitted for brevity; this shows only the logging around one attempt
  private async executeWithRetries(step: Step): Promise<StepResult> {
    const startTime = Date.now();

    try {
      const result = await step.execute(this.context);

      // Log the outcome; a step can also fail by returning success: false
      await logStepExecution({
        stepId: step.id,
        versionId: this.context.versionId,
        status: result.success ? 'success' : 'failed',
        duration: Date.now() - startTime,
        result: result.data,
        error: result.error
      });

      return result;

    } catch (error) {
      // Log thrown errors together with their stack trace
      const err = error instanceof Error ? error : new Error(String(error));
      await logStepExecution({
        stepId: step.id,
        versionId: this.context.versionId,
        status: 'failed',
        duration: Date.now() - startTime,
        error: err.message,
        stack: err.stack
      });

      throw err;
    }
  }
}

The Results: Reliable AI Workflows

Before (no orchestrator):

20% of website generations failed mid-process
No way to resume failed generations
No progress tracking
Hard-coded step order
No parallel execution

After (step orchestrator):

0.1% failure rate (only unrecoverable errors)
Automatic resume on failure
Real-time progress updates
Declarative step dependencies
3x faster with parallel execution

Additional benefits:

Easy to add new steps: Just implement the interface
Easy to modify workflow: Change step order, add dependencies
Easy to debug: Detailed logs for every step
Easy to test: Mock context, test steps in isolation

Why This Matters for AI Applications

Most AI applications are single-shot: one prompt, one response. We learned:

Bad: Chain AI calls in code → hope nothing fails
Good: Build an orchestrator → handle failures gracefully

The startup lesson: AI workflows need orchestration. Don’t hard-code your workflow—build a system that manages it.

Key Insights

Steps > functions: Declarative dependencies beat imperative code
Retries are essential: AI calls fail, plan for it
Progress matters: Users need feedback on long-running tasks
Parallel execution: Run independent steps simultaneously

What’s Next

We’re exploring:

Conditional steps: Skip steps based on business type
Step branching: Different workflows for different scenarios
Step caching: Reuse results from previous executions
Distributed execution: Run steps on different machines

But the core insight remains: Orchestration is infrastructure, not a feature.

Try it yourself: Generate a website with WebZum and watch the progress bar. Each step is managed by the orchestrator, so if one fails, it retries automatically.

Building an AI workflow? Key takeaway: Don’t chain AI calls in code. Build a step orchestrator that manages dependencies, retries, progress, and parallelism.

The future of AI applications isn’t single-shot prompts—it’s orchestrated workflows.
