Error Handling & Recovery

Retry Policies

Configure sophisticated retry strategies with customizable limits, delays, and backoff patterns.

Error Classification

Distinguish between transient failures that can be retried and permanent failures that require intervention using custom error types.

Logging & Monitoring

Use built-in logging and OpenTelemetry to observe errors.

Error Types

Understanding the nature of potential errors helps in designing appropriate handling strategies:

Transient Failures

Temporary issues that may resolve on retry:

Network timeouts
Temporary service unavailability
Rate limiting
Database deadlocks

These should generally be allowed to retry.

Permanent Failures

Issues that won’t be resolved by retrying:

Invalid input data (throw ValidationError)
Business rule violations (throw NonRetryableError)
Authentication failures (throw NonRetryableError)
Resource not found (throw NonRetryableError)

These should often prevent retries.

System Errors

Unexpected system-level issues that might cause the workflow to fail:

Out of memory
Process crashes
Engine configuration errors

These often lead to a FAILED instance state.
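One way to apply the transient and permanent categories above is to map an upstream HTTP status onto the error type a step should throw. The following is a minimal sketch, not SDK code: `toWorkflowError` is a hypothetical helper, the status-to-category mapping is an assumption, and the two classes are local stand-ins for the ones exported by @identity-flow/sdk so the snippet runs on its own.

```typescript
// Local stand-ins (assumption): in a real workflow, import these classes
// from '@identity-flow/sdk' instead of defining them here.
class NonRetryableError extends Error {
  code?: string;
  constructor(message: string, options?: { code?: string }) {
    super(message);
    this.name = 'NonRetryableError';
    this.code = options?.code;
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

class ValidationError extends Error {
  issues: { message: string }[];
  constructor(message: string, options?: { issues?: { message: string }[] }) {
    super(message);
    this.name = 'ValidationError';
    this.issues = options?.issues ?? [];
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

// Hypothetical helper: choose an error type from an HTTP status code.
function toWorkflowError(status: number, message: string): Error {
  // Malformed or semantically invalid input: a data problem, not worth retrying.
  if (status === 400 || status === 422) {
    return new ValidationError(message, { issues: [{ message }] });
  }
  // Timeouts, rate limiting, and server-side failures are treated as transient,
  // so a plain Error is returned and the step's retry policy can apply.
  if (status === 408 || status === 429 || status >= 500) {
    return new Error(message);
  }
  // Remaining 4xx responses (authentication, not found, ...) are permanent.
  if (status >= 400) {
    return new NonRetryableError(message, { code: `HTTP_${status}` });
  }
  // Anything else is unexpected; leave it retryable by default.
  return new Error(message);
}
```

A step could then `throw toWorkflowError(response.status, response.statusText)` and leave the retry decision to the thrown type.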

Handling Errors in Workflows

Use standard try...catch blocks combined with the specific error types exported by @identity-flow/sdk for robust error handling.

The first example below catches specific errors in a workflow; the second throws specific errors from an activity function.

import { defineWorkflow, NonRetryableError, ValidationError, isValidationError, isLockedError } from '@identity-flow/sdk';

export default defineWorkflow('process-payment', async (flow) => {
  try {
    await flow.do('charge card', {
      retries: {
        limit: 3,
        delay: '30 seconds',
        backoff: 'exponential'
      }
    }, async () => {
      // This function might throw different error types
      return await chargeCard(flow.params);
    });
    flow.log('Payment successful');
  } catch (error) {
    if (isValidationError(error)) {
      // Handle data validation issues specifically
      flow.warn('Payment failed due to validation error:', error.message, error.issues);
      // Perhaps notify user or end workflow gracefully
      return { status: 'VALIDATION_FAILED', issues: error.issues };
    } else if (error instanceof NonRetryableError || isLockedError(error)) {
      // Handle errors explicitly marked as non-retryable or locked resources
      flow.error('Permanent error during payment:', error.message, { code: error.code });
      // Perform cleanup if needed
      await flow.do('cleanup-failed-payment', async () => { /* ... */ });
      return { status: 'FAILED_PERMANENTLY', reason: error.message };
    } else {
      // Assume other errors are transient and might have been retried
      // This block catches the error after retries are exhausted
      flow.error('Payment failed after retries:', error.message);
      // Rethrow to fail the workflow instance
      throw error;
    }
  }
});

import { defineWorkflow, NonRetryableError, ValidationError } from '@identity-flow/sdk';

// Inside an activity function (e.g., used within flow.do)
async function chargeCard(params: any) {
  // ... pre-checks ...

  if (!params.cardNumber || !params.cvc) {
    // Throw specific validation error
    throw new ValidationError('Missing card details', {
      issues: [{ message: 'Card number and CVC are required' }]
    });
  }

  try {
    const response = await paymentGateway.charge(params);
    if (response.code === 'CARD_DECLINED') {
      // Throw NonRetryableError for permanent declines
      throw new NonRetryableError('Card declined by issuer', {
        code: response.code,
        cause: response.error // Optionally chain the original error
      });
    }
    if (!response.success) {
      // Throw generic error for potentially transient gateway issues (will be retried)
      throw new Error(`Payment gateway error: ${response.message}`);
    }
    return response.transactionId;
  } catch (error) {
    // Handle or re-throw other network/unexpected errors
    if (error instanceof NonRetryableError) throw error; // Don't wrap non-retryable
    throw new Error('Failed to communicate with payment gateway', { cause: error });
  }
}
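To see why the error type matters, it can help to picture how a retrying runner might treat each kind of error. The sketch below is an illustration only, not the engine's implementation: `runWithRetries` is a hypothetical function, `NonRetryableError` is a local stand-in for the SDK class, `limit` is assumed to count retries after the first attempt, and delays between attempts are left out for brevity.

```typescript
// Local stand-in (assumption) for the class exported by '@identity-flow/sdk'.
class NonRetryableError extends Error {
  constructor(message: string) {
    super(message);
    this.name = 'NonRetryableError';
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

// Hypothetical runner: one initial attempt plus up to `limit` retries.
async function runWithRetries<T>(
  task: (attempt: number) => Promise<T>,
  limit: number,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= limit; attempt++) {
    try {
      return await task(attempt);
    } catch (error) {
      // Permanent failures stop immediately; retrying cannot help.
      if (error instanceof NonRetryableError) throw error;
      // Anything else is treated as transient and tried again.
      lastError = error;
    }
  }
  // Retries exhausted: surface the last transient error to the caller.
  throw lastError;
}
```

Under these assumptions, a task that throws a plain Error with `limit: 3` runs at most four times, while a task that throws NonRetryableError runs exactly once.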

Error Monitoring

Effective monitoring is key to understanding workflow health.

Use the built-in flow.error() and flow.warn() methods within your workflow logic, especially in catch blocks, to log detailed contextual information when errors occur. These logs are associated with the specific workflow instance and step, which makes debugging easier via the GraphQL API or engine logs.

import { defineWorkflow, NonRetryableError } from '@identity-flow/sdk';

export default defineWorkflow('process-order', async (flow) => {
  try {
    await flow.do('process payment', async () => {
      return await processPayment(flow.params);
    });
  } catch (error) {
    // Log detailed error information
    flow.error('Payment processing failed', {
      orderId: flow.params.orderId,
      error: error.message,
      code: error.code, // Include custom codes if available
      stack: error.stack,
      retryable: !(error instanceof NonRetryableError), // Indicate if it was considered retryable
      context: {
        /* Any relevant context */
      },
    });
    // Decide whether to re-throw to fail the workflow
    throw error;
  }
});
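In TypeScript, the value caught in a catch block is typically typed as `unknown`, so reading `error.message` or `error.code` directly may not compile under strict settings. A small normalizing helper is one way around this. The sketch below is an assumption rather than part of the SDK: `toErrorLogFields` and `ErrorLogFields` are hypothetical names.

```typescript
// Hypothetical shape for the fields passed to flow.error().
interface ErrorLogFields {
  name: string;
  message: string;
  code?: string;
  stack?: string;
}

// Hypothetical helper: turn any thrown value into loggable fields.
function toErrorLogFields(error: unknown): ErrorLogFields {
  if (error instanceof Error) {
    const fields: ErrorLogFields = { name: error.name, message: error.message };
    // Custom error types may carry a string `code`; copy it only if present.
    const code = (error as Error & { code?: unknown }).code;
    if (typeof code === 'string') fields.code = code;
    if (typeof error.stack === 'string') fields.stack = error.stack;
    return fields;
  }
  // Non-Error values (strings, numbers, plain objects) can also be thrown.
  return { name: 'UnknownError', message: String(error) };
}
```

Inside a catch block, this could be used as `flow.error('Payment processing failed', { orderId: flow.params.orderId, ...toErrorLogFields(error) })`.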

Advanced Observability (OpenTelemetry)

The IdentityFlow SDK re-exports the @opentelemetry/api package under @identity-flow/sdk/telemetry. For advanced use cases, you can use these APIs to create custom spans or to interact with the trace context propagated through flow.instance.meta. Exporting this telemetry data requires integration with an OpenTelemetry collector and backend. Refer to the OpenTelemetry documentation for more details on instrumenting applications.

Best Practices

Classify Errors

Use NonRetryableError for permanent failures.
Throw standard Error for transient issues.
Use ValidationError for data issues.
Catch specific errors using type guards.

Configure Retries

Set appropriate retry limits via options.
Use exponential backoff for external services.
Disable retries ({ retries: false }) when needed. See Retry Policies.
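To get a feel for how long a step may keep retrying, it can be useful to compute the delay schedule ahead of time. The sketch below assumes that exponential backoff multiplies the delay by a constant factor (2 here) after every retry. The engine's actual multiplier, jitter, and caps are not specified in this guide, and `exponentialDelays` is a hypothetical helper.

```typescript
// Hypothetical helper: delays (in ms) before each retry, assuming the delay
// grows by `factor` after every retry. Not taken from the SDK.
function exponentialDelays(baseMs: number, limit: number, factor = 2): number[] {
  const delays: number[] = [];
  for (let retry = 0; retry < limit; retry++) {
    delays.push(baseMs * factor ** retry);
  }
  return delays;
}

// With limit: 3 and delay: '30 seconds' under the doubling assumption:
const schedule = exponentialDelays(30_000, 3);
console.log(schedule); // [ 30000, 60000, 120000 ]
```

Under that assumption, the three retries add up to 210 seconds of waiting, which is worth comparing against any timeout on the calling side.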

Log Contextually

Use flow.error and flow.warn in catch blocks.
Include relevant parameters and error details.
Log decisions made during error handling.

Monitor Systematically

Query failed workflow instances via API.
Analyze logs for patterns.
Integrate with external monitoring tools using logs or OpenTelemetry. See Observability.

Next Steps

Retry Policies

Learn more about configuring retry strategies.

Observability

Dive deeper into monitoring and tracing.

Testing

Master testing strategies.