Error Handling & Recovery

Retry Policies

Configure sophisticated retry strategies with customizable limits, delays, and backoff patterns.

Error Classification

Distinguish between transient failures that can be retried and permanent failures that require intervention using custom error types.

Logging & Monitoring

Utilize built-in logging and OpenTelemetry for observing errors.

Understanding the nature of potential errors helps in designing appropriate handling strategies:

Transient Failures

Temporary issues that may resolve on retry:

  • Network timeouts
  • Temporary service unavailability
  • Rate limiting
  • Database deadlocks

These should generally be allowed to retry.

Permanent Failures

Issues that won’t be resolved by retrying:

  • Invalid input data (throw ValidationError)
  • Business rule violations (throw NonRetryableError)
  • Authentication failures (throw NonRetryableError)
  • Resource not found (throw NonRetryableError)

These should usually prevent retries.

System Errors

Unexpected system-level issues that might cause the workflow to fail:

  • Out of memory
  • Process crashes
  • Engine configuration errors

These often lead to a FAILED instance state.
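The categories above can be made concrete with a small mapping function. The following is a minimal sketch, not part of the SDK: toWorkflowError is a hypothetical helper, the HTTP status mapping is an assumption about a typical downstream service, and the NonRetryableError class is a local stand-in so the example runs on its own. The real class exported by @identity-flow/sdk may take different constructor arguments.

```typescript
// Local stand-in for the SDK's NonRetryableError so this sketch is self-contained.
// In a workflow, import the real class from '@identity-flow/sdk' instead;
// its constructor signature is not shown here and may differ.
class NonRetryableError extends Error {
  readonly code?: string;

  constructor(message: string, code?: string) {
    super(message);
    // Keep `instanceof` working even when compiled to older targets.
    Object.setPrototypeOf(this, new.target.prototype);
    this.name = 'NonRetryableError';
    this.code = code;
  }
}

// Hypothetical helper: convert an HTTP status from a downstream service
// into the kind of error a step should throw.
function toWorkflowError(status: number, detail: string): Error {
  // Permanent: invalid request, authentication failure, resource not found.
  // Retrying cannot change the outcome.
  if (status === 400 || status === 401 || status === 403 || status === 404) {
    return new NonRetryableError(`${status}: ${detail}`, `HTTP_${status}`);
  }
  // Transient: rate limiting (429), temporary unavailability (503), timeouts.
  // A plain Error leaves the step eligible for retry.
  return new Error(`${status}: ${detail}`);
}
```

Inside a step, the caller would throw the returned error, for example `throw toWorkflowError(response.status, 'charge failed')`, so that the retry policy only applies to failures that a retry could plausibly fix.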

Use standard try...catch blocks combined with the specific error types exported by @identity-flow/sdk for robust error handling.

import { defineWorkflow, NonRetryableError, ValidationError, isValidationError, isLockedError } from '@identity-flow/sdk';

export default defineWorkflow('process-payment', async (flow) => {
  try {
    await flow.do('charge card', {
      retries: {
        limit: 3,
        delay: '30 seconds',
        backoff: 'exponential'
      }
    }, async () => {
      // This function might throw different error types
      return await chargeCard(flow.params);
    });
    flow.log('Payment successful');
  } catch (error) {
    if (isValidationError(error)) {
      // Handle data validation issues specifically
      flow.warn('Payment failed due to validation error:', error.message, error.issues);
      // Perhaps notify user or end workflow gracefully
      return { status: 'VALIDATION_FAILED', issues: error.issues };
    } else if (error instanceof NonRetryableError || isLockedError(error)) {
      // Handle errors explicitly marked as non-retryable or locked resources
      flow.error('Permanent error during payment:', error.message, { code: error.code });
      // Perform cleanup if needed
      await flow.do('cleanup-failed-payment', async () => { /* ... */ });
      return { status: 'FAILED_PERMANENTLY', reason: error.message };
    } else {
      // Assume other errors are transient and might have been retried
      // This block catches the error after retries are exhausted
      flow.error('Payment failed after retries:', error.message);
      // Rethrow to fail the workflow instance
      throw error;
    }
  }
});

Effective monitoring is key to understanding workflow health.

Use the built-in flow.error() and flow.warn() methods within your workflow logic, especially in catch blocks, to log detailed contextual information when errors occur. These logs are associated with the specific workflow instance and step, which makes debugging easier via the GraphQL API or engine logs.

import { defineWorkflow, NonRetryableError } from '@identity-flow/sdk';

export default defineWorkflow('process-order', async (flow) => {
  try {
    await flow.do('process payment', async () => {
      return await processPayment(flow.params);
    });
  } catch (error) {
    // Log detailed error information
    flow.error('Payment processing failed', {
      orderId: flow.params.orderId,
      error: error.message,
      code: error.code, // Include custom codes if available
      stack: error.stack,
      retryable: !(error instanceof NonRetryableError), // Indicate if it was considered retryable
      context: {
        /* Any relevant context */
      },
    });
    // Decide whether to re-throw to fail the workflow
    throw error;
  }
});

The IdentityFlow SDK includes re-exports from the @opentelemetry/api package under @identity-flow/sdk/telemetry. For advanced use cases, you can leverage these APIs to create custom spans or interact with the trace context propagated through flow.instance.meta. Setting up and exporting this telemetry data requires integration with an OpenTelemetry collector and backend. Refer to the OpenTelemetry Documentation for more details on instrumenting applications.

Classify Errors

  • Use NonRetryableError for permanent failures.
  • Throw standard Error for transient issues.
  • Use ValidationError for data issues.
  • Catch specific errors using type guards.

Configure Retries

  • Set appropriate retry limits via options.
  • Use exponential backoff for external services.
  • Disable retries ({ retries: false }) when needed. See Retry Policies
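
To get a feel for what a retry configuration costs in wall-clock time, it can help to compute the delay schedule up front. The sketch below is illustrative only: retryDelays is a hypothetical helper, the doubling formula is the common textbook definition of exponential backoff rather than a confirmed description of the engine's behavior, and the 'constant' and 'linear' modes are assumptions added for comparison.

```typescript
// Backoff modes used in this sketch. Only 'exponential' appears in the
// examples on this page; 'constant' and 'linear' are assumed for comparison.
type Backoff = 'constant' | 'linear' | 'exponential';

// Hypothetical helper: delay before each retry, in milliseconds.
// Assumed formulas (the engine may compute delays differently):
//   constant:    base
//   linear:      base * attempt
//   exponential: base * 2^(attempt - 1)
function retryDelays(limit: number, baseMs: number, backoff: Backoff): number[] {
  const delays: number[] = [];
  for (let attempt = 1; attempt <= limit; attempt++) {
    if (backoff === 'exponential') {
      delays.push(baseMs * Math.pow(2, attempt - 1));
    } else if (backoff === 'linear') {
      delays.push(baseMs * attempt);
    } else {
      delays.push(baseMs);
    }
  }
  return delays;
}

// limit: 3, delay: '30 seconds', backoff: 'exponential'
const schedule = retryDelays(3, 30000, 'exponential');
// schedule is [30000, 60000, 120000]

// Total time spent waiting if every retry is used, in milliseconds.
const totalWaitMs = schedule.reduce((sum, d) => sum + d, 0);
// totalWaitMs is 210000 (3.5 minutes)
```

Under these assumptions, three retries with a 30 second base delay add up to 3.5 minutes of waiting before the error reaches the workflow's catch block, which is worth weighing against any user-facing timeout.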

Log Contextually

  • Use flow.error and flow.warn in catch blocks.
  • Include relevant parameters and error details.
  • Log decisions made during error handling.
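
In TypeScript, the value caught in a catch block is not guaranteed to be an Error, so it can be useful to normalize it before logging. The following is a minimal sketch of one way to do that. describeError and ErrorDetails are hypothetical names, not part of @identity-flow/sdk, and the assumption that custom errors carry a string code property is taken from the examples on this page.

```typescript
// Shape of the details passed to the logger. Hypothetical, for illustration.
interface ErrorDetails {
  message: string;
  name?: string;
  code?: string;
  stack?: string;
}

// Hypothetical helper: extract loggable fields from any thrown value.
function describeError(error: unknown): ErrorDetails {
  if (error instanceof Error) {
    // Custom error types may attach a `code`; read it defensively.
    const code = (error as Error & { code?: unknown }).code;
    return {
      message: error.message,
      name: error.name,
      code: typeof code === 'string' ? code : undefined,
      stack: error.stack,
    };
  }
  // Non-Error values (strings, numbers, plain objects) are stringified.
  return { message: String(error) };
}
```

Within a catch block, the result could be spread into the log payload, for example `flow.error('Payment processing failed', { orderId: flow.params.orderId, ...describeError(error) })`.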

Monitor Systematically

  • Query failed workflow instances via API.
  • Analyze logs for patterns.
  • Integrate with external monitoring tools using logs or OpenTelemetry. See Observability