Intelligent Retry and Circuit-Breaker Support in OpenAI .NET SDK #823
Replies: 2 comments
Following up after some testing in production-like scenarios. I’ve validated that adding a simple retry wrapper using Polly improves resiliency without breaking existing SDK behavior. If the maintainers are open to this enhancement, I can share a minimal proof-of-concept or contribute a draft design doc for potential integration. Appreciate any feedback on whether this aligns with the current SDK roadmap.
Adding additional context to support the earlier proposal on improving streaming reliability for high-frequency workloads. In enterprise-grade .NET environments, we frequently observe gaps in end-to-end observability during streaming interactions, especially when multiple downstream operations are chained to the same response stream.
The primary issue appears when the token stream does not surface intermediate telemetry hooks that would allow developers to correlate partial responses with the corresponding activity IDs. This becomes more evident in multi-service orchestrations where distributed tracing tools depend on consistent event boundaries.
A possible enhancement would be to expose a lightweight telemetry callback or structured event surface in the streaming pipeline. This would allow client applications to attach correlation IDs, capture latency patterns, and instrument downstream logic without altering the core SDK behavior. Even a minimal hook would significantly improve traceability and error diagnostics for real-time workloads.
Sharing this additional insight here for completeness, since many modern financial and compliance-focused applications depend on deterministic streaming and robust observability.
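To make the telemetry-callback idea concrete, here is a rough sketch of what such a hook could look like on the client side today. It is only an illustration under stated assumptions: `WithTelemetry`, `FakeTokens`, and the callback shape are hypothetical names, not part of the OpenAI .NET SDK, and the fake token source stands in for a real streaming response. It is written as top-level statements to keep it short.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading.Tasks;

// Hypothetical helper (not an SDK API): wraps any async stream and reports each
// chunk to a caller-supplied callback together with a correlation ID.
async IAsyncEnumerable<T> WithTelemetry<T>(
    IAsyncEnumerable<T> source,
    string correlationId,
    Action<string, int, TimeSpan> onChunk)
{
    var clock = Stopwatch.StartNew();
    var index = 0;
    await foreach (var item in source)
    {
        // Report before yielding so a failure in downstream logic does not hide the event.
        onChunk(correlationId, index++, clock.Elapsed);
        yield return item;
    }
}

// Stand-in for a streaming response; a real caller would pass the SDK's stream instead.
async IAsyncEnumerable<string> FakeTokens()
{
    foreach (var token in new[] { "Hel", "lo", "!" })
    {
        await Task.Delay(10);
        yield return token;
    }
}

// Usage: record the chunk index and elapsed time for every partial response.
var latencies = new List<(int Index, TimeSpan Elapsed)>();
await foreach (var token in WithTelemetry(
    FakeTokens(), "req-123", (id, i, elapsed) => latencies.Add((i, elapsed))))
{
    Console.Write(token);
}
Console.WriteLine();
Console.WriteLine($"Captured {latencies.Count} telemetry events for req-123");
```

Because the wrapper only decorates an `IAsyncEnumerable<T>`, it leaves the underlying stream untouched, which is the "without altering the core SDK behavior" property described above. A built-in hook could expose richer event boundaries than this wrapper can see from the outside.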
Context
In enterprise-grade workloads, transient network failures, throttling (HTTP 429), or model-side latency spikes can occur under heavy traffic. The OpenAI .NET SDK ultimately surfaces these to the caller as exceptions, but many enterprise integrations require automated recovery and graceful degradation patterns to maintain system reliability.
To improve resiliency and enable broader adoption across mission-critical systems, it may be beneficial to introduce built-in retry and circuit-breaker policies within the SDK.
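For readers less familiar with the two patterns, the following is a minimal, dependency-free sketch of retry with exponential backoff combined with a consecutive-failure circuit breaker. Everything in it is an assumption made for illustration: `ExecuteAsync`, `isTransient`, the thresholds, and the delays are hypothetical and are not SDK or Polly APIs. A library such as Polly provides hardened versions of both policies.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

// Illustrative settings (assumed values, not SDK defaults).
var failureThreshold = 3;                      // consecutive failures before the circuit opens
var breakDuration = TimeSpan.FromSeconds(30);  // how long calls are rejected once open
var baseDelayMs = 50.0;                        // first backoff delay; doubles per attempt
var consecutiveFailures = 0;
var openUntil = DateTimeOffset.MinValue;

// Which exceptions count as transient is an assumption; a real integration would
// also inspect status codes such as HTTP 429.
Func<Exception, bool> isTransient = ex => ex is HttpRequestException or TimeoutException;

async Task<T> ExecuteAsync<T>(Func<Task<T>> operation, int maxAttempts = 3)
{
    for (var attempt = 1; ; attempt++)
    {
        // Circuit breaker: fail fast while the circuit is open.
        if (DateTimeOffset.UtcNow < openUntil)
            throw new InvalidOperationException("Circuit is open; call rejected.");
        try
        {
            var result = await operation();
            consecutiveFailures = 0; // any success closes the circuit again
            return result;
        }
        catch (Exception ex) when (isTransient(ex))
        {
            consecutiveFailures++;
            if (consecutiveFailures >= failureThreshold)
                openUntil = DateTimeOffset.UtcNow + breakDuration;
            // Give up when attempts are exhausted or the circuit has just opened.
            if (attempt >= maxAttempts || DateTimeOffset.UtcNow < openUntil)
                throw;
            // Exponential backoff: 50 ms, 100 ms, 200 ms, ...
            await Task.Delay(TimeSpan.FromMilliseconds(baseDelayMs * Math.Pow(2, attempt - 1)));
        }
    }
}

// Usage: an operation that fails twice with a simulated transient error, then succeeds.
var calls = 0;
var answer = await ExecuteAsync(async () =>
{
    await Task.Yield();
    if (++calls < 3) throw new HttpRequestException("simulated HTTP 429");
    return "ok";
});
Console.WriteLine($"{answer} after {calls} calls");
```

The sketch keeps breaker state in plain local variables, so it is not thread-safe. A shared, configurable policy object would be needed for real services, which is part of why having this in one well-tested place is appealing.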
Motivation
Typical enterprise applications (for example, financial systems, real-time dashboards, or AI-assisted workflows) execute OpenAI API calls inside microservice or workflow orchestration layers.
Implementing consistent fault-tolerance logic across all services becomes repetitive and error-prone.
Embedding configurable retry and circuit-breaker support in the SDK could:
Proposed Design
A high-level pattern might include:
Benefits
Implementation Considerations
Target Audience
Agentic API Sentry is designed for professionals and teams operating in high-stakes, compliance-sensitive environments where API reliability, security, and auditability are non-negotiable. Key audiences include:
This tool is especially relevant for organizations that prioritize immutability, auditability, and runtime predictability in their API lifecycle.
Closing Note
This feature would significantly strengthen the SDK’s readiness for enterprise and real-time workloads by standardizing fault-tolerant communication patterns.
I’m sharing this idea to invite feedback from the maintainers and community.
If aligned with the SDK roadmap, I would be happy to collaborate or contribute a design proposal to explore implementation options.
GitHub repo: https://github.com/MahendhiranK
LinkedIn: Mahendhiran