Every app you've built is an ETL pipeline (you just didn't call it that)
Summary
ETL pipelines often grow complex as custom orchestration accumulates around them. Inngest simplifies this with step-level durability, automatic retries, and built-in observability, letting you focus on product features instead of infrastructure.
The hidden cost of building ETL pipelines
Every web application with a background job is secretly an ETL pipeline. That admin CSV upload feature extracts file contents, transforms rows, and loads them into a database. A Stripe webhook handler does the same.
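To make the mapping concrete, here is a minimal sketch of the CSV upload case as explicit extract, transform, and load functions. The `users` table, the `name` and `email` columns, and the sample upload are assumptions made up for illustration; an in-memory SQLite database stands in for the real one.

```python
import csv
import io
import sqlite3

def extract(csv_text):
    """Extract: parse raw CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(csv_text)))

def transform(rows):
    """Transform: trim whitespace, lowercase emails, drop rows with no email."""
    return [
        ((row.get("name") or "").strip(), (row.get("email") or "").strip().lower())
        for row in rows
        if (row.get("email") or "").strip()
    ]

def load(conn, records):
    """Load: insert the cleaned records into the database."""
    conn.executemany("INSERT INTO users (name, email) VALUES (?, ?)", records)
    conn.commit()

# Hypothetical schema and upload contents, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, email TEXT)")
upload = "name,email\n Ada ,ADA@example.com\nNoEmail,\n"

load(conn, transform(extract(upload)))
stored = conn.execute("SELECT name, email FROM users").fetchall()
```

Nothing here is labeled a pipeline, yet every stage of one is present, and each stage can fail independently.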
These systems, whether called "integrations" or "sync jobs," all face the same core problems. They require status tracking, retry logic, and idempotency checks. As AI features become ubiquitous, this pattern is showing up in more and more codebases.
AI-powered features are often just ETL with a non-deterministic step. Extracting a support ticket, transforming it with an LLM for categorization, and loading the result is a classic ETL cycle. The critical difference is that the transform can fail unpredictably.
The evolutionary trap of a simple job
Most projects start with a clean, three-line function. For example, consider a job that fetches the replies in a support thread and uses an LLM to categorize them.
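A sketch of what that first version might look like is below. The helpers `fetch_replies`, `categorize_with_llm`, and `save_category` are hypothetical stubs invented for this example; in a real app they would hit a database and an LLM API.

```python
def fetch_replies(thread_id):
    # Hypothetical stand-in for a database or API read.
    return ["My invoice is wrong", "I was charged twice"]

def categorize_with_llm(replies):
    # Hypothetical stand-in for an LLM call; a real call can time out or return junk.
    is_billing = any("charged" in r or "invoice" in r for r in replies)
    return "billing" if is_billing else "general"

saved = {}

def save_category(thread_id, category):
    # Hypothetical stand-in for a database write.
    saved[thread_id] = category

def categorize_thread(thread_id):
    replies = fetch_replies(thread_id)        # extract
    category = categorize_with_llm(replies)   # transform
    save_category(thread_id, category)        # load

categorize_thread("thread-1")
```

The body of `categorize_thread` is the whole job: three lines, no error handling, and no record of whether it ran.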
This works until it doesn't. Without tracking, you have no visibility into failures. The first fix is adding a status column to the database and wrapping the logic in try-catch blocks.
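That first fix tends to look something like the following sketch. The `threads` dict stands in for a database table, and the `status` and `error` fields are assumed column names; the LLM stub always fails so the failure path is visible.

```python
# In-memory stand-in for a support_threads table (hypothetical columns).
threads = {"t1": {"status": "pending", "category": None, "error": None}}

def categorize_with_llm(thread_id):
    # Hypothetical LLM stub that always times out, to exercise the failure path.
    raise TimeoutError("LLM request timed out")

def categorize_thread(thread_id):
    row = threads[thread_id]
    row["status"] = "processing"
    try:
        row["category"] = categorize_with_llm(thread_id)
        row["status"] = "done"
    except Exception as exc:
        # Record the failure so someone can at least query for stuck rows.
        row["status"] = "failed"
        row["error"] = str(exc)

categorize_thread("t1")
```

The failure is now visible, but the support thread record has started carrying pipeline state, and nothing yet decides when or whether to try again.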
This creates new problems. A failing job might retry indefinitely, creating duplicate data and overloading the systems it calls. The next fix adds idempotency checks, deduplication logic, and attempt counters.
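A sketch of that next round of fixes, again with an in-memory dict standing in for the table. `MAX_ATTEMPTS`, the `attempts` field, and the `flaky_llm` stub (which fails once and then succeeds) are assumptions for illustration.

```python
MAX_ATTEMPTS = 3  # assumed retry budget

threads = {"t1": {"status": "pending", "attempts": 0, "category": None}}
llm_calls = []

def flaky_llm(thread_id):
    # Hypothetical LLM stub: fails on the first call, succeeds afterwards.
    llm_calls.append(thread_id)
    if len(llm_calls) == 1:
        raise TimeoutError("LLM request timed out")
    return "billing"

def categorize_thread(thread_id):
    row = threads[thread_id]
    if row["status"] == "done":
        return "skipped"              # idempotency check: never redo finished work
    if row["attempts"] >= MAX_ATTEMPTS:
        row["status"] = "dead"        # attempt counter: stop retrying forever
        return "gave_up"
    row["attempts"] += 1
    try:
        row["category"] = flaky_llm(thread_id)
        row["status"] = "done"
        return "done"
    except TimeoutError:
        row["status"] = "failed"
        return "failed"

# Simulate a scheduler invoking the job three times.
results = [categorize_thread("t1") for _ in range(3)]
```

Most of the function is now bookkeeping, and the one line of actual product logic is buried in the middle.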
The three-line function balloons into a complex, unmaintainable script. Your domain model becomes bloated with pipeline state. You are now maintaining a bespoke orchestration system instead of building your product.
Observability becomes a new project
This hand-rolled complexity creates a black box. Only a developer can diagnose why a specific support thread failed to categorize.
Your operations team cannot see which records are stuck. Your Director of Customer Support has no way to check system health. You inevitably need to build an observability dashboard so non-technical users can understand data flow.
That dashboard is an entire project. Each incremental fix (status tracking, retries, idempotency) is reasonable on its own. Together, they pull you deep into the infrastructure business.
Step-level durability changes the game
A platform like Inngest approaches this with step-level durability. The same categorization function is rebuilt using discrete, checkpointed steps.
Each step.run call is a durable checkpoint. If the LLM call fails, the system retries only that step. It does not re-fetch the replies or repeat any other step that already completed.
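The checkpointing idea can be modeled in a few lines. The sketch below is a toy, in-memory illustration of the concept and not Inngest's SDK or implementation: `ToyStepRunner` is a hypothetical class, and a real platform would persist step results durably and schedule the retries itself.

```python
class ToyStepRunner:
    """Toy model of durable steps; NOT the Inngest API."""

    def __init__(self):
        # step id -> saved result (a real platform stores this durably)
        self.completed = {}

    def run(self, step_id, fn):
        if step_id in self.completed:
            return self.completed[step_id]   # checkpoint hit: skip the work
        result = fn()                        # may raise; nothing is saved then
        self.completed[step_id] = result
        return result

calls = {"fetch": 0, "llm": 0}

def fetch_replies():
    calls["fetch"] += 1
    return ["I was charged twice"]

def categorize(replies):
    calls["llm"] += 1
    if calls["llm"] == 1:
        raise TimeoutError("LLM request timed out")  # first attempt fails
    return "billing"

def categorize_thread(step):
    replies = step.run("fetch-replies", fetch_replies)
    return step.run("categorize", lambda: categorize(replies))

step = ToyStepRunner()
try:
    categorize_thread(step)          # first attempt: the LLM step fails
except TimeoutError:
    pass
category = categorize_thread(step)   # retry: fetch is served from the checkpoint
```

On the retry, the fetch step returns its saved result without running again, so only the failed LLM step does real work.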
This eliminates whole classes of bugs. The state of the pipeline is tracked by the platform, not your domain model. There is no hand-rolled idempotency or retry logic.
- The support thread model represents a support thread, not a job status.
- Retrying after a failure does not repeat work that already finished, because the results of completed steps are saved.
- Every function run is visible in a dashboard, showing which step failed and why.
The power of event-driven fan-out
The architecture enables clean, event-driven growth. The function is triggered by an event like support/thread.created.
The function doesn't know what creates threads, and the creator doesn't know the function exists. They are decoupled. When you need new features, you add new functions that listen to the same event.
You can have separate functions for categorization, generating a suggested response, updating analytics, and checking for billing escalation. This is fan-out: one event, multiple independent reactions.
Adding new behavior means deploying a new function, not modifying complex, existing code. Each function has its own retry logic and failure isolation. If analytics breaks, categorization keeps working.
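Fan-out with failure isolation can be sketched with a tiny in-process event bus. This is an illustrative model only: the `on` and `send` helpers are hypothetical, handlers run synchronously here, and a real platform would run each function independently with its own retries.

```python
from collections import defaultdict

handlers = defaultdict(list)  # event name -> list of subscribed functions

def on(event_name):
    """Register a function as a listener for an event (hypothetical helper)."""
    def register(fn):
        handlers[event_name].append(fn)
        return fn
    return register

def send(event_name, data):
    """Deliver the event to every listener, isolating each one's failures."""
    outcomes = {}
    for fn in handlers[event_name]:
        try:
            outcomes[fn.__name__] = fn(data)
        except Exception as exc:
            outcomes[fn.__name__] = f"failed: {exc}"
    return outcomes

@on("support/thread.created")
def categorize(data):
    return "billing"

@on("support/thread.created")
def update_analytics(data):
    raise RuntimeError("analytics store unavailable")

@on("support/thread.created")
def suggest_response(data):
    return f"Draft reply for {data['thread_id']}"

outcomes = send("support/thread.created", {"thread_id": "t1"})
```

The sender only names the event. Adding a fourth reaction means registering one more function, and the analytics failure leaves the other two results intact.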
Building without the infrastructure tax
This is the architecture every bespoke ETL system eventually needs. The problem isn't recognizing that an event-driven design is right; it's that building the infrastructure is a massive undertaking.
Deadlines and priorities often lead to a compromised, half-built system where spaghetti code wins. A platform like Inngest provides the clean version from day one.
The biggest win for product owners is the built-in observability. Teams often spend multiple quarters building this themselves. The complexity you dealt with only because you had no choice simply disappears.
You can focus on your product's domain instead of orchestration plumbing. An entire category of hard problems becomes boring, which is the best possible outcome.
