Every app you've built is an ETL pipeline (you just didn't call it that)

The hidden cost of building ETL pipelines

Every web application with a background job is secretly an ETL pipeline. That admin CSV upload feature extracts file contents, transforms rows, and loads them into a database. A Stripe webhook handler does the same: it extracts the event payload, transforms it into your own records, and loads them into your database.
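To make the mapping concrete, here is a minimal sketch of a CSV upload written as explicit extract, transform, and load stages. The `CustomerRow` shape, the two-column file layout, and the in-memory `Map` standing in for a database table are all assumptions made for illustration.

```typescript
// Hypothetical row shape for an uploaded customer CSV (an assumption).
interface CustomerRow {
  email: string;
  plan: string;
}

// Extract: split raw file contents into records, skipping the header row.
function extract(csv: string): string[][] {
  return csv
    .trim()
    .split("\n")
    .slice(1)
    .map((line) => line.split(",").map((cell) => cell.trim()));
}

// Transform: normalize each record into the shape the table expects.
function transform(records: string[][]): CustomerRow[] {
  return records.map((record) => ({
    email: (record[0] ?? "").toLowerCase(),
    plan: (record[1] ?? "").toLowerCase(),
  }));
}

// Load: an in-memory Map keyed by email stands in for a real database table.
function load(rows: CustomerRow[], table: Map<string, CustomerRow>): void {
  for (const row of rows) {
    table.set(row.email, row);
  }
}

// The "upload feature" is just the three stages composed.
function importCsv(csv: string, table: Map<string, CustomerRow>): number {
  const rows = transform(extract(csv));
  load(rows, table);
  return rows.length;
}
```

Nothing here tracks whether a given upload succeeded, which is exactly the gap the rest of this article is about.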

These systems, whether called "integrations" or "sync jobs," all face the same core problems. They require status tracking, retry logic, and idempotency checks. With AI features becoming ubiquitous, this pattern is exploding.

AI-powered features are often just ETL with a non-deterministic step. Extracting a support ticket, transforming it with an LLM for categorization, and loading the result is a classic ETL cycle. The critical difference is that the transform can fail unpredictably.

The evolutionary trap of a simple job

Most projects start with a clean, three-line function. For example, consider a job that fetches a support thread's replies and uses an LLM to categorize them.
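A sketch of what that starting point might look like follows. The `Deps` interface and its three methods are hypothetical stand-ins for a ticket store, an LLM client, and a database write; none of them come from a real library.

```typescript
// Hypothetical dependencies, injected so the sketch stays self-contained.
interface Deps {
  fetchReplies(threadId: string): Promise<string[]>;
  categorize(replies: string[]): Promise<string>; // stand-in for the LLM call
  saveCategory(threadId: string, category: string): Promise<void>;
}

async function categorizeThread(threadId: string, deps: Deps): Promise<void> {
  const replies = await deps.fetchReplies(threadId); // extract
  const category = await deps.categorize(replies); // transform (non-deterministic)
  await deps.saveCategory(threadId, category); // load
}
```

If any of the three awaits rejects, the whole function rejects and no record of the attempt is kept.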

This works until it doesn't. Without tracking, you have no visibility into failures. The first fix is adding a status column to the database and wrapping the logic in try-catch blocks.

This creates new problems. A failing job might retry indefinitely, creating duplicate data and crashing systems. The next fix adds idempotency checks, deduplication logic, and attempt counters.

The three-line function balloons into a complex, unmaintainable script. Your domain model becomes bloated with pipeline state. You are now maintaining a bespoke orchestration system instead of building your product.
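The sketch below shows roughly where those fixes tend to lead. The field names, the `MAX_ATTEMPTS` limit of 3, and the injected `categorize` function are assumptions chosen for illustration, not a description of any particular codebase.

```typescript
type JobStatus = "pending" | "processing" | "done" | "failed";

// Hypothetical domain model, now carrying pipeline state next to real data.
interface SupportThread {
  id: string;
  replies: string[];
  category: string | null;
  // Everything below exists only to babysit the job.
  categorizationStatus: JobStatus;
  categorizationAttempts: number;
  lastError: string | null;
}

const MAX_ATTEMPTS = 3; // assumed retry budget

async function categorizeThread(
  thread: SupportThread,
  categorize: (replies: string[]) => Promise<string>, // stand-in for the LLM call
): Promise<void> {
  // Idempotency check: never redo finished work.
  if (thread.categorizationStatus === "done") return;
  // Attempt counter: stop retrying forever.
  if (thread.categorizationAttempts >= MAX_ATTEMPTS) {
    thread.categorizationStatus = "failed";
    return;
  }

  thread.categorizationStatus = "processing";
  thread.categorizationAttempts += 1;
  try {
    thread.category = await categorize(thread.replies);
    thread.categorizationStatus = "done";
    thread.lastError = null;
  } catch (err) {
    // Record the failure so someone can (eventually) find out why.
    thread.lastError = err instanceof Error ? err.message : String(err);
    thread.categorizationStatus =
      thread.categorizationAttempts >= MAX_ATTEMPTS ? "failed" : "pending";
  }
}
```

Half of the `SupportThread` fields now describe the job rather than the thread, and every new pipeline stage would add more of them.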

Observability becomes a new project

This hand-rolled complexity creates a black box. Only a developer can diagnose why a specific support thread failed to categorize.

Your operations team cannot see which records are stuck. Your Director of Customer Support has no way to check system health. You inevitably need to build an observability dashboard so non-technical users can understand data flow.

That dashboard is an entire project. Each incremental fix (status tracking, retries, idempotency) is reasonable alone. Together, they pull you deep into the infrastructure business.

Step-level durability changes the game

A platform like Inngest approaches this with step-level durability. The same categorization function is rebuilt using discrete, checkpointed steps.

Each step.run is a durable checkpoint. If the LLM call fails, the system retries only that step. It does not re-fetch the replies or re-run the entire function.
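The idea can be illustrated with a toy model. The `StepRunner` class below is not Inngest's API or implementation; it is an in-memory stand-in, written for this example, that saves each step's result by id so a re-run skips work that already completed. The `Deps` interface is likewise hypothetical.

```typescript
// Toy checkpoint store. A real platform would persist results outside the
// process; this in-memory Map only illustrates the behavior.
class StepRunner {
  private completed = new Map<string, unknown>();

  // Run `fn` once per step id; later calls return the saved result instead.
  async run<T>(id: string, fn: () => Promise<T>): Promise<T> {
    if (this.completed.has(id)) {
      return this.completed.get(id) as T;
    }
    const result = await fn(); // if this throws, nothing is checkpointed
    this.completed.set(id, result);
    return result;
  }
}

// Hypothetical dependencies, as in a plain version of the job.
interface Deps {
  fetchReplies(threadId: string): Promise<string[]>;
  categorize(replies: string[]): Promise<string>; // stand-in for the LLM call
  saveCategory(threadId: string, category: string): Promise<void>;
}

async function categorizeThread(
  step: StepRunner,
  threadId: string,
  deps: Deps,
): Promise<string> {
  const replies = await step.run("fetch-replies", () => deps.fetchReplies(threadId));
  const category = await step.run("categorize", () => deps.categorize(replies));
  await step.run("save-category", () => deps.saveCategory(threadId, category));
  return category;
}
```

Calling `categorizeThread` a second time with the same `StepRunner` after a failed LLM call reuses the saved replies and re-executes only the steps that have no checkpoint yet.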

This eliminates whole classes of bugs. The state of the pipeline is tracked by the platform, not your domain model. There is no hand-rolled idempotency or retry logic.

The support thread model represents a support thread, not a job status.
Retries after a failure cannot duplicate data because the results of completed steps are saved and not re-run.
Every function run is visible in a dashboard, showing which step failed and why.

The power of event-driven fan-out

The architecture enables clean, event-driven growth. The function is triggered by an event like support/thread.created.

The function doesn't know what creates threads, and the creator doesn't know the function exists. They are decoupled. When you need new features, you add new functions that listen to the same event.

You can have separate functions for categorization, generating a suggested response, updating analytics, and checking for billing escalation. This is fan-out: one event, multiple independent reactions.

Adding new behavior means deploying a new function, not modifying complex, existing code. Each function has its own retry logic and failure isolation. If analytics breaks, categorization keeps working.
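A minimal sketch of fan-out with failure isolation follows. The `EventBus` class is a toy written for this example, not a real library; a platform would run each function independently with its own retries, whereas this version delivers sequentially and only reports which subscribers failed.

```typescript
type ThreadEvent = { threadId: string };
type Handler = (payload: ThreadEvent) => Promise<void>;

// Toy in-process event bus (hypothetical, for illustration only).
class EventBus {
  private handlers = new Map<string, { name: string; handler: Handler }[]>();

  // Subscribe a named function to an event. The emitter never sees this list.
  on(event: string, name: string, handler: Handler): void {
    const list = this.handlers.get(event) ?? [];
    list.push({ name, handler });
    this.handlers.set(event, list);
  }

  // Deliver to every subscriber; one failure does not stop the others.
  // Returns the names of the subscribers that threw.
  async emit(event: string, payload: ThreadEvent): Promise<string[]> {
    const failed: string[] = [];
    for (const { name, handler } of this.handlers.get(event) ?? []) {
      try {
        await handler(payload);
      } catch {
        failed.push(name);
      }
    }
    return failed;
  }
}
```

Adding a billing-escalation check then amounts to one more `bus.on("support/thread.created", ...)` call, with no change to the code that emits the event or to the other subscribers.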

Building without the infrastructure tax

This is the architecture every bespoke ETL system eventually needs. The problem isn't knowing that an event-driven design is right; it's that building the infrastructure is a massive undertaking.

Deadlines and priorities often lead to a compromised, half-built system where spaghetti code wins. A platform like Inngest provides the clean version from day one.

The biggest win for product owners is the built-in observability. Teams often spend multiple quarters building this themselves. The complexity you dealt with only because you had to simply disappears.

You can focus on your product's domain instead of orchestration plumbing. An entire category of hard problems becomes boring, which is the best possible outcome.
