Reliable SFTP Integration with Apache Camel: Idempotency, Retries & Dead Letters

Parsing a file correctly is the easy half. The hard half — the part that pages you at 2am — is everything around it: the same file getting picked up twice, a downstream system being briefly down, a malformed record that should be quarantined rather than crashing the whole batch. Here’s how I make Apache Camel file routes survive the real world.

1. Don’t process the same file twice

SFTP polling is at-least-once by nature. A connection drops after you’ve read a file but before you’ve moved it, and the next poll sees it again. Camel’s idempotent consumer prevents reprocessing by remembering what it has already seen:

from("sftp://user@bank-host:22/outbound/ach"
        + "?password=RAW({{bank.sftp.password}})"
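        + "&readLock=changed"            // idempotent* read locks are documented as file-component only
        + "&idempotent=true"             // skip files the repository has already seen
        + "&idempotentRepository=#fileIdempotentRepo"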
        + "&move=.done&moveFailed=.error")
    .routeId("ach-inbound")
    .idempotentConsumer(header("CamelFileName"))
      .idempotentRepository("fileIdempotentRepo")
    .log("Processing new file ${header.CamelFileName}")
    .to("direct:processAch");

For a single node an in-memory repo is fine, but it forgets on restart. In production — especially across multiple instances — back it with something shared:

@Bean
IdempotentRepository fileIdempotentRepo(DataSource ds) {
  // Survives restarts and is safe across multiple app instances.
  // The second argument is the processor name that scopes the stored keys.
  return new JdbcMessageIdRepository(ds, "ach-inbound");
}

Use the file name plus a content hash as the key if a filename can legitimately recur with new contents.
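A minimal sketch of such a key, assuming the file is small enough to hash in memory. `FileKeys` and `idempotencyKey` are hypothetical names of mine, not Camel APIs; only the JDK's `MessageDigest` is used.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper (not a Camel API): derives a dedupe key from name + content.
final class FileKeys {
    private FileKeys() {}

    /** Returns "<fileName>:<lowercase hex SHA-256 of content>". */
    static String idempotencyKey(String fileName, byte[] content) {
        MessageDigest sha256;
        try {
            sha256 = MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to ship SHA-256.
            throw new IllegalStateException(e);
        }
        StringBuilder key = new StringBuilder(fileName).append(':');
        for (byte b : sha256.digest(content)) {
            key.append(String.format("%02x", b));
        }
        return key.toString();
    }
}
```

In a route, one option is to compute this in a processor, store it in a header of your own choosing, and point the idempotent consumer at that header instead of CamelFileName. The trade-off is that the file has to be read once before you know whether it is a duplicate.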

2. Retry transient failures, don’t retry bugs

When the downstream core-banking API blips, you want to back off and retry — not fail the batch. A redelivery policy with exponential backoff handles that:

errorHandler(defaultErrorHandler()
    .maximumRedeliveries(5)
    .redeliveryDelay(2000)
    .useExponentialBackOff()         // make sure the multiplier is actually applied
    .backOffMultiplier(2)            // 2s, 4s, 8s, 16s, 32s
    .retryAttemptedLogLevel(LoggingLevel.WARN));

The key discipline: only transient errors should be retryable. A SocketTimeoutException deserves a retry; a RecordValidationException does not — retrying it just burns five attempts on data that will never parse.
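To make the rule concrete, here is a plain-Java sketch of the same policy outside Camel: bounded redeliveries, exponential backoff, and an immediate failure for anything that isn't transient. `BoundedRetry`, `Sleeper` and `isTransient` are illustrative names, and treating only SocketTimeoutException as transient is an assumption you would widen for your own stack. In a real route Camel's error handler does this work for you.

```java
import java.net.SocketTimeoutException;
import java.util.concurrent.Callable;

// Hypothetical helper illustrating the policy; not part of Camel.
final class BoundedRetry {
    private BoundedRetry() {}

    /** Abstraction over Thread.sleep so the delay schedule can be observed. */
    interface Sleeper {
        void sleep(long millis) throws InterruptedException;
    }

    /** Assumption: only socket timeouts count as transient. */
    static boolean isTransient(Exception e) {
        return e instanceof SocketTimeoutException;
    }

    static <T> T call(Callable<T> task, int maxRedeliveries, long initialDelayMs,
                      double multiplier, Sleeper sleeper) throws Exception {
        long delay = initialDelayMs;
        for (int redelivery = 0; ; redelivery++) {
            try {
                return task.call();
            } catch (Exception e) {
                // Bugs and bad data fail fast; transient errors retry until the budget is spent.
                if (!isTransient(e) || redelivery >= maxRedeliveries) {
                    throw e;
                }
                sleeper.sleep(delay);
                delay = (long) (delay * multiplier);
            }
        }
    }
}
```

Called as `BoundedRetry.call(task, 5, 2000, 2.0, Thread::sleep)`, this waits 2s, 4s, 8s, 16s and 32s between attempts before giving up, the same schedule as the error handler above.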

3. Quarantine poison messages with a dead-letter channel

Records that can’t be processed shouldn’t block the rest of the file or disappear. Route them to a dead-letter destination for inspection and reprocessing:

onException(RecordValidationException.class)
    .handled(true)                   // don't kill the route
    .maximumRedeliveries(0)          // no point retrying bad data
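    .to("sftp://user@bank-host:22/quarantine"
        + "?password=RAW({{bank.sftp.password}})"
        // one file per bad record, so later records don't overwrite earlier ones
        + "&fileName=${header.CamelFileName}.${exchangeProperty.CamelSplitIndex}.bad")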
    .log(LoggingLevel.ERROR,
         "Quarantined bad record from ${header.CamelFileName}: ${exception.message}");

Pair this with per-record splitting and stopOnException(false) so one bad row doesn’t abort the other 999,999 good ones:

.split(body()).streaming().stopOnException(false)
  .to("bean:achPostingService")
.end();
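If the semantics are easier to see without the DSL, this is roughly what the splitter plus the onException clause add up to: each record is handled on its own, and a validation failure sets that record aside instead of ending the loop. It is a plain-Java sketch; `BatchRunner` is a hypothetical name and the `RecordValidationException` here is a stand-in for your own exception type.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical stand-in for the application's own validation exception.
final class RecordValidationException extends RuntimeException {
    RecordValidationException(String message) {
        super(message);
    }
}

// Hypothetical illustration of per-record isolation; not a Camel class.
final class BatchRunner {
    final List<String> quarantined = new ArrayList<>();
    int posted = 0;

    /** Posts each record independently; a bad record is set aside, not rethrown. */
    void run(Iterable<String> records, Consumer<String> postingService) {
        for (String record : records) {
            try {
                postingService.accept(record);
                posted++;
            } catch (RecordValidationException e) {
                // Plays the role of the dead-letter route: keep the record, keep going.
                quarantined.add(record);
            }
        }
    }
}
```

Any other exception still propagates and stops the batch, which is what you want for genuine bugs.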

4. Make failures visible

Silent integrations are dangerous integrations. A few cheap habits pay off:

  • Stable routeIds so logs and metrics are searchable per route.
  • Camel’s Micrometer support for throughput, in-flight, and failure counts, surfaced in your dashboards.
  • Alert on the quarantine folder — anything landing there means a human should look.
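The quarantine alert can be as small as a scheduled check that lists the folder and raises an alarm when the list is non-empty. A sketch, assuming the quarantine directory is reachable as a local path (mounted or synced from the SFTP server); `QuarantineCheck` is a hypothetical name and the alerting hook itself is left to you.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper; assumes the quarantine folder is visible as a local directory.
final class QuarantineCheck {
    private QuarantineCheck() {}

    /** Returns the names of regular files currently in the folder, sorted. */
    static List<String> quarantinedFiles(Path dir) throws IOException {
        try (Stream<Path> entries = Files.list(dir)) {
            return entries.filter(Files::isRegularFile)
                          .map(p -> p.getFileName().toString())
                          .sorted()
                          .collect(Collectors.toList());
        }
    }
}
```

Run it on a timer and page or post to chat whenever the returned list is not empty.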

The shape of a resilient route

Put together, every robust file route I build has the same skeleton:

pick up safely (read locks) → dedupe (idempotent consumer) → parse → split per record (don’t fail the batch) → process with bounded retries → quarantine poison data → archive & emit metrics.

Get those seven moves right and a file integration stops being a source of 2am pages and becomes the boring, reliable plumbing it’s supposed to be.

If you want the parsing side — modelling fixed-length and ACH/NACHA records with Bindy — see Parsing Fixed-Length and Dynamic Banking Files with Apache Camel.