S3 Replication: add functionality

S3 Replication is, on its surface, a very, very cool feature. However, we have found that we cannot use it because it is lacking in one important area: deletion on the target side.

The name “Replication” implies that the system will be making a perfect copy on the target side with some regularity. Unfortunately, since the S3 Replication feature will not delete items from the target that have been deleted from the source, this is not the case. It seems more like a single-shot solution: copy the data once, and assume nothing is ever deleted.

Over the long term, this becomes an exceedingly expensive proposition, even with on-prem S3 storage, because the data on the target will keep growing. I shudder to think how expensive it would be if one were using Amazon S3.

Additionally, if one needed to restore from that replica, one could not get back an exact copy; instead, it would be full of cruft that had previously been deleted or modified on the source.

This limitation makes the feature useless for our purposes.

Pinging @eyal.traitel and @tomer.hagay for this!

Thanks for the feedback @carlilek!
You’re absolutely right that the current Snap-to-S3 feature is suitable for long-term retention rather than strict replication. That said, we do see customers working around this by rotating destination buckets. For example, they snap to Bucket A for a defined period, then switch to Bucket B and retire A once its contents are no longer needed. This process can be automated.
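As a rough illustration of the bucket-rotation idea, here is a minimal Python sketch of the scheduling logic only. It assumes a fixed rotation period and a fixed list of bucket names; the names `snap-a` and `snap-b`, the 90-day period, and the helper functions are all hypothetical, and the sketch does not call any VAST or S3 API. Pointing the replication target at the chosen bucket and emptying the retired one would still have to be done separately.

```python
from datetime import date, timedelta
from typing import Sequence


def active_bucket(today: date, buckets: Sequence[str],
                  period_days: int, epoch: date) -> str:
    """Return the bucket that should receive snapshots on `today`.

    Buckets are used round-robin, each for `period_days`, starting at `epoch`.
    All parameters are illustrative assumptions, not product settings.
    """
    if not buckets or period_days <= 0:
        raise ValueError("need at least one bucket and a positive period")
    elapsed = (today - epoch).days
    if elapsed < 0:
        raise ValueError("today is before the rotation epoch")
    return buckets[(elapsed // period_days) % len(buckets)]


def next_switch_date(today: date, period_days: int, epoch: date) -> date:
    """Return the first day on which the next bucket becomes active."""
    elapsed = (today - epoch).days
    return epoch + timedelta(days=(elapsed // period_days + 1) * period_days)


# Hypothetical configuration: two buckets, rotated every 90 days.
BUCKETS = ["snap-a", "snap-b"]
EPOCH = date(2025, 1, 1)
PERIOD_DAYS = 90
```

With two buckets, the bucket that is about to become active again has to be emptied before the switch date, so the target needs room for up to two rotation periods of data at once.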

We’re also actively working on a next-generation solution, and I’d be happy to share more details or get your input on that direction via Zoom if you’re open to it.

Regarding recovery: just to clarify, Snap-to-S3 supports browsing and recovering any historical version within the retained snapshots. If you’ve encountered a specific limitation, I’d love to understand it better to ensure we’re aligned or explore whether it’s already been addressed.

Hi Tomer,

I understand the browsing and recovery of historical versions in the snapshots, but my concern is excess data growth on the target. We generally have pretty high churn on the storage that we would be considering for replication, and we have a finite amount of on-prem-in-a-different-datacenter S3 storage to replicate it to. Since the replication doesn’t take deletions into account (or rather, doesn’t do deletions on the target side), we would fairly quickly exceed the space we have because of all that churn.
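To make the capacity concern concrete, here is a back-of-the-envelope sketch in Python. It assumes the simplest possible model: nothing is ever removed from the target, so its size is the initial copy plus everything written afterwards, while the source stays roughly flat because deletions offset new writes. The function names and all figures are hypothetical, not measurements from any real system.

```python
import math


def target_size_tb(initial_tb: float, daily_written_tb: float, days: int) -> float:
    """Target size after `days` when no deletions are applied on the target."""
    return initial_tb + daily_written_tb * days


def days_until_full(capacity_tb: float, initial_tb: float,
                    daily_written_tb: float) -> int:
    """Whole days until the target reaches `capacity_tb` under the same model."""
    if daily_written_tb <= 0:
        raise ValueError("daily_written_tb must be positive")
    headroom_tb = capacity_tb - initial_tb
    return max(0, math.floor(headroom_tb / daily_written_tb))


# Hypothetical example: 1000 TB target, 500 TB initial copy, 5 TB of churn per day.
print(days_until_full(1000, 500, 5))
```

Under these assumptions the source could stay at 500 TB indefinitely, while the target fills at the churn rate no matter how much is deleted on the source.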

There are other issues that will probably result in us not using this functionality (the inability to restore without a Vast cluster of some kind is a big one, and yes, I’m aware we could spin up a virtual cluster in the cloud or something). However, I would like to see this as a possibility. Being able to replicate Vast databases would be awesome, and there I’m a lot less worried about the data growth, since they’re, well, much smaller than the main data storage.

Hi Ken, thanks for the added context and explanation.
As I mentioned earlier, we’re working on a next-generation solution that I believe will align well with your goals. While I can’t share roadmap details in a public forum, I’d be happy to continue the conversation in more detail.
I’ll follow up with you directly on our support Slack to take it from here.