Needs advice
Amazon S3Amazon S3

Hi, I'm building a machine learning pipelines to store image bytes and image vectors in the backend.

So, when users query for the random access image data (key), we return the image bytes and perform machine learning model operations on it.

I'm currently considering going with Amazon S3 (in the future, maybe add Redis caching layer) as the backend system to store the information (s3 buckets with sharded prefixes).

As the latency of S3 is 100-200ms (get/put) and it has a high throughput of 3500 puts/sec and 5500 gets/sec for a given bucker/prefix. In the future I need to reduce the latency, I can add Redis cache.

Also, s3 costs are way fewer than HBase (on Amazon EC2 instances with 3x replication factor)

I have not personally used HBase before, so can someone help me if I'm making the right choice here? I'm not aware of Hbase latencies and I have learned that the MOB feature on Hbase has to be turned on if we have store image bytes on of the column families as the avg image bytes are 240Kb.

4 upvotes·48.3K views
Avatar of Ram Josh