We are developing a system that has to collect 10 million records every day. We need a NoSQL database solution; the data is simple logs. We are using AWS for now. I want to know which of the two available technologies is cheaper: Amazon S3 or MongoDB.
We have 30 tables that collect these logs.
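For a rough sense of scale, the daily volume can be turned into a write rate and a storage figure. This is only a back-of-the-envelope sketch: the 10 million records per day comes from the question, while the average record size and the peak factor are assumptions you would replace with measured values.

```python
SECONDS_PER_DAY = 24 * 60 * 60

RECORDS_PER_DAY = 10_000_000   # from the question
AVG_RECORD_BYTES = 500         # ASSUMPTION: measure your real log records
PEAK_FACTOR = 5                # ASSUMPTION: peak traffic vs. daily average


def ingest_estimate(records_per_day, avg_record_bytes, peak_factor):
    """Return average/peak write rates and raw data volume (decimal GB)."""
    avg_rate = records_per_day / SECONDS_PER_DAY
    gb_per_day = records_per_day * avg_record_bytes / 1e9
    return {
        "avg_writes_per_sec": avg_rate,
        "peak_writes_per_sec": avg_rate * peak_factor,
        "gb_per_day": gb_per_day,
        "gb_per_30_days": gb_per_day * 30,
    }


estimate = ingest_estimate(RECORDS_PER_DAY, AVG_RECORD_BYTES, PEAK_FACTOR)
for name, value in estimate.items():
    print(f"{name}: {value:.1f}")
```

These numbers exclude indexes, replication, and compression, all of which differ between storage engines, so treat them as inputs to a pricing calculator rather than as a cost estimate.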
I am a big fan of MongoDB, and it's great for document storage, but I am not really sure that it's the best engine for log storage. If the data you store is "flat" and well-defined, then log storage based on engines like ClickHouse or the Elasticsearch stack could be much more efficient. It also matters a lot how you use the collected logs. Do you calculate aggregated metrics? Do you need full-text search? And so on.
If the logs are really simple and full-text search is needed, then Logstash + Elasticsearch. If you need to calculate a lot of metrics and the logs are not just text but include numbers/values needed for aggregation, then ClickHouse.
The way I'd approach this is to carry out a survey. Prioritise a list of important criteria, such as performance, functionality, and cost. For example, with MongoDB you can archive documents that are not immediately required, which saves on costs at the expense of instant access; if that fits your use case, you can use that feature. So create a test project that actually uses both services as your use case requires, and see the results of the tests for yourself. Along the way you'll encounter issues peculiar to each platform that you can factor into your final decision, such as how easy each API is to use, or whether the documentation is sparse or confusing. From there you'll be able to make an informed decision, and you'll be confident investing further resources in it.
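One simple way to combine the prioritised criteria with your test results is a weighted score. The sketch below assumes you rate each candidate from 1 to 5 per criterion after running your test project; the criteria names come from the list above, but every weight, score, and candidate name here is a made-up placeholder, not a measurement.

```python
def weighted_score(weights, scores):
    """Weighted average of per-criterion scores (higher is better)."""
    total_weight = sum(weights.values())
    return sum(weights[c] * scores[c] for c in weights) / total_weight


# Hypothetical priorities: a larger weight means the criterion matters more.
weights = {"performance": 3, "functionality": 2, "cost": 5}

# Hypothetical 1-5 ratings you would fill in from your own test project.
candidates = {
    "option_a": {"performance": 4, "functionality": 3, "cost": 2},
    "option_b": {"performance": 3, "functionality": 4, "cost": 4},
}

# Rank candidates from best to worst weighted score.
ranking = sorted(
    candidates,
    key=lambda name: weighted_score(weights, candidates[name]),
    reverse=True,
)

for name in ranking:
    print(name, round(weighted_score(weights, candidates[name]), 2))
```

The point is not the arithmetic but writing the weights down before you test, so the final choice reflects your priorities rather than whichever platform you tried last.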
If you use Amazon DocumentDB instead of DynamoDB, it is compatible with the MongoDB API. That will keep your code cloud-agnostic, and you will have the option of switching between DocumentDB and MongoDB in the future, based on whichever ends up being cheapest to run.
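In practice, keeping the code portable mostly means keeping the connection string out of the code. Here is a minimal sketch, assuming the PyMongo driver; the `LOG_STORE_URI` variable name, the database and collection names, and the helper functions are all hypothetical. A managed service may also need extra options in the URI (TLS settings, for example), so check its connection documentation.

```python
import os

DEFAULT_URI = "mongodb://localhost:27017"  # local MongoDB, for development


def log_store_uri(env=os.environ):
    """Read the connection string from the environment (hypothetical name)."""
    return env.get("LOG_STORE_URI", DEFAULT_URI)


def open_collection(uri, db_name="logs", collection_name="events"):
    """Open a collection on any MongoDB-API-compatible endpoint."""
    # Imported here so the module loads even without the driver installed.
    from pymongo import MongoClient  # third-party: pip install pymongo

    client = MongoClient(uri)
    return client[db_name][collection_name]


def store_log(collection, record):
    """Insert one log record; the same call works against either backend."""
    return collection.insert_one(record).inserted_id
```

Switching providers then becomes a matter of changing `LOG_STORE_URI` in the deployment configuration, provided you stick to features that both backends support.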