Need advice about which tool to choose?Ask the StackShare community!
Amazon Redshift vs Microsoft SQL Server: What are the differences?
What is Amazon Redshift? Fast, fully managed, petabyte-scale data warehouse service. Redshift makes it simple and cost-effective to efficiently analyze all your data using your existing business intelligence tools. It is optimized for datasets ranging from a few hundred gigabytes to a petabyte or more and costs less than $1,000 per terabyte per year, a tenth the cost of most traditional data warehousing solutions.
What is Microsoft SQL Server? A relational database management system developed by Microsoft. Microsoft® SQL Server is a database management and analysis system for e-commerce, line-of-business, and data warehousing solutions.
Amazon Redshift can be classified as a tool in the "Big Data as a Service" category, while Microsoft SQL Server is grouped under "Databases".
"Data Warehousing" is the primary reason why developers consider Amazon Redshift over the competitors, whereas "Reliable and easy to use" was stated as the key factor in picking Microsoft SQL Server.
Stack Exchange, Microsoft, and MIT are some of the popular companies that use Microsoft SQL Server, whereas Amazon Redshift is used by Lyft, Coursera, and 9GAG. Microsoft SQL Server has a broader approval, being mentioned in 478 company stacks & 443 developers stacks; compared to Amazon Redshift, which is listed in 269 company stacks and 67 developer stacks.
We need to perform ETL from several databases into a data warehouse or data lake. We want to
- keep raw and transformed data available to users to draft their own queries efficiently
- give users the ability to give custom permissions and SSO
- move between open-source on-premises development and cloud-based production environments
We want to use inexpensive Amazon EC2 instances only on medium-sized data set 16GB to 32GB feeding into Tableau Server or PowerBI for reporting and data analysis purposes.
You could also use AWS Lambda and use Cloudwatch event schedule if you know when the function should be triggered. The benefit is that you could use any language and use the respective database client.
But if you orchestrate ETLs then it makes sense to use Apache Airflow. This requires Python knowledge.
Though we have always built something custom, Apache airflow (https://airflow.apache.org/) stood out as a key contender/alternative when it comes to open sources. On the commercial offering, Amazon Redshift combined with Amazon Kinesis (for complex manipulations) is great for BI, though Redshift as such is expensive.
You may want to look into a Data Virtualization product called Conduit. It connects to disparate data sources in AWS, on prem, Azure, GCP, and exposes them as a single unified Spark SQL view to PowerBI (direct query) or Tableau. Allows auto query and caching policies to enhance query speeds and experience. Has a GPU query engine and optimized Spark for fallback. Can be deployed on your AWS VM or on prem, scales up and out. Sounds like the ideal solution to your needs.
I am a Microsoft SQL Server programmer who is a bit out of practice. I have been asked to assist on a new project. The overall purpose is to organize a large number of recordings so that they can be searched. I have an enormous music library but my songs are several hours long. I need to include things like time, date and location of the recording. I don't have a problem with the general database design. I have two primary questions:
- I need to use either MySQL or PostgreSQL on a Linux based OS. Which would be better for this application?
- I have not dealt with a sound based data type before. How do I store that and put it in a table? Thank you.
Honestly both databases will do the job just fine. I personally prefer Postgres.
Much more important is how you store the audio. While you could technically use a blob type column, it's really not ideal to be storing audio files which are "several hours long" in a database row. Instead consider storing the audio files in an object store (hosted options include backblaze b2 or aws s3) and persisting the key (which references that object) in your database column.
Hi Erin, Chances are you would want to store the files in a blob type. Both MySQL and Postgres support this. Can you explain a little more about your need to store the files in the database? I may be more effective to store the files on a file system or something like S3. To answer your qustion based on what you are descibing I would slighly lean towards PostgreSQL since it tends to be a little better on the data warehousing side.
Hey Erin! I would recommend checking out Directus before you start work on building your own app for them. I just stumbled upon it, and so far extremely happy with the functionalities. If your client is just looking for a simple web app for their own data, then Directus may be a great option. It offers "database mirroring", so that you can connect it to any database and set up functionality around it!
Hi Erin! First of all, you'd probably want to go with a managed service. Don't spin up your own MySQL installation on your own Linux box. If you are on AWS, thet have different offerings for database services. Standard RDS vs. Aurora. Aurora would be my preferred choice given the benefits it offers, storage optimizations it comes with... etc. Such managed services easily allow you to apply new security patches and upgrades, set up backups, replication... etc. Doing this on your own would either be risky, inefficient, or you might just give up. As far as which database to chose, you'll have the choice between Postgresql, MySQL, Maria DB, SQL Server... etc. I personally would recommend MySQL (latest version available), as the official tooling for it (MySQL Workbench) is great, stable, and moreover free. Other database services exist, I'd recommend you also explore Dynamo DB.
Regardless, you'd certainly only keep high-level records, meta data in Database, and the actual files, most-likely in S3, so that you can keep all options open in terms of what you'll do with them.
- Coming from "Big" DB engines, such as Oracle or MSSQL, go for PostgreSQL. You'll get all the features you need with PostgreSQL.
- Your case seems to point to a "NoSQL" or Document Database use case. Since you get covered on this with PostgreSQL which achieves excellent performances on JSON based objects, this is a second reason to choose PostgreSQL. MongoDB might be an excellent option as well if you need "sharding" and excellent map-reduce mechanisms for very massive data sets. You really should investigate the NoSQL option for your use case.
- Starting with AWS Aurora is an excellent advise. since "vendor lock-in" is limited, but I did not check for JSON based object / NoSQL features.
- If you stick to Linux server, the PostgreSQL or MySQL provided with your distribution are straightforward to install (i.e. apt install postgresql). For PostgreSQL, make sure you're comfortable with the pg_hba.conf, especially for IP restrictions & accesses.
I recommend Postgres as well. Superior performance overall and a more robust architecture.
Cloud Data-warehouse is the centerpiece of modern Data platform. The choice of the most suitable solution is therefore fundamental.
Our benchmark was conducted over BigQuery and Snowflake. These solutions seem to match our goals but they have very different approaches.
BigQuery is notably the only 100% serverless cloud data-warehouse, which requires absolutely NO maintenance: no re-clustering, no compression, no index optimization, no storage management, no performance management. Snowflake requires to set up (paid) reclustering processes, to manage the performance allocated to each profile, etc. We can also mention Redshift, which we have eliminated because this technology requires even more ops operation.
BigQuery can therefore be set up with almost zero cost of human resources. Its on-demand pricing is particularly adapted to small workloads. 0 cost when the solution is not used, only pay for the query you're running. But quickly the use of slots (with monthly or per-minute commitment) will drastically reduce the cost of use. We've reduced by 10 the cost of our nightly batches by using flex slots.
Finally, a major advantage of BigQuery is its almost perfect integration with Google Cloud Platform services: Cloud functions, Dataflow, Data Studio, etc.
BigQuery is still evolving very quickly. The next milestone, BigQuery Omni, will allow to run queries over data stored in an external Cloud platform (Amazon S3 for example). It will be a major breakthrough in the history of cloud data-warehouses. Omni will compensate a weakness of BigQuery: transferring data in near real time from S3 to BQ is not easy today. It was even simpler to implement via Snowflake's Snowpipe solution.
We also plan to use the Machine Learning features built into BigQuery to accelerate our deployment of Data-Science-based projects. An opportunity only offered by the BigQuery solution
Pros of Amazon Redshift
- Data Warehousing40
- Backed by Amazon14
- Cheap and reliable1
- Best Cloud DW Performance1
- Fast columnar storage1
Pros of Microsoft SQL Server
- Reliable and easy to use139
- High performance102
- Great with .net95
- Works well with .net65
- Easy to maintain56
- Azure support21
- Full Index Support17
- Always on17
- Enterprise manager is fantastic10
- In-Memory OLTP Engine9
- Easy to setup and configure2
- Security is forefront2
- Faster Than Oracle1
- Decent management tools1
- Great documentation1
- Docker Delivery1
- Columnstore indexes1
Sign up to add or upvote prosMake informed product decisions
Cons of Amazon Redshift
Cons of Microsoft SQL Server
- Expensive Licensing4
Sign up to add or upvote consMake informed product decisions
What is Amazon Redshift?
What is Microsoft SQL Server?
Need advice about which tool to choose?Ask the StackShare community!
What companies use Microsoft SQL Server?
Sign up to get full access to all the companiesMake informed product decisions
What tools integrate with Microsoft SQL Server?
Sign up to get full access to all the tool integrationsMake informed product decisions