Apache Kudu: 72 stacks, 259 followers, 10 votes

Microsoft SQL Server: 20K stacks, 15.4K followers, 540 votes

Kudu vs Microsoft SQL Server: What are the differences?

Introduction: Here are the key differences between Kudu and Microsoft SQL Server.

Storage Model: Kudu stores data in a columnar format optimized for fast analytical scans, while Microsoft SQL Server stores data in a row-based format by default, which suits transactional workloads (columnstore indexes are available for analytics).
Query Language: Kudu has no SQL engine of its own; it is typically queried with SQL through Apache Impala, or through Apache Spark and its client APIs, whereas Microsoft SQL Server uses T-SQL, Microsoft's procedural extension of SQL.
Scalability: Kudu is designed for horizontal scalability, so adding nodes increases both capacity and throughput, whereas Microsoft SQL Server scales primarily by adding resources to a single server, so write capacity is largely bounded by that server.
Real-Time Processing: Kudu is well suited to real-time analytics on continuously arriving data, since it supports fast inserts and updates alongside scans and uses Write-Ahead Logging (WAL) for durability, while Microsoft SQL Server is primarily oriented toward transactional (OLTP) workloads and conventional data warehousing.
Data Consistency: Kudu uses MVCC (Multi-Version Concurrency Control) so that scans read a consistent snapshot of the data, whereas Microsoft SQL Server offers full ACID transactions with several selectable isolation levels.
Integration with Ecosystem: Kudu integrates with the Apache Hadoop ecosystem, including tools such as Apache Spark, Apache Hive, and Apache Flume, while Microsoft SQL Server has its own ecosystem of tools and services, making it easier to work with other Microsoft products.
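
To make the storage-model difference concrete, here is a minimal sketch in Python. It is a toy illustration only: neither Kudu nor SQL Server stores data as Python lists, and the sample records and the helper names (to_columns, get_record, total_amount) are made up for this example.

```python
# Toy illustration only: real engines use on-disk pages and encodings, not Python lists.
rows = [
    {"id": 1, "city": "Oslo", "amount": 10.0},
    {"id": 2, "city": "Lima", "amount": 25.5},
    {"id": 3, "city": "Oslo", "amount": 4.5},
]


def to_columns(records):
    """Pivot a list of row dicts into a dict of column lists."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


def get_record(records, record_id):
    """Row layout: all fields of one record sit together."""
    return next(r for r in records if r["id"] == record_id)


def total_amount(cols):
    """Column layout: an aggregate reads only the one column it needs."""
    return sum(cols["amount"])


columns = to_columns(rows)
```

Fetching a whole record is natural in the row layout, while summing one field over every record touches far less data in the column layout, which is roughly why columnar stores favor analytical scans.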

In summary, Kudu and Microsoft SQL Server differ in storage model, query language, scalability, real-time processing capabilities, data consistency, and ecosystem integration.

Advice on Apache Kudu and Microsoft SQL Server

Erin G

IT Specialist · Mar 10, 2020 | 7 upvotes · 649.4K views

Needs advice on Microsoft SQL Server, MySQL, and PostgreSQL

I am a Microsoft SQL Server programmer who is a bit out of practice. I have been asked to assist on a new project. The overall purpose is to organize a large number of recordings so that they can be searched. I have an enormous music library but my songs are several hours long. I need to include things like time, date and location of the recording. I don't have a problem with the general database design. I have two primary questions:

1. I need to use either MySQL or PostgreSQL on a Linux-based OS. Which would be better for this application?
2. I have not dealt with a sound-based data type before. How do I store that and put it in a table? Thank you.

Replies (6)

nrktkt

Mar 13, 2020 | 7 upvotes · 496.4K views

Recommends Backblaze B2 Cloud Storage

Hi Erin,

Honestly both databases will do the job just fine. I personally prefer Postgres.

Much more important is how you store the audio. While you could technically use a blob type column, it's really not ideal to store audio files which are "several hours long" in a database row. Instead, consider storing the audio files in an object store (hosted options include Backblaze B2 or AWS S3) and persisting the key (which references that object) in your database column.
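
A minimal sketch of that pattern, under stated assumptions: it uses Python's built-in sqlite3 module as a stand-in for PostgreSQL or MySQL so that it runs without a server, and the table name, column names, sample values, and object key format are all hypothetical. Uploading the file to the object store is left out.

```python
import sqlite3

# Hypothetical schema: the audio lives in an object store; the row keeps only its key.
SCHEMA = """
CREATE TABLE recordings (
    id           INTEGER PRIMARY KEY,
    title        TEXT NOT NULL,
    recorded_at  TEXT NOT NULL,        -- ISO 8601 date and time of the recording
    location     TEXT NOT NULL,
    duration_sec INTEGER NOT NULL,
    object_key   TEXT NOT NULL UNIQUE  -- key of the audio file in the object store
)
"""


def open_catalog(path=":memory:"):
    """Open (or create) the catalog database and create the table."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def add_recording(conn, title, recorded_at, location, duration_sec, object_key):
    """Store metadata only; the upload to the object store happens elsewhere."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT INTO recordings "
            "(title, recorded_at, location, duration_sec, object_key) "
            "VALUES (?, ?, ?, ?, ?)",
            (title, recorded_at, location, duration_sec, object_key),
        )
    return cur.lastrowid


def find_by_location(conn, location):
    """Return (title, object_key) pairs for one location, oldest first."""
    cur = conn.execute(
        "SELECT title, object_key FROM recordings "
        "WHERE location = ? ORDER BY recorded_at",
        (location,),
    )
    return cur.fetchall()


conn = open_catalog()
add_recording(conn, "Evening set", "2020-03-10T19:30:00", "Austin",
              3 * 3600, "recordings/2020/evening-set.flac")
```

Searches run against the small metadata rows, and the application fetches the audio from the object store by object_key only when someone actually plays a recording.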

Aaron Westley

COO at Pattern · Mar 13, 2020 | 5 upvotes · 496.3K views

Recommends PostgreSQL

Hi Erin, chances are you would want to store the files in a blob type. Both MySQL and Postgres support this. Can you explain a little more about your need to store the files in the database? It may be more effective to store the files on a file system or something like S3. To answer your question: based on what you are describing, I would slightly lean towards PostgreSQL, since it tends to be a little better on the data warehousing side.

Julien DeFrance

Principal Software Engineer at Tophatter · Mar 13, 2020 | 3 upvotes · 494.9K views

Recommends Amazon Aurora

Hi Erin! First of all, you'd probably want to go with a managed service. Don't spin up your own MySQL installation on your own Linux box. If you are on AWS, they have different offerings for database services: standard RDS vs. Aurora. Aurora would be my preferred choice given the benefits it offers, the storage optimizations it comes with, etc. Such managed services easily allow you to apply new security patches and upgrades, set up backups, replication, etc. Doing this on your own would either be risky or inefficient, or you might just give up. As far as which database to choose, you'll have the choice between PostgreSQL, MySQL, MariaDB, SQL Server, etc. I personally would recommend MySQL (latest version available), as the official tooling for it (MySQL Workbench) is great, stable, and moreover free. Other database services exist; I'd recommend you also explore DynamoDB.

Regardless, you'd certainly keep only high-level records and metadata in the database, and the actual files most likely in S3, so that you can keep all options open in terms of what you'll do with them.

Christopher Wray

Web Developer at Soltech LLC · Mar 12, 2020 | 3 upvotes · 495.3K views

Recommends Directus

Hey Erin! I would recommend checking out Directus before you start work on building your own app for them. I just stumbled upon it, and so far I am extremely happy with its functionality. If your client is just looking for a simple web app for their own data, then Directus may be a great option. It offers "database mirroring", so that you can connect it to any database and set up functionality around it!

pivert

Mar 14, 2020 | 2 upvotes · 495.1K views

Recommends PostgreSQL

Hi Erin,

Coming from "big" DB engines such as Oracle or MSSQL, go for PostgreSQL. You'll get all the features you need with PostgreSQL.
Your case seems to point to a "NoSQL" or document database use case. PostgreSQL covers this as well, since it achieves excellent performance on JSON-based objects, which is a second reason to choose it. MongoDB might be an excellent option as well if you need "sharding" and excellent map-reduce mechanisms for very massive data sets. You really should investigate the NoSQL option for your use case.
Starting with AWS Aurora is excellent advice, since "vendor lock-in" is limited, but I did not check it for JSON-based object / NoSQL features.
If you stick to a Linux server, the PostgreSQL or MySQL packages provided with your distribution are straightforward to install (e.g. apt install postgresql). For PostgreSQL, make sure you're comfortable with pg_hba.conf, especially for IP restrictions and access.

Regards,
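
On the pg_hba.conf point in the reply above: PostgreSQL checks the file's records in order and applies the first one that matches the connection. The sketch below models only that first-match behavior for client addresses, using Python's standard ipaddress module. The rule list and the auth_method_for helper are hypothetical, and a real pg_hba.conf record also has connection type, database, and user fields that this simplification ignores.

```python
import ipaddress

# Hypothetical, simplified rules: (client network in CIDR form, auth method).
# A real pg_hba.conf record also names a connection type, database, and user.
RULES = [
    ("127.0.0.1/32", "trust"),
    ("192.168.1.0/24", "scram-sha-256"),
    ("0.0.0.0/0", "reject"),
]


def auth_method_for(client_ip, rules=RULES):
    """Return the method of the first rule whose network contains client_ip."""
    address = ipaddress.ip_address(client_ip)
    for cidr, method in rules:
        if address in ipaddress.ip_network(cidr):
            return method
    return None  # no rule matched; PostgreSQL would refuse the connection
```

Because the first match wins, a broad rule placed above a narrow one silently overrides it, so ordering matters as much as the networks themselves.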

Klaus Nji

Staff Software Engineer at SailPoint Technologies · Mar 15, 2020 | 1 upvote · 495K views

Recommends PostgreSQL

I recommend Postgres as well. Superior performance overall and a more robust architecture.

Pros of Apache Kudu

Realtime Analytics (10 votes)

Pros of Microsoft SQL Server

Reliable and easy to use (139 votes)
High performance (101 votes)
Great with .NET (95 votes)
Works well with .NET (65 votes)
Easy to maintain (56 votes)
Azure support (21 votes)
Always on (17 votes)
Full Index Support (17 votes)
Enterprise manager is fantastic (10 votes)
In-Memory OLTP Engine (9 votes)
Easy to set up and configure (2 votes)
Security is forefront (2 votes)
Great documentation (1 vote)
Faster Than Oracle (1 vote)
Columnstore indexes (1 vote)
Decent management tools (1 vote)
Docker Delivery (1 vote)
Maximum number of connections is 14000 (1 vote)

Cons of Apache Kudu

Restart time (1 vote)

Cons of Microsoft SQL Server

Expensive Licensing (4 votes)
Microsoft (2 votes)
Data pages are only 8k (1 vote)
Always On can lose data in asynchronous mode (1 vote)
Replication can lose data (1 vote)
The maximum number of connections is only 14000 (1 vote)

What is Apache Kudu?

A new addition to the open source Apache Hadoop ecosystem, Kudu completes Hadoop's storage layer to enable fast analytics on fast data.

What is Microsoft SQL Server?

Microsoft® SQL Server is a database management and analysis system for e-commerce, line-of-business, and data warehousing solutions.

What are some alternatives to Apache Kudu and Microsoft SQL Server?

Cassandra

Partitioning means that Cassandra can distribute your data across multiple machines in an application-transparent manner. Cassandra will automatically repartition as machines are added and removed from the cluster. Row store means that, like relational databases, Cassandra organizes data by rows and columns. The Cassandra Query Language (CQL) is a close relative of SQL.

HBase

Apache HBase is an open-source, distributed, versioned, column-oriented store modeled after Google's Bigtable, as described in "Bigtable: A Distributed Storage System for Structured Data" by Chang et al. Just as Bigtable leverages the distributed data storage provided by the Google File System, HBase provides Bigtable-like capabilities on top of Apache Hadoop.

Apache Spark

Spark is a fast and general processing engine compatible with Hadoop data. It can run in Hadoop clusters through YARN or Spark's standalone mode, and it can process data in HDFS, HBase, Cassandra, Hive, and any Hadoop InputFormat. It is designed to perform both batch processing (similar to MapReduce) and new workloads like streaming, interactive queries, and machine learning.

Apache Impala

Impala is a modern, open source, MPP SQL query engine for Apache Hadoop. Impala is shipped by Cloudera, MapR, and Amazon. With Impala, you can query data, whether stored in HDFS or Apache HBase – including SELECT, JOIN, and aggregate functions – in real time.

Hadoop

The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage.
