Alternatives to Qubole

Databricks, Snowflake, Amazon Redshift, Google BigQuery, and Amazon EMR are the most popular alternatives and competitors to Qubole.

What is Qubole and what are its top alternatives?

Qubole is a cloud-based service that makes big data easy for analysts and data engineers. It is a tool in the Big Data as a Service category of a tech stack.

Qubole alternatives & related posts

Databricks

A unified analytics platform, powered by Apache Spark

Snowflake

The data warehouse built for the cloud

related Snowflake posts

I use Google BigQuery because it makes it super easy to query and store data for analytics workloads. If you're using GCP, you're likely using BigQuery. However, running data viz tools directly connected to BigQuery will run pretty slow. They recently announced BI Engine which will hopefully compete well against big players like Snowflake when it comes to concurrency.

What's nice too is that it has SQL-based ML tools, and it has great GIS support!
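
As a rough sketch of how simple it is to query BigQuery from Node.js, including one of its GIS functions; the project, dataset, table, and column names below are made up for illustration:

```typescript
// Illustrative only: an ad-hoc BigQuery query from Node.js using a GIS function.
// The project, dataset, table, and columns ("my-project.events.page_views") are hypothetical.
import { BigQuery } from '@google-cloud/bigquery';

const bigquery = new BigQuery(); // picks up GOOGLE_APPLICATION_CREDENTIALS for auth

async function viewsNearSf(): Promise<void> {
  const query = `
    SELECT country, COUNT(*) AS views
    FROM \`my-project.events.page_views\`
    WHERE ST_DWITHIN(ST_GEOGPOINT(lng, lat), ST_GEOGPOINT(-122.42, 37.77), 50000)
    GROUP BY country
    ORDER BY views DESC
    LIMIT 10`;

  // query() runs the statement as a standard-SQL job and returns the result rows.
  const [rows] = await bigquery.query({ query });
  for (const row of rows) {
    console.log(`${row.country}: ${row.views}`);
  }
}

viewsNearSf().catch(console.error);
```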

Amazon Redshift

Fast, fully managed, petabyte-scale data warehouse service

related Amazon Redshift posts

Julien DeFrance
Full Stack Engineering Manager at ValiMail | at SmartZip

Back in 2014, I was given an opportunity to re-architect the SmartZip Analytics platform and its flagship product, SmartTargeting. This is SaaS software that helps real estate professionals keep up with their prospects and leads in a given neighborhood/territory, find out (thanks to predictive analytics) who's most likely to list or sell their home, and run cross-channel marketing automation against them: direct mail, online ads, email... The company also provides Data APIs to Enterprise customers.

I had inherited years and years of technical debt and I knew things had to change radically. The first enabler was to make use of the cloud and go with AWS, so we would stop reinventing the wheel and build around managed, scalable services.

For the SaaS product, we kept working with Rails, as this was what my team had the most knowledge in. However, we broke up the monolith and decoupled the front-end application from the backend using Rails API, so we'd have independently scalable micro-services from then on.

Our various applications could now be deployed using AWS Elastic Beanstalk, so we wouldn't waste any more effort writing time-consuming Capistrano deployment scripts. We combined this with Docker so each application would run within its own container, independently of the underlying host configuration.

Storage-wise, we went with Amazon S3 and ditched any pre-existing local or network storage people used to deal with in our legacy systems. On the database side we started with Amazon RDS / MySQL, and ultimately migrated to Amazon RDS for Aurora / MySQL when it was released. Once again, you want a managed service your cloud provider handles for you.

Future improvements / technology decisions included:

Caching: Amazon ElastiCache / Memcached
CDN: Amazon CloudFront
Systems Integration: Segment / Zapier
Data warehousing: Amazon Redshift
BI: Amazon Quicksight / Superset
Search: Elasticsearch / Amazon Elasticsearch Service / Algolia
Monitoring: New Relic

As our usage grew, patterns changed, and our business needs evolved, my role as Engineering Manager and then Director of Engineering was also to ensure my team kept learning and innovating while delivering on business value.

One of these innovations was to get into Serverless: adopting AWS Lambda was a big step forward. At the time it was only available for Node.js (not Ruby), but it was a great way to handle cost efficiency, unpredictable traffic, and sudden bursts of traffic... Ultimately you want the whole chain of services involved in a call to be serverless, and that's when we started leveraging Amazon DynamoDB on these projects so they'd be fully scalable.
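
As a minimal sketch of that serverless pattern (not SmartZip's actual code), a Node.js/TypeScript Lambda handler persisting to DynamoDB might look like this; the table name and event shape are assumptions:

```typescript
// Minimal sketch: a Lambda handler writing to DynamoDB so the whole call chain
// stays serverless. The "prospects" table and the event shape are hypothetical.
import { DynamoDB } from 'aws-sdk';

const dynamo = new DynamoDB.DocumentClient();

export const handler = async (event: { prospectId: string; score: number }) => {
  // Persist the scored prospect; DynamoDB scales to absorb sudden bursts of traffic.
  await dynamo
    .put({
      TableName: 'prospects', // hypothetical table name
      Item: {
        prospectId: event.prospectId,
        score: event.score,
        updatedAt: new Date().toISOString(),
      },
    })
    .promise();

  return { statusCode: 200 };
};
```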

Ankit Sobti
CTO at Postman Inc

Looker, Stitch, Amazon Redshift, dbt

We recently moved our Data Analytics and Business Intelligence tooling to Looker. It's already helping us create a solid process for reusable SQL-based data modeling, with consistent definitions across the entire organization. Looker allows us to collaboratively build these version-controlled models and push the limits of what we've traditionally been able to accomplish with analytics with a lean team.

For Data Engineering, we're in the process of moving from maintaining our own ETL pipelines on AWS to a managed ELT system on Stitch. We're also evaluating the command-line tool dbt to manage data transformations. Our hope is that Stitch + dbt will streamline the ELT bit, allowing us to focus our energies on analyzing data, rather than managing it.

related Google BigQuery posts

Tim Specht
Co-Founder and CTO at Dubsmash

In order to accurately measure and track user behaviour on our platform, we quickly moved from the initial solution using Google Analytics to a custom-built one, due to the resource and pricing concerns we had.

While this does sound complicated, it's as easy as clients sending JSON blobs of events to Amazon Kinesis, from where we use AWS Lambda & Amazon SQS to batch and process incoming events and then ingest them into Google BigQuery. Once events are stored in BigQuery (which usually only takes a second from the time the client sends the data until it's available), we can use almost-standard-SQL to simply query for data while Google makes sure that, even with terabytes of data being scanned, query times stay in the range of seconds rather than hours. Before ingesting their data into the pipeline, our mobile clients aggregate events internally and, once a certain threshold is reached or the app goes to the background, send the events as a JSON blob into the stream.

In the past we had workers running that continuously read from the stream, validated and post-processed the data, and then enqueued it for other workers to write to BigQuery. We went ahead and implemented the Lambda-based approach in such a way that Lambda functions would automatically be triggered for incoming records, pre-aggregate events, and write them back to SQS, from which we then read them and persist the events to BigQuery. While this approach had a couple of bumps in the road, like re-triggering functions asynchronously to keep up with the stream and proper batch sizes, we finally managed to get it running in a reliable way and are very happy with this solution today.

#ServerlessTaskProcessing #GeneralAnalytics #RealTimeDataProcessing #BigDataAsAService
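
A minimal sketch of the Lambda step in that pipeline follows; the queue URL, event fields, and aggregation logic are assumptions for illustration, not Dubsmash's actual code:

```typescript
// Minimal sketch: a Lambda function triggered by Kinesis, pre-aggregating event
// blobs and forwarding the batch to SQS for a downstream BigQuery writer.
// Queue URL, event fields, and the aggregation are illustrative assumptions.
import { SQS } from 'aws-sdk';
import { KinesisStreamEvent } from 'aws-lambda'; // types from @types/aws-lambda

const sqs = new SQS();
const QUEUE_URL = process.env.EVENTS_QUEUE_URL!; // hypothetical queue

export const handler = async (event: KinesisStreamEvent): Promise<void> => {
  // Kinesis records arrive base64-encoded; decode them back into JSON event blobs.
  const events = event.Records.map(record =>
    JSON.parse(Buffer.from(record.kinesis.data, 'base64').toString('utf8'))
  );

  // Pre-aggregate: count occurrences per event name before handing off.
  const counts: Record<string, number> = {};
  for (const e of events) {
    counts[e.name] = (counts[e.name] ?? 0) + 1;
  }

  // Enqueue the aggregated batch; a downstream worker persists it to BigQuery.
  await sqs
    .sendMessage({ QueueUrl: QUEUE_URL, MessageBody: JSON.stringify(counts) })
    .promise();
};
```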


I use Amazon Athena because, similar to Google BigQuery, you can store and query data easily. Especially since you can define the data schema in the Glue data catalog, there's a central way to define data models.

However, I would not recommend it for batch jobs. I typically use it to check intermediary datasets in data engineering workloads. It's good for getting a look and feel of the data along its ETL journey.
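
As a hedged sketch of that kind of ad-hoc check with the aws-sdk (v2); the database, table, and S3 results location below are hypothetical:

```typescript
// Minimal sketch: running an ad-hoc Athena query against a table defined in the
// Glue data catalog. Database, table, and the S3 results bucket are hypothetical.
import { Athena } from 'aws-sdk';

const athena = new Athena();

async function checkIntermediateDataset(): Promise<void> {
  const { QueryExecutionId } = await athena
    .startQueryExecution({
      QueryString:
        'SELECT event_name, COUNT(*) AS cnt FROM staging_events GROUP BY event_name LIMIT 20',
      QueryExecutionContext: { Database: 'etl_staging' }, // Glue catalog database
      ResultConfiguration: { OutputLocation: 's3://my-athena-results/adhoc/' },
    })
    .promise();

  // Athena is asynchronous: poll until the query finishes, then fetch the rows.
  let state = 'RUNNING';
  while (state === 'RUNNING' || state === 'QUEUED') {
    await new Promise(resolve => setTimeout(resolve, 1000));
    const { QueryExecution } = await athena
      .getQueryExecution({ QueryExecutionId: QueryExecutionId! })
      .promise();
    state = QueryExecution?.Status?.State ?? 'FAILED';
  }

  const { ResultSet } = await athena
    .getQueryResults({ QueryExecutionId: QueryExecutionId! })
    .promise();
  console.log(ResultSet?.Rows?.length, 'rows returned (first row is the header)');
}

checkIntermediateDataset().catch(console.error);
```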

Stitch

All your data. In your data warehouse. In minutes.

Cloudera Enterprise

Enterprise Platform for Big Data

Alooma

Integrate any data source like databases, applications, and any API - with your own Amazon Redshift

Azure HDInsight

A cloud-based service from Microsoft for big data analytics

Xplenty

Code-free data integration, data transformation and ETL in the cloud

Matillion

An ETL Tool for BigData

Dremio

Self-service data for everyone

etleap

user-friendly, sophisticated ETL-as-a-service on AWS