Data Warehouses & ETL Tools Research

Key Findings

  • Xplenty, Informatica PowerCenter, FlyData, and Hevo Data offer drag-and-drop functionality.
  • The ten best data warehouses identified include Amazon Redshift, Snowflake, Teradata Vantage, Yellowbrick Data Warehouse, IBM Db2 Warehouse, SAP Data Warehouse Cloud, Panoply, Azure Synapse Analytics, Oracle Autonomous Data Warehouse, and Google BigQuery.
  • The ten best ETL tools identified include Xplenty, Hevo Data, Fivetran, Oracle Data Integrator, FlyData, Alooma, AWS Glue, Informatica PowerCenter, Pentaho, and Talend.

Introduction

We conducted extensive research to determine the best ETL platforms and data warehouses. The findings are presented in the attached spreadsheet, and an overview of each product follows.

10 ETL Tools

  • The ETL tools enable businesses to consolidate data from multiple sources into a single centralized location.
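
To make the consolidation idea concrete, here is a minimal sketch of an extract, transform, load run in Python. Everything in it is an assumption for illustration: the two sources (a CRM CSV export and rows from an orders API), the table names, and the in-memory SQLite database standing in for a real warehouse. None of it reflects how any product in this report is implemented.

```python
import csv
import io
import sqlite3

# Hypothetical source 1: a CSV export from a CRM system.
CRM_CSV = "email,plan\nAda@Example.com,pro\nbob@example.com,free\n"

# Hypothetical source 2: rows already fetched from an orders API.
ORDER_ROWS = [
    {"email": "ada@example.com", "amount": "120.50"},
    {"email": "BOB@example.com", "amount": "15"},
]


def extract():
    """Read raw records from each source."""
    customers = list(csv.DictReader(io.StringIO(CRM_CSV)))
    return customers, ORDER_ROWS


def transform(customers, orders):
    """Normalize keys and types so both sources share one schema."""
    clean_customers = [(c["email"].lower(), c["plan"]) for c in customers]
    clean_orders = [(o["email"].lower(), float(o["amount"])) for o in orders]
    return clean_customers, clean_orders


def load(conn, customers, orders):
    """Write the cleaned rows into the central store."""
    conn.execute("CREATE TABLE customers (email TEXT PRIMARY KEY, plan TEXT)")
    conn.execute("CREATE TABLE orders (email TEXT, amount REAL)")
    conn.executemany("INSERT INTO customers VALUES (?, ?)", customers)
    conn.executemany("INSERT INTO orders VALUES (?, ?)", orders)
    conn.commit()


def run_pipeline():
    """Run extract, transform, and load against an in-memory database."""
    conn = sqlite3.connect(":memory:")  # stand-in for a real warehouse
    load(conn, *transform(*extract()))
    return conn


def revenue_by_plan(conn):
    """Query that only works once both sources live in one place."""
    query = """
        SELECT c.plan, SUM(o.amount)
        FROM customers AS c JOIN orders AS o ON o.email = c.email
        GROUP BY c.plan ORDER BY c.plan
    """
    return conn.execute(query).fetchall()
```

Calling revenue_by_plan(run_pipeline()) joins the two sources on the normalized email key, which is the kind of cross-source question a centralized store makes possible.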

Xplenty

Overview

  • Xplenty is an ETL tool that has a drag-and-drop interface and provides personalized customer support. Xplenty requires no deployment or coding and is used to process structured and unstructured data, integrating with databases, cloud storage, cloud services, and analytics. Xplenty retrieves and processes data from relational databases.
  • Xplenty Ltd was founded in 2012 and was acquired by Xenon Ventures in December 2018. The company has raised a total of $7 million in funding since its inception.
  • Xplenty is used in the retail, hospitality, and advertising industries to provide solutions for marketing, developers, support, and sales.
  • Xplenty is rated 4.5/5 stars on the G2 website from 105 reviews.

Integration

  • Xplenty database integrations include Amazon Aurora, Azure Synapse, Amazon RDS, Google BigQuery, MariaDB, MS SQL, IBM DB2, MongoDB, MySQL, Oracle, and Snowflake, among others.
  • Xplenty cloud storage integrations include Amazon S3, HDFS, FTPS, Google Cloud Storage, MS Azure Blob, MongoDB Atlas, and SFTP. Xplenty can be integrated with cloud services like Amazon Kinesis, Aftership, BigCommerce, Atlassian, Crunchbase, Domo, GitHub, GitLab, Freshdesk, Mailchimp, Oracle Responsys, and Base CRM, among others.
  • Other integrations include analytics and advertising platforms like Amplitude, GoodData, AdRoll, Google AdWords, and YouTube, and BI tools like Periscope Data, Chartio, and QlikView.
  • Companies using Xplenty include Xenon Partners, American Express, Salesforce, BambooHR, and The Real Deal.

Customer Review

  • A Chief Operating Officer, Bryan T, stated that Xplenty ETL is easy to use for non-experts, has powerful functionality, and has great client support. The support team is quick and helpful, and the product provides two-way data flow and makes it easy to set up "data sources and destinations." The downside is that the product has very little documentation, so clients must seek online support for any challenges they face.

Pricing Plan

Talend

Overview

Integration

  • The Talend ETL integrates with cloud service providers, analytics platforms, and data warehouses like AWS, Snowflake, Microsoft Azure, Databricks, and Google Cloud Platform.

Customer Review

  • Alessandro A, a Solution Developer, stated that it's easy to create and test using Talend because it visualizes migration progress. On the downside, a lead developer gave a one-star rating, citing "bad programmability, branching, and extending ability."

Pricing Plan

  • Talend offers a 14-day trial for paid subscription plans.
  • The Talend Open Source plan is free to all users. The Stitch Data Loader costs $100 to $1,000 a month, and Cloud Data Integration costs $12,000 annually. The Pipeline Designer plan is charged according to hourly usage, while the Data Fabric plan varies from one user to another.

Pentaho

Overview

  • Pentaho ETL platform helps business users to ingest, prepare, and analyze data from multiple sources. Pentaho is a business intelligence platform that comprises ETL, big data analytics, data mining, reporting, dashboards, visualization, and predictive analytics.
  • The platform supports pre- and post-load transformations through its drag-and-drop functionality, and users can customize scripts in Java, JavaScript, Python, and SQL canvas.

Integration

  • Pentaho integrates with analytic and NoSQL databases, Hadoop, and partners such as Melissa Data.

Customer Review

  • According to customer feedback, Pentaho has "the best" BI tool in the market and is well organized for data extraction and transformation. However, its error messages lack detailed, informative comments, making troubleshooting difficult for non-technical users.

Pricing

  • Pentaho offers a free trial of the platform. Pricing is flexible and is quoted on request, depending on the user's needs.

Hevo Data

Overview

  • Hevo Data is an ETL tool that loads data from any source into the selected data warehouse in real time. It loads data from sources like SaaS applications, relational databases, NoSQL databases, files, and S3 buckets into data warehouses like Amazon Redshift, Snowflake, and Google BigQuery.

Integration

  • Hevo Data ETL has pre-built integrations with over 100 data sources. Destinations include Redshift, Snowflake, BigQuery, MariaDB, MySQL, PostgreSQL, TokuDB, MS SQL Server, and MySQL Aurora.
  • Integrated sources include MongoDB Atlas, Google Analytics 360, Instagram Business, Shopify, Xero, Amazon S3, Oracle, JavaScript, MongoDB, iOS, and Android, among others.

Customer Review

  • Customer reviews indicate that users receive "excellent customer support." The platform allows manual mapping and supports Python, so users can transform and manipulate data with custom scripts.
  • One disadvantage is that there's no functionality to schedule a pipeline job to run at a preferred time of day.

Pricing

  • Hevo Data offers a 14-day free trial and has three pricing plans. The Free plan includes 1 million events and costs $0, the Starter plan costs $149 per month, and the Business plan is custom-priced according to the number of events the client needs.

Informatica PowerCenter

Overview

  • Informatica serves users in different industries, including automotive, energy and utilities, financial services, healthcare, and media. The company was founded in 1993 to provide solutions for cloud management, big data, master data management, data security and quality, and data integration.

Integration

Customer Review

  • Gopichand R, a Senior Infosec Consultant, states that Informatica PowerCenter is the best ETL platform as it pulls data from "heterogeneous sources" and loads it into data warehouses for data analysis. The platform is user-friendly and can be used by novice users.
  • On the downside, a user stated that the implementation costs are high, which puts the platform out of reach for small enterprises, and that the system slows down while processing large amounts of data.

Pricing

  • Informatica PowerCenter offers a 30-day free trial. The pricing model is quotation-based and customized for each client.

AWS Glue

Overview

  • AWS Glue is a fully managed ETL service that helps users prepare and load data into destinations like Amazon S3, Amazon Redshift, Amazon RDS, and Amazon EC2. AWS Glue has a drag-and-drop interface.
  • The AWS Glue ETL has both code-based and visual interfaces that make data integration efficient. Users can create, run, and monitor ETL workflows from the AWS Glue Studio.
  • Industries using the AWS Glue tool include financial services, banking, information technology, computer software, retail, automotive, insurance, healthcare, and marketing & advertising.

Integration

  • AWS Glue integrates with Apache Spark, Amazon RDS for PostgreSQL, Oracle, MySQL, Apache Kafka, Amazon VPC, Amazon Aurora, Amazon Athena, Amazon EMR, and Apache Hive Metastore, among others.

Customer Review

Pricing

  • AWS Glue charges an hourly rate, billed by the second, for ETL jobs and crawlers. DataBrew jobs are billed per minute, while interactive sessions are billed per session.
  • Pricing also varies by region.

Alooma

Overview

  • Alooma ETL provides a data pipeline by extracting data from multiple sources and loading it into data warehouses like Redshift, Snowflake, and BigQuery in real time. The platform has drag-and-drop functionality.
  • Alooma is a cloud-based ETL service that integrates data between APIs and databases. Alooma delivers data in real time and does not rely on batch processing.
  • Alooma stopped supporting new clients who use Redshift, Azure, and Snowflake after Google acquired it in 2019. Alooma solutions include AI & Machine learning, data cleansing, cloud migration, data mapping, migration and integration, transformation, replication, Internet of Things, marketing data integration, real-time data ingestion, and warehousing.

Integration

  • Alooma integrates with Google BigQuery and Python developer tools. Alooma allows data transformations through native apps like Salesforce, Zendesk, Google Drive, and Azure.

Customer Review

  • According to reviews on the G2 website, Alooma ETL provides flexibility through its code engine feature and has responsive client support. On the downside, some features on the Alooma platform are only available through the API and not on the user interface/dashboard.

Pricing

  • Alooma's pricing plans range from $1,000 to $15,000 per month; however, clients must contact the company for a quote. The platform offers a 14-day free trial.

FlyData

Overview

  • FlyData ETL tool is a product of FlyData Sync, LLC, a private company founded in April 2011.
  • FlyData is used by businesses to extract and load data to Amazon Redshift and other data warehouses securely and continuously. FlyData is utilized in various industries, including big data, cloud computing, SaaS, Cloud data services, analytics, data integration, and software development.

Integration

  • FlyData integration includes AWS, MySQL, Looker, Percona, MariaDB, Oracle, Amazon Aurora, enterprise data integration, Heroku, Amazon EC2, integration platform as a service, and Amazon RDS.

Customer Review

  • Gareth Tilley, Head of Data Analytics at 99Designs, stated that FlyData helped move data from MySQL to Redshift faster and more conveniently.
  • Anderson C, a Senior Director of Data Engineering, stated that FlyData has "amazing customer support." According to one reviewer, the number of integrations is limited, and one cannot add different users with different permissions.

Pricing Plan

  • FlyData provides a 14-day trial. Pricing ranges from $159 to $3,733 per month, depending on the number of rows replicated every month.

Oracle Data Integrator

Overview

  • The Oracle Data Integrator (ODI) tool extracts, loads, and transforms data to improve performance and reduce data integration costs. ODI handles all data integration requirements, including high-performance batch loads, SOA-enabled data services, event-driven processes, and trickle-feed integration processes.
  • The ODI tool has ELT architecture, supports heterogeneous platforms for enterprise data integration, service-oriented data management, and integration for SOA environments, and provides knowledge modules for developer optimization.

Integration

  • Oracle Data Integrator integrates with Oracle GoldenGate, Oracle Warehouse Builder, and Oracle Enterprise Manager 12c.

Customer Review

  • According to TrustRadius reviews, Oracle Data Integrator handles large volumes of data and has broader use cases and deployment scope than other ETL tools. Most reviews indicate that clients now use ODI as a replacement for PL/SQL. On the downside, ODI requires a certain skill level to use, although Oracle offers courses to help users learn the product.

Pricing

  • Oracle Data Integrator has neither a free trial nor a free version. Its pricing model is calculated per feature, and clients are required to contact the vendor for a quote.

Fivetran

Overview

  • Fivetran is a cloud-based ETL platform that provides automated data integration. Fivetran automates schemas and in-warehouse transformations.
  • Fivetran is focused on data ingestion and ELT and performs full table database replication. The product is HIPAA, GDPR, and SOC 2 compliant. Unlike other ETL tools, Fivetran does not perform data transformation before loading.
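
The load-first ordering described above (ELT) can be sketched as follows. This is only an illustration under stated assumptions: the in-memory SQLite database stands in for a warehouse, and the table names raw_orders and orders are hypothetical. It does not show Fivetran's actual implementation or API.

```python
import sqlite3


def load_raw(conn, rows):
    """Load step: copy source rows as-is, with no cleanup beforehand."""
    conn.execute("CREATE TABLE raw_orders (email TEXT, amount TEXT)")
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?)", rows)


def transform_in_warehouse(conn):
    """Transform step: runs as SQL inside the warehouse, after loading."""
    conn.execute(
        """
        CREATE TABLE orders AS
        SELECT LOWER(TRIM(email)) AS email, CAST(amount AS REAL) AS amount
        FROM raw_orders
        """
    )


def run_elt(rows):
    """Load untouched rows first, then derive a clean table from them."""
    conn = sqlite3.connect(":memory:")  # stand-in for a real warehouse
    load_raw(conn, rows)
    transform_in_warehouse(conn)
    query = "SELECT email, amount FROM orders ORDER BY email"
    return conn.execute(query).fetchall()
```

Because the raw table is kept, the transformation can be rewritten and rerun later without extracting from the source again, which is the usual argument for transforming after the load.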

Integration

  • Fivetran integrates with databases and platforms like Oracle, SQL Server, Salesforce, Google Ads, PostgreSQL, MySQL, NetSuite SuiteAnalytics, Amazon S3, Zoho CRM, and WooCommerce, among others.

Customer Review

  • According to reviewers, Fivetran is a "simple self-serve" platform that offers no-code data integration, easy source-destination configuration, an intuitive user interface, and an affordable MAR pricing plan.
  • On the downside, while the pricing plan is affordable, some users find the MAR billing system confusing because of the multiple rules.

Pricing

  • Fivetran offers a 14-day trial and four pricing plans. The starter plan costs $1 per credit, the standard plan $1.50 per credit, the enterprise plan $2 per credit, while the business-critical plan offers customized prices.
  • Fivetran credits are counted as the monthly active rows (MAR) outlined in the platform's scale and consumption table.
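
As a rough illustration of the per-credit arithmetic, the sketch below multiplies a credit count by the plan prices quoted above. The function name and the example credit count are hypothetical, and the mapping from monthly active rows to credits is left out because the consumption table is not reproduced in this report.

```python
from decimal import Decimal

# Per-credit prices (USD) as quoted in this report.
# The business-critical plan is custom-priced, so it is omitted here.
PRICE_PER_CREDIT = {
    "starter": Decimal("1.00"),
    "standard": Decimal("1.50"),
    "enterprise": Decimal("2.00"),
}


def monthly_cost(plan, credits):
    """Return the cost in USD of a given number of credits on a plan."""
    if plan not in PRICE_PER_CREDIT:
        raise ValueError(f"no list price for plan: {plan}")
    return PRICE_PER_CREDIT[plan] * Decimal(credits)


# Hypothetical usage: the same 400 credits priced on each plan.
comparison = {plan: monthly_cost(plan, 400) for plan in PRICE_PER_CREDIT}
```

At a hypothetical 400 credits, the Standard plan costs half as much again as the Starter plan and the Enterprise plan costs twice as much, so the plan choice matters more as usage grows.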

10 Data Warehouses

Amazon Redshift

  • Amazon Redshift is a product of Amazon Web Services, Inc.
  • Amazon Redshift pricing starts from as low as $0.25 per hour and scales up to petabytes of data. Billing depends on the features used. Examples of companies using the Amazon Redshift data warehouse include Yelp, Nasdaq, McDonald's, Duolingo, Equinox Fitness, Pfizer, and Coca-Cola.
  • Amazon Redshift provides customer support and training through Email/Help Desk, Chat, webinars, live online training, and documentation.

Snowflake

  • The Snowflake Cloud Data Platform is a product of Snowflake Inc.
  • Snowflake is built with a SQL query engine and cloud infrastructure. Snowflake's architecture consists of cloud services, query processing, and database storage, as shown below.

Google BigQuery

  • Google BigQuery features include real-time data collection, data lake, data distribution, Hadoop and spark integrations, data preparation, machine scaling, workload processing, and cloud processing.
  • BigQuery is a serverless data analytics tool that allocates resources on demand. BigQuery charges for analysis and for storage, that is, the cost to process queries, functions, DML, DDL, scripts, and table scans, and the cost to store data. BigQuery also charges for data extraction and data ingestion.
  • On-demand query pricing is $5 per TB of data processed. The monthly slot commitment fee is $2,000 for 100 slots. Active storage costs $0.020 per GB per month, while long-term storage costs $0.010 per GB per month.
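
The rates above can be combined into a rough monthly estimate. The sketch below is an assumption-laden illustration: it treats the storage rates as monthly charges, ignores any free tier, extraction, and ingestion fees, and uses hypothetical function names and workload figures.

```python
from decimal import Decimal

# Rates (USD) as quoted in this report.
ON_DEMAND_PER_TB = Decimal("5")        # per TB processed by queries
ACTIVE_PER_GB = Decimal("0.020")       # active storage, assumed monthly
LONG_TERM_PER_GB = Decimal("0.010")    # long-term storage, assumed monthly
SLOT_COMMITMENT = Decimal("2000")      # monthly fee for 100 slots


def estimate_monthly_cost(tb_queried, active_gb, long_term_gb):
    """Split an on-demand monthly bill into query and storage parts."""
    query = ON_DEMAND_PER_TB * Decimal(str(tb_queried))
    storage = (ACTIVE_PER_GB * Decimal(str(active_gb))
               + LONG_TERM_PER_GB * Decimal(str(long_term_gb)))
    return {"query": query, "storage": storage, "total": query + storage}


def break_even_tb():
    """TB queried per month at which on-demand equals the slot fee."""
    return SLOT_COMMITMENT / ON_DEMAND_PER_TB


# Hypothetical workload: 10 TB queried, 500 GB active, 1,000 GB long-term.
example = estimate_monthly_cost(10, 500, 1000)
```

Under these assumptions, the 100-slot commitment only pays for itself once monthly query volume passes the break_even_tb() figure, so smaller workloads are likely cheaper on demand.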

Oracle Autonomous Data Warehouse

  • The Oracle Autonomous Data Warehouse has an overall rating of 4.5/5 on the G2 website and is rated 4.0/5 for ease of use and 4.2/5 for customer service on Capterra.
  • Oracle ADW costs $1.3441 per vCPU per hour, while Oracle ADW Exadata Storage costs $118.40 per terabyte of storage capacity per month. The Oracle ADW "bring your own license" plan costs $0.3226 per vCPU per hour, and the same $0.3226 per vCPU per hour on a dedicated plan.

Azure Synapse Analytics

  • Azure Synapse Analytics combines data integration, big data analytics, ETL pipelines, dashboards and visualization, and enterprise data warehousing, using dedicated or serverless resources. The tool is used to ingest, prepare, explore, manage, and serve data for machine learning and business intelligence needs.
  • Azure Synapse Analytics leverages massively parallel processing (MPP) to run complex queries across big data.
  • Azure Synapse Analytics prices compute and storage separately. The price varies depending on the level of service, the location of the data warehouse, the on-demand pricing option, and additional services like threat detection and disaster recovery.
  • Data storage costs $122.88 per terabyte per month, or $0.17 per terabyte per hour. Geo-redundant disaster recovery costs $0.12 per gigabyte per month, while threat detection costs $0.02 per node per month.
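
The monthly and hourly storage figures can be cross-checked with a little arithmetic. The sketch below assumes a 30-day month of 720 hours, which is an assumption rather than something the pricing text states, and the function names and example volumes are hypothetical.

```python
from decimal import Decimal

# Rates (USD) as quoted in this report.
STORAGE_PER_TB_MONTH = Decimal("122.88")
GEO_DR_PER_GB_MONTH = Decimal("0.12")

# Assumption: a billing month of 30 days, i.e. 720 hours.
HOURS_PER_MONTH = Decimal("720")


def hourly_storage_rate():
    """Derive the per-TB hourly rate from the monthly rate."""
    return STORAGE_PER_TB_MONTH / HOURS_PER_MONTH


def monthly_storage_cost(stored_tb, geo_redundant_gb=0):
    """Monthly cost of storage plus optional geo-redundant recovery."""
    storage = STORAGE_PER_TB_MONTH * Decimal(str(stored_tb))
    recovery = GEO_DR_PER_GB_MONTH * Decimal(str(geo_redundant_gb))
    return storage + recovery


# Round to cents for comparison with the quoted hourly figure.
rounded_hourly = hourly_storage_rate().quantize(Decimal("0.01"))

# Hypothetical volumes: 5 TB stored, 1,000 GB covered by geo-redundancy.
example_cost = monthly_storage_cost(5, 1000)
```

Under the 720-hour assumption, the derived hourly rate rounds to the quoted $0.17, which suggests the two storage figures describe the same price at different granularities.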

Panoply

  • Panoply offers three pricing plans: Starter, Pro, and Expert. The Starter plan costs $399 per month, the Pro plan $649 per month, and the Expert plan $999 per month.
  • Panoply provides customer support and training through FAQs, the Panoply community forum, email/help desk, phone support, chat support, webinars, documentation, and live online training.
  • Companies that use Panoply include Saucey, Agriwebb, Park Dental, Motorsport.com, and HoneyBook.

SAP Data Warehouse Cloud

  • SAP Data Warehouse Cloud processes, manages, and analyzes data by providing data integration, database, and data warehouse capabilities. The product is built on the SAP HANA Cloud database to help its users understand real-time business processes and data.
  • Users can connect to on-premise repositories and multi-cloud resources. SAP Data Warehouse Cloud provides self-service access to data and analytics.
  • SAP Data Warehouse Cloud is a product of SAP SE.
  • The image below shows the SAP Data Warehouse Cloud solution architecture.

IBM Db2 Warehouse

  • The IBM Db2 Warehouse is an elastic cloud data warehouse equipped with artificial intelligence and high-performance analytics.
  • The IBM Db2 Warehouse can integrate with IBM Cloudant, IBM Cognos Analytics, IBM Netezza Performance Server, DB2, IBM SPSS, Tableau Desktop, Looker, Apache Spark, RStudio, and IBM Cloud PaaS.
  • The IBM Db2 Warehouse has five pricing plans: Flex One, Flex, Flex Performance, Flex for AWS, and Flex Performance for AWS. The IBM Db2 data warehouse pricing starts at $1,000, and prices are customized per instance. There is no free trial.

Teradata Vantage

  • Teradata Vantage uses all-time data to analyze, deploy, and deliver important analytics to a business. Teradata Vantage is a "connected multi-cloud data platform" for enterprise analytics. The platform unifies and streamlines data lakes, analytics, data warehouses, and new data sources. Vantage combines commercial analytic technologies and open-source technologies to operationalize insights and solve business challenges.
  • The product's cross-engine orchestration layer channels data and analytic requests to the correct engine over a high-speed data fabric.
  • Vantage's deployment options are Cloud, SaaS, and Web-based.

Yellowbrick Data Warehouse

  • The Yellowbrick Cloud Data Warehouse provides solutions in data lake augmentation, distributed clouds, and data warehouse modernization, and serves various industries such as financial services, retail, healthcare & life sciences, and telecommunications.
  • Yellowbrick Cloud Data Warehouse performs data migration, security, monitoring & management, analytics, and cloud disaster recovery. The tool is designed for demanding batch, real-time, interactive, and multichannel workloads.
  • The deployment options for Yellowbrick Data Warehouse are cloud, web-based, SaaS, and on-premises.

Research Strategy

We identified the top data warehouses and ETL tools from precompiled lists in industry articles and technology websites such as Datamation, EM360 Tech, IT Central Station, Statista, and ServerWatch. We also selected those that are top-rated on G2, TrustRadius, and Capterra. Data on the ideal number of users for each data warehouse is not available, so we pivoted the research to the number of concurrent user connections and the limit on parallel queries allowed at any given time. However, the data available on those measures did not come from credible sources. For each tool identified, we provided at least three companies that use it for data analysis and BI decision-making.
