Data Lake and Warehousing Pricing
A standardized pricing comparison of data lake and warehousing options for Amazon Web Services (Lake Formation and Redshift), Azure, Snowflake, Cloudera, Oracle, Google Cloud, and Informatica is provided in the attached spreadsheet. Because each company offers multiple pricing options through tiering, dependencies, and differing pricing strategies, the spreadsheet covers only basic, general pricing, while this research brief contains the more specific details. Pricing also depends heavily on the region in which the cloud service is hosted, so only US regions were used to derive the figures.
Amazon Web Services
- Amazon Web Services (AWS) offers a pay-as-you-go pricing structure for its more than 160 cloud services, including Lake Formation. There are therefore no long-term contracts, complex licensing fees, additional costs, or termination fees.
- It also offers other pricing benefits, such as tiered pricing and discounts for reserved capacity.
- As part of its various cloud services, Amazon Redshift and AWS Lake Formation are its data warehousing and data lake options, respectively.
Amazon Web Services (Lake Formation)
- According to AWS, AWS Lake Formation is free as there is no additional charge for using the service. Lake Formation only enables users to build and manage data lakes that are stored in Amazon Simple Storage Service (Amazon S3).
- Amazon S3, which also serves as the data lake's storage, has several storage classes; the S3 Standard class ranges from $0.021 to $0.023 per GB per month.
- Also, "there are per-request ingest fees when moving data into any S3 storage class." These range from $0.0004 to $0.005 per 1,000 requests for both S3 Standard and S3 Glacier Deep Archive.
- Lake Formation complements AWS Glue used for the extraction, transformation, and loading (ETL) of data for analytics. It can then be integrated with other services such as Amazon Redshift for data warehousing.
- Hence, Lake Formation users are charged only for the underlying services mentioned above (such as Amazon S3, AWS Glue, and Amazon Redshift) that they access through Lake Formation.
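As a rough illustration of how the S3 charges above combine, the following Python sketch estimates one month of S3 Standard usage. It assumes the upper quoted rates ($0.023 per GB and $0.005 per 1,000 requests) apply flat with no volume tiering; the function and constant names are hypothetical, not part of any AWS tool.

```python
# Upper S3 Standard figures quoted above; volume tiering is ignored (assumption).
S3_STANDARD_USD_PER_GB_MONTH = 0.023
S3_STANDARD_USD_PER_1K_INGEST_REQUESTS = 0.005


def s3_standard_monthly_cost(storage_gb: float, ingest_requests: int) -> float:
    """Estimate one month of S3 Standard storage plus ingest-request fees."""
    storage_cost = storage_gb * S3_STANDARD_USD_PER_GB_MONTH
    request_cost = ingest_requests / 1000 * S3_STANDARD_USD_PER_1K_INGEST_REQUESTS
    # Lake Formation itself carries no charge, so nothing else is added here.
    return round(storage_cost + request_cost, 2)


# 1,000 GB stored and 2 million ingest requests in a month.
print(s3_standard_monthly_cost(storage_gb=1000, ingest_requests=2_000_000))  # 33.0
```

Charges for AWS Glue or Redshift accessed through Lake Formation would be added on top at their own rates.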
Amazon Web Services (Redshift)
- Amazon Redshift is a data warehousing service. Amazon Redshift uses an on-demand pricing structure that allows users to pay for capacity by the hour with no commitments and no upfront costs.
- The pricing is dependent on the users' choice of a cluster configuration, which is defined by the various node types — Redshift Managed Storage (RA3 nodes), Dense Compute (DC2 nodes), and Dense Storage (DS2 nodes).
- DC2 nodes cost $0.25 per hour for the large size (dc2.large) with 0.16 TB of SSD addressable storage, and $4.80 per hour for the 8xlarge size (dc2.8xlarge) with 2.56 TB of SSD addressable storage.
- DS2 nodes cost $0.85 per hour for the xlarge size (ds2.xlarge) with 2 TB of HDD addressable storage, and $6.80 per hour for the 8xlarge size (ds2.8xlarge) with 16 TB of HDD addressable storage.
- RA3 nodes cost $3.26 per hour for the 4xlarge size (ra3.4xlarge) and $13.04 per hour for the 16xlarge size (ra3.16xlarge), each with 64 TB of RMS addressable storage.
- Additional costs may apply if users access other services such as Amazon S3 and AWS Glue through Redshift; these are billed at the standard rates of each service.
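Because Redshift on-demand pricing is hourly per node, a cluster's bill can be approximated as rate × nodes × hours. The sketch below uses the lower hourly rate quoted above for each node family; the 730-hour month and all names are assumptions made for illustration.

```python
# Lower per-node hourly rate quoted above for each node family (USD).
HOURLY_RATE_USD = {"DC2": 0.25, "DS2": 0.85, "RA3": 3.26}

HOURS_PER_MONTH = 730  # assumption: average month length in hours


def redshift_on_demand_cost(node_family: str, node_count: int,
                            hours: float = HOURS_PER_MONTH) -> float:
    """Estimate on-demand cost for a cluster running `hours` hours."""
    return round(HOURLY_RATE_USD[node_family] * node_count * hours, 2)


# A two-node DC2 cluster running for a full month.
print(redshift_on_demand_cost("DC2", 2))  # 365.0
```

Charges for Amazon S3 or AWS Glue used alongside the cluster are not included in this estimate.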
Microsoft Azure
- Microsoft Azure offers both data lake and data warehousing services through its Azure Data Lake Storage Gen2 product and Azure Synapse Analytics, respectively.
- Azure Data Lake Storage Gen2 offers the ability to store data in both structured and unstructured directories. For both options, storage costs range from $0.0010 per GB on the Archive plan up to $0.15 per GB on the Premium plan.
- Write operations (data ingestion or transactions) are billed per 10,000 operations, with every 4 MB counted as one operation. They cost $0.10 per 10,000 on the Archive plan and $0.0175 per 10,000 on the Premium plan.
- Other operations are also charged, including iterative write operations ($0.10 per 100 operations) and iterative read operations ($0.05 per 10,000 operations).
- For Azure Synapse Analytics (formerly Azure SQL Data Warehouse), "data storage is charged at the rate of $122.88 per TB of data processed ($0.17/1 TB/hour)."
- Compute is commonly offered as two provisioned resources, Compute Optimized Gen1 and Compute Optimized Gen2. The cost of both is based on the number of data warehouse units (DWUs).
- For the former, it starts with DW100 (100 DWUs) at $1.210 per hour up to DW6000 (6,000 DWUs) at $72.582 per hour.
- For the latter, it starts with DW100c (100 DWUs) at $1.20 per hour up to DW30000c (30,000 DWUs) at $360 per hour.
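The two Gen2 endpoints quoted above (DW100c at $1.20 and DW30000c at $360 per hour) are consistent with a flat $1.20 per 100 DWUs per hour. The sketch below assumes that linear scaling holds for every size in between and adds storage at the quoted $122.88 per TB per month; the function name and parameters are hypothetical.

```python
GEN2_USD_PER_100_DWU_HOUR = 1.20   # from the DW100c price quoted above
STORAGE_USD_PER_TB_MONTH = 122.88  # from the storage rate quoted above


def synapse_gen2_monthly_cost(dwu: int, compute_hours: float,
                              storage_tb: float) -> float:
    """Estimate a month of Synapse Gen2 compute plus storage.

    Assumes price scales linearly with DWUs, which matches the two
    quoted endpoints but is not confirmed for intermediate sizes.
    """
    compute_cost = dwu / 100 * GEN2_USD_PER_100_DWU_HOUR * compute_hours
    storage_cost = storage_tb * STORAGE_USD_PER_TB_MONTH
    return round(compute_cost + storage_cost, 2)


# 1,000 DWUs running 100 hours in the month, with 2 TB stored.
print(synapse_gen2_monthly_cost(dwu=1000, compute_hours=100, storage_tb=2))  # 1445.76
```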
Snowflake
- Snowflake is an all-in-one cloud solution that includes augmented data lake and data warehousing capabilities. Although it appears to focus more on data warehousing, it integrates easily with other data lakes and services such as Amazon S3, Azure Data Lake Storage, and Google Cloud Storage.
- Pricing is therefore based on actual usage of the three functions that define its data warehousing service: storage, virtual warehouses (compute), and cloud services. It also depends on the service or data lake it is integrated with.
- Assuming integration with AWS, a credit costs $2 on the Standard plan, $3 on the Enterprise plan, and $4 on the Business Critical plan. Compute usage is billed per second with a 60-second minimum, which works out to $0.00056 per second, per credit, on the Standard plan.
- Data storage costs $40 per TB per month.
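The per-second billing with a 60-second minimum can be sketched as follows. The example assumes a warehouse that consumes one credit per hour and the Standard plan's $2 per credit; `credits_per_hour` and the function name are illustrative parameters, not Snowflake API names.

```python
def snowflake_compute_cost(runtime_seconds: float,
                           credits_per_hour: float = 1.0,
                           usd_per_credit: float = 2.00) -> float:
    """Estimate compute cost for one warehouse run, billed per second."""
    # Runs shorter than 60 seconds are billed as a full 60 seconds.
    billed_seconds = max(runtime_seconds, 60)
    credits_used = credits_per_hour * billed_seconds / 3600
    return round(credits_used * usd_per_credit, 4)


print(snowflake_compute_cost(30))    # 0.0333 (billed as 60 seconds)
print(snowflake_compute_cost(90))    # 0.05
print(snowflake_compute_cost(3600))  # 2.0 (one full credit)
```

Storage at $40 per TB per month and any cloud services usage would be billed separately.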
Cloudera
- The Cloudera platform is an all-in-one data/cloud solution that provides Edge (Cloudera Data Flow), data warehousing, and AI capabilities.
- Its pricing structure is an annual subscription model for the whole platform. The base price (including price per node plus variable pricing for computing and storage over node caps) is $10,000 across its editions — CDP Data Center, Enterprise Data Hub, and HDP Enterprise Plus.
- For the variable/customizable pricing, storage services cost $25 per TB over the 48 TB node cap, while compute services cost $75 per concurrent user (CCU) over the node cap of 16 cores and 128 GB, across all three editions.
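One possible reading of the Cloudera figures above is a fixed annual base price per node plus overage charges beyond the node caps. The sketch below encodes that reading; treating the $10,000 base as a per-node annual figure, and charging overage per node, are assumptions, and all names are hypothetical.

```python
BASE_USD_PER_NODE_YEAR = 10_000  # assumption: base price applies per node
STORAGE_CAP_TB = 48
STORAGE_OVERAGE_USD_PER_TB = 25
COMPUTE_OVERAGE_USD_PER_CCU = 75


def cloudera_annual_cost(nodes: int, storage_tb_per_node: float,
                         extra_ccu_per_node: int = 0) -> float:
    """Estimate annual subscription cost under the assumptions above."""
    # Only storage above the 48 TB node cap incurs the variable charge.
    storage_overage_tb = max(0, storage_tb_per_node - STORAGE_CAP_TB)
    per_node = (BASE_USD_PER_NODE_YEAR
                + storage_overage_tb * STORAGE_OVERAGE_USD_PER_TB
                + extra_ccu_per_node * COMPUTE_OVERAGE_USD_PER_CCU)
    return float(nodes * per_node)


# Four nodes, each holding 60 TB and using 2 CCUs beyond the compute cap.
print(cloudera_annual_cost(nodes=4, storage_tb_per_node=60, extra_ccu_per_node=2))  # 41800.0
```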
Oracle
- Oracle provides its data lake through Oracle Big Data Cloud. The Starter Pack (3 nodes) costs $14,400 per hosted environment per month, including 48 TB of storage and 32 OCPUs, and each additional node costs $4,800 per hosted node per month (also including 48 TB of storage and 32 OCPUs).
- Oracle's data warehousing capabilities are carried out through its Autonomous Data Warehouse services.
- The pricing detail is divided into services based on shared infrastructure and dedicated infrastructure.
- On shared infrastructure, the Autonomous Data Warehouse for storage purposes (Exadata Storage) costs $118.40 per terabyte storage capacity per month, while for operation/processing/computing purposes, it costs $1.3441 per Oracle Compute Unit (OCPU) per hour.
- On dedicated infrastructure, the Autonomous Data Warehouse for storage purposes (Database Exadata Infrastructure) varies in price depending on the number/type of rack for each hosted environment per hour. They range from Quarter Rack — X8 ($14.5162) to Full Rack — X7 ($86.0215). Computing costs $1.3441 per Oracle Compute Unit (OCPU) per hour.
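The Oracle figures above can be combined into simple monthly estimates. The sketch below covers the Big Data Cloud starter pack plus additional nodes, and the Autonomous Data Warehouse on shared infrastructure; the function names and the example workload are assumptions for illustration.

```python
STARTER_PACK_USD_MONTH = 14_400     # covers the first 3 nodes
ADDITIONAL_NODE_USD_MONTH = 4_800
ADW_STORAGE_USD_PER_TB_MONTH = 118.40
ADW_USD_PER_OCPU_HOUR = 1.3441


def big_data_cloud_monthly_cost(nodes: int) -> int:
    """Monthly cost of a Big Data Cloud environment with `nodes` nodes."""
    if nodes < 3:
        raise ValueError("the Starter Pack includes a minimum of 3 nodes")
    return STARTER_PACK_USD_MONTH + (nodes - 3) * ADDITIONAL_NODE_USD_MONTH


def adw_shared_monthly_cost(storage_tb: float, ocpus: int, hours: float) -> float:
    """Monthly Autonomous Data Warehouse cost on shared infrastructure."""
    storage_cost = storage_tb * ADW_STORAGE_USD_PER_TB_MONTH
    compute_cost = ocpus * hours * ADW_USD_PER_OCPU_HOUR
    return round(storage_cost + compute_cost, 2)


print(big_data_cloud_monthly_cost(5))                         # 24000
print(adw_shared_monthly_cost(storage_tb=2, ocpus=4, hours=100))  # 774.44
```

Dedicated infrastructure is omitted here because its storage price depends on the rack type chosen.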
Google Cloud
- In Google Cloud Platform, data lakes can be created using Google Cloud Storage. Pricing depends on the amount of data stored, network usage, and operations usage, among other factors.
- Storage prices range from $0.026 per GB per month for Standard Storage down to $0.004 per GB per month for Archive Storage.
- Ingress is free.
- Additional fees, such as a general network usage fee, may be charged for egress to other continents such as Asia and Australia. These range from $0.08 to $0.23 per GB.
- Operation fees are also charged for "changes to or information retrieved about buckets and objects in Cloud Storage." Class A and Class B operations are charged per 10,000 operations at rates that depend on the storage class, for example $0.50 for Standard Storage.
- Data warehousing can be carried out in the Google Cloud Platform using the BigQuery product.
- Data ingress is also free, except for a small charge on streamed data (streaming inserts) of $0.010 per 200 MB.
- Data query costs (the cost of running SQL commands and user-defined functions) can be on-demand or at flat rates. On-demand queries are $5.00 per TB, while flat rates range from $20.00 per hour for 500 slots to $8,500 per month for 500 slots under an annual commitment.
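To compare the two BigQuery query pricing models, it can help to compute the data volume at which the hourly flat rate matches on-demand pricing. The sketch below uses the $5.00 per TB and $20.00 per hour per 500 slots figures quoted above; the function names are hypothetical, and slots are assumed to be purchased in blocks of 500.

```python
ON_DEMAND_USD_PER_TB = 5.00
FLAT_RATE_USD_PER_500_SLOTS_HOUR = 20.00


def on_demand_query_cost(tb_scanned: float) -> float:
    """Cost of scanning `tb_scanned` terabytes with on-demand pricing."""
    return tb_scanned * ON_DEMAND_USD_PER_TB


def flat_rate_cost(hours: float, slots: int = 500) -> float:
    """Cost of reserving `slots` slots (in blocks of 500) for `hours` hours."""
    return hours * (slots / 500) * FLAT_RATE_USD_PER_500_SLOTS_HOUR


def break_even_tb(hours: float, slots: int = 500) -> float:
    """Terabytes scanned at which on-demand cost equals the flat rate."""
    return flat_rate_cost(hours, slots) / ON_DEMAND_USD_PER_TB


print(break_even_tb(hours=1))  # 4.0
```

Under these figures, a 500-slot hourly reservation only pays off when queries scan more than 4 TB in that hour, provided the slots can actually process that volume.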
Informatica
- "Informatica offers the only enterprise-class, cloud-native, end-to-end data management solution for data warehouses, data lakes, and lake houses."
- It appears that Informatica does not provide data lakes or data warehousing directly but instead provides management services that integrate with other data lakes, data warehouses, and services such as Microsoft Azure and AWS. Moreover, it provides no pricing details.
- Research into tech magazines and review databases such as G2, Capterra, and TrustRadius corroborates the unavailability of pricing details. G2, however, provides pricing for the software in general, which ranges from $1,000 per month up to $4,500 per month for the advanced edition.