Cloud data platforms like Microsoft Fabric, Google BigQuery, AWS Redshift, and Snowflake have transformed how companies analyze and manage data. With the promise of scalability and pay-as-you-go pricing, they have enabled businesses to move fast and scale insights globally.
But this flexibility has a catch: you pay for what you use. And often, you’re using (and paying for) far more than you realize.
Every SQL query consumes platform resources - primarily CPU (compute), memory, and storage I/O - and your cloud data platform bills you for that usage. Thus, the more data a query scans or processes, the higher the cost, so poorly optimized queries can quietly inflate your bill:
👉 Optimizing SQL queries is one of the fastest and most effective levers to reduce costs without compromising insights.
-- Inefficient SELECT * EXCEPT(fare_amount) FROM NYC_Taxi_Dataset; -- Optimized SELECT VendorID, trip_distance, payment_type FROM NYC_Taxi_Dataset;
-- Slower: applying a function means the database must check -- every row, so it cannot skip partitions efficiently WHERE YEAR(partition_date) = 2025 -- Faster: allows partition pruning, database scans only 2025 -- partitions WHERE partition_date BETWEEN '2025-01-01' AND '2025-12-31'
Here are practical techniques to make your SQL cheaper and faster — listed roughly from highest to lowest potential cost savings.
To validate the order of impact, a benchmark was performed on the NYC Taxi Trip Records dataset. Data from July 2024 to December 2024 was combined into a single dataset of roughly 20 million rows and 19 columns, and a series of queries were executed to compare costs before and after optimization. The observed reductions in compute usage and query runtime were then used to rank the techniques below, providing a realistic view of potential cost savings in practice.
-- Less efficient SELECT c.name, SUM(n.amount) AS total_amount FROM NYC_Taxi_Dataset n LEFT OUTER JOIN Customers c ON n.customer_id = c.customer_id WHERE n.tpep_pickup_datetime >= '2024-01-01' GROUP BY c.name; -- Optimized WITH filter_and_aggregated_data AS ( SELECT customer_id, SUM(amount) as total_amount FROM NYC_Taxi_Dataset WHERE order_date >= '2024-01-01' GROUP BY customer_id ) SELECT c.name, f.total_amount FROM filter_and_aggregated_data f INNER JOIN Customers c ON f.customer_id = c.customer_id;
-- Less efficient: Scalar UDF executes per row CREATE OR ALTER FUNCTION dbo.AddAirportFee(@fee DECIMAL(10,2), @passengers INT) RETURNS DECIMAL(10,2) AS BEGIN IF @passengers = 2 SET @fee = @fee + 3; RETURN @fee; END; GO UPDATE NYC_Taxi_Dataset SET Airport_fee = dbo.AddAirportFee(Airport_fee, passenger_count); -- OPTIMIZED: Executes all matching rows in one go UPDATE NYC_Taxi_Dataset SET Airport_fee = Airport_fee + 3 WHERE passenger_count = 2;
CREATE TABLE IF NOT EXISTS NYC_Taxi_Dataset_clustered_partitioned -- Partition the table by the 'partition_date' column (usually a -- DATE) -- 👍 This splits the data into smaller, date-based segments. -- Each query will then only scan the relevant partitions, which -- reduces cost and improves performance. PARTITION BY partition_date -- Cluster the data by 'tpep_pickup_datetime' -- 👍 This organizes data within each partition based on the -- pickup timestamp, -- making queries that filter or sort by this column much faster. CLUSTER BY tpep_pickup_datetime AS SELECT * FROM NYC_Taxi_Dataset
-- Slower SELECT COUNT(*) FROM NYC_Taxi_Dataset WHERE vendorID = 2; -- Faster SELECT CASE WHEN EXISTS ( SELECT 1 FROM NYC_Taxi_Dataset WHERE vendorID = 2 ) THEN CAST(1 AS BIT) ELSE CAST(0 AS BIT) END AS vendor_has_clients;
-- Update stats UPDATE STATISTICS Orders;
-- Create a rowstore clustered index on Orders table -- This will physically sort the table by order_date CREATE CLUSTERED INDEX IX_Orders_OrderDate ON Orders(order_date);
SQL query optimization isn’t just a technical best practice — it’s a direct cost-control strategy. By combining general SQL hygiene with cloud-native features, organizations can:
💡 Want to know how we can help your company reduce cloud operation costs?
Schedule a call with us — let’s work together to make your queries leaner, faster, and cheaper.
Analytics Engineer at Xomnia