Understanding Estimated and Actual Query Plans in SQL Server

Query plans (execution plans) are graphical or textual representations of how SQL Server processes a query. An estimated query plan is a prediction based on available statistics. An actual query plan includes runtime execution details, such as actual rows processed.

How to Generate Query Plans

Estimated Execution Plan:
- Use Ctrl + L in SQL Server Management Studio (SSMS).
- Alternatively, run the query with: SET SHOWPLAN_ALL ON; Execute the query but without actually running it.
Actual Execution Plan:
- Use Ctrl + M in SSMS, then execute the query.
- Alternatively, use: SET STATISTICS PROFILE ON; This includes runtime statistics when the query is executed.

Key Components in Query Plans

Operators:
- Represented as icons or steps in the query plan (e.g., Index Scan, Nested Loop).
- Ordered from right to left in the plan execution sequence.
Estimated Rows vs. Actual Rows:
- Estimated Rows: Number of rows SQL Server expects based on statistics.
- Actual Rows: Number of rows processed during execution (available only in the actual plan).
Execution Order:
- Read plans from right-to-left and top-to-bottom, following the sequence of execution.
Cost:
- Estimated Operator Cost: Percentage of overall query cost attributed to each operation.
- Estimated Subtree Cost: Cumulative cost of the current operator and all preceding operators.
Warnings:
- Indicators like exclamation marks show issues, such as missing statistics or implicit conversions.

Important Points to Look Into Query Plans

Indexes:
- Seek vs. Scan:
  - Index Seek: Efficient; uses index for targeted row retrieval.
  - Index Scan: Reads the entire index or table; may indicate missing indexes or poorly designed queries.
- Check for Missing Index Recommendations in the plan.
Joins:
- Types of join operations:
  - Nested Loop: Good for small datasets but can be slow for large datasets.
  - Merge Join: Efficient for sorted datasets.
  - Hash Match: Used for large unsorted datasets but can be resource-intensive.
Sort and Aggregations:
- Check for sort operations, as they can be expensive if the dataset is large.
- Look for opportunities to use indexed columns to avoid explicit sorts.
Estimated vs. Actual Rows:
- Significant discrepancies between estimated and actual rows indicate outdated or missing statistics.
Parallelism:
- Look for Parallelism operators (e.g., Parallelism (Gather Streams)).
- Consider adjusting the Max Degree of Parallelism (MAXDOP) setting if too many parallel operations occur.
Memory Grants:
- Excessive memory grants for operators like Sort or Hash Match may indicate inefficiency.
- Optimize query or reduce dataset size to minimize memory requirements.
Warnings:
- Missing Statistics: Indicates outdated or unavailable data distribution information.
- Implicit Conversions: Occur when data types mismatch; avoid by ensuring proper data types.
Key Lookups:
- A Key Lookup occurs when a non-clustered index is used, and SQL Server fetches additional data from the base table.
- Address by adding included columns to the non-clustered index.
Expensive Operators:
- Identify operators with high Estimated Operator Cost or Estimated Subtree Cost and optimize them.
Query Hints:
- Analyze if any query hints (e.g., FORCESEEK, NOLOCK) are being used and whether they impact performance positively or negatively.

Example: Query Plan Analysis

Consider the query:

SELECT CustomerID, SUM(TotalAmount)
FROM Sales
WHERE OrderDate >= '2024-01-01'
GROUP BY CustomerID
ORDER BY SUM(TotalAmount) DESC;

Things to Look For:

Where Clause Filtering:
- Ensure the OrderDate column has an index.
- Look for Index Seek instead of Index Scan.
Grouping and Sorting:
- Check if the GROUP BY or ORDER BY leads to a sort or hash aggregation operator.
- Optimize with indexes on frequently grouped or sorted columns.
Warnings:
- Look for warnings like implicit conversions (e.g., OrderDate being compared with a string).

Common Scenarios and Solutions

Scenario	Solution
High cost on `Index Scan`	Add or optimize indexes.
Discrepancy in estimated vs. actual rows	Update statistics using `UPDATE STATISTICS` or `sp_updatestats`.
Key Lookups	Add included columns to non-clustered indexes.
Excessive parallelism	Adjust `MAXDOP` or rewrite the query for efficiency.
Warnings for implicit conversions	Fix data types to match the schema.
Missing Index Recommendations	Use suggested indexes but validate them for overall system performance.

Best Practices

Use Statistics:
- Regularly update statistics to ensure accurate query plans.
- Use AUTO_UPDATE_STATISTICS but monitor large datasets manually.
Test Changes:
- Compare query plans before and after applying optimizations.
Monitor Iteratively:
- Regularly review query plans for mission-critical queries.