SQL Query Optimization
When working with large amounts of data, the main challenge is to improve query performance by reducing execution time. Many factors affect query performance. Some of them are as follows:
1. Data volume:
If you are working with a large amount of data and selecting multiple columns from multiple tables with joins, query performance slows down. Keep in mind that before joining tables to pull in additional columns, you should first reduce the size of the data set: apply a WHERE clause and fetch only the rows that match the conditions you actually require.
You can use a CTE or a temporary table to capture this limited data set, and then join the CTE or temporary table instead of the full base table. This will definitely improve query performance, as in the sketch below.
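A minimal sketch of this approach, using hypothetical tables dbo.Orders and dbo.Customers: filter the large table into a CTE first, then join the reduced data set.
-- Reduce the data set with a WHERE clause before joining
WITH RecentOrders AS
(
    SELECT OrderID, CustomerID, OrderTotal
    FROM dbo.Orders
    WHERE OrderDate >= '2023-01-01'  -- hypothetical filter; use your real condition here
)
SELECT c.CustomerName, r.OrderID, r.OrderTotal
FROM RecentOrders r
INNER JOIN dbo.Customers c ON c.CustomerID = r.CustomerID;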
2. Indexes:
Creating useful indexes is one of the most important ways to achieve better query performance.
While creating indexes, you need to consider how frequently the query runs, the types of queries used on the table, and the data in the table.
If SELECT queries run frequently on a table, indexes are very helpful there. If DML queries run frequently on the table, indexes will slow those operations down, since every index must be maintained on each modification. If your application performs many DML operations, you should be conservative with the number of indexes you create.
Create the index on the table and run the query. Only if performance actually improves should you decide to keep the index on that particular column; otherwise, don't create an index for it.
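A minimal sketch of this test-and-keep approach (table and column names are hypothetical):
-- Create a non-clustered index on a column the query filters on
CREATE NONCLUSTERED INDEX IX_Orders_CustomerID ON dbo.Orders (CustomerID);
-- Re-run the query and compare timings; if the index did not help, drop it
-- DROP INDEX IX_Orders_CustomerID ON dbo.Orders;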
3. Use the correct join and avoid unnecessary joins:
Sometimes developers do not consider the expected result set, and as a consequence they select an incorrect join type or add unnecessary joins to their queries.
Avoiding unnecessary joins and choosing the correct join type can noticeably improve query performance, as the sketch below illustrates.
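A minimal sketch (hypothetical tables): if no column from dbo.Customers appears in the result and a foreign key guarantees exactly one matching customer per order, the join below adds cost without changing the result.
-- Unnecessary join: Customers contributes nothing to the output
SELECT o.OrderID, o.OrderTotal
FROM dbo.Orders o
INNER JOIN dbo.Customers c ON c.CustomerID = o.CustomerID;
-- Simpler and faster
SELECT o.OrderID, o.OrderTotal
FROM dbo.Orders o;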
4. Apply best practices:
Indexes and temp tables are of little use if you are not applying best practices in your SQL code.
We all concentrate on how to achieve the business logic in a query. We write queries to produce the required result set, and only after the SQL code runs against production data do we begin to realize how the queries actually perform.
5. Analyze the reasons for slow query performance:
Several techniques can be used to identify the cause of long-running queries. The two most commonly used options are:
- Using SET statements.
- Using SQL Query Analyzer options.
Using SET Statements
Use statements such as SET SHOWPLAN_ALL, SET STATISTICS IO, SET STATISTICS TIME, and SET STATISTICS PROFILE. For more information about using these SET statements, see the SQL Server product documentation.
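A minimal sketch of wrapping a query under investigation with these SET statements (the query itself is hypothetical):
SET STATISTICS IO ON;
SET STATISTICS TIME ON;
SELECT OrderID, OrderTotal FROM dbo.Orders WHERE CustomerID = 42;
SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
-- The Messages pane then reports logical reads, physical reads, and CPU/elapsed time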
Using SQL Query Analyzer
SQL Query Analyzer displays query execution plans in text mode or graphical mode.
To use SQL Query Analyzer
- Start SQL Query
Analyzer, connect to the server, and select the database that you are
working on.
- Paste the query into
the SQL Query Analyzer window.
- If you are using SQL
Profiler to trace queries, the query text can be copied from the trace
window and used within SQL Query Analyzer.
- On the Query menu,
click Display Estimated Execution Plan. The estimated
execution plan for the query is displayed. If the query window contains
multiple queries, an execution plan is displayed for each query.
- On the Query menu,
click Show Execution Plan, and then run the query in the query
window. Execution plans and query results now appear in separate panes of
the window so you can view them together.
Figure 1 shows
an example of an execution plan along with the related query information.
Some TSQL Best Practices:
Don't use "SELECT *" in a SQL query
- Unnecessary columns may get fetched, which adds expense to the data retrieval time.
Avoid unnecessary columns in the SELECT list and unnecessary tables in join conditions
- Selecting unnecessary columns in a SELECT query adds overhead to the actual query, especially if the unnecessary columns are of LOB types.
- Including unnecessary tables in join conditions forces the database engine to retrieve and fetch unnecessary data and increases the query execution time.
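A minimal sketch (hypothetical table):
-- Avoid: fetches every column, including any LOB columns
SELECT * FROM dbo.Products;
-- Prefer: fetch only the columns the caller actually needs
SELECT ProductID, ProductName, UnitPrice FROM dbo.Products;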
Do not use the COUNT() aggregate in a sub query to do an existence check
Do not use:
SELECT column_list FROM table WHERE 0 < (SELECT count(*) FROM table2 WHERE ..)
Instead, use:
SELECT column_list FROM table WHERE EXISTS (SELECT * FROM table2 WHERE ...)
- When you use COUNT(), SQL Server does not know that you are doing an existence check. It counts all matching values, either by doing a table scan or by scanning the smallest non-clustered index.
- When you use EXISTS, SQL Server knows you are doing an existence check. When it finds the first matching value, it returns TRUE and stops looking. The same advantage applies when you use EXISTS instead of IN or ANY.
Try to avoid joining between columns of two different data types
- When joining two columns of different data types, one of the columns must be converted to the type of the other. The column whose type is lower in the data type precedence hierarchy is the one that is converted.
- If you are joining tables with incompatible types, one of them can use an index, but the query optimizer cannot choose an index on the column that it converts. For example:
SELECT column_list
FROM small_table, large_table
WHERE small_table.float_column = large_table.int_column
In this case, SQL Server converts the integer column to float, because int is lower in the hierarchy than float. It cannot use an index on large_table.int_column, although it can use an index on small_table.float_column.
Write TSQL using "Set based
approach" rather than "Procedural approach"
- The database engine is optimized for set based SQL. Hence, the procedural approach (use of a cursor or UDF to process rows in a result set) should be avoided when large result sets (more than 1000 rows) have to be processed.
- How
can we get rid of "Procedural SQL"? Follow these simple tricks:
- Use
inline sub queries to replace User Defined Functions.
- Use
correlated sub queries to replace Cursor based code.
- If
procedural coding is really necessary, at least, use a table variable
instead of a cursor to navigate and process the result set.
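A minimal sketch (hypothetical tables) of replacing cursor-based row-by-row processing with one set based statement:
-- Instead of opening a cursor over dbo.Orders and updating one row at a time,
-- recalculate every total in a single UPDATE with a correlated sub query
UPDATE o
SET o.OrderTotal = (SELECT SUM(d.LineTotal)
                    FROM dbo.OrderDetails d
                    WHERE d.OrderID = o.OrderID)
FROM dbo.Orders o;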
Try not to use COUNT(*) to
obtain the record count in a table
- To
get the total row count in a table, we usually use the
following Select statement:
SELECT COUNT(*) FROM dbo.Orders
This query will perform a full
table scan to get the row count.
- The following query does not require a full table scan. (Please note that this might not always give 100% accurate results, but it is handy when you don't need an exact count.)
SELECT rows FROM sysindexes
WHERE id = OBJECT_ID('dbo.Orders') AND indid < 2
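On SQL Server 2005 and later, where sysindexes is deprecated, the same approximate count can be read from the sys.dm_db_partition_stats dynamic management view:
SELECT SUM(row_count) AS ApproximateRows
FROM sys.dm_db_partition_stats
WHERE object_id = OBJECT_ID('dbo.Orders')
AND index_id IN (0, 1);  -- 0 = heap, 1 = clustered index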
Try to avoid dynamic SQL
Unless really required, try to
avoid the use of dynamic SQL because:
- Dynamic
SQL is hard to debug and troubleshoot.
- If the user provides the input to the dynamic SQL, then there is a possibility of SQL injection attacks.
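When dynamic SQL is unavoidable, pass user input as parameters through sp_executesql instead of concatenating it into the SQL string; a minimal sketch with a hypothetical query:
DECLARE @sql NVARCHAR(200);
SET @sql = N'SELECT OrderID FROM dbo.Orders WHERE CustomerID = @CustID';
-- The value is bound as a typed parameter, which blocks SQL injection
EXEC sp_executesql @sql, N'@CustID INT', @CustID = 42;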
Try to avoid the use of
temporary tables
- Unless
really required, try to avoid the use of temporary tables. Rather use
table variables.
- Table variables involve less overhead (no statistics, fewer recompilations) and are often faster for small result sets. Note, however, that table variables are also backed by the TempDb database; the common claim that they always reside in memory is not strictly accurate.
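A minimal sketch (hypothetical tables) of using a table variable in place of a temporary table:
DECLARE @CustomerTotals TABLE (CustomerID INT PRIMARY KEY, TotalSpent MONEY);
INSERT INTO @CustomerTotals (CustomerID, TotalSpent)
SELECT CustomerID, SUM(OrderTotal)
FROM dbo.Orders
GROUP BY CustomerID;
SELECT c.CustomerName, t.TotalSpent
FROM @CustomerTotals t
INNER JOIN dbo.Customers c ON c.CustomerID = t.CustomerID;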
Instead of LIKE search, use full text search for searching textual data
Full text searches generally outperform LIKE searches, especially a LIKE search with a leading wildcard (such as LIKE '%term%'), which cannot use an index.
- Full
text searches will enable you to implement complex search criteria that
can't be implemented using a LIKE search, such as searching on a single
word or phrase (and optionally, ranking the result set), searching on a
word or phrase close to another word or phrase, or searching on synonymous
forms of a specific word.
- Implementing full text search is easier than implementing an equivalent LIKE search (especially in the case of complex search requirements).
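A minimal sketch of the difference (hypothetical table; CONTAINS requires a full text index on the column):
-- LIKE with a leading wildcard cannot use a regular index
SELECT ProductID FROM dbo.Products WHERE ProductDescription LIKE '%wireless%';
-- Full text equivalent, served by the full text index
SELECT ProductID FROM dbo.Products WHERE CONTAINS(ProductDescription, 'wireless');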
Try to use UNION to implement an "OR" operation
- Try not to use "OR" in a query. Instead, use "UNION" to combine the result sets of two distinct queries. This can improve query performance.
- Better, use UNION ALL if duplicate rows do not need to be removed. UNION ALL is faster than UNION, as it does not have to sort the result set to eliminate duplicate values.
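A minimal sketch (hypothetical table), assuming no row satisfies both conditions; if rows can satisfy both, use UNION instead of UNION ALL to remove the duplicates:
-- The OR condition can prevent efficient index usage on both columns
SELECT OrderID FROM dbo.Orders WHERE CustomerID = 42 OR SalesRepID = 7;
-- Each branch can use its own index
SELECT OrderID FROM dbo.Orders WHERE CustomerID = 42
UNION ALL
SELECT OrderID FROM dbo.Orders WHERE SalesRepID = 7;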
Implement a lazy loading strategy
for large objects
- Store
Large Object columns
(like VARCHAR(MAX), Image, Text etc.) in a different
table than the main table, and put a reference to the large object in the
main table.
- Retrieve
all the main table data in a query, and if a large object is required to
be loaded, retrieve the large object data from the large object table only
when it is required.
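A minimal sketch of such a schema (hypothetical names):
-- Main table holds only a reference to the large object
CREATE TABLE dbo.Products (
    ProductID INT PRIMARY KEY,
    ProductName VARCHAR(100),
    DescriptionID INT NULL  -- points into the LOB table
);
CREATE TABLE dbo.ProductDescriptions (
    DescriptionID INT PRIMARY KEY,
    FullDescription VARCHAR(MAX)
);
-- Fetch the LOB only when it is actually needed
SELECT FullDescription FROM dbo.ProductDescriptions WHERE DescriptionID = 101;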
Use VARCHAR(MAX),
VARBINARY(MAX), and NVARCHAR(MAX)
- In
SQL Server 2000, a row cannot exceed 8000 bytes in size. This limitation
is due to the 8 KB internal page size of SQL Server. So to store more data
in a single column, you need to use TEXT, NTEXT,
or IMAGE data types (BLOBs) which are stored in a collection of
8 KB data pages.
- These are unlike the data pages that store the other data in the same table. These pages are arranged in a B-tree structure. These data types cannot be used as variables in a procedure or a function, and they cannot be used inside string functions such as REPLACE, CHARINDEX, or SUBSTRING. In most cases, you have to use READTEXT, WRITETEXT, and UPDATETEXT instead.
- To
solve this problem, use VARCHAR(MAX), NVARCHAR(MAX),
and VARBINARY(MAX) in SQL Server 2005. These data types can hold
the same amount of data BLOBs can hold (2 GB), and they are stored in the
same type of data pages used for other data types.
- When
data in a MAX data type exceeds 8 KB, an over-flow page is used
(in the ROW_OVERFLOW allocation unit), and a pointer to the page
is left in the original data page in the IN_ROW allocation unit.
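A minimal sketch of what the MAX types allow that TEXT did not:
-- MAX types can be used as variables and inside string functions
DECLARE @doc VARCHAR(MAX);
SET @doc = REPLICATE(CONVERT(VARCHAR(MAX), 'abc '), 10000);  -- well past 8 KB
SELECT LEN(@doc), CHARINDEX('abc', @doc), SUBSTRING(@doc, 1, 10);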
Implement the following good
practices in User Defined Functions
- Do
not call functions repeatedly within your Stored Procedures, triggers,
functions, and batches. For example, you might need the length of a string
variable in many places of your procedure, but don't call
the LEN function whenever it's needed; instead, call
the LEN function once, and store the result in a variable for
later use.
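A minimal sketch of the LEN example above:
DECLARE @name VARCHAR(100), @len INT;
SET @name = 'some value';
SET @len = LEN(@name);  -- call the function once and reuse the result
-- ... use @len wherever the length is needed, instead of calling LEN(@name) again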
Implement the following good
practices in Stored Procedures
- Do not use
"SP_XXX" as a naming convention. It causes additional searches
and added I/O (because the system Stored Procedure names start with
"SP_"). Using "SP_XXX" as the naming convention also
increases the possibility of conflicting with an existing system Stored
Procedure.
- Use "SET NOCOUNT ON" to eliminate extra network trips.
- Use
the WITH RECOMPILE clause in the EXECUTE statement
(first time) when the index structure changes (so that the compiled
version of the Stored Procedure can take advantage of the newly created
indexes).
- Use
default parameter values for easy testing.
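A minimal sketch of a procedure applying these practices (hypothetical object names; note that the name does not start with "SP_"):
CREATE PROCEDURE dbo.usp_GetOrdersByCustomer
    @CustomerID INT,
    @MaxRows INT = 100  -- default parameter value for easy testing
AS
BEGIN
    SET NOCOUNT ON;  -- suppress "rows affected" messages
    SELECT TOP (@MaxRows) OrderID, OrderDate, OrderTotal
    FROM dbo.Orders
    WHERE CustomerID = @CustomerID;
END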
Implement the following good
practices in Triggers
- Try
to avoid the use of triggers. Firing a trigger and executing the
triggering event is an expensive process.
- Never
use triggers that can be implemented using constraints.
- Do
not use the same trigger for different triggering events (Insert, Update,
Delete).
- Do
not use transactional code inside a trigger. The trigger always runs
within the transactional scope of the code that fires the trigger.
Implement the following good
practices in Views
- Use views for re-using complex TSQL blocks, and to make them available as indexed views (discussed later).
- Use
views with the SCHEMABINDING option if you do not want to let
users modify the table schema accidentally.
- Do
not use views that retrieve data from a single table only (that will be an
unnecessary overhead). Use views for writing queries that access columns
from multiple tables.
Implement the following good
practices in Transactions
- Prior to SQL Server 2005, after BEGIN TRANSACTION and each subsequent modification statement, the value of @@ERROR had to be checked. If its value was non-zero, the last statement had caused an error, and the transaction had to be rolled back and an error raised for the application. In SQL Server 2005 and onwards, the Try...Catch block can be used to handle transactions in TSQL, so try to use Try...Catch based transactional code.
- Try
to avoid nested transactions. Use the @@TRANCOUNT variable to
determine whether a transaction needs to be started (to avoid nested
transactions).
- Start
a transaction as late as possible and commit/rollback the transaction as
fast as possible to reduce the time period of resource locking.
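A minimal sketch of Try...Catch based transactional code (hypothetical statements):
BEGIN TRY
    BEGIN TRANSACTION;
    UPDATE dbo.Accounts SET Balance = Balance - 100 WHERE AccountID = 1;
    UPDATE dbo.Accounts SET Balance = Balance + 100 WHERE AccountID = 2;
    COMMIT TRANSACTION;
END TRY
BEGIN CATCH
    IF @@TRANCOUNT > 0
        ROLLBACK TRANSACTION;  -- undo everything on any error
    RAISERROR('Transaction failed.', 16, 1);  -- re-raise for the application
END CATCH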
Remember, you need to implement
the good things that you know; otherwise, your knowledge will not add any value
to the system that you are going to build. Also, you need to have a process for
reviewing and monitoring the code (that is written by your team) to see whether
the data access code is being written following the standards and best
practices.