SQL Query Optimization


When working with large amounts of data, the main challenge is to improve query performance by reducing execution time. Many factors affect query performance. Some of them are as follows:

1.    Data volume:
If you are working with a large amount of data and selecting multiple columns from multiple tables with several joins, query performance slows down. Keep in mind that before joining tables to pick up additional columns, you should first reduce the size of the data set: apply a WHERE clause and fetch only the rows that meet the conditions you actually require.

You can use a CTE or a temporary table to capture that limited data set, and then join the CTE or temporary table instead of the actual table. This will definitely improve query performance.
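For example, a minimal sketch of this pattern (the Orders and Customers tables and the date filter below are hypothetical):

-- Filter the large table first in a CTE, then join only the reduced set.
;WITH RecentOrders AS
(
    SELECT OrderID, CustomerID, OrderDate
    FROM dbo.Orders
    WHERE OrderDate >= '20230101'        -- reduce rows before the join
)
SELECT ro.OrderID, ro.OrderDate, c.CustomerName
FROM RecentOrders ro
INNER JOIN dbo.Customers c ON c.CustomerID = ro.CustomerID;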

2.    Indexes:
Creating useful indexes is one of the most important ways to achieve better query performance.
While creating indexes, you need to consider how frequently the query runs, the types of queries that are used against the table, and the data in the table.

If SELECT queries run frequently against a table, indexes are very helpful. If DML queries run frequently against a table, indexes will slow them down, because every index must be maintained on each modification. If your application performs many DML operations, you should be conservative with the number of indexes you create.

Create the index on the table and run the query. Only if performance improves should you decide to keep the index on that particular column; otherwise, do not create the index.
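A minimal sketch of this workflow (the table, column, and index names are hypothetical; INCLUDE requires SQL Server 2005 or later):

-- A nonclustered index supporting a frequent lookup on CustomerID.
CREATE NONCLUSTERED INDEX IX_Orders_CustomerID
ON dbo.Orders (CustomerID)
INCLUDE (OrderDate);          -- covers the query's SELECT list

-- Re-run the query and compare the execution plan and IO statistics;
-- drop the index if it does not help:
-- DROP INDEX IX_Orders_CustomerID ON dbo.Orders;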

3.    Using Correct join and avoid Unnecessary joins:

Sometimes developers do not consider the expected result set, and on that basis they select an incorrect join type or add unnecessary joins to their queries.
Avoiding unnecessary joins and selecting the correct join type can noticeably improve query performance.

4.    Apply best practices:

There is no benefit from indexes or temporary tables if you are not applying best practices in your SQL code.
We all concentrate on how to implement the business logic in a query. We write queries to produce the required result set, and only after the SQL code runs against production data do we start to realize how the queries actually perform.


5.    Analyze the reasons for slow query performance:

Several techniques can be used to identify the cause of long-running queries. The two most commonly used options are:

  • Using SET statements.
  • Using SQL Query Analyzer options.

  • Using SET statements
Use statements such as SET SHOWPLAN_ALL, SET STATISTICS IO, SET STATISTICS TIME, and SET STATISTICS PROFILE. For more information about using these SET statements, see the SQL Server product documentation.
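For example, a typical way to wrap a query under investigation (the query itself is a placeholder):

SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- run the query under investigation here

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
-- The Messages tab then reports logical reads and CPU/elapsed times.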

  • Using SQL Query Analyzer
SQL Query Analyzer displays query execution plans in text mode or graphical mode.
         
To use SQL Query Analyzer:
  1. Start SQL Query Analyzer, connect to the server, and select the database that you are working on.
  2. Paste the query into the SQL Query Analyzer window.
  3. If you are using SQL Profiler to trace queries, the query text can be copied from the trace window and used within SQL Query Analyzer.
  4. On the Query menu, click Display Estimated Execution Plan. The estimated execution plan for the query is displayed. If the query window contains multiple queries, an execution plan is displayed for each query.
  5. On the Query menu, click Show Execution Plan, and then run the query in the query window. Execution plans and query results now appear in separate panes of the window so you can view them together.
Figure 1 (not reproduced here) shows an example of an execution plan along with the related query information.

  
Some TSQL Best Practices:

Don't use "SELECT *" in a SQL query

  • Unnecessary columns may get fetched, which adds expense to the data retrieval time.
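A quick hypothetical illustration:

-- Instead of:
SELECT * FROM dbo.Orders;

-- select only the columns the application actually needs:
SELECT OrderID, CustomerID, OrderDate FROM dbo.Orders;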

  
Avoid unnecessary columns in the SELECT list and unnecessary tables in join conditions

  • Selecting unnecessary columns in a SELECT query adds overhead to the query, especially if the columns are of LOB types.
  • Including unnecessary tables in join conditions forces the database engine to retrieve and fetch unnecessary data and increases the query execution time.

Do not use the COUNT() aggregate in a subquery to do an existence check

  • Do not use:

SELECT column_list FROM table WHERE 0 < (SELECT COUNT(*) FROM table2 WHERE ...)

Instead, use:

SELECT column_list FROM table WHERE EXISTS (SELECT * FROM table2 WHERE ...)

  • When you use COUNT(), SQL Server does not know that you are doing an existence check. It counts all matching values, either by doing a table scan or by scanning the smallest non-clustered index.
  • When you use EXISTS, SQL Server knows you are doing an existence check. When it finds the first matching value, it returns TRUE and stops looking. The same applies to using COUNT() instead of IN or ANY.
Try to avoid joins between columns of two different data types


  • When joining between two columns of different data types, one of the columns must be converted to the type of the other. The column whose type is lower is the one that is converted.
  • If you are joining tables with incompatible types, one of them can use an index, but the query optimizer cannot choose an index on the column that it converts. For example:

SELECT column_list FROM small_table, large_table
WHERE small_table.float_column = large_table.int_column
In this case, SQL Server converts the integer column to float, because int is lower in the data type precedence hierarchy than float. It cannot use an index on large_table.int_column, although it can use an index on small_table.float_column.

Write TSQL using a "set-based approach" rather than a "procedural approach"

  • The database engine is optimized for set-based SQL. Hence, a procedural approach (use of cursors or UDFs to process rows in a result set) should be avoided when large result sets (more than 1000 rows) have to be processed.
  • How can we get rid of "procedural SQL"? Follow these simple tricks (see the sketch after this list):
    • Use inline sub queries to replace User Defined Functions.
    • Use correlated sub queries to replace Cursor based code.
    • If procedural coding is really necessary, at least, use a table variable instead of a cursor to navigate and process the result set.
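As a minimal hypothetical sketch, row-by-row cursor logic can often be collapsed into a single set-based statement:

-- Cursor-style logic would loop over orders and discount the large ones
-- one row at a time. The set-based equivalent is one statement:
UPDATE dbo.Orders
SET TotalAmount = TotalAmount * 0.90     -- 10% discount
WHERE TotalAmount > 1000;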
Try not to use COUNT(*) to obtain the record count in a table

  • To get the total row count in a table, we usually use the following Select statement:
SELECT COUNT(*) FROM dbo.orders
This query will perform a full table scan to get the row count.
  • The following query does not require a full table scan. (Please note that it might not always give a perfectly accurate result, but it is handy when you don't need an exact count.)
SELECT rows FROM sysindexes
WHERE id = OBJECT_ID('dbo.Orders') AND indid < 2
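On SQL Server 2005 and later, the same approximate count can be read from the catalog views instead of the older sysindexes table:

SELECT SUM(row_count) AS ApproxRows
FROM sys.dm_db_partition_stats
WHERE object_id = OBJECT_ID('dbo.Orders') AND index_id < 2;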

Try to avoid dynamic SQL

Unless really required, try to avoid the use of dynamic SQL because:
  • Dynamic SQL is hard to debug and troubleshoot.
  • If the user provides input that ends up concatenated into the dynamic SQL, there is a possibility of SQL injection attacks.
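If dynamic SQL is unavoidable, prefer parameterized execution with sp_executesql over string concatenation. A hypothetical sketch:

DECLARE @sql NVARCHAR(200);
SET @sql = N'SELECT OrderID FROM dbo.Orders WHERE CustomerID = @CustID';

-- The value is passed as a typed parameter, not concatenated into the string.
EXEC sp_executesql @sql, N'@CustID INT', @CustID = 42;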
Try to avoid the use of temporary tables

  • Unless really required, try to avoid the use of temporary tables. Rather, use table variables.

  • Table variables are often faster for small result sets, since they involve less locking and logging overhead and fewer recompilations than temporary tables. (Note that, contrary to a common belief, table variables are also stored in the TempDb database, not purely in memory.)
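A minimal hypothetical sketch of a table variable used in place of a temporary table:

DECLARE @TopOrders TABLE (OrderID INT PRIMARY KEY, TotalAmount MONEY);

INSERT INTO @TopOrders (OrderID, TotalAmount)
SELECT OrderID, TotalAmount
FROM dbo.Orders
WHERE TotalAmount > 1000;

SELECT t.OrderID, t.TotalAmount FROM @TopOrders AS t;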
Instead of LIKE search, use full text search for searching textual data

Full text searches generally outperform LIKE searches, especially leading-wildcard searches (such as LIKE '%term%') that cannot use an index.
  • Full text searches will enable you to implement complex search criteria that can't be implemented using a LIKE search, such as searching on a single word or phrase (and optionally, ranking the result set), searching on a word or phrase close to another word or phrase, or searching on synonymous forms of a specific word.
  • Full text search is also easier to implement than LIKE-based search logic, especially in the case of complex search requirements.
Try to use UNION to implement an "OR" operation

  • Try not to use "OR" in a query. Instead, use "UNION" to combine the result sets of two separate queries. This can improve query performance, as sketched below.
  • Better, use UNION ALL if a distinct result is not required. UNION ALL is faster than UNION because it does not have to sort the result set to eliminate duplicate values.
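A hypothetical illustration of the rewrite (whether it actually helps depends on the available indexes, so compare the execution plans):

-- Instead of:
SELECT OrderID FROM dbo.Orders WHERE CustomerID = 1 OR EmployeeID = 5;

-- consider:
SELECT OrderID FROM dbo.Orders WHERE CustomerID = 1
UNION
SELECT OrderID FROM dbo.Orders WHERE EmployeeID = 5;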
Implement a lazy loading strategy for large objects

  • Store large object columns (such as VARCHAR(MAX), IMAGE, or TEXT) in a separate table rather than in the main table, and put a reference to the large object in the main table.
  • Retrieve all the main table data in a query, and load the large object data from the large object table only when it is actually required.
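A minimal hypothetical schema for this pattern:

-- The main table keeps only a reference to the LOB row.
CREATE TABLE dbo.ProductDescriptions
(
    DescriptionID   INT IDENTITY PRIMARY KEY,
    FullDescription NVARCHAR(MAX)
);

CREATE TABLE dbo.Products
(
    ProductID     INT IDENTITY PRIMARY KEY,
    Name          NVARCHAR(100) NOT NULL,
    DescriptionID INT NULL REFERENCES dbo.ProductDescriptions(DescriptionID)
);

-- List products cheaply; fetch a description only on demand.
SELECT ProductID, Name FROM dbo.Products;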
Use VARCHAR(MAX), VARBINARY(MAX), and NVARCHAR(MAX)

  • In SQL Server 2000, a row cannot exceed 8000 bytes in size. This limitation is due to the 8 KB internal page size of SQL Server. So to store more data in a single column, you need to use TEXT, NTEXT, or IMAGE data types (BLOBs) which are stored in a collection of 8 KB data pages.
  • These are unlike the data pages that store the other data in the same table; BLOB pages are arranged in a B-tree structure. These data types cannot be used as variables in a procedure or a function, and they cannot be used inside string functions such as REPLACE, CHARINDEX, or SUBSTRING. In most cases, you have to use READTEXT, WRITETEXT, and UPDATETEXT instead.
  • To solve this problem, use VARCHAR(MAX), NVARCHAR(MAX), and VARBINARY(MAX) in SQL Server 2005. These data types can hold the same amount of data BLOBs can hold (2 GB), and they are stored in the same type of data pages used for other data types.
  • When data in a MAX data type exceeds 8 KB, an over-flow page is used (in the ROW_OVERFLOW allocation unit), and a pointer to the page is left in the original data page in the IN_ROW allocation unit.
Implement the following good practices in User Defined Functions

  • Do not call functions repeatedly within your Stored Procedures, triggers, functions, and batches. For example, you might need the length of a string variable in many places of your procedure, but don't call the LEN function every time it is needed; instead, call the LEN function once, store the result in a variable, and reuse it (see the sketch below).
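A trivial hypothetical sketch of the idea:

DECLARE @customerName NVARCHAR(100);
SET @customerName = N'Some customer name';

-- Call LEN once and cache the result...
DECLARE @nameLength INT;
SET @nameLength = LEN(@customerName);

-- ...then reuse @nameLength wherever it is needed,
-- instead of calling LEN(@customerName) again.
IF @nameLength > 50
    PRINT 'Name is long';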
Implement the following good practices in Stored Procedures

  • Do not use "SP_XXX" as a naming convention. It causes an additional search in the master database and added I/O (because system Stored Procedure names start with "SP_"). Using "SP_XXX" as the naming convention also increases the possibility of conflicting with an existing system Stored Procedure.
  • Use "SET NOCOUNT ON" to eliminate the extra network traffic caused by row-count messages.
  • Use the WITH RECOMPILE clause in the EXECUTE statement (first time) when the index structure changes (so that the compiled version of the Stored Procedure can take advantage of the newly created indexes).
  • Use default parameter values for easy testing.
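A hypothetical procedure combining several of these practices:

CREATE PROCEDURE dbo.usp_GetOrdersByCustomer   -- no "SP_" prefix
    @CustomerID INT = NULL                     -- default value for easy testing
AS
BEGIN
    SET NOCOUNT ON;                            -- suppress row-count messages

    SELECT OrderID, OrderDate
    FROM dbo.Orders
    WHERE CustomerID = @CustomerID
       OR @CustomerID IS NULL;                 -- return all orders when testing
END;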
Implement the following good practices in Triggers

  • Try to avoid the use of triggers. Firing a trigger on top of executing the triggering event is an expensive process.
  • Never use a trigger for logic that can be implemented using constraints.
  • Do not use the same trigger for different triggering events (Insert, Update, Delete).
  • Do not use transactional code inside a trigger. The trigger always runs within the transactional scope of the code that fires the trigger.
Implement the following good practices in Views

  • Use views for reusing complex TSQL blocks, and to enable indexed views (discussed separately).
  • Use views with the SCHEMABINDING option if you do not want the underlying table schema to be modified accidentally.
  • Do not use views that retrieve data from a single table only (that adds unnecessary overhead). Use views for writing queries that access columns from multiple tables.

Implement the following good practices in Transactions

  • Prior to SQL Server 2005, the value of @@ERROR had to be checked after BEGIN TRANSACTION and after each subsequent modification statement. If its value was non-zero, the last statement had caused an error; the transaction then had to be rolled back and an error raised for the application. In SQL Server 2005 and onwards, the TRY...CATCH block can be used to handle transactions in TSQL, so prefer TRY...CATCH based transactional code (a sketch follows this list).
  • Try to avoid nested transactions. Use the @@TRANCOUNT variable to determine whether a transaction needs to be started (to avoid nested transactions).
  • Start a transaction as late as possible and commit/rollback the transaction as fast as possible to reduce the time period of resource locking.
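A minimal hypothetical sketch of TRY...CATCH based transaction handling (the Accounts table is illustrative; THROW requires SQL Server 2012 or later, so use RAISERROR on older versions):

BEGIN TRY
    BEGIN TRANSACTION;

    UPDATE dbo.Accounts SET Balance = Balance - 100 WHERE AccountID = 1;
    UPDATE dbo.Accounts SET Balance = Balance + 100 WHERE AccountID = 2;

    COMMIT TRANSACTION;
END TRY
BEGIN CATCH
    IF @@TRANCOUNT > 0
        ROLLBACK TRANSACTION;
    THROW;    -- re-raise the error to the caller
END CATCH;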

Remember, you need to apply the good practices that you know; otherwise, your knowledge will not add any value to the system you are building. You also need a process for reviewing and monitoring the code written by your team, to check whether the data access code follows the standards and best practices.




