There is no hard-and-fast rule for query tuning or server tuning. Everything depends on your business requirements (logic) and hardware capacity. If you are on Linux, your total physical RAM should ideally be larger than your database size on disk in order to minimize I/O: eventually the entire database will sit in the OS read cache, and I/O will be limited to committing changes to disk.
This blog is divided into two parts. In this part, I cover some basic ideas for improving query performance. In the second part, I will write about server parameter settings.
There are some points we have to take care of when writing any query.
1. Limit your data when a table is used (joined) many times in a SQL query (stored procedure).
For example, suppose table XYZ has millions of records and more than 50 columns, it is referenced many times in the query, and we only need 5 to 10 of its columns filtered on some conditions. It is better to create a temporary table or CTE (say XYZ_TEMP) with only those 5 to 10 columns and those conditions, and then use XYZ_TEMP everywhere in place of XYZ in the SQL query (stored procedure).
CREATE TEMP TABLE XYZ_TEMP (col1, col2, col3, ...) AS
SELECT col1, col2, col3, ...
FROM XYZ
WHERE col1 = <some condition>
  AND coln = <some condition>;
Now your data has been reduced from millions of rows (in table XYZ) to a much smaller set (in XYZ_TEMP). Use XYZ_TEMP in the rest of the SQL query (stored procedure).
2. Avoid using functions on columns in predicates.
The database cannot use an index on a column if the column is wrapped in a function. For example:
SELECT * FROM abc WHERE UPPER(COL1) = 'XYZ';
Because of the UPPER() function, the optimizer does not use the index on COL1. If the function cannot be avoided in the SQL, create a function-based (expression) index instead.
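As a minimal sketch (assuming the abc table and COL1 column above), a function-based (expression) index in PostgreSQL could look like this:
-- Expression index so that WHERE UPPER(col1) = 'XYZ' can use an index scan
CREATE INDEX idx_abc_upper_col1 ON abc (UPPER(col1));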
3. For SQL queries with the LEFT OUTER JOIN, pushing predicates of the right table from the WHERE clause into the ON condition helps the database optimizer generate a more efficient query. Predicates of the left table can stay in the WHERE clause.
For example,
Suboptimal SQL query:
SELECT A.COL1, B.COL1 FROM ABC A LEFT OUTER JOIN XYZ B ON A.COL3 = B.COL3 WHERE A.COL1=123 AND B.COL2=456;
Optimized SQL query:
SELECT A.COL1, B.COL1 FROM ABC A LEFT OUTER JOIN XYZ B ON A.COL3 = B.COL3 AND B.COL2=456 WHERE A.COL1=123;
4. Whenever possible, duplicate constant-value conditions across the joined tables.
Suppose two tables, X and Y, are joined and there is a constant-value condition on one of the join columns, for example X.id = Y.id AND X.id IN (10, 20). The constant condition should be duplicated for the join column of the second table as well: X.id = Y.id AND X.id IN (10, 20) AND Y.id IN (10, 20).
For example, table XYZ has a LEFT OUTER JOIN relationship with table ABC. If there is an XYZ-specific condition and a cross-table condition with ABC, create an extra ABC-specific condition based on the XYZ condition and keep the cross-table condition in the ON clause:
Suboptimal SQL query:
SELECT XYZ.COL1, ABC.COL1 FROM XYZ LEFT OUTER JOIN ABC ON XYZ.COLn = ABC.COLn WHERE XYZ.COL1 IN (1, 2) AND ABC.COL2=XYZ.COL1;
Optimized SQL query:
SELECT XYZ.COL1, ABC.COL1 FROM XYZ LEFT OUTER JOIN ABC ON XYZ.COLn = ABC.COLn AND ABC.COL2 IN (1, 2) AND ABC.COL2=XYZ.COL1 WHERE XYZ.COL1 IN (1, 2);
In particular, if the constant predicate has only one value (e.g. COL1 = 123), the second predicate should be converted to a constant predicate too.
For example:
Suboptimal SQL query:
SELECT XYZ.COL1, ABC.COL1 FROM XYZ LEFT OUTER JOIN ABC ON XYZ.COLn = ABC.COLn WHERE XYZ.COL1=123 AND ABC.COL2=XYZ.COL1;
Optimized SQL query:
SELECT XYZ.COL1, ABC.COL1 FROM XYZ LEFT OUTER JOIN ABC ON XYZ.COLn = ABC.COLn AND ABC.COL2=123 WHERE XYZ.COL1=123;
5. Avoid using a wildcard (%) at the beginning of a predicate.
A predicate such as LIKE '%xyz' causes a full table scan. For example:
SELECT * FROM abc WHERE COL1 LIKE '%xyz';
This is a known performance limitation in all databases.
6. Use an inner join instead of an outer join whenever possible.
An outer join should only be used when it is necessary; it limits the database's optimization options, which typically results in slower SQL execution.
7. Avoid using DISTINCT in SQL queries; it should be used only when necessary. DISTINCT causes sorting, which slows down SQL execution.
8. Use UNION ALL instead of UNION, if possible, as it is much more efficient.
9. When either DISTINCT or a GROUP BY clause would give the same result, prefer the GROUP BY clause.
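As a small illustration (using the abc table defined in point 24 below), both forms return the same set of values; the GROUP BY form is the one this point recommends:
-- Instead of: SELECT DISTINCT comp_code FROM abc;
SELECT comp_code FROM abc GROUP BY comp_code;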
10. Use common table expressions (CTEs). Sometimes rewriting a query with a CTE improves performance. A CTE works much like a temporary table.
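A minimal sketch of the idea, reusing the XYZ and ABC example tables from points 1 and 3 (the filter value 10 is illustrative):
-- The CTE filters XYZ once; the main query then works on the smaller row set
WITH xyz_filtered AS (
    SELECT col1, col2, col3
    FROM XYZ
    WHERE col1 = 10
)
SELECT f.col1, f.col2, a.col1
FROM xyz_filtered f
JOIN ABC a ON a.col3 = f.col3;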
11. Use the UNLOGGED keyword when creating a table, if data consistency after a server crash is not an issue or the table is disposable and needs an extra boost for writing. As the PostgreSQL documentation explains, data written to unlogged tables is not written to the write-ahead log, which makes them considerably faster than ordinary tables. However, they are not crash-safe: an unlogged table is automatically truncated after a crash or unclean shutdown. The contents of an unlogged table are also not replicated to standby servers. Any indexes created on an unlogged table are automatically unlogged as well.
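For example, a disposable staging table (the name and columns are illustrative) could be created as unlogged:
-- Skips the write-ahead log: faster writes, but the table is truncated after a crash
-- and not replicated to standby servers
CREATE UNLOGGED TABLE staging_abc (
    srno    serial,
    payload text
);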
12. Create indexes to avoid sequential scans on larger tables. If the table is small, an index can sometimes slow the query down, and if a query fetches a large proportion of a table's rows, an index may not help either. You can improve retrieval further by making proper use of index-only scans and by matching the index to your ORDER BY when creating it. First check with the explain plan (or by pressing F7 / Shift+F7 in your SQL client) whether the index is actually used by the query. Indexes speed up data retrieval, but they slow down DML and consume resources.
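As a rough sketch (using the abc table defined in point 24 below), you might create an index and then confirm with EXPLAIN that the planner actually uses it:
-- Index to support lookups on comp_code
CREATE INDEX idx_abc_comp_code ON abc (comp_code);
-- Check whether the planner chooses the index
EXPLAIN ANALYZE SELECT srno, comp_code FROM abc WHERE comp_code = 10;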
13. In place of the BETWEEN operator, use >= and <= so that the condition participates in your index search.
e.g. instead of Location_code BETWEEN 1 AND 2, use Location_code >= 1 AND Location_code <= 2.
14. Avoid writing scalar subqueries. For subqueries in the SELECT list, it is generally better to use a join instead; in most cases a join performs better than a scalar subquery.
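As an illustration with hypothetical emp and dept tables, the join rewrite below is the form this point recommends over the scalar subquery in the SELECT list:
-- Scalar subquery: evaluated for every row of emp
SELECT e.emp_id,
       (SELECT d.dept_name FROM dept d WHERE d.dept_id = e.dept_id) AS dept_name
FROM emp e;
-- Join rewrite: usually cheaper
SELECT e.emp_id, d.dept_name
FROM emp e
LEFT JOIN dept d ON d.dept_id = e.dept_id;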
15. Avoid using cursors; they slow down your query. Try to rewrite the query so a cursor is not needed, for example by joining the table used in the cursor in the FROM clause if possible.
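A rough sketch of the idea, using the abc table defined in point 24 below: instead of looping over rows with a cursor inside a function, do the work in one set-based statement.
-- Row-by-row cursor-style loop inside a PL/pgSQL function (slower):
--   FOR r IN SELECT srno FROM abc WHERE active_yn_int = 0 LOOP
--       UPDATE abc SET div_code = 99 WHERE srno = r.srno;
--   END LOOP;
-- Set-based rewrite (preferred):
UPDATE abc SET div_code = 99 WHERE active_yn_int = 0;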
16. If the table is small, join it in the FROM clause instead of using IN or EXISTS with a subquery. If the number of values compared with the IN operator is small, use IN in place of EXISTS. But always check your explain plan first.
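For comparison, with the hypothetical emp and dept tables from point 14 (and an assumed dept.active flag), the three forms look like this; the explain plan decides which one wins:
-- IN with a subquery
SELECT e.emp_id FROM emp e
WHERE e.dept_id IN (SELECT d.dept_id FROM dept d WHERE d.active = 1);
-- EXISTS
SELECT e.emp_id FROM emp e
WHERE EXISTS (SELECT 1 FROM dept d WHERE d.dept_id = e.dept_id AND d.active = 1);
-- JOIN (when dept is small and dept_id is unique)
SELECT e.emp_id FROM emp e
JOIN dept d ON d.dept_id = e.dept_id
WHERE d.active = 1;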
17. Partition large tables. It will improve performance.
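A minimal sketch of declarative range partitioning in PostgreSQL 10+ (all names here are illustrative):
-- Parent table partitioned by range on a date column
CREATE TABLE sales (
    sale_id   bigint,
    sale_date date NOT NULL,
    amount    numeric
) PARTITION BY RANGE (sale_date);
-- One partition per year; queries filtered on sale_date only touch the matching partition
CREATE TABLE sales_2023 PARTITION OF sales
    FOR VALUES FROM ('2023-01-01') TO ('2024-01-01');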
18. Create tablespaces as per your requirements and assign them to the relevant schemas or objects.
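For example (the path and names are illustrative; this requires a superuser, and the directory must already exist and be owned by the postgres OS user), a tablespace on a faster disk can be created and used for a table:
-- Tablespace on a separate (e.g. SSD) volume
CREATE TABLESPACE fast_ts LOCATION '/mnt/ssd1/pgdata';
CREATE TABLE big_table (id bigint) TABLESPACE fast_ts;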
19. If they are not required, don't use ORDER BY, GROUP BY, JOIN, HAVING, or UNION. If you do need them, use them in the outermost query.
20. Try to use arrays, JSON, and analytical (window) functions in your query where they fit.
21. Avoid the ORDER BY clause when you do not need it. ORDER BY sorts the result set by the specified columns; without it, the result set is returned directly without any sorting and the order is not guaranteed. Be aware of the performance impact of adding ORDER BY, as the database has to sort the result set, which is one of the most expensive operations in SQL execution.
22. On a regular basis, ANALYZE, VACUUM, and REINDEX your tables and indexes.
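For example (the table and index names here are illustrative):
-- Reclaim dead rows and refresh planner statistics for one table
VACUUM ANALYZE abc;
-- Rebuild a bloated index
REINDEX INDEX idx_abc_comp_code;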
23. Don't add unnecessary RAISE NOTICE statements and don't define unnecessary variables in your functions (stored procedures).
24. If not required, never use SELECT * FROM <table name>.
It takes more time to get the output.
In the example below I create a table abc:
create table abc (srno serial, comp_code smallint, div_code smallint, active_yn varchar, active_yn_int smallint);
and insert around 50k dummy records.
Running select * from abc; takes noticeably longer than running select comp_code, div_code from abc; which returns its output much faster.
On real clusters, a table hit with select * from table may have billions of records, and the query will run for a very long time.
select column from table is a projection query: the query processor still has to read all the rows in the table, but it only extracts and returns the requested column value from each row.
If you always need only a few columns in the result, just use SELECT col1, col2 FROM YourTable; SELECT * FROM Table adds useless overhead.
If someone later adds image/BLOB/text columns to the table, SELECT * will make performance even worse.
Also, if a stored procedure uses INSERT INTO newTable SELECT * FROM TABLE, it runs fine today, but as soon as someone adds more columns to the source table the procedure will fail because the provided columns no longer match.
25. For comparison columns, use a numeric datatype like 1 or 0 in place of a character datatype like 'Y' or 'N'. In the abc table above, instead of comparing against 'Y'/'N' in column active_yn, I compare against 1/0 in column active_yn_int.
e.g.
select * from abc where active_yn = 'N'
The explain plan looks like this:
"Seq Scan on abc (cost=0.00..885.49 rows=450 width=12) (actual time=11.773..11.920 rows=423 loops=1)"
" Filter: ((active_yn)::text = 'N'::text)"
" Rows Removed by Filter: 48896"
"Planning time: 0.062 ms"
"Execution time: 11.969 ms"
select * from abc where active_yn_int = 0
The explain plan looks like this:
"Seq Scan on abc (cost=0.00..885.49 rows=450 width=12) (actual time=9.295..9.422 rows=423 loops=1)"
" Filter: (active_yn_int = 0)"
" Rows Removed by Filter: 48896"
"Planning time: 0.058 ms"
"Execution time: 9.463 ms"
26. Use the LIMIT clause.
When we run select * from abc;
and check the explain plan, the output looks like this:
"Seq Scan on abc (cost=0.00..760.19 rows=49319 width=12) (actual time=0.013..7.817 rows=49319 loops=1)"
"Planning time: 0.025 ms"
"Execution time: 10.415 ms"
Now we run the same query with a LIMIT clause:
select * from abc limit 10;
The explain plan output now looks like this:
"Limit (cost=0.00..0.15 rows=10 width=12) (actual time=0.026..0.027 rows=10 loops=1)"
" -> Seq Scan on abc (cost=0.00..760.19 rows=49319 width=12) (actual time=0.025..0.025 rows=10 loops=1)"
"Planning time: 0.124 ms"
"Execution time: 0.047 ms"
So, if possible, limit your SELECT query with a LIMIT clause.