Introduction: This article introduces index seeks and index scans and how they affect SQL query efficiency. I hope it serves as a useful reference.
Index seek vs. index scan: improving SQL efficiency
Index seek and index scan, explained:
An index is a B-tree. An index seek starts at the root node of the B-tree and walks down level by level to the target row. An index scan traverses the entire leaf level of the B-tree from left to right.
Suppose the only target row sits on the rightmost leaf page of the index tree (say a nonclustered index whose tree depth is 2 and whose leaf level occupies k pages of physical storage).
The index seek touches only a handful of pages, roughly the tree depth plus the bookmark lookup, while the index scan reads all k leaf pages. The performance difference is huge.
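You can observe the two access paths directly. The sketch below assumes a Users table with a nonclustered index IX_Users_UserName on UserName (both names are hypothetical); with SET STATISTICS IO ON, the seek reports far fewer logical reads than the scan:

```sql
set statistics io on;  -- report logical reads so the two paths can be compared

-- Index seek: the equality predicate matches the index key
select UserName from Users where UserName = N'Zhang San';

-- Index scan: every leaf page of the index is read
select UserName from Users order by UserName;
```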
For more on indexes, read the "Physical Database Architecture" section of the SQL Server Books Online carefully.
Do not include operations in the query conditions
These operations include string concatenation (e.g.: select * from Users where UserName + ' pig' = 'Zhang San pig'), LIKE patterns with a leading wildcard (e.g.: select * from tb1 where col4 like '%aa'), and calls to user-defined functions, built-in functions, scalar functions, and so on (e.g.: select * from UserLog where datepart(dd, LogTime) = 3).
When SQL Server processes such statements, it cannot seek into an index on the wrapped column and has no good way to estimate the cost. The result, naturally, is a clustered index scan or a table scan.
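A common fix is to rewrite the predicate so the indexed column stands alone on one side of the comparison. A sketch, reusing the tables from the examples above (the concrete date range is a hypothetical illustration, and it narrows the query to a single day rather than the 3rd of every month):

```sql
-- Non-sargable: the function hides LogTime from the index
select * from UserLog where datepart(dd, LogTime) = 3;

-- Sargable rewrite for one concrete day, e.g. 2024-06-03:
select * from UserLog
where LogTime >= '2024-06-03' and LogTime < '2024-06-04';

-- Non-sargable concatenation on the column...
select * from Users where UserName + ' pig' = 'Zhang San pig';

-- ...versus a direct predicate the index can seek on:
select * from Users where UserName = 'Zhang San';
```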
Do not include operations between different columns in the same table in the query conditions
The "operations" here include addition, subtraction, multiplication, division, and functions (e.g.: select * from tb where col1 - col2 = 1997), as well as comparisons between columns (e.g.: select * from tb where col1 > col2). Here, too, SQL Server has no way to estimate the cost, regardless of whether col1 and col2 are indexed separately, covered by a composite index, or indexed as col1 INCLUDE (col2).
But this kind of query does have a solution: add a computed column to the table whose value is the result of your "operation", then create an index on that column, and you are done.
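A sketch of that fix, assuming the tb(col1, col2) table from the example above (the column and index names are hypothetical):

```sql
-- Persist the result of the expression as a computed column
alter table tb add col_diff as (col1 - col2) persisted;

-- Index the computed column
create index IX_tb_col_diff on tb (col_diff);

-- This predicate can now use an index seek
select * from tb where col_diff = 1997;
```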
The ratio (rows returned / total rows in the table) is called selectivity: the smaller that ratio, the more selective the predicate, and the more an index seek pays off.
You get the idea: the whole point of this article is selectivity.
Statistics, put bluntly, record how many rows result when a given column in the table takes a given value. Statistics are a measure of selectivity, and SQL Server uses them to estimate the relative costs of different query plans.
Suppose the table has 10,000 rows and all 10,000 were sampled: only 1 row has a provider_no value of 21, while 4,824 rows have a value of 500.
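You can inspect these numbers yourself with DBCC SHOW_STATISTICS. The sketch below assumes a statistics object on the charge table's provider_no column, here an index with the hypothetical name IX_charge_provider_no:

```sql
-- Show the header, density vector, and histogram for the statistics,
-- including the estimated row counts for individual column values
dbcc show_statistics ('charge', IX_charge_provider_no);
```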
We know that SQL Server caches query plans. Suppose there is a stored procedure like this:
create proc myproc
(
@pno int
)
as
select * from charge where provider_no = @pno
The first time we pass in 21, fine: SQL Server caches an execution plan for the procedure built around a nonclustered index seek. Later we pass in 500, and now there is trouble: the server finds the cached myproc plan and reuses the seek, performing a bookmark lookup for each of the 4,824 matching rows. Your colleagues see your query burning an enormous amount of I/O, and you get the blame.
The lesson: if a query's selectivity varies drastically with its parameters, tell SQL Server not to reuse the cached plan but to re-optimize and compile it each time. The implementation is simple: append option (recompile) to the end of the query. SQL Server 2005 also has a great feature, statement-level recompilation, which can recompile just part of a stored procedure each time (you can of course still recompile the entire stored procedure, depending on your needs; see Books Online for details).
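Applied to the article's myproc, the fix looks like this sketch:

```sql
create proc myproc
(
    @pno int
)
as
select * from charge
where provider_no = @pno
option (recompile)  -- build a fresh plan for each parameter value
```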