Notes for big data query optimization in MySQL: 1. To optimize queries, avoid full table scans; 2. Avoid testing fields for NULL in the WHERE clause; 3. Use IN and NOT IN with caution; 4. Avoid joining conditions with OR in the WHERE clause; 5. Avoid cursors.
Methods for big data query optimization in MySQL:
1. To optimize a query, avoid full table scans whenever possible; first consider building indexes on the columns involved in WHERE and ORDER BY.
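To see the effect concretely, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL (the table `t`, column `num`, and index name `idx_num` are invented for illustration; MySQL's EXPLAIN output looks different, but the scan-versus-seek distinction is the same):

```python
import sqlite3

# Hypothetical table "t" with a "num" column, as in the notes above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, num INTEGER)")
conn.executemany("INSERT INTO t(num) VALUES (?)",
                 [(i % 100,) for i in range(1000)])

def plan(sql):
    # EXPLAIN QUERY PLAN rows are (id, parent, notused, detail);
    # the detail string says whether the engine scans or uses an index.
    return conn.execute("EXPLAIN QUERY PLAN " + sql).fetchone()[3]

# Without an index on num, the query must scan the whole table.
before = plan("SELECT id FROM t WHERE num = 10")

# After indexing the column used in WHERE, the engine can seek instead.
conn.execute("CREATE INDEX idx_num ON t(num)")
after = plan("SELECT id FROM t WHERE num = 10")

print(before)
print(after)
```

Running this shows the plan change from a scan of `t` to a search using `idx_num`.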
2. Avoid testing a field for NULL in the WHERE clause, otherwise the engine may abandon the index and perform a full table scan, such as: select id from t where num is null. Instead, set the default value of num to 0, make sure the num column contains no NULL values, and query like this:
select id from t where num=0
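A minimal sqlite3 sketch of this rewrite, using the hypothetical `t`/`num` names from the note (be aware this rule is dated: recent MySQL versions can use an index for IS NULL, so measure before restructuring data):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# num gets a DEFAULT of 0 so newly inserted rows never store NULL.
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, num INTEGER DEFAULT 0)")
conn.executemany("INSERT INTO t(num) VALUES (?)",
                 [(None,), (5,), (None,), (7,)])

# Backfill existing NULLs so the column is guaranteed NULL-free.
conn.execute("UPDATE t SET num = 0 WHERE num IS NULL")

# The IS NULL test can now be replaced by a plain equality predicate.
rows = conn.execute("SELECT id FROM t WHERE num = 0").fetchall()
print(rows)
```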
3. Avoid using the != or <> operator in the WHERE clause, otherwise the engine will abandon the index and perform a full table scan.
4. Avoid using OR to join conditions in the WHERE clause, otherwise the engine will abandon the index and perform a full table scan. For example, select id from t where num=10 or num=20 can be rewritten as:
select id from t where num=10 union all select id from t where num=20
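The two forms can be checked for equivalence with sqlite3 (names are the hypothetical ones from the example; whether OR actually defeats the index depends on the engine and version, so verify with EXPLAIN on your own system):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, num INTEGER)")
conn.execute("CREATE INDEX idx_num ON t(num)")
conn.executemany("INSERT INTO t(num) VALUES (?)", [(10,), (20,), (30,), (10,)])

# Original form: two conditions joined with OR.
or_rows = conn.execute(
    "SELECT id FROM t WHERE num = 10 OR num = 20 ORDER BY id").fetchall()

# Rewritten form: one index-friendly query per value, concatenated.
union_rows = conn.execute(
    "SELECT id FROM t WHERE num = 10 "
    "UNION ALL "
    "SELECT id FROM t WHERE num = 20 "
    "ORDER BY id").fetchall()

print(or_rows, union_rows)
```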
5. IN and NOT IN should also be used with caution, since they can lead to a full table scan, such as: select id from t where num in(1,2,3). For a continuous range of values, use BETWEEN instead of IN:
select id from t where num between 1 and 3
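An equivalence check for this substitution with sqlite3 (BETWEEN is only a valid replacement when the IN-list is a contiguous range):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, num INTEGER)")
conn.executemany("INSERT INTO t(num) VALUES (?)",
                 [(1,), (2,), (3,), (4,), (9,)])

in_rows = conn.execute(
    "SELECT id FROM t WHERE num IN (1, 2, 3) ORDER BY id").fetchall()
# BETWEEN expresses the same contiguous set as a single closed interval.
between_rows = conn.execute(
    "SELECT id FROM t WHERE num BETWEEN 1 AND 3 ORDER BY id").fetchall()

print(in_rows, between_rows)
```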
6. The following query will also result in a full table scan: select id from t where name like '%Li%'. To improve efficiency, consider full-text search.
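MySQL's full-text search uses a FULLTEXT index with MATCH ... AGAINST; the sketch below uses SQLite's FTS5 module as a rough analog (the `docs` table and sample names are invented). Note that token-based full-text search finds whole words rather than arbitrary substrings, so it is not a drop-in replacement for LIKE '%Li%':

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# FTS5 virtual table: an inverted index over the "name" column.
conn.execute("CREATE VIRTUAL TABLE docs USING fts5(name)")
conn.executemany("INSERT INTO docs(name) VALUES (?)",
                 [("Li Ming",), ("Zhang Wei",), ("Wang Li",)])

# MATCH consults the full-text index instead of scanning every row.
rows = conn.execute("SELECT name FROM docs WHERE docs MATCH 'Li'").fetchall()
print(rows)
```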
7. Using a parameter in the WHERE clause can also cause a full table scan. Because SQL resolves local variables only at runtime, the optimizer cannot defer the choice of an access plan until runtime; it must choose at compile time, when the variable's value is still unknown and therefore cannot be used as an input for index selection. For example, the following statement will perform a full table scan: select id from t where num=@num
It can be changed to force the query to use the index:
select id from t with(index(index name)) where num=@num
8. Avoid performing expression operations on fields in the WHERE clause, which will cause the engine to abandon the index and perform a full table scan. For example: select id from t where num/2=100
should be changed to: select id from t where num=100*2
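The effect of wrapping an indexed column in an expression shows up directly in the query plan; here is a sqlite3 sketch (table, column, and index names are hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, num INTEGER)")
conn.execute("CREATE INDEX idx_num ON t(num)")
conn.executemany("INSERT INTO t(num) VALUES (?)", [(i,) for i in range(500)])

def plan(sql):
    # Return the plan detail string for a statement.
    return conn.execute("EXPLAIN QUERY PLAN " + sql).fetchone()[3]

# The expression num/2 hides the column from the index: full scan.
slow = plan("SELECT id FROM t WHERE num / 2 = 100")
# Moving the arithmetic to the constant side leaves num bare: index seek.
fast = plan("SELECT id FROM t WHERE num = 100 * 2")

print(slow)
print(fast)
```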
9. Avoid applying functions to fields in the WHERE clause, which will cause the engine to abandon the index and perform a full table scan. For example, select id from t where substring(name,1,3)='abc' (find the ids whose name starts with 'abc') should be changed to:
select id from t where name like 'abc%'
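A quick equivalence check for the substring-to-prefix rewrite (sqlite3 uses substr() where MySQL has substring(); whether LIKE 'abc%' can actually use an index depends on collation settings in both engines, so the point here is only that the two predicates select the same rows):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO t(name) VALUES (?)",
                 [("abcdef",), ("abcx",), ("xyz",)])

# Function on the column: the engine cannot use a plain index on name.
fn_rows = conn.execute(
    "SELECT id FROM t WHERE substr(name, 1, 3) = 'abc' ORDER BY id").fetchall()
# Prefix LIKE: same rows, and index-friendly on engines that support it.
like_rows = conn.execute(
    "SELECT id FROM t WHERE name LIKE 'abc%' ORDER BY id").fetchall()

print(fn_rows, like_rows)
```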
10. Do not perform functions, arithmetic operations, or other expressions on the left side of "=" in the WHERE clause, otherwise the system may not be able to use the index correctly.
11. When using an indexed field as a condition, if the index is a composite index, the first field of the index must appear in the condition for the index to be used; otherwise the index will not be used. The field order should also match the index order as closely as possible.
12. Do not write meaningless queries. If you need to generate an empty table structure:
select col1,col2 into #t from t where 1=0
This kind of code returns no result set but still consumes system resources; it should be changed to:
create table #t( ...)
13. It is often a good choice to use EXISTS instead of IN. For example:
select num from a where num in(select num from b)
can be replaced with:
select num from a where exists(select 1 from b where num=a.num)
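An equivalence sketch for IN versus EXISTS with sqlite3 (tables a and b are the ones from the example; the subquery column is qualified as b.num to avoid ambiguity). Note that the two forms can differ when NULLs are involved, particularly NOT IN versus NOT EXISTS:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (num INTEGER)")
conn.execute("CREATE TABLE b (num INTEGER)")
conn.executemany("INSERT INTO a(num) VALUES (?)", [(1,), (2,), (3,), (4,)])
conn.executemany("INSERT INTO b(num) VALUES (?)", [(2,), (4,), (6,)])

in_rows = conn.execute(
    "SELECT num FROM a WHERE num IN (SELECT num FROM b) ORDER BY num").fetchall()
# EXISTS form: a correlated subquery that stops at the first match.
exists_rows = conn.execute(
    "SELECT num FROM a WHERE EXISTS "
    "(SELECT 1 FROM b WHERE b.num = a.num) ORDER BY num").fetchall()

print(in_rows, exists_rows)
```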
14. Not all indexes are effective for every query. SQL optimizes based on the data in the table; when an index column contains a large number of duplicate values, the query may not use the index at all. For example, if a table has a sex field that is roughly half male and half female, building an index on sex will not improve query efficiency.
15. More indexes are not always better. Indexes improve the efficiency of SELECTs, but they reduce the efficiency of INSERT and UPDATE, since the index may need to be rebuilt on each insert or update. How to build indexes therefore needs careful consideration, depending on the specific situation. It is best not to have more than 6 indexes on one table; if there are more, consider whether indexes on rarely used columns are really necessary.
16. Avoid updating clustered index columns as much as possible, because the order of the clustered index columns is the physical storage order of the table's records; once a value in such a column changes, the order of the entire table's records may be adjusted, which can consume considerable resources. If the application needs to update clustered index columns frequently, reconsider whether the index should be built as a clustered index at all.
17. Use numeric fields where possible. A field that holds only numeric information should not be designed as a character type; that reduces query and join performance and increases storage overhead. The engine compares strings character by character during queries and joins, whereas a numeric type needs only a single comparison.
18. Use varchar/nvarchar instead of char/nchar as much as possible. First, variable-length fields take less storage space; second, searching within a smaller field is clearly more efficient.
19. Do not use select * from t anywhere; replace "*" with a specific field list, and do not return fields that are not used.
20. Prefer table variables to temporary tables. If a table variable contains a large amount of data, be aware that its indexes are very limited (only a primary key index).
21. Avoid frequent creation and deletion of temporary tables to reduce the consumption of system table resources.
22. Temporary tables are not forbidden; using them appropriately can make certain routines more efficient, for example when you need to repeatedly reference a large table or a data set from a commonly used table. For one-off operations, however, an export table is better.
23. When creating a new temporary table, if a large amount of data is inserted at once, use select into instead of create table to avoid generating a large amount of log and improve speed; if the amount of data is small, then to ease pressure on the system tables, create table first and then insert.
24. If temporary tables are used, explicitly delete all of them at the end of the stored procedure: first truncate table, then drop table. This avoids long-term locking of system tables.
25. Avoid cursors where possible, because cursor efficiency is poor. If a cursor operates on more than 10,000 rows, consider rewriting.
26. Before resorting to a cursor-based or temporary-table method, look for a set-based solution to the problem first; the set-based approach is usually more effective.
27. Like temporary tables, cursors are not forbidden. Using a FAST_FORWARD cursor on a small data set is often better than other row-by-row processing methods, especially when several tables must be referenced to obtain the desired data. Routines that compute totals in the result set are usually faster than cursor-based versions. If development time permits, try both the cursor-based and the set-based method and see which works better.
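The cursor-versus-set-based contrast, sketched in Python over sqlite3: the loop mimics a cursor touching one row at a time, while a single UPDATE does the same work in one set-based statement (table and column names are invented; both produce the same final state):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
for name in ("t_cursor", "t_set"):
    conn.execute(f"CREATE TABLE {name} (id INTEGER PRIMARY KEY, num INTEGER)")
    conn.executemany(f"INSERT INTO {name}(num) VALUES (?)",
                     [(-3,), (7,), (-1,), (9,)])

# Cursor-style: fetch each qualifying row, then update it individually.
for (row_id,) in conn.execute(
        "SELECT id FROM t_cursor WHERE num < 0").fetchall():
    conn.execute("UPDATE t_cursor SET num = 0 WHERE id = ?", (row_id,))

# Set-based: one statement; the engine handles all matching rows at once.
conn.execute("UPDATE t_set SET num = 0 WHERE num < 0")

print(conn.execute("SELECT num FROM t_cursor ORDER BY id").fetchall())
print(conn.execute("SELECT num FROM t_set ORDER BY id").fetchall())
```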
28. Add SET NOCOUNT ON at the beginning and SET NOCOUNT OFF at the end of all stored procedures and triggers; there is no need to send a DONE_IN_PROC message to the client after each statement of a stored procedure or trigger.
29. Avoid large transaction operations to improve system concurrency.
30. Avoid returning large amounts of data to the client; if the data volume is too large, consider whether the requirement itself is reasonable.