
Use nginx to limit frequent crawling by web crawlers

Spider crawl volume had increased sharply, driving up the server load. In the end, nginx's ngx_http_limit_req_module was used to limit how often Baidu Spider crawls: Baidu Spider is allowed about 200 requests per minute, and excess requests receive a 503 response.
nginx configuration:
# Global configuration (http block)
limit_req_zone $anti_spider zone=anti_spider:60m rate=200r/m;
# Inside a server block
limit_req zone=anti_spider burst=5 nodelay;
if ($http_user_agent ~* "baiduspider") {
    set $anti_spider $http_user_agent;
}

Parameter description:
The rate=200r/m in the limit_req_zone directive caps the average rate at 200 requests per minute for each key. nginx applies the rate at millisecond granularity, so this works out to roughly one request every 300 ms.
The burst=5 in the limit_req directive is not a concurrency limit. It allows up to 5 requests in excess of the rate to be accepted before nginx starts rejecting.
The nodelay in the limit_req directive means that requests within the burst allowance are served immediately instead of being delayed to fit the rate. Requests beyond the burst are rejected with 503, the default status.
The if block checks whether the User-Agent belongs to Baidu Spider. If so, it assigns the User-Agent string to the variable $anti_spider; otherwise the variable stays empty. Requests with an empty key are not counted, so only Baidu Spider is limited.
For detailed parameter descriptions, you can view the official documentation.
http://nginx.org/en/docs/http/ngx_http_limit_req_module.html#limit_req_zone
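
To make the key selection concrete, here is a minimal Python sketch of what the if/set pair does. It is an illustration only, not nginx code: the function name anti_spider_key and the sample User-Agent strings are assumptions made for this example.

```python
import re

# Case-insensitive match, mirroring the ~* operator in the nginx "if".
BAIDU_RE = re.compile(r"baiduspider", re.IGNORECASE)


def anti_spider_key(user_agent):
    """Return the rate-limit key for a request (hypothetical helper).

    Baidu Spider requests are keyed by their full User-Agent string;
    everything else gets an empty key, which the limiter does not count.
    """
    if user_agent and BAIDU_RE.search(user_agent):
        return user_agent
    return ""


# Sample User-Agent strings, made up for illustration.
samples = [
    "Mozilla/5.0 (compatible; Baiduspider/2.0)",
    "Mozilla/5.0 (Windows NT 10.0) Chrome/120.0",
    "",
]
keys = [anti_spider_key(ua) for ua in samples]
print(keys)
```

Because the key is the whole User-Agent string, spider variants that send different User-Agent strings each get their own 200r/m allowance.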

This module uses a leaky bucket algorithm to limit requests.
For the leaky bucket algorithm, see http://baike.baidu.com/view/2054741.htm
For the relevant code, see the nginx source file src/http/modules/ngx_http_limit_req_module.c
The core of the logic is the ngx_http_limit_req_lookup function.
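
As a rough illustration of the leaky bucket idea with a burst allowance and nodelay behavior, here is a simplified Python sketch. It is not a port of ngx_http_limit_req_lookup: the class name LeakyBucket, the allow method, and the timestamps are assumptions for this example, and the counter uses integer thousandths of a request to avoid floating-point drift.

```python
class LeakyBucket:
    """Simplified per-key leaky bucket (illustrative, not nginx code)."""

    def __init__(self, rate_per_minute, burst):
        # Thousandths of a request drained per second.
        self.rate = rate_per_minute * 1000 // 60
        # Largest tolerated excess, in thousandths of a request.
        self.burst = burst * 1000
        self.state = {}  # key -> (excess, last_seen_ms)

    def allow(self, key, now_ms):
        """Return True to serve the request, False to reject it (503)."""
        if not key:
            return True  # empty key: the request is not counted
        if key not in self.state:
            self.state[key] = (0, now_ms)
            return True
        excess, last_ms = self.state[key]
        elapsed = max(now_ms - last_ms, 0)
        # Drain the bucket for the elapsed time, then add this request.
        excess = max(excess - self.rate * elapsed // 1000 + 1000, 0)
        if excess > self.burst:
            return False  # over the burst: reject, state unchanged
        self.state[key] = (excess, now_ms)
        return True


bucket = LeakyBucket(rate_per_minute=200, burst=5)
# Eight requests arriving at the same instant.
results = [bucket.allow("baiduspider", 0) for _ in range(8)]
print(results)
# One more request 400 ms later, after the bucket has drained a little.
later = bucket.allow("baiduspider", 400)
print(later)
```

In this sketch the first request plus five burst requests pass at once, further simultaneous requests are rejected, and a new request is accepted again once enough time has passed for the bucket to drain.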

This article is from the internet and does not represent the position of 1024programmer. Please indicate the source when reprinting: https://www.1024programmer.com/use-nginx-to-limit-frequent-crawling-by-web-crawlers/

author: admin
