web crawler - Scrapy: Stop crawling a domain and hop to the next if a condition is met
I'd like to write a BFO broad crawler that does the following:
- Begin with the first URL
- Try to find links to the Impressum (regex: '.*mpressum.*'; translation: imprint)
- Check whether a condition is met, in my case whether the postal code is in a given range
- If the condition is met, continue crawling the page
- If the condition is not met, stop crawling the domain and blacklist it for future crawls
- Continue with the next domain
How can I implement this behavior in Scrapy?
Basically, I'm doing this because I want to answer the following question:
Which domains in Germany are in a given postal code range?
My code is a mess; I'm still learning Scrapy at the moment.
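You can use the allowed_domains variable in your scraper. When your stop condition is met (here, the postal code is out of range), remove the domain from allowed_domains. This will not cancel downloads that are already queued, but I believe it will not let you queue new ones.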
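Since I'm not certain that changes to allowed_domains made while the spider is running are always picked up, here is a minimal sketch of an alternative that keeps its own set of blocked domains. It uses only the standard library and makes no Scrapy calls. The names DomainGate, check_impressum and is_blocked are hypothetical helpers, and the five-digit postal code pattern is a simplifying assumption, not something from the question.

```python
import re
from urllib.parse import urlparse

# Matches "Impressum" / "impressum" anywhere in a link, as in the question.
IMPRESSUM_RE = re.compile(r".*mpressum.*")

# Assumption: a postal code is any standalone run of exactly five digits.
POSTAL_CODE_RE = re.compile(r"\b(\d{5})\b")


class DomainGate:
    """Hypothetical helper that remembers domains failing the postal code check."""

    def __init__(self, low, high):
        self.low = low        # lowest accepted postal code (inclusive)
        self.high = high      # highest accepted postal code (inclusive)
        self.blocked = set()  # domains to skip from now on

    @staticmethod
    def domain_of(url):
        # Use the network location (host, plus port if present) as the domain key.
        return urlparse(url).netloc.lower()

    def is_blocked(self, url):
        return self.domain_of(url) in self.blocked

    def check_impressum(self, url, text):
        """Return True if any postal code in text is in range.

        If none is in range, the domain of url is added to the blocked set.
        """
        codes = [int(code) for code in POSTAL_CODE_RE.findall(text)]
        in_range = any(self.low <= code <= self.high for code in codes)
        if not in_range:
            self.blocked.add(self.domain_of(url))
        return in_range
```

In a spider callback you could call check_impressum with the Impressum page's URL and text, and test is_blocked(url) before yielding any new request for that domain. Requests that are already queued would still need the same is_blocked check at the top of the callback. Saving the blocked set to disk between runs would give you the blacklist for future crawls.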