web crawler - Scrapy: Stop crawling a domain and hop to the next if a condition is met


I'd like to write a BFO broad crawler that does the following:

  • Begin with the first URL.
  • Try to find links to the Impressum using the regex '.*mpressum.*' (translation: imprint).
  • Check whether a condition is met; in my case, whether the postal code is in a given range.
  • If the condition is met, continue crawling the site.
  • If the condition is not met, stop crawling the domain and blacklist it for future crawls.
  • Continue with the next domain.
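
The imprint lookup and the postal code check from the steps above can be sketched in plain Python, independent of Scrapy. This is only a minimal sketch: the function names are hypothetical, and it assumes that a German postal code appears in the imprint text as a standalone run of exactly five digits.

```python
import re

# Pattern from the question: matches "Impressum" or "impressum" anywhere in a link.
IMPRESSUM_RE = re.compile(r".*mpressum.*")

# Assumption: a postal code is exactly five digits, not part of a longer number.
POSTAL_CODE_RE = re.compile(r"(?<!\d)\d{5}(?!\d)")


def find_impressum_links(links):
    """Return the links that look like imprint pages."""
    return [link for link in links if IMPRESSUM_RE.match(link)]


def postal_code_in_range(text, low, high):
    """Return True if any five-digit code in text lies within [low, high]."""
    codes = (int(code) for code in POSTAL_CODE_RE.findall(text))
    return any(low <= code <= high for code in codes)
```

For example, `postal_code_in_range("Musterstr. 1, 80331 München", 80000, 81999)` evaluates to `True`, while the same call on `"10115 Berlin"` evaluates to `False`. In a spider, the text passed in would be the extracted body of the imprint page.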

How can I implement this behavior in Scrapy?

Basically, I'm doing this because I want to answer the following question:
Which domains in Germany are in a given postal code range?

My code is a mess; I'm just learning Scrapy at the moment.

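You can use the allowed_domains variable in your spider. When your stop condition is met (here, the postal code is out of range), remove the domain from allowed_domains. This will not cancel downloads that are already queued, but I believe it will not let the spider queue new ones.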
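
As a complement to the allowed_domains idea, the blacklist itself can be kept in a small helper that is checked before new requests are queued. This is a hedged sketch in plain Python, not Scrapy's own API: the class name `DomainBlacklist` and its methods are hypothetical, and it blocks per host name, so `www.example.de` and `example.de` count as different entries. It is also worth verifying against your Scrapy version that changes to allowed_domains made at runtime actually take effect, since the offsite middleware may cache the domain list when the spider opens.

```python
from urllib.parse import urlparse


class DomainBlacklist:
    """Remembers hosts that failed the condition and should not be crawled."""

    def __init__(self):
        self.blocked = set()

    def block(self, url):
        """Blacklist the host of the given URL."""
        self.blocked.add(self._host(url))

    def allows(self, url):
        """Return True if the URL's host has not been blacklisted."""
        return self._host(url) not in self.blocked

    @staticmethod
    def _host(url):
        # urlparse lower-cases the host name and strips the port.
        return urlparse(url).hostname or ""
```

A spider callback could call `block(response.url)` when the postal code check fails and skip any extracted link for which `allows(link)` is false. Persisting `blocked` to disk between runs would cover the "blacklist for future crawls" requirement.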

PS: See https://doc.scrapy.org/en/latest/topics/spider-middleware.html#scrapy.spidermiddlewares.offsite.offsitemiddleware

