python - Following hyperlink and "Filtered offsite request"

I know there are several related threads out there, and they have helped me a lot, but I still can't get all the way. I am at the point where running the code doesn't result in any errors, but I get nothing in my CSV file. I have the following Scrapy spider that starts on one webpage, follows a hyperlink, and scrapes the linked page:

from scrapy.http import Request
from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector
from scrapy.item import Item, Field


class bbritem(Item):
    year = Field()
    appraisaldate = Field()
    propertyvalue = Field()
    landvalue = Field()
    usage = Field()
    landsize = Field()
    address = Field()


class spiderbbrtest(BaseSpider):
    name = 'spiderbbrtest'
    allowed_domains = [""]
    start_urls = [',etage-a,side-a&gade=septembervej&hus_nr=29&ipostnr=2730']

    def parse2(self, response):
        # Scrape the linked page: skip the first div, then build one item per remaining div.
        hxs = HtmlXPathSelector(response)
        bbrs2 = hxs.select("id('evaluationcontrol')/div[2]/div")
        bbrs = iter(bbrs2)
        next(bbrs)
        for bbr in bbrs:
            item = bbritem()
            item['year'] = bbr.select("table/tbody/tr[1]/td[2]/text()").extract()
            item['appraisaldate'] = bbr.select("table/tbody/tr[2]/td[2]/text()").extract()
            item['propertyvalue'] = bbr.select("table/tbody/tr[3]/td[2]/text()").extract()
            item['landvalue'] = bbr.select("table/tbody/tr[4]/td[2]/text()").extract()
            item['usage'] = bbr.select("table/tbody/tr[5]/td[2]/text()").extract()
            item['landsize'] = bbr.select("table/tbody/tr[6]/td[2]/text()").extract()
            item['address'] = response.meta['address']
            yield item

    def parse(self, response):
        # Start page: build the URL of the linked page and pass the address text along via meta.
        hxs = HtmlXPathSelector(response)
        parturl = ''.join(hxs.select("id('searchresult')/tr/td[1]/a/@href").extract())
        url2 = ''.join(["", parturl])
        yield Request(url=url2, meta={'address': hxs.select("id('searchresult')/tr/td[1]/a[@href]/text()").extract()}, callback=self.parse2)

I am trying to export the results to a CSV file, but I get nothing in the file. Running the code, however, doesn't result in any errors. I know it's a simplified example with only one URL, but it illustrates my problem.

I think my problem could be that I am not telling Scrapy that I want to save the data in the parse2 method.

BTW, I run the spider with: scrapy crawl spiderbbr -o scraped_data.csv -t csv

You need to modify the request yielded in parse so that it uses parse2 as its callback.

EDIT: allowed_domains shouldn't include the http prefix, e.g.:

allowed_domains = [""] 

Try this and see if your spider still runs correctly, instead of leaving allowed_domains blank.
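To see why the prefix matters, here is a rough, standard-library-only model of an offsite check. It is a sketch of the idea, not Scrapy's actual implementation: the function name is_allowed and the example.com domain are made up for illustration. The assumption is that the filter compares each entry in allowed_domains against the hostname of the request URL, so an entry that carries a scheme can never match and every request looks offsite.

```python
from urllib.parse import urlparse


def is_allowed(url, allowed_domains):
    """Simplified model of an offsite filter (hypothetical helper, not Scrapy's code)."""
    # Only the hostname part of the URL is compared, never the scheme or path.
    host = urlparse(url).hostname or ""
    if not allowed_domains:
        # An empty list means no restriction in this model.
        return True
    # Accept an exact match or any subdomain of an allowed domain.
    return any(host == d or host.endswith("." + d) for d in allowed_domains)


url = "http://www.example.com/page/1"  # hypothetical URL
print(is_allowed(url, ["example.com"]))         # bare domain
print(is_allowed(url, ["http://example.com"]))  # domain with http prefix
```

With the bare domain the hostname www.example.com matches as a subdomain; with the prefixed entry nothing matches, which in this model is what would cause the followed link to be dropped as offsite.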

