
airbnb-scraper's Introduction

Airbnb Scraper: Advanced Airbnb Search using Scrapy

Disclaimer: No longer maintained

This project is not currently maintained, due to the difficulty of using Scrapy to make requests to the Airbnb API. The project is on hold until further notice. A simpler approach is currently being explored here: https://github.com/JoeBashe/stl-scraper

Use Airbnb's unofficial API to efficiently search for rental properties. Features include regex matching, ranged date search, opening matched properties in a browser, and saving results to CSV, xlsx, or Elasticsearch (alpha).

Notes

  • Airbnb's API is subject to change at any moment, which would break this scraper. They've already changed it several times in the past. Also, using this probably violates their TOS. Please only use for educational or research purposes.
  • The scraper was recently updated to work with Airbnb's new v3 GraphQL API. Some features are still being updated.
  • If you get 403 Forbidden errors when running this scraper, try browsing the Airbnb site in your web browser from the same computer first, then try running the script again.

Requirements

  • Python 3.10 (the version used by the install commands below)
  • The packages listed in requirements.txt

Installation (*nix)

# Create venv
python3.10 -m venv env

# Enable venv
. env/bin/activate

# Install required packages
pip install -Ur requirements.txt

# Create settings.py
cp deepbnb/settings.py.dist deepbnb/settings.py

# @NOTE: Don't forget to set AIRBNB_API_KEY in settings.py. To find your API key,
# search Airbnb using Chrome, open dev tools, and look for the URL parameter
# named "key" in async requests to /api/v2/explore_tabs under the Network tab.

Configuration

Edit deepbnb/settings.py to configure the scraper. Custom settings specific to this project are documented below; the rest are standard Scrapy settings, documented at https://docs.scrapy.org/en/latest/topics/settings.html.

Example Usage

Minimal scraper usage:

scrapy crawl airbnb -a query="Colorado Springs, CO" -o colorado_springs.csv

Advanced examples:

Madrid, fixed dates
scrapy crawl airbnb \
    -a query="Madrid, Spain" \
    -a checkin=2023-10-01 \
    -a checkout=2023-11-30 \
    -a max_price=1900 \
    -a min_price=1800 \
    -a neighborhoods="Acacias,Almagro,Arganzuela,Argüelles,Centro,Cortes,Embajadores,Imperial,Jerónimos,La Latina,Malasaña,Moncloa,Palacio,Recoletos,Retiro,Salamanca,Sol" \
    -s MUST_HAVE="(atico|attic|balcon|terra|patio|outdoor|roof|view)" \
    -s CANNOT_HAVE="studio" \
    -s MINIMUM_WEEKLY_DISCOUNT=20 \
    -s WEB_BROWSER="/usr/bin/chromium" \
    -o madrid.xlsx

New York ranged date search
scrapy crawl airbnb \
    -a query="New York, NY" \
    -a checkin="2023-01-22+7-0" \
    -a checkout="2023-02-22+14-3" \
    -a max_price=1800 \
    -s CANNOT_HAVE="guest suite" \
    -s MUST_HAVE="(walking distance|short walk|no car needed|walk everywhere|metro close|public transport)" \
    -o newyork.csv

Ranged date queries

If your checkin / checkout dates are flexible, use the ranged search feature to search over a range of checkin / checkout dates.

Search checkin date range +5 days -2 days

scrapy crawl airbnb \
    -a query="Minneapolis, MN" \
    -a checkin="2023-10-15+5-2" \
    -a checkout="2023-11-15" \
    -o minneapolis.csv

This search looks for rentals in Minneapolis using Oct 15 2023 as the base check-in date, while also searching for rentals available for check-in from 2 days before to 5 days after, i.e. check-ins from Oct 13 to Oct 20. The range is specified by the string +5-2 appended to the checkin date: 2023-10-15+5-2. The string must always follow the pattern +[days_after]-[days_before], unless [days_after] and [days_before] are equal, in which case you can use the shorthand +-[days]. The numbers may be any integer 0 or greater (large numbers are untested).
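
A range spec like this can be parsed in a few lines of Python. The helper below is a minimal illustrative sketch, not the scraper's actual implementation; the function name and return format are assumptions:

import re
from datetime import date

def parse_ranged_date(spec):
    """Split '2023-10-15+5-2' or '2023-11-15+-3' into (base_date, days_after, days_before).
    Hypothetical helper for illustration only."""
    match = re.fullmatch(r'(\d{4}-\d{2}-\d{2})(?:\+(\d*)-(\d+))?', spec)
    if not match:
        raise ValueError(f'Invalid date spec: {spec}')
    base = date.fromisoformat(match.group(1))
    after, before = match.group(2), match.group(3)
    if before is None:       # plain ISO date, no range appended
        return base, 0, 0
    if after == '':          # '+-N' shorthand: same range both ways
        return base, int(before), int(before)
    return base, int(after), int(before)

print(parse_ranged_date('2023-10-15+5-2'))  # (datetime.date(2023, 10, 15), 5, 2)
print(parse_ranged_date('2023-11-15+-3'))   # (datetime.date(2023, 11, 15), 3, 3)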

Search checkin date +5 days -2 days, checkout date + or - 3 days

scrapy crawl airbnb \
    -a query="Florence, Italy" \
    -a checkin="2023-10-15+5-2" \
    -a checkout="2023-11-15+-3" \
    -o firenze.csv

Scraping Description

After running the crawl command, the scraper starts. It first runs the search query, then determines the number of result pages, and finally iterates through each page, scraping every property listing on it.

Scraped items (listings) are passed to the default item pipeline, where, optionally, the description, name, and reviews.description fields are filtered using either or both of the CANNOT_HAVE and MUST_HAVE regexes; items that fail these filters are dropped. Accepted items can optionally be opened in a given web browser, so that you can easily view your search results.

Finally, the output can be saved to an xlsx format file for additional filtering, sorting, and inspection.
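
The filtering step can be pictured as a standard Scrapy item pipeline. The class below is a simplified sketch of that logic, not deepbnb's actual pipeline; the class name is hypothetical and only the description and name fields are checked:

import re

from scrapy.exceptions import DropItem

class RegexFilterPipeline:
    """Sketch of MUST_HAVE / CANNOT_HAVE filtering (illustrative only)."""

    def __init__(self, must_have, cannot_have):
        self.must_have = re.compile(must_have, re.IGNORECASE) if must_have else None
        self.cannot_have = re.compile(cannot_have, re.IGNORECASE) if cannot_have else None

    @classmethod
    def from_crawler(cls, crawler):
        # Read the regexes from settings.py or the -s command-line flag.
        return cls(crawler.settings.get('MUST_HAVE'), crawler.settings.get('CANNOT_HAVE'))

    def process_item(self, item, spider):
        text = ' '.join(str(item.get(field, '')) for field in ('description', 'name'))
        if self.cannot_have and self.cannot_have.search(text):
            raise DropItem('listing matches CANNOT_HAVE')
        if self.must_have and not self.must_have.search(text):
            raise DropItem('listing does not match MUST_HAVE')
        return item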

Parameters

You can find the values for these by first doing a search manually on the Airbnb site.

  • query: City and State to search. (required)
  • checkin, checkout: Check-in and Check-out dates.
  • min_price, max_price: Minimum and maximum price for the period. Airbnb calculates this from the length of the search, interpreting it as either a daily or a monthly price depending on the length of the stay.
  • neighborhoods: Comma-separated list of neighborhoods within the city to filter for.
  • output: Name of output file. Only xlsx output is tested.

Settings

These settings can be edited in the settings.py file, or appended to the command line using the -s flag as in the example above.

  • CANNOT_HAVE="<cannot-have-regex>"
    Don't accept listings that match the given regex pattern. (optional)

  • FIELDS_TO_EXPORT="['field1', 'field2', ...]"
    Can be found in settings.py. Contains a list of all possible fields to export, i.e. all fields of AirbnbScraperItem. Comment out items to remove undesired fields from the output. Applies only to xlsx output.

  • MINIMUM_MONTHLY_DISCOUNT=30
    Minimum monthly discount. (optional)

  • MINIMUM_WEEKLY_DISCOUNT=25
    Minimum weekly discount. (optional)

  • MUST_HAVE="(<must-have-regex>)"
    Only accept listings that match the given regex pattern. (optional)

  • ROOM_TYPES="['Camper/RV', 'Campsite', 'Entire guest suite']"
    Room Types to filter. (optional)

  • SKIP_LIST="['12345678', '12345679', '12345680']"
    Property IDs to skip. (optional)

  • WEB_BROWSER="/path/to/browser %s"
    Web browser executable command. (optional)

    Examples:

    • MacOS
      WEB_BROWSER="open -a /Applications/Google\ Chrome.app"

    • Windows
      WEB_BROWSER="C:\Program Files (x86)\Google\Chrome\Application\chrome.exe"

    • Linux
      WEB_BROWSER="/usr/bin/google-chrome"

Elasticsearch

Enable deepbnb.pipelines.ElasticBnbPipeline in settings.py
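
Pipelines are enabled through Scrapy's standard ITEM_PIPELINES setting; a sketch (the priority values are arbitrary, and the BnbPipeline name is taken from the issue logs below):

# deepbnb/settings.py
ITEM_PIPELINES = {
    'deepbnb.pipelines.BnbPipeline': 300,         # default item pipeline
    'deepbnb.pipelines.ElasticBnbPipeline': 400,  # Elasticsearch export (alpha)
}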

Credits

airbnb-scraper's People

Contributors

bashedev, dependabot[bot], digitalengineering, leopersan


airbnb-scraper's Issues

Can't run minimum example.

Hello! Great repo, but I can't figure out why the basic example doesn't work.
scrapy crawl airbnb -a query="Colorado Springs, CO" -o colorado_springs.csv

I got:
HTTP status code is not handled or not allowed

I took api_key from api/v3/ExploreSearch. I tried to change USER_AGENT to different values, but it didn't help.

Can't run examples

2021-12-05 10:59:34 [scrapy.core.engine] ERROR: Error while obtaining start requests
Traceback (most recent call last):
File "/home/maciej/env/lib/python3.8/site-packages/scrapy/core/engine.py", line 129, in _next_request
request = next(slot.start_requests)
File "/home/maciej/airbnb-scraper/deepbnb/spiders/airbnb.py", line 136, in start_requests
yield self.__explore_search.api_request(self.__query, params, self.__explore_search.parse_landing_page)
File "/home/maciej/airbnb-scraper/deepbnb/api/ExploreSearch.py", line 63, in api_request
headers = self._get_search_headers()
File "/home/maciej/airbnb-scraper/deepbnb/api/ApiBase.py", line 51, in _get_search_headers
return required_headers | {
TypeError: unsupported operand type(s) for |: 'dict' and 'dict'
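
Note: the dict union operator (|) in that line requires Python 3.9 or newer; on the Python 3.8 shown in this traceback it raises exactly this TypeError. A version-agnostic sketch of the merge (extra_headers stands in for the dict literal in the source):

# dict | dict needs Python 3.9+; unpacking works on 3.5+
headers = {**required_headers, **extra_headers}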

Error for flexible date requests

When entering a flexible date like +-1 or +1 or -1, I get the error:

Traceback (most recent call last):
File "/usr/local/lib/python3.9/site-packages/scrapy/core/engine.py", line 129, in _next_request
request = next(slot.start_requests)
File "/Users/ingmarsturm/git/airbnb-scraper/deepbnb/spiders/airbnb.py", line 128, in start_requests
yield from self.__explore_search.perform_checkin_start_requests(
File "/Users/ingmarsturm/git/airbnb-scraper/deepbnb/api/ExploreSearch.py", line 133, in perform_checkin_start_requests
checkin_start_date, checkin_range = self._build_date_range(self.__checkin, checkin_range_spec)
File "/Users/ingmarsturm/git/airbnb-scraper/deepbnb/api/ExploreSearch.py", line 144, in _build_date_range
base_date = date.fromisoformat(iso_date)
ValueError: Invalid isoformat string: '2021-06-15+1-1'

According to the datetime manual, this seems to be the expected behavior:

https://docs.python.org/3/library/datetime.html

classmethod date.fromisoformat(date_string)
    Return a date corresponding to a date_string given in the format YYYY-MM-DD:

    >>> from datetime import date
    >>> date.fromisoformat('2019-12-04')
    datetime.date(2019, 12, 4)

    This is the inverse of date.isoformat(). It only supports the format YYYY-MM-DD.

How to get AirBNB API key?

I'm not finding my API key with your suggested search string... any recommendation for a new method to find this? thanks...

Cannot get it to run

(env) macca@macca:~/Documents/airbnb-scraper-master$ scrapy crawl airbnb -a query="Colorado Springs, CO" -o colorado_springs.csv
2021-03-29 15:41:56 [scrapy.utils.log] INFO: Scrapy 2.4.1 started (bot: deepbnb)
2021-03-29 15:41:56 [scrapy.utils.log] INFO: Versions: lxml 4.6.2.0, libxml2 2.9.10, cssselect 1.1.0, parsel 1.6.0, w3lib 1.22.0, Twisted 20.3.0, Python 3.8.5 (default, Jan 27 2021, 15:41:15) - [GCC 9.3.0], pyOpenSSL 20.0.1 (OpenSSL 1.1.1i 8 Dec 2020), cryptography 3.3.1, Platform Linux-5.8.0-45-generic-x86_64-with-glibc2.29
2021-03-29 15:41:56 [scrapy.utils.log] DEBUG: Using reactor: twisted.internet.epollreactor.EPollReactor
2021-03-29 15:41:56 [scrapy.crawler] INFO: Overridden settings: {'AUTOTHROTTLE_ENABLED': True, 'BOT_NAME': 'deepbnb', 'CONCURRENT_REQUESTS_PER_DOMAIN': 10, 'COOKIES_ENABLED': False, 'DOWNLOAD_DELAY': 10, 'FEED_EXPORT_FIELDS': ['name', 'url', 'price_rate', 'price_rate_type', 'total_price', 'room_and_property_type', 'min_nights', 'max_nights', 'latitude', 'longitude', 'monthly_price_factor', 'weekly_price_factor', 'room_type', 'person_capacity', 'amenities', 'review_count', 'review_score', 'rating_accuracy', 'rating_checkin', 'rating_cleanliness', 'rating_communication', 'rating_location', 'rating_value', 'star_rating', 'satisfaction_guest', 'description', 'neighborhood_overview', 'notes', 'additional_house_rules', 'interaction', 'access', 'transit', 'response_rate', 'response_time', 'photos'], 'NEWSPIDER_MODULE': 'deepbnb.spiders', 'ROBOTSTXT_OBEY': True, 'SPIDER_MODULES': ['deepbnb.spiders'], 'TELNETCONSOLE_ENABLED': False, 'USER_AGENT': 'deepbnb (+https://digitalengineering.io)'}
2021-03-29 15:41:56 [scrapy.middleware] INFO: Enabled extensions: ['scrapy.extensions.corestats.CoreStats', 'scrapy.extensions.memusage.MemoryUsage', 'scrapy.extensions.feedexport.FeedExporter', 'scrapy.extensions.logstats.LogStats', 'scrapy.extensions.throttle.AutoThrottle']
2021-03-29 15:41:56 [scrapy.middleware] INFO: Enabled downloader middlewares: ['scrapy.downloadermiddlewares.robotstxt.RobotsTxtMiddleware', 'scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware', 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware', 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware', 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware', 'scrapy.downloadermiddlewares.retry.RetryMiddleware', 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware', 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware', 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware', 'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware', 'scrapy.downloadermiddlewares.stats.DownloaderStats']
2021-03-29 15:41:56 [scrapy.middleware] INFO: Enabled spider middlewares: ['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware', 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware', 'scrapy.spidermiddlewares.referer.RefererMiddleware', 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware', 'scrapy.spidermiddlewares.depth.DepthMiddleware']
2021-03-29 15:41:56 [scrapy.middleware] INFO: Enabled item pipelines: ['deepbnb.pipelines.DuplicatesPipeline', 'deepbnb.pipelines.BnbPipeline']
2021-03-29 15:41:56 [scrapy.core.engine] INFO: Spider opened
2021-03-29 15:41:56 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2021-03-29 15:41:56 [airbnb] INFO: starting survey for: Colorado Springs, CO
2021-03-29 15:41:56 [scrapy.core.engine] ERROR: Error while obtaining start requests
Traceback (most recent call last):
  File "/home/macca/Documents/airbnb-scraper-master/env/lib/python3.8/site-packages/scrapy/core/engine.py", line 129, in _next_request
    request = next(slot.start_requests)
  File "/home/macca/Documents/airbnb-scraper-master/deepbnb/spiders/airbnb.py", line 131, in start_requests
    yield self.__explore_search.api_request(self.__query, params, self.__explore_search.parse_landing_page)
  File "/home/macca/Documents/airbnb-scraper-master/deepbnb/api/ExploreSearch.py", line 67, in api_request
    headers = self._get_search_headers()
  File "/home/macca/Documents/airbnb-scraper-master/deepbnb/api/ApiBase.py", line 51, in _get_search_headers
    return required_headers | {
TypeError: unsupported operand type(s) for |: 'dict' and 'dict'
2021-03-29 15:41:56 [scrapy.core.engine] INFO: Closing spider (finished)
2021-03-29 15:41:56 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'elapsed_time_seconds': 0.003824,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2021, 3, 29, 7, 41, 56, 608685),
 'log_count/ERROR': 1,
 'log_count/INFO': 9,
 'memusage/max': 68800512,
 'memusage/startup': 68800512,
 'start_time': datetime.datetime(2021, 3, 29, 7, 41, 56, 604861)}
2021-03-29 15:41:56 [scrapy.core.engine] INFO: Spider closed (finished)

Wrong results

scrapy crawl bnb -a query="Mezmay, RU" -o MezmayRussia.csv

scrapy crawl bnb -a query="Mezmay, Russia" -o MezmayRussia.csv

scrapy crawl bnb -a query="Mezmay, Krasnodar Krai, Russia" -o MezmayRussia.csv

All of these bring me anything but listings in Mezmay, Krasnodar Krai, Russia. How come?

Symbol not found: _exsltDateXpathCtxtRegister - lxml/etree.cpython-39-darwin.so

Hi guys

Anybody experience this issue perhaps?

Traceback (most recent call last):
  File "/Users/***/workspace/airbnb-scraper/env/bin/scrapy", line 5, in <module>
    from scrapy.cmdline import execute
  File "/Users/***/workspace/airbnb-scraper/env/lib/python3.9/site-packages/scrapy/__init__.py", line 12, in <module>
    from scrapy.spiders import Spider
  File "/Users/***/workspace/airbnb-scraper/env/lib/python3.9/site-packages/scrapy/spiders/__init__.py", line 11, in <module>
    from scrapy.http import Request
  File "/Users/***/workspace/airbnb-scraper/env/lib/python3.9/site-packages/scrapy/http/__init__.py", line 11, in <module>
    from scrapy.http.request.form import FormRequest
  File "/Users/***/workspace/airbnb-scraper/env/lib/python3.9/site-packages/scrapy/http/request/form.py", line 10, in <module>
    import lxml.html
  File "/Users/***/workspace/airbnb-scraper/env/lib/python3.9/site-packages/lxml/html/__init__.py", line 53, in <module>
    from .. import etree
ImportError: dlopen(/Users/***/workspace/airbnb-scraper/env/lib/python3.9/site-packages/lxml/etree.cpython-39-darwin.so, 2): Symbol not found: _exsltDateXpathCtxtRegister
  Referenced from: /Users/***/workspace/airbnb-scraper/env/lib/python3.9/site-packages/lxml/etree.cpython-39-darwin.so
  Expected in: flat namespace
 in /Users/***/workspace/airbnb-scraper/env/lib/python3.9/site-packages/lxml/etree.cpython-39-darwin.so

Some system environment info

System Version: macOS 11.3.1 (20E241)
Kernel Version: Darwin 20.4.0
Python Version: 3.9.2

Invalid API Key

I've looked for the API key under the title explore_tabs but that doesn't exist. I do find an API key under the search get request but it is the same API key everyone uses: d306zoyjsyarp7ifhu67rjxn52tv0t20

See here: https://stackoverflow.com/questions/45822440/airbnb-api-key-not-unique-per-user

When I run the scraper I get KeyError: 'dora' and when I enter the get request URL into my browser, I get the error {"errors":[{"message":"Response not successful: Received status code 400","locations":[{"line":1,"column":1}],"path":[],"extensions":{"response":{"statusCode":400,"body":{"error_code":400,"error":"invalid_key","error_message":"Invalid API key."}},"code":"Bad Request"}}],"data":{}}

If this doesn't mean that the API key is wrong (it shouldn't because everyone is using the same), then how can I fix this?

more than one spider issue and absence of prices

Hello, Thanks for the script!
I got two issues though:

  1. I couldn't use the advanced filters: "error: running 'scrapy crawl' with more than one spider is no longer supported". Windows 10 Home, command prompt; the Madrid example was used. (See the note after this list.)
  2. With the minimal query, the scraper collected everything except the columns price_rate, price_rate_type, total_price, monthly_price_factor, and weekly_price_factor. Chrome Version 104.0.5112.82 (Official Build) (64-bit); the file was CSV, 233 rows.
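
Note: a likely cause of the first error (an assumption, not confirmed in this thread) is that the backslash line continuations in the README examples are Unix shell syntax; Windows cmd does not recognize them, so the leftover tokens are parsed as extra spider names. On cmd, put the whole command on one line, for example:

scrapy crawl airbnb -a query="Madrid, Spain" -a checkin=2023-10-01 -a checkout=2023-11-30 -o madrid.xlsx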

Forbidden by robots.txt

I get the following message and an output file of size 0.

[scrapy.downloadermiddlewares.robotstxt] DEBUG: Forbidden by robots.txt: <GET https://www.airbnb.com/api/v3/ExploreSearch?operationName=ExploreSearch&locale=en&currency=USD&variables=%7B%22request%22%3A%7B%22metadataOnly%22%3Afalse%2C%22version%22%3A%221.7.9%22%2C%22itemsPerGrid%22%3A20%2C%22tabId%22%3A%22home_tab%22%2C%22refinementPaths%22%3A%5B%22%2Fhomes%22%5D%2C%22source%22%3A%22search_blocks_selector_p1_flow%22%2C%22searchType%22%3A%22search_query%22%2C%22query%22%3A%22Oslo%2C+Norway%22%2C%22roomTypes%22%3A%5B%5D%2C%22cdnCacheSafe%22%3Afalse%2C%22simpleSearchTreatment%22%3A%22simple_search_only%22%2C%22treatmentFlags%22%3A%5B%22simple_search_1_1%22%2C%22flexible_dates_options_extend_one_three_seven_days%22%5D%2C%22screenSize%22%3A%22small%22%7D%7D&extensions=%7B%22persistedQuery%22%3A%7B%22version%22%3A1%2C%22sha256Hash%22%3A%2213aa9971e70fbf5ab888f2a851c765ea098d8ae68c81e1f4ce06e2046d91b6ea%22%7D%7D>
2021-02-19 15:45:35 [scrapy.core.engine] INFO: Closing spider (finished)
2021-02-19 15:45:35 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/exception_count': 1,
'downloader/exception_type_count/scrapy.exceptions.IgnoreRequest': 1,
'downloader/request_bytes': 230,
'downloader/request_count': 1,
'downloader/request_method_count/GET': 1,
'downloader/response_bytes': 5299,
'downloader/response_count': 1,
'downloader/response_status_count/200': 1,
'elapsed_time_seconds': 0.326508,
'finish_reason': 'finished',
'finish_time': datetime.datetime(2021, 2, 19, 23, 45, 35, 938760),
'log_count/DEBUG': 2,
'log_count/INFO': 9,
'memusage/max': 284704768,
'memusage/startup': 284704768,
'response_received_count': 1,
'robotstxt/forbidden': 1,
'robotstxt/request_count': 1,
'robotstxt/response_count': 1,
'robotstxt/response_status_count/200': 1,
'scheduler/dequeued': 1,
'scheduler/dequeued/memory': 1,
'scheduler/enqueued': 1,
'scheduler/enqueued/memory': 1,
'start_time': datetime.datetime(2021, 2, 19, 23, 45, 35, 612252)}
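
Note: this rejection comes from Scrapy's robots.txt middleware, controlled by the standard ROBOTSTXT_OBEY setting (shown as True in the logs above). A possible workaround, subject to the TOS caveat in the Notes section, is to disable it:

# deepbnb/settings.py
ROBOTSTXT_OBEY = False  # stop Scrapy from honoring robots.txt disallow rules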

Spider stops before finishing

This may be out of my beginner knowledge. But I only get a fraction of airbnb records before it stops. For example, I get on average 250-280 records. I did a query for "Boone, NC" and just "NC" and I got about the same amount of records. But for completely different listings. I know there are more records even for the city, because when I search on Airbnb for city I get 1000+ records for Boone, NC. Any idea on how this might be fixed? I can drop any files that I am working with here if needed. This is a great script and other than this issue, it works awesome.

MINIMUM_MONTHLY_DISCOUNT on the airbnb site

Hi,

thank you for this great work!

I was wondering if you know how to set the MINIMUM_MONTHLY_DISCOUNT and MINIMUM_WEEKLY_DISCOUNT parameters while doing a regular search on the Airbnb website? I tried adding &MINIMUM_MONTHLY_DISCOUNT=30 to the URL, but it didn't work.

I also tried to run the example code you provide scrapy crawl airbnb -a query="Colorado Springs, CO" -o colorado_springs.csv after setting everything up on google colab but ran into the error:

2021-04-17 13:22:12 [airbnb] INFO: starting survey for: Colorado Springs, CO
2021-04-17 13:22:12 [scrapy.core.engine] ERROR: Error while obtaining start requests
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/scrapy/core/engine.py", line 129, in _next_request
    request = next(slot.start_requests)
  File "/content/airbnb-scraper/deepbnb/spiders/airbnb.py", line 136, in start_requests
    yield self.__explore_search.api_request(self.__query, params, self.__explore_search.parse_landing_page)
  File "/content/airbnb-scraper/deepbnb/api/ExploreSearch.py", line 63, in api_request
    headers = self._get_search_headers()
  File "/content/airbnb-scraper/deepbnb/api/ApiBase.py", line 61, in _get_search_headers
    'X-CSRF-Without-Token':             '1',
TypeError: unsupported operand type(s) for |: 'dict' and 'dict'
2021-04-17 13:22:12 [scrapy.core.engine] INFO: Closing spider (finished)

Any idea why?
Thanks again!

How can I get all the eligible search results?

Hi,

I was running your code, but every time I could at most get 300 results back. I checked on airbnb, and found that if you searched manually, you would get at most 300 results as well. So maybe they are the same case. But is there any way that I can get all the results using this script? Thanks!

KeyError: 'rate_with_service_fee'

Hi,
Thanks for this cool tool.
I obtain this error on my first launch :

Traceback (most recent call last):
  File "c:\users\sebastien\appdata\local\programs\python\python37-32\lib\site-packages\scrapy\utils\defer.py", line 117, in iter_errback
    yield next(it)
  File "c:\users\sebastien\appdata\local\programs\python\python37-32\lib\site-packages\scrapy\utils\python.py", line 345, in __next__
    return next(self.data)
  File "c:\users\sebastien\appdata\local\programs\python\python37-32\lib\site-packages\scrapy\utils\python.py", line 345, in __next__
    return next(self.data)
  File "c:\users\sebastien\appdata\local\programs\python\python37-32\lib\site-packages\scrapy\core\spidermw.py", line 64, in _evaluate_iterable
    for r in iterable:
  File "c:\users\sebastien\appdata\local\programs\python\python37-32\lib\site-packages\scrapy\spidermiddlewares\offsite.py", line 29, in process_spider_output
    for x in result:
  File "c:\users\sebastien\appdata\local\programs\python\python37-32\lib\site-packages\scrapy\core\spidermw.py", line 64, in _evaluate_iterable
    for r in iterable:
  File "c:\users\sebastien\appdata\local\programs\python\python37-32\lib\site-packages\scrapy\spidermiddlewares\referer.py", line 338, in <genexpr>
    return (_set_referer(r) for r in result or ())
  File "c:\users\sebastien\appdata\local\programs\python\python37-32\lib\site-packages\scrapy\core\spidermw.py", line 64, in _evaluate_iterable
    for r in iterable:
  File "c:\users\sebastien\appdata\local\programs\python\python37-32\lib\site-packages\scrapy\spidermiddlewares\urllength.py", line 37, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "c:\users\sebastien\appdata\local\programs\python\python37-32\lib\site-packages\scrapy\core\spidermw.py", line 64, in _evaluate_iterable
    for r in iterable:
  File "c:\users\sebastien\appdata\local\programs\python\python37-32\lib\site-packages\scrapy\spidermiddlewares\depth.py", line 58, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "c:\users\sebastien\appdata\local\programs\python\python37-32\lib\site-packages\scrapy\core\spidermw.py", line 64, in _evaluate_iterable
    for r in iterable:
  File "C:\Users\Sebastien\Documents\Studio\airbnb-scraper\deepbnb\spiders\bnb.py", line 90, in parse
    listings = self._get_listings_from_sections(tab['sections'])
  File "C:\Users\Sebastien\Documents\Studio\airbnb-scraper\deepbnb\spiders\bnb.py", line 222, in _get_listings_from_sections
    rate_with_service_fee = pricing['rate_with_service_fee']['amount']
KeyError: 'rate_with_service_fee'

Edit: after commenting out all lines that use "rate_with_service_fee", the script works, but not all rooms are scraped.

Thanks
Sebastien
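
Note: a gentler fix than removing those lines (a sketch based only on the traceback; the layout of the pricing dict is assumed) is to tolerate listings that lack the key:

# Fall back to None when a listing has no 'rate_with_service_fee' entry
rate_with_service_fee = pricing.get('rate_with_service_fee', {}).get('amount')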

Mistakes in the README

In addition, there are some mistakes in the readme.md. In particular, in the code examples for Florence, NY, and maybe others, the checkin parameter is written twice instead of checkout.
