
immoscraper's Issues

Bot Detection / Error 405

Probably since the relaunch, is24 has added a bot detection feature to its website.

Neither urllib2 nor even Selenium works.

Any workaround ideas?

License

Hi balzer82,

thanks for your beautifully commented scraper and analysis notebooks :). They give me a fair head start on my own small project. Would you mind putting your work under an open licence, so I can build on your notebooks?

There is a helpful guide at https://www.software.ac.uk/resources/guides/adopting-open-source-licence

But I have always thought the open licence jungle is best mapped out in the license-chart by Robbie Morrisson.

Thanks,
Jonas

Locations

The URL for the search request contains the federal state (Bundesland) and the city (Stadt), but this often does not work.
Examples:
Bayern/Ebersberg
Bayern/Erding
Bayern/Landsberg

ImmoScout returns error 410 for these.

Do you have any hint as to what is wrong with these cities?

'NoneType' object is not subscriptable

I just got started with Python because I was interested in real estate (Immo) data and wanted to produce my own scrapes. I found your instructions very helpful. My scraper ran perfectly fine for a month. Thank you! But since the beginning of June I have been getting the following error:

'NoneType' object is not subscriptable

The problem stems from looping through the scripts list to find the script containing the term 'IS24.resultList'.

Line 60:
    soup = BeautifulSoup(urlquery(url), 'html.parser')
    scripts = soup.findAll('script')
    for script in scripts:
        #print script.text.strip()
        if 'IS24.resultList' in script.text.strip():
            s = script.string.split('\n')

The script.text.strip() call no longer outputs any text. I have tried many variations but can't get it to work. I am currently going through the bs4 documentation looking for alternatives. For me the problem came up after a Spyder update, though I don't know whether that is related to the appearance of the problem.
Maybe you can help?
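
One way to keep the loop above from failing is to skip scripts whose text is missing. In Beautiful Soup, `.string` is None for a tag that has no children or more than one child, so calling `.split` on it raises an error. The following is only a minimal sketch under that assumption, not the scraper's own code; the helper name `find_marker_lines` is hypothetical, and it works on plain strings so it can be tried without a live page.

```python
def find_marker_lines(script_texts, marker="IS24.resultList"):
    """Return the lines of the first script text containing `marker`, or None."""
    for text in script_texts:
        # Skip scripts whose text is missing (None) or empty instead of failing.
        if not text:
            continue
        if marker in text:
            return text.split("\n")
    # No script contained the marker.
    return None
```

With bs4 this could be called as `find_marker_lines(script.string for script in soup.find_all('script'))`. A result of None would then mean that no script on the returned page contains the marker, which may point to a changed page layout rather than to a problem in the loop itself.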

HTTPError

For a couple of weeks now I have been getting the following error, without having changed anything...

Traceback (most recent call last):

File "", line 1, in
page_soup = BeautifulSoup(urllib.request.urlopen("https://www.immobilienscout24.de/Suche/de/berlin/berlin/wohnung-mieten").read(),"lxml")

File "C:\Users\Felix\Anaconda3\lib\urllib\request.py", line 222, in urlopen
return opener.open(url, data, timeout)

File "C:\Users\Felix\Anaconda3\lib\urllib\request.py", line 531, in open
response = meth(req, response)

File "C:\Users\Felix\Anaconda3\lib\urllib\request.py", line 641, in http_response
'http', request, response, code, msg, hdrs)

File "C:\Users\Felix\Anaconda3\lib\urllib\request.py", line 569, in error
return self._call_chain(*args)

File "C:\Users\Felix\Anaconda3\lib\urllib\request.py", line 503, in _call_chain
result = func(*args)

File "C:\Users\Felix\Anaconda3\lib\urllib\request.py", line 649, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)

HTTPError

Does this indicate a loading error?
The URL is correct and works when entered manually. Does this mean that Immoscout has implemented measures to protect itself against web scrapers?

I have also tried the script with changing browser profiles, which aims to make the queries look more natural. This has not helped either.

Thanks for your help!
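
The traceback above ends in a bare HTTPError, which hides the status code. To tell a loading error apart from a deliberate rejection, it may help to catch the exception and print its code and reason. The following is a minimal sketch, not a fix; the function name `fetch_or_describe` and its `opener` parameter are hypothetical and only there so the function can be tried without a network connection.

```python
import urllib.error
import urllib.request


def fetch_or_describe(url, opener=urllib.request.urlopen):
    """Return (body, None) on success or (None, description) on an HTTP error."""
    try:
        with opener(url) as response:
            return response.read(), None
    except urllib.error.HTTPError as err:
        # err.code is the numeric HTTP status; err.reason is the status phrase.
        return None, "HTTP {}: {}".format(err.code, err.reason)
```

Calling `fetch_or_describe(url)` with the search URL would then return the status code instead of raising. A 4xx code would mean the server answered and refused the request, as opposed to a connection failure, which urllib reports as a URLError that this sketch does not catch.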
