
creepy_crawler's Introduction

Creepy Crawler is a full-stack search engine application inspired by popular search engines. Users can run queries, review their search history, and customize their theme.

Python SQLAlchemy Flask JavaScript React Redux Scrapy HTML CSS AWS

Crawl the web 🕷

search

  • Flask receives queries from the frontend and, with help from the Crochet library, handles them asynchronously: each query is processed and then passed to the Scrapy spiders.
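    import re

    import crochet
    from pydispatch import dispatcher
    from scrapy import signals

    crochet.setup()  # run the Twisted reactor in a background thread

    @crochet.wait_for(timeout=200.0)  # block the caller for up to 200 seconds
    def scrape_with_crochet(raw_query):
        partitioned_query = ...
        query_regex = re.compile(...)
        # Route every scraped item to the _crawler_result callback.
        dispatcher.connect(_crawler_result, signal=signals.item_scraped)
        spiders = [...]
        if partitioned_query:
            for spider in spiders:
                crawl_runner.crawl(spider, query_regex=query_regex)
            eventual = crawl_runner.join()
            # Returning the Deferred lets wait_for block until all crawls finish.
            return eventual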
  • Settings are passed from the Flask backend to the Scrapy framework through a configuration object.
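    ...
    import json

    from scrapy.crawler import CrawlerRunner
    from scrapy.utils.project import get_project_settings
    ...
    settings = get_project_settings()
    # Overlay the JSON settings on top of the Scrapy project defaults.
    with open('app/api/routes/settings.json') as settings_file:
        settings_dict = json.load(settings_file)
    settings.update(settings_dict)
    crawl_runner = CrawlerRunner(settings)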
  • Each spider runs a broad crawl through the web, starting from a seed URL.
    import re

    import scrapy

    class BroadCrawler2(scrapy.Spider):
        """Broad crawling spider."""

        name = 'broad_crawler_2'
        start_urls = ['https://example.com/']

        def parse(self, response):
            """Yield text nodes that match the query, then follow every link."""
            try:
                # Select all text nodes except those inside <script> and <style>.
                all_text = response.css('*:not(script):not(style)::text')
                for text in all_text:
                    content = text.get()
                    # query_regex arrives as a spider argument from crawl_runner.crawl().
                    if re.search(self.query_regex, content):
                        yield {'url': response.request.url, 'text': content}
            except Exception:
                print(f'End of the line error for {self.name}.')

            yield from response.follow_all(css='a::attr(href)', callback=self.parse)
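
The snippets above elide how the raw query becomes `query_regex`. The following is a minimal sketch of one way that step could work, not the project's actual implementation. `build_query_regex` and `find_matches` are hypothetical helpers, and both the whitespace split and the case-insensitive "match any term" rule are assumptions made for illustration.

```python
import re


def build_query_regex(raw_query):
    """Hypothetical helper: compile a pattern that matches any query term."""
    # Assumption: the query is partitioned on whitespace.
    terms = raw_query.split()
    if not terms:
        return None  # nothing to search for
    # Escape each term so user input is treated literally, not as regex syntax.
    alternation = '|'.join(re.escape(term) for term in terms)
    return re.compile(alternation, re.IGNORECASE)


def find_matches(url, text_nodes, query_regex):
    """Hypothetical helper: apply the spider's per-text-node check to plain strings."""
    return [
        {'url': url, 'text': text}
        for text in text_nodes
        if query_regex.search(text)
    ]


if __name__ == '__main__':
    pattern = build_query_regex('web crawler')
    nodes = ['A Crawler visits pages.', 'Unrelated text.', 'The web is large.']
    for item in find_matches('https://example.com/', nodes, pattern):
        print(item)
```

Escaping each term keeps characters such as `(` or `*` in a user's query from being interpreted as regex syntax. Returning `None` for an empty query gives the caller the same chance to skip the crawl that the `partitioned_query` check provides.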

Create custom themes 🎨

custom themes

  • AWS integration allows users to add backgrounds and profile images of their choice.

Look over your search history 🔍

history

  • Users can switch between 24-hour and 12-hour time formats.
  • NATO (military) time zone abbreviations are also parsed, for users whose native time zone settings have been altered.
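
As an illustration of the two time features above, here is a minimal sketch, not the project's code. The helper names `format_clock` and `nato_offset_hours` are hypothetical. The letter-to-offset table follows the standard military time zone convention: A through M (skipping J) run east of UTC, N through Y run west, and Z is UTC.

```python
from datetime import datetime, timedelta, timezone

# Military time zone letters mapped to whole-hour UTC offsets.
NATO_OFFSETS = {'Z': 0}
NATO_OFFSETS.update({name: hours for hours, name in enumerate('ABCDEFGHIKLM', start=1)})
NATO_OFFSETS.update({name: -hours for hours, name in enumerate('NOPQRSTUVWXY', start=1)})


def nato_offset_hours(letter):
    """Hypothetical helper: return the UTC offset in hours for a zone letter."""
    key = letter.strip().upper()
    if key not in NATO_OFFSETS:
        raise ValueError(f'Unknown military time zone letter: {letter!r}')
    return NATO_OFFSETS[key]


def format_clock(moment, use_24_hour):
    """Hypothetical helper: render a datetime in 24-hour or 12-hour style."""
    return moment.strftime('%H:%M' if use_24_hour else '%I:%M %p')


if __name__ == '__main__':
    utc_moment = datetime(2021, 3, 1, 15, 30, tzinfo=timezone.utc)
    romeo = timezone(timedelta(hours=nato_offset_hours('R')))  # R is UTC-5
    local_moment = utc_moment.astimezone(romeo)
    print(format_clock(local_moment, use_24_hour=True))
    print(format_clock(local_moment, use_24_hour=False))
```

A lookup table keeps the skipped letter J explicit: it is absent from the table, so it raises an error like any other unknown abbreviation.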

Enjoy advanced interactions with your themes 🧮

theme interaction

Contact

Errors I encountered and conquered:
