motorbike-crawler's Issues

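Results of converting the Naver crawler to a non-blocking implementation

Implemented with Spring Reactor (related PR #30)

  1. Documents used to be fetched sequentially in a blocking manner; this was changed to a non-blocking implementation.
  2. The crawler now subscribes (Subscription) to the Document crawling task and saves the result once the task completes.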
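As a rough illustration of the subscribe-then-save flow in item 2, here is a minimal sketch. It does not use Reactor; it swaps in the JDK's `java.util.concurrent.Flow` API so that it runs without extra dependencies. The class name, the `crawlAndSave` method, and the stand-in "document for ..." strings are all hypothetical and are not taken from PR #30.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Flow;
import java.util.concurrent.SubmissionPublisher;

public class CrawlSubscriptionSketch {

    /** Hypothetical subscriber that "saves" each crawled document as it arrives. */
    static class SavingSubscriber implements Flow.Subscriber<String> {
        final List<String> saved = new CopyOnWriteArrayList<>();
        final CountDownLatch done = new CountDownLatch(1);
        private Flow.Subscription subscription;

        @Override
        public void onSubscribe(Flow.Subscription subscription) {
            this.subscription = subscription;
            subscription.request(1); // ask for the first document
        }

        @Override
        public void onNext(String document) {
            saved.add(document);     // stand-in for a repository save
            subscription.request(1); // ask for the next document
        }

        @Override
        public void onError(Throwable error) {
            done.countDown();
        }

        @Override
        public void onComplete() {
            done.countDown();
        }
    }

    /** Publishes one stand-in document per URL and returns what the subscriber saved. */
    static List<String> crawlAndSave(List<String> urls) {
        SavingSubscriber subscriber = new SavingSubscriber();
        // Closing the publisher signals onComplete after buffered items are delivered.
        try (SubmissionPublisher<String> publisher = new SubmissionPublisher<>()) {
            publisher.subscribe(subscriber);
            for (String url : urls) {
                publisher.submit("document for " + url); // assumption: real code would fetch here
            }
        }
        try {
            subscriber.done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for crawl", e);
        }
        return subscriber.saved;
    }
}
```

The subscriber runs on the publisher's executor rather than the submitting thread, so saving does not hold up the loop that produces documents.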

Implement async connections in PaxxoSaleItemCrawler

Request-response latency grows in proportion to pageNumber.
It would be good to send each Request asynchronously and implement a Listener that handles the Response when it arrives.
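One possible shape for this idea, sketched with the JDK's `CompletableFuture`. The `PageListener` interface, the `crawl` method, and the `fetchPage` function are hypothetical names for illustration; the actual PaxxoSaleItemCrawler API is not shown in this issue.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

public class AsyncPageCrawlerSketch {

    /** Hypothetical listener invoked once per page when its response arrives. */
    interface PageListener {
        void onPage(int pageNumber, String body);
    }

    /**
     * Sends one asynchronous request per page and hands each response to the
     * listener. fetchPage is a stand-in for the real HTTP call.
     */
    static void crawl(int lastPage, Function<Integer, String> fetchPage, PageListener listener) {
        List<CompletableFuture<Void>> pending = new ArrayList<>();
        for (int page = 1; page <= lastPage; page++) {
            final int pageNumber = page;
            pending.add(CompletableFuture
                    .supplyAsync(() -> fetchPage.apply(pageNumber))          // request runs off the caller thread
                    .thenAccept(body -> listener.onPage(pageNumber, body))); // listener fires on response
        }
        // Wait here only so the sketch finishes deterministically.
        CompletableFuture.allOf(pending.toArray(new CompletableFuture[0])).join();
    }
}
```

Because requests are issued without waiting for earlier responses, a slow page no longer delays the pages after it. The listener may be called from several threads, so it needs to be thread-safe.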

Paxxo URL redirection problem

...(omitted)&idx=682096&num=1

num (query parameter) indicates the position in the list where the bike information is rendered.

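How should search queries be generated? (Query)

For each file, generate one query for every possible combination of the required words (Primary Keyword) with the optional words (Optional Keyword).

Example)
Primary Keyword : Seoul restaurants
Optional Keyword : {Korean, Chinese, Western}

Combined queries : {Seoul restaurants, Seoul restaurants+Korean, Seoul restaurants+Chinese, Seoul restaurants+Western, Seoul restaurants+Korean+Chinese, ..., Seoul restaurants+Korean+Chinese+Western}

Additional note : organizing the word combinations as a tree structure could also be worthwhile.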
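The combination rule described in this issue amounts to enumerating every subset of the optional keywords and appending it to the primary keyword. A minimal sketch, assuming keywords are joined with `+` as in the example; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class QueryCombinationSketch {

    /**
     * Builds one query per subset of the optional keywords (2^n queries in total),
     * each prefixed with the primary keyword. Assumes fewer than 31 optional keywords.
     */
    static List<String> buildQueries(String primary, List<String> optional) {
        List<String> queries = new ArrayList<>();
        int n = optional.size();
        // Each bit of mask decides whether the optional keyword at that index is included.
        for (int mask = 0; mask < (1 << n); mask++) {
            StringBuilder query = new StringBuilder(primary);
            for (int i = 0; i < n; i++) {
                if ((mask & (1 << i)) != 0) {
                    query.append('+').append(optional.get(i));
                }
            }
            queries.add(query.toString());
        }
        return queries;
    }
}
```

With three optional keywords this yields 8 queries, starting with the primary keyword alone and ending with all three appended. The queries come out in bitmask order rather than grouped by size, and the count doubles with each added optional keyword, which may be one reason to consider the tree structure mentioned in the note.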

ERROR 6048 --- AbstractElasticsearchRepository : failed to load elasticsearch nodes

2017-11-13 03:01:23.346 ERROR 6048 --- [ main] .d.e.r.s.AbstractElasticsearchRepository : failed to load elasticsearch nodes : org.elasticsearch.client.transport.NoNodeAvailableException: None of the configured nodes are available: [{#transport#-1}{AGbEwZZ-R-umZFtFDFqhnA}{127.0.0.1}{127.0.0.1:9200}]
2017-11-13 03:01:23.397 ERROR 6048 --- [ main] .d.e.r.s.AbstractElasticsearchRepository : failed to load elasticsearch nodes : org.elasticsearch.client.transport.NoNodeAvailableException: None of the configured nodes are available: [{#transport#-1}{AGbEwZZ-R-umZFtFDFqhnA}{127.0.0.1}{127.0.0.1:9200}]
2017-11-13 03:01:23.551 WARN 6048 --- [ main] ationConfigEmbeddedWebApplicationContext : Exception encountered during context initialization - cancelling refresh attempt: org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'requestMappingHandlerAdapter' defined in class path resource [org/springframework/boot/autoconfigure/web/WebMvcAutoConfiguration$EnableWebMvcConfiguration.class]: Bean instantiation via factory method failed; nested exception is org.springframework.beans.BeanInstantiationException: Failed to instantiate [org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter]: Factory method 'requestMappingHandlerAdapter' threw exception; nested exception is org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'mvcContentNegotiationManager' defined in class path resource [org/springframework/boot/autoconfigure/web/WebMvcAutoConfiguration$EnableWebMvcConfiguration.class]: Bean instantiation via factory method failed; nested exception is org.springframework.beans.BeanInstantiationException: Failed to instantiate [org.springframework.web.accept.ContentNegotiationManager]: Factory method 'mvcContentNegotiationManager' threw exception; nested exception is java.lang.AbstractMethodError

Next step

  • Save items from Paxxo in the repository along with their target URL
  • Index items by maker and model

20170917 Current Status

Add PaxxoApiClient

  • Crawl all sale items registered in Paxxo (DONE)

Add NaverCafeApiClient

  • Search Naver Cafe for items matching a given query (DONE)

  • Crawl Paxxo items by page link (URL) (DONE)

Next step
Add PaxxoSalesItemRepository (DONE)

  • Save item metadata such as maker-id and model-id
  • Save sales item data with its link
  • Fix version issue in Elasticsearch and spring-data #15

Add NaverCrawlingService

  • Crawl items from Naver Cafe and analyze those that are sales items.

Find the common type of sale items shared by Paxxo and Naver

[Spring & ES Integration Issue] ationConfigEmbeddedWebApplicationContext :

2017-11-19 18:48:03.781 WARN 4813 --- [ main] ationConfigEmbeddedWebApplicationContext : Exception encountered during context initialization - cancelling refresh attempt: org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'requestMappingHandlerAdapter' defined in class path resource [org/springframework/boot/autoconfigure/web/WebMvcAutoConfiguration$EnableWebMvcConfiguration.class]: Bean instantiation via factory method failed; nested exception is org.springframework.beans.BeanInstantiationException: Failed to instantiate [org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter]: Factory method 'requestMappingHandlerAdapter' threw exception; nested exception is org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'mvcContentNegotiationManager' defined in class path resource [org/springframework/boot/autoconfigure/web/WebMvcAutoConfiguration$EnableWebMvcConfiguration.class]: Bean instantiation via factory method failed; nested exception is org.springframework.beans.BeanInstantiationException: Failed to instantiate [org.springframework.web.accept.ContentNegotiationManager]: Factory method 'mvcContentNegotiationManager' threw exception; nested exception is java.lang.AbstractMethodError
