motorbike-crawler's Issues
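Results of converting the Naver Crawler to a non-blocking implementation
Implemented with Spring Reactor (related PR #30)
- Documents used to be fetched sequentially in a blocking manner; they are now fetched in a non-blocking manner.
- The Document crawling task is now subscribed to (Subscription), and the result is saved once the task completes.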
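The subscribe-then-save flow described above can be sketched with JDK classes only. This is not the Reactor code from PR #30: it swaps in `java.util.concurrent.SubmissionPublisher` (Java 9+) and `CompletableFuture` so the example stays self-contained. The class name `NonBlockingCrawlSketch`, the `fetcher` and `saver` parameters, and the sample URLs are all hypothetical.

```java
import java.util.List;
import java.util.Queue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.SubmissionPublisher;
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical sketch: fetch documents without blocking the caller and
// save each one from a subscriber once it has been published.
final class NonBlockingCrawlSketch {

    static CompletableFuture<Void> crawl(List<String> urls,
                                         Function<String, String> fetcher,
                                         Consumer<String> saver) {
        SubmissionPublisher<String> publisher = new SubmissionPublisher<>();
        // Subscribe first: the saver runs for every document that gets published.
        CompletableFuture<Void> allSaved = publisher.consume(saver);

        // Start every fetch right away; each result is published when it arrives.
        CompletableFuture<?>[] fetches = urls.stream()
                .map(url -> CompletableFuture
                        .supplyAsync(() -> fetcher.apply(url))
                        .thenAccept(publisher::submit))
                .toArray(CompletableFuture[]::new);

        // Close the publisher once all fetches have finished, so the subscriber completes.
        CompletableFuture.allOf(fetches).whenComplete((ignored, error) -> {
            if (error != null) {
                publisher.closeExceptionally(error);
            } else {
                publisher.close();
            }
        });
        return allSaved;
    }

    public static void main(String[] args) {
        Queue<String> saved = new ConcurrentLinkedQueue<>();
        // The fetcher here is a stand-in for a real HTTP call.
        crawl(List.of("page-1", "page-2", "page-3"), url -> "document for " + url, saved::add).join();
        System.out.println(saved.size() + " documents saved");
    }
}
```

The caller only blocks if it chooses to `join()` on the returned future; documents may be saved in a different order than the URLs were listed.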
PaxxoSaleItemCrawler: implement async connection
Request-response latency increases in proportion to pageNumber.
It would be good to send requests asynchronously and implement a Listener that handles each response as it arrives.
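One possible shape for such a listener, as a minimal sketch using only `CompletableFuture` from the JDK. The `PageResponseListener` interface, the `requestPages` method, and the `request` function are hypothetical names for illustration; they are not taken from PaxxoSaleItemCrawler.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.IntFunction;

// Hypothetical sketch: send all page requests up front and let a listener
// handle each response (or failure) whenever it arrives.
final class AsyncPageRequests {

    /** Hypothetical callback interface; not part of the project. */
    interface PageResponseListener {
        void onResponse(int pageNumber, String body);
        void onFailure(int pageNumber, Throwable error);
    }

    static CompletableFuture<Void> requestPages(int firstPage, int lastPage,
                                                IntFunction<String> request,
                                                PageResponseListener listener) {
        List<CompletableFuture<Void>> inFlight = new ArrayList<>();
        for (int page = firstPage; page <= lastPage; page++) {
            final int pageNumber = page;
            inFlight.add(CompletableFuture
                    .supplyAsync(() -> request.apply(pageNumber))
                    // handle() sees either the body or the error, never both.
                    .<Void>handle((body, error) -> {
                        if (error != null) {
                            listener.onFailure(pageNumber, error);
                        } else {
                            listener.onResponse(pageNumber, body);
                        }
                        return null;
                    }));
        }
        // Completes once every page has been reported to the listener.
        return CompletableFuture.allOf(inFlight.toArray(new CompletableFuture[0]));
    }

    public static void main(String[] args) {
        requestPages(1, 3, page -> "body of page " + page, new PageResponseListener() {
            @Override public void onResponse(int pageNumber, String body) {
                System.out.println("page " + pageNumber + ": " + body);
            }
            @Override public void onFailure(int pageNumber, Throwable error) {
                System.out.println("page " + pageNumber + " failed: " + error);
            }
        }).join();
    }
}
```

Because no request waits for the previous one, a slow page no longer delays the pages after it. The listener is called from pool threads, so it needs to be thread-safe.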
Paxxo URL redirection problem
...(omitted)&idx=682096&num=1
num (query parameter) indicates the position in the list where the bike information is rendered.
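To inspect `idx` and `num` separately, the query string can be split with the JDK alone. This is a sketch only: the host and path in the sample URL are placeholders (the real URL is elided above), the `QueryParams` class is hypothetical, and `URLDecoder.decode(String, Charset)` requires Java 10 or later.

```java
import java.net.URI;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: split a URL's query string into name/value pairs.
final class QueryParams {

    static Map<String, String> parse(String url) {
        Map<String, String> params = new LinkedHashMap<>();
        String query = URI.create(url).getRawQuery();
        if (query == null) {
            return params; // URL has no query string
        }
        for (String pair : query.split("&")) {
            if (pair.isEmpty()) {
                continue;
            }
            int eq = pair.indexOf('=');
            String name = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            params.put(URLDecoder.decode(name, StandardCharsets.UTF_8),
                       URLDecoder.decode(value, StandardCharsets.UTF_8));
        }
        return params;
    }

    public static void main(String[] args) {
        // Placeholder host and path; only idx and num come from the issue text.
        Map<String, String> params = parse("http://example.com/view?idx=682096&num=1");
        System.out.println("idx=" + params.get("idx") + ", num=" + params.get("num"));
    }
}
```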
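How should search queries be generated? (Query)
For each file, generate one query for every possible combination of the required words (Primary Keyword) with the optional words (Optional Keyword).
Example)
Primary Keyword: Seoul restaurants
Optional Keyword: {Korean, Chinese, Western}
Combined queries: {Seoul restaurants, Seoul restaurants+Korean, Seoul restaurants+Chinese, Seoul restaurants+Western, Seoul restaurants+Korean+Chinese, ...,
Seoul restaurants+Korean+Chinese+Western}
Note: organizing the word combinations as a tree structure could also be worthwhile.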
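The combination rule above amounts to enumerating every subset of the optional keywords and appending it to the primary keyword, which gives 2^n queries for n optional keywords. A minimal sketch, assuming `+` as the joiner as in the example; the `QueryGenerator` class and its size limit are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: primary keyword plus every subset of the optional keywords.
final class QueryGenerator {

    static List<String> generate(String primary, List<String> optional) {
        int n = optional.size();
        if (n > 20) {
            // 2^n queries grow quickly; the limit is an arbitrary safety guard.
            throw new IllegalArgumentException("too many optional keywords: " + n);
        }
        // Each bitmask selects one subset: bit i set means optional.get(i) is included.
        List<Integer> masks = new ArrayList<>();
        for (int mask = 0; mask < (1 << n); mask++) {
            masks.add(mask);
        }
        // Stable sort by subset size, so shorter queries come first as in the example.
        masks.sort(Comparator.comparingInt(Integer::bitCount));

        List<String> queries = new ArrayList<>();
        for (int mask : masks) {
            StringBuilder query = new StringBuilder(primary);
            for (int i = 0; i < n; i++) {
                if ((mask & (1 << i)) != 0) {
                    query.append('+').append(optional.get(i));
                }
            }
            queries.add(query.toString());
        }
        return queries;
    }

    public static void main(String[] args) {
        generate("Seoul restaurants", List.of("Korean", "Chinese", "Western"))
                .forEach(System.out::println);
    }
}
```

With three optional keywords this yields eight queries, from the bare primary keyword up to the one containing all three.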
ERROR 6048 --- AbstractElasticsearchRepository : failed to load elasticsearch nodes
2017-11-13 03:01:23.346 ERROR 6048 --- [ main] .d.e.r.s.AbstractElasticsearchRepository : failed to load elasticsearch nodes : org.elasticsearch.client.transport.NoNodeAvailableException: None of the configured nodes are available: [{#transport#-1}{AGbEwZZ-R-umZFtFDFqhnA}{127.0.0.1}{127.0.0.1:9200}]
2017-11-13 03:01:23.397 ERROR 6048 --- [ main] .d.e.r.s.AbstractElasticsearchRepository : failed to load elasticsearch nodes : org.elasticsearch.client.transport.NoNodeAvailableException: None of the configured nodes are available: [{#transport#-1}{AGbEwZZ-R-umZFtFDFqhnA}{127.0.0.1}{127.0.0.1:9200}]
2017-11-13 03:01:23.551 WARN 6048 --- [ main] ationConfigEmbeddedWebApplicationContext : Exception encountered during context initialization - cancelling refresh attempt: org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'requestMappingHandlerAdapter' defined in class path resource [org/springframework/boot/autoconfigure/web/WebMvcAutoConfiguration$EnableWebMvcConfiguration.class]: Bean instantiation via factory method failed; nested exception is org.springframework.beans.BeanInstantiationException: Failed to instantiate [org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter]: Factory method 'requestMappingHandlerAdapter' threw exception; nested exception is org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'mvcContentNegotiationManager' defined in class path resource [org/springframework/boot/autoconfigure/web/WebMvcAutoConfiguration$EnableWebMvcConfiguration.class]: Bean instantiation via factory method failed; nested exception is org.springframework.beans.BeanInstantiationException: Failed to instantiate [org.springframework.web.accept.ContentNegotiationManager]: Factory method 'mvcContentNegotiationManager' threw exception; nested exception is java.lang.AbstractMethodError
Next step
- Save items from Paxxo in the repository together with their target URL
- Index items by maker and model
20170917 Current Status
Add PaxxoApiClient
- Crawl all sale items registered in Paxxo (DONE)
Add NaverCafeApiClient
- Search Naver Cafe for items matching a given query (DONE)
- Crawl Paxxo items by page link (URL) (DONE)
Next step
Add PaxxoSalesItemRepository (Done)
- save meta data of items such as maker-id, model-id
- save sales item data with link
- Fix version issue in elasticsearch and spring-data #15
Add NaverCrawlingService
- Crawl items from Naver Cafe and analyze which of them are sales items.
Find the common types of sale items across Paxxo and Naver
[Spring & ES Integration Issue] elasticsearch 5.6
elasticsearch 5.6 is supported by spring-data-elasticsearch 3.0.3
spring-data-elasticsearch 3.0.3 is compatible with spring-boot 2.x.x
Check PR #27
spring data elasticsearch | elasticsearch |
---|---|
3.0.0.RC2 | 5.5.0 |
3.0.0.M4 | 5.4.0 |
2.0.4.RELEASE | 2.4.0 |
2.0.0.RELEASE | 2.2.0 |
1.4.0.M1 | 1.7.3 |
1.3.0.RELEASE | 1.5.2 |
1.2.0.RELEASE | 1.4.4 |
1.1.0.RELEASE | 1.3.2 |
1.0.0.RELEASE | 1.1.1 |
(ref: https://github.com/spring-projects/spring-data-elasticsearch)
[Spring & ES Integration Issue] ationConfigEmbeddedWebApplicationContext :
2017-11-19 18:48:03.781 WARN 4813 --- [ main] ationConfigEmbeddedWebApplicationContext : Exception encountered during context initialization - cancelling refresh attempt: org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'requestMappingHandlerAdapter' defined in class path resource [org/springframework/boot/autoconfigure/web/WebMvcAutoConfiguration$EnableWebMvcConfiguration.class]: Bean instantiation via factory method failed; nested exception is org.springframework.beans.BeanInstantiationException: Failed to instantiate [org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter]: Factory method 'requestMappingHandlerAdapter' threw exception; nested exception is org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'mvcContentNegotiationManager' defined in class path resource [org/springframework/boot/autoconfigure/web/WebMvcAutoConfiguration$EnableWebMvcConfiguration.class]: Bean instantiation via factory method failed; nested exception is org.springframework.beans.BeanInstantiationException: Failed to instantiate [org.springframework.web.accept.ContentNegotiationManager]: Factory method 'mvcContentNegotiationManager' threw exception; nested exception is java.lang.AbstractMethodError