higo's People

Contributors

muyannian


higo's Issues

Error when a query scans many partitions (e.g. a query spanning 6 months)

2013-03-20 09:59:22 SolrCore [ERROR] org.apache.solr.common.SolrException: java.lang.Exception: 10.246.45.43:51111/solr/rpt_p4padhoc_product
at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:294)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1495)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:358)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:254)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:326)
at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:945)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:756)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:218)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
Caused by: java.lang.Exception: 10.246.45.43:51111/solr/rpt_p4padhoc_product
at org.apache.solr.handler.component.HttpCommComponent$1.call(SearchHandler.java:461)
at org.apache.solr.handler.component.HttpCommComponent$1.call(SearchHandler.java:418)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)
Caused by: org.apache.solr.common.SolrException: java.lang.Exception: 10.246.45.22:51160/solr/rpt_p4padhoc_product@2012111
org.apache.solr.common.SolrException: java.lang.Exception: 10.246.45.22:51160/solr/rpt_p4padhoc_product@2012111
at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:294)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1495)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:358)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:254)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:326)
at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:945)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:756)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:218)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
Caused by: java.lang.Exception: 10.246.45.22:51160/solr/rpt_p4padhoc_product@2012111
at org.apache.solr.handler.component.HttpCommComponent$1.call(SearchHandler.java:461)
at org.apache.solr.handler.component.HttpCommComponent$1.call(SearchHandler.java:418)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441

fieldValueCache could be stored in separate chunks by DOCID

Previously the cache was stored per partition; consider splitting the storage by docid instead.

For example, DOCIDs 1–500,000 stored together, 500,000–1,000,000 stored together, and 1,000,000–1,500,000 stored together.

Whether this actually improves speed remains to be studied and tested further.
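A minimal sketch of the idea, assuming an in-memory map keyed by field and docid chunk (all class and method names here are illustrative, not higo's actual API):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a field-value cache keyed by docid range ("chunk")
// instead of by whole partition, so eviction can drop one chunk at a time.
// CHUNK_SIZE and all names below are illustrative assumptions.
public class ChunkedFieldValueCache {
    static final int CHUNK_SIZE = 500_000; // e.g. docids 0-499999 share a chunk

    private final Map<String, String[]> chunks = new HashMap<>();

    private static String key(String field, int docid) {
        return field + "@" + (docid / CHUNK_SIZE);
    }

    public void put(String field, int docid, String value) {
        String[] chunk = chunks.computeIfAbsent(key(field, docid),
                k -> new String[CHUNK_SIZE]);
        chunk[docid % CHUNK_SIZE] = value;
    }

    public String get(String field, int docid) {
        String[] chunk = chunks.get(key(field, docid));
        return chunk == null ? null : chunk[docid % CHUNK_SIZE];
    }

    // Evicting one docid range no longer invalidates the whole partition.
    public void evictChunk(String field, int chunkIndex) {
        chunks.remove(field + "@" + chunkIndex);
    }
}
```

Because each chunk covers a fixed docid range, an LRU built on top of this could evict one range at a time instead of a whole partition's cache.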

Rebuilding the index after data changes

Bug fix: 3 to 7 days after the ad hoc data is produced, it is re-cleaned to backfill records, and the index must be rebuilt as well (this mechanism was lost when partitioning was added). As a result, the statistics for the last day of each early/mid/late ten-day period of the month disagree with Hive.

Partition support for day and month

Partitioning mechanism: by default higo only partitions each month into early/mid/late ten-day periods. Add support for partitioning by day and by month as well.

Offline download column headers should be displayed in Chinese

子落, has this been discussed with 行咧?
Can it be displayed in Chinese?
子落 (15:33:53):
Yes, it can.

It has already been sent to me; I'll handle it today or tomorrow.

张壮 (15:34:06):
OK

How can the cluster scale to 1,000 machines?

higo itself is built on Blue Whale (bluewhale, a Java port of Storm). For both Blue Whale and Storm, the largest clusters I have seen are around 20 to 30 machines; with frequent heartbeats, ZooKeeper and the Nimbus scheduler also become bottlenecks. Although I have not tested it, I suspect a single cluster cannot reach a thousand machines.

Hadoop YARN's "HDFS federation" gave me an idea: why must there be only one cluster? We could create many small clusters, say 50 small clusters of 20 machines each, with one master controller managing the state of all the small clusters.

At higo's current rate of 6 shards per machine, each small cluster holds 120 shards, and all the small clusters together hold 6,000 shards.

At query time, based on the shards a query touches, tasks are dispatched to the different small clusters, and each small cluster's results are merged one more level up (higo already supports multi-level merging, so this is not a problem).

The key point is that building this on top of higo requires only small changes and is easy to implement.

worker-6601.log:2013-03-25 09:52:34 SolrCore [ERROR] facet_counts is null 10.246.45.44:54363/solr/rpt_p4padhoc_product

[taobao@adhoc7 logs]$ grep ERROR worker-6*
worker-6601.log:2013-03-25 09:52:34 SolrCore [ERROR] facet_counts is null 10.246.45.44:54363/solr/rpt_p4padhoc_product
worker-6601.log:2013-03-25 09:52:34 SolrCore [ERROR] facet_counts is null 172.24.195.154:51114/solr/rpt_p4padhoc_product
worker-6601.log:2013-03-25 09:54:55 SolrCore [ERROR] facet_counts is null 10.246.45.42:51112/solr/rpt_p4padhoc_product
worker-6601.log:2013-03-25 09:54:55 SolrCore [ERROR] facet_counts is null 10.246.45.22:23913/solr/rpt_p4padhoc_product
worker-6601.log:2013-03-25 09:55:15 SolrCore [ERROR] facet_counts is null 10.246.45.41:51113/solr/rpt_p4padhoc_product
worker-6601.log:2013-03-25 09:55:15 SolrCore [ERROR] facet_counts is null 10.246.45.21:51117/solr/rpt_p4padhoc_product
worker-6601.log:2013-03-25 09:55:35 SolrCore [ERROR] facet_counts is null 172.24.195.154:51114/solr/rpt_p4padhoc_product
worker-6601.log:2013-03-25 09:55:35 SolrCore [ERROR] facet_counts is null 10.246.45.21:51117/solr/rpt_p4padhoc_product
worker-6601.log:2013-03-25 09:57:11 SolrCore [ERROR] facet_counts is null 172.24.195.154:51114/solr/rpt_p4padhoc_product
worker-6601.log:2013-03-25 09:57:11 SolrCore [ERROR] facet_counts is null 10.246.45.42:51112/solr/rpt_p4padhoc_product
worker-6601.log:2013-03-25 09:59:24 SolrCore [ERROR] facet_counts is null 10.246.45.24:51119/solr/rpt_p4padhoc_product
worker-6601.log:2013-03-25 10:00:06 SolrCore [ERROR] facet_counts is null 10.246.45.43:51118/solr/rpt_p4padhoc_product
worker-6601.log:2013-03-25 10:03:06 SolrCore [ERROR] facet_counts is null 10.246.45.21:51117/solr/rpt_p4padhoc_product
worker-6601.log:2013-03-25 10:03:48 SolrCore [ERROR] facet_counts is null 10.246.45.44:54363/solr/rpt_p4padhoc_product
worker-6601.log:2013-03-25 10:05:05 SolrCore [ERROR] facet_counts is null 172.24.195.154:51114/solr/rpt_p4padhoc_product
worker-6601.log:2013-03-25 10:08:16 SolrCore [ERROR] facet_counts is null 172.24.195.154:51114/solr/rpt_p4padhoc_product
worker-6601.log:2013-03-25 10:09:57 SolrCore [ERROR] facet_counts is null 172.24.195.154:51114/solr/rpt_p4padhoc_product
worker-6601.log:2013-03-25 10:09:57 SolrCore [ERROR] facet_counts is null 10.246.45.21:51117/solr/rpt_p4padhoc_product
worker-6601.log:2013-03-25 10:14:25 SolrCore [ERROR] facet_counts is null 10.246.45.43:51118/solr/rpt_p4padhoc_product
worker-6601.log:2013-03-25 10:16:26 SolrCore [ERROR] facet_counts is null 10.246.45.44:54363/solr/rpt_p4padhoc_product
worker-6601.log:2013-03-25 10:16:26 SolrCore [ERROR] facet_counts is null 10.246.45.41:51113/solr/rpt_p4padhoc_product

mergeIds response is null

ust/index]



2013-03-25 14:01:40 SolrCore [ERROR] mergeIds response is null 172.24.195.154:51276/solr/rpt_p4padhoc_product
java.lang.Exception
at org.apache.solr.handler.component.QueryComponent.mergeIds(QueryComponent.java:829)
at org.apache.solr.handler.component.QueryComponent.handleRegularResponses(QueryComponent.java:627)
at org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:606)
at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:303)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1495)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:405)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:307)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:326)
at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:945)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:756)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:218)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)
2013-03-25 14:01:40 SolrCore [ERROR] facet_counts is null 172.24.195.154:51276/solr/rpt_p4padhoc_product
java.lang.Exception
at org.apache.solr.handler.component.FacetComponent.countFacets(FacetComponent.java:324)
at org.apache.solr.handler.component.FacetComponent.handleResponses(FacetComponent.java:286)
at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:303)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1495)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:405)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:307)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:326)
at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:945)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:756)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:218)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)
2013-03-25 14:01:40 SolrCore [INFO] [rpt_p4padhoc_pr

Errors from FindSegmentsFile and LinkFSDirectory.readOnlyOpen

2013-03-20 12:59:36 SolrCore [INFO] facet read fail from file 'thedate'
2013-03-20 12:59:36 SolrCore [INFO] getSearcher:rpt_p4padhoc_cust@2012113@1363755480106:/disk7/taobao/bluewhile/higo/adhoc/17_16/tablelist/rpt_p4padhoc_cust/solr/data/2012113
2013-03-20 12:59:36 SolrCore [ERROR] org.apache.lucene.util.ThreadInterruptedException: java.lang.InterruptedException: sleep interrupted
at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:730)
at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:75)
at org.apache.lucene.index.IndexReader.open(IndexReader.java:462)
at org.apache.lucene.index.IndexReader.open(IndexReader.java:405)
at org.apache.solr.core.StandardIndexReaderFactory.newReader(StandardIndexReaderFactory.java:38)
at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1044)
at org.apache.solr.request.SolrQueryRequestBase.getSearcher(SolrQueryRequestBase.java:219)
at org.apache.solr.request.SimpleFacets.<init>(SimpleFacets.java:90)
at org.apache.solr.handler.component.FacetComponent.process(FacetComponent.java:70)
at org.apache.solr.handler.component.SearchHandler.handleRequestBody

41:51112/solr/rpt_p4padhoc_cust,10.246.45.22:48680/solr/rpt_p4padhoc_cust,10.246.45.23:51111/solr/rpt_p4padhoc_cust,&isShard=true&fsv=true&fq=thedate:[20121001+TO+20130318]} hits=186846 status=0 QTime=8630
2013-03-20 13:00:53 SolrQueryRequestBase [INFO] ref close rpt_p4padhoc_cust@2013032,0
2013-03-20 13:00:53 SolrCore [ERROR] null:org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: NativeFSLock@/disk7/taobao/bluewhile/higo/adhoc/17_16/tablelist/rpt_p4padhoc_cust/solr/data/2013023/workerspace/write.lock
at org.apache.lucene.store.Lock.obtain(Lock.java:84)
at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:1108)
at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:989)
at org.apache.lucene.store.LinkFSDirectory.readOnlyOpen(LinkFSDirectory.java:170)
at org.apache.lucene.store.LinkFSDirectory.readOnlyOpen(LinkFSDirectory.java:185)
at org.apache.solr.core.LinksStandardDirectoryFactory.open(LinksStandardDirectoryFactory.java:33)
at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1043)
at org.apache.solr.request.SolrQueryRequestBase.getSearcher(SolrQueryRequestBase.java:219)
at org.apache.solr.request.SimpleFacets.<init>(SimpleFacets.java:90)
at org.apache.solr.handler.component.FacetComponent.process(FacetComponent.java:70)
at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:196)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1495)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:358)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:254)

Memory management improvements

Currently higo uses memory in large blocks (about 40–50 MB per field). When a query covers a large time range or many columns, the memory cap forces constant LRU eviction of expired data from memory.
Evicted data is then reclaimed by GC, which means full GCs occur, and the program pauses during a full GC.

The new idea is to work like memcache: pre-allocate fixed-size memory blocks; each time one is needed, take a block from the pool and mark it in use; when it is no longer needed, return it and mark it free, so it can be reused by other objects.

However, if the program hits an exception, the return step may never run, so a WeakHashMap is still used as a safety net for reclamation.

Pre-allocating fixed-size blocks wastes some memory, but it reduces the number of full GCs.
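The proposed pool could be sketched as follows (a hedged illustration, not higo's implementation; block count and size are arbitrary). The WeakHashMap-backed in-use set is the safety net: if release() is never called, the block becomes weakly reachable once its user drops it, and GC can reclaim it anyway.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Set;
import java.util.WeakHashMap;

// Illustrative memcache-style pool: fixed-size blocks are pre-allocated,
// handed out on acquire(), and returned on release(). Blocks that are never
// released are only weakly referenced here, so GC can still reclaim them.
public class BlockPool {
    private final ArrayDeque<byte[]> free = new ArrayDeque<>();
    private final Set<byte[]> inUse =
            Collections.newSetFromMap(new WeakHashMap<byte[], Boolean>());

    public BlockPool(int blocks, int blockSize) {
        for (int i = 0; i < blocks; i++) free.push(new byte[blockSize]);
    }

    public synchronized byte[] acquire() {
        byte[] b = free.poll();          // reuse a pre-allocated block
        if (b == null) return null;      // pool exhausted; caller must wait or evict
        inUse.add(b);
        return b;
    }

    public synchronized void release(byte[] b) {
        if (inUse.remove(b)) free.push(b); // back to the free list, no GC churn
    }

    public synchronized int freeBlocks() { return free.size(); }
}
```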

2013-03-21 21:33:52 SolrDispatchFilter [ERROR] java.lang.NullPointerException

2013-03-21 21:33:52 SolrDispatchFilter [ERROR] java.lang.NullPointerException
at org.apache.solr.handler.component.QueryComponent.mergeIds(QueryComponent.java:777)
at org.apache.solr.handler.component.QueryComponent.handleRegularResponses(QueryComponent.java:626)
at org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:605)
at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:302)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1495)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:404)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:296)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:326)
at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:945)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:756)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:218)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)

Data that should not be escaped

For some reason, data was escaped on its way from Hive into higo, which is incorrect.
In principle higo should not transform the raw data, so this escaping must be removed; otherwise it is easy for users to be misled.

Speed optimization

This round of testing exposed many problems; many fields had not been covered by my earlier tests. In summary:

  1. Count has huge room for optimization.
  2. With only 2 machines, memory was scarce; now with 10 machines there is plenty to spare. For numeric computations and all dist computations,
    we can stop using the old docid -> termNum -> (keyframe-style compression, as in video) -> termValue conversion
    and instead convert directly docid -> termNum -> termValue. Dropping the keyframe step speeds up dist, sum, etc. far more than 2x for low-duplication fields such as creativeid.
  3. Disk space is plentiful, so frq files will no longer be zip-compressed; testing showed high CPU usage, caused mainly by unzipping the frq files.
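Item 2's direct conversion amounts to two plain array reads; a minimal sketch, where the arrays are illustrative stand-ins for higo's actual structures:

```java
// Sketch of the direct docid -> termNum -> termValue lookup: with enough
// memory, the keyframe decompression step is dropped and a field value is
// resolved with two array reads. Arrays here are illustrative assumptions.
public class DirectTermLookup {
    private final int[] docToTermNum;   // docid -> term ordinal
    private final String[] termValues;  // term ordinal -> term value

    public DirectTermLookup(int[] docToTermNum, String[] termValues) {
        this.docToTermNum = docToTermNum;
        this.termValues = termValues;
    }

    // No intermediate decompression: just two indexed reads.
    public String value(int docid) {
        return termValues[docToTermNum[docid]];
    }
}
```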

Map-side join

Allow higo's large tables to do a map-side join against small tables, where the join data set is under one million rows.
Only one-to-one and one-to-many joins are supported, not many-to-many.
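A hedged sketch of the mechanism (all names are illustrative): the small table is loaded into a hash map keyed by the join column, so each large-table row joins locally with no shuffle. Keeping a list per key covers the one-to-many case; the large side is only streamed, never materialized, which is why many-to-many is excluded.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Illustrative map-side hash join: small table (< 1M rows) in memory,
// large table streamed row by row. Inner-join semantics: unmatched rows drop.
public class MapSideJoin {
    public static List<String> join(Map<String, List<String>> smallTable,
                                    List<String[]> largeRows) {
        List<String> out = new ArrayList<>();
        for (String[] row : largeRows) {          // row[0] = join key, row[1] = payload
            List<String> matches = smallTable.get(row[0]);
            if (matches == null) continue;        // no match in the small table
            for (String m : matches) out.add(row[0] + "," + row[1] + "," + m);
        }
        return out;
    }
}
```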

2013-03-20 14:45:05 SolrDispatchFilter [ERROR] org.mortbay.jetty.EofException

olr/data/2012102/workerspace_2@_0.tii
2013-03-20 14:45:05 SolrIndexSearcher [INFO] Opening Searcher@33b7b32c partion_rpt_p4padhoc_product@2012102@1363761822038
2013-03-20 14:45:05 SolrCore [INFO] ref clearpartion rpt_p4padhoc_product@2013032,1
2013-03-20 14:45:05 SolrCore [INFO] ref clearpartion rpt_p4padhoc_product@2012111,1
2013-03-20 14:45:05 SolrCore [INFO] BlockBufferInput close /disk6/taobao/bluewhile/higo/adhoc/16_15/tablelist/rpt_p4padhoc_product/solr/data/2012112/workerspace_2@_0.tis
2013-03-20 14:45:05 SolrCore [INFO] SolrIndexSearcher clear:rpt_p4padhoc_product@2012112@1363761822038
2013-03-20 14:45:05 SolrQueryRequestBase [INFO] ref create rpt_p4padhoc_product@2012102,1
2013-03-20 14:45:05 SolrCore [INFO] ref clearpartion rpt_p4padhoc_product@2013032,1
2013-03-20 14:45:05 SolrCore [INFO] ref clearpartion rpt_p4padhoc_product@2012111,1
2013-03-20 14:45:05 SolrCore [INFO] getSearcher:rpt_p4padhoc_product@2012122@1363761822038:/disk6/taobao/bluewhile/higo/adhoc/16_15/tablelist/rpt_p4padhoc_product/solr/data/2012122
2013-03-20 14:45:05 SolrDispatchFilter [ERROR] org.mortbay.jetty.EofException
at org.mortbay.jetty.AbstractGenerator$Output.write(AbstractGenerator.java:634)
at org.mortbay.jetty.AbstractGenerator$Output.write(AbstractGenerator.java:580)
at org.apache.solr.common.util.FastOutputStream.flushBuffer(FastOutputStream.java:184)
at org.apache.solr.common.util.JavaBinCodec.marshal(JavaBinCodec.java:89)
at org.apache.solr.response.BinaryResponseWriter.write(BinaryResponseWriter.java:47)
at org.apache.solr.servlet.SolrDispatchFilter.writeResponse(SolrDispatchFilter.java:338)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:267)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:326)
at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:945)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:756)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:218)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)

org.apache.jasper.JasperException: PWC6117: File "/tablelist.jsp" not found

After the web UI has been running for a while, it reports that the JSP file cannot be found.

Cause: the OS itself periodically cleans the tmp directory. Since Jetty runs from a war package, a dedicated directory must be specified:

-Djava.io.tmpdir=/tmp

Change it to some other folder since the OS will delete files in /tmp after a period of time.

Further reading:
http://stackoverflow.com/questions/7124571/my-jetty-server-will-dead-after-a-long-time-why
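A hedged sketch of the fix (the directory path and start command are assumptions for illustration): point java.io.tmpdir at a directory the OS tmp cleaner will not purge.

```shell
# Give Jetty its own work directory instead of the default /tmp,
# so the OS tmp cleaner cannot delete the exploded war under it.
mkdir -p /home/taobao/higo/jetty-tmp
java -Djava.io.tmpdir=/home/taobao/higo/jetty-tmp -jar start.jar
```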

Full error message

HTTP ERROR 500

Problem accessing /higo/tablelist.jsp. Reason:

PWC6117: File "/tablelist.jsp" not found

Caused by:

org.apache.jasper.JasperException: PWC6117: File "/tablelist.jsp" not found
at org.apache.jasper.compiler.DefaultErrorHandler.jspError(DefaultErrorHandler.java:73)
at org.apache.jasper.compiler.ErrorDispatcher.dispatch(ErrorDispatcher.java:359)
at org.apache.jasper.compiler.ErrorDispatcher.jspError(ErrorDispatcher.java:153)
at org.apache.jasper.compiler.JspUtil.getInputStream(JspUtil.java:894)
at org.apache.jasper.xmlparser.XMLEncodingDetector.getEncoding(XMLEncodingDetector.java:127)
at org.apache.jasper.compiler.ParserController.determineSyntaxAndEncoding(ParserController.java:360)
at org.apache.jasper.compiler.ParserController.doParse(ParserController.java:194)
at org.apache.jasper.compiler.ParserController.parse(ParserController.java:124)
at org.apache.jasper.compiler.Compiler.generateJava(Compiler.java:184)
at org.apache.jasper.compiler.Compiler.compile(Compiler.java:409)
at org.apache.jasper.JspCompilationContext.compile(JspCompilationContext.java:592)
at org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:344)
at org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:470)
at org.apache.jasper.servlet.JspServlet.service(JspServlet.java:364)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:401)
at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:326)
at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)

Powered by Jetty://

Ad hoc offline download: sorting error

select thedate,category_level1_name,sum(e_alipay_direct_cnt),sum(e_alipay_direct_amt) from rpt_p4padhoc_auction where thedate=20130325 group by thedate,category_level1_name order by sum(e_alipay_direct_amt) desc

org.apache.solr.client.solrj.SolrServerException: java.net.SocketTimeoutException: Read timed out

2013-03-20 14:38:26 SolrStartTable [ERROR] org.apache.solr.client.solrj.SolrServerException: java.net.SocketTimeoutException: Read timed out
at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:480)
at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:246)
at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:89)
at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:266)
at org.apache.solr.client.solrj.embedded.JettySolrRunner.checkSolrRecord(JettySolrRunner.java:218)
at com.alipay.bluewhale.core.higo.SolrStartJetty.checkSolr(SolrStartJetty.java:419)
at com.alipay.bluewhale.core.higo.SolrStartTable.checkSolr(SolrStartTable.java:453)
at com.alipay.bluewhale.core.higo.SolrStartTable.heartbeatExecute(SolrStartTable.java:401)
at com.alipay.bluewhale.core.higo.SolrStartTable.heartbeat(SolrStartTable.java:390)
at com.alipay.bluewhale.core.higo.SolrStart.heartbeat(SolrStart.java:90)
at com.alipay.bluewhale.core.higo.ShardsBolt.execute(ShardsBolt.java:86)
at com.alipay.bluewhale.core.task.executer.BoltExecutors.run(BoltExecutors.java:104)
at com.alipay.bluewhale.core.utils.AsyncLoopRunnable.run(AsyncLoopRunnable.java:54)
at java.lang.Thread.run(Thread.java:619)
Caused by: java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:129)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
at org.apache.commons.httpclient.HttpParser.readRawLine(HttpParser.java:77)
at org.apache.commons.httpclient.HttpParser.readLine(HttpParser.java:105)
at org.apache.commons.httpclient.HttpConnection.readLine(HttpConnection.java:1115)
at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$HttpConnectionAdapter.readLine(MultiThreadedHttpConnectionManager.java:1373)
at org.apache.commons.httpclient.HttpMethodBase.readStatusLine(HttpMethodBase.java:1832)
at org.apache.commons.httpclient.HttpMethodBase.readResponse(HttpMethodBase.java:1590)
at org.apache.commons.httpclient.HttpMethodBase.execute(HttpMethodBase.java:995)
at org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:397)
at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:170)
at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:396)
at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:324)
at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:424)
... 13 more

at org.apache.lucene.store.Lock.obtain(Lock.java:84)

r/data/2012103/workerspace_4@_0.tii 4
2013-03-20 14:45:33 ReplicationHandler [WARN] Unable to get IndexCommit on startup
org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: SimpleFSLock@/disk6/taobao/bluewhile/higo/adhoc/16_15/tablelist/rpt_hitfake_auctionall_d/solr/data/index/lucene-1fef2e39-write.lock
at org.apache.lucene.store.Lock.obtain(Lock.java:84)
at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:1108)
at org.apache.solr.update.SolrIndexWriter.<init>(SolrIndexWriter.java:83)
at org.apache.solr.update.UpdateHandler.createMainIndexWriter(UpdateHandler.java:101)
at org.apache.solr.update.DirectUpdateHandler2.openWriter(DirectUpdateHandler2.java:171)
at org.apache.solr.update.DirectUpdateHandler2.forceOpenWriter(DirectUpdateHandler2.java:375)
at org.apache.solr.handler.ReplicationHandler.inform(ReplicationHandler.java:858)
at org.apache.solr.core.SolrResourceLoader.inform(SolrResourceLoader.java:523)
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:599)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:470)
at org.apache.solr.core.CoreContainer.createTableCore(CoreContainer.java:330)
at org.apache.solr.core.CoreContainer.getCore(CoreContainer.java:598)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:195)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:326)
at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)
2013-03-20 14:45:33 ReplicationHandler [INFO] Commits will be reserved for 10000

Second-pass computation

Save the result of the first query so it can be analyzed in a second pass.

Typical use case:
first, sum by some fields:
select a,b,c,sum(xxx) from tbl group by a,b,c

then bucket the summed values into ranges and count the number of groups in each range.
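The second pass could be sketched as follows, assuming the first query's result has been saved as a map from group key to sum (the bucket width is an illustrative parameter):

```java
import java.util.Map;
import java.util.TreeMap;

// Sketch of the second pass: given group -> sum from the saved first-pass
// result, bucket the sums into fixed-width ranges and count groups per range.
public class SecondPass {
    public static Map<Long, Integer> bucketCounts(Map<String, Long> groupSums,
                                                  long bucketWidth) {
        Map<Long, Integer> counts = new TreeMap<>();
        for (long sum : groupSums.values()) {
            long bucket = sum / bucketWidth;     // e.g. [0,100), [100,200), ...
            counts.merge(bucket, 1, Integer::sum);
        }
        return counts;
    }
}
```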

Remove the memory-driven limit on how much data a single higo scan may cover

Previously higo had every partition concurrently request each shard. With limited machine resources,
too many partitions meant a flood of HTTP requests and excessive pressure on the merger server.
For that reason, on the ad hoc project a single higo scan has always been capped at 1 billion rows, which clearly cannot satisfy some needs; hence this improvement.

The current approach is to submit in multiple rounds, each covering only a fixed number of partitions (say, 4 partitions per round). After each shard finishes computing, it dumps its data to HDFS.
Finally, a merge job is submitted (with parallelism determined by the number of hash buckets) that merges all the data dumped to HDFS.
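The batching itself can be sketched as follows (a simplified illustration: higo dumps per-round results to HDFS, while this sketch only shows how partitions are split into fixed-size rounds for submission):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the batched scan: instead of querying all partitions at once,
// partitions are submitted in fixed-size rounds (4 per round here) and the
// per-round results are merged afterwards.
public class BatchedScan {
    static final int PARTITIONS_PER_ROUND = 4;

    public static List<List<String>> rounds(List<String> partitions) {
        List<List<String>> rounds = new ArrayList<>();
        for (int i = 0; i < partitions.size(); i += PARTITIONS_PER_ROUND) {
            rounds.add(partitions.subList(i,
                    Math.min(i + PARTITIONS_PER_ROUND, partitions.size())));
        }
        return rounds;
    }
}
```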

distinct implementation

1. Convert each value to an integer, using md5 or crc32, but the hash must spread the data evenly.
2. Use a bitset; each bit marks whether a value is present.
3. If the data volume is extremely large, say hundreds of billions, use a partial bitset to estimate the whole.
For details, see the slides (ppt download).

Suppose we use a bitset of 1 billion bits.
We actually store only 1% of it, i.e. 10 million bits.
Because the data is spread evenly, this 1% is as sparse as the other 99%,
so the final count is simply the count within the 1% multiplied by 100.

But for fields with very high duplication, such as category, we do not estimate from a sample; we compute the exact value.
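A hedged sketch of steps 1–3 combined (the space and sample sizes are scaled-down illustrative numbers, not higo's actual configuration):

```java
import java.util.BitSet;
import java.util.zip.CRC32;

// Sampled-bitset distinct count: values are hashed with CRC32 into a virtual
// space of SPACE buckets; only bits in the first 1% of the space (SAMPLE
// buckets) are stored, and the observed count is scaled back up by 100.
public class SampledDistinct {
    static final long SPACE = 10_000_000L;   // full (virtual) bitset size
    static final long SAMPLE = SPACE / 100;  // the 1% of bits we actually store

    private final BitSet bits = new BitSet((int) SAMPLE);

    public void add(String value) {
        CRC32 crc = new CRC32();
        crc.update(value.getBytes());
        long bucket = crc.getValue() % SPACE; // relies on a uniform hash spread
        if (bucket < SAMPLE) bits.set((int) bucket);
    }

    // estimate = distinct bits seen in the 1% sample, multiplied by 100
    public long estimate() {
        return bits.cardinality() * 100L;
    }
}
```

Feeding in N evenly hashed distinct values, estimate() returns roughly N, since the 1% sample sees about N/100 of them.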

pt=20130401000000

Because of the 网销宝 integration, directories in the pt=20130401000000 format must also be supported.

Detail query: timestamp display bug

Tue May 01 08:21:39 CST 2007 Wed Nov 25 03:27:45 CST 2009
2013-04-05 Sat Aug 09 15:16:39 CST 2008 Thu Jun 10 02:32:15 CST 2010
2013-04-05 Mon Apr 16 05:16:02 CST 2007 Sun Apr 18 06:30:01 CST 2010
2013-04-05 Sat Jun 11 20:14:29 CST 2005 Wed Nov 25 04:32:03 CST 2009
2013-04-05 Wed Sep 17 00:25:02 CST 2008 Fri Sep 02 05:49:40 CST 2011
2013-04-05 Wed Apr 29 06:36:33 CST 2009 Sun Jul 18 20:45:26 CST 2010
2013-04-05 Thu Dec 25 23:02:52 CST 2008 Mon Jun 07 19:01:57 CST 2010
2013-04-05 Tue Aug 18 20:40:11 CST 2009 Mon Mar 29 06:55:06 CST 2010
2013-04-05 Wed Oct 24 23:08:49 CST 2007 Tue Dec 15 07:07:12 CST 2009
2013-04-05 Sat Jan 03 21:03:01 CST 2009 Mon Dec 07 04:24:30 CST 2009
2013-04-05 Fri Oct 26 23:48:36 CST 2007 Wed Mar 09 23:32:49 CST 2011
2013-04-05 Wed Feb 07 21:40:47 CST 2007 Mon Dec 07 20:51:52 CST 2009
2013-04-05 Wed Oct 14 08:17:24 CST 2009 Wed Jun 09 06:59:19 CST 2010

group by without a limit on the number of groups

Currently higo's group by sort requires the number of groups to be under ten thousand, which is far too few.
This iteration should raise that limit: support one million groups without degrading performance, and, with minute-level response times, support tens of millions or even hundreds of millions of groups.
