hanc00l / wooyun_public
This repo is archived. Thanks for wooyun! Crawler and full-text search for wooyun.org public bugs (vulnerabilities) and drops (knowledge-base articles).
Home Page: http://www.wooyun.org
unzip elasticsearch-analysis-ik-1.9.4.zip
This step should instead be:
unzip elasticsearch-analysis-ik-1.9.4.zip -d elasticsearch-analysis-ik
Otherwise the archive spills a pile of loose files into the current directory.
I tried upgrading in place, but it got stuck at the data-import step. I had already raised the timeout and let it run for two hours once and six hours another time, and the import never finished, so I downloaded the VM again. Not sure if the problem is on my end; it runs on a server with 8 cores and 8 GB of RAM, so performance should not be the bottleneck.
With the freshly downloaded VM, elasticsearch complained on startup and refused to run as root. After changing ownership of the whole elasticsearch folder to the hancool user, it runs fine.
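A minimal sketch of that fix, assuming the elasticsearch tree lives under ~/wooyun_public as in this VM (adjust paths and the user name to your setup):

```shell
# Elasticsearch refuses to start as root, so hand the install
# directory to the unprivileged user and launch it from there.
sudo chown -R hancool:hancool ~/wooyun_public/elasticsearch-2.3.4
su - hancool
cd ~/wooyun_public/elasticsearch-2.3.4
./bin/elasticsearch -d   # -d runs it as a daemon
```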
As the title says.
The netdisk link for the public-vulnerability memorial edition is dead.
I searched for the related vendors and for my own id: none of the reports I submitted show up, and full-text search only finds a few posts I commented on.
flask's built-in web server is adequate for simple, small-scale use. If you need more, use tornado/app.py instead: first run sudo pip install tornado to install the dependency, then run app.py from the tornado directory and you're done.
The Baidu netdisk download links for the VM have all expired. Could you update them? Thanks!
When I cd wooyun_public and then cd elasticsearch-2.3.4/bin, I get an error.
In NAT mode the two machines can't reach each other, and after switching to bridged mode it still doesn't work. Could you walk me through it?
In use, the whole thing feels a bit heavyweight because it depends on a VM. Are there any plans for a Docker version?
Exception happened during processing of request from ('172.16.80.1', 52437)
Traceback (most recent call last):
File "/usr/lib/python2.7/SocketServer.py", line 593, in process_request_thread
self.finish_request(request, client_address)
File "/usr/lib/python2.7/SocketServer.py", line 334, in finish_request
self.RequestHandlerClass(request, client_address, self)
File "/usr/lib/python2.7/SocketServer.py", line 651, in __init__
self.finish()
File "/usr/lib/python2.7/SocketServer.py", line 710, in finish
self.wfile.close()
File "/usr/lib/python2.7/socket.py", line 279, in close
self.flush()
File "/usr/lib/python2.7/socket.py", line 303, in flush
self._sock.sendall(view[write_offset:write_offset+buffer_size])
The cause of this error: when the server takes a while to handle a request and the client drops the connection before processing finishes (for example, by closing the browser), the flask server flushes the response without checking the connection state, which raises this exception. It only affects that single request on the server side; other concurrent requests are unaffected.
Anyone interested can read the two stackoverflow threads linked below. flask is a web framework recommended for small products or development environments; under high concurrency it may not be a good fit and another web framework should be used.
Following the advice on stackoverflow, you can run with app.run(threaded=True), but that does not fundamentally prevent the Broken pipe exception.
http://stackoverflow.com/questions/12591760/flask-broken-pipe-with-requests
http://stackoverflow.com/questions/31265050/how-to-make-an-exception-for-broken-pipe-errors-on-flask-when-the-client-discon
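One way to tolerate this kind of disconnect at the socket layer can be sketched with Python's stdlib socketserver; the handler and the "hello" payload here are illustrative, not from the repo:

```python
import socket
import socketserver

class RobustHandler(socketserver.StreamRequestHandler):
    """Handler that tolerates a client hanging up mid-response."""

    def handle(self):
        try:
            self.wfile.write(b"hello\n")
        except (BrokenPipeError, ConnectionResetError):
            # The client dropped the connection before the response
            # was flushed; note it and move on instead of crashing.
            self.client_went_away = True

# Simulate a browser that closes the connection immediately.
server_side, client_side = socket.socketpair()
client_side.close()
handler = RobustHandler(server_side, ("client", 0), None)  # no traceback
server_side.close()
```

The point is the same as in the stackoverflow threads: treat a broken pipe as a per-request event to swallow or log, not a server-level failure.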
Same as above.
"VM 1: all WooYun vulnerability-library and knowledge-base content crawled at the end of June 2016"
That doesn't match the description in the readme.
mongo also holds only a little data: 100-odd records plus 20.
As the title says.
Ideally it would suppress stdout output but still keep a log, the way apache does.
Help: after building with the VM and exposing the query service to the public network, image URLs point to the VM's local address, so images fail to load.
For example, I want to browse the vulnerabilities submitted by 猪猪侠, but there is only a search function. How do I find them?
mysql takes only 2s for this... hoping it can be improved.
I followed the steps in the markdown instructions; every item configured successfully with no errors along the way. But searches only show articles and vulnerabilities from before 2015. After killing the elasticsearch process, search gets slower but does return 2016 content. Did something go wrong while syncing the elasticsearch data? How can I resync it?
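One way to force a full resync, sketched under the assumptions that the index is named wooyun (as in the curl command elsewhere in these issues) and that oplog.timestamp sits in the directory where mongo-connector was started:

```shell
# Drop the elasticsearch index so it can be rebuilt from mongodb.
curl -XDELETE "http://localhost:9200/wooyun"
# Remove the saved oplog position so mongo-connector starts a fresh dump.
rm oplog.timestamp
# Re-run the sync from mongodb into elasticsearch.
sudo mongo-connector -m localhost:27017 -t localhost:9200 -d elastic2_doc_manager
```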
When I run a Chinese full-text search, the public-vulnerabilities page just hangs; the app.py console only shows a "GET / HTTP..." request. Any help appreciated.
The image files here are named differently from the originals; what renaming rule did you apply?
mongodb search had indeed become unbearably slow, so over the past few days I have been working on fast content search with elasticsearch. The configuration and code have now been tested; I plan to upload the configuration document and the VM this weekend.
To use elasticsearch search, you can either download the soon-to-be-packaged VM (which includes the wooyun site data) or configure it yourself by hand from my tested configuration document.
My home connection is really slow, so I hope to finish uploading the VM over the weekend.
Questions and discussion welcome.
Could that 14 GB VM be packaged as an iso?
Please back up the mongodb database and pack it separately.
Is the ubuntu login/password hancool/qwe123? If so, why can't I log in?
I downloaded the VM dated 2017-07-04.
After logging in I can't find those directories. Is it just a matter of entering the flask directory under wooyun_public and running ./app.py to start the web service?
And where is that directory?
elasticsearch's default maximum result window is 10,000 records (that is, 500 pages), so requesting anything past page 500 raises an error. The fix is to run
curl -XPUT "http://localhost:9200/wooyun/_settings" -d '{ "index" : { "max_result_window" : 500000 } }'
on the command line to raise the default maximum paging window.
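To confirm the setting took effect, a GET on the same _settings endpoint should echo it back (same assumed host and index name):

```shell
curl -XGET "http://localhost:9200/wooyun/_settings?pretty"
# the response should include "max_result_window" : "500000"
```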
I downloaded the four rar files of the VM from the netdisk; extraction fails with an error and the VM cannot find one of its disk files.
Then where should I put the new database before doing the offline caching?
I plan to use the 40k-record VM to crawl its way up to the 80k set, to restore the data completely. Anything I should watch out for, or any advice? I'm not quite sure about the settings in your scrapy project; I'm still reading the code.
The computer I'm using now is too weak to run the VM. Do you have the resource available directly as web pages?
There also seems to be no graphical interface; after logging in as hancool I don't know what to do next,
and apparently because of problems with the package sources I can't install xinit either.
If I do not edit /usr/local/lib/python2.7/dist-packages/mongo_connector/doc_managers/elastic2_doc_manager.py,
the sync runs for only a few minutes after "Logging to mongo-connector.log" before stopping: bugs holds only 500 documents and drops holds none.
So, as instructed, I edited the file with
sudo vi /usr/local/lib/python2.7/dist-packages/mongo_connector/doc_managers/elastic2_doc_manager.py
changing:
self.elastic = Elasticsearch(hosts=[url],**kwargs.get('clientOptions', {}))
to:
self.elastic = Elasticsearch(hosts=[url],timeout=200, **kwargs.get('clientOptions', {}))
I deleted the directory under the elasticsearch data folder, restarted with service mongodb restart, and started elasticsearch as a normal user with elasticsearch-2.3.4/bin/elasticsearch -d.
When I then ran sudo mongo-connector -m localhost:27017 -t localhost:9200 -d elastic2_doc_manager,
the sync after "Logging to mongo-connector.log" did not run for the roughly 30 minutes you said a full sync takes; it stopped after only a few minutes, leaving it incomplete: bugs holds only 11000 documents and drops holds none.
I then read mongo-connector's sync log and googled around without finding a fix... so I edited the file again,
changing:
self.elastic = Elasticsearch(hosts=[url],timeout=200, **kwargs.get('clientOptions', {}))
to:
self.elastic = Elasticsearch(hosts=[url],timeout=20000, **kwargs.get('clientOptions', {}))
then deleted the elasticsearch data directory and re-synced. Again the sync stopped a few minutes after "Logging to mongo-connector.log"; this time bugs holds only 9500 documents and drops holds none.
The output of cat mongo-connector.log is:
2016-11-21 01:44:43,039 [CRITICAL] mongo_connector.oplog_manager:630 - Exception during collection dump
Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/mongo_connector/oplog_manager.py", line 583, in do_dump
upsert_all(dm)
File "/usr/local/lib/python2.7/dist-packages/mongo_connector/oplog_manager.py", line 567, in upsert_all
dm.bulk_upsert(docs_to_dump(namespace), mapped_ns, long_ts)
File "/usr/local/lib/python2.7/dist-packages/mongo_connector/util.py", line 32, in wrapped
return f(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/mongo_connector/doc_managers/elastic2_doc_manager.py", line 229, in bulk_upsert
for ok, resp in responses:
File "/usr/local/lib/python2.7/dist-packages/elasticsearch/helpers/__init__.py", line 162, in streaming_bulk
for result in _process_bulk_chunk(client, bulk_actions, raise_on_exception, raise_on_error, **kwargs):
File "/usr/local/lib/python2.7/dist-packages/elasticsearch/helpers/__init__.py", line 87, in _process_bulk_chunk
resp = client.bulk('\n'.join(bulk_actions) + '\n', **kwargs)
File "/usr/local/lib/python2.7/dist-packages/elasticsearch/client/utils.py", line 69, in _wrapped
return func(*args, params=params, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/elasticsearch/client/__init__.py", line 785, in bulk
doc_type, '_bulk'), params=params, body=self._bulk_body(body))
File "/usr/local/lib/python2.7/dist-packages/elasticsearch/transport.py", line 327, in perform_request
status, headers, data = connection.perform_request(method, url, params, body, ignore=ignore, timeout=timeout)
File "/usr/local/lib/python2.7/dist-packages/elasticsearch/connection/http_urllib3.py", line 112, in perform_request
raw_data, duration)
File "/usr/local/lib/python2.7/dist-packages/elasticsearch/connection/base.py", line 62, in log_request_success
body = body.decode('utf-8')
File "/usr/lib/python2.7/encodings/utf_8.py", line 16, in decode
return codecs.utf_8_decode(input, errors, True)
MemoryError
2016-11-21 01:44:43,054 [ERROR] mongo_connector.oplog_manager:638 - OplogThread: Failed during dump collection cannot recover! Collection(Database(MongoClient(host=['localhost:27017'], document_class=dict, tz_aware=False, connect=True, replicaset=u'rs0'), u'local'), u'oplog.rs')
2016-11-21 01:44:44,009 [ERROR] mongo_connector.connector:304 - MongoConnector: OplogThread <OplogThread(Thread-2, started 140037811336960)> unexpectedly stopped! Shutting down
cat oplog.timestamp shows nothing at all.
Help, please!
Sorry to bother: could you upload the mongo database files separately? My image archive also fails the CRC check on extraction, though the VM still runs. I wanted to dump the database out but can't connect; I can't delete mongod.lock because of a read-only file system error, and half a day of googling hasn't solved it...
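For anyone who can get the VM's mongod running, a dump could be produced along these lines; the database name wooyun is my assumption here, so list the databases first to confirm it:

```shell
# Confirm the database name, then dump it to ./dump/<name>.
mongo --eval "db.adminCommand('listDatabases')"
mongodump --host localhost --port 27017 --db wooyun --out ./dump
```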
Internal Server Error
The server encountered an internal error and was unable to complete your request. Either the server is overloaded or there is an error in the application.
What should I do about this?