wangjiuniu / pagerankdocument
Runs PageRank over the URL link relationships in the WT10G dataset and reports the Top 100 pages by PageRank score.
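【Environment】
Python 2.7

【File structure】
1. stringToInt.py: maps link relationships expressed as strings to link relationships expressed as integers
2. pageRank.py: the PageRank algorithm functions
3. pagerank_test.py: calls pageRank.py to run the PageRank computation
4. Top100rank.py: generates the final result from the PageRank scores
5. finalDocidURL.txt: the final result file (sorted by score from high to low; the two columns are docid and the corresponding URL)

【Usage】
1. Run stringToInt.py to read the link-relationship file inlinks, map the string link relationships to integer link relationships, and save them to the file inlinks_int (the mapping itself is saved in docIDtoIntId.pkl).
2. Run pagerank_test.py to read the integer link relationships, compute a PageRank score for every Intdocid (integer document ID), and save the scores to the file inlinks_int_res.
3. Run Top100rank.py to read inlinks_int_res and use heap sort to find the 100 Intdocids (integer document IDs) with the highest scores. Using the mapping saved earlier in docIDtoIntId.pkl, together with the file docid_to_url that holds the URL information, the 100 selected Intdocids are mapped back to their docid and URL and saved as finalDocidURL.txt.
Note: the file-name variables in the programs need to be adapted to your local environment.

【Results】
Saved in finalDocidURL.txt.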
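
The three steps under 【Usage】 can be sketched in memory as follows. This is a minimal illustration, not the code in stringToInt.py, pageRank.py or Top100rank.py: the function names map_ids, pagerank and top_k are hypothetical, the input is assumed to be a list of (source, target) docid pairs rather than the actual inlinks file format, and the damping factor 0.85, the fixed iteration count and the uniform handling of pages without outgoing links are assumptions. heapq.nlargest stands in for the heap-based selection the README describes.

```python
import heapq


def map_ids(edges):
    """Map string doc IDs to consecutive integers, in order of first appearance."""
    to_int = {}
    int_edges = []
    for src, dst in edges:
        for doc in (src, dst):
            if doc not in to_int:
                to_int[doc] = len(to_int)
        int_edges.append((to_int[src], to_int[dst]))
    return to_int, int_edges


def pagerank(n, int_edges, damping=0.85, iterations=50):
    """Power-iteration PageRank over n nodes (assumed parameters, see lead-in)."""
    out_degree = [0] * n
    for src, _ in int_edges:
        out_degree[src] += 1
    rank = [1.0 / n] * n
    for _ in range(iterations):
        # Rank held by pages with no outgoing links is spread over all pages.
        dangling = sum(rank[i] for i in range(n) if out_degree[i] == 0)
        new_rank = [(1.0 - damping) / n + damping * dangling / n] * n
        for src, dst in int_edges:
            new_rank[dst] += damping * rank[src] / out_degree[src]
        rank = new_rank
    return rank


def top_k(rank, k=100):
    """Return the k highest-scoring (int_id, score) pairs using a heap."""
    return heapq.nlargest(k, enumerate(rank), key=lambda pair: pair[1])


# Toy link graph with made-up doc IDs, only to show the call sequence.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "A")]
to_int, int_edges = map_ids(edges)
to_doc = {int_id: doc for doc, int_id in to_int.items()}
scores = pagerank(len(to_int), int_edges)
for int_id, score in top_k(scores, k=3):
    print(to_doc[int_id], score)
```

Mapping IDs to integers first lets the scores live in a plain list indexed by Intdocid, and the reverse dictionary plays the role that docIDtoIntId.pkl plays when the top entries are translated back to docids.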