ZZ WordFreq

Current Version

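  • WordFreq updated to v0.3

    The original BNC data comes from Adam Kilgarriff and is now labeled BNC.AK.

    This release adds BNC data from Paul Nation, labeled BNC.PN. It organizes all words by word family and groups the families by frequency, 1000 word families per group, for 14 groups in total. That covers the 14000 most common word families, which contain nearly 50000 actual words (including singular and plural forms and other variants). For example, society/societal/societies all carry the frequency value 1000, meaning this family belongs to the 1000 most common word families.

    btw: the source of the "BNC Top-15000" version is unclear, and it is now deprecated.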
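The BNC.PN banding described above can be sketched in a few lines of Python. This is only an illustration of the grouping rule (1000 families per band, 14 bands, every family member sharing the family's band label). The function names and the sample input are hypothetical and are not taken from the dictionary's build scripts.

```python
from typing import Dict, List

BAND_SIZE = 1000   # word families per band, as described above
MAX_BANDS = 14     # BNC.PN covers 14 bands (14000 families)


def band_for_rank(family_rank: int) -> int:
    """Return the band label (1000, 2000, ..., 14000) for a 1-based family rank."""
    if not 1 <= family_rank <= BAND_SIZE * MAX_BANDS:
        raise ValueError("rank outside the 14000 families covered")
    return ((family_rank - 1) // BAND_SIZE + 1) * BAND_SIZE


def label_words(families: List[List[str]]) -> Dict[str, int]:
    """Give every member of a family the band label of that family.

    `families` is ordered from most to least frequent; each inner list
    holds the headword followed by its other forms.
    """
    labels: Dict[str, int] = {}
    for rank, members in enumerate(families, start=1):
        band = band_for_rank(rank)
        for word in members:
            # If a form appears in two families, keep the more frequent band.
            labels.setdefault(word, band)
    return labels


# Illustrative input only: the position of this family is made up.
sample = [["society", "societal", "societies"]]
print(label_words(sample))
# {'society': 1000, 'societal': 1000, 'societies': 1000}
```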

Introduction

ZZ WordFreq

top 60000 words from BNC.AK/ANC/COCA, 14000 word families from BNC.PN

  • wordfreq.zz.dsl
  • wordfreq.zz.ann

ZZ's BNC Top-15000 Word List (En)

word & frequency only

  • bnc15000.ann
  • bnc15000.dsl

ZZ's BNC Top-15000 Word List (En-Cn)

word & frequency & very simple Chinese translation

  • bnc15000cn.ann
  • bnc15000cn.dsl

Reference

  • BNC (British National Corpus)

http://www.natcorp.ox.ac.uk

http://www.kilgarriff.co.uk/bnc-readme.html

http://www.victoria.ac.nz/lals/about/staff/paul-nation

http://www.audiencedialogue.net/bnc.html

  • OANC (Open American National Corpus)

http://www.anc.org/data/anc-second-release/

  • COCA (The Corpus Of Contemporary American English)

http://corpus.byu.edu/coca/

http://www.pdawiki.com/forum/thread-13667-1-1.html

Screenshot

screenshot

screenshot

"[ANC] 6776" 表示在ANC词频中列第6776位

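Notes

  • All words containing digits, certain punctuation marks, or any non-ASCII characters have been removed.

  • In OANC, the singular and plural forms of nouns, and the base, past-tense, and past-participle forms of verbs, are merged and treated as a single word.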
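The two notes above can be illustrated with a rough Python sketch: drop words that fail a character filter, then add the counts of inflected forms to their base form. The notes do not say which punctuation marks are removed, so the pattern below (ASCII letters with internal hyphens or apostrophes) is an assumption. The lemma table and the counts are made up for the example.

```python
import re
from collections import Counter
from typing import Dict, Iterable, Tuple

# Assumption: ASCII letters, optionally joined by single hyphens or
# apostrophes, are kept. The real list may treat punctuation differently.
WORD_RE = re.compile(r"[A-Za-z]+(?:['-][A-Za-z]+)*")


def is_clean(word: str) -> bool:
    """True if the word has no digits, no non-ASCII characters, and only allowed punctuation."""
    return WORD_RE.fullmatch(word) is not None


def merge_counts(pairs: Iterable[Tuple[str, int]], lemma_of: Dict[str, str]) -> Counter:
    """Filter out unclean words, then sum counts of inflected forms under their lemma."""
    merged: Counter = Counter()
    for word, count in pairs:
        if not is_clean(word):
            continue
        merged[lemma_of.get(word, word)] += count
    return merged


# Made-up counts and a tiny hypothetical lemma table.
raw = [("walk", 10), ("walked", 4), ("café", 3), ("mp3", 2), ("books", 5), ("book", 7)]
lemmas = {"walked": "walk", "books": "book"}
print(merge_counts(raw, lemmas).most_common())
# [('walk', 14), ('book', 12)]
```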

Contributors

  • jjzz
