Open-WordNet-PT

About the files

The file was generated by combining the following data:

  • Princeton WordNet 3.0 was used to obtain English glosses and English terms for synset IDs.

  • The unreleased 2010-12 version of UWN and MENTA provided candidate terms in Portuguese, candidate glosses in Portuguese (from Wikipedia), and candidate terms in Spanish.

  • The EuroWordNet base concept list (5000_bc.xml) provided the base concept numbers. The original file was mapped from WordNet 2.0 to 3.0 using the mappings from WN-Map. When a WordNet 2.0 synset had multiple mappings, all candidate WordNet 3.0 synsets were kept. Hence, several entries may share the same base concept number.
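The mapping step above can be sketched in a few lines. This is a minimal illustration, not the script that produced the file: it assumes the base concept list has already been loaded into a dict from WordNet 2.0 synset IDs to base concept numbers, and the WN-Map data into a dict from each WordNet 2.0 ID to its list of WordNet 3.0 IDs. The function name and all IDs are hypothetical.

```python
from typing import Dict, List, Tuple

def map_base_concepts(
    bc_wn20: Dict[str, int],
    wn20_to_wn30: Dict[str, List[str]],
) -> List[Tuple[int, str]]:
    """Return (base concept number, WordNet 3.0 synset ID) pairs.

    Every WordNet 3.0 candidate is kept, so one base concept number
    can appear in several pairs.
    """
    pairs: List[Tuple[int, str]] = []
    for wn20_id, bc in bc_wn20.items():
        # A WordNet 2.0 synset without a mapping contributes no pairs.
        for wn30_id in wn20_to_wn30.get(wn20_id, []):
            pairs.append((bc, wn30_id))
    return pairs

# Hypothetical IDs, for illustration only.
bc_wn20 = {"n100": 1, "n200": 2}
wn20_to_wn30 = {"n100": ["n101"], "n200": ["n201", "n202"]}
print(map_base_concepts(bc_wn20, wn20_to_wn30))
# [(1, 'n101'), (2, 'n201'), (2, 'n202')]
```

Keeping every candidate instead of choosing one is what produces repeated base concept numbers.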

Guidelines for WordNet translation

  1. Read the English gloss and the English words.
  2. Come up with Portuguese words that express the same meaning as the English gloss and have the part of speech indicated by the first letter of the WordNet synset identifier (n: noun, v: verb, a: adjective, r: adverb), and write them into the "PT-Words-Man" field.
  3. Optionally, write a Portuguese gloss into the "PT-Gloss" field. It may be shorter than the English gloss. If the gloss contains English example sentences, translate them only if the translations sound natural in Portuguese and actually contain the Portuguese words added to the synset.
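Step 2 relies on reading the part of speech from the synset identifier. A minimal sketch, assuming every identifier is one of the four POS letters followed by digits (as in `n6269` in the example), and assuming the digits are a WordNet 3.0 offset with its leading zeros dropped; the function name is hypothetical.

```python
from typing import Tuple

# POS letters as listed in the translation guidelines.
POS_NAMES = {"n": "noun", "v": "verb", "a": "adjective", "r": "adverb"}

def parse_synset_id(synset_id: str) -> Tuple[str, str]:
    """Split an ID such as 'n6269' into (part of speech, 8-digit offset).

    Assumes the digits are a WordNet 3.0 offset with leading zeros dropped.
    """
    letter, digits = synset_id[:1], synset_id[1:]
    if letter not in POS_NAMES or not digits.isdigit():
        raise ValueError(f"unexpected synset ID: {synset_id!r}")
    # WordNet data files write offsets as 8 zero-padded digits.
    return POS_NAMES[letter], digits.zfill(8)

print(parse_synset_id("n6269"))  # ('noun', '00006269')
```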

Example:

<row>
 <BC>4</BC>
 <WN-3.0-Synset>n6269</WN-3.0-Synset>
 <PT-Words-Man>vida</PT-Words-Man>
 <PT-Words-Candidates>vida</PT-Words-Candidates>
 <EN-Gloss>living things collectively; "the oceans are teeming with life"</EN-Gloss>
 <EN-Words>life</EN-Words>
 <PT-Gloss>coisas vivas, tomadas coletivamente; "os oceanos estão repletos de vida"</PT-Gloss>
 <PT-Gloss-Prop />
 <Spa-Words-Prop>vida</Spa-Words-Prop>
 <Comments />
</row>
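Since each synset is stored as a `<row>` element, the file can be read with any XML parser. The sketch below uses Python's standard `xml.etree.ElementTree` on the example row above; the enclosing `<rows>` element is a placeholder, because the real root element is not described here, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# The example row, wrapped in a placeholder root element.
SAMPLE = """<rows>
<row>
 <BC>4</BC>
 <WN-3.0-Synset>n6269</WN-3.0-Synset>
 <PT-Words-Man>vida</PT-Words-Man>
 <PT-Words-Candidates>vida</PT-Words-Candidates>
 <EN-Gloss>living things collectively; "the oceans are teeming with life"</EN-Gloss>
 <EN-Words>life</EN-Words>
 <PT-Gloss>coisas vivas, tomadas coletivamente; "os oceanos estão repletos de vida"</PT-Gloss>
 <PT-Gloss-Prop />
 <Spa-Words-Prop>vida</Spa-Words-Prop>
 <Comments />
</row>
</rows>"""

def read_rows(xml_text: str) -> List[Dict[str, str]]:
    """Turn every <row> into a dict of field name -> text ('' for empty fields)."""
    root = ET.fromstring(xml_text)
    rows = []
    for row in root.iter("row"):
        # If a field repeats (e.g. several PT-Words-Man), only the last one is kept.
        rows.append({child.tag: (child.text or "").strip() for child in row})
    return rows

for r in read_rows(SAMPLE):
    print(r["WN-3.0-Synset"], r["PT-Words-Man"], r["EN-Words"])
# n6269 vida life
```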

Additional considerations for Step 2:

  • Be careful not to be misled by English words with multiple meanings. You can use the Portuguese and Spanish candidates as a guide, but keep in mind that they were generated automatically and may be entirely wrong. The main criterion is whether the Portuguese word corresponds to the English gloss.
  • The PT-Words-Man field can contain multiple words separated by commas; alternatively, a row can contain more than one PT-Words-Man element. In either case, the words should ideally be sorted by relevance (most commonly used first).
  • If an entry has been checked and there seem to be no relevant Portuguese words to express the concept, then use […].
  • If there are expressions that could be used to express the concept, but they are not real words or lexicalized expressions that would appear in a dictionary, then use the following syntax: mover reflexivamente
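Both conventions for PT-Words-Man can be handled when reading the file. A minimal sketch, assuming a row has been parsed with `xml.etree.ElementTree`; the function name and the sample words are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List

def manual_words(row: ET.Element) -> List[str]:
    """Collect PT-Words-Man entries in document order.

    Handles comma-separated words inside one element as well as several
    PT-Words-Man elements. Order is preserved because words are meant to
    be sorted by relevance; duplicates are dropped.
    """
    words: List[str] = []
    for field in row.findall("PT-Words-Man"):
        for word in (field.text or "").split(","):
            word = word.strip()
            if word and word not in words:
                words.append(word)
    return words

# Hypothetical row mixing both conventions.
row = ET.fromstring(
    "<row><PT-Words-Man>carro, automóvel</PT-Words-Man>"
    "<PT-Words-Man>viatura</PT-Words-Man></row>"
)
print(manual_words(row))  # ['carro', 'automóvel', 'viatura']
```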

It might be a good idea to have a WordNet browser open as well when doing the annotation, so that you can check hyponyms/hypernyms (or subclasses/parent classes):

http://www.lexvo.org/uwn/entity/s/n2084071
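The example URL ends in an identifier of the same form as the `WN-3.0-Synset` field. Assuming that pattern holds for every synset (inferred from this single URL, not documented here), a browser link can be built for any row; the function name is hypothetical.

```python
from urllib.parse import quote

# Prefix taken from the example URL; assumed to apply to all synsets.
LEXVO_BASE = "http://www.lexvo.org/uwn/entity/s/"

def lexvo_url(synset_id: str) -> str:
    """Build a Lexvo UWN browser URL for an ID such as 'n6269'."""
    return LEXVO_BASE + quote(synset_id)

print(lexvo_url("n6269"))  # http://www.lexvo.org/uwn/entity/s/n6269
```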

Another good page to keep open is an online English-Portuguese dictionary.

Team

  • Alexandre Rademaker
  • Gerard de Melo
  • Valeria de Paiva
  • Rafael Haeusler

License

OpenWN-PT by EMAp, Getulio Vargas Foundation is licensed under a Creative Commons Attribution-ShareAlike 3.0 Brazil License.
Based on a work at github.com.

See the LICENSE file for details.
