
unitex-pt-br's Introduction

datasets-br

This project describes the datasets-br directives and serves as a place for general discussions.

Dataset-BR directives

  1. To publish qualified datasets on Datahub.io;
  2. To unify, through a curation process, a set of Wikidata fragments of items, or common instances of an item;
  3. To unify the terminology used for CSV column names and for table and column semantics (following SchemaOrg conventions when possible);
  4. To digitally preserve the curated datasets (CSV files and data dumps from the original sources);
  5. To monitor and audit Wikidata and OpenStreetMap changes relevant to the curated datasets.
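Directive 5 (monitoring changes) can start from something as simple as comparing two snapshots of a curated CSV file. The following is a minimal sketch, assuming each snapshot has a unique key column; the function name `diff_snapshots`, the `id` column and the sample rows are hypothetical and not part of datasets-br.

```python
import csv
import io


def diff_snapshots(old_csv: str, new_csv: str, key: str) -> dict:
    """Compare two CSV snapshots by the `key` column (hypothetical helper)."""
    old = {row[key]: row for row in csv.DictReader(io.StringIO(old_csv))}
    new = {row[key]: row for row in csv.DictReader(io.StringIO(new_csv))}
    return {
        "added": sorted(new.keys() - old.keys()),    # keys only in the new snapshot
        "removed": sorted(old.keys() - new.keys()),  # keys that disappeared
        # keys present in both snapshots whose row contents differ
        "changed": sorted(k for k in old.keys() & new.keys() if old[k] != new[k]),
    }


# Hypothetical sample snapshots.
before = "id,name\n1,alpha\n2,beta\n"
after = "id,name\n1,alpha\n2,BETA\n3,gamma\n"
print(diff_snapshots(before, after, "id"))
# {'added': ['3'], 'removed': [], 'changed': ['2']}
```

In practice the key would be a stable identifier such as a Wikidata item ID, so that renamed labels show up as "changed" rather than as a removal plus an addition.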

Use as an ecosystem of datasets

Example of use with two Brazilian datasets, state-codes and city-codes.

Working with pure SQL or SQL-unifier makes it easy to merge these with other datasets. With PostgreSQL you can offer the datasets through a standard API with PostgREST (or its descendants pREST and PostGraphile), or plug-and-play with SchemaOrg standards, FrictionlessData standards (and tools), etc.
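As an illustration of the "pure SQL" merge, the sketch below uses Python's built-in `sqlite3` as a lightweight stand-in for PostgreSQL. The table and column names (`state_codes.subdivision`, `city_codes.state`) are assumptions, not the real schemas of state-codes and city-codes, so they should be checked against the published CSV files before adapting the query.

```python
import sqlite3

# Hypothetical, simplified schemas; the real CSV files have their own column names.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE state_codes (subdivision TEXT PRIMARY KEY, name TEXT);
CREATE TABLE city_codes  (name TEXT, state TEXT);
""")
conn.executemany("INSERT INTO state_codes VALUES (?, ?)",
                 [("SP", "São Paulo"), ("RJ", "Rio de Janeiro")])
conn.executemany("INSERT INTO city_codes VALUES (?, ?)",
                 [("Campinas", "SP"), ("Niterói", "RJ")])

# Plain SQL join: attach the full state name to every city.
rows = conn.execute("""
SELECT c.name, s.name
FROM city_codes AS c
JOIN state_codes AS s ON s.subdivision = c.state
ORDER BY c.name
""").fetchall()

for city, state in rows:
    print(f"{city} ({state})")
# Campinas (São Paulo)
# Niterói (Rio de Janeiro)
```

The SELECT itself is standard SQL, so the same query should work on PostgreSQL once the CSV files are loaded into tables.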

Documentation

... under construction

Conventions for data provenance and preparation.


  Contents and data of this project are dedicated to

unitex-pt-br's People

Contributors

ppkrauss


unitex-pt-br's Issues

Get only transducers from Graph

The aim of this repository (unitex-pt-br) is to keep the dictionaries under version control, to offer everything in open formats, and to compare dictionaries. It is not meant to be used as a source or to produce alternative dictionaries.

This repository does not need the layout of the Unitex graphs, only the transducers compiled from them.
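One reading of this issue is that only the compiled files need to be versioned. A minimal sketch, assuming Unitex's usual convention of `.grf` for editable graphs and `.fst2` for compiled transducers; the function `compiled_transducers` and the demo file names are hypothetical.

```python
import tempfile
from pathlib import Path


def compiled_transducers(root: str) -> list:
    """Return the compiled transducers (*.fst2) under `root`, skipping *.grf layouts."""
    # Assumption: editable graphs are .grf files, compiled transducers are .fst2 files.
    return sorted(p for p in Path(root).rglob("*.fst2") if p.is_file())


# Demo on a throwaway directory with hypothetical file names.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "Inflection").mkdir()
    for name in ("Inflection/V005.grf", "Inflection/V005.fst2", "N004.grf"):
        (Path(tmp) / name).write_text("")
    found = [p.relative_to(tmp).as_posix() for p in compiled_transducers(tmp)]

print(found)  # ['Inflection/V005.fst2']
```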

Check generated forms

See the V005 graph, using "cortar" as a sample.

... where did the 600 end up?

grep abafar, DELAS.csv 
	abafar,N004
	abafar,V005
grep ,abafar Delaf2015v04.dic
	abafar,abafar.V:U1s
	abafar,abafar.V:U3s
	abafar,abafar.V:W
	abafar,abafar.V:W1s
	abafar,abafar.V:W3s


grep cortar, DELAS.csv 
	cortar,V005
	entrecortar,V005
	intercortar,V005
	recortar,V005
grep cortar, Delaf2015v04.dic 
	cortar,cortar.V:U1s
	cortar,cortar.V:U3s
	cortar,cortar.V:W
	cortar,cortar.V:W1s
	cortar,cortar.V:W3s


grep ,beber Delaf2015v04.dic   
  ~2522 lines!
	beba-a,beber.V+PRO:Y3s
	beba-as,beber.V+PRO:Y3s
	beba,beber.V:S1s
	beba,beber.V:S3s
	beba,beber.V:Y3s
	bebais,beber.V:S2p
	beba-lhe,beber.V+PRO:Y3s
	beba-lhes,beber.V+PRO:Y3s
	bebam,beber.V:S3p
	bebam,beber.V:Y3p
	beba-me,beber.V+PRO:Y3s
	bebam-lhe,beber.V+PRO:Y3p
	bebam-lhes,beber.V+PRO:Y3p
	bebam-me,beber.V+PRO:Y3p
	bebam-na,beber.V+PRO:Y3p
	...
	bebera,beber.V:Q1s
	bebera,beber.V:Q3s
	bebendo-te,beber.V+PRO:G

	beberada,beberar.V:Kfs
	beberadas,beberar.V:Kfp
	beberado,beberar.V:Kms
	beberados,beberar.V:Kmp
	beberagem,beberagem.N:fs
	beberagens,beberagem.N:fp
	beberai,beberar.V:Y2p
	beberai-la,beberar.V+PRO:P2p
	beberai-las,beberar.V+PRO:P2p
	beberai-lo,beberar.V+PRO:P2p
	beberai-los,beberar.V+PRO:P2p
	beberai-nos,beberar.V+PRO:P2p
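Note that `grep ,beber` also matches other lemmas that start with "beber", such as beberar and beberagem (visible in the output above), so the ~2522 figure overstates the forms of beber itself. The sketch below counts entries per exact lemma and reports, for each DELAS lemma with a given inflection code, how many DELAF entries it has. It assumes the simplified line formats shown above (`lemma,CODE` for DELAS and `form,lemma.CODES:inflection` for DELAF) and ignores Unitex's backslash escapes; all function names are hypothetical.

```python
from collections import Counter


def parse_delaf(line: str):
    """Split 'form,lemma.CODES:inflection' into its four parts (no escape handling)."""
    form, rest = line.strip().split(",", 1)
    lemma, rest = rest.split(".", 1)
    codes, _, inflection = rest.partition(":")
    return form, lemma, codes, inflection


def forms_per_lemma(delaf_lines) -> Counter:
    """Count DELAF entries per exact lemma, so 'beber' is not mixed with 'beberar'."""
    return Counter(parse_delaf(line)[1] for line in delaf_lines if line.strip())


def class_report(delas_lines, delaf_lines, code: str = "V005") -> dict:
    """For each DELAS lemma with inflection class `code`, count its DELAF entries."""
    counts = forms_per_lemma(delaf_lines)
    report = {}
    for line in delas_lines:
        if not line.strip():
            continue
        lemma, cls = line.strip().split(",", 1)
        if cls.split("+")[0] == code:  # e.g. 'N004+Pr' -> 'N004'
            # Counts are per lemma, across all parts of speech.
            report[lemma] = counts[lemma]
    return report


# A small subset of the lines from the grep output above.
delas = ["abafar,N004", "abafar,V005", "cortar,V005", "recortar,V005"]
delaf = [
    "abafar,abafar.V:U1s", "abafar,abafar.V:U3s", "abafar,abafar.V:W",
    "cortar,cortar.V:U1s", "cortar,cortar.V:W",
    "beba,beber.V:S1s", "beba-a,beber.V+PRO:Y3s",
    "beberada,beberar.V:Kfs", "beberagem,beberagem.N:fs",
]
print(class_report(delas, delaf))  # {'abafar': 3, 'cortar': 2, 'recortar': 0}
```

Run over the full DELAS.csv and Delaf2015v04.dic, a report like this would list every V005 verb with its number of generated forms, making it easy to spot verbs that received only a handful.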

Split DELAS into DELAS and DELAS-Pr

There are many "pure named entities", such as proper nouns, that are not real "dictionary words".

Examples: abel,N004+Pr, abelson,N004+Pr, abélson,N004+Pr, abigail,N104+Pr, abília,N104+Pr, abílio,N004+Pr, abraão,N004+Pr, abraham,N004+Pr, abrantes,N306+Pr, abrão,N004+Pr, zico,N004+Pr, zilda,N104+Pr, zimbábue,N304+Pr, zingarelli,N306+Pr, zoroastro,N004+Pr, zucolotto,N306+Pr, zurique,N104+Pr

Many are common human given names (modern ones, such as zico and zilda, or classic ones, such as zoroastro) or surnames (zingarelli, zucolotto). Others are common toponyms, such as country names and city names (abrantes, zurique), etc.

So DELAS-Pr must include a column indicating the type of entity for which the name is usually used (e.g. Italy is a country name, but in Brazil it is also a female given name).

There are other sources of names and their usage statistics: see datasets-br/prenomes or datasets-br/city-codes for confirmed Brazilian names, and world-cities, etc., for international ones.
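A minimal sketch of the proposed split, assuming that proper names are exactly the DELAS entries carrying the `+Pr` feature, as in the examples above. The `entity_type` column is a hypothetical name for the proposed entity-type column and is left empty here, to be filled in by curation (for example from datasets-br/prenomes or datasets-br/city-codes).

```python
import csv
import io


def split_delas(lines):
    """Separate DELAS entries marked '+Pr' (proper names) from ordinary dictionary words."""
    common, proper = [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        lemma, code = line.split(",", 1)
        # '+Pr' appears as an extra feature after the inflection class, e.g. 'N004+Pr'.
        if "Pr" in code.split("+")[1:]:
            proper.append((lemma, code))
        else:
            common.append((lemma, code))
    return common, proper


def delas_pr_csv(proper) -> str:
    """Render DELAS-Pr as CSV with an empty, hypothetical 'entity_type' column."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["lemma", "code", "entity_type"])
    for lemma, code in proper:
        writer.writerow([lemma, code, ""])  # entity type to be filled in by curation
    return buf.getvalue()


entries = ["abel,N004+Pr", "abafar,V005", "zurique,N104+Pr", "cortar,V005"]
common, proper = split_delas(entries)
print(delas_pr_csv(proper), end="")
# lemma,code,entity_type
# abel,N004+Pr,
# zurique,N104+Pr,
```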
