sentido's People

Contributors

pedroarvela

sentido's Issues

Scrape Portuguese Wikipedia

In gitlab by @PedroArvela on Feb 20, 2017, 15:22

Extract articles from Portuguese Wikipedia.

Split articles into paragraphs, using the following as the separator:

 . fim-de-parágrafo . 
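
A minimal sketch of the splitting step, assuming the extracted article text marks paragraph boundaries with blank lines (the issue does not say how the raw text is delimited). The function names `split_paragraphs` and `join_paragraphs` are hypothetical.

```python
from typing import List

# Separator taken verbatim from the issue, including the surrounding spaces.
SEPARATOR = " . fim-de-parágrafo . "


def split_paragraphs(article_text: str) -> List[str]:
    """Split raw text on blank lines (an assumption) and drop empty blocks."""
    blocks = article_text.split("\n\n")
    # Collapse internal whitespace so each paragraph ends up on one line.
    return [" ".join(block.split()) for block in blocks if block.strip()]


def join_paragraphs(paragraphs: List[str]) -> str:
    """Join paragraphs with the separator requested in the issue."""
    return SEPARATOR.join(paragraphs)


article = "Lisboa é a capital de Portugal.\n\nO Porto fica no norte."
marked = join_paragraphs(split_paragraphs(article))
```

Splitting `marked` on `SEPARATOR` again recovers the original paragraphs, which makes the marker easy to consume downstream.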

Overall disambiguation design plan

In gitlab by @PedroArvela on May 30, 2017, 23:50

For word w1 in context c1.

  • Search for co-occurrence clusters of w1.
  • Filter the co-occurrences in each cluster, keeping only those whose type also appears in c1.
  • Compare words in resulting filtered clusters with the words in c1.
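
The three steps above could be sketched as follows. This is only an illustration under stated assumptions: a co-occurrence is modeled as a `(word, dependency_type)` tuple, clusters are already available as a dictionary keyed by a sense label, and "compare" is taken to mean counting shared words. None of these choices come from the issue itself, and the name `disambiguate` is hypothetical.

```python
from typing import Dict, List, Tuple

# Assumed representation: (word, dependency_type).
CoOccurrence = Tuple[str, str]


def disambiguate(clusters: Dict[str, List[CoOccurrence]],
                 context: List[CoOccurrence]) -> str:
    """Return the label of the cluster that best matches the context c1."""
    context_types = {dep_type for _, dep_type in context}
    context_words = {word for word, _ in context}
    scores = {}
    for sense, cooccurrences in clusters.items():
        # Keep only co-occurrences whose type also appears in the context.
        filtered = {w for w, t in cooccurrences if t in context_types}
        # Compare the remaining words with the context words (overlap count).
        scores[sense] = len(filtered & context_words)
    # Ties resolve to the first cluster in insertion order.
    return max(scores, key=scores.get)


clusters = {
    "finance": [("dinheiro", "obj"), ("conta", "mod")],
    "furniture": [("jardim", "mod"), ("sentar", "subj")],
}
context = [("dinheiro", "obj"), ("levantar", "subj")]
best = disambiguate(clusters, context)
```

A weighted score (for example, by co-occurrence frequency) would fit the same structure; the plain overlap count is used here only to keep the sketch short.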

Optimize overall performance

In gitlab by @PedroArvela on Jun 3, 2017, 16:20

Populating the database is extremely slow. Some simple changes should improve performance:

  • Batch as many inserts as possible in a single transaction.
  • Use prepared statements where possible.
  • Use simpler primary keys for as many tables as possible.
  • Ensure no SELECTs are executed during writes to prevent slowdowns.
  • Reduce repeated reads to the database as much as possible.
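
As an illustration of the first two points, the sketch below batches all inserts into one transaction and reuses a single parameterized statement. It assumes the database is SQLite accessed through Python's standard `sqlite3` module, which the issue does not state; the `cooccurrence` table, its columns, and the `populate` helper are hypothetical.

```python
import sqlite3
from typing import Iterable, Tuple


def populate(conn: sqlite3.Connection,
             rows: Iterable[Tuple[int, int, int]]) -> None:
    """Insert all rows in a single transaction with one parameterized statement."""
    # "with conn" commits once on success and rolls back on an exception,
    # so the whole batch costs a single commit instead of one per row.
    with conn:
        conn.executemany(
            "INSERT INTO cooccurrence (word1_id, word2_id, frequency) "
            "VALUES (?, ?, ?)",
            rows,
        )


conn = sqlite3.connect(":memory:")
# A plain integer primary key, in the spirit of "simpler primary keys".
conn.execute(
    "CREATE TABLE cooccurrence ("
    "id INTEGER PRIMARY KEY, "
    "word1_id INTEGER NOT NULL, "
    "word2_id INTEGER NOT NULL, "
    "frequency INTEGER NOT NULL)"
)
populate(conn, [(1, 2, 10), (1, 3, 4), (2, 3, 7)])
inserted = conn.execute("SELECT COUNT(*) FROM cooccurrence").fetchone()[0]
```

Keeping lookups such as word-to-ID mappings in an in-memory dictionary during the load would address the last two points, since no SELECT then has to run between writes.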

Regenerate Deep DB with context info

In gitlab by @PedroArvela on May 30, 2017, 23:46

Recreate db_deep.db with additional ids for each individual context.

Each sentence/paragraph/snippet should have a unique id used to limit searches of co-occurrences.
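
One possible shape for this, sketched with Python's `sqlite3` module. The actual schema of db_deep.db is not given in the issue, so the `context` and `cooccurrence` tables, their columns, and the helper `cooccurrences_in_context` are all assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE context (
    id   INTEGER PRIMARY KEY,   -- unique id per sentence/paragraph/snippet
    kind TEXT NOT NULL          -- e.g. 'sentence' or 'paragraph'
);
CREATE TABLE cooccurrence (
    context_id INTEGER NOT NULL REFERENCES context(id),
    word1_id   INTEGER NOT NULL,
    word2_id   INTEGER NOT NULL
);
CREATE INDEX idx_cooccurrence_context ON cooccurrence(context_id);
""")

with conn:
    conn.executemany("INSERT INTO context (id, kind) VALUES (?, ?)",
                     [(1, "sentence"), (2, "paragraph")])
    conn.executemany("INSERT INTO cooccurrence VALUES (?, ?, ?)",
                     [(1, 10, 11), (1, 10, 12), (2, 10, 13)])


def cooccurrences_in_context(conn, context_id):
    """Return the (word1_id, word2_id) pairs recorded for one context."""
    cur = conn.execute(
        "SELECT word1_id, word2_id FROM cooccurrence "
        "WHERE context_id = ? ORDER BY word1_id, word2_id",
        (context_id,),
    )
    return cur.fetchall()


in_first = cooccurrences_in_context(conn, 1)
```

The index on `context_id` is what lets a search be limited to a single context without scanning the whole table.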

Overall induction design plan

In gitlab by @PedroArvela on May 23, 2017, 23:49

For word w1:

  • Find all (wx, wy) co-occurrences in all contexts of w1.
    • Apply restrictions here.
  • Run graph clustering algorithm on (wx, wy) pairs.
  • Reconstruct words from their IDs.
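
The plan above can be sketched end to end as follows. The issue does not name a clustering algorithm, so connected components are used here purely as a simple stand-in for "graph clustering". Contexts are assumed to be lists of word IDs, the restriction step is not modeled, and `induce_senses` is a hypothetical name.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Sequence


def induce_senses(target_id: int,
                  contexts: Sequence[Sequence[int]],
                  id_to_word: Dict[int, str]) -> List[List[str]]:
    """Cluster the words that co-occur in the contexts of the target word."""
    # Step 1: collect (wx, wy) pairs from every context containing the target.
    graph = defaultdict(set)
    for context in contexts:
        if target_id not in context:
            continue
        others = sorted(set(context) - {target_id})
        for wx, wy in combinations(others, 2):
            graph[wx].add(wy)
            graph[wy].add(wx)

    # Step 2: cluster the graph (connected components as a stand-in).
    seen = set()
    clusters = []
    for start in sorted(graph):
        if start in seen:
            continue
        component = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        clusters.append(component)

    # Step 3: reconstruct words from their IDs.
    return [sorted(id_to_word[i] for i in cluster) for cluster in clusters]


id_to_word = {0: "banco", 1: "dinheiro", 2: "conta",
              3: "jardim", 4: "madeira", 5: "rio"}
contexts = [[0, 1, 2], [0, 3, 4], [5, 1]]  # the last one lacks the target
senses = induce_senses(0, contexts, id_to_word)
```

Each resulting cluster is a candidate sense of the target word. A context holding only the target and one other word produces no pair, so that word does not appear in any cluster under this sketch.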

State of the Art

In gitlab by @PedroArvela on Feb 20, 2017, 00:43

  • Research Word Sense Disambiguation
  • Research word2vec applied to WSI
  • Search for recent graph-based WSI
  • Research neural-networks applied to WSI

Template IST

In gitlab by @PedroArvela on Feb 6, 2017, 16:07

http://academica.tecnico.ulisboa.pt/files/sites/54/guia-de-preparacao-da-dissertacao-1516.pdf

  • cover;
  • acknowledgments (optional);
  • abstract and keywords;
  • table of contents;
  • list of tables and figures, and list of abbreviations;
  • bibliographic references;
  • appendix(es), if any.

Formatting

  • A4 size;
  • white cover with a color image;
  • typeface: Arial (or similar);
  • black text;
  • 1.5 line spacing;
  • font size: 10 points;
  • footnotes with single line spacing. Use sparingly, in a 9-point font;
  • margins: 2.5 centimeters on all four sides;
  • page number: in Arabic numerals at the bottom, centered or right-aligned;
  • do not use a header/footer, except for the page number in 9 points;
  • if project drawings larger than A4 must be included, they should be presented in a separate volume of appendices.

Cover

  • IST logo;
  • name of the institution;
  • image;
  • full title of the dissertation;
  • subtitle (optional);
  • full name of the candidate (mandatory);
  • full name of the degree program;
  • supervisor(s) (maximum 2), giving the full name;
  • jury: president, supervisor (only one of those indicated above), members;
  • date (month and year).

Questions for the 1st meeting

In gitlab by @PedroArvela on Feb 18, 2017, 15:46

  • How do I run the stages of STRING?
  • Where are CETEMPúblico and Bick, 2006?
  • How/when should I request the LXDSemVectors corpus?
  • Where should I put the code I develop?
  • Preferred programming language?
  • Documentation standards?
  • Should I experiment with other, more modern approaches (e.g. word2vec)?
  • Would it be possible to submit to a journal or conference?

Future Work

In gitlab by @PedroArvela on May 31, 2017, 15:14

  • Check whether the database performs better with the corpora values merged or kept separate.
  • Add the information obtained to Syntax Deep Explorer.
  • Analyze which types of dependencies to include.
  • Syntax Deep Explorer is blind to the goals and may not include the target word in the list of co-occurrences.
