
Comments (7)

rspeer commented on August 23, 2024

I should ask what you specifically mean by "support". I don't believe there's any support lacking for Iberian Portuguese in the code.

Certainly the Portuguese content is mostly in Brazilian Portuguese, as two out of the three data sources for Portuguese are Brazilian projects. But, as I look up vocabulary differences between Brazilian and Iberian Portuguese, it seems that the Iberian words are also represented in ConceptNet:

Is there a case where Iberian Portuguese vocabulary is significantly lacking, or not being handled correctly?

from conceptnet5.

jarodium commented on August 23, 2024

Hello

I meant that there is no distinction between Brazilian Portuguese and Iberian Portuguese. Although we have an international spelling agreement between Portuguese-speaking countries, there are some significant differences.

Some examples of such differences:

http://conceptnet5.media.mit.edu/web/c/pt/empresa

"conector de rede sem fio AtLocation empresa
Um lugar onde você geralmente encontra um(a) conector de rede sem fio é em um(a) empresa."

The proper way in Iberian Portuguese is:
Um lugar onde você geralmente encontra um conector de rede sem fios é numa empresa.

Note: I don't know why the "um(a)" is present, but that is due to my lack of understanding of ConceptNet 5. My guess is that there is no way to distinguish masculine and feminine concepts. "Conector" is masculine and "empresa" is feminine (no gender assumptions here :p), hence "numa".

http://conceptnet5.media.mit.edu/web/c/pt/faxineiro

faxineiro AtLocation empresa
Um lugar onde você geralmente encontra um(a) faxineiro é em um(a) empresa.

"Faxineiro" is not the Iberian Portuguese word for janitor. We have "empregado de limpezas" or "contínuo".

http://conceptnet5.media.mit.edu/web/c/pt/morango

morango UsedFor fazer um suco
Um(a) morango é usado(a) para fazer um suco.

We don't use the word "suco" for juice. We use "sumo".

Also, under the new spelling agreement, some words lose a letter that is not pronounced: "acção" is now "ação" and "contracto" is now "contrato", with some exceptions, such as "facto".


jarodium commented on August 23, 2024

As for my comment about "faxineiro", here is a correction: http://www.priberam.pt/dlpo/faxineiro. In Brazil, a faxineiro does the same job as a janitor, but in Portugal it is a soldier who has the same tasks as a janitor. Also, my correction for "faxineiro" was incomplete regarding the term "contínuo". A contínuo takes care of cleaning in a school, but also performs other tasks such as administration and teacher support, as well as student supervision (at school, of course).


rspeer commented on August 23, 2024

"we don't use the word suco for juice. We have sumo."

That's there too: http://conceptnet5.media.mit.edu/web/c/pt/sumo

I don't understand what you'd be doing with ConceptNet where you need to deliberately not recognize the word "suco".

If Brazilian and Iberian Portuguese were separated under different language codes, which you might be suggesting, then it looks like instead of 332386 words of unified Portuguese, we would have 332361 words of Brazilian Portuguese and 25 words of Iberian Portuguese (the ones that are specifically marked as such in Wiktionary).

I believe that Iberian Portuguese significantly benefits from having the same language code as Brazilian Portuguese: you can get the benefit of all the Portuguese in general that was entered and curated by Brazilians, plus some specific entries for Iberian Portuguese, as long as you don't see it as some kind of error to also recognize Brazilian vocabulary.
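The trade-off described above can be sketched with a toy vocabulary. This is only an illustration under stated assumptions: the entries, the region tags, and the `pt-PT` / `pt-BR` codes are hypothetical and are not taken from ConceptNet's data or API.

```python
from collections import defaultdict

# Toy entries: (term, region tag or None). Illustrative only, not real ConceptNet data.
ENTRIES = [
    ("morango", None),
    ("empresa", None),
    ("suco", None),
    ("sumo", "Portugal"),
]


def unified_vocab(entries):
    """One language code: every term is recognized, whatever its region tag."""
    return {term for term, _region in entries}


def split_vocab(entries):
    """Hypothetical split: only explicitly tagged terms land under pt-PT."""
    vocab = defaultdict(set)
    for term, region in entries:
        code = "pt-PT" if region == "Portugal" else "pt-BR"
        vocab[code].add(term)
    return dict(vocab)


unified = unified_vocab(ENTRIES)
split = split_vocab(ENTRIES)
print(sorted(unified))
print({code: sorted(terms) for code, terms in split.items()})
```

With a single code, both "suco" and "sumo" are recognized. With the split, shared words such as "morango" would no longer be found under the Iberian code, which is the loss the comment above warns about.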

Now, about the text that appears under some assertions when you browse to them:

When an assertion comes with a complete sentence of text, such as Um lugar onde você geralmente encontra um(a) conector de rede sem fio é em um(a) empresa, this generally indicates that it came from Open Mind Common Sense.

This means someone came to the OMCS do Brasil site, and saw a sentence frame like this (with text boxes):

Um lugar onde você geralmente encontra um(a) ________ é em um(a) ________

and they filled in the sentence. It said um(a) because the site didn't know the gender of what people will fill in.
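As a rough sketch of how such a frame produces the sentence, assuming a simple string template (the `FRAME` constant and `fill_frame` helper are hypothetical names, not OMCS or ConceptNet code):

```python
# Hypothetical sentence frame, modelled on the one discussed above.
# "um(a)" is fixed text: the frame cannot know the gender of what gets typed in.
FRAME = "Um lugar onde você geralmente encontra um(a) {0} é em um(a) {1}."


def fill_frame(frame: str, *slots: str) -> str:
    """Insert the contributor's answers into the blanks, trimming whitespace."""
    return frame.format(*(slot.strip() for slot in slots))


sentence = fill_frame(FRAME, "conector de rede sem fio", "empresa")
# The assertion keeps only the relation and the two filled-in concepts.
assertion = ("conector de rede sem fio", "AtLocation", "empresa")

print(sentence)
print(assertion)
```

The surface sentence keeps the frame's wording, including "um(a)" and "em um(a)", while the stored assertion depends only on the two filled-in concepts and the relation.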

The sentence isn't important to the structure of ConceptNet. It's just there to keep track of where the data came from. The fact that the site was "OMCS do Brasil" should indicate why all these sentences are in Brazilian Portuguese.


jarodium commented on August 23, 2024

I would be using ConceptNet with SuperScript, a new tool for writing chatbots, since I am starting to look into that type of UI to power a proof of concept for e-commerce.

It is a relief that the sentence is not important to the structure, since I thought it was somehow related to how the chatbot would interpret a similar sentence, and that the bot would tell my users that it does not understand Portuguese.

Thank you for clearing this up.
Best regards
Pedro


rspeer commented on August 23, 2024

I understand.

I would not recommend using ConceptNet to generate text in any language, incidentally. You need something else for that -- something that either explicitly or implicitly understands grammar rules, has a model of how to stay on topic, and so on. ConceptNet is mostly designed to be able to recognize what text is about.


jarodium commented on August 23, 2024

I think that's the way SuperScript works.
We define a topic, let us say a grocery store. A user goes to the website and asks the bot:

  • How much is a watermelon?

The bot should be able to understand that "how much" refers to the price and "watermelon" is the product. Given these two (of course we still need to provide the model sentences), the bot can infer that it is supposed to trigger a search for the price of watermelons.
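A minimal sketch of that kind of recognition, assuming plain keyword matching. The trigger phrases, the product list, and `parse_question` are hypothetical and do not reflect how SuperScript or ConceptNet actually work:

```python
import re

# Hypothetical trigger phrases and product list, for illustration only.
PRICE_TRIGGERS = ("how much", "price of")
PRODUCTS = {"watermelon", "strawberry", "juice"}


def parse_question(text):
    """Return (intent, product); either element is None when nothing matches."""
    lowered = text.lower()
    intent = "price" if any(trigger in lowered for trigger in PRICE_TRIGGERS) else None
    words = re.findall(r"[a-z]+", lowered)
    product = next((word for word in words if word in PRODUCTS), None)
    return intent, product


print(parse_question("How much is a watermelon?"))
```

A real bot would need the model sentences mentioned above, plus handling for plurals and synonyms, which this sketch leaves out.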

I think it uses ConceptNet 5 for this kind of recognition.
But I was worried about the bot not understanding Iberian Portuguese, because it's common for North Americans to equate the Portuguese language with Brazilian Portuguese, and I have had my share of software on my computer using the BR thesaurus for its autocorrection features...
