
Comments (2)

fabiospampinato commented on May 24, 2024

It's hard to say, these neural networks are black boxes.

The sentence being quite short definitely makes the job tougher for it; the longer the sentence, the easier it should be to get an accurate detection. Also, not only is it short, but almost half of it is just a name, which surely must mess up its internal calculations. The longer the sentence, the less likely it is that half of it is just names. Here the network sees some ngrams that just happen to be more frequent in German than elsewhere, ngrams that should really be ignored because they are part of somebody's name.
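To make the point about names concrete, here is a rough sketch that measures what fraction of a sentence's character trigrams touch a name. The helpers `trigrams` and `nameShare`, the use of trigrams specifically, and the example sentences are all assumptions for illustration; none of this is lande's actual code or feature extraction.

```typescript
// Hypothetical helper, not part of lande: lowercase the text and collapse whitespace.
function normalize(text: string): string {
  return text.toLowerCase().replace(/\s+/g, ' ').trim();
}

// Hypothetical helper: split text into overlapping character trigrams.
function trigrams(text: string): string[] {
  const normalized = normalize(text);
  const grams: string[] = [];
  for (let i = 0; i + 3 <= normalized.length; i++) {
    grams.push(normalized.slice(i, i + 3));
  }
  return grams;
}

// Fraction of the sentence's trigrams that overlap the name by at least one character.
function nameShare(sentence: string, name: string): number {
  const normalized = normalize(sentence);
  const start = normalized.indexOf(normalize(name));
  if (start === -1) return 0;
  const end = start + normalize(name).length; // exclusive end of the name
  const total = trigrams(sentence).length;
  if (total === 0) return 0;
  let touching = 0;
  for (let i = 0; i < total; i++) {
    // The trigram at position i covers the range [i, i + 3).
    if (i < end && i + 3 > start) touching++;
  }
  return touching / total;
}

// Made-up sentences: the same name dominates the short one far more than the long one.
const shortShare = nameShare('Hello Hans Zimmermann', 'Hans Zimmermann');
const longShare = nameShare(
  'Yesterday I had a long conversation with Hans Zimmermann about the weather',
  'Hans Zimmermann'
);
console.log(shortShare, longShare);
```

In the short sentence most of the trigrams come from the name, so whatever language those trigrams resemble can easily outweigh the rest of the text.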

Maybe it could be fixed by making the network bigger. Mainly I think I need more data, but the problem is I need about the same amount of data for every supported language, and the dataset I'm using doesn't have tens of thousands of sentences or more for every language.
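A minimal sketch of why the least-covered language ends up limiting everything, assuming a hypothetical `Dataset` shape that maps a language code to its sentences. This is not the actual training pipeline, just an illustration of capping every language at the same count.

```typescript
// Hypothetical shape: language code -> training sentences for that language.
type Dataset = Record<string, string[]>;

// Keep the same number of sentences for every language,
// capped at the size of the smallest language.
function balance(dataset: Dataset): Dataset {
  const counts = Object.values(dataset).map((sentences) => sentences.length);
  const cap = counts.length > 0 ? Math.min(...counts) : 0;
  const balanced: Dataset = {};
  for (const [lang, sentences] of Object.entries(dataset)) {
    balanced[lang] = sentences.slice(0, cap);
  }
  return balanced;
}

// Made-up data: the language with the fewest sentences sets the cap for all.
const balancedExample = balance({
  en: ['one', 'two', 'three'],
  de: ['eins', 'zwei'],
  it: ['uno'],
});
console.log(balancedExample);
```

With a cap like this, adding more English data does nothing unless the smallest languages get more data too.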

In general it won't be 100% accurate though: the more languages are supported, the shorter each sentence, and the smaller the network, the higher the inaccuracy.

from lande.

fabiospampinato commented on May 24, 2024

Maybe a way to sort of fix this would be to get 100k English sentences, get them translated into all supported languages, and use those for training instead 🤔 It might be an interesting approach.

