Comments (19)

lbourdois avatar lbourdois commented on August 18, 2024 6

Hi everyone,

I am mentioning you here because the Hugging Face forum does not let me include more than two links or mention more than two people in a post.

This topic aims to add a "fr" tag to French-language models that currently lack one, so that as many people as possible can find them on the Hub (see above for more information).

As I can't add it myself, I'm trying to get in touch with the authors of the models concerned so that they can update their metadata.

Thank you in advance for your cooperation,

from hub-docs.

stefan-it avatar stefan-it commented on August 18, 2024 2

Hi @lbourdois ,

sorry for that! I've uploaded the model cards for our @dbmdz models incl. the correct language tag :)

stefan-it avatar stefan-it commented on August 18, 2024 2

@PhilipMay could maybe introduce the language tag for the 'T-Systems-onsite/cross-en-fr-roberta-sentence-transformer' model.

punyajoy avatar punyajoy commented on August 18, 2024 2

Hate-speech-CNERG, can you add the tag "fr" to the metadata of the following model please?

This organisation is composed of @punyajoy, @Debjoy10, pinging them to check this out

Added the language tag 👍

lbourdois avatar lbourdois commented on August 18, 2024 2

All the models listed are now correctly tagged, or a Hub PR has been submitted to tag them. I am therefore closing this issue. Thank you all :)

lbourdois avatar lbourdois commented on August 18, 2024 1

Hi @julien-c

What you are planning seems to address the problem.

Until this feature is available, would it be possible to display a generic message to users pushing a new model to the Hub, reminding them to fill in the tags (languages in this case, but the task handled by the model would be another example) and perhaps the model card as well?
I don't think it would take much time to implement, and this simple reminder would limit future proofreading work.
I say this because of a quick test I just ran on the last 30 models added to the Hub (https://huggingface.co/models?sort=modified): of the 28 NLP models, 23 did not have a language tag.
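
The check described above can be approximated with a small script. This is only a minimal sketch, assuming each model is available as a dict with an `id` and an optional `language` entry taken from its card metadata; the helper name `missing_language` and the sample records are hypothetical and are not the 30 models from the test.

```python
from typing import Iterable, Mapping


def missing_language(models: Iterable[Mapping]) -> list[str]:
    """Return the ids of models whose metadata has no language entry."""
    missing = []
    for model in models:
        language = model.get("language")
        # Treat None, an empty string, and an empty list as "not tagged".
        if not language:
            missing.append(model["id"])
    return missing


# Hypothetical sample records, not real Hub data.
sample = [
    {"id": "user-a/model-1", "language": ["fr"]},
    {"id": "user-b/model-2", "language": None},
    {"id": "user-c/model-3"},
    {"id": "user-d/model-4", "language": "en"},
]

untagged = missing_language(sample)
print(f"{len(untagged)} of {len(sample)} models have no language tag: {untagged}")
```

The same function could be fed with records fetched from the Hub, but how those records are obtained is left out here on purpose.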

For the French models I indicated, I will as you suggest try to contact the authors and will come back to you in 10-14 days to indicate those who have not responded.

PhilipMay avatar PhilipMay commented on August 18, 2024 1

@PhilipMay could maybe introduce the language tag for the 'T-Systems-onsite/cross-en-fr-roberta-sentence-transformer' model.

Thanks for the hint. Done.

abhilash1910 avatar abhilash1910 commented on August 18, 2024 1

Hi @lbourdois ,
I have added the French tag for https://huggingface.co/abhilash1910/french-roberta

elishowk avatar elishowk commented on August 18, 2024 1

Hate-speech-CNERG, can you add the tag "fr" to the metadata of the following model please?

This organisation is composed of @punyajoy, @Debjoy10, pinging them to check this out

elishowk avatar elishowk commented on August 18, 2024 1

Hi @ydshieh, if you get this message, could you add the tag "fr" to the metadata of the following model please?

  • huggingface.co/ydshieh/wav2vec2-large-xlsr-53-French

Thanks

elishowk avatar elishowk commented on August 18, 2024 1

Hi there, I just reached out by mail to the last user of your list, WikinewsSum.
Let's wait and see.
Regards.

osanseviero avatar osanseviero commented on August 18, 2024 1

Thank you for the contribution!! 🔥 🔥

julien-c avatar julien-c commented on August 18, 2024

Hi @lbourdois thanks for opening this issue.

We are thinking of implementing a simple-to-use kind of Pull Request workflow that would make sense on models, datasets, and spaces.

We don't want to make it as complex/feature-rich as GitHub PRs for instance, as we want to build the most specific set of tools for ML.

We won't ship this in the super short-term though. In the meantime, what we suggest is to reach out to model authors (can be in a GitHub issue for instance, or on our Forum on discuss.huggingface.co) and ask them to update their metadata. It is a bit tedious, so let us know if we can help automate this 🙂

lbourdois avatar lbourdois commented on August 18, 2024

Hi @julien-c,

It has been two weeks since I tried to contact the authors of the models.
Some tags have been added, but the majority of the models are still not listed.
I don't know if there is any option other than waiting for the tool you mentioned.

Since my last post, I have also noticed an issue with datasets and tags: one language can have several tags.
For example, 4 datasets are tagged "fr-FR" (https://huggingface.co/datasets?languages=languages:fr-FR). They are in French, but they do not appear when you filter the datasets with the "fr" tag (https://huggingface.co/datasets?languages=languages:fr).
This would have to be investigated, but the same phenomenon can also be found in other cases.

I think it is worth keeping these sub-tags, since they account for the different variants of a language. They make it possible to build either extremely specific models (a model for French from France, for example) or, on the contrary, models covering as many varieties as possible (a French model covering the most varieties possible: https://en.wikipedia.org/wiki/French_language#Varieties).
This would also give better visibility to models and datasets for sign languages.
We could then imagine a simple button displaying a drop-down menu on the languages page (https://huggingface.co/languages) to show all possible variants of a given language:

[Image: mock-up of the proposed drop-down menu of language variants]

We would then have a count for each sub-tag (the xx), and their sum would equal the number displayed for the language as a whole. However, care should be taken to count a dataset with several variants of the same language only once.
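
To make the counting concern concrete, here is a minimal sketch in Python. It assumes tags follow a `language-REGION` shape such as "fr-FR", and that datasets are given as a mapping from a dataset id to its language tags; the function names and the sample data are hypothetical, not how the Hub computes its counts.

```python
from collections import defaultdict


def primary_subtag(tag: str) -> str:
    """Return the primary language subtag, e.g. 'fr' for 'fr-FR'."""
    return tag.replace("_", "-").split("-")[0].lower()


def count_by_language(datasets: dict[str, list[str]]):
    """Count datasets per language and per variant.

    Sets of dataset ids are used so that a dataset carrying several
    variants of one language is counted once for that language.
    """
    per_language = defaultdict(set)  # primary subtag -> dataset ids
    per_variant = defaultdict(set)   # full tag -> dataset ids
    for dataset_id, tags in datasets.items():
        for tag in tags:
            per_variant[tag].add(dataset_id)
            per_language[primary_subtag(tag)].add(dataset_id)
    language_counts = {lang: len(ids) for lang, ids in per_language.items()}
    variant_counts = {tag: len(ids) for tag, ids in per_variant.items()}
    return language_counts, variant_counts


# Hypothetical sample, not real Hub data.
sample = {
    "dataset-a": ["fr"],
    "dataset-b": ["fr-FR"],
    "dataset-c": ["fr-FR", "fr-CA"],
    "dataset-d": ["en"],
}
language_counts, variant_counts = count_by_language(sample)
print(language_counts)
print(variant_counts)
```

In this sample, the French variant counts add up to 4 while only 3 distinct datasets are in French, which is exactly the double counting to watch for when displaying a total.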

I don't know what you think about this idea.

Have a nice day :)

elishowk avatar elishowk commented on August 18, 2024

Hi @lbourdois, I didn't find a GitHub user handle for user WikinewsSum, it seems like an anonymous system user. @osanseviero do you happen to know who the maintainer is?

julien-c avatar julien-c commented on August 18, 2024

@lbourdois I saw you started using the Hub PR feature on hf.co to fix those. Thank you so much!

Please, let us know of any improvement we can make to make this as easy as possible

lbourdois avatar lbourdois commented on August 18, 2024

@julien-c
I posted my first feedback here: https://huggingface.co/spaces/huggingface/HuggingDiscussions/discussions/1

There is one point I am unsure about, though.
Updating the tags of a dataset is no problem: it is enough to open a PR that modifies the README file.
However, for models, a PR on the README doesn't seem to be enough, according to the feedback I've just received: https://huggingface.co/stanfordnlp/corenlp-french/discussions/1
So I wonder whether a Hub PR can change a model's tags or not.
If yes, what have I misunderstood? If not, is there any way to add this feature?

Edit: It worked well for https://huggingface.co/Felix92/doctr-dummy-tf-sar-resnet31/discussions/1, so there would be models where this is possible and others where it is not? 🤔
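
For repositories where a README PR is enough, the edit itself is small: on the Hub, card metadata sits in a YAML block delimited by `---` at the top of README.md. Below is a minimal sketch of that edit in Python, assuming a block-style `language:` list; the helper `add_language` is hypothetical, written for illustration, and is not part of any Hugging Face library. It only transforms the README text and does not open the PR.

```python
def add_language(readme: str, code: str) -> str:
    """Add `code` to the `language` list in a README's YAML front matter.

    Simplified sketch: handles a missing front matter block, a missing
    `language` key, or a block-style list ("language:" then "- xx" lines).
    """
    lines = readme.split("\n")
    if lines[0].strip() != "---":
        # No front matter yet: create one in front of the existing text.
        return f"---\nlanguage:\n- {code}\n---\n" + readme

    # Find the closing delimiter of the front matter.
    end = next((i for i in range(1, len(lines)) if lines[i].strip() == "---"), None)
    if end is None:
        raise ValueError("front matter is not closed")

    header = lines[1:end]
    keys = [h.strip() for h in header]
    if any(k.startswith("language:") and k != "language:" for k in keys):
        # An inline value such as "language: fr" is out of scope here.
        raise ValueError("inline language value is not supported by this sketch")

    if "language:" not in keys:
        header += ["language:", f"- {code}"]
    else:
        start = keys.index("language:")
        stop = start + 1
        # Walk over the existing "- xx" items of the list.
        while stop < len(header) and header[stop].lstrip().startswith("- "):
            stop += 1
        existing = [h.lstrip()[2:].strip() for h in header[start + 1:stop]]
        if code not in existing:
            header.insert(stop, f"- {code}")

    return "\n".join(["---", *header, "---", *lines[end + 1:]])


print(add_language("---\nlicense: mit\n---\n# Model\n", "fr"))
```

Calling the function twice with the same code leaves the text unchanged, so it is safe to run over a list of READMEs that may already be tagged.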

osanseviero avatar osanseviero commented on August 18, 2024

@lbourdois these model repos are automatically generated from a Stanford repository, so in this case they need to fix the script that creates the repo:

https://github.com/stanfordnlp/huggingface-models

lbourdois avatar lbourdois commented on August 18, 2024

Thank you @osanseviero for enlightening me on this topic
