keon / awesome-nlp
:book: A curated list of resources dedicated to Natural Language Processing (NLP)
License: Creative Commons Zero v1.0 Universal
Develop models for Hindi from the IIT Bombay English Hindi Parallel Corpus using Cython/spaCy-multilingual.
This is a blogpost and code from DennyBritz's project. Should I add it?
As mentioned here, the videos are no longer available. I followed the link and they are not there, so it would be better to update it with this link.
I think fastText should be added to the list.
It is a new library developed by the FAIR team together with Tomas Mikolov.
It can do everything word2vec can, but faster. It also provides more features: model compression, supervised classification, character n-grams.
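To illustrate the character n-gram feature mentioned above, here is a minimal pure-Python sketch. It is not fastText's implementation; it only shows the idea of wrapping a word in boundary markers (`<` and `>`) and collecting its substrings. The function name `char_ngrams` and the parameters `min_n` and `max_n` are assumptions made for this example.

```python
def char_ngrams(word, min_n=3, max_n=6):
    """Return the character n-grams of `word`, wrapped in boundary markers.

    Illustrative sketch only: the name and parameters are hypothetical and
    this does not call or reproduce the fastText library.
    """
    wrapped = f"<{word}>"  # mark the start and end of the word
    grams = []
    for n in range(min_n, max_n + 1):
        # slide a window of width n across the wrapped word
        for i in range(len(wrapped) - n + 1):
            grams.append(wrapped[i:i + n])
    return grams


print(char_ngrams("where", 3, 3))
# ['<wh', 'whe', 'her', 'ere', 're>']
```

Because rare or unseen words share n-grams with known words, a model built on such subword units can still produce a vector for them, which is why this matters for morphologically rich languages.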
Working on it.
We need lists for books
The paper link is here.
P.S. There might arise the issue of finding a suitable Hindi dataset.
German NLP is a curated list of German NLP resources including datasets.
P.S. This paper is one of the first to work on Embed, encode, attend, predict.
@keon Thank you.
We can set up a website at something like www.keon.github.io/awesome-nlp
Reference Demo: www.nirantk.github.io/awesome-project-ideas
which auto-updates from the Github repository at https://github.com/NirantK/awesome-project-ideas
The website would automatically pick up content and be updated from the Markdown in our master/README.md.
This should make it easier for search engines to rank and find us. It would also make the list easier to use for some people, e.g. academics and beginners such as college students.
@keon thoughts?
Create a new category, Text Summarization, in Techniques, covering textsum, PyTextRank, gensim and others.
Can we add a section for specific research topics? For example: what is the latest research in Question Answering?
Awesome.re is a meta-list of awesome repositories.
It has a few guidelines to be included there.
Maybe we can use that as our checklist to improve this. And then raise a PR when we are done?
What do you suggest @keon ?
Since the last update, Machine Translation has explored CNNs, made encoder-decoder architectures the norm, and seen other changes. Please add recent papers, demos, tutorials and 1-2 line explanations for each link.
If you know of better content, please add to this space.
To become the go-to resource, we should be able to curate good tools and datasets in at least the following languages (in addition to English):
Of course, we are willing to accept PRs from all other languages.
Please feel free to raise a PR or simply comment on this issue itself and we will add it on your behalf.
Thoughts? Is this a good direction to take?
Can you take up any of the languages? Sorry for assuming your Asian heritage; the South Korean flag is on your GitHub account.
@NirantK I thought I would check before submitting another PR: would the following NLP tool fit the list?
https://github.com/amir-zeldes/RFTokenizer
It is a trainable subword tokenizer for morphologically rich languages, such as Afro-Asiatic languages. It comes with pre-trained models for Arabic, Hebrew and Coptic, supports Python 2 and 3, and is installable from PyPI. Current performance is state of the art on this task, at least for Hebrew and Coptic (not sure about Arabic, since different papers seem to use different targets and metrics).
Indic
Asian
We should be able to add content regarding Indian/Indic languages as well, keeping in mind the growth of India Stack and the need for Indic tools.
Do you think we should add an MIT or CC0 license to the repository?
This would
The awesome meta-list recommends CC0, probably because it is more permissive than MIT, AGPL and the like.
What do you suggest?
Hello @keon,
Just wanted to check if we could add @the-ethan-hunt as a collaborator?
It will allow faster contributions.
He has made ample contributions IMHO; see the author contributions.
We need to add TensorFlow and Torch implementations of various models.
P.S. As suggested by Sebastian Ruder in his tweet.
I will be working on this, but first-time contributors are welcome to contribute too!
Considering the recent surge in NLP research, I propose listing some of the top labs that are carrying out cutting-edge research in NLP (no endorsements here), such as the Stanford NLP group, the University of Edinburgh NLP group, etc.
@NirantK how does this sound?
The Single Exchange Dialogs section is ambiguous, too broad and out of date. Here is how you can help us improve this:
Here is the academic work I could find:
None of the above work is inspiring but this is what we have.
Hi @keon
I'm the author of Curated Papers, a website designed solely for managing, curating and interacting with curated lists of academic papers, projects, links, etc. I saw your list and thought it could be a great addition to the website. Are you willing to give it a try? Currently CP is pre-launched, but it is available online and I'm working on adding high-quality content.
some links can be found here
https://github.com/niderhoff/nlp-datasets
There is a similar task called text classification.
But I want to find a kind of model whose input is a set of keywords, where the keywords do not come from a sentence.
Thank you.
Since the languages section is a nested list, the references to individual languages are currently useless. AFAIK only headers can be linked to.
Great resource, thanks for creating this! I noticed that you don't have any Ruby libraries listed. I maintain a list of Ruby NLP libraries that you could link to if you like: https://github.com/diasks2/ruby-nlp
A Neural Network Approach to Context-Sensitive Generation of Conversational Responses Sordoni 2015. Generates responses to tweets. Uses Recurrent Neural Network Language Model (RLM) architecture of (Mikolov et al., 2010). source code: RNNLM Toolkit
The toolkit source code link is broken
I think it is a good idea to add more maintainers so we can keep up to date.
We are also going to polish the list. Please let me know if anyone is interested.
@NirantK - Are you interested?
Hello @keon,
Github topics allow Github beginners to "discover" us. It also makes it possible for us to feature in the Discover Dashboard.
Since only owners can add Github topics, can you please add some topics? Here are a few suggestions:
awesome-list
awesome
natural-language-processing
text-mining
deep-learning
or machine-learning
Adding a quick screenshot to help you find where to add Github topics:
I haven't tried it.
Standardize all links into some schematic/syntax. Here are a few suggestions:
[AVA](…) - JavaScript test runner
This, here or that.
or full stops, this makes it easier to read
We can then consider setting up a CI like Circle CI or Travis and run this linter on every PR automatically.
This linter would be incredibly useful beyond awesome-nlp to several other awesome-* repositories. The value created from your effort must be seen in that context.
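As a rough idea of what such a linter could look like, here is a minimal Python sketch. The two rules it checks (entries follow `[Name](url) - Description`, and link text is not a vague word such as "this", "here" or "that") are assumptions drawn from the suggestions above, not an agreed standard. All names (`ENTRY`, `VAGUE_NAMES`, `lint_line`, `lint_markdown`) are hypothetical.

```python
import re

# Assumed convention for a list entry: "- [Name](url) - Description"
ENTRY = re.compile(
    r"^\s*[-*] \[(?P<name>[^\]]+)\]\((?P<url>[^)\s]+)\) - (?P<desc>\S.*)$"
)

# Assumed examples of uninformative link text
VAGUE_NAMES = {"this", "here", "that"}


def lint_line(line):
    """Return a list of problems for one Markdown line (empty if it is fine)."""
    if not line.strip().startswith(("- [", "* [")):
        return []  # not a link entry, so nothing to check
    match = ENTRY.match(line)
    if match is None:
        return ["entry does not match '[Name](url) - Description'"]
    problems = []
    if match.group("name").strip().lower() in VAGUE_NAMES:
        problems.append("link text is not descriptive")
    return problems


def lint_markdown(text):
    """Return (line_number, problem) pairs for every problem found in `text`."""
    report = []
    for number, line in enumerate(text.splitlines(), start=1):
        for problem in lint_line(line):
            report.append((number, problem))
    return report
```

A CI job could run `lint_markdown` on the README of each PR and fail the build when the returned list is non-empty.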
Link for the University of Pennsylvania research group under prominent NLP research groups seems to be broken
These days, a lot of programs have been initiated by technical organizations to get students acquainted with open source. One such program is the Kharagpur Winter of Code, held by KOSS, IIT Kharagpur in the month of December for the last two years. I understand that awesome-NLP is a curated list, but I believe we can benefit greatly from this program. I would like to hear thoughts about this from @NirantK and @keon. In case there is no available time, I can act as a mentor.
We need to find and add a banner image similar to awesome-quantified-self or awesome-electron.
This is towards the broader goal of making this list more polished and then raising a PR to awesome.re.
Requests to anyone sending a PR:
Any support for Gujarati? I think you are from Surat, hence I thought there might be some support for Gujarati as well.
Papers should be unified in one format, with author names included.