domenicosolazzo / jroc
We ain't afraid of Randy with his candy. You Know Am I Sayin'?
License: GNU General Public License v3.0
We need logs stored in Loggly
It cannot find gunicorn when running the container with docker-compose.
The actual error is:
web_1 | bash: gunicorn: command not found
jroc_web_1 exited with code 127
We should remove the characters below from any text that is sent to JRoc:
«»
'
*
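A minimal sketch of how these characters could be stripped before the text reaches the pipeline. The names `UNWANTED_CHARS` and `strip_unwanted` are hypothetical and not part of JRoc; only the character list comes from the issue.

```python
# Characters listed in the issue: guillemets, apostrophe, asterisk.
# UNWANTED_CHARS and strip_unwanted are illustrative names, not JRoc API.
UNWANTED_CHARS = "«»'*"

# str.maketrans with a third argument maps every listed character to None,
# so translate() deletes them in a single pass.
_DELETE_TABLE = str.maketrans("", "", UNWANTED_CHARS)


def strip_unwanted(text: str) -> str:
    """Return text with every character in UNWANTED_CHARS removed."""
    return text.translate(_DELETE_TABLE)


if __name__ == "__main__":
    print(strip_unwanted("«Rauma bru» er *stengd*"))
```

Keeping the list in one constant means entity recognition and tag recognition can share the same cleaning step.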
We should modify both entity recognition and tag recognition.
An example text is:
{"data": "VG henta i fjor ut tal frå inspeksjonane til Statens vegvesen, som viste at 104 bruer i Noreg er svekka. Og brusjefen i Vegdirektoratet vedgår at etterslepet framleis er stort. – Vi har eit relativt stort vedlikehaldsetterslep som vi arbeider med å redusere. Så det er ein stor innsats på gang no for å redusere dette etterslepet, seier han til NRK. Stengde bru. Måndag hastestengde vegvesenet Rauma bru på E136. Dykkarar oppdaga ein usikker brupilar. Dette er ei av landets bruer som sårt trengde oppgradering, og dette arbeidet hadde halde på i lang tid, før brua blei heilt stengt måndag. Det er slik vegvesenet skal jobbe, for å vareta tryggleiken, seier Stensvold. – Dersom vi oppdagar noko vi er usikker på, så må vi stenge til vi har funne ut kor alvorleg det er. Tryggleiken kjem først. Og Stensvold meiner dei har god kontroll på situasjonen, trass det store etterslepet. – Vi har jamlege inspeksjonar, der eventuelle feil ved konstruksjonen vil bli avdekka. Problematisk. Men stenginga av vegen har ikkje vore uproblematisk midt i turistsesongen. Ein av dei som ofte køyrer strekninga, Lars Hardeland frå Nettbuss, ristar oppgitt på hovudet over situasjonen. – Her har dei halde på med oppgraderingsarbeid og lysregulering i to år, og så oppdagar dei at brua er så dårleg at dei stenger den. Eg synest det er heilt ufatteleg, seier han. Lastebileigarforbundet er ei anna gruppe som er råka av vegstenginga, og distriktssjef Dagrunn Krakeli meiner at beredskapen ved brustengingar må bli betre. – Det har skjedd før, og det vil skje igjen, så planen for omkøyringar eller reservebruer frå forsvaret må vere klar, seier ho."}
It recognizes the word Dykkarar as both a tag and an entity, although it should be ignored.
This is an example:
Uri: https:///entities/usa
It works correctly with:
Result:
{
uri: "http://<your-domain>/entities/http://www.ontologyportal.org/WordNet#WN30Word-usa",
data: {
properties_uri: "http://<your-domain>/entities/http://www.ontologyportal.org/WordNet#WN30Word-usa/properties",
types_uri: "http://jroc-t1.herokuapp.com/entities/http://www.ontologyportal.org/WordNet#WN30Word-usa/types",
name: "http://www.ontologyportal.org/WordNet#WN30Word-usa",
redirected_from: "http://<your-domain>/entities/usa"
}
}
Expected:
{
uri: "http://<your-domain>/entities/United_States",
data: {
properties_uri: "http://<your-domain>/entities/United_States/properties",
types_uri: "http://<your-domain>/entities/United_States/types",
name: "United_States",
redirected_from: "<your-domain>/entities/Usa"
}
}
The tag endpoint is not returning all the tags.
It needs a better package structure.
JRoc should have the possibility to cluster similar tags together.
What is similar?
In v1: the similarity is based on Levenshtein distance
In v2: the similarity is going to be based on similar concepts.
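The v1 idea could be sketched roughly as follows. This is an illustration under stated assumptions, not JRoc's implementation: the greedy grouping strategy, the function names, and the `max_distance` threshold of 2 are all assumptions; only the use of Levenshtein distance comes from the issue.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def cluster_tags(tags, max_distance=2):
    """Greedy clustering: a tag joins the first cluster whose first member
    is within max_distance edits, otherwise it starts a new cluster."""
    clusters = []
    for tag in tags:
        for cluster in clusters:
            if levenshtein(tag.lower(), cluster[0].lower()) <= max_distance:
                cluster.append(tag)
                break
        else:
            clusters.append([tag])
    return clusters


if __name__ == "__main__":
    print(cluster_tags(["bru", "brua", "bruer", "vegvesen", "vegvesenet"]))
```

Because each tag is compared only with the first member of a cluster, the result depends on input order. A fixed absolute threshold also treats short tags more leniently than long ones, so a length-relative threshold may be worth considering.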
Geolocation of the content
We will need different trained models for different languages
Detect the language of the text sent to JRoc
Search topics using a public SPARQL Endpoint: http://dbpedia.org/fct/facet.vsp
Detect the sentiment of the text
Add property tagger.
It detects aspect words and links them with the correct aspect class.
Useful for hotel reviews.
Example
Word found: cleanliness -> bed
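A property tagger of this kind could be approximated with a plain lexicon lookup. The sketch below is hypothetical: the lexicon entries, the name `tag_aspects`, and the (aspect class, word) output order are assumptions made for illustration, loosely mirroring the "cleanliness -> bed" example.

```python
import re

# Hypothetical lexicon mapping aspect words to aspect classes.
# The entries are invented for illustration and do not come from JRoc.
ASPECT_LEXICON = {
    "bed": "cleanliness",
    "sheets": "cleanliness",
    "breakfast": "food",
    "staff": "service",
}


def tag_aspects(text):
    """Return (aspect_class, word) pairs for every lexicon word in text."""
    words = re.findall(r"\w+", text.lower())
    return [(ASPECT_LEXICON[w], w) for w in words if w in ASPECT_LEXICON]


if __name__ == "__main__":
    for aspect, word in tag_aspects("The bed was dirty but the staff were kind."):
        print(f"Word found: {aspect} -> {word}")
```

An exact-match lookup misses inflected forms such as "beds", so a real tagger would probably stem or lemmatize first.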
Use stopwords from NLTK package
Text summarization in several languages
Remove the character "–" at the beginning of sentences.
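One way to do this, sketched under the assumption that a sentence start is either the start of a line or the position right after `.`, `!` or `?` plus one whitespace character. The name `strip_leading_dashes` is hypothetical.

```python
import re

# Match an en dash (U+2013) at the start of a line, or directly after
# sentence-ending punctuation followed by one whitespace character.
# Any whitespace after the dash is removed together with it.
_LEADING_DASH = re.compile(r"(?:^|(?<=[.!?]\s))\u2013\s*", re.MULTILINE)


def strip_leading_dashes(text: str) -> str:
    """Remove quotation dashes that open a sentence; keep mid-sentence dashes."""
    return _LEADING_DASH.sub("", text)


if __name__ == "__main__":
    print(strip_leading_dashes("\u2013 Vi har eit etterslep. \u2013 Det er stort."))
```

Dashes inside a sentence, as in "Oslo – Bergen", are left untouched because they do not follow sentence-ending punctuation.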
JRoc fails to analyze the text when there are newlines inside it.
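A possible workaround, assuming line breaks carry no meaning for the analysis, is to collapse all whitespace before the text enters the pipeline. The helper name `normalize_whitespace` is hypothetical, and this does not address whatever causes the failure inside JRoc.

```python
def normalize_whitespace(text: str) -> str:
    """Replace every run of whitespace (including \\n, \\r\\n, \\t) with one space."""
    # str.split() with no arguments splits on any whitespace run and
    # discards leading and trailing whitespace.
    return " ".join(text.split())


if __name__ == "__main__":
    print(normalize_whitespace("Stengde bru.\nM\u00e5ndag hastestengde\r\nvegvesenet Rauma bru."))
```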
Example of RegexTagger
JRoc should be able to load from an external JSON File
Adding a polarity tagger
The process crashes without warning when importing NLTK and making an HTTP request.
I can reproduce this error only when using the API. It does not affect the pipeline when it is run as a background worker.
The issue should be related to this error in NLTK: nltk/nltk#947
Show only tags with at least two characters
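The filter itself can be sketched in a line or two. The name `filter_short_tags` and the choice to ignore surrounding whitespace when measuring length are assumptions, not taken from the JRoc code.

```python
def filter_short_tags(tags, min_length=2):
    """Keep only tags whose stripped length is at least min_length characters."""
    return [tag for tag in tags if len(tag.strip()) >= min_length]


if __name__ == "__main__":
    print(filter_short_tags(["E", "NRK", " a ", "bru", "VG"]))
```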
Adding a stemmer task in several languages
Refactoring the code of the Norwegian POS
Add Kafka support
Add RabbitMQ support for using jroc as a background worker in Heroku
Add new documentation about the new architecture for jroc
It should document:
Find similar tags using WordNet
Add Redis support for using jroc as a background worker in Heroku
Adding opinion detection and opinion tagger.
Error when using the tagger with statistical disambiguation on Heroku.
Error log:
sh: 1: /app/The-Oslo-Bergen-Tagger/OBT-Stat/hunpos/hunpos-1.0-linux/hunpos-tag: not found
/app/The-Oslo-Bergen-Tagger/OBT-Stat/lib/disambiguation_context.rb:21:in `initialize': Inconsistent token count in OBT and Hunpos data. (ArgumentError)
from /app/The-Oslo-Bergen-Tagger/OBT-Stat/lib/disambiguator.rb:153:in `disambiguate'
from /app/The-Oslo-Bergen-Tagger/OBT-Stat/lib/disambiguator.rb:153:in `new'
from /app/modules/tagger/../../The-Oslo-Bergen-Tagger/OBT-Stat/bin/run_obt_stat.rb:29:in `run_disambiguator'
from /app/modules/tagger/../../The-Oslo-Bergen-Tagger/OBT-Stat/bin/run_obt_stat.rb:107:in `<main>'
Add custom stopwords
Add tokenizer