lingua-libre / operations
⚙️ Configuration files and deployment procedures for LinguaLibre wiki.
License: MIT License
In lingua-libre/operations/create_datasets.sh, lines 37, 43 and 52 request the ogg file format, whereas the format chosen for Lingua Libre files on Commons is wav.
Is there a reason for using ogg in the datasets, and is anything preventing us from switching to wav?
All the best
Replace the hard-coded line #48 with a programmatic query.
Quick query to get the relevant info from LLQS:
SELECT ?lang ?langLabel ?code ( count(DISTINCT ?record) as ?nb ) WHERE {
?lang prop:P2 entity:Q4 ; rdfs:label ?langLabel . FILTER (lang(?langLabel) = "en").
OPTIONAL { ?record prop:P4 ?lang ; prop:P2 entity:Q2 . }
OPTIONAL { ?lang prop:P13 ?code }
}
GROUP BY ?lang ?langLabel ?code
ORDER BY DESC(?nb)
See also :
Commit: f3e93bb
Source MediaWiki:LanguagesGalleryData.js, https://jsfiddle.net/gxjqunbr/1/ (2022.01.22).
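One possible way to feed the query above into the script, instead of a hard-coded list, is sketched below. It is only an illustration under stated assumptions: the endpoint is read from an environment variable named LLQS_ENDPOINT (a hypothetical name, no URL is assumed here), the query is stored in a file, the results are requested as CSV with the header `lang,langLabel,code,nb`, and the function names are invented for this example. The sample rows are made up for the demonstration; only Q115107 / bcl / Central Bikol appears elsewhere in these issues, and the counts are not real.

```shell
# fetch_languages QUERY_FILE
# Sends the SPARQL query stored in QUERY_FILE to the endpoint given in
# LLQS_ENDPOINT (assumption: set by the caller) and prints the CSV result.
fetch_languages() {
  curl -sS -G "$LLQS_ENDPOINT" \
    -H 'Accept: text/csv' \
    --data-urlencode "query@$1"
}

# languages_with_records
# Reads the CSV result on stdin, skips the header row, and prints the Q-id
# of every language whose record count (last column) is greater than zero.
# Only the first and last columns are used, so commas inside a quoted
# label do not matter.
languages_with_records() {
  awk -F, 'NR > 1 && $NF + 0 > 0 { n = split($1, parts, "/"); print parts[n] }'
}

# Demonstration on made-up sample rows (no network access needed).
printf '%s\n' \
  'lang,langLabel,code,nb' \
  'https://example.org/entity/Q115107,Central Bikol,bcl,12' \
  'https://example.org/entity/Q999,Example,xx,0' \
  | languages_with_records
# prints Q115107
```

In the script, the hard-coded list could then be replaced by something like `fetch_languages query.rq | languages_with_records`, keeping the order returned by the query.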
To do for @mickeybarber :
create_datasets.sh
on its server 🕺🏼
Explanation:
I came back across the question of dataset generation.
Following Mickey's analysis, I realized that a hard-coded modification could allow more testing.
So I modified the script to hard-code the list of languages to download in part 3.
The languages are ordered from the smallest to the best-resourced, which will allow us to:
/create_datasets.sh
.org
: done
@Jitrixis, @Poslovitch: is either of you able to explain what ./crontab is for? Which skills are needed?
I suspect it requires MediaWiki / PHP / backend expertise.
/crontab
- understanding
# Run maintenance scripts on the production instance
00 4 * * * /usr/bin/php7.0 /home/www/lingualibre.fr/maintenance/cleanupUploadStash.php > /dev/null 2>&1
00 5 * * * /usr/bin/php7.0 /home/www/lingualibre.fr/maintenance/rebuildLocalisationCache.php > /dev/null 2>&1
# Run maintenance scripts on the testing instance
15 4 * * * /usr/bin/php7.0 /home/www/v2.lingualibre.fr/maintenance/cleanupUploadStash.php > /dev/null 2>&1
00 5 * * * /usr/bin/php7.0 /home/www/v2.lingualibre.fr/maintenance/rebuildLocalisationCache.php > /dev/null 2>&1
# Other stuff
30 2 * * 1 /opt/letsencrypt/letsencrypt-auto renew >> /var/log/le-renew.log
45 2 * * 1 /bin/systemctl reload nginx
30 4 * * * logrotate /etc/logrotate.conf
/home/www/ actual folder structure and /crontab file's paths
/home/www/ ? Please provide the /home/www/ folder's tree structure for 1, 2 or 3 levels, as needed to see whether it matches the various ./crontab file's paths.
lingualibre.fr to new path values ?
@mickeybarber, in the same server exploration you should come across the following items, which we would benefit from documenting better...
home/www/v2.lingualibre.fr/ path still exists on the server ? I think this LL version 2 is the current lingualibre.org. So I expect this v2 path and folder to be missing because it got renamed into a .org path and folder. Can you see such a thing ?
https://dev.lingualibre.org/ (dev version) is online and working. To which actual folder and path does this correspond ?
Note: the 5 points above are nearly the same question = share the directory structure with us so we can spot possibly broken paths. This will help us know which URLs are outdated and what to replace them with.
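The comparison asked for above (do the paths in ./crontab still exist on the server?) could be partly automated. The following is a minimal sketch, not an existing tool from this repository: the function names are invented for the example, and it simply treats every whitespace-separated token starting with `/` on a non-comment line as a path.

```shell
# crontab_paths
# Reads crontab text on stdin, ignores comment lines, and prints each
# distinct token that looks like an absolute path, one per line, sorted.
crontab_paths() {
  grep -v '^[[:space:]]*#' | tr -s ' \t' '\n' | grep '^/' | sort -u
}

# missing_paths
# Reads paths on stdin (one per line) and prints those that do not exist
# on the machine where the function is run.
missing_paths() {
  while IFS= read -r p; do
    [ -e "$p" ] || printf '%s\n' "$p"
  done
}

# Demonstration on two job lines copied from the crontab above.
printf '%s\n' \
  '# Run maintenance scripts on the production instance' \
  '00 4 * * * /usr/bin/php7.0 /home/www/lingualibre.fr/maintenance/cleanupUploadStash.php > /dev/null 2>&1' \
  '45 2 * * 1 /bin/systemctl reload nginx' \
  | crontab_paths
```

Run on the server as `crontab_paths < ./crontab | missing_paths`, this would list the paths referenced by the file that no longer exist, which is one way to spot folders that were renamed.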
@pamputt reported :
In the name of the archive, the language name is cut when it contains a space. For example, before we had "Q115107-bcl-Central Bikol.zip" and now "Q115107-bcl-Central.zip" (Bikol has disappeared). Is it possible to fix that quickly or should I open a bug report on Phabricator ?
See https://lingualibre.org/datasets/
The bug likely comes from the pseudo-regex in this section: create_datasets.sh#L53-L56.
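The code at create_datasets.sh#L53-L56 is not reproduced here, so the following is only a sketch of one common way this kind of truncation arises in shell, and of one way to avoid it. It assumes a record of the form "Qid code label" separated by spaces; that layout and both function names are hypothetical.

```shell
# fragile_name RECORD
# Splits the record on whitespace and keeps only the third word as the
# label, so a label containing a space loses everything after it.
fragile_name() {
  set -- $1   # unquoted on purpose: triggers word splitting
  printf '%s-%s-%s.zip\n' "$1" "$2" "$3"
}

# robust_name RECORD
# Peels off the first two space-separated fields with parameter expansion
# and keeps the whole remainder, spaces included, as the label.
robust_name() {
  qid=${1%% *}
  rest=${1#* }
  code=${rest%% *}
  label=${rest#* }
  printf '%s-%s-%s.zip\n' "$qid" "$code" "$label"
}

fragile_name 'Q115107 bcl Central Bikol'   # prints Q115107-bcl-Central.zip
robust_name  'Q115107 bcl Central Bikol'   # prints Q115107-bcl-Central Bikol.zip
```

Whatever the actual cause turns out to be, the resulting file name also has to stay quoted everywhere it is used afterwards (for example in the zip command), otherwise the space splits it again.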