Comments (3)
Sounds great. Send a pull request?
from corpuscrawler.
Hello Sascha / @brawer,
My Python skills are near zero so far, so I do my best to help with the knowledge and know-how I have:
- multilingual corpora literature review → sharing
- Wikimedia's API, ecosystems, resources → sharing
- documenting open-source projects positively to increase engagement
- clarifying roadmaps
- networking for stronger projects¹
The project also lacks meaningful documentation (#80). It would be inefficient to put a total Python newbie on Python copy-engineering. I will be more productive on other linguistic diversity issues, here or on @lingua-libre projects.
Given how central this CLDR/UNILEX/Unicode/Google CorpusCrawler repository is to linguistic diversity on the web, is there an email contact to whom I, Wikimedia France, and/or the Wikimedia Foundation could write to ask for more solid support for CorpusCrawler? Volunteering can do a lot but is too irregular. A dedicated, versatile, paid maintainer supervising ~20² of Google's open-source projects, unblocking the key bottlenecks through 4-hour coding sprints and community support, would quickly provide a positive ROI. 2020 opens access to skilled workers all around the world. There is surely a long list of open-source projects that would gain from such small yet skilled bottleneck-kicks to move forward.
I would be interested in coordinating such an email with Wikimedia France and the US Wikimedia Foundation to gather a handful of names on that email (if there is a reasonable chance, >5-10%, of achieving the intended goal of a skilled, paid maintainer here for 4 hrs/week within the next 2 years).
1: see text above
2: depending on project activity, could be fewer or more. The current project has about 1 issue per month.
Thanks for the chat, @brawer. Our online chat will help me better plan the next phases of Lingualibre and its collaboration with the crawler.