alkamid / wiktionary
AlkamidBot's scripts
Home Page: http://pl.wiktionary.org
License: MIT License
Right now porzucone.html is constantly overwritten, so it is blank from the moment the script starts until it finishes. A better idea would be to write to porzucone.html.1 and then move it to porzucone.html when the script is finished, as suggested by valhallasw on IRC:
valhallasw`cloud alkamid: the typical solution for these kinds of issues is to write to porzucone.html.1, then move porzucone.html.1 over porzucone.html
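A minimal sketch of the write-then-move approach described above. The function name `write_then_move` is hypothetical and not taken from the bot's scripts; it assumes the output is text and that the `.1` file lives in the same directory as the final file, so the move is a simple rename.

```python
import os


def write_then_move(path, text):
    """Write text to path + '.1', then move that file over path."""
    tmp_path = path + ".1"  # e.g. porzucone.html.1
    with open(tmp_path, "w", encoding="utf-8") as tmp:
        tmp.write(text)
    # os.replace overwrites the destination in one step, so readers see
    # either the previous complete file or the new one, never a blank page.
    os.replace(tmp_path, path)
```

With this pattern the script can take as long as it needs to build the page; porzucone.html only changes at the very end.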
An example that should be accepted:
[[ważny|Ważnych]] [[argument]]ów [[na]] [[to]], [[że]] [[Gomułka]] [[powinien]] [[odejść]], [[mieć|miała]] [[dostarczyć]] [[radziecki]]emu [[przywódca|przywódcy]] [[rozmowa]] [[ambasador]]a Aristowa [[z]] [[szef]]em [[polski]]ej [[partia|partii]]. [[dojść|Doszło]] [[do]] [[ona|niej]] [[we]] [[wtorek]] [[15]] [[grudzień|grudnia]] [[po]] [[południe|południu]] [[w]] [[gmach]]u [[KC]]. [[Piotr]] [[Kostikow]] [[relacjonować|relacjonuje]], [[że]] [[Gomułka]] [[zwymyślać|zwymyślał]] Aristowa: - [[co|Co]] [[wy]] [[tu]] [[ja|mnie]] [[egzaminować|egzaminujecie]] - [[krzyknąć|krzyknął]] [[usłyszeć|usłyszawszy]] [[pytanie|pytania]] [[ambasador]]a. - [[mieć|Macie]] [[swój|swoje]] [[informacja|informacje]] [[i]] [[wiedzieć|wiecie]], [[jaki]] [[charakter]] [[mieć|mają]] [[demonstracja|demonstracje]] [[w]] [[Gdańsk]]u.
ERROR: Traceback (most recent call last):
File "/data/project/alkamidbot/scripts/venv/lib/python3.4/site-packages/requests/packages/urllib3/connectionpool.py", line 372, in _make_request
httplib_response = conn.getresponse(buffering=True)
TypeError: getresponse() got an unexpected keyword argument 'buffering'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/data/project/alkamidbot/scripts/venv/lib/python3.4/site-packages/requests/packages/urllib3/connectionpool.py", line 374, in _make_request
httplib_response = conn.getresponse()
File "/usr/lib/python3.4/http/client.py", line 1147, in getresponse
response.begin()
File "/usr/lib/python3.4/http/client.py", line 351, in begin
version, status, reason = self._read_status()
File "/usr/lib/python3.4/http/client.py", line 313, in _read_status
line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
File "/usr/lib/python3.4/socket.py", line 371, in readinto
return self._sock.recv_into(b)
File "/usr/lib/python3.4/ssl.py", line 746, in recv_into
return self.read(nbytes, buffer)
File "/usr/lib/python3.4/ssl.py", line 618, in read
v = self._sslobj.read(len, buffer)
socket.timeout: The read operation timed out
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/data/project/alkamidbot/scripts/venv/lib/python3.4/site-packages/requests/adapters.py", line 370, in send
timeout=timeout
File "/data/project/alkamidbot/scripts/venv/lib/python3.4/site-packages/requests/packages/urllib3/connectionpool.py", line 597, in urlopen
_stacktrace=sys.exc_info()[2])
File "/data/project/alkamidbot/scripts/venv/lib/python3.4/site-packages/requests/packages/urllib3/util/retry.py", line 245, in increment
raise six.reraise(type(error), error, _stacktrace)
File "/data/project/alkamidbot/scripts/venv/lib/python3.4/site-packages/requests/packages/urllib3/packages/six.py", line 310, in reraise
raise value
File "/data/project/alkamidbot/scripts/venv/lib/python3.4/site-packages/requests/packages/urllib3/connectionpool.py", line 544, in urlopen
body=body, headers=headers)
File "/data/project/alkamidbot/scripts/venv/lib/python3.4/site-packages/requests/packages/urllib3/connectionpool.py", line 376, in _make_request
self._raise_timeout(err=e, url=url, timeout_value=read_timeout)
File "/data/project/alkamidbot/scripts/venv/lib/python3.4/site-packages/requests/packages/urllib3/connectionpool.py", line 304, in _raise_timeout
raise ReadTimeoutError(self, url, "Read timed out. (read timeout=%s)" % timeout_value)
requests.packages.urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='pl.wiktionary.org', port=443): Read timed out. (read timeout=30)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/shared/pywikipedia/core/pywikibot/data/api.py", line 1954, in submit
body=body, headers=headers)
File "/shared/pywikipedia/core/pywikibot/tools/__init__.py", line 1417, in wrapper
return obj(*__args, **__kw)
File "/shared/pywikipedia/core/pywikibot/comms/http.py", line 322, in request
r = fetch(baseuri, method, body, headers, **kwargs)
File "/shared/pywikipedia/core/pywikibot/comms/http.py", line 477, in fetch
error_handling_callback(request)
File "/shared/pywikipedia/core/pywikibot/comms/http.py", line 395, in error_handling_callback
raise request.data
File "/shared/pywikipedia/core/pywikibot/comms/http.py", line 374, in _http_process
verify=not ignore_validation)
File "/data/project/alkamidbot/scripts/venv/lib/python3.4/site-packages/requests/sessions.py", line 465, in request
resp = self.send(prep, **send_kwargs)
File "/data/project/alkamidbot/scripts/venv/lib/python3.4/site-packages/requests/sessions.py", line 573, in send
r = adapter.send(request, **kwargs)
File "/data/project/alkamidbot/scripts/venv/lib/python3.4/site-packages/requests/adapters.py", line 433, in send
raise ReadTimeout(e, request=request)
requests.exceptions.ReadTimeout: HTTPSConnectionPool(host='pl.wiktionary.org', port=443): Read timed out. (read timeout=30)
WARNING: Waiting 5 seconds before retrying.
WARNING: API error editconflict: Edit conflict detected
Traceback (most recent call last):
File "/shared/pywikipedia/core/pywikibot/site.py", line 4978, in editpage
result = req.submit()
File "/shared/pywikipedia/core/pywikibot/data/api.py", line 2189, in submit
raise APIError(**result['error'])
pywikibot.data.api.APIError: editconflict: Edit conflict detected [help:See https://pl.wiktionary.org/w/api.php for API usage]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/data/project/alkamidbot/scripts/wymowa.py", line 232, in <module>
main()
File "/data/project/alkamidbot/scripts/wymowa.py", line 189, in main
output_main.put(final, comment = 'Aktualizacja listy', botflag=False)
File "/shared/pywikipedia/core/pywikibot/tools/__init__.py", line 1417, in wrapper
return obj(*__args, **__kw)
File "/shared/pywikipedia/core/pywikibot/page.py", line 1291, in put
**kwargs)
File "/shared/pywikipedia/core/pywikibot/tools/__init__.py", line 1417, in wrapper
return obj(*__args, **__kw)
File "/shared/pywikipedia/core/pywikibot/page.py", line 1208, in save
cc=apply_cosmetic_changes, quiet=quiet, **kwargs)
File "/shared/pywikipedia/core/pywikibot/page.py", line 1233, in _save
raise err
File "/shared/pywikipedia/core/pywikibot/page.py", line 1219, in _save
watch=watch, bot=botflag, **kwargs)
File "/shared/pywikipedia/core/pywikibot/site.py", line 1329, in callee
return fn(self, *args, **kwargs)
File "/shared/pywikipedia/core/pywikibot/site.py", line 4998, in editpage
raise self._ep_errors[err.code](page)
pywikibot.exceptions.EditConflict: Page [[pl:Wikipedysta:AlkamidBot/wymowa]] could not be saved due to an edit conflict
CRITICAL: Closing network session.
In [27]: morfeusz.analyse('gasdgfsd')
Out[27]: [[('gasdgfsd', 'gasdgfsd', 'ign')]]
==> wymowa.py.e607950 <==
r = fetch(baseuri, method, body, headers, **kwargs)
File "/shared/pywikipedia/core/pywikibot/comms/http.py", line 354, in fetch
error_handling_callback(request)
File "/shared/pywikipedia/core/pywikibot/comms/http.py", line 271, in error_handling_callback
raise request.data
ConnectionError: HTTPSConnectionPool(host='commons.wikimedia.org', port=443): Max retries exceeded with url: /w/api.php?titles=File%3APl-genus.OGG&continue=&format=json&prop=imageinfo&iilimit=500&meta=userinfo&indexpageids=&action=query&maxlag=5&iiprop=timestamp%7Cuser%7Ccomment%7Curl%7Csize%7Csha1%7Cmime%7Cmetadata%7Carchivename&uiprop=blockinfo%7Chasmsg (Caused by <class 'httplib.BadStatusLine'>: '')
WARNING: Waiting 5 seconds before retrying.
(but then continues and saves the page so might not be critical)
https://github.com/alkamid/wiktionary/blob/nkjp_examples_lookup/klasa.py#L434
This currently works for examples only, but it should work for any subsection (odmiana / holonimy / etymologia etc.).
Also, 'type' is obviously a bad name for a variable, since it shadows the Python built-in.
Right now Page.getReferences() is used to count references to a page, and this method operates online. This is overkill, and it takes forever to find orphaned pages.
Options:
pywikibot/lonelypages.py
pagelinks dump, where all links are listed
Od [[początek|początku]] [[istnienie|istnienia]] [[parafia|parafii]] [[praca|pracę]] [[duszpasterski|duszpasterską]] [[inspirować|inspirował]] [[i]] [[kierować|kierował]] [[on|nią]] [[długoletni]] [[proboszcz]] [[o]].
Sometimes it is hard to notice that an unlinked word has been left behind. Maybe it would be worth modifying the script so that, if not all words are linked, the green approve button cannot be enabled, possibly with an appropriate message? tscaodp 11:44, 27 Feb 2016 (CET)
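The check requested here could look roughly like the sketch below. It is written in Python for illustration only (the verification interface itself may well be JavaScript), and `unlinked_words` is a hypothetical helper, not an existing function in the repository. It assumes that letters glued directly after a link, as in `[[argument]]ów`, count as part of the linked word.

```python
import re

# A wikilink such as [[ważny|Ważnych]] or [[argument]]ów, including any
# letters attached directly after the closing brackets.
LINK = re.compile(r"\[\[[^\[\]]*\]\]\w*")
# A run of letters, optionally hyphenated (e.g. kurzo-wołowego).
WORD = re.compile(r"[^\W\d_]+(?:-[^\W\d_]+)*")


def unlinked_words(wikitext):
    """Return the words that remain once all wikilinks are removed."""
    remainder = LINK.sub(" ", wikitext)
    return WORD.findall(remainder)


print(unlinked_words("Od [[początek|początku]] [[istnienie|istnienia]]"))
# → ['Od']
```

An interface could keep the approve button disabled while this list is non-empty and show the leftover words in the message.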
useful for edit conflicts.
The match doesn't have to be perfect (right now I'm checking for perfect matches), because editors might change "-" to "—" or specific links.
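One way to relax the perfect-match requirement is a similarity ratio from the standard library's `difflib`. This is only a sketch: `roughly_equal` is a hypothetical name, and the 0.9 threshold is an assumption that would need tuning against real edits.

```python
import difflib


def roughly_equal(original, edited, threshold=0.9):
    """Return True if the two strings are similar enough to be treated as a match."""
    # autojunk=False keeps the ratio stable for long sentences (200+ characters).
    matcher = difflib.SequenceMatcher(None, original, edited, autojunk=False)
    return matcher.ratio() >= threshold
```

A sentence in which an editor only replaced "-" with "—" differs in a single character, so its ratio stays close to 1 and it still counts as a match.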
useful e.g. for logging (https://github.com/alkamid/wiktionary/blob/nkjp_examples_lookup/orphaned_examples.py#L308)
Ideally we want to add examples to all meanings that miss them, not only to pages with no examples whatsoever.
The bot is already making a list of orphans: http://tools.wmflabs.org/alkamidbot/porzucone.html
It could also make a list of words without examples and then search NKJP (or somewhere else?) for sentences that include both the orphan and the word without examples.
Then make an interface for users to check these examples (unfortunately words have different meanings and the bot cannot really tell the difference between them*). It could be an external site where the user just sees the meaning and the example, and can "tick" if the example is OK.
http://nkjp.uni.lodz.pl/WordsOfDay?date_key=2015-09-01#kw - facilitate adding them?
Traceback (most recent call last):
File "/data/project/alkamidbot/scripts/wymowa.py", line 232, in <module>
main()
File "/data/project/alkamidbot/scripts/wymowa.py", line 226, in main
f.write('<br />' + item + '\n')
UnicodeEncodeError: 'ascii' codec can't encode character '\u0107' in position 15: ordinal not in range(128)
CRITICAL: Closing network session.
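The traceback suggests the output file was opened without an explicit encoding, so Python fell back to the locale default, which was ASCII in this environment. A minimal sketch of the likely fix follows; `write_list` and its parameters are hypothetical names, and only the `f.write(...)` line is taken from the traceback.

```python
def write_list(path, items):
    """Write one '<br />' line per item, always encoded as UTF-8."""
    # Passing encoding explicitly avoids depending on the job's locale.
    with open(path, "w", encoding="utf-8") as f:
        for item in items:
            f.write("<br />" + item + "\n")
```

The same `encoding="utf-8"` argument applies when opening files for reading.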
It would be useful to see what languages or what types of pages are the most popular.
Wikisłownik:Dodawanie przykładów/dane/002 "cichaczem"
Wikisłownik:Dodawanie przykładów/dane/001 "pozostać"
In [5]: sjp.phrases_wikilink(sjp.wikilink('a także stało się to dziś w Stolicy Apostolskiej'))
Out[5]: '[[a także]] [[stać|stało]] [[się]] to [[dziś]] [[w]] [[stolica|Stolicy]] [[apostolski|Apostolskiej]]'
By looking at logs for "channel", "domain" and titles (compare those for good_examples and bad_examples), find out if there are any really bad sources and exclude them.
Traceback (most recent call last):
File "/data/project/alkamidbot/scripts/visits.py", line 113, in <module>
main()
File "/data/project/alkamidbot/scripts/visits.py", line 65, in main
for line in inp:
UnboundLocalError: local variable 'inp' referenced before assignment
CRITICAL: Closing network session.
implement a function similar to rebindFormActions: (https://pl.wiktionary.org/wiki/MediaWiki:Gadget-edit-form-ui.js)
<PeterBowman> you can convert just once, before saving the page
<PeterBowman> $( '#editform' ).on( 'submit', handler )
<PeterBowman> where handler is a function that converts the JSON to a string and pastes it into the edit box
<PeterBowman> it would also work for the "show changes" preview
<PeterBowman> a bit further up I used mw.confirmCloseWindow(); that is in case the editor wants to close the window without saving the edit
<PeterBowman> it then displays a message asking for confirmation
In [2]: orphaned_examples.wikified_proportion('[[tłum|Tłumy]] [[gromadzić|gromadziły]] [[się]] [[przy]] [[ponad]] [[36-metrowy|36-metrowej]] [[daglezjowy|daglezjowej]] [[dłużyca|dłużycy]], [[z]] [[który|której]] [[powstać|powstała]] [[długi|najdłuższa]] [[deska]] [[świat]]a.')
Out[2]: 0.8666666666666667
[[mariacki|Mariacki]] [[hejnał]] [[kojarzyć|kojarzył]] [[się]] [[z]] niedzielno-obiadowymi woniami, [[przed]]e wszystkim [[z]] [[zapach]]em [[rosół|rosołu]] [[bulgotać|bulgoczącego]] [[na]] kuchennej [[blacha|blasze]], kurzo-wołowego, [[pietruszkowy|pietruszkowego]], [[selerowy|selerowego]]...
==> wymowa.py.e1498670 <==
Traceback (most recent call last):
File "/data/project/alkamidbot/scripts/wymowa.py", line 233, in <module>
main()
File "/data/project/alkamidbot/scripts/wymowa.py", line 28, in main
for line in f:
File "/data/project/alkamidbot/scripts/venv/lib/python3.4/encodings/ascii.py", line 26, in decode
return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc5 in position 6: ordinal not in range(128)
CRITICAL: Closing network session.
API doesn't like UTF queries.
use logs to update pages_without_example and orphans (which are generated from dumps)
File "/shared/pywikipedia/core/pywikibot/comms/http.py", line 238, in request
r = fetch(baseuri, method, body, headers, **kwargs)
File "/shared/pywikipedia/core/pywikibot/comms/http.py", line 354, in fetch
error_handling_callback(request)
File "/shared/pywikipedia/core/pywikibot/comms/http.py", line 271, in error_handling_callback
raise request.data
ConnectionError: HTTPSConnectionPool(host='pl.wiktionary.org', port=443): Max retries exceeded with url: /w/api.php (Caused by <class 'httplib.BadStatusLine'>: '')
WARNING: Waiting 5 seconds before retrying.
Sleeping for 5.0 seconds, 2015-07-12 05:09:09
e.g.:
Wikisation is not ideal right now (and will never be). There surely are some words that are never wikified as their alternative base forms, e.g. "lub" always comes from "lub", not from the imperative of "lubić".
A function should look at all verified "good_examples" and compare them with what the bot has put on pages.
[[przed|Przede]] wszystkim [[należeć|należy]] [[przygotować]] [[odpowiedni]]e [[zabezpieczenie|zabezpieczenia]] [[przeciwpożarowy|przeciwpożarowe]] - [[czyli]] mieć [[pod]] [[ręka|ręką]] [[gaśnica|gaśnice]], koce [[przeciwogniowy|przeciwogniowe]].
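As a starting point for such a function, the sketch below tallies which base form each surface form was linked to across a set of verified examples. It is an illustration under assumptions: `base_form_counts` is a hypothetical name, the examples are assumed to be plain wikitext strings, and comparing against what the bot originally put on pages is left out.

```python
import re
from collections import Counter, defaultdict

# [[base|form]] or [[base]], plus any suffix glued after the brackets.
LINK = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]]+))?\]\](\w*)")


def base_form_counts(examples):
    """Map each lowercased surface form to a Counter of the base forms it links to."""
    counts = defaultdict(Counter)
    for text in examples:
        for base, label, suffix in LINK.findall(text):
            # The visible word is the label if present, else the base, plus suffix.
            surface = ((label or base) + suffix).lower()
            counts[surface][base] += 1
    return counts
```

A surface form such as "lub" that maps to a single base form across many verified examples would be a candidate for always being wikified that way.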