
duden's Introduction


duden is a CLI program and Python module that provides various information about a given German word. The data are parsed from the German dictionary duden.de.


Installation

pip3 install duden

Usage

CLI

$ duden Löffel

Löffel, der
===========
Word type: Substantiv, maskulin
Commonness: 2/5
Separation: Löf|fel
Meaning overview:
 0.  a. [metallenes] [Ess]gerät, an dessen unterem Stielende eine schalenartige Vertiefung sitzt und das zur Aufnahme von Suppe, Flüssigkeiten, zur Zubereitung von Speisen o. Ä. verwendet wird
     b. (Medizin) Kürette

 1. (Jägersprache) Ohr von Hase und Kaninchen

Synonyms:
Ohr; [Ge]hörorgan; (salopp) Horcher, Horchlappen, Lauscher; (Jägersprache) Loser, Teller

Full CLI syntax

$ duden --help
usage: duden [-h] [--title] [--name] [--article] [--part-of-speech] [--frequency] [--usage]
             [--word-separation] [--meaning-overview] [--synonyms] [--origin] [--grammar-overview]
             [--compounds [COMPOUNDS]] [-i] [--export] [--words-before] [--words-after] [-r RESULT] [--fuzzy]
             [--no-cache] [-V] [--phonetic] [--alternative-spellings]
             word

positional arguments:
  word

options:
  -h, --help            show this help message and exit
  --title               display word and article
  --name                display the word itself
  --article             display article
  --part-of-speech      display part of speech
  --frequency           display commonness (1 to 5)
  --usage               display context of use
  --word-separation     display proper separation (line separated)
  --meaning-overview    display meaning overview
  --synonyms            list synonyms (line separated)
  --origin              display origin
  --grammar-overview    display short grammar overview
  --compounds [COMPOUNDS]
                        list common compounds
  -i, --inflect         display inflections
  --export              export parsed word attributes in yaml format
  --words-before        list 5 words before this one
  --words-after         list 5 words after this one
  -r RESULT, --result RESULT
                        display n-th (starting from 1) result in case of multiple words matching the input
  --fuzzy               enable fuzzy word matching
  --no-cache            do not cache retrieved words
  -V, --version         print program version
  --phonetic            display pronunciation
  --alternative-spellings
                        display alternative spellings

Module usage

>>> import duden
>>> w = duden.get('Loeffel')
>>> w.name
'Löffel'
>>> w.word_separation
['Löf', 'fel']
>>> w.synonyms
'Ohr; [Ge]hörorgan; (salopp) Horcher, Horchlappen, Lauscher; (Jägersprache) Loser, Teller'

For more examples see usage documentation.

Development

Dependencies and packaging are managed by Poetry.

Install the virtual environment and enter it with

$ poetry install
$ poetry shell

Testing and code style

To execute data tests, run

$ pytest

To run the Python style autoformatters (isort, black), run

$ make autoformat

Localization

Apart from English, this package has partial translations into German, Spanish, and Esperanto.

To test duden in other languages, set the LANG environment variable before running duden like so:

LANG=de_DE.UTF-8 duden Kragen
LANG=es_ES.UTF-8 duden Kragen
LANG=eo_EO.UTF-8 duden Kragen
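
As a rough sketch of how a LANG-driven lookup can work with Python's standard gettext module: the helper name load_translator, the domain name "duden", and the way the locale directory is passed in are assumptions made for illustration, not taken from the package's source.

```python
import gettext
import os


def load_translator(localedir, lang=None):
    """Return a gettext-style function for the language named in LANG."""
    # LANG looks like "de_DE.UTF-8"; drop the encoding suffix.
    lang = lang or os.environ.get("LANG", "")
    code = lang.split(".")[0]
    languages = [code] if code else None
    # fallback=True makes the returned function echo the msgid unchanged
    # when no compiled catalog is found. The domain "duden" is an assumption.
    translation = gettext.translation(
        "duden", localedir=localedir, languages=languages, fallback=True
    )
    return translation.gettext


if __name__ == "__main__":
    _ = load_translator("duden/locale")
    print(_("Commonness:"))
```

With fallback enabled, a missing or uncompiled catalog degrades to the English strings instead of raising an error.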

The translations are located in the duden/locale/ directory as the *.po and duden.pot files. The duden.pot file defines all translatable strings in a series of text blocks formatted like this:

#: main.py:82
msgid "Commonness:"
msgstr ""

while the individual language files provide translations of the strings identified by msgid, like this:

#: main.py:82
msgid "Commonness:"
msgstr "Häufigkeit:"

Note that the commented lines like #: main.py:82 do not have any functional meaning, and can get out of sync.
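
To make the block structure concrete, here is a minimal sketch that reads msgid/msgstr pairs out of .po-formatted text. The function name parse_po_pairs is hypothetical, and the sketch handles only single-line entries without escape sequences; it is not how duden or gettext actually load translations (that is done from compiled files).

```python
def parse_po_pairs(text):
    """Collect single-line msgid/msgstr pairs from .po-formatted text."""
    pairs = {}
    msgid = None
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            # Comments such as "#: main.py:82" carry no functional meaning.
            continue
        if line.startswith("msgid "):
            msgid = line[len("msgid "):].strip().strip('"')
        elif line.startswith("msgstr ") and msgid is not None:
            pairs[msgid] = line[len("msgstr "):].strip().strip('"')
            msgid = None
    return pairs


if __name__ == "__main__":
    sample = '#: main.py:82\nmsgid "Commonness:"\nmsgstr "Häufigkeit:"\n'
    print(parse_po_pairs(sample))
```

An empty msgstr, as in duden.pot, simply maps the msgid to an empty string.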

Publishing

To build and publish the package to (test) PyPI, you can use one of these shortcut commands:

$ make pypi-publish-test
$ make pypi-publish

(These also take care of building the localization files before calling poetry publish.)

Poetry configuration for PyPI and Test PyPI credentials is well covered in this Stack Overflow answer.

Including localization data in the package

In order for the localization data to be included in the resulting python package, the *.po files must be compiled using the

$ make localization

command before building the package with poetry.

Supported versions of Python

  • Python 3.4+

duden's People

Contributors

anetschka, jorgesumle, mivhqw, mnaberez, mparienti, mundanevision20, pajowu, radomirbosak, scriptim, viewviewview

duden's Issues

Parse non-tabular grammar information

$ python3 -c "import duden; print(duden.get('Kragen').grammar_raw)"
[]

duden.de, however, has this text in the grammar section:

der Kragen; Genitiv: des Kragens, Plural: die Kragen, süddeutsch, österreichisch, schweizerisch: Krägen

The script shouldn't omit this. However, it is not clear in which form it should present the information in this rare case of a non-tabular format.

Add shell completion

Shell completion is useful: you don't have to remember or look up usage information. Ideally, you should be able to press Tab to autocomplete or see relevant suggestions.

Word "Qat" crashes the program

$ duden Qat
Qat, das
========
Traceback (most recent call last):
  File "/home/user/.local/bin/duden", line 11, in <module>
    sys.exit(main())
  File "/home/user/.local/lib/python3.6/site-packages/duden/main.py", line 610, in main
    display_word(word, args)
  File "/home/user/.local/lib/python3.6/site-packages/duden/main.py", line 554, in display_word
    word.describe()
  File "/home/user/.local/lib/python3.6/site-packages/duden/main.py", line 79, in describe
    print(_('Word type: ') + self.part_of_speech)
TypeError: must be str, not NoneType
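
The traceback shows a string being concatenated with None. A minimal sketch of one possible guard follows; the helper name format_word_type is hypothetical and this is not necessarily how the project fixed the issue.

```python
def format_word_type(part_of_speech, label="Word type: "):
    """Build the 'Word type' line, or return None when the value is missing."""
    if part_of_speech is None:
        # Some entries have no part of speech; skip the line instead of
        # concatenating str and None.
        return None
    return label + part_of_speech


if __name__ == "__main__":
    for value in ("Substantiv, maskulin", None):
        line = format_word_type(value)
        if line is not None:
            print(line)
```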

Bad output word "durchstellen"

Some words are run together in the meaning overview.

$ duden durchstellen
durchstellen
============
Word type: schwaches Verb
Commonness: 2/5
Separation: durch|stel|len
Meaning overview:
(ein Telefongespräch) vom Hauptapparat auf einen Nebenanschluss weiterleitenBeispielbitte [das Gespräch] in die Wohnung, zum Chef durchstellen!

The `--meaning-overview` option contains extra text

Currently, the program parses the text from the correct page paragraph, but includes a lot of unneeded text, such as examples and figure labels.

Example:

$ python -m duden.main 'Hase' --meaning-overview
 0.  a. wild lebendes Säugetier mit langen Ohren, einem dichten, weichen, bräunlichen Fell und langen Hinterbeinen   © jay - Fotolia.comBeispieleer ist furchtsam wie ein Haseder Hase macht Männchen, hoppelt, schlägt Hakeneinen Hasen hetzen, schießen, abziehen, bratenWendungen, Redensarten, Sprichwörterein alter Hase (umgangssprachlich: jemand, der sehr viel Erfahrung [in einer bestimmten Sache] hat)heuriger Hase (umgangssprachlich: Neuling: es macht ihm Spaß, die heurigen Hasen herumzukommandieren; der ältere Hase hat Erfahrung darin, dem Jäger zu entkommen, im Gegensatz zu einem erst einjährigen Hasen)falscher/Falscher Hase (Hackbraten)sehen, wissen, erkennen, begreifen, wie der Hase läuft (umgangssprachlich: erkennen, vorhersagen können, wie eine Sache weitergeht; nach der Vorstellung, dass ein erfahrener Jäger nach kurzer Zeit beobachtenden Abwartens erkennen kann, in welche Richtung ein Hase flieht, auch wenn er viele Haken schlägt)da liegt der Hase im Pfeffer (umgangssprachlich: das ist der entscheidende Punkt, die eigentliche Ursache; mit Bezug auf den fertig zubereiteten Hasenbraten in einer scharf gewürzten Soße, womit angedeutet wird, dass jemand aus einer bestimmten Lage nicht mehr herauskommt)
     b. männlicher Hase (1a)
     c. Hasenbraten, -gerichtBeispieles gibt heute Hase
     d. Kaninchen   © MEV Verlag, AugsburgGebrauchlandschaftlich

 1. Schrittmacher (3)GebrauchSportjargon

 2.  a. Mädchen, FrauGebrauchsaloppBeispieleein scharfer Hasekennst du die Hasen im Klub?
     b. Kosewort, besonders für Kinder

Add CLI switch for word grammar

Grammar

$ duden Ton --grammar
Singular   Nominativ | der Ton
Plural     Nominativ | die Töne
Singular   Genitiv   | des Tones, Tons
Plural     Genitiv   | der Töne
Singular   Dativ     | dem Ton
Plural     Dativ     | den Tönen
Singular   Akkusativ | den Ton
Plural     Akkusativ | die Töne
$ duden Ton --grammar genitiv
Singular | des Tones, Tons
Plural   | der Töne
$ duden Ton --grammar genitiv,plural
der Töne
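
One way the proposed filtering could behave is sketched below. The table literal copies the forms from the full --grammar listing above; the function name filter_grammar and the tag-set representation are hypothetical and not part of duden.

```python
# Each row pairs a set of lowercase grammar tags with the inflected form.
GRAMMAR_TON = [
    ({"singular", "nominativ"}, "der Ton"),
    ({"plural", "nominativ"}, "die Töne"),
    ({"singular", "genitiv"}, "des Tones, Tons"),
    ({"plural", "genitiv"}, "der Töne"),
    ({"singular", "dativ"}, "dem Ton"),
    ({"plural", "dativ"}, "den Tönen"),
    ({"singular", "akkusativ"}, "den Ton"),
    ({"plural", "akkusativ"}, "die Töne"),
]


def filter_grammar(table, query):
    """Return the forms whose tags include every comma-separated tag in query."""
    wanted = {tag.strip().lower() for tag in query.split(",") if tag.strip()}
    return [form for tags, form in table if wanted <= tags]


if __name__ == "__main__":
    print(filter_grammar(GRAMMAR_TON, "genitiv"))
    print(filter_grammar(GRAMMAR_TON, "genitiv,plural"))
```

Using a subset test means an empty query returns every form, which matches the behaviour of a bare --grammar.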

Write proper documentation

This means:

  • an informative README file
  • functions having useful docstrings / separate documentation page
  • documentation of module features

Use word search by default when using CLI

CLI

This example shows how the CLI is used:

$ duden Barmherzigkeit
Barmherzigkeit, die
===================
Word type: Substantiv, feminin
Usage: gehoben
Commonness: 2/5
Separation: Barm|her|zig|keit
Meaning overview:
barmherziges Wesen, Verhalten
Synonyms:
[Engels]güte, Milde, Nachsicht, Nachsichtigkeit; (gehoben) Herzensgüte, Mildtätigkeit, Seelengüte; (bildungssprachlich) Humanität, Indulgenz; (veraltend) Wohltätigkeit; (Religion) Gnade

Searching

Some words have multiple meanings:

$ duden Ton
Found 2 matching words
1) Ton_Klang_Schwingung_Aufnahme
2) Ton_Sediment

$ duden Ton --fuzzy
Found 20 matching words
1) Ton_Klang_Schwingung_Aufnahme
2) Ton_Sediment
3) O_Ton
4) Bild_Ton_Kamera
5) Tonpfeife
6) Tonschicht
7) Tonblende
8) Tonsaeule
9) tonkraeftig
10) Tongeschirr
11) Tonstoerung
12) Tondichtung
13) Tonfrequenz
14) Tonschiefer
15) Tongaer_aus_von_Tonga
16) Tongaer_Einwohner_Tonga
17) Tonkunst
18) Tonware
19) Tonnage
20) Tonlage

$ duden Ton --result 2
Ton, der
========
Word type: Substantiv, maskulin
Commonness: 1/5
Separation: Ton
Meaning overview:
besonders zur Herstellung von Töpferwaren verwendetes lockeres, feinkörniges Sediment von gelblicher bis grauer Farbe
Synonyms:
Lehm, Tonerde; (Geologie) Mergel

--help consistency

Some options have a full stop in their descriptions; others don't. For example, there is no period after the --synonyms description, but there is one after the --fuzzy description.

  --synonyms            Synonyme, jedes in einer eigenen Zeile, auflisten
  --origin              Wortherkunft anzeigen
  --compounds [COMPOUNDS]
                        Typische Verbindungen auflisten
  -g [GRAMMAR], --grammar [GRAMMAR]
                        Grammatik anzeigen
  -r RESULT, --result RESULT
                        Display n-th result in case of multple words matching
                        the input. Starts at 1.
  --fuzzy               Enable fuzzy word matching.

Word "in" is not found

When looking up the word "in" via the CLI, the word is not found (possibly because there are multiple entries for this word on the duden website?).
Is there a setting to change, or is this a general problem?

Changes in duden.de broke the parsing?

Hi,
when I search for any term (for example "Löffel"), I don't get any result.

In addition, many of the tests are failing for me:

$ make test
python --version
Python 3.7.4
python -m pytest tests/
========================================= test session starts ==========================================
platform linux -- Python 3.7.4, pytest-5.0.1, py-1.8.0, pluggy-0.12.0
rootdir: /home/user/Documents/workspace/duden
collected 84 items                                                                                     

tests/test_online_attributes.py ..F..F...F..F........FFF.FFFFFF.FFF....F..FFFFFFFFFF.FFFFFF.FFFF [ 76%]
FF.FFF..F..FFFFF.F.F

Is this a known issue due to changes in the duden.de page?

Does not work on python2

For example, Loeffel raises an error:

>>> import duden
>>> w = duden.get('Loeffel')
>>> w
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/site-packages/duden/main.py", line 60, in __repr__
    return '{} ({})'.format(self.title, self.part_of_speech)
  File "/usr/local/lib/python2.7/site-packages/duden/main.py", line 94, in title
    return self.soup.h1.get_text().replace('\xad', '')
UnicodeDecodeError: 'ascii' codec can't decode byte 0xad in position 0: ordinal not in range(128)

Add information from 'synonyms' section

For example for 'Petersilie', this would be something like:

>>> petersilie.synonyms
['(schweizerisch) Peterli', '(bayrisch, österreichisch umgangssprachlich) Petersil', '(südwestdeutsch und schweizerisch mundartlich) Peterle']

Unhandled IndexError for word 'neu'

Command:

$ python3 dudendown.py neu

Output:

neu
===
Adjektiv

Traceback (most recent call last):
  File "dudendown.py", line 154, in <module>
    writing_dict[parts[0]] = parts[1].strip()
IndexError: list index out of range
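
The failing line indexes parts[1] without checking that the split produced two pieces. A minimal sketch of a more defensive split is shown here; the function name, the ":" separator, and the sample lines are all assumptions for illustration, since the original dudendown.py input is not shown.

```python
def parse_key_value_lines(lines, separator=":"):
    """Turn 'key: value' lines into a dict, skipping lines without a separator."""
    result = {}
    for line in lines:
        key, found, value = line.partition(separator)
        if not found:
            # No separator in this line: skip it instead of raising IndexError.
            continue
        result[key.strip()] = value.strip()
    return result


if __name__ == "__main__":
    # Made-up sample input; the second line has no separator.
    print(parse_key_value_lines(["Worttrennung: neu", "Adjektiv"]))
```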

Internationalization

The program's interface is in English, while the output from Duden is in German. It would be nice to support other interface languages, such as Esperanto and German.

Uncaught exception "Name or service not known"

If there is no Internet connection, the uncaught exception below occurs. This should be replaced with a meaningful error message. This may not be the only error that is not handled.

Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/requests/packages/urllib3/connection.py", line 138, in _new_conn
    (self.host, self.port), self.timeout, **extra_kw)
  File "/usr/lib/python3.6/site-packages/requests/packages/urllib3/util/connection.py", line 75, in create_connection
    for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
  File "/usr/lib/python3.6/socket.py", line 745, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno -2] Name or service not known

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/requests/packages/urllib3/connectionpool.py", line 594, in urlopen
    chunked=chunked)
  File "/usr/lib/python3.6/site-packages/requests/packages/urllib3/connectionpool.py", line 361, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/usr/lib/python3.6/http/client.py", line 1239, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/usr/lib/python3.6/http/client.py", line 1285, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.6/http/client.py", line 1234, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.6/http/client.py", line 1026, in _send_output
    self.send(msg)
  File "/usr/lib/python3.6/http/client.py", line 964, in send
    self.connect()
  File "/usr/lib/python3.6/site-packages/requests/packages/urllib3/connection.py", line 163, in connect
    conn = self._new_conn()
  File "/usr/lib/python3.6/site-packages/requests/packages/urllib3/connection.py", line 147, in _new_conn
    self, "Failed to establish a new connection: %s" % e)
requests.packages.urllib3.exceptions.NewConnectionError: <requests.packages.urllib3.connection.HTTPConnection object at 0x7fe827a45eb8>: Failed to establish a new connection: [Errno -2] Name or service not known

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/requests/adapters.py", line 423, in send
    timeout=timeout
  File "/usr/lib/python3.6/site-packages/requests/packages/urllib3/connectionpool.py", line 643, in urlopen
    _stacktrace=sys.exc_info()[2])
  File "/usr/lib/python3.6/site-packages/requests/packages/urllib3/util/retry.py", line 363, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
requests.packages.urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='www.duden.de', port=80): Max retries exceeded with url: /rechtschreibung/Test (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x7fe827a45eb8>: Failed to establish a new connection: [Errno -2] Name or service not known',))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/bin/duden", line 11, in <module>
    load_entry_point('duden==0.10.0', 'console_scripts', 'duden')()
  File "/usr/lib/python3.6/site-packages/duden/main.py", line 475, in main
    word = get(args.word)
  File "/usr/lib/python3.6/site-packages/duden/main.py", line 418, in get
    response = requests.get(url)
  File "/usr/lib/python3.6/site-packages/requests/api.py", line 70, in get
    return request('get', url, params=params, **kwargs)
  File "/usr/lib/python3.6/site-packages/requests/api.py", line 56, in request
    return session.request(method=method, url=url, **kwargs)
  File "/usr/lib/python3.6/site-packages/requests/sessions.py", line 488, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/lib/python3.6/site-packages/requests/sessions.py", line 609, in send
    r = adapter.send(request, **kwargs)
  File "/usr/lib/python3.6/site-packages/requests/adapters.py", line 487, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='www.duden.de', port=80): Max retries exceeded with url: /rechtschreibung/Test (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x7fe827a45eb8>: Failed to establish a new connection: [Errno -2] Name or service not known',))
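
A minimal sketch of how such a failure could be turned into a short message follows. The wrapper name get_or_report and the message text are hypothetical; the fetch argument stands in for whatever function performs the HTTP request (duden.get in the traceback above).

```python
import sys


def get_or_report(fetch, word):
    """Call fetch(word); on a network failure print a message and return None."""
    try:
        return fetch(word)
    except OSError as error:
        # requests.exceptions.ConnectionError derives from OSError (through
        # RequestException), so this branch covers the failure shown above.
        print(
            "Could not reach the dictionary server: {}".format(error),
            file=sys.stderr,
        )
        return None


if __name__ == "__main__":
    def offline_fetch(word):
        # Stand-in that simulates a missing Internet connection.
        raise ConnectionError("Name or service not known")

    print(get_or_report(offline_fetch, "Test"))
```

Catching OSError is deliberately broad; a real fix might catch the specific requests exception and exit with a non-zero status.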

The word 'Feiertag' throws an error

To reproduce the bug:

$ ./duden.py Feiertag
Feiertag, der
=============
Word type: Substantiv, maskulin
Commonness: 3/5
Separation: Fei|er|tag
Traceback (most recent call last):
  File "./duden.py", line 202, in <module>
    main()
  File "./duden.py", line 198, in main
    word.describe()
  File "./duden.py", line 65, in describe
    if self.meaning_overview:
  File "./duden.py", line 176, in meaning_overview
    if entry.section and entry.section.h3.text == 'Beispiele':
AttributeError: 'NoneType' object has no attribute 'section'

version: v0.4.0

Alternate spelling - Word not found

Searching for an alternate spelling gives "Word not found".

$ duden Differenzial
Differenzial, Differential, das
===============================
Word type: Substantiv, Neutrum
Commonness: 1/5
Separation: Dif|fe|ren|zi|al, Dif|fe|ren|ti|al
Meaning overview:
 0. (Mathematik) Zuwachs einer Funktion bei einer [kleinen] Änderung ihres Arguments (2)

 1. Kurzform für: Differenzialgetriebe
$ duden Differential
Word 'Differential' not found

Non-existent words cause `duden` to fail

$ duden d
Traceback (most recent call last):
  File "/home/user/.local/bin/duden", line 9, in <module>
    load_entry_point('duden==0.1.0', 'console_scripts', 'duden')()
  File "/home/user/.local/lib/python3.5/site-packages/duden/main.py", line 407, in main
    word.describe()
AttributeError: 'NoneType' object has no attribute 'describe'
