receipt_parser's Introduction


Receipt parser🧾

What is it?

receipt_parser is a Python library that helps recognize product line items from receipts. Tinkoff offers a good service for this task, but it does not cope with dirty data like that in the picture above. The original idea was to use neural networks, but along the way it became clear that labeling would take a lot of time and money, and that a model based on rules and dictionaries already gives good results.

Features

  • product recognition;
  • product category detection;
  • brand recognition;
  • conversion of transliterated brand names back to their original spelling (хугарден --> hoegaarden)🍺
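
The dictionary-based idea behind brand recognition can be illustrated with a minimal sketch. It is not the library's implementation: the `BRAND_ALIASES` dictionary and the `find_brand` helper are hypothetical names introduced only for this example.

```python
from typing import Optional

# Hypothetical alias dictionary; the library's real dictionaries are not shown here.
BRAND_ALIASES = {
    "хугарден": "hoegaarden",
    "хугар": "hoegaarden",
}


def find_brand(description: str) -> Optional[str]:
    """Return the brand for the first token found in the alias dictionary."""
    # Receipt lines are heavily abbreviated, so split on dots as well as whitespace.
    tokens = description.lower().replace(".", " ").split()
    for token in tokens:
        if token in BRAND_ALIASES:
            return BRAND_ALIASES[token]
    return None


print(find_brand("Нап.пив.ХУГАР.ГРЕЙПФ.н/ф 0.47л"))  # hoegaarden
```

An exact-match lookup like this only works when every abbreviation is listed in the dictionary, which is one reason extending the dictionaries matters for a rule-based approach.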

Where to get it

The source code is hosted on GitHub.

The library is published on the Python Package Index:

pip install receipt-parser

If the installation fails with the error:

Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-build-izdic4qt/youtokentome/

install Cython and try again:

pip install Cython
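
Whether Cython is already present in the active environment can be checked from Python before retrying the install. This is a minimal sketch using only the standard library; the helper name `is_installed` is illustrative and not part of receipt_parser.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


if not is_installed("Cython"):
    print("Cython is missing; run `pip install Cython` before installing receipt-parser.")
```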

Usage

Currently only the RuleBased model is available for recognition.

The input can be either a string:

from receipt_parser import RuleBased

product_description = 'Нап.пив.ХУГАР.ГРЕЙПФ.н/ф 0.47л'
rb = RuleBased()
rb.parse(product_description)

Output:

   name                             product_norm    brand_norm   cat_norm
0  Нап.пив.ХУГАР.ГРЕЙПФ.н/ф 0.47л   напиток, пиво   hoegaarden   Воды, соки, напитки

or a pd.DataFrame (the column holding the product line item must be named name):

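import pandas as pd
from receipt_parser import RuleBased

# The column with the product line items must be named 'name'
df = pd.DataFrame({'name': ['Нап.пив.ХУГАР.ГРЕЙПФ.н/ф 0.47л']})

rb = RuleBased()
rb.parse(df)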

The library also includes two helper classes:

  • Normalizer - for normalization;
  • Finder - for dictionary lookup.
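
To give a rough idea of what normalizing a receipt line can involve, here is a hypothetical sketch. It does not reproduce the library's Normalizer: the `normalize` function, the `QUANTITY` pattern, and the list of units are assumptions made for illustration.

```python
import re

# Quantity followed by a unit, e.g. "0.47л" or "500 г" (the unit list is illustrative).
QUANTITY = re.compile(r"\d+(?:[.,]\d+)?\s*(?:мл|кг|шт|л|г)(?!\w)")


def normalize(description: str) -> str:
    """Lowercase the text, drop quantities, and replace punctuation with spaces."""
    text = description.lower()
    text = QUANTITY.sub(" ", text)
    # Keep word characters and '/', which appears in abbreviations such as "н/ф".
    text = re.sub(r"[^\w/]+", " ", text)
    return " ".join(text.split())


print(normalize("Нап.пив.ХУГАР.ГРЕЙПФ.н/ф 0.47л"))  # нап пив хугар грейпф н/ф
```

The cleaned tokens could then be matched against dictionaries of products, brands, and categories.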

Future work

  • Add tests
  • Extend the dictionaries and the collected datasets
  • Deploy a service
  • Move to neural networks...

Support the project 🤗

I would be glad if you:

  • find bugs;
  • can optimize the code;
  • extend the dictionaries and datasets;
  • help with labeling.

receipt_parser's People

Contributors

a3agalyan, slgero

receipt_parser's Issues

Recognizing amounts

Are there plans to also add recognition of the amounts that correspond to the receipt line items?

I think this would be a useful feature.

Installation error

pip install receipt-parser
Collecting receipt-parser
Using cached receipt_parser-0.0.28-py3-none-any.whl (19 kB)
Collecting numpy>=1.18.3 (from receipt-parser)
Obtaining dependency information for numpy>=1.18.3 from https://files.pythonhosted.org/packages/93/fd/3f826c6d15d3bdcf65b8031e4835c52b7d9c45add25efa2314b53850e1a2/numpy-1.26.0-cp311-cp311-win_amd64.whl.metadata
Using cached numpy-1.26.0-cp311-cp311-win_amd64.whl.metadata (61 kB)
Collecting pandas>=1.0.3 (from receipt-parser)
Obtaining dependency information for pandas>=1.0.3 from https://files.pythonhosted.org/packages/2d/5e/9213ea10ac473e2437dc2cb17323ddc0999997e2713d6a0b683b10773994/pandas-2.1.1-cp311-cp311-win_amd64.whl.metadata
Using cached pandas-2.1.1-cp311-cp311-win_amd64.whl.metadata (18 kB)
Collecting pandarallel>=1.4.8 (from receipt-parser)
Using cached pandarallel-1.6.5.tar.gz (14 kB)
Installing build dependencies ... done
Getting requirements to build wheel ... done
Preparing metadata (pyproject.toml) ... done
Collecting pymystem3>=0.2.0 (from receipt-parser)
Using cached pymystem3-0.2.0-py3-none-any.whl (10 kB)
Requirement already satisfied: setuptools in c:\рабочая\venv\lib\site-packages (from receipt-parser) (65.5.0)
Collecting torch (from receipt-parser)
Obtaining dependency information for torch from https://files.pythonhosted.org/packages/74/07/edce54779f5c3fe8ab8390eafad3d7c8190fce68f922a254ea77f4a94a99/torch-2.1.0-cp311-cp311-win_amd64.whl.metadata
Using cached torch-2.1.0-cp311-cp311-win_amd64.whl.metadata (25 kB)
Collecting torchvision (from receipt-parser)
Obtaining dependency information for torchvision from https://files.pythonhosted.org/packages/20/ac/ab6f42af83349e679b03c9bb18354740c6b58b17dba329fb408730230584/torchvision-0.16.0-cp311-cp311-win_amd64.whl.metadata
Using cached torchvision-0.16.0-cp311-cp311-win_amd64.whl.metadata (6.6 kB)
Collecting wget>=3.2 (from receipt-parser)
Using cached wget-3.2.zip (10 kB)
Installing build dependencies ... done
Getting requirements to build wheel ... done
Preparing metadata (pyproject.toml) ... done
Collecting youtokentome>=1.0.6 (from receipt-parser)
Using cached youtokentome-1.0.6.tar.gz (86 kB)
Installing build dependencies ... done
Getting requirements to build wheel ... error
error: subprocess-exited-with-error

× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> [20 lines of output]
Traceback (most recent call last):
File "C:\venv\Lib\site-packages\pip_vendor\pyproject_hooks_in_process_in_process.py", line 353, in
main()
File "C:\venv\Lib\site-packages\pip_vendor\pyproject_hooks_in_process_in_process.py", line 335, in main
json_out['return_val'] = hook(**hook_input['kwargs'])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\venv\Lib\site-packages\pip_vendor\pyproject_hooks_in_process_in_process.py", line 118, in get_requires_for_build_wheel
return hook(config_settings)
^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\AppData\Local\Temp\pip-build-env-kroe7r43\overlay\Lib\site-packages\setuptools\build_meta.py", line 355, in get_requires_for_build_wheel
return self._get_build_requires(config_settings, requirements=['wheel'])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\AppData\Local\Temp\pip-build-env-kroe7r43\overlay\Lib\site-packages\setuptools\build_meta.py", line 325, in _get_build_requires
self.run_setup()
File "C:\Users\AppData\Local\Temp\pip-build-env-kroe7r43\overlay\Lib\site-packages\setuptools\build_meta.py", line 507, in run_setup
super(_BuildMetaLegacyBackend, self).run_setup(setup_script=setup_script)
File "C:\Users\AppData\Local\Temp\pip-build-env-kroe7r43\overlay\Lib\site-packages\setuptools\build_meta.py", line 341, in run_setup
exec(code, locals())
File "", line 5, in
ModuleNotFoundError: No module named 'Cython'
[end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error

× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.

mystem

Hi, thanks a lot for the great work. I played around with it a bit and found a couple of bugs, but I have not figured out the cause yet. For now I ran the test example, and the output does not match the example in the description:
[image]
