
tmx_clean_duplicates's Introduction

TMX duplicates cleaner

Short description

  1. First run "tmx_cleanup.py" to find all duplicates in a given TMX file. Review the semi duplicates and mark the ones you want to delete.
  2. Then run "delete_segments_by_tuids.py" to delete the unneeded segments by ID from a given TMX file.

Description

tmx_cleanup.py

The program asks you which TMX file you want to edit. The file has to be in the input folder. The program deletes all duplicates and creates three files: a TXT file listing the duplicates found, a new TMX file without duplicates, and a CSV file (delimiter: TAB) with all semi duplicates. The first occurrence of each duplicate is kept. All generated files are saved in the output folder.
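The "keep the first occurrence" rule can be sketched as follows. This is a minimal illustration, not the code in tmx_cleanup.py: it assumes a duplicate is a translation unit whose segment texts all match an earlier unit, and the function name, the sample data and the tuid values are made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; a real TMX 1.4 header carries more attributes.
SAMPLE = """<tmx version="1.4"><header/><body>
<tu tuid="a1"><tuv xml:lang="en_US"><seg>You are welcome.</seg></tuv><tuv xml:lang="de_DE"><seg>Gern geschehen.</seg></tuv></tu>
<tu tuid="a2"><tuv xml:lang="en_US"><seg>You are welcome.</seg></tuv><tuv xml:lang="de_DE"><seg>Gern geschehen.</seg></tuv></tu>
<tu tuid="a3"><tuv xml:lang="en_US"><seg>Thank you.</seg></tuv><tuv xml:lang="de_DE"><seg>Danke.</seg></tuv></tu>
</body></tmx>"""


def split_duplicates(tmx_text):
    """Return (root, duplicates): the tree without duplicates and the removed units."""
    root = ET.fromstring(tmx_text)
    body = root.find("body")
    seen = set()
    duplicates = []
    # Iterate over a copy of the list because units are removed while looping.
    for tu in list(body.findall("tu")):
        # Assumption: two units are duplicates when all their segment texts match.
        key = tuple("".join(seg.itertext()).strip() for seg in tu.iter("seg"))
        if key in seen:
            body.remove(tu)        # later occurrence: drop it
            duplicates.append(tu)
        else:
            seen.add(key)          # first occurrence: keep it
    return root, duplicates


cleaned_root, removed_units = split_duplicates(SAMPLE)
remaining_ids = [tu.get("tuid") for tu in cleaned_root.find("body").findall("tu")]
removed_ids = [tu.get("tuid") for tu in removed_units]
```

With the sample above, the unit "a2" is reported as a duplicate while "a1" and "a3" stay in the tree.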

Input TMX Format

  • Encoding: UTF-8
  • XML Version: 1.0
  • TMX Version: 1.4

delete_segments_by_tuids.py

The program asks you which TMX file you want to edit, then asks for a TXT file that contains all the IDs to delete. The TXT file has to be tab-separated. Both files have to be in the input folder. The program deletes every segment whose ID is found in the imported TXT file and creates a new TMX file that matches the input TMX file except for the deleted segments. If a row or its ID entry in the imported TXT file is empty, the program skips it.
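A rough sketch of the deletion step is shown below. It assumes the ID is stored in the tuid attribute of each tu element, as the script name suggests; the function name and the sample data are hypothetical and not taken from delete_segments_by_tuids.py.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample with made-up tuid values.
TMX_TEXT = """<tmx version="1.4"><header/><body>
<tu tuid="a1"><tuv xml:lang="en_US"><seg>You are welcome.</seg></tuv></tu>
<tu tuid="a2"><tuv xml:lang="en_US"><seg>My pleasure.</seg></tuv></tu>
<tu tuid="a3"><tuv xml:lang="en_US"><seg>Thank you.</seg></tuv></tu>
</body></tmx>"""


def delete_by_tuids(tmx_text, ids_to_delete):
    """Remove every <tu> whose tuid is in ids_to_delete; return (root, count)."""
    root = ET.fromstring(tmx_text)
    body = root.find("body")
    removed = 0
    # Iterate over a copy of the list because units are removed while looping.
    for tu in list(body.findall("tu")):
        if tu.get("tuid") in ids_to_delete:
            body.remove(tu)
            removed += 1
    return root, removed


new_root, removed_count = delete_by_tuids(TMX_TEXT, {"a2", "not-in-file"})
kept_ids = [tu.get("tuid") for tu in new_root.find("body").findall("tu")]
```

IDs that do not occur in the TMX file are simply ignored in this sketch.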

Input CSV Format

  • Encoding: UTF-8
  • Delimiter: TAB (\t)
  • Extension: .txt
  • Header (columns separated by TAB): ID, Lang_1 (e.g. en_US), Lang_2 (e.g. de_DE)
  • Example row (columns separated by TAB): 9LASxy3fxC34qd91O80Qksps4, You are welcome., Gern geschehen.
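Reading such a file can be sketched with Python's csv module. This is only an illustration under the format listed above (header row first, ID in the first column); the function name is hypothetical, and an in-memory file stands in for a TXT file in the input folder.

```python
import csv
import io


def read_ids(tsv_file):
    """Collect the IDs from the first column of a tab-delimited file."""
    reader = csv.reader(tsv_file, delimiter="\t")
    next(reader, None)  # skip the header row
    ids = set()
    for row in reader:
        # Skip empty rows and rows whose ID entry is empty.
        if not row or not row[0].strip():
            continue
        ids.add(row[0].strip())
    return ids


# In-memory stand-in for a file opened with open(path, encoding="utf-8", newline="").
sample = io.StringIO(
    "ID\ten_US\tde_DE\n"
    "9LASxy3fxC34qd91O80Qksps4\tYou are welcome.\tGern geschehen.\n"
    "\n"
    "\tNo ID here.\tKeine ID.\n"
)
ids = read_ids(sample)
```

The empty row and the row without an ID are skipped, so only the one ID from the example row is collected.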

Folder structure

  • input (the programs look in this folder for the input files to select)
  • lib (helper files for the programs, do not delete)
  • output (all the generated files can be found here)

Python version used while coding

  • 3.9.6

Note

  • A semi duplicate is the same translation for different input strings.
  • When importing semi_duplicates.csv into Excel, select UTF-8 encoding.
  • To create the input file with the IDs, click "Save as" in Excel and select the file type "Text (tab delimited) (*.txt)".
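The semi duplicate definition in the first note can be sketched as a grouping by translation. This is an assumption about how such a check might look, not the logic used in tmx_cleanup.py; the function name and the sample pairs are made up.

```python
from collections import defaultdict


def find_semi_duplicates(pairs):
    """pairs: iterable of (tuid, source, target) tuples.

    Return {target: [(tuid, source), ...]} for every translation that is
    shared by more than one distinct source string.
    """
    by_target = defaultdict(list)
    for tuid, source, target in pairs:
        by_target[target].append((tuid, source))
    return {
        target: entries
        for target, entries in by_target.items()
        if len({source for _, source in entries}) > 1
    }


# Hypothetical data: two different source strings share one translation.
pairs = [
    ("a1", "You are welcome.", "Gern geschehen."),
    ("a2", "My pleasure.", "Gern geschehen."),
    ("a3", "Thank you.", "Danke."),
]
semi = find_semi_duplicates(pairs)
```

Here "Gern geschehen." is flagged because it translates two different input strings, while "Danke." is not.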

tmx_clean_duplicates's People

Contributors

codecaprese
