marathi-shabd's Introduction

For English, scroll down

"मुक्त स्रोत" हे मराठी भाषेतील एक माहिती व साधनांचा संग्रह आहे.

प्रकल्प

इतर कल्पना ज्या सध्या माझ्या डोक्यात आहेत.

  • सॉफ्टवेअर/माहिती भाषांतर - सॉफ्टवेअर व माहिती इंग्रजीतून मराठीत भाषांतर करणाऱ्या प्रकल्पांची यादी व मार्गदर्शन
  • मराठी भौगोलिक नकाशे (डिजिटल आणि छापण्यायोग्य)
  • मराठी खगोलशास्त्रीय नकाशे (डिजिटल आणि छापण्यायोग्य)
  • शालेय मुलांसाठी छापण्यायोग्य (शक्य असल्यास बहुभाषिक) वर्णमाला (चित्रांसह वर्णमाला सारणी)
  • एक छापण्यायोग्य बहुभाषिक "गर्भवती आई आणि मुलांच्या काळजीसाठी पुस्तिका". (बहुभाषिक म्हणजे आपण इच्छित असलेल्या कोणत्याही भारतीय भाषांमध्ये पुस्तक निवडू आणि छापू शकता.)


Mukta-strot is a collection of information and tools in Marathi.

Projects

Other ideas that are right now in my head.

  • Software/information localisation: a list of software and data projects to localise from English to Marathi, with some guidance.
  • Geographical maps in Marathi (digital and printable)
  • Star maps (astronomical maps) in Marathi (digital and printable)
  • A printable वर्णमाला (alphabet chart with pictures) for school kids, multilingual if possible
  • A printable multilingual "handbook for pregnant mother and child care". (Multilingual means you can choose and print the book in any one or several Indic languages you want.)

Goal

  1. To improve and promote the use of unadulterated Marathi among its existing speakers, especially in the urban population.
  2. To develop information resources and tools which can be used in Marathi.

Note -

  • We are looking for contributors in various areas, especially with writing in Marathi.
  • We are open to other Indian languages also.

Please contact us if you would like to help with our cause.

Contact

About

Mukta-strot is the Marathi translation of the English term "open-source".

All the information and tools you find here are free (as in freedom) and their source is open.

Open source means you can see how something was made, and free means you can modify it as per your needs and distribute it to others.

Inspiration

marathi-shabd's People

Contributors

masonwoodford, sanketgarade, the-kaustubh, zarbod

marathi-shabd's Issues

topics parser

  • parse the tags column of the database file db.csv and extract the individual tags (topics) from it.
  • if a word has multiple tags (separated by semicolons), each tag needs to be extracted separately
  • duplicates should be removed
  • the output of this parser should be the list of tags as a single string, separated by a delimiter (comma for now)
  • this list can later be used in the main.py script to generate output files by topic (the topics are currently hardcoded.)
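
The steps above can be sketched roughly as follows. This is only an illustration, not the project's parser: the column name `tags`, the function name `extract_topics`, and the sample rows are assumptions, since the real layout of db.csv is not shown here.

```python
import csv
import io


def extract_topics(csv_file, tags_column="tags", separator=";"):
    """Return the unique tags from the tags column, in first-seen order."""
    seen = {}  # dict keeps insertion order, so duplicates drop out naturally
    for row in csv.DictReader(csv_file):
        for tag in (row.get(tags_column) or "").split(separator):
            tag = tag.strip()
            if tag:
                seen.setdefault(tag, None)
    return list(seen)


# Made-up sample data; the column names are assumptions, not the real db.csv header.
sample = io.StringIO(
    "english,marathi,tags\n"
    "experience,अनुभव,work\n"
    "work,काम,work;engineering\n"
    "science,विज्ञान,science\n"
)

topics = extract_topics(sample)
topics_string = ",".join(topics)  # comma as the delimiter, as requested above
print(topics_string)
```

For a real run, the `io.StringIO` sample would be replaced by `open("db.csv", newline="", encoding="utf-8")`.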

topic specific md files do not contain words which have multiple tags

Words in the database db.csv that have multiple tags separated by a semicolon (e.g. work;engineering) do not appear in either of the corresponding markdown files (i.e. work.md and engineering.md) in the browse/topics folder.

Words that have only a single tag do appear in those files.
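
The cause in main.py is not shown in this issue. One plausible cause is that the whole tags cell (e.g. `work;engineering`) is compared against a single topic name, which never matches. A minimal sketch of the intended behaviour, assuming hypothetical column names `english` and `tags` and a hypothetical helper `group_words_by_tag`, splits the cell first and files the word under every tag:

```python
import csv
import io
from collections import defaultdict


def group_words_by_tag(csv_file, tags_column="tags", separator=";"):
    """Map each tag to the rows carrying it; multi-tag rows appear under every tag."""
    groups = defaultdict(list)
    for row in csv.DictReader(csv_file):
        for tag in (row.get(tags_column) or "").split(separator):
            tag = tag.strip()
            if tag:
                groups[tag].append(row)
    return dict(groups)


# Made-up sample rows; the header names are assumptions.
sample = io.StringIO(
    "english,marathi,tags\n"
    "experience,अनुभव,work\n"
    "work,काम,work;engineering\n"
)

groups = group_words_by_tag(sample)
work_words = [row["english"] for row in groups["work"]]
engineering_words = [row["english"] for row in groups["engineering"]]
print(work_words, engineering_words)
```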

db to markdown file script

  • input file will be the db.csv
  • output will be a markdown file which will be used on the GitHub Pages website. For now it will be the home page of the site.
  • for now, a user will have to manually search for a word of interest (or can use the browser's search function.)
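
A rough sketch of such a script is shown here. The output layout (one bullet per word), the column names `english` and `marathi`, and the function name `rows_to_markdown` are all assumptions made for illustration; the project's real word-block format is defined elsewhere.

```python
import csv
import io


def rows_to_markdown(csv_file, title="Words"):
    """Render every row of the CSV as one markdown bullet under a single heading."""
    lines = [f"# {title}", ""]
    for row in csv.DictReader(csv_file):
        english = (row.get("english") or "").strip()
        # Mark rows whose Marathi word has not been filled in yet.
        marathi = (row.get("marathi") or "").strip() or "(missing)"
        lines.append(f"- **{english}**: {marathi}")
    return "\n".join(lines) + "\n"


# Made-up sample data; the header names are assumptions.
sample = io.StringIO(
    "english,marathi,tags\n"
    "experience,अनुभव,work\n"
    "science,,science\n"
)

markdown = rows_to_markdown(sample)
print(markdown)
```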

Search by tags

Each word has one or more tags, which are like categories under which the word falls.

On the "search" page, give the user the option of searching by these categories. Since the categories are limited to those in the database, only those should be offered. The search (more like a selection) can be done via a drop-down menu.

(The list for the drop-down menu itself can be created by parsing the tags present in the database.)

Once the user selects a particular tag from the menu, they should be shown all the word-blocks for the words that fall under that tag.

remove python cache folder __pycache__

A cache folder gets created every time a Python script is run. Is there a way to prevent it from being created? If not, can it be deleted after the scripts are executed? If that is not possible either, it can probably be added to .gitignore.
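
One possible approach, sketched here without knowing how main.py is structured: CPython writes `__pycache__` only for imported modules, and it skips writing when `sys.dont_write_bytecode` is set before those imports happen. The same effect is available without code changes via `python -B main.py` or the environment variable `PYTHONDONTWRITEBYTECODE=1`.

```python
import sys

# Must be set before the project's own modules are imported;
# modules imported earlier may already have written their cache files.
sys.dont_write_bytecode = True

# Any import placed below this line will not create or update __pycache__.
print(sys.dont_write_bytecode)
```

Independently of this, adding the line `__pycache__/` to .gitignore keeps any cache folder that does get created out of the repository.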

each browse md file must have its file name in its heading

  • currently the topic files being created have only the word blocks for the words under that topic.
  • we need to add a title/heading to each topic file, so that it is easy to understand which topic this file is about.
  • this will also be helpful when combining (concatenating) multiple topic files to each other, whenever such a feature is needed in the future.
  • the title should be the same as the name of the file

eg. the title for topic file science.md will look like

science

by using

# science

... followed by the individual word blocks
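
A minimal sketch of this, assuming a hypothetical helper `with_heading` that receives the topic file name and the already generated word blocks as a string:

```python
from pathlib import PurePath


def with_heading(file_name, body):
    """Prefix the word blocks with a level-1 heading taken from the file name."""
    topic = PurePath(file_name).stem  # "science.md" -> "science"
    return f"# {topic}\n\n{body}"


# The body here is placeholder text, not a real word block.
content = with_heading("browse/topics/science.md", "word block\n")
print(content)
```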

database to markdown file

This script is needed to create a single markdown file containing all the words from the database csv file.

The output format for each word and its Marathi equivalent is given in the example present in the template folder.

Word info graphic creator

This is a feature to create card like images which can be shared on social media for creating awareness.

In addition to the search functionality, a user can get an info graphic (like the one shown below) when they search for a specific word. It can be downloaded as an image (worst case, take a screenshot) and shared further.

If anyone has an idea of how this can be done and wants to take it up, please let me know here so that we can discuss further.

[image: example word info graphic]

temporary form to add missing/incorrect words

make a Google/Zoho or any other form(s) which can be used to collect -

  • missing words in the database
  • (not related to this database but) any other incorrect usage of Hindi/English words/grammar in Marathi.

Also add a link to these forms from the home page.

CSV Database filter script

Program to take a database (csv format currently) as input, keep only the necessary data (as per a filter criteria which is another input), and output this data in the same format as the input database.
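
A small sketch of such a filter is given here. The filter criterion is passed in as a function; the names `filter_csv` and `has_marathi`, and the column names in the sample, are assumptions chosen for illustration.

```python
import csv
import io


def filter_csv(in_file, out_file, keep):
    """Copy the header and every row for which keep(row) is true, in the same CSV layout."""
    reader = csv.DictReader(in_file)
    writer = csv.DictWriter(out_file, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        if keep(row):
            writer.writerow(row)


def has_marathi(row):
    """Example criterion: keep only rows whose Marathi column is filled in."""
    return bool((row.get("marathi") or "").strip())


# Made-up sample data; the header names are assumptions.
source = io.StringIO(
    "english,marathi,tags\n"
    "experience,अनुभव,work\n"
    "science,,science\n"
)
result = io.StringIO()
filter_csv(source, result, has_marathi)
print(result.getvalue())
```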

add content to database

in below priority -

  1. missing marathi words (translations)
  2. topics for each word
  3. more words (focus on daily use words)

Numbers

For "thousands of", "lakhs of" and so on, Marathi has number words such as
सहस्त्रावधी, लक्षावधी, कोट्यावधी, अब्जावधी.

Add the above words to the database.

delete the "filtered.csv" file after it is no longer necessary

On running main.py, the intermediate file "filtered.csv" is left behind after the md files are generated.
This file should be removed or, even better, never be created in the first place; instead the filtered csv "data" should be passed on as a data structure (like a list of rows or something similar).

this is low priority
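
The second option could look roughly like this. It is a sketch under assumptions: `load_filtered_rows` and `render_blocks` are hypothetical names, and the column names and the one-line block format are placeholders for whatever main.py really uses.

```python
import csv
import io


def load_filtered_rows(csv_file, keep):
    """Read the CSV and return the matching rows as a list, without writing a file."""
    return [row for row in csv.DictReader(csv_file) if keep(row)]


def render_blocks(rows):
    """Stand-in for the markdown generation step; consumes the in-memory rows."""
    return "".join(f"{row['english']} : {row['marathi']}\n" for row in rows)


# Made-up sample data; the header names are assumptions.
source = io.StringIO(
    "english,marathi,tags\n"
    "experience,अनुभव,work\n"
    "science,विज्ञान,science\n"
)

rows = load_filtered_rows(source, lambda row: "work" in row["tags"].split(";"))
output = render_blocks(rows)
print(output)
```

Because the rows never leave memory, there is no "filtered.csv" to clean up afterwards.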

Looking for Marathi language contributors

Currently most of the documentation is in English but I would like to have the Marathi version for each.

So anyone who is good at and interested in creating stuff in Marathi is most welcome.

Marathi will primarily be needed in -

  • The repo documentation
  • The website pages

Form for users to submit missing Marathi words for EXISTING English words

  • There are some words in the database where the English word is present but the Marathi word is absent.
  • For such missing Marathi words, we can have a page where all the English words are displayed, each with a text box beside it (to enter the missing Marathi word), and a submit button at the top or bottom of the page to submit the words to the database.
  • These user-submitted words will be collected in another database and will enter the main database upon review.

sorting of db file

  • the db file needs to be sorted using the english word column
  • sort order will be A at top and Z at bottom
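
A minimal sketch of the sort, assuming the English word column is named `english` (the real header is not shown here) and that the comparison should ignore letter case:

```python
import csv
import io


def sort_csv(in_file, out_file, column="english"):
    """Write the rows back out sorted A to Z on the given column, ignoring case."""
    reader = csv.DictReader(in_file)
    rows = sorted(reader, key=lambda row: (row.get(column) or "").casefold())
    writer = csv.DictWriter(out_file, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)


# Made-up sample data; the header names are assumptions.
source = io.StringIO(
    "english,marathi\n"
    "work,काम\n"
    "Experience,अनुभव\n"
    "science,विज्ञान\n"
)
result = io.StringIO()
sort_csv(source, result)
print(result.getvalue())
```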

Shouldn't this be a web application with a database instead of a plain csv file?

A web app would let lay users search for the words they need, and adding new words could be simplified.

Possible Problem:
Every time a new English word has to be added, we have to make changes to the csv file and open a PR. Even if a github.io page is added in the future, it will still be difficult to work with a single csv file.

Proposed Solution:
Web app with its database which can be used to fetch words through a Github page (frontend). And this app can also provide a simple UI to add new English words and their alternative Marathi words.

Website styling

  • all pages to have same styling
  • for now, use the home page style as a reference. (No explicit styling has been applied there. It's the default style without any theme being applied.)
  • static pages must be able to be viewed without any issues even if the user turns on "reader mode" on their browser.

Create a form for users to submit NEW words

  • Users should be able to submit new words
  • Submission should happen on a dedicated page/form
  • The submitted words should get added to a database of "user suggested words" and not the main database. From there they will be reviewed by the maintainer and added to the main database.
  • the main database is currently in a simple csv file, but any other suitable type of database (of user suggested words) for this feature is ok.

search function - example sentence is not output

on the search page, entered "experience" in the text box.

output was -

Results:
experience : अनुभव
English example currently not available for this word
Marathi example currently not available for this word

but the csv contains both the example sentences.

Markdown Word block update

Template file (word-template.MD) is updated in template folder.
@zarbod can you please update the gen block script to output the block in the updated format?

This block basically is updated due to the new columns in the database.
