Code Monkey home page Code Monkey logo

org-roam-similarity's Introduction

Org-roam-similarity

Welcome to my fork of org-similarity by Bruno Arine. As you can assume from the name, this fork is about integrating org-similarity and org-roam more. org-similarity is a very cool package for helping Emacs discover files which are related to a certain file. In its current implementation it has an input of a file (which has to be the current buffer) and an output of a list of links in text pointing to files which are similar to the current file.

This isn’t a problem in itself, however, if you want to integrate it more with org-roam you need to be able to do more than that. I plan to make the output of the low-level function (the one that exists in the original) to be a list of org-roam nodes as then, from the list of nodes you can write functions which do various useful actions with that list. I also want to make the input more flexible by allowing the user to select any node from their system instead of only the current buffer. This is all going to be on the basis of Bruno’s original code (who I would like to thank for his amazing work if he ever sees this) but with more elisp under the hood to manipulate the python script’s output in a way that org-roam can use it better.

Development on this fork started at December 19th 2022 and I will try to have most of the functionality ready in the coming months. Once its fleshed out, I plan to document it better and then release it on MELPA for public usage. If you would like to collaborate in this project, feel free to email me at [email protected] or here. I will try to be very active here, especially during Christmas holidays as this idea interests me a lot.

Below is the original README which contains important instructions for installing and using this package. Since I am going to change a lot of it, it will need to be updated, but for now, it will have to do.

org-similarity

org-similarity is a package to help Emacs org-mode users discover similar or related files. Under the hood, it uses Python and scikit-learn for text feature extraction, and nltk for text pre-processing. More specifically, this package provides a function to recursively scan a given directory for org files, clean their content by stripping the front matter and some undesired characters, tokenize them, replace each token with its respective linguistic stem, generate a TF-IDF sparse matrix, and calculate the cosine similarity between these documents and the buffer you are currently working on.

./example.gif

Dependencies

To ensure you have the latest packages, run:

$ pip install -U scikit-learn nltk

Installation

Via straight.el

(straight-use-package
 '(org-similarity :type git :host github :repo "brunoarine/org-similarity"))

Doom Emacs

(package! org-similarity
  :recipe (:host github :repo "brunoarine/org-similarity"))

Via cloning

You can also clone the repository somewhere in your load-path. If you would like to assist with development, this is the way to go.

To do that:

  1. Create a directory where you’d like to clone the repository, e.g. mkdir ~/projects.
  2. cd ~/projects
  3. git clone https://github.com/brunoarine/org-similarity.git

You now have the repository cloned in ~/projects/org-similarity/. Now paste the following sexps in your init-file (e.g. ~/.emacs.d/init.el):

;; If you cloned the repository
(add-to-list 'load-path "~/projects/org-similarity/") ;Modify with your own path
(require 'org-similarity)

You can also copy org-similarity/ somewhere where load-path can access it, but you’d have to update the file manually.

Configuration

There are a few variables that can be set to customize how org-similarity operates and generates the list of similar documents:

;; directory to scan for possibly similar documents.
;; org-roam users might want to change it to org-roam-directory
(setq org-similarity-directory org-directory)
;; the language passed to nltk's Snowball stemmer
(setq org-similarity-language "english")
;; how many similar entries to list at the end of the buffer
(setq org-similarity-number-of-documents 10)
;; whether to prepend the list entries with their cosine similarity score
(setq org-similarity-show-scores nil)

Usage

Use M-x org-similarity-insert-list RET to insert a list of org-mode links to similar files at the end of your current buffer.

License

Copyright 2020 Bruno Arine

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

  1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
  2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
  3. Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS “AS IS” AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

org-roam-similarity's People

Contributors

vidianos-giannitsis avatar brunoarine avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.