A moderate hack for syncing notebooks off the reMarkable, converting them to PDF, and running OCR on the pages. It attempts to convert only changed pages. Uses AWS Textract for OCR. Can sync from the cloud using rmapi or directly via SSH-over-USB, and switches seamlessly between the two sync mechanisms. Sync is one-way only: down.
Loosely tested with a reMarkable 2 on Linux and macOS (Intel).
I admit this is not entirely end-user-friendly. If you know your way around a Unix shell, you should be ok.
I wrote this in a couple of evenings and don't have the time to support it properly. If you like it, help me make it better.
[brew/apt/dnf] install imagemagick jq awscli
pip install boto3 pypdf2
- rm2pdf built and installed in your path
- rmapi built and installed in your path
aws configure
This may help (also look at pricing for OCR)
The first time you run the script, it will prompt you to get an authorization code from reMarkable. That's all.
Set up passwordless ssh and rsync on your tablet
Example .ssh/config section:
Host remarkable
    User root
    ControlMaster no
    ControlPath none
    Hostname 10.11.99.1
Enter the names of the notebooks you want to sync, one per line and exactly as shown on the device, in notebooks.conf. Make sure the file ends with a newline, or the last notebook won't be processed. Example:
Quick sheets
My Other Notebook
Work Notes
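
The trailing-newline requirement is typical of shell `while read` loops, which skip a final line that has no newline; whether rmocrsync.sh reads the file this way is an assumption. As a rough sketch (the helper names `read_notebook_names` and `ensure_trailing_newline` are hypothetical and not part of this repo), you could check and repair the file before a run:

```python
from pathlib import Path


def read_notebook_names(conf_path):
    """Return (names, ends_with_newline) for a notebooks.conf-style file."""
    text = Path(conf_path).read_text(encoding="utf-8")
    # Keep names exactly as written; only drop empty lines.
    names = [line for line in text.splitlines() if line]
    return names, text.endswith("\n")


def ensure_trailing_newline(conf_path):
    """Append a newline if the file is non-empty and lacks one.

    Returns True if the file was modified.
    """
    path = Path(conf_path)
    text = path.read_text(encoding="utf-8")
    if text and not text.endswith("\n"):
        path.write_text(text + "\n", encoding="utf-8")
        return True
    return False
```

Calling `ensure_trailing_newline("notebooks.conf")` once after editing the list is enough; it leaves an already well-formed file untouched.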
From the repo folder, update the notebook list per the above instructions and try running ./rmocrsync.sh ssh or ./rmocrsync.sh web. It should work out of the box.
If it completes successfully, take a look in the notebooks folder. You should have a folder of OCR text files (one file per page) and an annotated PDF that embeds the text in each page.
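
For a sense of what the per-page OCR step involves, here is a minimal sketch using boto3's Textract client. It is not necessarily how the script calls Textract (that is an assumption, and the function names `lines_from_response` and `ocr_page` are made up for illustration). It relies on the credentials and default region set by `aws configure`, and each call is billed by AWS.

```python
def lines_from_response(response):
    """Collect the text of every LINE block from a Textract response dict."""
    return [
        block["Text"]
        for block in response.get("Blocks", [])
        if block.get("BlockType") == "LINE"
    ]


def ocr_page(image_path):
    """Send one page image (PNG or JPEG) to Textract and return its text."""
    import boto3  # imported lazily so the parsing helper works without AWS set up

    client = boto3.client("textract")  # picks up credentials from `aws configure`
    with open(image_path, "rb") as f:
        response = client.detect_document_text(Document={"Bytes": f.read()})
    return "\n".join(lines_from_response(response))
```

Textract responses also contain PAGE and WORD blocks; keeping only LINE blocks gives one string per detected line, which maps naturally onto a plain text file per page.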
Note: The files in the meta folder are used to track changed pages across sync sessions. You probably shouldn't mess with these.
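
The format of the meta files is not documented here, so the following is only a generic illustration of how changed pages can be tracked between runs: store a digest of each page file and compare on the next sync. The JSON manifest and the function names `page_digests` and `changed_pages` are assumptions for this sketch, not the script's actual mechanism.

```python
import hashlib
import json
from pathlib import Path


def page_digests(pages_dir):
    """Map each file name in pages_dir to the SHA-256 digest of its contents."""
    return {
        p.name: hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(Path(pages_dir).iterdir())
        if p.is_file()
    }


def changed_pages(pages_dir, manifest_path):
    """Return names of new or modified pages, then save the current digests."""
    manifest = Path(manifest_path)
    previous = json.loads(manifest.read_text()) if manifest.exists() else {}
    current = page_digests(pages_dir)
    changed = [
        name for name, digest in current.items() if previous.get(name) != digest
    ]
    # Overwrite the manifest so the next run compares against this state.
    manifest.write_text(json.dumps(current, indent=2))
    return changed
```

With a scheme like this, deleting or editing the stored state makes every page look new on the next run, which is one reason to leave tracking files alone.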