akaalias / obsidian-extract-pdf

Extract PDFs to Markdown within Obsidian

JavaScript 99.92% TypeScript 0.08%

obsidian-extract-pdf's Introduction

Extract PDF text to Markdown

Allows you to extract the basic textual content of a PDF into a Markdown file. Works well with headings, paragraphs and lists.

How to use this plugin

After you've installed and activated the plugin:

Drag your PDF into Obsidian
Open the PDF within Obsidian
Make sure the pane with your PDF is focused
Click the "PDF to Markdown" button in the sidebar
Edit the generated markdown file to your needs

Tips & Tricks for editing the generated markdown file

I just went ahead and turned a 500 page PDF into markdown and found that it worked better and faster than I expected.

Bulk-removing page footers

The book I used had the same footer on every page. That means it got copied into the Markdown file over and over, too.

For bulk search-and-replace I use the Atom editor (https://atom.io):

Copy the footer text into your clipboard
Download and install Atom
Open Atom and open the Markdown file inside
Use "Find -> Find in Buffer" and paste the footer text
Use the button "Replace" or "Replace All" to remove footer text
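If you would rather script this than use an editor, the same "Replace All" step can be sketched in Python. This is only a minimal sketch: the function name `remove_footer` and the sample footer text are made up for illustration and are not part of the plugin.

```python
def remove_footer(markdown: str, footer: str) -> str:
    """Delete every literal occurrence of `footer`, like "Replace All" with an empty replacement."""
    # Plain substring replacement: the footer must match exactly,
    # including spacing and punctuation, just as in the editor search.
    return markdown.replace(footer, "")


# Hypothetical sample: the same footer repeated after each page.
sample = "Page one text\nMy Book | Footer\nPage two text\nMy Book | Footer\n"
cleaned = remove_footer(sample, "My Book | Footer")
print(cleaned)
```

Like the editor approach, this leaves an empty line where each footer used to be, which is usually harmless in Markdown.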

Remove a single space before a new line of text

Weirdly, new lines of text sometimes had a space in front of them. Such as:

 Some text

...which resulted in Obsidian treating it as a sub-block of the preceding line.

To remove the space for those lines, I used a regular expression search-and-replace:

In "Find in current buffer" activate "Regex Search" (The .* icon)
Enter ^([ ]|\t)+ into the search field
Use the button "Replace" or "Replace All" to remove the space
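The same regular expression can also be applied outside an editor. Below is a minimal Python sketch using the pattern from the steps above; the function name `strip_leading_whitespace` is an assumption for illustration, not something the plugin provides.

```python
import re

# Same pattern as in the search field. re.MULTILINE makes ^ match at the
# start of every line instead of only at the start of the whole text.
LEADING_WHITESPACE = re.compile(r"^([ ]|\t)+", flags=re.MULTILINE)


def strip_leading_whitespace(markdown: str) -> str:
    """Remove spaces and tabs at the beginning of every line."""
    return LEADING_WHITESPACE.sub("", markdown)


sample = "First line\n Some text\n\tTabbed text\n"
print(strip_leading_whitespace(sample))
```

Note that this removes all leading indentation, so it also flattens nested lists and indented code blocks if the extracted file contains any.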

Known issues

First-time use

If you had a PDF open in Obsidian before you installed and activated the plugin, hitting the button may not work. I've had this issue with other plugins as well. The code just doesn't hook up to already-open files.

The solution is to simply close the PDF note and re-open it. That will allow the plugin to hook into it.

Limited PDF parsing

Please understand that this is a basic, best-effort tool to get basic text and headings from a PDF. It really just gets the text from a PDF and turns it into Markdown. The plugin doesn't handle anything more complex, like tables, images, annotations, etc.:

Does not turn PDF highlights and annotations into MD highlights
Does not retain PDF numbered lists
Does not skip text in headers and footers

obsidian-extract-pdf's Issues

Cannot Install Because of the wrong manifest.json link

The manifest.json link points to: https://github.com/akaalias/obsidian-extract-pdf/releases/download/0.0.3/manifest.json

But the real link in this repository is https://github.com/akaalias/obsidian-extract-pdf/releases/download/0.0.03/manifest.json

I think 0.0.3 and 0.0.03 are treated as different tags, which is why people cannot install the plugin.
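To illustrate why the two links differ, here is a small Python sketch. It assumes the release tag is the second-to-last path segment of the URL, as in the two links above; the helper names `release_tag` and `numeric_version` are hypothetical.

```python
EXPECTED = "https://github.com/akaalias/obsidian-extract-pdf/releases/download/0.0.3/manifest.json"
ACTUAL = "https://github.com/akaalias/obsidian-extract-pdf/releases/download/0.0.03/manifest.json"


def release_tag(url: str) -> str:
    """Return the path segment just before the file name (assumed to be the release tag)."""
    return url.rstrip("/").split("/")[-2]


def numeric_version(tag: str) -> tuple:
    """Parse a dotted tag into integers, which drops leading zeros."""
    return tuple(int(part) for part in tag.split("."))


# Numerically both tags are version (0, 0, 3)...
print(numeric_version(release_tag(EXPECTED)) == numeric_version(release_tag(ACTUAL)))
# ...but a download URL is matched as text, so "0.0.3" and "0.0.03" are different paths.
print(release_tag(EXPECTED) == release_tag(ACTUAL))
```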

Including figures from the PDF in the extracted markdown note

What the title says.

A manual alternative to this is taking screenshots/screenclips of the figures and pasting them into the note.

Thanks in advance!

Breaking PDFs opening on mobile

Hi,
We have received several reports that when this plugin is installed, PDFs do not open on mobile.
Perhaps it is because you bundle your own pdf.js. You could use the one included in Obsidian.
However, I am not 100% sure this is the issue.

Doesn't work

Running the latest Obsidian and the updated add-on. First time using it. The extract PDF button says "no highlights found". Tested on multiple PDFs within Obsidian, marked up on both Mac and Windows with Preview or PDF-XChange Editor. Nothing works, but the highlights are visible.

PDF to MD icon not working

Hi akaalias!

I really appreciate your plugins!

There is an issue when I'm trying to extract md with the PLUGIN PDF TO MD

It doesn't work; it doesn't show any new page with the md

I tried:

Reloading
Restarting the app
Uninstalling and Reinstalling
The same problem with the PDF Highlights to MD

Please help!

request: add to command palette

Pretty self-explanatory, I want to be able to call the PDF to Markdown command from the command palette.

This is because I don't use the sidebar ribbon and therefore have no access to the plugin without turning it on again, not even a hotkey.

Thanks for the plugin!

There is a problem with the spelling of PDF extraction markdown

Hello, there is a problem with the spelling of the Markdown extracted from a PDF when using your plug-in. The file is as follows:
sluubc1d.pdf

Plugin prevents PDF preview in Obsidian

Hi,

I recently noticed the PDFs are not being previewed in Obsidian. By trial and error, I disabled/enabled plugins one by one, and apparently, this plugin is causing the problem.

I'm on Obsidian v0.10.7 and using plugin version 0.0.6.

Plugin not found in Community Plugins Tab

I clicked the link on the plugin website but it doesn't show up...

https://publish.obsidian.md/hub/02+-+Community+Expansions/02.05+All+Community+Expansions/Plugins/pdf-to-markdown-plugin

Do I have to update the plugin index somewhere, or might there be another issue?

Breaking "Better PDF Plugin" - Pdf not showing

Embedded PDFs do not show up. In Better PDF Plugin you can embed PDFs in obsidian pages using text-boxes and a pretty simple syntax (as you can see on the main github page of the project).

I use it a lot, and after disabling and enabling all the plugins I have installed, it turned out that, for some strange reason, obsidian-extract-pdf is the one that does not work well alongside Better PDF Plugin.

The PDFs just do not show up. In edit mode I can correctly see the text, while in preview mode it just vanishes, as if the text-box didn't exist.

Example:

############
##Edit Mode##
############
First text

        "url" : "Obsidian/helloworld.pdf"

Second text

###############
##Preview Mode##
###############
First text
Second text

[Feature Request] Include pictures from the PDF

It's a kind request for this plugin to have a way to extract images as well. It could be made optional for the user, but a quick extraction with images and graphs would really help :)

Extract PDF button not appearing in my side bar

I have tried with various different themes, but don't have any new buttons appearing post installation and activation.

Plugin opens a dialogue box then does nothing obvious

Steps to reproduce:

Drag your PDF into Obsidian
Open the PDF within Obsidian
Make sure the pane with your PDF is focused
Click the "PDF to Markdown" button in the sidebar

I have let it run for 30 min with no result.
I have dragged in a new PDF with same issue
I have closed and reopened vault -- no change
I have rebooted -- no change

I am on

PopOS 20.10
Obsidian 0.12.4

Extract math formulas from the PDF and including in the markdown note

Many papers have (LaTeX) math formulas and expressions, either inline or as separate equations. These are not extracted properly.

For instance, the following bit of a paper I'm reading:

Is extracted as:

 In the multilevel literature it has been well recognized that using the model inEquation 2as the basis of a multilevel regression analysis will lead to a slope estimate that represents neither the within-cluster slope�( w ), nor the between-cluster slope�( b ).Ifwe refer to this estimated slope as�ˆ, it has been shown that it is a weighted sum of the estimated slopes at both levels, that is 

 ��� ˆ ˆ �( b )�(1��)� ˆ ( w ) �� ˆ ( w )��(� ˆ ( b )� ˆ �( w )) 

##### (3) 

 where�can be thought of as a measure indicating the relative amount of variability at the between-cluster level compared to the total variability in the data across all clusters and time points (Mundlak, 1978;Neuhaus & Kalbfleisch, 1998;Raudenbush & Bryk, 2002). Hence, the weight�is not the intraclass correlation, which is independent of the number of time points; instead it is a direct function of the number of time points and becomes smaller as the number of time points increases, as we will see later. The weighted sum�is represented as the dashed line inFigure 2, which lies somewhere between the within-person slope�( w )and the between-person slope�( b ).

And rendered as:

It would be great if solved. Thanks in advance!

Plugin broken in Ver 0.12.12 of Obsidian

It just opens a modal saying it is converting, and the modal never closes. After closing it myself, no md file was created.

Plugin button doesn't appear

This plugin sounds great, thanks for creating it. I just installed it on Linux and there is no button appearing on the sidebar. The plugin is turned on. I tried restarting. Any suggestions? Thanks.

Hi, I am Gloria. I am not sure if this is how to request this plugin, but I can't find it in the Obsidian community plugins. Please make it available somewhere or direct me to it. I have some books I'd like to convert to Markdown that Obsidian can use.

Thanks

Garbled problem

When the PDF is in Chinese, the extracted files are all garbled.

icon too clear

Hello !
This plugin is great, but I cannot see the icon in light mode (versus dark mode).