
Comments (10)

pfalcon commented on August 28, 2024

Looking at the current source code, this functionality is not available.

from xapers.

jrollins commented on August 28, 2024

This is available in the curses UI: run xapers show id:..., then press Alt-U to "yank" the document source URL.

We should provide a way to print doc source URLs from the CLI as well.


pfalcon commented on August 28, 2024

Thanks for the response.

Interesting. But I don't see how that would work, given the contents of https://gitlab.com/jrollins/xapers/-/blob/master/xapers/sources/doi.py . There's no function like query_possible_ids_by_title(). Again, I have just 1960 Recursive Programming - Dijkstra.pdf, and I'd like a tool which would be able to find a DOI (etc.) by just that. (Oh, and of course, there's no DOI in the document text ;-) ).


jrollins commented on August 28, 2024

Source identifiers (e.g. DOIs) have to be added to individual documents. See the add command. If your document is id:1, then you could do something like:

$ xapers add --source=doi:... id:1

Usually when I add a new document I do so by adding the PDF and the DOI at the same time:

$ xapers add --source=doi:... --file=/path/to/pdf

I've been trying to streamline the interface, and it will be improved in the next release.


jrollins commented on August 28, 2024

> Interesting. But I don't see how that would work, given the contents of https://gitlab.com/jrollins/xapers/-/blob/master/xapers/sources/doi.py . There's no function like query_possible_ids_by_title().

This is not the appropriate place to look. That file just describes how to interact with a remote source. When a document is indexed, its source metadata is indexed as well, and searches are done through the internal xapian database.


pfalcon commented on August 28, 2024

> Source identifiers (i.e. DOI IDs) have to be added to individual documents.

Umm, no... ;-). Not as far as this user story is concerned. It explicitly says "user doesn't add that boring metadata, instead software automates adding it" (a user can supervise it, actually, that's implied - I for one don't want some stupid AI to contaminate the handcrafted metadata of my 1K-document collection).

> Usually when I add a new document I do so by adding the PDF and the DOI at the same time:

Umm, and I don't. And that's where the conceptual difference between my papersman and xapers lies: my software is "local-first". It's intended to be run by a mere human for their mere-human needs. It's intended to be run by humans who have no idea what a DOI is, and couldn't care less. But when, years later, they possibly learn what the heck a DOI is, and think that they need some, the software should help to get them (not require humans to enter them manually).


jrollins commented on August 28, 2024

Hrm, sorry, I was just describing how xapers works, not what you personally should or should not do. xapers needs to know the source of the paper to retrieve its metadata, and there's no good way to figure that out other than by asking the user to supply it. If you have some other suggestion about how xapers could learn the metadata other than through a bunch of fragile ad hoc heuristics, I would be thrilled to learn.

I feel like you may be jumping to conclusions about what xapers is intended to be. It is absolutely a personal paper management system, intended for "mere humans". But for it to be really useful, metadata is needed, and xapers has to learn about it somehow. What I absolutely did not want was for xapers to require the user to enter all the metadata manually. That would be a non-starter. Most journals support DOIs, through which all the metadata is available in a structured format. So that is by far the easiest way to get the metadata into xapers. It's just a single URL that is clearly provided in most articles. Other source identifiers are supported as well (such as arxiv).

I suggest trying the interactive add option, which scans documents for source identifiers and presents the user with suggestions for which source ID might be appropriate.
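To illustrate the idea of scanning a document for source identifiers, here is a minimal sketch. It is not xapers' actual implementation: the function name `suggest_sources`, the regular expressions, and the `arxiv:` prefix are assumptions made for this example (only the `doi:` prefix appears in the commands above).

```python
import re

# Assumed patterns, not taken from xapers: a common loose DOI shape
# ("10.<registrant>/<suffix>") and the post-2007 arXiv ID shape.
DOI_RE = re.compile(r"\b10\.\d{4,9}/[-._;()/:A-Za-z0-9]+")
ARXIV_RE = re.compile(r"\barXiv:(\d{4}\.\d{4,5})(v\d+)?", re.IGNORECASE)


def suggest_sources(text):
    """Return candidate source IDs found in extracted document text.

    The result is only a list of suggestions; the user is expected to
    confirm which one (if any) really identifies the document.
    """
    suggestions = []
    for match in DOI_RE.finditer(text):
        # Sentence punctuation often sticks to the end of a DOI in running text.
        sid = "doi:" + match.group(0).rstrip(".,;)")
        if sid not in suggestions:
            suggestions.append(sid)
    for match in ARXIV_RE.finditer(text):
        sid = "arxiv:" + match.group(1)  # version suffix (v2, ...) is dropped
        if sid not in suggestions:
            suggestions.append(sid)
    return suggestions
```

A scan like this finds nothing for a paper whose text carries no identifier at all, which is exactly the case raised earlier in the thread.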

I fully acknowledge that your papers might not have a DOI or other sources that are supported by xapers. The source support is modular, so users can easily add their own source modules, and I would be happy to include new ones in xapers.


pfalcon commented on August 28, 2024

> xapers needs to know the source of the paper to retrieve its metadata, and there's no good way to figure that out other than by asking the user to supply it

Yeah, I know ;-). That's why my cute system doesn't do that ("retrieve metadata"), and I'm looking for an alternative which might do that ("without asking the user") before jumping to implement it myself.

> If you have some other suggestion about how xapers could learn the metadata other than through a bunch of fragile ad hoc heuristics, I would be thrilled to learn.

Fragility depends on the particular case. In my collection, for example, all papers have the full title and publication year (both as part of the filename; I renamed them manually, and that's as much as I'm willing to do manually). Then, from 15 minutes of research yesterday, https://www.crossref.org/ appears to be a service allowing title -> DOI mapping. They have an API: https://api.crossref.org/works?query=1960%20Recursive%20Programming%20-%20Dijkstra&filter=until-pub-date:1960 . The bad news for that link is that it returns a whole bunch of results. The good news is that result #0 is exactly what's needed. That's my plan for tackling the problem so far.
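A minimal sketch of that plan, assuming the "<year> <title> - <author>.pdf" naming scheme from the example filename. The helper names (`parse_filename`, `build_query_url`, `candidates`, `lookup`) are hypothetical and belong to neither papersman nor xapers; the `query` and `filter` parameters are the ones from the link above, and `rows` is added to cap the number of results. Candidates are returned for the user to confirm rather than applied automatically.

```python
import json
import re
from urllib.parse import urlencode
from urllib.request import urlopen

CROSSREF_WORKS = "https://api.crossref.org/works"


def parse_filename(name):
    """Split '<year> <title> - <author>.pdf' into (year, title, author)."""
    match = re.match(r"^(\d{4}) (.+) - ([^-]+)\.pdf$", name)
    if match is None:
        return None  # filename does not follow the assumed scheme
    year, title, author = match.groups()
    return int(year), title.strip(), author.strip()


def build_query_url(year, title, author, rows=5):
    """Build a Crossref works query like the one linked above."""
    params = {
        "query": f"{year} {title} - {author}",
        "filter": f"until-pub-date:{year}",
        "rows": rows,  # keep the "whole bunch of results" short
    }
    return CROSSREF_WORKS + "?" + urlencode(params)


def candidates(response):
    """Extract (DOI, title) pairs from a decoded Crossref JSON response."""
    items = response.get("message", {}).get("items", [])
    return [(item.get("DOI"), (item.get("title") or [""])[0]) for item in items]


def lookup(name):
    """Query Crossref for a filename; returns candidates to review by hand."""
    parsed = parse_filename(name)
    if parsed is None:
        return []
    with urlopen(build_query_url(*parsed)) as resp:  # network access
        return candidates(json.load(resp))
```

Calling `lookup("1960 Recursive Programming - Dijkstra.pdf")` performs the network request; the first candidate is the natural default to offer, but nothing here checks that it is the right paper.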

> I feel like you may be jumping to conclusions about what xapers is intended to be.

I'm definitely trying to build a conceptual model of xapers, and have a bunch of hypotheses. I try to avoid jumping to conclusions, and instead present my use cases, discuss, query, suggest...

> Most journals support DOI

Even if that's true, what about those which don't? My collection now has 1000+ papers, and 500 of them not having an embedded DOI doesn't sit well with me. But I doubt "most" is even remotely true. Most of my papers are definitely author preprints and by definition don't have a DOI, which gets assigned by the publisher when an article is published.

> I suggest trying the interactive add option

But I'm not starting my paper collection with xapers. I already have a collection, and it has grown to such a size that I need tool(s) to help me manage it. In particular, I don't need a tool which will try to "own" my data. And that's another big conceptual difference between my papersman and any other similar tool I've spotted so far (including xapers) - they try to create some opaque "database" behind the user's back and "host" the user's data there. That's google-syndrome - trying to hoard users' data behind their backs (and we know what they use that data for - to spy on users). That approach has so discredited itself (user control, migration, error recovery, longevity) that some people are wary of any attempt to put their data into a database without them explicitly asking for it ;-).


pfalcon commented on August 28, 2024

Btw, I decided to bite the bullet and give another tool I'd long had in my queue a try - Zotero. It waited in my queue for so long because I knew that my minimalist aspirations were unlikely to be satisfied by its big GUI bloat, but... we need to fish those DOIs out somehow, right?

As expected, it's pretty cool in its GUIshness. It's also hilariously ad hoc in places. But I notice that even it has got it semi-right: Stored Files and Linked Files. I.e. it will definitely own your metadata (but it will own it in an sqlite database, which is not a bad choice at all if you want to take it back), but at least it has curbed its appetite for owning user data - you can tell it "hands off my data files, link to them, don't try to own them".

Oh, and of course, it retrieves metadata (like DOI) automatically whenever you throw a PDF at it ;-).

All in all, I'm glad I finally gave it a try. That finally answers the question of why there's not just no clear leader, but hardly any active projects at all in this "paper management" area. Because there is a clear leader - Zotero. And everyone who's serious about this area has apparently been on it for years already (the current major version of Zotero, 5, was initially released in 2017).

So, I guess any tool which wants to work in this area (I mean for nuts like me who can't use just Zotero) should seriously consider interoperability with Zotero.


pfalcon commented on August 28, 2024

For reference, I posted on Reddit asking what people use to organize their personal libraries: https://old.reddit.com/r/ProgrammingLanguages/comments/lbxblp/meta_my_proglangcompilertheoryproganalysis/ . The hypothesis was that the majority would reply "Zotero". And indeed, that was the response posted immediately, but in the end, it can't be said that it's a de facto standard.

Today I also stumbled upon https://github.com/neuml/paperai ("AI-powered literature discovery and review engine for medical/scientific papers"). Those are exactly the kind of buzzwords I was looking for in regard to DOI/other metadata acquisition (everyone understands it will be of subpar quality, but that's exactly the reason to spend one's own time on that and reuse others' toys) ;-).
