Comments (10)
Article Content API - Unfortunately, Pocket does not provide extracted article content to API users without partner privileges.
I'm open to other ideas though. Maybe use a custom extraction method, via BeautifulSoup, or something?
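To make the "custom extraction" idea concrete, here is a minimal sketch of what such a method might look like. It swaps in Python's standard-library html.parser for BeautifulSoup so it runs without extra dependencies, and it simply keeps the text of every <p> element. The names ParagraphExtractor and extract_text are made up for illustration and are not part of pockyt; real article extraction would need much smarter heuristics.

```python
from html.parser import HTMLParser


class ParagraphExtractor(HTMLParser):
    """Collect the text of every <p> element, ignoring <script>/<style>."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.paragraphs = []   # finished paragraphs, in document order
        self._buffer = []      # text chunks of the paragraph being read
        self._in_p = False     # True while inside a <p> element
        self._skip_depth = 0   # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1
        elif tag == "p":
            # HTML allows unclosed <p>, so a new <p> ends the previous one.
            self._flush()
            self._in_p = True

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1
        elif tag == "p":
            self._flush()
            self._in_p = False

    def handle_data(self, data):
        if self._in_p and not self._skip_depth:
            self._buffer.append(data)

    def _flush(self):
        # Join the chunks and collapse runs of whitespace.
        text = " ".join("".join(self._buffer).split())
        if text:
            self.paragraphs.append(text)
        self._buffer = []


def extract_text(html):
    """Return the paragraph text of an HTML page, one blank line between paragraphs."""
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    parser._flush()  # pick up a trailing, unclosed <p>
    return "\n\n".join(parser.paragraphs)
```

Navigation menus, headings and anything else outside <p> tags are dropped, which is crude but keeps the sketch short.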
from pockyt.
Thanks for your attention to this detail!
I see what you mean about the API issue; it makes sense.
But I'm still confused, because there seem to be other ways to get the whole article text directly from Pocket. For example, calibre (http://calibre-ebook.com) and its Python 'news recipe' script called 'readitlater.recipe' (1).
I'm no Python expert; I can barely code some shell scripts and grasp a little bit of Python.
So I was wondering:
how is it that, using that script and calibre's command-line tool 'ebook-convert' (http://manual.calibre-ebook.com/cli/ebook-convert.html), I do get the entire text of my Pocket articles?
When I run it like this, for example,
ebook-convert ./readitlater.recipe outputfile.txt --username [email protected] --password my-pocket-account-password
or
ebook-convert ./readitlater.recipe output.OEB --username [email protected] --password my-pocket-account-password
I get either a text file or a set of HTML files,
with all my articles exactly as they are rendered by Pocket.
(1)
a. The script as distributed with calibre, which works for me:
https://gist.github.com/m040601/a4258870759f9ad8a6ee
b. Another fork of the same script, which was not working for me:
tbunnyman/ReadItLater-Calibre-Plugin
https://github.com/tbunnyman/ReadItLater-Calibre-Plugin
This is an updated & modified version of the official Calibre plugin for Pocket (Formerly ReadItLater)
from pockyt.
Interesting. I'll check this out and think of a possible lightweight implementation.
Do you have the time to work on this, by any chance?
from pockyt.
Cool! Thanks for your interest.
"Do you have the time to work on this, by any chance?"
Time, yes; unfortunately, not the skills to do it.
The only thing I can contribute is research and feedback, as I like to thoroughly investigate and
compare all the available solutions and implementations (Python and others) for this problem.
from pockyt.
Newspaper seems to provide Pocket-like functionality.
If this seems like a good enough alternative, I'm willing to integrate it. Thoughts?
EDIT: Actually, the PyPI distribution of newspaper is outdated and depends on a lot of heavy libraries - see requirements.
Instead, a better alternative seems to be readability-lxml. Significantly lighter and simpler to use.
from pockyt.
I'm hacking away on this right now. Let's see how it goes.
EDIT: See #7.
from pockyt.
Oops, almost forgot. The reason I'm not considering the scripts you linked to is:
- They scrape getpocket.com directly, which is forbidden by their ToS.
- Since it's a scrape, the moment Pocket changes their HTML, it will fail.
However, if this solution isn't good enough, I might reconsider.
from pockyt.
Is there any hack for this? I am going back through a historic collection of articles, and what I have found is that some articles have been taken down by the news sites... I would think even saving the HTML response of the article at that time and storing it in the DB would help tremendously.
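As a rough illustration of the "save the HTML response into the DB" idea, here is a minimal sketch using Python's standard-library sqlite3 module. The table layout and the function names (open_archive, save_snapshot, load_snapshot) are assumptions made for this example and are not part of pockyt. Fetching the page is left to the caller, and the first copy stored for a URL is kept, so a later takedown or rewrite does not overwrite it.

```python
import sqlite3
from datetime import datetime, timezone


def open_archive(path=":memory:"):
    """Open (or create) the archive database; the schema is hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS snapshots (
               url        TEXT PRIMARY KEY,
               fetched_at TEXT NOT NULL,
               html       TEXT NOT NULL
           )"""
    )
    return conn


def save_snapshot(conn, url, html):
    """Store the page; return True if stored, False if the URL was already archived."""
    fetched_at = datetime.now(timezone.utc).isoformat()
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT OR IGNORE INTO snapshots (url, fetched_at, html) "
            "VALUES (?, ?, ?)",
            (url, fetched_at, html),
        )
    return cur.rowcount == 1


def load_snapshot(conn, url):
    """Return the archived HTML for a URL, or None if it was never saved."""
    row = conn.execute(
        "SELECT html FROM snapshots WHERE url = ?", (url,)
    ).fetchone()
    return row[0] if row else None
```

Passing a file path to open_archive instead of the default ":memory:" would make the archive persist between runs.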
from pockyt.
@billlyzhaoyh - Good use-case, I had a bit of time to hack on this today. Managed to get HTML archiving working in 1.4.0.
E.g., get all favorited items and save offline copies of them:
pockyt get -v 1 -a ./pocket
Let me know if it works for you.
from pockyt.
Closing as stale.
from pockyt.