Comments (5)
My thoughts below:
1) The package & CLI options are a good way of exposing Twint's APIs
Agreed - the combination is enough to cover the majority of use cases
2) Propose to remove databases, ES, translations
Agreed - we should keep this package lean and purpose specific
3) Do we really need all the async stuff?
Synchronous requests should suffice
4) Python3 only
Definitely :)
Output should be a file or stdout
The module should expose a generator interface which can iterate between requests - this will allow "streaming" of results
The CLI options all make sense
I will set up the project scaffold this weekend to get us started
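To make the "file or stdout" point concrete, here is a minimal sketch of what an output helper could look like. The name `write_results`, the `path` parameter, and the JSON Lines format are all assumptions for illustration; nothing here is decided for the package yet.

```python
import json
import sys
from typing import Dict, Iterable, Optional


def write_results(results: Iterable[Dict], path: Optional[str] = None) -> int:
    """Write one JSON object per line to `path`, or to stdout if `path` is None.

    Hypothetical helper: the name and the JSON Lines format are assumptions.
    Returns the number of items written.
    """
    out = open(path, "w", encoding="utf-8") if path else sys.stdout
    written = 0
    try:
        for item in results:
            # Items are written as they arrive, so a generator is never
            # materialised into a list first.
            out.write(json.dumps(item, ensure_ascii=False) + "\n")
            written += 1
    finally:
        # Only close handles we opened ourselves; leave stdout alone.
        if out is not sys.stdout:
            out.close()
    return written
```

Because it accepts any iterable, a helper like this would work with a list of results as well as with a generator that yields them lazily.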
from twint-ng.
I think you forgot to push the branch.
We can port all the logic concerning which URLs to use and which HTML elements to look at. But since we're going to use (synchronous) Requests, I don't think there is any use in porting the entire scraper.
During tests I did this weekend, I did not find any reason to do things like rotate user agents. So we should keep the code very simple, only adding bells and whistles when we really need them.
from twint-ng.
Scaffolding is done in a separate branch - let me know what you guys think of the tooling choices.
How much is portable from the current Twint package? I would assume the scraper can be moved across.
from twint-ng.
The module should expose a generator interface which can iterate between requests - this will allow "streaming" of results
Yeah, second that, that would be really neat.
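A rough sketch of how such a generator interface might look, assuming results come back in pages with a cursor pointing at the next page. `iter_tweets`, `fetch_page`, and the cursor scheme are hypothetical names made up for this example, and `fake_fetch_page` stands in for the real synchronous request so the snippet runs without network access.

```python
from typing import Callable, Dict, Iterator, List, Optional, Tuple

# Hypothetical page shape: (tweets on this page, cursor for the next page or None).
Page = Tuple[List[Dict], Optional[str]]


def iter_tweets(
    fetch_page: Callable[[Optional[str]], Page],
    limit: Optional[int] = None,
) -> Iterator[Dict]:
    """Yield tweets one at a time, requesting the next page only when needed."""
    if limit is not None and limit <= 0:
        return
    cursor: Optional[str] = None
    count = 0
    while True:
        tweets, cursor = fetch_page(cursor)  # one synchronous request per page
        for tweet in tweets:
            yield tweet
            count += 1
            if limit is not None and count >= limit:
                return  # stop before triggering another request
        if cursor is None or not tweets:
            return  # no further pages


def fake_fetch_page(cursor: Optional[str]) -> Page:
    """Stand-in for a real request; returns two canned pages."""
    pages = {
        None: ([{"id": 1}, {"id": 2}], "page-2"),
        "page-2": ([{"id": 3}], None),
    }
    return pages[cursor]


for tweet in iter_tweets(fake_fetch_page, limit=2):
    print(tweet)
```

The caller controls the pace: a new request only happens when the loop asks for a tweet beyond the current page, which is what gives the "streaming" behaviour.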
from twint-ng.
Totally agree
from twint-ng.