sheebz / phantom-proxy
a lightweight proxy that lets you drive phantomjs from node.
License: MIT License
crashes phantomjs
I'm new to the phantom-proxy project and I'm looking for a very basic example that shows how to open a public website and print the contents of the DOM.
I couldn't find a detailed example of this word for word in the README so I'm asking here.
Thank you in advance
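A minimal sketch in reply, assuming the calls used elsewhere in this thread (create(options, callback), page.open(url, callback), page.evaluate(fn, callback) and proxy.end(callback)) behave as shown there; it has not been checked against a particular phantom-proxy release. printDom and its log/done parameters are hypothetical, and the module is passed in as an argument so the flow can be tried with a stub.

```javascript
// Sketch only: open a URL and log the serialized DOM of the loaded page.
// The phantom-proxy calls mirror the usage shown in this thread.
function printDom(phantomProxy, url, log, done) {
  phantomProxy.create({}, function (proxy) {
    const page = proxy.page;
    page.open(url, function () {
      page.evaluate(function () {
        // Runs inside phantomjs: return the page's HTML as a plain string.
        return document.documentElement.outerHTML;
      }, function (html) {
        log(html);
        // Pass end() a function; calling it without one has been reported to throw.
        proxy.end(function () {
          if (typeof done === "function") done(html);
        });
      });
    });
  });
}

// Usage (needs phantom-proxy and phantomjs installed):
// printDom(require('phantom-proxy'), 'http://www.w3.org', console.log);
```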
Is it possible to emulate the PhantomJS POST feature, as in this example: https://github.com/ariya/phantomjs/blob/master/examples/post.js
page.open(someurl, "post", postData, callback);
https://code.google.com/p/phantomjs/issues/detail?id=230
Is this possible? Would be really handy
Here is the example:
var phantomProxy = require('phantom-proxy');
phantomProxy.create({}, function (proxy) {
  proxy.page.set('paperSize', {format: 'A4', orientation: 'landscape'}, function (result) {
    proxy.page.render('./test.png', function (result) {
      proxy.end(function () {console.log('done');});
    });
  });
});
In the end I get a test.png file with the default 400x300 dimensions.
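A likely explanation, based on the PhantomJS WebPage documentation: paperSize only affects PDF output, while a PNG render is laid out with viewportSize (400x300 by default). The sketch below sets viewportSize through the same page.set call used above; renderPng is a hypothetical helper, and that phantom-proxy forwards viewportSize unchanged is an assumption.

```javascript
// Sketch: size a PNG render via viewportSize instead of paperSize.
// Uses page.set(name, value, callback), page.render(path, callback) and
// proxy.end(callback) as shown in this thread.
function renderPng(proxy, path, size, done) {
  proxy.page.set("viewportSize", { width: size.width, height: size.height }, function () {
    proxy.page.render(path, function (result) {
      // Always give end() a function; calling it without one has been reported to throw.
      proxy.end(function () {
        done(result);
      });
    });
  });
}

// Usage:
// require('phantom-proxy').create({}, function (proxy) {
//   renderPng(proxy, './test.png', { width: 1024, height: 768 }, function () {
//     console.log('done');
//   });
// });
```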
Do you plan to support opening multiple pages?
Sometimes we need it.
None of the following methods capture the debug output; I still see it only in the terminal (Ubuntu 10.04):
node lib/render.js &>./node.log
node lib/render.js 2>&1>./node.log
node lib/render.js 2>&1 | tee ./node.log
nohup node lib/render.js > ./node.log
The only way to do this is:
strace -s 9999 -ewrite -p 2840 &> node.log
(where 2840 is the phantomjs PID)
I noticed this bug during an initial install on a machine that had never had phantomjs installed, so it was not on the PATH. Although this was my mistake, there are other cases in which this could happen, and it should be reported.
There are several possible solutions, depending on how you think it best fits the current flow, including throwing an exception or returning null/false from the create function.
If the application fails, it is impossible to create a new phantom-proxy instance without manually killing the hung phantomjs process.
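One way to keep a failed run from leaving phantomjs behind is to kill the child when the parent Node process exits. This is a sketch using Node's child_process.spawn and the process 'exit' event; spawnWithCleanup is a hypothetical helper, not part of phantom-proxy, and it cannot help if the parent itself is killed with SIGKILL.

```javascript
const { spawn } = require("child_process");

// Hypothetical helper: spawn a child and register a process 'exit' handler
// that kills it, so a parent that exits (normally or via process.exit())
// does not leave the child running. The handler is removed once the child
// exits on its own. Spawn errors (e.g. ENOENT) are left to the caller, who
// should attach child.on('error', ...).
function spawnWithCleanup(command, args) {
  const child = spawn(command, args, { stdio: "ignore" });

  function cleanup() {
    // exitCode and signalCode stay null while the child is still running.
    if (child.exitCode === null && child.signalCode === null) {
      child.kill(); // sends SIGTERM by default
    }
  }

  process.on("exit", cleanup);
  child.on("exit", function () {
    process.removeListener("exit", cleanup);
  });
  return { child, cleanup };
}
```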
When using the evaluate() function, we should be able to use a callback for asynchronous code.
phantomProxy.create({}, function (proxy) {
  var page = proxy.page;
  page.open('http://www.w3.org', function () {
    page.includeJs('http://code.jquery.com/jquery-latest.js', function() {
      page.evaluate(function(callback) {
        $.get('http://google.com', function() {
          var data = 'some async data';
          callback(data);
        });
      }, function(result) {
        console.log(result); // will output 'some async data'
      });
    });
  });
});
I considered using evaluateSync(), but it doesn't have any option to return data.
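Until evaluate supports a callback, one workaround is to let the in-page code store its result on window and poll for it from Node. pollUntil is a hypothetical helper, not part of phantom-proxy; the usage in the comments assumes evaluate(fn, callback) passes the return value to the callback, and window.__asyncData is a made-up name. If results come back stringified (see the "[object Object]" report in this thread), the check should also treat the string "null" as not ready.

```javascript
// Hypothetical helper: call an asynchronous check repeatedly until it passes
// a value other than null/undefined to its callback, or until timeoutMs has
// elapsed, then report (err, value) to the final callback.
function pollUntil(check, intervalMs, timeoutMs, callback) {
  const deadline = Date.now() + timeoutMs;
  (function attempt() {
    check(function (value) {
      if (value !== null && value !== undefined) {
        callback(null, value);
      } else if (Date.now() >= deadline) {
        callback(new Error("pollUntil: timed out after " + timeoutMs + " ms"));
      } else {
        setTimeout(attempt, intervalMs);
      }
    });
  })();
}

// Possible usage with phantom-proxy (window.__asyncData is a made-up name):
// page.evaluate(function () {
//   $.get('http://google.com', function () { window.__asyncData = 'some async data'; });
// }, function () {
//   pollUntil(function (done) {
//     page.evaluate(function () { return window.__asyncData || null; }, done);
//   }, 250, 10000, function (err, data) {
//     console.log(err || data);
//   });
// });
```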
Hi,
I'm getting an error on Ubuntu:
Error: cannot access member `statusCode' of deleted QObject",[{"file":"/home/ubuntu/www/node_modules/phantom-proxy/lib/server/webpage.js","line":117,"function":""}]]
Could you help me find out why I'm getting this? The same code works fine on my Mac.
Many thanks,
A
Regarding the modifications to lib/request.js mentioned in #30 (comment):
HTTP headers are supposed to be case-insensitive; I know @mikeal of request has been adamant about this.
http://stackoverflow.com/questions/5258977/are-http-headers-case-sensitive
Sounds like a bug with the phantomjs server.
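For comparison, a case-insensitive lookup is straightforward (on the Node side, the http module already lowercases incoming header names in req.headers). getHeader is a hypothetical helper illustrating the comparison the phantomjs-side server would need; it is not taken from phantom-proxy.

```javascript
// Look up a header by name regardless of case, since HTTP header field
// names are case-insensitive. Returns undefined when the header is absent.
function getHeader(headers, name) {
  const wanted = name.toLowerCase();
  for (const key of Object.keys(headers)) {
    if (key.toLowerCase() === wanted) {
      return headers[key];
    }
  }
  return undefined;
}
```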
Using proxy.page.set with PhantomJS 1.8.0 always returns false to the callback function even if the operation succeeds. The debug output does not indicate any problems.
A source snippet is:
function(proxy, fn) {
  proxy.page.set('paperSize', {format: 'Letter', orientation: 'portrait', margin: '0.25in'}, function(result) {
    var err = result ? null : new Error('Unable to set paperSize.');
    fn(err, proxy);
  });
},
The debug output snippet is:
creating proxy to http://localhost:1061
creating phantom proxy instance with options {"debug":true,"port":1061,"host":"localhost","hostAndPort":"http://localhost:1061","eventStreamPath":"/Users/jcb/Projects/Resonate/rendering/src/main/js/node_modules/phantom-proxy/temp/events.txt","clientPort":61290}
pinged server and got back false
creating a new server, waiting for servercreated event on [object Object]
opening control page
success
connection
response: {"type":"serverCreated","args":[true],"source":"server"}
buffer :{"type":"serverCreated","args":[true],"source":"server"}
emitting serverCreated event on server
server created callback fired with value:true
calling url: http://localhost:1061/page/functions/open
Can someone post code or point me to examples using injectJS() and a cookie file? Thanks
for building, automated testing, linting. Should address some of the concerns raised in #30
Using node 0.8.9
I recently decided to leave the original phantom npm project as it's not maintained and yours appears to be the most active. In doing so I've run into an error that says
Unknown module ./server for require()
It happens when I spin up an Express web server and then the proxy to scrape the page it hosts. I can hit the Express web app while my code is running, so it doesn't appear to be Express-related (as far as I've found).
Just curious: have you had this issue or seen anything like it in the past?
Here is my basic usage of the phantom-proxy code to scrape it (note that I see the first print but not the second):
var app = require('./app.js');
// the file I require above exports like so
// exports.webserver = server;
app.webserver.listen(8091);
console.log("first");
var phantomProxy = require('phantom-proxy').create(function(proxy){
  console.log("here?");
  var page = proxy.page
    , phantom = proxy.phantom;
  page.open("http://localhost:8091", function() {
    page.waitForSelector('body', function(){
    });
  });
});
The PhantomJS colorwheel example makes use of page.content in order to set the document HTML.
Related: Looks like there are a few other properties missing from the Page object, per this document: https://github.com/ariya/phantomjs/wiki/API-Reference-WebPage#wiki-webpage-content
Could you explain what the "eventStreamPath" option does?
Please see its usage here: https://github.com/sheebz/phantom-proxy/blob/master/lib/proxy.js#L271
Seems like it is not used in phantom-proxy as it is not even passed to phantomjs.
CasperJS has a method that takes an object of key/value pairs to set. I would like to do something like this, for example to set the userAgent.
I tried adding a plain old settings object to the page:
//sets property value ex. viewportSize
_.extend(this, {
settings: {},
but those settings do not translate to the phantomjs page.
I tried something like this:
request.post(self.options.hostAndPort + '/page/settings/set', {
  form:{
    propertyName:propertyName,
    propertyValue:propertyValue
  }
},
but that just hangs forever. Can you tell me where things like this are documented:
'/page/properties/get'
Is that a phantom exposed thing? I can't find any documentation about it.
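Until a settings-object method exists, something like the CasperJS behaviour can be approximated on the Node side by calling page.set once per key, in sequence. applyProperties is hypothetical, it relies on the page.set(name, value, callback) form used in this thread, and whether nested PhantomJS settings such as settings.userAgent are reachable this way is an open question.

```javascript
// Hypothetical helper: apply each key/value pair with page.set, one after
// another, and hand the per-key results to the callback when all are done.
function applyProperties(page, props, callback) {
  const names = Object.keys(props);
  const results = {};
  (function next(i) {
    if (i >= names.length) {
      callback(results);
      return;
    }
    const name = names[i];
    page.set(name, props[name], function (result) {
      results[name] = result;
      next(i + 1);
    });
  })(0);
}

// Usage:
// applyProperties(proxy.page, { viewportSize: { width: 1024, height: 768 } }, console.log);
```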
Even after "body tag present" (and "done") is printed to stdout, it takes roughly 15 seconds for the process to end. Before I passed a function to proxy.end, I also encountered this error:
%% node server.js
body tag present
/opt/phantomHAR/node_modules/phantom-proxy/lib/proxy.js:213
endCallback(true);
^
TypeError: undefined is not a function
at Object.<anonymous> (/opt/phantomHAR/node_modules/phantom-proxy/lib/proxy.js:213:25)
at Request._callback (/opt/phantomHAR/node_modules/phantom-proxy/lib/proxy.js:250:50)
at Request.self.callback (/opt/phantomHAR/node_modules/phantom-proxy/lib/request/main.js:122:22)
at Request.EventEmitter.emit (events.js:107:17)
at Request.<anonymous> (/opt/phantomHAR/node_modules/phantom-proxy/lib/request/main.js:655:16)
at Request.EventEmitter.emit (events.js:126:20)
at IncomingMessage.<anonymous> (/opt/phantomHAR/node_modules/phantom-proxy/lib/request/main.js:617:14)
at IncomingMessage.EventEmitter.emit (events.js:126:20)
at _stream_readable.js:895:16
at process._tickCallback (node.js:339:11)
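The trace shows proxy.js calling endCallback(true) without checking that a callback was supplied. Until that is guarded in the library, a user-side wrapper can always hand end() a function. endProxy is a hypothetical name; this is a sketch, not a patch to proxy.js.

```javascript
// Hypothetical wrapper: always pass proxy.end() a function, and forward the
// result only when the caller supplied a callback of their own.
function endProxy(proxy, callback) {
  proxy.end(function (result) {
    if (typeof callback === "function") callback(result);
  });
}

// Usage: endProxy(proxy);  or  endProxy(proxy, function () { console.log('done'); });
```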
Missing the require for the assert library:
%% node server.js
...
success
/opt/phantomHAR/server.js:38
assert.equal(result, true);
^
ReferenceError: assert is not defined
at null.<anonymous> (/opt/phantomHAR/server.js:38:9)
at Request._callback (/opt/phantomHAR/node_modules/phantom-proxy/lib/webpage.js:166:62)
at Request.self.callback (/opt/phantomHAR/node_modules/phantom-proxy/lib/request/main.js:122:22)
at Request.EventEmitter.emit (events.js:107:17)
at Request.<anonymous> (/opt/phantomHAR/node_modules/phantom-proxy/lib/request/main.js:655:16)
at Request.EventEmitter.emit (events.js:126:20)
at IncomingMessage.<anonymous> (/opt/phantomHAR/node_modules/phantom-proxy/lib/request/main.js:617:14)
at IncomingMessage.EventEmitter.emit (events.js:126:20)
at _stream_readable.js:895:16
at process._tickCallback (node.js:339:11)
self should be this:
%% node server.js
...
self.proxy.page.on('navigationRequested', function (url) {
^
ReferenceError: self is not defined
at Object.<anonymous> (/opt/phantomHAR/server.js:4:5)
at /opt/phantomHAR/node_modules/phantom-proxy/lib/proxy.js:279:28
at /opt/phantomHAR/node_modules/phantom-proxy/lib/proxy.js:166:25
at Request._callback (/opt/phantomHAR/node_modules/phantom-proxy/lib/proxy.js:229:50)
at Request.self.callback (/opt/phantomHAR/node_modules/phantom-proxy/lib/request/main.js:122:22)
at Request.EventEmitter.emit (events.js:107:17)
at Request.<anonymous> (/opt/phantomHAR/node_modules/phantom-proxy/lib/request/main.js:655:16)
at Request.EventEmitter.emit (events.js:126:20)
at IncomingMessage.<anonymous> (/opt/phantomHAR/node_modules/phantom-proxy/lib/request/main.js:617:14)
at IncomingMessage.EventEmitter.emit (events.js:126:20)
Has the time come? I am currently heavily invested in the proxy, and have made many critical changes, and right now am more active in the project.
I'm not sure what your current involvement is, but I would like to see faster bug fixes if possible, and transferring ownership would help with that.
I would like every method call to return a Q promise so I can use .then functions instead of nesting callbacks.
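A minimal sketch of the idea, using the native Promise constructor in place of Q. Because the callbacks in this thread receive a plain result rather than an (err, result) pair, Node's util.promisify does not apply directly. promisifyResult is hypothetical and assumes the callback is the method's last argument, which rules out evaluate(fn, callback, args...).

```javascript
// Hypothetical wrapper: turn a method whose callback receives only a result
// into a function returning a Promise that resolves with that result.
// A synchronous throw inside the method rejects the promise.
function promisifyResult(obj, methodName) {
  return function (...args) {
    return new Promise(function (resolve) {
      obj[methodName](...args, resolve);
    });
  };
}

// Usage sketch:
// const set = promisifyResult(proxy.page, 'set');
// const render = promisifyResult(proxy.page, 'render');
// set('viewportSize', { width: 1024, height: 768 })
//   .then(function () { return render('./test.png'); })
//   .then(function (ok) { console.log('rendered:', ok); });
```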
Whenever I call page.evaluate with either of these two ways of defining the regex:
var patt1=new RegExp("\r\n|\r|\n");
var splitted = h1.innerText.split(patt1);
or
var splitted = h1.innerText.split(/\r\n|\r|\n/);
I get this error:
events.js:73
throw new Error("Uncaught, unspecified 'error' event.");
^
Error: Uncaught, unspecified 'error' event.
at EventEmitter.emit (events.js:73:15)
at parseResponse (D:\Github\psp.cz-scraper\node_modules\phantom-proxy\lib\proxy.js:124:50)
at Socket.self.proxy.page (D:\Github\psp.cz-scraper\node_modules\phantom-proxy\lib\proxy.js:203:21)
at Socket.EventEmitter.emit [as $emit]
at SocketNamespace.handlePacket (D:\Github\psp.cz-scraper\node_modules\phantom-proxy\node_modules\socket.io\lib\namespace.js:335:22)
at Manager.onClientMessage (D:\Github\psp.cz-scraper\node_modules\phantom-proxy\node_modules\socket.io\lib\manager.js:488:38)
at WebSocket.Transport.onMessage (D:\Github\psp.cz-scraper\node_modules\phantom-proxy\node_modules\socket.io\lib\transport.js:387:20)
at Parser.<anonymous> (D:\Github\psp.cz-scraper\node_modules\phantom-proxy\node_modules\socket.io\lib\transports\websocket\default.js:36:10)
at Parser.EventEmitter.emit (events.js:96:17)
at Parser.parse (D:\Github\psp.cz-scraper\node_modules\phantom-proxy\node_modules\socket.io\lib\transports\websocket\default.js:343:12)
Process finished with exit code 1
I run on phantom-proxy 0.1.792, node.js 0.8.20 and win8 64-bit. Is this issue solvable? If not, it would be nice to mention it in the docs for page.evaluate, because it can really take some time to figure out that the regex is the issue. Thanks in advance.
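A hedged workaround, resting on the unverified guess that the backslash escapes in the regex are mangled when the function source is shipped to phantomjs: build CR and LF with String.fromCharCode so the evaluated function contains no backslashes. splitLines is a hypothetical helper; inside page.evaluate its body has to be inlined, because the function is serialised on its own and cannot see outer helpers.

```javascript
// Same result as text.split(/\r\n|\r|\n/), but the function body uses no
// backslash escapes: normalise CRLF, then lone CR, to LF and split on LF.
function splitLines(text) {
  const CR = String.fromCharCode(13);
  const LF = String.fromCharCode(10);
  return text.split(CR + LF).join(LF).split(CR).join(LF).split(LF);
}

// Inlined into page.evaluate (the h1 lookup is for illustration; the result
// is stringified because non-primitive results have been reported to come
// back as "[object Object]"):
// page.evaluate(function () {
//   var CR = String.fromCharCode(13), LF = String.fromCharCode(10);
//   var h1 = document.querySelector('h1');
//   var lines = h1.innerText.split(CR + LF).join(LF).split(CR).join(LF).split(LF);
//   return JSON.stringify(lines);
// }, function (result) { console.log(result); });
```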
So I created a fresh node project with phantom-proxy, followed one of the examples in your README.md file, and this is the error I was met with:
'undefined' is not a function (evaluating 'require('./webpage').create(this)')
/node_modules/phantom-proxy/lib/server/webserver.js:11
/node_modules/phantom-proxy/lib/server/webserver.js:164
/node_modules/phantom-proxy/lib/server/webserver.js:165
I am currently running Ubuntu 12.04, with node version v0.9.2-pre and phantomjs version 1.7.0.
Any help would be appreciated.
You have provided a nice example of the world.js part of a cucumber feature.
It would be most helpful if someone provided a full running example of a cucumber test using phantom-proxy.
Thanks
/Jacob
lib/proxy.js
this.server = http.createServer(function (request, response) {
response.writeHead(200, {"Content-Type":"text/html"});
response.end('<html><head><script src="/socket.io/socket.io.js" type="text/javascript"></script><script type="text/javascript">\n\
window.onload=function(){\n\
var socket = new io.connect("http://" + window.location.hostname);\n\
socket.on("cmd", function(msg){\n\
alert(msg);\n\
});\n\
window.socket = socket;\n\
};\n\
</script></head><body></body></html>');
}).listen();
I don't know whether this code gets executed, but the README says this project doesn't rely on alerts for communication. Is this some weird legacy code, or is the README a lie?
I may very well be doing something wrong, but I can't find what's wrong with my code, and I keep getting this error:
server running already
node.js:201
throw e; // process.nextTick error, or 'error' event on first tick
^
TypeError: Object # has no method 'unref'
at /home/ubuntu/node_modules/phantom-proxy/lib/proxy.js:105:28
at /home/ubuntu/node_modules/phantom-proxy/lib/proxy.js:152:21
at Request._callback (/home/ubuntu/node_modules/phantom-proxy/lib/proxy.js:136:36)
at /home/ubuntu/node_modules/phantom-proxy/lib/request/main.js:122:22
at Request.emit (events.js:67:17)
at ClientRequest.<anonymous> (/home/ubuntu/node_modules/phantom-proxy/lib/request/main.js:224:10)
at ClientRequest.emit (events.js:67:17)
at Socket.<anonymous> (http.js:1210:13)
at Socket.emit (events.js:88:20)
at Array.0 (net.js:320:10)
Anything before this line gets printed:
phantomProxy.create({'loadImages':'no'}, function(proxy) {
but nothing after...
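The "has no method 'unref'" error suggests the Node release in use (the node.js:201 frame points to an older one) does not provide unref() on that handle. A sketch of the feature detection proxy.js could use instead of calling unref() unconditionally; unrefIfSupported is a hypothetical name, and which handle proxy.js:105 refers to is not visible here.

```javascript
// Hypothetical guard: call unref() only when the handle provides it, so the
// same code runs on Node releases where the method does not exist.
// Returns true if unref() was called.
function unrefIfSupported(handle) {
  if (handle && typeof handle.unref === "function") {
    handle.unref();
    return true;
  }
  return false;
}
```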
I noticed that I can't create several instances of phantom-proxy. The connection to previously created instances breaks every time I call phantomProxy.create, because it overwrites the context. How can I solve this problem? Thanks.
evaluate(functionToEvaluate, callbackFn, [arg1, arg2,... argN])
I think this would be better written as:
evaluate(functionToEvaluate, [arg1, arg2,... argN], callbackFn)
It's more node-ish, and it's more phantom-ish.
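The proposed order can be approximated today with a small adapter. evaluateCallbackLast is hypothetical and assumes the existing evaluate takes its extra arguments as trailing positional parameters rather than a single array; the bracket notation above leaves that ambiguous.

```javascript
// Hypothetical adapter: accept evaluate(fn, arg1, ..., argN, callback) and
// forward to a method using the current order evaluate(fn, callback, arg1, ..., argN).
function evaluateCallbackLast(page, fn, ...rest) {
  const callback = rest.pop();
  if (typeof callback !== "function") {
    throw new TypeError("evaluateCallbackLast: the last argument must be a callback");
  }
  return page.evaluate(fn, callback, ...rest);
}

// Usage:
// evaluateCallbackLast(proxy.page, function (a, b) { return a + b; }, 2, 3, console.log);
```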
It seems like what this project is about (driving phantomjs over network calls) is exactly what was built into version 1.8 of phantomjs.
Don't you think it would be better to write only the nodejs part of the driver and use the WebDriver code already in phantomjs? Code you don't write is the best code!
It seems there are implementations available from Selenium in Ruby, Java, PHP, Python and C#.
There is even a node implementation of the WebDriver client: http://code.google.com/p/selenium/wiki/WebDriverJs
I'm switching to Ruby to use/drive phantomjs. But good luck with your project!
Some links to the project..
http://phantomjs.org/release-1.8.html
https://github.com/detro/ghostdriver
http://code.google.com/p/phantomjs/issues/detail?id=49
According to the phantomjs docs:
Note: The arguments and the return value to the evaluate function must be a simple primitive object. The rule of thumb: if it can be serialized via JSON, then it is fine. Closures, functions, DOM nodes, etc. will not work!
Right now, if you return an object from an evaluate function, you get back the string "[object Object]", which is undesirable. JSON results should be deserialized if possible. I'm deep in the bowels of main.js and hope to have a PR for this soon.
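Until results are deserialized automatically, one workaround is to JSON.stringify inside the page and parse on the Node side, falling back to the raw value when it is not valid JSON. parseEvaluateResult is a hypothetical name and is not the planned PR.

```javascript
// Hypothetical helper: deserialize a JSON string returned from evaluate,
// leaving non-strings and non-JSON strings untouched.
function parseEvaluateResult(raw) {
  if (typeof raw !== "string") return raw;
  try {
    return JSON.parse(raw);
  } catch (e) {
    return raw;
  }
}

// In the page, serialize explicitly so nothing is lost in transit:
// page.evaluate(function () {
//   return JSON.stringify({ title: document.title });
// }, function (raw) {
//   console.log(parseEvaluateResult(raw).title);
// });
```

One caveat of the fallback: a string result that happens to be valid JSON, such as "42" or "true", comes back as a number or boolean.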
Just installed phantom-proxy from npm. Ran the first example of the readme.md ( the one with waitForSelector ) and got the following result
ReferenceError: Can't find variable: process
/home/aashish/node_modules/phantom-proxy/lib/proxy.js:285
/home/aashish/node_modules/phantom-proxy/lib/proxy.js:289
TypeError: 'undefined' is not a function (evaluating 'phantomProxy.create')
phantomtest.js:12
Running just the first statement, "phantomProxy = require('phantom-proxy');", also returns the above-mentioned ReferenceError.
Node v0.10.5
PhantomJS 1.9.2
Thanks for writing this nice module, but I can't get it running.
Is it possible to use this library to create a pool of long running phantomjs processes that runs tasks off a queue of sites to scrape?
I would like to have a pool of 10 processes and potentially use a Redis queue to hold tasks. The phantomjs processes would then work through the tasks one by one and either push each scrape onto another queue or send it back to the initiating process.
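The queue-and-pool part can be sketched independently of phantom-proxy. In this minimal version an in-memory array replaces the Redis queue, and runTask is a hypothetical function that would drive one long-lived phantomjs instance (for example via page.open and page.evaluate) and call back with the scrape.

```javascript
// Minimal sketch of a fixed-size worker pool draining a task queue.
// Each worker id stands in for one long-running phantomjs instance.
function runPool(tasks, poolSize, runTask, onAllDone) {
  const queue = tasks.slice();
  const results = [];
  const workerCount = Math.min(poolSize, queue.length);
  let idleWorkers = 0;

  if (workerCount === 0) {
    onAllDone(results);
    return;
  }

  function next(workerId) {
    if (queue.length === 0) {
      // Nothing left for this worker; finish once every worker is idle.
      idleWorkers += 1;
      if (idleWorkers === workerCount) onAllDone(results);
      return;
    }
    const task = queue.shift();
    runTask(workerId, task, function (result) {
      results.push({ task: task, result: result });
      next(workerId);
    });
  }

  for (let id = 0; id < workerCount; id++) next(id);
}
```

Results arrive in completion order, not submission order, and a real pool would also need to restart a worker whose phantomjs process dies.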
Tests:
describe('phantom-proxy', function() {
  it('should load Google', function(done) {
    phantomProxy.create({ loadImages: 'no' }, function(proxy) {
      var page = proxy.page;
      page.open('https://www.google.com', function(success) {
        if (success) {
          proxy.end();
          done();
        } else {
          proxy.end();
          expect().fail('Failed to load Google');
          done();
        }
      });
    });
  });
  it('should load Github', function(done) {
    phantomProxy.create({ loadImages: 'no' }, function(proxy) {
      var page = proxy.page;
      page.open('https://www.github.com', function(success) {
        if (success) {
          proxy.end();
          done();
        } else {
          proxy.end();
          expect().fail('Failed to load Github');
          done();
        }
      });
    });
  });
});
Output:
phantom-proxy
^ should load Google (11624ms)
o should load Github: Fatal error: undefined is not a function
My assumption is this has to do with circular dependencies.
I'm trying to render one of my site's pages to a PDF file. I'm using Express on the server side:
var phantomProxy = require('phantom-proxy');
app.get('/print', function (req, res, next) {
  phantomProxy.create({}, function (proxy) {
    var page = proxy.page;
    page.open('http://127.0.0.1:3128/testpageforprint', function (result) {
      setTimeout(function () {
        proxy.page.render('testpageforprint.pdf', function (result) {
          proxy.end(function () {
            res.json(200);
          });
        });
      }, 1000);
    });
  });
});
Unfortunately, the PDF is only partially created: not all of the content is there, and I also get the following error message:
Error: cannot call function of deleted QObject; url::/modules/webpage.js; line:377
Can you help me figure out what my problem is?
P.S.: The page I want to convert to PDF has many CSS and JS files (in case that's relevant to the issue).
"Error: spawn ENOENT" is thrown on every run.
Why is this commented out?
https://github.com/sheebz/phantom-proxy/blob/master/lib/proxy.js#L81
I believe this is causing zombie phantomjs processes to pile up and requires a killall
I have version 0.3.0 (the latest as of 30 September 2016).
var wp = require("webpage");
var page = wp.create();
^
TypeError: wp.create is not a function
Hi, guys. It seems that right now we can't create two or more proxy instances. When I call require('phantom-proxy').create(fn) several times, the module spawns two processes with different PhantomJS options (as in my case), which works fine. But when I call proxy.page.open() for each proxy, I get various errors. It seems like the proxies share something and are not fully separated.
Also, I use different options.port and options.clientPort values for each.
If I remember right, in some issue @sheebz said that the proxy is a singleton, or maybe I did not understand his point properly. So what is the situation?
I'm loading some JavaScript in the page that phantom-proxy opens, and I'm not sure how to "waitFor" all of the JS to be loaded:
var phantomProxy = require('phantom-proxy').create({}, function(proxy) {
  var page = proxy.page,
    phantom = proxy.phantom;
  page.open("http://localhost:8091", function() {
    page.waitForSelector('body', function() {
      //do stuff with jQuery here, but it's not available yet?
    });
  });
});
var phantomProxy = require('phantom-proxy').create({}, function(proxy) {
  var page = proxy.page,
    phantom = proxy.phantom;
  page.open('http://www.google.com', function() {
    setTimeout(function() {
      proxy.page.sendEvent({event:'keypress', keys:'a'}, function(result) {
        console.log(result);
        proxy.page.render('look.png', function(result) {
          console.log(result);
          phantomProxy.end();
        });
      });
    }, 5000);
  });
});
Even with my branch, none of the events do anything. Ugh. Since Google auto-focuses the input box, typing in it should make 'a' show up there. It doesn't. Click events don't work either. Furthermore, this means none of the event tests matter, because they pass erroneously. I will look at it more in the morning, but this is some new craziness. I would seriously consider close sourcing this project, because right now it is not ready for prime time.
https://github.com/sheebz/phantom-proxy#render-a-screenshot
This gives me the following error:
Unsafe JavaScript attempt to access frame with URL about:blank from frame with URL file:///home/main/node_modules/phantom-proxy/lib/server/index.js. Domains, protocols and ports must match.
even with these options
{
  'ignoreSslErrors':true,
  'localToRemoteUrlAccessEnabled':true,
  'cookiesFile':'cookies.txt',
  'diskCache':'yes',
  'loadImages':'yes',
  'localToRemoteUrlAccess':true,
  'maxDiskCache':'50000',
  'outputEncoding':'utf8',
  'proxy':'0',
  'proxyType':'yes',
  'scriptEncoding':'yes',
  'webSecurity':false,
  'debug':true,
  'port':1061
}
While looking into issue #53, I would also suggest making the location of phantomjs available as an input option, since in many hosting setups the binary is not on the PATH but installed in an alternative location.
If the option is not set, it should fall back to the currently hard-coded phantomjs.
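A sketch of the suggested option, combined with reporting a missing binary through the callback, using Node's child_process.spawn. The option name phantomPath is an assumption (phantom-proxy may name it differently or not support it), startPhantom is hypothetical, and the 'spawn' event it waits for requires Node 15.1 or later.

```javascript
const { spawn } = require("child_process");

// Hypothetical sketch: take the binary location from options.phantomPath,
// fall back to "phantomjs" on the PATH, and pass spawn failures such as
// ENOENT to the callback instead of losing them.
function startPhantom(options, args, callback) {
  const bin = (options && options.phantomPath) || "phantomjs";
  const child = spawn(bin, args, { stdio: "ignore" });
  let reported = false;

  child.on("error", function (err) {
    if (reported) return; // only the first outcome is reported
    reported = true;
    if (err.code === "ENOENT") {
      err.message = "phantomjs not found at '" + bin + "' (" + err.message + ")";
    }
    callback(err);
  });

  child.once("spawn", function () {
    if (reported) return;
    reported = true;
    callback(null, child);
  });

  return child;
}
```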
I see it commented out in examples/cucumber/feature/support/world.js, and it doesn't exist in lib/server/webpage.js?
It's documented as an event but it doesn't seem to ever fire.
Just installed phantom-proxy from npm. Ran the first example of the readme.md ( the one with waitForSelector ) and got the following result :
body tag present
/node_modules/phantom-proxy/lib/proxy.js:213
endCallback(true);
^
TypeError: undefined is not a function
at Object.self.proxy.end (/node_modules/phantom-proxy/lib/proxy.js:213:25)
at Request.module.exports.end [as _callback] (/node_modules/phantom-proxy/lib/proxy.js:250:50)
at Request.init.self.callback (/node_modules/phantom-proxy/lib/request/main.js:122:22)
at Request.EventEmitter.emit (events.js:99:17)
at Request.<anonymous> (/node_modules/phantom-proxy/lib/request/main.js:655:16)
at Request.EventEmitter.emit (events.js:126:20)
at IncomingMessage.Request.start.self.req.self.httpModule.request.buffer (/node_modules/phantom-proxy/lib/request/main.js:617:14)
at IncomingMessage.EventEmitter.emit (events.js:126:20)
at IncomingMessage._emitEnd (http.js:366:10)
at HTTPParser.parserOnMessageComplete [as onMessageComplete] (http.js:149:23)
Debian Squeeze
Node v0.8.17
PhantomJS 1.8.1
... I was finishing this post when I tried the other waitForSelector example in the README, which worked fine. I may be wrong, but in the first example end() is called on "proxy", and I made it work by changing that to "phantomProxy", just like in the second example. So it looks like a typo in the first one.
Anyway, thanks for the great module!