
phpquery's People

Contributors

tobiaszcudnik

phpquery's Issues

privacy-breach-w3c-valid-html, privacy-breach-generic

lintian on Debian found the following problems:

privacy-breach-w3c-valid-html test-cases/document-types/document-fragment-utf8.xhtml (http://www.w3.org/icons/valid-xhtml10)
N:
N:    This package creates a potential privacy breach by fetching W3C
N:    validation icons.
N:
N:    These badges may be displayed to tell readers that care has been taken
N:    to make a page compliant with W3C standards. Unfortunately, downloading
N:    the image from www.w3.org might expose the reader's IP address to
N:    potential tracking.
N:
N:    Note that these icons are non-free and must not be copied into the
N:    package. You could safely delete this W3C validation badge.
N:
N:    Refer to http://validator.w3.org/docs/help.html#icon and
N:    http://www.w3.org/Consortium/Legal/logo-usage-20000308 for details.
N:
N:    Severity: serious, Certainty: possible
privacy-breach-generic test-cases/document-types/document-fragment-utf8.xhtml (http://www.w3.org/tr/xhtml1/xhtml1.pdf) 
N:  
N:    This package creates a potential privacy breach by fetching data from an 
N:    external website at runtime. Please remove these scripts or external 
N:    HTML resources.
N:
N:    Please replace any scripts, images, or other remote resources with
N:    non-remote resources. It is preferable to replace them with text and
N:    links but local copies of the remote resources are also acceptable as
N:    long as they don't also make calls to remote services. Please ensure
N:    that the remote resources are suitable for Debian main before making
N:    local copies of them.
N:
N:    Severity: important, Certainty: wild-guess

Other affected files are

privacy-breach-generic test-cases/document-types/document-iso88592-nocharset.xhtml (http://www.w3.org/tr/xhtml1/xhtml1.pdf)
privacy-breach-w3c-valid-html test-cases/document-types/document-iso88592-nocharset.xhtml (http://www.w3.org/icons/valid-xhtml10)
privacy-breach-generic test-cases/document-types/document-iso88592.xhtml (http://www.w3.org/tr/xhtml1/xhtml1.pdf)
privacy-breach-w3c-valid-html test-cases/document-types/document-iso88592.xhtml (http://www.w3.org/icons/valid-xhtml10)
privacy-breach-generic test-cases/document-types/document-utf8-nocharset.xhtml (http://www.w3.org/tr/xhtml1/xhtml1.pdf)
privacy-breach-w3c-valid-html test-cases/document-types/document-utf8-nocharset.xhtml (http://www.w3.org/icons/valid-xhtml10)
privacy-breach-generic test-cases/document-types/document-utf8.xhtml (http://www.w3.org/tr/xhtml1/xhtml1.pdf)
privacy-breach-w3c-valid-html test-cases/document-types/document-utf8.xhtml (http://www.w3.org/icons/valid-xhtml10)

Separated find queries for request 'h1,h2,h3'

In the original jQuery, $('h1,h2,h3') returns a collection of elements sorted by their position in the document.
In phpQuery, ['h1,h2,h3'] launches separate find queries for h1, h2 and h3, so the resulting collection is grouped by tag name, in the order the tags appear in the selector.
As a result, it is not possible to quickly build a document's table of contents (heading hierarchy).
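
One possible workaround, sketched here with PHP's bundled DOM extension rather than phpQuery itself, is to select all heading levels with a single XPath union, which libxml returns in document order. The function name headingsInDocumentOrder and the sample markup are made up for illustration; this is not phpQuery's API.

```php
<?php
// Sketch only: uses DOMDocument/DOMXPath directly, not phpQuery.
function headingsInDocumentOrder(string $html): array
{
    $dom = new DOMDocument();
    // Collect libxml warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $headings = [];
    // A union expression produces one node set, iterated in document order.
    foreach ($xpath->query('//h1 | //h2 | //h3') as $node) {
        $headings[] = [$node->nodeName, trim($node->textContent)];
    }
    return $headings;
}

// Hypothetical sample document with interleaved heading levels.
$toc = headingsInDocumentOrder(
    '<html><body><h1>A</h1><h2>A.1</h2><h1>B</h1><h3>B.1.1</h3><h2>B.2</h2></body></html>'
);
foreach ($toc as [$tag, $text]) {
    echo $tag, ': ', $text, "\n";
}
```

The resulting list keeps h1, h2 and h3 interleaved as they occur in the markup, which is the order a table of contents needs.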

XML namespaces

The query namespace|* selects all elements, not only those from the specified namespace.

not working: ->find() method with direct child selector

On an item node like this one,

<item>
    <artist>
        <name>Niagara</name>
    </artist>
    <name>Pendant que les champs brûlent</name>
</item>

$node->find('> name');

returns these nodes

<name>Niagara</name>

<name>Pendant que les champs brûlent</name>

while it should only return this one

<name>Pendant que les champs brûlent</name>

Is that a bug?

I could use

$node->children('name');

But I want to be able to select more than just direct descendants.

How can I do that?
Thanks.
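
For comparison, here is how the two cases can be told apart with plain DOMXPath, independently of phpQuery. This is only a sketch of the expected semantics: './name' is relative to the context node and matches direct children, while './/name' matches at any depth.

```php
<?php
$xml = <<<XML
<item>
    <artist>
        <name>Niagara</name>
    </artist>
    <name>Pendant que les champs brûlent</name>
</item>
XML;

$dom = new DOMDocument();
$dom->loadXML($xml);
$xpath = new DOMXPath($dom);
$item = $dom->documentElement; // the <item> node used as query context

// "./name" matches only <name> elements that are direct children of $item.
$direct = [];
foreach ($xpath->query('./name', $item) as $node) {
    $direct[] = $node->textContent;
}

// ".//name" matches <name> elements at any depth below $item.
$all = [];
foreach ($xpath->query('.//name', $item) as $node) {
    $all[] = $node->textContent;
}

print_r($direct);
print_r($all);
```

The first query is the behaviour the report expects from find('> name'); the second corresponds to find('name').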

Mangled javascript if contains closing tags in strings

What steps will reproduce the problem?

  1. Process HTML that contains <script> tag with HTML in strings. Ex:
<div>
<script>
  var html = [
    '<div>',
    '<select>',
    '</select>',
    '</div>',
  ];
</script>
</div>

What is the expected output?
I expect the JS code within <script> tag not to be changed.

What do you see instead?
Some closing tags, like </select> or </option>, are removed entirely, and some, like </div>, are changed so that they close open tags outside of <script>.

What version of the product are you using? On what operating system?
phpQuery 0.9.5
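
A commonly suggested workaround for this class of problem is to escape "</" as "<\/" inside script blocks before handing the markup to the parser. The sketch below assumes the script content is JavaScript, where "<\/" and "</" are equivalent inside string literals; it would alter non-JavaScript content such as templates. The helper name escapeClosingTagsInScripts is hypothetical, and whether this fully avoids the mangling depends on the underlying parser.

```php
<?php
// Hypothetical pre-processing helper, not part of phpQuery.
function escapeClosingTagsInScripts(string $html): string
{
    return preg_replace_callback(
        // Capture the opening tag, the script body, and the closing tag.
        '~(<script\b[^>]*>)(.*?)(</script>)~is',
        function (array $m): string {
            // Only the body is touched; the real </script> stays intact.
            return $m[1] . str_replace('</', '<\/', $m[2]) . $m[3];
        },
        $html
    );
}

$input = "<div><script>var html = ['<div>', '</div>'];</script></div>";
$escaped = escapeClosingTagsInScripts($input);
echo $escaped, "\n";
```

The regular expression is deliberately simple and would miss edge cases such as a literal "</script>" inside a string.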

In all github forks: Couldn't fetch DOMElement

Parsing works fine for some fetched sites, but for others it fails with this error:
Couldn't fetch DOMElement. Node no longer exists in .../phpQuery/phpQuery.php on line 148
It's the same with all forks I've found on github.
Any ideas?

UTF-8 issue when trying to create a DOM document

I fetched a page with cURL; its charset is windows-1250 and its doctype is

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

I change the encoding of my string, check it, and replace the meta charset in the string:

$html = str_replace('windows-1250', 'UTF-8', mb_convert_encoding($result, 'UTF-8'));
var_dump(mb_detect_encoding($html, "UTF-8, ASCII, ISO-8859-1, windows-1250"));
$Doc = \phpQuery::newDocumentHTML($html, 'UTF-8');
echo pq($Doc)->html();

All the UTF-8 characters come out garbled. var_dump says it is UTF-8, and the content type is "text/plain; charset=UTF-8".

When I var_dump($Doc); I see that the DOMDocument encoding and xmlEncoding values are null.

But if I use:

$Dom = new \DOMDocument();
$Dom->loadHTML($html);

and var_dump it, then everything is fine and the characters are OK.

I've checked createDocumentWrapper, and the $contentType is OK.

If I set the static $debug to true, I get this:

string 'Load markup for content type text/html;charset=utf-8' (length=52)

string 'Loading HTML, content type 'text/html;charset=utf-8'' (length=52)

string 'Full markup load (HTML):

' (length=275)

string 'DOC: UTF-8 REQ: UTF-8' (length=21)

string 'Full markup load (HTML), documentCreate('utf-8')' (length=48)

string 'Selecting document '52280a0c077ec7c5fb2f2350db12f22c' as default one' (length=68)
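
One thing that may be worth ruling out: mb_convert_encoding($result, 'UTF-8') is called without a source encoding, so it converts from mbstring's internal encoding rather than from windows-1250. The sketch below names the source charset explicitly with iconv() and then loads the result with plain DOMDocument. The helper toUtf8Html and the sample bytes are made up for illustration, and this is only a guess at the cause, not a confirmed diagnosis of the phpQuery behaviour.

```php
<?php
// Hypothetical helper: convert raw bytes from a known charset to UTF-8
// and keep the charset declared in the markup consistent with them.
function toUtf8Html(string $raw, string $sourceCharset): string
{
    $utf8 = iconv($sourceCharset, 'UTF-8', $raw);
    if ($utf8 === false) {
        throw new RuntimeException("Cannot convert from $sourceCharset");
    }
    return str_ireplace($sourceCharset, 'UTF-8', $utf8);
}

// Sample page: the paragraph holds "łódź" encoded as windows-1250 bytes.
$raw = '<html><head><meta http-equiv="Content-Type" '
     . 'content="text/html; charset=windows-1250"></head>'
     . "<body><p>\xB3\xF3d\x9F</p></body></html>";

$html = toUtf8Html($raw, 'windows-1250');

$dom = new DOMDocument();
$dom->loadHTML($html);
$text = $dom->getElementsByTagName('p')->item(0)->textContent;
echo $text, "\n";
```

The same $html string could then be passed to \phpQuery::newDocumentHTML($html, 'UTF-8') to check whether the output is still garbled.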

pq() returns elements in a random order

I tried to use it to parse a page like this:

phpQuery::newDocumentFile($search_url,"text/html;charset=utf8");
$array = pq(".results .wx-rb");
foreach ($array as $item) {
    echo pq($item)->find("a")->text();
}

but the output comes out in a random order, not the order in which the elements originally appear. How can I fix that?

Joint repo @ phpquery/phpquery - gathering all edits & forks..

Dear Tobiasz,

dunno if you're actively pursuing phpquery anymore - likely not - however i'd like to inform you about a phpquery repo i put together with the aim of gathering all the forks and edits in one coherent and complete repo as a single source for further development.

Current state: I pulled the googlecode SVN - thus preserving your original commit history - added some of the forks as branches, applied some tags and so on..

https://github.com/phpquery/phpquery

If you wish, i can hand over the ownership of that repo to you.. or alternatively to anyone who is willing to act as a future maintainer.

Feel free to contact me at lib..
cheers,
Jan

Parsing HTML5 data ..

Hi! I first downloaded the original phpQuery code from the Google Code site, and I love the project ;)

I had problems using PQ with some HTML5 markup.
In particular, when I try to work with special scripts (e.g. Handlebars templates) inside a script tag, phpQuery breaks the code, because it treats the text inside as if it were normal HTML.

For example, consider "appending" this simple code:

$doc->append("<script> document.write('<div>Hello!</div>'); </script>");

PQ transforms it like this:

<script> document.write('<div>Hello!'); </script>

I had to use the Masterminds html5-php parser like this:

$HTML5 = new HTML5(['disable_html_ns'=>true]) ;
$doc->append($HTML5->loadHTMLFragment("<script> document.write('<div>Hello!</div>'); </script>"));

So, I'd like to know if it is possible to extend the phpQuery HTML parser to use a (better) HTML5 parser, such as the Masterminds one, or any other.
I could try to do it myself, but I need someone to point me to where I should add the code.

Hope this is useful for someone else. TY!

Error: Unable to create XML parser while installing phpquery

[root@default /]# pear channel-discover phpquery-pear.appspot.com
Error: Unable to create XML parser
Discovering channel phpquery-pear.appspot.com over http:// failed with message: channel-add: invalid channel.xml file
Trying to discover channel phpquery-pear.appspot.com over https:// instead
Error: Unable to create XML parser
Discovery of channel "phpquery-pear.appspot.com" failed (channel-add: invalid channel.xml file)
<?php
var_dump(xml_parser_create());
?>

outputs

resource(2, xml)

Latest Amazon Linux

cat /etc/issue
Amazon Linux AMI release 2014.03

Out of memory while fetching too many pages

Following the demo, I tried invoking phpQuery::newDocumentFile($htmlFile) many times, and PHP ran out of memory. I had to manually unset phpQuery::$documents on every loop iteration.
