Comments (13)

Azq2 commented on June 5, 2024

@lexborisov

myhtml_init(myhtml, MyHTML_OPTIONS_DEFAULT, 1, 0); is the same as myhtml_init(myhtml, MyHTML_OPTIONS_DEFAULT, 2, 0); because:

        default:
            // default MyHTML_OPTIONS_PARSE_MODE_SEPARATELY
            if(thread_count < 2)
                thread_count = 2;

from perl-html5-dom.

zdm commented on June 5, 2024

What is the point of using more threads if performance degrades?

lexborisov commented on June 5, 2024

Hi @zdm

How many cores does your processor have (not counting hardware threads)?
The number of threads must be equal to the number of cores + 2. If you have 4 cores, set { threads => 2, }.

zdm commented on June 5, 2024

I have 4 cores. Performance is degraded in any case.

Benchmark: running t0, t2, t4 for at least 3 CPU seconds...
        t0:  5 wallclock secs ( 3.94 usr +  0.61 sys =  4.55 CPU) @ 357.38/s (n=1625)
        t2:  1 wallclock secs ( 2.69 usr +  0.39 sys =  3.08 CPU) @ 219.55/s (n=676)
        t4:  1 wallclock secs ( 2.52 usr +  0.67 sys =  3.19 CPU) @ 106.34/s (n=339)
    Rate   t4   t2   t0
t4 106/s   -- -52% -70%
t2 220/s 106%   -- -39%
t0 357/s 236%  63%   --

lexborisov commented on June 5, 2024

@zdm
Please post your processor model, OS, and test files. (Spectre and Meltdown patches?)

zdm commented on June 5, 2024

zdm commented on June 5, 2024

Could you run benchmarks on your side and post results here?

lexborisov commented on June 5, 2024

@zdm
I get the same result. I will look at the Perl implementation.
I tested the Modest code, and it is 2+ times faster in threaded mode than in single mode.

zdm commented on June 5, 2024

Also, please note that async mode is not working; maybe these issues are related.
By the way, it would be nice if it could call a callback when parsing is done, instead of requiring a $tree->wait call.

Azq2 commented on June 5, 2024

I see the same performance degradation with the original Modest C source.

$ bin/myhtml/print_tree_high_level ~/index.html
1 threads: 635.027258
2 threads: 674.421497
3 threads: 663.557751

Friends tested on AMD Ryzen and Intel i7/Xeon with the same result: single-threaded mode is faster.

"One click" test:

git clone https://github.com/lexborisov/Modest
cd Modest
curl https://html.spec.whatwg.org/ > ~/index.html
curl https://dumpz.org/aNdawx3GKn3Q/text/ > ./examples/myhtml/print_tree_high_level.c
make
bin/myhtml/print_tree_high_level ~/index.html

Test code:

#include <stdio.h>
#include <stdlib.h>
#include <myhtml/api.h>
#include <sys/time.h>

struct res_html {
	char	*html;
	size_t	size;
};

struct res_html load_html_file(const char* filename)
{
	FILE *fh = fopen(filename, "rb");
	if(fh == NULL) {
		fprintf(stderr, "Can't open html file: %s\n", filename);
		exit(EXIT_FAILURE);
	}
	
	if(fseek(fh, 0L, SEEK_END) != 0) {
		fprintf(stderr, "Can't set position (fseek) in file: %s\n", filename);
		exit(EXIT_FAILURE);
	}
	
	long size = ftell(fh);
	
	if(fseek(fh, 0L, SEEK_SET) != 0) {
		fprintf(stderr, "Can't set position (fseek) in file: %s\n", filename);
		exit(EXIT_FAILURE);
	}
	
	if(size <= 0) {
		fprintf(stderr, "Can't get file size or file is empty: %s\n", filename);
		exit(EXIT_FAILURE);
	}
	
	char *html = (char*)malloc(size + 1);
	if(html == NULL) {
		fprintf(stderr, "Can't allocate mem for html file: %s\n", filename);
		exit(EXIT_FAILURE);
	}
	
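	/* fread returns size_t, so compare against size cast to size_t and print nread with %zu */
	size_t nread = fread(html, 1, (size_t)size, fh);
	if (nread != (size_t)size) {
		fprintf(stderr, "could not read %ld bytes (%zu bytes done)\n", size, nread);
		exit(EXIT_FAILURE);
	}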
	
	fclose(fh);
	
	struct res_html res = {html, (size_t)size};
	return res;
}

double current_timestamp()
{
	struct timeval t;
	gettimeofday(&t, NULL);
	return (double) t.tv_sec * 1000.0 + (double) t.tv_usec / 1000.0;
}

int main(int argc, const char * argv[])
{
	const char* path;
	
	if (argc == 2) {
		path = argv[1];
	}
	else {
		printf("Bad ARGV!\nUse: print_tree_high_level <path_to_html_file>\n");
		exit(EXIT_FAILURE);
	}
	
	int tries = 20;
	double start, elapsed;
	
	struct res_html res = load_html_file(path);
	
	myhtml_t* myhtml = myhtml_create();
	myhtml_init(myhtml, MyHTML_OPTIONS_PARSE_MODE_SINGLE, 1, 0);
	
	start = current_timestamp();
	
	for (int i = 0; i < tries; ++i) {
		myhtml_tree_t* tree = myhtml_tree_create();
		myhtml_tree_init(tree, myhtml);
		
		myhtml_parse(tree, MyENCODING_UTF_8, res.html, res.size);
		
		myhtml_tree_destroy(tree);
	}
	
	elapsed = current_timestamp() - start;
	
	myhtml_destroy(myhtml);
	
	printf("1 threads: %f\n", elapsed / tries);
	
	myhtml = myhtml_create();
	myhtml_init(myhtml, MyHTML_OPTIONS_DEFAULT, 2, 0);
	
	start = current_timestamp();
	
	for (int i = 0; i < tries; ++i) {
		myhtml_tree_t* tree = myhtml_tree_create();
		myhtml_tree_init(tree, myhtml);
		
		myhtml_parse(tree, MyENCODING_UTF_8, res.html, res.size);
		
		myhtml_tree_destroy(tree);
	}
	
	elapsed = current_timestamp() - start;
	
	myhtml_destroy(myhtml);
	
	printf("2 threads: %f\n", elapsed / tries);
	myhtml = myhtml_create();
	myhtml_init(myhtml, MyHTML_OPTIONS_DEFAULT, 3, 0);
	
	start = current_timestamp();
	
	for (int i = 0; i < tries; ++i) {
		myhtml_tree_t* tree = myhtml_tree_create();
		myhtml_tree_init(tree, myhtml);
		
		myhtml_parse(tree, MyENCODING_UTF_8, res.html, res.size);
		
		myhtml_tree_destroy(tree);
	}
	
	elapsed = current_timestamp() - start;
	
	myhtml_destroy(myhtml);
	
	printf("3 threads: %f\n", elapsed / tries);
	
	return 0;
}

Currently I have no idea why. Maybe:

  1. I am doing something wrong when using the myhtml API.
  2. There is a bug in myhtml on Linux/Windows.

I will keep trying to understand the real cause of the degradation.

lexborisov commented on June 5, 2024

Hi @Azq2
Try the test with myhtml_init(myhtml, MyHTML_OPTIONS_DEFAULT, 1, 0);
Thread mode with one thread:

  1. Tokenizer
  2. Tree builder
  3. Token process

Azq2 commented on June 5, 2024

To summarize:

  1. The "tree construction" and "tokenizer" threads depend heavily on each other, so there is no efficient way to split HTML parsing across threads (at least
    in the current myhtml implementation).
    Speed depends heavily on the CPU, the OS, and the HTML content. In some cases multithreaded mode is faster, but in the other 99.9% single mode gives the maximum speed.
  2. This is not a problem. Single mode is fast. Very fast. Faster than the other available parsers. I don't know of any reason to use threads.
  3. As the benchmark in comment #3 (comment) shows, this is not directly related to this module. It is a https://github.com/lexborisov/myhtml issue.
  4. I added a warning about the threads option and changed the default thread count to 0.
    https://github.com/Azq2/perl-html5-dom#threads

Azq2 commented on June 5, 2024

I think the problem is resolved. Please reopen the issue if not.
