Code Monkey home page Code Monkey logo

Comments (11)

wordpressfan avatar wordpressfan commented on July 4, 2024 1

Yes it's a small query and the table is indexed well so I believe we are fine but this is a good concern, thanks for discussing it.

The big picture that I was thinking of last night is:
What if we made the process of registering the new job is happening through action scheduler and CRON too?
So the process will be as follows:

  1. The not cached page is visited for the first time, we will create a row in RUCSS table with the status pending_create for example and in this step we will not send the RUCSS SaaS job creation request.
  2. Will have a CRON to get some of those pending_create rows and create async job in action scheduler to create them.
  3. Once the job is created, the status will be pending and get into the normal process.

This will give us the control of how many jobs are created at once and not to miss any job but also will add a layer of complexity.

from wp-rocket.

MathieuLamiot avatar MathieuLamiot commented on July 4, 2024

From @wordpressfan: (Slack)

Let's think about this, we need a way to stop sending the request of registering a new job when we have enough pending jobs (we can define the word enough later)
to do that we can easily create a new filter here:

public function add_url_to_the_queue( string $url, bool $is_mobile ) {

As a switch to enable/disable registering new jobs from the plugin, then we can add a callback to change this switch based on the number of pending+inprogress jobs so for example we get 100 pending rows to process every minute, then we need to stop registering new jobs once we reach 500 rows (pending+inprogress)

from wp-rocket.

MathieuLamiot avatar MathieuLamiot commented on July 4, 2024

@wordpressfan I agree with the approach. However, it should not impact the 408 retry mechanism here
What do you think of adding an optional argument force to add_url_to_the_queue, set to False by default.
And in the method add_url_to_the_queue:

if ( force == False ) {
   $max_pending_jobs = apply_filters( 'rocket_rucss_max_pending_jobs', MAGIC_CONSTANT )
   if ( 0 < $max_pending_jobs) {
      $nb_pending_jobs = count_pending_jobs()
      if ( nb_pending_jobs >= max_pending_jobs) {
         return false;
      }
   }
}

Where MAGIC_CONSTANT is to be defined (> 100 not to block the default batches, and not too big to avoid the problem we currently have. Maybe 300?)
And where count_pending_jobs is just a query on the RUCSS DB.

@piotrbak, what do you think in terms of product impact?

from wp-rocket.

MathieuLamiot avatar MathieuLamiot commented on July 4, 2024

@wordpressfan @vmanthos what was discussed during the daily is that we could replace MAGIC_CONSTANT in the above suggestion with a calculated value?
The one that could be relevant would be this one: n * apply_filters( 'rocket_rucss_pending_jobs_cron_rows_count', 100 )
What do you think?
This way, we would scale how many jobs we send based on how many jobs we are trying to collect.

I would suggest n to be ~3 (it has to be more than 2 and not to big...)
So something like:

$max_pending_jobs = apply_filters('rocket_rucss_max_pending_jobs', 3 * apply_filters( 'rocket_rucss_pending_jobs_cron_rows_count', 100 ) )

(And please, let's not hard code 3 and 100 but use (shared) constants instead? ; maybe put the inner apply_filter in a method as we use it at different places in the codebase)

from wp-rocket.

MathieuLamiot avatar MathieuLamiot commented on July 4, 2024

I'm just worried about hammering the DB with the COUNT query every time a job is added 🤔
It's not a costly query and the DB is small... I don't really know

from wp-rocket.

wordpressfan avatar wordpressfan commented on July 4, 2024

Reproduce the issue

Yes I can see the issue in one of our customer's site.

Root cause

Jobs are kept in plugin's database even when they are removed from SaaS side so every check status after that time will fail because SaaS at this point returns empty response.

Scope a solution

As We mentioned above we need to add a new status pending_create and stop registering jobs into SaaS directly.

  1. Here

    public function create_new_job( string $url, string $job_id, string $queue_name, bool $is_mobile = false ) {
    if ( ! self::$table_exists && ! $this->table_exists() ) {
    return false;
    }
    $item = [
    'url' => untrailingslashit( $url ),
    'is_mobile' => $is_mobile,
    'job_id' => $job_id,
    'queue_name' => $queue_name,
    'status' => 'pending',
    'retries' => 0,
    'last_accessed' => current_time( 'mysql', true ),
    ];
    return $this->add_item( $item );
    }

    we need to make $job_id and $queue_name optional arguments, also change the status here from pending to pending_create

  2. Make $job_id argument optional here with default value as empty string:

    public function reset_job( int $id, string $job_id ) {
    if ( ! self::$table_exists && ! $this->table_exists() ) {
    return false;
    }
    return $this->update_item(
    $id,
    [
    'job_id' => $job_id,
    'status' => 'pending',
    'error_code' => '',
    'error_message' => '',
    'retries' => 0,
    'modified' => current_time( 'mysql', true ),
    ]
    );
    }

And change the status to pending_create.

  1. Remove this part of code from here

    if ( false === $add_to_queue_response || ! isset( $add_to_queue_response['contents'], $add_to_queue_response['contents']['jobId'], $add_to_queue_response['contents']['queueName'] ) ) {
    return $html;
    }
    /**
    * Lock preload URL.
    *
    * @param string $url URL to lock
    */
    do_action( 'rocket_preload_lock_url', $url );
    // We got jobid and queue name so save them into the DB and change status to be pending.
    $this->used_css_query->create_new_job(
    $url,
    $add_to_queue_response['contents']['jobId'],
    $add_to_queue_response['contents']['queueName'],
    $is_mobile
    );

  2. Change the functionality of this method add_url_to_the_queue to only:

$used_css_row = $this->used_css_query->get_row( $url, $is_mobile );
if ( empty ( $used_css_row ) ) {
    $this->used_css_query->create_new_job( $url, '', '', $is_mobile );
} else {
    $this->used_css_query->reset_job( $used_css_row->id );
}
  1. Remove this part:

    if ( false !== $add_to_queue_response ) {
    $new_job_id = $add_to_queue_response['contents']['jobId'];
    $this->used_css_query->update_job_id( $id, $new_job_id );
    }

  2. Remove this part:

    if ( false !== $add_to_queue_response ) {
    $new_job_id = $add_to_queue_response['contents']['jobId'];
    $this->used_css_query->reset_job( $id, $new_job_id );
    }

  3. For the CRON we can use the current CRON rocket_rucss_pending_jobs to process also pending_create jobs by adding a new callback here:

    'rocket_rucss_pending_jobs' => 'process_pending_jobs',

  4. Here: https://github.com/wp-media/wp-rocket/blob/06381b8860eb46e570911b2812652f54a676dd6d/inc/Engine/Optimization/RUCSS/Controller/UsedCSS.php#L871-L870
    Create a new method process_pending_create_jobs that will be called exactly like process_pending_jobs

  5. In this new method process_pending_create_jobs we will get rows with pending_create status and do the same functionality we already do inside process_pending_jobs with a filter to change the number of rows to handle, with a default value (we need to agree on that default value because we only need to keep the number of pending rows below a specific number and use this max_pending_count to decide the number of pending_create to process)

  6. Here we are between two approaches of creating async action scheduler's task for each row to call SaaS to create the job.

Effort

[M] or [L]

@CrochetFeve0251 @wp-media/php to review plz.

from wp-rocket.

MathieuLamiot avatar MathieuLamiot commented on July 4, 2024

Looks good overall. Some comments:
On point 1:

  • we need to make $job_id and $queue_name optional arguments -> Let's just remove those arguments in the method, and pass an empty string to the DB, no?
  • To ease discussion, let's not re-use pending in the status name. Maybe 'to_submit'?

Point 2: Same here, let's just remove the job_id argument.

Point 3 & 4:
Here again, for naming clarity we should properly name methods to avoid confusion.
If I recap your suggestion, my understanding would be:

public function prepare_job( $url, $is_mobile ) {
    $used_css_row = $this->used_css_query->get_row( $url, $is_mobile );
    if ( empty ( $used_css_row ) ) {
        $this->used_css_query->create_new_job( $url, '', '', $is_mobile );
    } else {
        $this->used_css_query->reset_job( $used_css_row->id );
    }
}
public function send_job( $used_css_row ) {
$url = $used_css_row->url;
$is_mobile = $used_css_row->is_mobile; 
// CURRENT CONTENT OF add_url_to_the_queue
 /**
		 * Filters the RUCSS safelist
		 *
		 * @since 3.11
		 *
		 * @param array $safelist Array of safelist values.
		 */
		$safelist = apply_filters( 'rocket_rucss_safelist', $this->options->get( 'remove_unused_css_safelist', [] ) );

		/**
		 * Filters the styles attributes to be skipped (blocked) by RUCSS.
		 *
		 * @since 3.14
		 *
		 * @param array $skipped_attr Array of safelist values.
		 */
		$skipped_attr = apply_filters( 'rocket_rucss_skip_styles_with_attr', [] );
		$skipped_attr = ( is_array( $skipped_attr ) ) ? $skipped_attr : [];

		$config = [
			'treeshake'      => 1,
			'rucss_safelist' => $safelist,
			'skip_attr'      => $skipped_attr,
			'is_mobile'      => $is_mobile,
			'is_home'        => $this->is_home( $url ),
		];

		$add_to_queue_response = $this->api->add_to_queue( $url, $config );
		if ( 200 !== $add_to_queue_response['code'] ) {
			Logger::error(
				'Error when contacting the RUCSS API.',
				[
					'rucss error',
					'url'     => $url,
					'code'    => $add_to_queue_response['code'],
					'message' => $add_to_queue_response['message'],
				]
			);

			return false;
		}
// END OF CURRENT CONTENT OF add_url_to_the_queue

			/**
			 * Lock preload URL.
			 *
			 * @param string $url URL to lock
			 */
			do_action( 'rocket_preload_lock_url', $url );

			// We got jobid and queue name so save them into the DB and change status to be pending.
			$this->used_css_query->add_job_reference(
				$add_to_queue_response['contents']['jobId'],
				$add_to_queue_response['contents']['queueName'],
			);

add_job_reference has to be created.
We can delete add_url_to_the_queue and replace occurrences with prepare_job

in the if ( empty( $used_css ) ) { we would just call prepare_job.

Point 5: Make sure to add do_action( 'rocket_preload_unlock_url', $row_details->url ); after calling prepare_job: the job will not be immediately sent to the server, so we should unlock preload in the meantime.

Point 9: We can use this limit for the number of "to_submit" jobs:
$max_pending_jobs = apply_filters('rocket_rucss_max_pending_jobs', 3 * apply_filters( 'rocket_rucss_pending_jobs_cron_rows_count', 100 ) )

Point 10: Do you foresee an issue in using a for loop to go through the (up to) max_pending_jobs RUL and making the API calls? for fetching the results, we use async tasks for each API call. Is there a reason for this approach?

from wp-rocket.

CrochetFeve0251 avatar CrochetFeve0251 commented on July 4, 2024

@wordpressfan The grooming looks good but there is one point I am not sure about is the rocket_preload_lock_url filter disapearing. Why are we sure we do not need to lock the URLs for the preload anymore?

Another thing this issue is working on the exact same methods with different AC. I guess we need some communication to not end up with monster conflicts.

from wp-rocket.

MathieuLamiot avatar MathieuLamiot commented on July 4, 2024

rocket_preload_lock_url I think it should be kept and used in the CRON job part, when actually sending the job to the SaaS. that could be in the send_job method as suggested above

from wp-rocket.

wordpressfan avatar wordpressfan commented on July 4, 2024

After a discussion with @MathieuLamiot and @CrochetFeve0251

  • Yes we will keep the rocket_preload_lock_url action but will move it from its current place to be in the method that will run for each url after sending the job to SaaS.
  • For the action scheduler part we need to do the same functionality we do now for pending rows, so we will create Async job for every database row to send the request to SaaS.

from wp-rocket.

MathieuLamiot avatar MathieuLamiot commented on July 4, 2024

@CrochetFeve0251 Can you provide a documentation for the new submission system and the related filters, so that support can learn how to use them? Thanks!

from wp-rocket.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.