Comments (9)

dd32 commented on August 23, 2024

WordPress.org has this running as a daily cron task to solve this issue:

/**
 * Mark any duplicate jobs as duplicates.
 * 
 * When the table cannot be read, the query times out, or a new cron task is added,
 * it's common to end up with duplicate entries in the cavalcade tables.
 *
 * This simply marks them as duplicates, which prevents `get_cron_array()` from having many extra
 * duplicate tasks that slow things down.
 *
 * NOTE: For non-recurring tasks, this will clean out any duplicate jobs even if queued at different times
 *       hopefully that'll be compatible with future core/w.org cron uses.
 */
function mark_jobs_as_duplicates() {
	global $wpdb;

	$jobs_with_duplicates = $wpdb->get_results(
		"SELECT
			min(`id`) as `id_to_keep`,
			`site`, `hook`, `args`, `interval`
		FROM `wp_cavalcade_jobs`
		WHERE status IN ('waiting', 'running')
		GROUP BY `site`, `hook`, `args`, `interval`
		HAVING COUNT(*) > 1"
	);

	foreach ( $jobs_with_duplicates as $j ) {
		$sql = $wpdb->prepare(
			"UPDATE `wp_cavalcade_jobs`
			SET `status` = 'duplicate'
			WHERE `id` != %d AND status = 'waiting'
				AND `site` = %d AND `hook` = %s
				AND `args` = %s AND `interval` = %d",
			$j->id_to_keep,
			$j->site,
			$j->hook,
			$j->args,
			$j->interval
		);

		// Hack for non-recurring jobs. wpdb::prepare() doesn't support NULL fields (a NULL interval
		// is rendered as 0 by the %d placeholder), but we can force it in.
		if ( is_null( $j->interval ) ) {
			$sql = str_replace( '`interval` = 0', '`interval` IS NULL', $sql );
		}

		$wpdb->query( $sql );
	}

}

It's not ideal, but it's been working for us for quite some time.
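
One possible way to avoid the `str_replace()` workaround for NULL intervals, as a minimal sketch: choose the comparison before the query is prepared. The helper name `build_interval_clause()` is hypothetical and is not part of Cavalcade or WordPress.

```php
<?php
/**
 * Return the SQL fragment and placeholder values for matching a job's interval.
 * The function name is hypothetical; it is not part of Cavalcade or WordPress.
 *
 * @param int|null $interval Interval in seconds, or null for a non-recurring job.
 * @return array Two elements: the SQL fragment, then the values for its placeholders.
 */
function build_interval_clause( ?int $interval ): array {
	if ( null === $interval ) {
		// NULL cannot be matched with "=", so no placeholder is needed at all.
		return [ '`interval` IS NULL', [] ];
	}

	return [ '`interval` = %d', [ $interval ] ];
}
```

The fragment would be appended to the `WHERE` clause and the values added to the arguments given to `$wpdb->prepare()`. Values read from the database may arrive as strings, so cast a non-null interval to `int` before calling the helper.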

Duplicate jobs are a little more common with Cavalcade than with the usual WP cron storage, because Cavalcade inserts multiple rows, whereas WP cron just overwrites the previous cron array with a new one (which will include only one cron entry).
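
The difference between the two storage styles can be illustrated with a toy model of two requests that both decide an event is missing and schedule it. This is a simplified sketch, not the real storage code; both function names are hypothetical.

```php
<?php
/**
 * Array-style storage: the whole schedule is one value that is written back in full.
 * Returns the number of entries left after two racing requests.
 */
function simulate_array_storage( string $hook ): int {
	$stored = [];

	// Both requests take their snapshot before either one writes.
	$request_a = $stored;
	$request_b = $stored;

	$request_a[ $hook ] = true;
	$request_b[ $hook ] = true;

	$stored = $request_a; // Request A saves its copy.
	$stored = $request_b; // Request B overwrites it with an equivalent copy.

	return count( $stored );
}

/**
 * Row-style storage: every scheduled job is its own row.
 * Returns the number of rows left after two racing requests.
 */
function simulate_row_storage( string $hook ): int {
	$rows = [];

	// Both requests looked for the hook before either insert happened and found nothing.
	$rows[] = [ 'hook' => $hook ]; // Request A inserts.
	$rows[] = [ 'hook' => $hook ]; // Request B inserts the same job again.

	return count( $rows );
}
```

In this model the array-style race ends with one entry, while the row-style race leaves two rows for the same hook.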

Using a table lock would help, but Cavalcade would also have to reload the current site's cron entries from the database after locking, which could cause a lot of table locking on a high-usage site like WordPress.org.

For reference, Cavalcade on WordPress.org in numbers:

  • ~10k-20k cron executions an hour over 20 threads spread over 5 servers
  • 400-600 cron entries added per hour spiking to ~800-1000/hour now and then

from cavalcade.

archon810 commented on August 23, 2024

Would love to hear if there's a solution as we're looking to implement Cavalcade right now. If duplicates are being created, the solution would be (at the moment) no better than WP's broken built-in cron.

roborourke commented on August 23, 2024

@archon810 you could try Cavalcade 2.0. Worth noting this doesn't happen all the time - Cavalcade is running on all our production sites and wordpress.org too.

archon810 commented on August 23, 2024

Now that we've been running Cavalcade for several months, I just checked and we're also seeing a ton of dupes, including for WP's own functions.

Are there any plans to attempt to root cause and fix the issue?

r-a-y commented on August 23, 2024

I noticed the same thing, which is why I opened #88 to help clean up job entries via WP-CLI, but it would be great to know what is causing these duplicate entries to occur.

svandragt commented on August 23, 2024

Does this project have multiple webservers? My approach was going to be to put a unique key constraint on hook + args + site so that a second instance of the same event couldn't get into the database. The problem is that the args column is a longtext rather than a varchar, and MySQL can't index a longtext column without a prefix length (https://github.com/humanmade/Cavalcade/blob/master/inc/namespace.php#L70-L73). The issue can also occur with recurring (interval) events, because each worker will check whether an event needs to be rescheduled.
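
One way around the longtext limitation, offered only as a sketch, is to store a fixed-length hash of site + hook + args and put the unique constraint on that instead. The function name `job_dedupe_key()` and the idea of a dedicated `dedupe_key` column are assumptions for illustration; neither exists in Cavalcade.

```php
<?php
/**
 * Build a fixed-length fingerprint for a job so that a UNIQUE index can cover it.
 * The function name and the `dedupe_key` column it would fill are hypothetical.
 *
 * @param int    $site Site ID.
 * @param string $hook Hook name.
 * @param array  $args Arguments passed to the hook.
 * @return string 64-character hexadecimal SHA-256 digest.
 */
function job_dedupe_key( int $site, string $hook, array $args ): string {
	// serialize() yields the same string for the same site, hook, and args.
	return hash( 'sha256', serialize( [ $site, $hook, $args ] ) );
}
```

A 64-character column holding this value can carry a unique index, so a second insert of the same event would be rejected by the database. Note that one-off events with identical arguments scheduled for different times would also collide unless the timestamp is included in the hash.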

The acquire_lock method of the runner isn't adequate: https://github.com/humanmade/Cavalcade-Runner/blob/master/inc/class-job.php#L55-L66. When 4 workers get started and call https://github.com/humanmade/Cavalcade-Runner/blob/master/inc/class-runner.php#L236-L239 at a similar time, and then try to update, we see the race condition.

Last time I spoke with @rmccue about this, he wanted to look at database locking, if I recall correctly. Something like https://dev.mysql.com/doc/refman/5.7/en/innodb-locking-reads.html might be an option.

archon810 commented on August 23, 2024

Maybe change the logic to confirm that the query that looks for existing jobs actually succeeded, and if it errored, don't schedule a potential dupe? Error check?
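
A minimal sketch of that error check, assuming the lookup can report failure separately from "no rows found". The function name and both callables are hypothetical; how failure is detected (for example by inspecting `$wpdb->last_error` after the query) is left to the caller.

```php
<?php
/**
 * Schedule a job only when the lookup for existing jobs succeeded and found nothing.
 * Hypothetical helper: $find_existing must return an array of rows, or null on failure.
 *
 * @param callable $find_existing Returns array of matching jobs, or null if the lookup failed.
 * @param callable $insert_job    Inserts the new job.
 * @return string 'lookup_failed', 'already_scheduled', or 'scheduled'.
 */
function schedule_unless_present( callable $find_existing, callable $insert_job ): string {
	$existing = $find_existing();

	if ( null === $existing ) {
		// The lookup itself failed, so "not found" cannot be trusted. Do not insert.
		return 'lookup_failed';
	}

	if ( count( $existing ) > 0 ) {
		return 'already_scheduled';
	}

	$insert_job();
	return 'scheduled';
}
```

The important property is that a failed or timed-out read is never treated the same as an empty result.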

roborourke commented on August 23, 2024

Hey, thanks for these recent updates. Yes, we do want to understand and resolve this issue for good. Thanks @dd32 for the code that checks for duplicates too. We did some initial discovery work internally, but we haven't reached any firm conclusions yet.

We have a few QA sprints coming up next month, so I will try to get this addressed then.

germanoronoz commented on August 23, 2024

Hello,

Any news regarding this issue?

We implemented @dd32's duplicate-marking solution as a cron job and it's working, but we still see a lot of duplicates (around 30,000 dupes currently).

It's a multisite network with 400+ sites and three workers (runners on AWS instances) with 25 max-workers each. All three are at 99% CPU all the time, and we think it is due to the underlying dupes problem.

We modified the runner slightly: for events with an interval of less than 15 minutes, the nextrun is now set to the current execution time plus the interval, so they never get stuck. We also changed the nextrun of events with a one-minute interval to use two minutes instead.
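
The rule described above could be sketched roughly as follows. This is one reading of the description, not the actual patch: the function name is hypothetical, and the fallback for longer intervals (advancing from the previously scheduled time) is an assumption about the unmodified behavior.

```php
<?php
/**
 * Compute the next run time for a recurring job, all values in Unix seconds.
 * Hypothetical helper illustrating the modification described in the comment.
 *
 * @param int $scheduled Time the job was scheduled to run.
 * @param int $now       Time the job is actually being executed.
 * @param int $interval  Recurrence interval in seconds.
 * @return int Next run time.
 */
function next_run( int $scheduled, int $now, int $interval ): int {
	if ( $interval >= 15 * 60 ) {
		// Assumed default: advance from the previously scheduled time.
		return $scheduled + $interval;
	}

	// Short intervals: count from the current execution time so a late job cannot fall behind.
	// One-minute events are stretched to two minutes.
	$step = ( 60 === $interval ) ? 120 : $interval;

	return $now + $step;
}
```

Counting from the execution time means a job that runs late is not immediately due again, which is presumably what keeps short-interval events from piling up.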

Hopefully you can sort this out!
Best regards
