
moodle-local_datacleaner's Introduction


DataCleaner Moodle Module

Moodle DataCleaner is an anonymiser of your Moodle data.

Branches

The following maps the plugin version to use depending on your Moodle version.

Moodle version        | Branch
Moodle up to 3.9      | master
Moodle 3.10           | MOODLE_310_STABLE
Moodle 3.11 and above | MOODLE_311_STABLE

The following maps the plugin version to use depending on your Totara version.

Totara version | Branch
Totara 13      | TOTARA_13

How it works

Standard practice when hosting most applications, Moodle included, is to have various environments in a 'pipeline' leading to production at the end. For example, a typical flow might be dev > stage > prod, but there could be as many environments as you want for various reasons, such as load testing or penetration testing.

To test properly it's often useful to have real production data in these other environments, but there are downsides:

  • Production data is usually quite massive. We don't need or want all of it, and disk space can be a pain with multiple copies.
  • There may be sensitive data we don't want to expose to developers or testers, e.g. personal data, grades, uploaded assignments etc.
  • Moodle is integrated with 3rd party systems and we don't want test systems interacting with real systems, e.g. sending emails or touching assignments in Turnitin. In other words, we want to remove any API keys and other related config.

So we need a way to 'clean' the database after a refresh, to reduce the size of the data, to remove anything sensitive, and to ensure it's not going to touch any other real system. This also needs to be configurable because every Moodle instance has different needs and there is no one-size-fits-all approach. This could be configured outside Moodle in the deployment tools, but over time we have found the most flexible and easiest approach is to have this configuration inside Moodle itself, so our clients can directly make these decisions, and not be exposed to any of the complexity of our internal processes around continuous integration and deployment.

Practically this means the cleaning configuration needs to be added into the production system (which initially sounds scary but isn't), then you refresh the database to another environment where it can be washed. There are multiple levels of safeguards in place to ensure this never gets run in production, which would of course be catastrophic:

  • It can only be run from the CLI. There is no GUI.
  • We store the hostname in the cleaning configuration data. If the hostname matches production, DataCleaner will not run. If this data is missing then it will not run.
  • Typically a refreshed database will be from a nightly snapshot and so the data should be slightly stale. If a non admin user has logged in recently, that's a sign this Moodle is being used, and the DataCleaner will not run.
  • If cron has run recently, DataCleaner will not run. This should only be run on a data washing instance, where cron should not be needed.
  • It can only be run if '$CFG->local_datacleaner_allowexecution = true;' has been added to config.php.
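The safeguards listed above combine into a single go / no-go decision. The following is a minimal sketch of that logic in Python, not the plugin's actual PHP code: the `Environment` fields, the `refusal_reasons` function and the 300-second "recent" window are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Environment:
    """Hypothetical snapshot of the facts the safeguards look at."""
    is_cli: bool                    # script was started from the command line
    saved_prod_host: Optional[str]  # hostname stored in the cleaning config
    current_host: str               # hostname of the instance being cleaned
    secs_since_user_login: float    # time since the last non-admin login
    secs_since_cron: float          # time since the last cron run
    allow_execution: bool           # flag read from config.php


def refusal_reasons(env: Environment, recent_secs: float = 300.0) -> List[str]:
    """Return every reason the cleaner must not run; an empty list means go."""
    reasons = []
    if not env.is_cli:
        reasons.append("not running from the CLI")
    if env.saved_prod_host is None:
        reasons.append("no production hostname saved")
    elif env.saved_prod_host == env.current_host:
        reasons.append("hostname matches production")
    if env.secs_since_user_login < recent_secs:
        reasons.append("a non-admin user logged in recently")
    if env.secs_since_cron < recent_secs:
        reasons.append("cron ran recently")
    if not env.allow_execution:
        reasons.append("allowexecution flag missing from config.php")
    return reasons
```

Collecting all reasons rather than stopping at the first one makes it easier to see why a run was refused on a freshly refreshed instance.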

Installation

The simplest method of installing the plugin is to choose "Download ZIP" on the right hand side of the GitHub page. Once you've done this, unzip the DataCleaner code and copy it to the local/datacleaner directory within your Moodle codebase. On most modern Linux systems, this can be accomplished with:

unzip ./mdl-local_datacleaner-master.zip
cp -r ./mdl-local_datacleaner-master <your_moodle_directory>/local/datacleaner

Once you've copied the plugin, you can finish the installation process by logging into your Moodle site as an administrator and visiting the "notifications" page:

<your.moodle.url>/admin/index.php

Your site should prompt you to upgrade.

Configuration

Once the installation process is complete, you'll be prompted to fill in some configuration details. Note that you MUST visit the DataCleaner config page to save the current wwwroot, or the cleaner will not run later in the other environments.

$CFG->local_datacleaner_allowexecution = true;

You have to add the config item above to your config.php in each of the environments you want the cleaner to run. DO NOT add that config setting to a Production environment!

There are multiple 'cleaners' which process different types of data in Moodle. Each one can be enabled individually and may have additional config settings.

You can find the DataCleaner configuration via the Moodle administration block:

Site administration > Plugins > Local plugins > Data cleaner

Sub-plugin options

Enable the sub-plugin options to clean the corresponding data area.

Cleanup core:

Enable this sub-plugin to clean core configuration settings.

Remove config:

Enable this sub-plugin to clean configuration settings. This has its own Settings page.

Remove standard logs:

Enable to truncate the standard log table.

Remove users:

This will remove users who have not logged in for a specific number of days. This has its own Settings page.

Remove courses:

Remove courses older than a specific number of days and/or in specific categories. This has its own Settings page.

Scramble user data:

Enable this sub-plugin to anonymise user data. This has its own Settings page.

Clean grades:

Enable to delete grade history or replace with fake data. This has its own Settings page.

Replace URLs:

Enable to replace all occurrences of the production URL with another URL. This has its own Settings page.

Cleanup sitedata:

Clean orphaned files or replace with a generic file for the specific file type.

Cleanup email:

When a suffix has been configured in the settings, this will append that value to all emails. There is also a regular expression field that will ignore users when appending the suffix.
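The suffix behaviour described above can be sketched in a few lines of Python. This is an illustration only: the function name `append_suffix` is hypothetical, and it assumes the regular expression is matched against the email address. Skipping addresses that already end with the suffix is also an assumption, added so that repeated runs do not stack suffixes.

```python
import re
from typing import Optional


def append_suffix(email: str, suffix: str, ignore_pattern: Optional[str] = None) -> str:
    """Append `suffix` to `email` unless the address matches `ignore_pattern`."""
    if not suffix:
        return email  # no suffix configured, nothing to do
    if ignore_pattern and re.search(ignore_pattern, email):
        return email  # this user is excluded by the regular expression
    if email.endswith(suffix):
        return email  # assumption: leave already-suffixed addresses alone
    return email + suffix
```

For example, with the suffix `.invalid` the address `jane@example.com` becomes `jane@example.com.invalid`, which can no longer receive real mail.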

This also allows you to configure the following Moodle settings:

  • noemailever
  • divertallemailsto
  • divertallemailsexcept

Environment matrix:

Notice: A soft dependency on local_envbar is required for populating the available environments that can be configured.

This facilitates searching values in the {config} and {config_plugins} tables to allow setting those values. Useful for scrubbing API keys to prevent them calling home on a development environment.

A CLI script exists to run the Environment matrix cleaner as a standalone operation.

sudo -u apache /usr/bin/php /<your_moodle_directory>/local/datacleaner/environment_matrix/cli/matrix_replace.php --run

An additional CLI flag, --reset, has been implemented.

This flag will purge all other saved environment configuration so that the new instance only has one set of environment data.

Running

After installing and configuring DataCleaner, copy your database and optionally your site data to another Moodle instance.

From here run the cli script. On most modern Linux systems, this can be accomplished with:

sudo -u apache /usr/bin/php /<your_moodle_directory>/local/datacleaner/cli/clean.php --run

There are protections in place which prevent accidentally running this on your production system, which would of course be catastrophic!

More options

Run the cli script with --help for more options:

sudo -u apache /usr/bin/php /<your_moodle_directory>/local/datacleaner/cli/clean.php --help

moodle-local_datacleaner's People

Contributors

abias, anupamatd, azrek, brendanheywood, danmarsden, dkleto, dmitriim, egiles, golenkovm, ilyatregubov, jwalits, keevan, kenneth-hendricks, marcusboon, marxjohnson, nhoobin, nicosoft, nigelcunningham, patkira, peterburnett, picnicpete, rhell4, roperto, sarahjcotton, scottverbeek, srdjan-catalyst


moodle-local_datacleaner's Issues

Few leftover bits after cleaning, e.g. user stats, course completions, scorm ...

Should the user clean option also dump records from other tables? I noticed data still in:

stats_user_daily, stats_user_weekly, stats_user_monthly, user_lastaccess, user_preferences

With clean completion enabled, I noticed user records in course_completions, course_modules_completion pointing to user records that no longer exist due to the user clean.

scorm_scoes_track doesn't get touched at all. I guess I'll delete the 200,000+ results using sql ... a scorm plugin would be ideal.

Arbitrary sql cleaner

This is a fallback cleaner for the worst case scenario of weird bits where you need to get dirty in the sql to clean.

  • Thinking just a giant text area with a bunch of sql statements. It would allow multiple sql statements separated by ; and add that in if it's missing so you can do a bunch of things
  • the sql would be in moodle sql, so using {table} and not mdl_table
  • maybe have some ability to interact with the matrix so we can do an update based on env. Not sure how this could work, but thinking:
    • if the matrix can just have the ability to set arbitrary key / values which are separate from the admin search function, and then you can store whatever you want
    • in the sql you can then reference any config item via something like {{core:lang}} or {{mod_assign:foo}}. There is only a loose dependency on the matrix, all the matrix does is set the config items, and then we can leverage any config item in the sql.
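The idea in the list above can be sketched as follows. This is a rough Python illustration of the proposal, not existing plugin code: `expand_config` and `split_statements` are hypothetical names, config is modelled as a nested dict, and the splitter is deliberately naive (it would break on a ; inside a quoted string). Substituting raw values into SQL also skips any quoting or escaping, which a real implementation would need to handle.

```python
import re
from typing import Dict, List

# Matches {{component:name}} but leaves Moodle-style {table} names untouched.
PLACEHOLDER = re.compile(r"\{\{(\w+):(\w+)\}\}")


def expand_config(sql: str, config: Dict[str, Dict[str, str]]) -> str:
    """Replace each {{component:name}} with the matching config value."""
    def lookup(match):
        component, name = match.group(1), match.group(2)
        return config[component][name]  # raises KeyError if the item is not set
    return PLACEHOLDER.sub(lookup, sql)


def split_statements(text: str) -> List[str]:
    """Split on ';' and re-add the terminator, so a missing final ';' is tolerated."""
    return [part.strip() + ";" for part in text.split(";") if part.strip()]
```

Given `{"core": {"lang": "en"}}`, the text `UPDATE {user} SET lang = '{{core:lang}}'; DELETE FROM {sessions}` expands and splits into two statements, each ending in a semicolon, with `{user}` and `{sessions}` left for Moodle to resolve.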

Add option to clean more log tables

logstore_standard is cleaned in /cleaner/logstore_standard/classes/clean.php .

We could add cleaning more log tables, including Totara ones, e.g.:

  • mdl_prog_completion_log
  • mdl_totara_sync_log
  • mdl_upgrade_log

Maybe logstore_standard should be renamed to a more generic cleaner/log? Let me know what you think.

Risky tests for cleaner/config/

vendor/bin/phpunit "cleaner_config_test" local/datacleaner/cleaner/config/tests/config_test.php
Moodle 3.5.7+ (Build: 20190725), dd20fb4ecf72aed13097931af07e7c871de9ffa9
Php: 7.1.24.1.16.04.1.1, pgsql: 9.5.14, OS: Linux 4.4.0-134-generic x86_64
PHPUnit 6.5.14 by Sebastian Bergmann and contributors.

Removing config settings 0% (0/2) 2 / 2 (100%)
Removing config settings 50% (1/2) 00:00:00 elapsed.
Removing config settings 100% (2/2) 00:00:00 elapsed.

Execution took 00:00:00

Time: 6.55 seconds, Memory: 50.00MB

There was 1 risky test:

  1. cleaner_config_test::test_cleaner_config_execute
    Removing config settings 0% (0/2)
    Removing config settings 50% (1/2) 00:00:00 elapsed.
    Removing config settings 100% (2/2) 00:00:00 elapsed.

Execution took 00:00:00

To re-run:
vendor/bin/phpunit "cleaner_config_test" local/datacleaner/cleaner/config/tests/config_test.php

OK, but incomplete, skipped, or risky tests!
Tests: 2, Assertions: 9, Risky: 1.

The user scrambling feature has a rounding error which makes a lot of duplicates

The aim is to have a deterministic, but non reversible way of taking the existing names and mixing them up. We have N records and we have F fields we want to randomize.

The way this should work is to create a temporary table with a sorted list of names, and then using two prime numbers make pseudo random combinations from this list.

So if you had a bunch of real names like this:

id | First    | Last
1  | David    | Smith
2  | Nicholas | Hoobin
3  | Bill     | Jones
4  | Daniel   | Roperto
5  | Brendan  | Heywood
6  | Sarah    | Bryce

In this case we have N = 6, and F = 2.

What we want is to find the smallest two adjacent prime numbers P1 and P2 which, when multiplied, give us the number of records we have or higher. In this case 6 = 2 x 3. (Note: a quick heuristic is to start this search at the square root of N and then work down / up from there, so ceil(sqrt(6)) = 3.)

For N = 100 records we'd have P1 = 11 and P2 = 13 (sqrt(100) = 10; the pair 7 x 11 = 77 falls short)

For N = 1000 records we'd get P1 = 31 and P2 = 37 (sqrt(1000) ≈ 31.6)

For N = 10000 records we'd get P1 = 101 and P2 = 103 (sqrt(10000) = 100)
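The prime search can be sketched in Python as below. The function names are made up for this example, and for simplicity it walks up from the smallest pair (2, 3) rather than starting at the square root, which gives the same answer but does a little more work.

```python
from typing import Tuple


def is_prime(n: int) -> bool:
    """Trial division; fine for the small primes needed here."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def next_prime(n: int) -> int:
    """Smallest prime strictly greater than n."""
    candidate = n + 1
    while not is_prime(candidate):
        candidate += 1
    return candidate


def adjacent_primes(n_records: int) -> Tuple[int, int]:
    """Smallest consecutive primes (p1, p2) with p1 * p2 >= n_records."""
    p1, p2 = 2, 3
    while p1 * p2 < n_records:
        p1, p2 = p2, next_prime(p2)
    return p1, p2
```

Because P1 and P2 are distinct primes, the pair (id mod P1, id mod P2) only repeats every P1 x P2 ids, which is what makes the combinations unique for up to N records.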

So we are going to create a temp table with 3 (the bigger prime) records. This process is the same regardless of F.

If we had 5 fields to randomize then F = 5, we want 5 prime numbers. So as well as P1 and P2 we just find the next couple primes higher than P1, P2, etc.

Now we create a temp table which has as many records as our highest prime. We only have P1 and P2, and P2 = 3. So we want 3 records in our temp table. Let's start by cloning the first 3 records from our real data into a temp table:

id | First    | Last
1  | David    | Smith
2  | Nicholas | Hoobin
3  | Bill     | Jones

We don't want to be able to figure out who was who originally, so let's sort each column independently:

id | First    | Last
1  | Bill     | Hoobin
2  | David    | Jones
3  | Nicholas | Smith

Now we pick the index modulo 2 for the first column, and the modulo 3 for the second column (and so on if we had more columns) and then lookup what value this maps to in the temp table as our new scrambled value:

id | Old First | Old Last | id % P1 (2) | New First | id % P2 (3) | New Last
1  | David     | Smith    | 1           | Bill      | 1           | Hoobin
2  | Nicholas  | Hoobin   | 2           | David     | 2           | Jones
3  | Bill      | Jones    | 1           | Bill      | 3           | Smith
4  | Daniel    | Roperto  | 2           | David     | 1           | Hoobin
5  | Brendan   | Heywood  | 1           | Bill      | 2           | Jones
6  | Sarah     | Bryce    | 2           | David     | 3           | Smith

And voila, we have now scrambled everyone! Everyone has a realistic name, the names cannot be easily reversed back to the original records, and because of the prime numbers we guarantee that everyone has a unique combination of names. Because it is a single sql update it is incredibly quick. And because we are sampling from our original names we get realistic data that looks familiar.

Note with a small N, the data does look repetitive, but for larger data-sets this deficiency fades away. If you have 1000 students, and typical course sizes of 30 students, then you shouldn't see too many duplicates within a course.
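The worked example above can be reproduced with a short Python sketch. It is an in-memory illustration of the idea, not the plugin's SQL: the `scramble` function is hypothetical, it handles only two fields, and it assumes ids are contiguous starting at 1 so that a 0-based row number stands in for id - 1.

```python
from typing import List, Tuple


def scramble(names: List[Tuple[str, str]], p1: int, p2: int) -> List[Tuple[str, str]]:
    """Remap (first, last) pairs using row lookups modulo p1 and p2."""
    sample = names[:p2]  # the "temp table": as many rows as the larger prime
    # Sort each column independently so rows no longer line up with real people.
    firsts = sorted(first for first, _ in sample)
    lasts = sorted(last for _, last in sample)
    result = []
    for row in range(len(names)):  # row is the 0-based position, i.e. id - 1
        result.append((firsts[row % p1], lasts[row % p2]))
    return result


people = [
    ("David", "Smith"), ("Nicholas", "Hoobin"), ("Bill", "Jones"),
    ("Daniel", "Roperto"), ("Brendan", "Heywood"), ("Sarah", "Bryce"),
]
scrambled = scramble(people, 2, 3)
```

Running it twice on the same input gives the same output, which is the deterministic property that lets a tester reuse the same scrambled user after each nightly refresh.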

Also because it is deterministic if we re-run this each night we will get the same result so it's easier for a tester to use the same user again and again and not have it be wildly different each time (eg if we used pure random numbers).

Users scramble seems to ignore database settings

Hi, when testing php local/datacleaner/cli/clean.php --run I noticed that the configuration was completely ignored when selecting the user ids to update.

I had to change the line
$criteria = self::get_user_criteria(static::$options);

to
$config = get_config('cleaner_delete_users');
$criteria = self::get_user_criteria($config);

Not sure if it's the default behaviour or not, but regardless of what I did, Admin users were ignored until I made this change in the code.

Clean or scramble user extra profile fields

Data in extra user profile fields is not cleaned. I would suggest to do something simple - e.g. just delete all the extra data with equivalent of:
TRUNCATE mdl_user_info_data;

I believe this should be an extension to moodle-local_datacleaner/cleaner/users/classes/clean.php.
Let me know what you think and I can work on a patch.

Add setting to remove really _all_ courses to cleaner_courses

Hi,

This feature request is targeted at the latest version of the plugin published on https://moodle.org/plugins/pluginversion.php?id=12918 and may be superseded by recent commits here in Github.

We would like to use local_datacleaner to remove really all courses in one of our washing box instances.

On /admin/settings.php?section=cleaner_courses, we can only select the minimum age for a course to be deleted and the categories to be deleted. We have seen that, if we set minimum age = 0 and don't select any category, the plugin seems to interpret this to remove all courses, but unfortunately some courses remained undeleted.

Before we dive into debugging why some courses remain undeleted, I would like to ask if it's possible to add a setting which just removes really all courses in a washing box.

PS: Selecting all categories might be an intermediate solution, but I am afraid that this might also break as soon as new categories are created.

Thanks,
Alex

Table not found when using 'USING' (courses/delete_dangling_course_contexts)

I have tested this plugin in Moodle 3.8.
There seems to be an issue in the Courses/delete_dangling_course_contexts function.

ERROR

Default exception handler: Error writing to database Debug: Unknown table 'mdl_context' in MULTI DELETE
DELETE FROM mdl_context USING mdl_course
WHERE contextlevel = 50
AND mdl_context.instanceid = mdl_course.id
AND mdl_course.id IS NULL

The Error is produced with the execution of the following statement

$DB->execute("DELETE FROM {context} USING {course}
WHERE contextlevel = 50
AND {context}.instanceid = {course}.id
AND {course}.id IS NULL");

This is the correcting statement I used to successfully remove the dangling course context.

$DB->execute("DELETE {context} FROM {context} 
LEFT JOIN {course} 
ON {context}.instanceid = {course}.id 
WHERE {course}.id IS NULL 
AND {context}.contextlevel = 50");

Totara 9.8 unit tests failing on test_it_removes_backup_files

Hi,

Totara version: 9.8
Totara build: '20170621.00'
which relates to Moodle version: 2015111610.00
Moodle release = '3.0.10 (Build: 20170508)'
PHP: 5.5
OS: Linux
DB: Postgres

Unit tests for Totara are failing with:

14:43:26.444 There was 1 error:
14:43:26.444
14:43:26.445 1) cleaner_orphaned_sitedata\tests\unit\backup_cleaner_test::test_it_removes_backup_files
14:43:26.445 ReflectionException: Method stored_file::get_pathname_by_contenthash() does not exist
14:43:26.445
14:43:26.445 /var/www/client/totara/local/datacleaner/cleaner/orphaned_sitedata/tests/unit/orphaned_sitedata_testcase.php:72
14:43:26.446 /var/www/client/totara/local/datacleaner/cleaner/orphaned_sitedata/tests/unit/backup_cleaner_test.php:81
14:43:26.446 /var/www/client/totara/local/datacleaner/cleaner/orphaned_sitedata/tests/unit/backup_cleaner_test.php:65
14:43:26.446 /var/www/client/totara/lib/phpunit/classes/advanced_testcase.php:80
14:43:26.446
14:43:26.447 To re-run:
14:43:26.447 ./vendor/bin/phpunit cleaner_orphaned_sitedata\tests\unit\backup_cleaner_test local/datacleaner/cleaner/orphaned_sitedata/tests/unit/backup_cleaner_test.php
14:43:26.447
14:43:26.447 FAILURES!
14:43:26.448 Tests: 3, Assertions: 2, Errors: 1

Cheers.

Unit tests issues when running phpunit init

sudo php admin/tool/phpunit/cli/init.php

....
-->cleaner_users
++ Success ++
PHP Warning:  asort() expects parameter 1 to be array, null given in /var/www/vanilla-moodle/local/datacleaner/cleaner/completion/settings.php on line 55

Warning: asort() expects parameter 1 to be array, null given in /var/www/vanilla-moodle/local/datacleaner/cleaner/completion/settings.php on line 55
PHP Warning:  asort() expects parameter 1 to be array, null given in /var/www/vanilla-moodle/local/datacleaner/cleaner/courses/settings.php on line 49

Warning: asort() expects parameter 1 to be array, null given in /var/www/vanilla-moodle/local/datacleaner/cleaner/courses/settings.php on line 49
PHP Warning:  asort() expects parameter 1 to be array, null given in /var/www/vanilla-moodle/local/datacleaner/cleaner/completion/settings.php on line 55

Warning: asort() expects parameter 1 to be array, null given in /var/www/vanilla-moodle/local/datacleaner/cleaner/completion/settings.php on line 55
PHP Warning:  asort() expects parameter 1 to be array, null given in /var/www/vanilla-moodle/local/datacleaner/cleaner/courses/settings.php on line 49

Warning: asort() expects parameter 1 to be array, null given in /var/www/vanilla-moodle/local/datacleaner/cleaner/courses/settings.php on line 49

PHPUnit test environment setup complete.

Admin user can't be deleted after Moodle instance has been cleaned

Hi,

This bug report is targeted at the latest version of the plugin published on https://moodle.org/plugins/pluginversion.php?id=12918 and may be superseded by recent commits here in Github.

We have a Moodle instance which is cleaned with these settings:

// Enable and configure local_datacleaner sub-plugins
$CFG->forced_plugin_settings = array(
    'cleaner_core' => array(
        'enabled' => 1,
        'deletemucfile' => 1),
    // Delete users to fulfil data protection obligations.
    'cleaner_delete_users' => array(
        'enabled' => 1,
        'minimumage' => '0',
        'keepsiteadmins' => 1),
    // Delete courses to fulfil data protection obligations.
    'cleaner_courses' => array(
        'enabled' => 1,
        'minimumage' => '0'));

As a result, all courses and all users (except admins) are deleted.
Now, if we go to /admin/user.php and want to delete one of the remaining admin accounts, there may (not necessarily with all user accounts, but reproducibly with multiple user accounts) be an error message that the user account can't be deleted:

[Screenshot of the error message, 2018-01-29 21:31:09]

Add possibility to add a manual admin user to cleaner_delete_users

Hi,

This feature request is targeted at the latest version of the plugin published on https://moodle.org/plugins/pluginversion.php?id=12918 and may be superseded by recent commits here in Github.

After (optionally) deleting really all users from one of our washing box instances, we would like to use local_datacleaner to add a manual (=manually authenticated) admin user who can access the cleaned Moodle instance to do his stuff. The credentials of this admin user should be configurable within the local_datacleaner settings.

This is requested because

  • there may be users who should be able to play with a cleaned Moodle instance as admins, but don't have admin permissions in production.
  • there may be reasons not to re-use the existing accounts of these users, for example because you un-hooked the cleaned Moodle instance from LDAP.

Thanks,
Alex

Installation issue

Attempting to install the plugin on a recent 3.0+ site. I left all the settings at their default values. After pressing the Save changes button, I get a screen with just the header "New settings - Cleanup sitedata" and the save button. There is no way to navigate cleanly away from this page, so I end up in a loop.

This may or may not be an issue with the Moodle core code that detects whether new settings were added. This may or may not relate to the fact that (some) settings should use the proper frankenstyle prefix/name.

Make it so that the actual data washing must be enabled from config.php only

I was thinking about the risks and you are right in the description that they are quite scary. It is just too easy to lose the context and perform the fatal operations on the live site in these situations. Good to see checking mechanisms in place to prevent accidental execution on the production site.

What about adding yet another one - the actual data washing would be executed only if a special flag was enabled via config.php. And it must be physically in config.php only, not in the database. Something like

if (empty($CFG->config_php_settings['local_datacleaner_allowexecution'])) {
    cli_error('Execution not allowed here');
}

url_replace is doing lots of redundant time expensive work

  • don't ever replace on a column which isn't a text or varchar or similar
  • add extra debugging so that we can see the timings for each column + table, rather than just table granularity. We want to be able to see exactly which calls are expensive
  • change the 23 / 443 progress to the total number of columns being replaced not tables

Plugin causes issue on plugin overview page

After installing the plugin, here is what shows on plugin overview page and the page is not accessible.

Coding error detected, it must be fixed by a programmer: Cannot call moodle_page::add_body_class after output has been started.
Debug info: 
Error code: codingerror
Stack trace:
line 1141 of /lib/pagelib.php: coding_exception thrown
line 966 of /lib/pagelib.php: call to moodle_page->add_body_class()
line 1535 of /lib/pagelib.php: call to moodle_page->set_course()
line 642 of /lib/pagelib.php: call to moodle_page->initialise_theme_and_output()
line 773 of /lib/pagelib.php: call to moodle_page->magic_get_theme()
line 1601 of /admin/renderer.php: call to moodle_page->__get()
line 332 of /admin/renderer.php: call to core_admin_renderer->plugins_control_panel()
line 216 of /admin/plugins.php: call to core_admin_renderer->plugin_management_page()
Coding error detected, it must be fixed by a programmer: block_manager has already loaded the blocks, to it is too late to change things that might affect which blocks are visible.
Debug info: 
Error code: codingerror
Stack trace:
line 867 of /lib/blocklib.php: coding_exception thrown
line 388 of /lib/blocklib.php: call to block_manager->check_not_yet_loaded()
line 417 of /lib/blocklib.php: call to block_manager->add_region()
line 1912 of /lib/outputlib.php: call to block_manager->add_regions()
line 1543 of /lib/pagelib.php: call to theme_config->setup_blocks()
line 642 of /lib/pagelib.php: call to moodle_page->initialise_theme_and_output()
line 773 of /lib/pagelib.php: call to moodle_page->magic_get_theme()
line 54 of /lib/classes/output/mustache_template_finder.php: call to moodle_page->__get()
line 110 of /lib/classes/output/mustache_template_finder.php: call to core\output\mustache_template_finder::get_template_directories_for_component()
line 54 of /lib/classes/output/mustache_filesystem_loader.php: call to core\output\mustache_template_finder::get_template_filepath()
line 99 of /lib/mustache/src/Mustache/Loader/FilesystemLoader.php: call to core\output\mustache_filesystem_loader->getFileName()
line 82 of /lib/mustache/src/Mustache/Loader/FilesystemLoader.php: call to Mustache_Loader_FilesystemLoader->loadFile()
line 619 of /lib/mustache/src/Mustache/Engine.php: call to Mustache_Loader_FilesystemLoader->load()
line 160 of /lib/outputrenderers.php: call to Mustache_Engine->loadTemplate()
line 2908 of /lib/outputrenderers.php: call to renderer_base->render_from_template()
line 2825 of /lib/outputrenderers.php: call to core_renderer->notification()
line 387 of /lib/setuplib.php: call to core_renderer->fatal_error()
line ? of unknownfile: call to default_exception_handler()

Concerns about the handling of the original wwwroot

Hi,

I have some concerns about the handling of the original wwwroot setting in this plugin. These concerns are targeted at the latest version of the plugin published on https://moodle.org/plugins/pluginversion.php?id=12918 and may be superseded by recent commits here in Github.

  • "[...] Note that you MUST visit the DataCleaner config page to save the current wwwroot, or the cleaner will not run later in the other environments."
    • Well, what's the DataCleaner config page? It's /local/datacleaner/index.php, but you won't know without looking at the code.
    • The DataCleaner config page stores the original wwwroot automatically when the page is visited. But this page does not output any information that the original wwwroot has now been stored when you visit the page.

  • There is no possibility to check that the original wwwroot is set correctly and / or to check which value it has without looking at the database.

  • The original wwwroot is stored in a Moodle core variable on $CFG->original_wwwroot in the mdl_config table. As far as I know, plugin settings should rather go to mdl_config_plugins.

  • The original wwwroot is set every time you visit the DataCleaner config page. This also happens when you visit the DataCleaner config page in the washing box Moodle instance. I think this is unexpected.

I would propose to describe crystal clearly in the README and on the DataCleaner config page how the original wwwroot setting works and what the admin has to know and do to make it work.

Thanks,
Alex

Security check for online users may be too strict

Hi,

This problem report is targeted at the latest version of the plugin published on https://moodle.org/plugins/pluginversion.php?id=12918 and may be superseded by recent commits here in Github.

If you reset a Moodle production instance which constantly has some active users to a washing box instance, and your resetting script is really quick, you might find that the local_datacleaner run refuses to work because there are still "recently active" users in the database.

Looking at the code, I saw that local_datacleaner's security check checks if there were users active in the past 5 mins. So we could overcome this security check by adding a sleep of 3 minutes to our resetting script, but this is a rather cumbersome solution.

If you have any idea to prevent a false alarm of the security, I would be grateful.

Thanks,
Alex

Concatenate operator not supported by AMOS yet

FYI, the AMOS parser at lang.moodle.org does not support the concatenation operator as used in your lang file (e.g. line 33 and others). As a consequence, your plugin's strings cannot be registered with AMOS at the moment. Sorry for that annoyance. In the meantime, we generally recommend avoiding the operator in the string files and simply putting each string into one long string (line length auto-checks are excluded for lang files).

Scramble user data fails with database error

Hi,

While testing this plugin I get the following error:

== Running users cleaner ==
Scrambling the data of 6755 users.
Scrambling user data 0% (0/168875) !!! Error writing to database !!!
Potential coding error - existing temptables found when disposing database. Must be dropped!

When I turned on debug mode I saw it was trying to run the following query which seems to have an issue with the FROM clause:

UPDATE mdl_user u
SET firstname = mdl_temp_table.firstname,firstnamephonetic =mdl_temp_table.firstnamephonetic,alternatename = mdl_temp_table.alternatename
FROM mdl_temp_table
WHERE (1 + (u.id % 83)) = mdl_temp_table.id AND u.id IN (?,?,?...)

Plugin version: 2016051801
Moodle version: 2016052300.05 (3.1+)
MySQL version: 5.6.27

I would love to use this feature, let me know if you need any more info.

Thanks for the great plugin,
Mikhail

The env matrix should use the admin settings api to save data instead of directly calling set_config

ie if we mutate a saml setting we should not need to do stuff like this:

https://github.com/Peterburnett/SAMLrefresh/blob/master/SAML_refresh.php#L17-L20

So I think we need to:

  1. test if the admin tree contains an admin setting with the same name as what we are cleaning
  2. if it exists then instantiate an admin setting object and call its methods to set and save the data. If there are exceptions then this is something we need to fix in the prod matrix data. (see below, just call admin_write_settings($data);) If this happens then we still need to either hard fail, or we should still set the data via set_config so there is no chance that prod data remains. But a hard failure is more correct because there might be prod data still saved in a secondary location which needs to be cleaned. Either way we need some noisy error logs
  3. If there is no admin setting tree object then just call set_config as we are now

Investigate skipping tables in restore step to save time

There is a large variety of tables that we want to restore the schema of, but not the data. Depending on the DB we can do this outside the moodle code, eg read this:

http://stackoverflow.com/questions/37038193/exclude-table-during-pg-restore

So not sure about the exact architecture here, but I think the rough steps are:

  • create generic restore scripts (and maybe backup??) inside this plugin so that we have that under source control and this plugin is taking control of a little more of the process
  • where applicable / possible we create the list of tables to restore data from (it seems we need a white list not a black list). We have a chicken and egg problem here as we don't yet have the DB restored. So maybe we parse the install.xml files and the excluded tables are hard coded? Or maybe because this is slowly changing, we have a cli script which generates the restore script based on current config, which you then copy as-is to the washing instance and run there.

Add setting to remove only big files to cleaner_sitedata

Hi,

This feature request is targeted at the latest version of the plugin published on https://moodle.org/plugins/pluginversion.php?id=12918 and may be superseded by recent commits here on GitHub.

We would like to use local_datacleaner to remove only big files (i.e. which are more than a configurable number of MB in size) in one of our washing box instances.

On /admin/settings.php?section=cleaner_sitedata, we can on the one hand select certain file types to be replaced. On the other hand, Moodle knows about the size of stored files in mdl_files. So there is the data and the mechanism to realize this hopefully quite quickly.
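
As a rough illustration of the size check (a Python sketch, not plugin code), the selection could work on records carrying the `filesize` (bytes) and `contenthash` columns of mdl_files. The function name and the `min_size_mb` setting are hypothetical.

```python
def select_big_files(file_records, min_size_mb):
    """Return the records whose size is above the threshold.

    file_records -- iterable of dicts with 'contenthash' and 'filesize' (bytes),
                    mirroring columns of the mdl_files table
    min_size_mb  -- hypothetical admin setting: threshold in megabytes
    """
    threshold = min_size_mb * 1024 * 1024
    seen = set()
    selected = []
    for rec in file_records:
        # Several rows can share one contenthash, so report each blob once.
        if rec["filesize"] > threshold and rec["contenthash"] not in seen:
            seen.add(rec["contenthash"])
            selected.append(rec)
    return selected
```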

Thanks,
Alex

Protect tables from course cascade deletion.

When cleaner_courses is enabled, the cascade deletion removes rows from some important tables, which can break parts of the site.

One example would be removing rows from the table {my_pages} which would result in not being able to view the page at $CFG->wwwroot/my
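
One way to picture the protection is a skip list applied before the cascade runs. The following Python sketch is illustrative only: the function name is hypothetical, and the protected set contains just the `my_pages` example from this issue.

```python
PROTECTED_TABLES = {"my_pages"}  # assumption: only the example from this issue


def plan_cascade_delete(candidate_tables, protected=PROTECTED_TABLES):
    """Split cascade targets into tables to clean and tables to skip."""
    to_clean = [t for t in candidate_tables if t not in protected]
    skipped = [t for t in candidate_tables if t in protected]
    return to_clean, skipped
```

Returning the skipped tables as well makes it easy to report what was left untouched in the cleaner output.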

Totara 12 - failing unit test on cleaner_config_test

Getting following error when running unit tests on Totara 12:

  1. cleaner_config_test::test_cleaner_config_getwhere
    Property 'names' defined in 'cleaner_config_test' was not reset after the test!
    Please either find a way to avoid using a class variable or make sure it get's unset in the tearDown method to avoid creating memory leaks.

/var/www/clients/totara/lib/phpunit/classes/advanced_testcase.php:521
/var/www/clients/totara/lib/phpunit/classes/advanced_testcase.php:105

To re-run:
vendor/bin/phpunit cleaner_config_test local/datacleaner/cleaner/config/tests/config_test.php

Make clear what the core cleaner does

Hi,

I would like to know what the core cleaner does in this plugin. This question is targeted at the latest version of the plugin published on https://moodle.org/plugins/pluginversion.php?id=12918 and may be superseded by recent commits here on GitHub.

When you visit /admin/settings.php?section=cleaner_core, you can enable a "core cleaner" which isn't described in any way on the settings page. The README doesn't say anything about this core cleaner either on https://github.com/catalyst/moodle-local_datacleaner#cleanup-core.

If you enable it and run local_datacleaner, you can read:

=== DRY RUN ===
== Running core cleaner ==

Would truncate 12 tables.

Looking at the code on https://github.com/catalyst/moodle-local_datacleaner/blob/master/cleaner/core/classes/clean.php#L48, you will then see that there are some tables cleaned by this cleaner.

I would be grateful if you could make clear which tables are cleaned by this cleaner and why, ideally on the settings page and in the README.

Thanks,
Alex

Totara 12 - failing unit test on cleaner_email_test

In Totara 12, the following unit test is failing:

  1. cleaner_email_test::test_cleaner_email_suffix_append
    Property 'config' defined in 'cleaner_email_test' was not reset after the test!
    Please either find a way to avoid using a class variable or make sure it get's unset in the tearDown method to avoid creating memory leaks.

/var/www/site/lib/phpunit/classes/advanced_testcase.php:521
/var/www/site/lib/phpunit/classes/advanced_testcase.php:105

To re-run:
vendor/bin/phpunit cleaner_email_test local/datacleaner/cleaner/email/tests/cleaner_email_test.php

Add possibility to disable a scheduled task with local_datacleaner

Hi,

This feature request is targeted at the latest version of the plugin published on https://moodle.org/plugins/pluginversion.php?id=12918 and may be superseded by recent commits here on GitHub.

We would like to use local_datacleaner to disable a scheduled task in our washing box instances. The washing box instances have cron enabled on purpose and they have $CFG->noemailever = true set in config.php on purpose. This generally works fine. However, the auth_ldap\task\sync_task is configured to run every night in our production instance and would also run in the washing box instance.

There isn't a mechanism in Moodle core to disable or configure a scheduled task via config.php (like $CFG->foo for forcing core settings and $CFG->forced_settings for forcing plugin settings). That's why we would be grateful if local_datacleaner would offer the ability to disable and / or reconfigure a scheduled task when a washing box instance is cleaned.

I am aware that disabling and / or reconfiguring a scheduled task with local_datacleaner would not prevent an admin from re-enabling / re-configuring the scheduled task later. To force a scheduled task's configuration, an addition to Moodle core would be necessary.

Thanks,
Alex

User data cleaner

I cannot run the user cleaner. It keeps running in 'dryrun' mode ... or am I doing something wrong?
I am using v2.3.5

Both say: 'Would ...'

Please see the screenshots.
-- screenshot 1: DryRun Mode
[image: dryrun_mode]

-- screenshot 2: Run Mode
[image: run_mode]

Invasive test output

When running unit tests, the plugin outputs a large amount of "Replacing URLs" information, which is not particularly useful for a high level view of the unit tests for a Moodle instance.

Request: include a check so that this output is turned off when unit tests are running, to avoid cluttering and distorting the unit test output.

Example:
[image: replacing_urls]

cleaner removed items from config_log breaking future upgrades

After running data cleaner we attempted a major upgrade of the LMS (from 3.1 to 3.5) and found that the cleaner's default settings had removed unused plugin default settings from config_log, which stores all plugin configuration defaults.

As the upgrader checks these defaults when it runs, it failed because the settings no longer existed (even though they weren't in use). To recover the upgrade we had to import the last known copy of the config_log table.

Create site backup, restore, refresh and clean commands inside this plugin

  • There are lots of ways to backup and restore a db, it would be good to collect best practice around this and put it inside this plugin and under source control.
  • it needs to be the same regardless of db, so the same script should detect the dbtype in config.php and do whatever it should. Ideally we should do this in php land using define('ABORT_AFTER_CONFIG', true); instead of trying to parse the php file. There is a crusty bash script (linked below) which does this in a poor man's way.
  • it should be smart and auto detect all the db stuff from config.php when creating the backup, and likewise when restoring a backup. ie you should be able to have a different db name in a new instance and it will just do what it's supposed to.
  • unlike the script below it should make as few assumptions around the environment as possible, ie not assume it has sudo, not assume the postgres user is called something in particular

https://github.com/brendanheywood/scripts/blob/master/mdl-backup
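
To illustrate the "same script regardless of db" idea, here is a minimal Python sketch (not the plugin's implementation) that picks the dump command from the values found in config.php. The function name is hypothetical, the handled dbtype values are limited to `pgsql`, `mysqli` and `mariadb`, and how the values are read from config.php is left out.

```python
def build_dump_command(dbtype, dbhost, dbname, dbuser, outfile):
    """Return the argv list for dumping the database described by config.php.

    The arguments correspond to $CFG->dbtype, dbhost, dbname and dbuser; how they
    are read out of config.php is outside this sketch. The password is left to
    the tools' own mechanisms (for example ~/.pgpass or ~/.my.cnf).
    """
    if dbtype == "pgsql":
        # -Fc writes the custom archive format that pg_restore understands.
        return ["pg_dump", "-h", dbhost, "-U", dbuser,
                "-Fc", "-f", outfile, dbname]
    if dbtype in ("mysqli", "mariadb"):
        return ["mysqldump", "-h", dbhost, "-u", dbuser,
                "--single-transaction", "--result-file=" + outfile, dbname]
    raise ValueError("Unsupported dbtype: %s" % dbtype)
```

Building an argv list rather than a shell string avoids quoting problems, and nothing here assumes sudo or a particular postgres user name.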

Create a 'dataset' cleaner which can manage a matrix of config data for different env's

This is a pseudo poor man's CMDB, but the difference is that the kind of config that would be managed here is managed by the client, vs more technical or confidential stuff which can be managed at the deployment level

I think this cleaner could be partially dependent on the envbar which already knows a list of environments.

  • Then this would generate a table with a column for every env with prod on the left. Each row would be 1 config item.

  • Under the table would be a form which is simply a search, if you type in anything it returns a list of all config items whose module, name or value matches, you can then select which ones you want to manage and they get added to the table above.

  • Now that they are added to the table above, you can edit the values for the other env's, they all default to empty.

  • So now when this cleaner runs, it will just pick up the current wwwroot and then replace all the config items depending on what it is. If the env does not match one of the configured env's then just delete the config as a default safe behavior.

  • In many setups there will be a cleaning box which does the bulk of the work of stripping out and reducing the dataset size and then producing a clean db dump. This would then be restored into potentially many environments. So make a simple cli script for calling this cleaner directly in isolation so that after restore into each env it can reset the values again to what they should be in this env.

  • It is a possibility that each env should not see each other env's data. Generally this would only mean non-prod can't see prod, but possibly we will want tighter restrictions. So in the standalone cli also have a flag to reset the values to what they should be, and then drop all the datacleaner env matrix config so there is no way to see the other env's config.
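
The lookup described in these bullets could look roughly like the Python sketch below. It is an illustration under assumptions, not the plugin's data model: the function name and the two dictionaries are hypothetical, and a known env with no value entered gets an empty string, following the "default to empty" rule above.

```python
def resolve_matrix(matrix, env_by_wwwroot, wwwroot):
    """Work out what to do with each managed config item for this site.

    matrix         -- {config_name: {env_name: value}}, one row per config item
    env_by_wwwroot -- {wwwroot: env_name}, e.g. as known to the envbar
    wwwroot        -- wwwroot of the site being cleaned
    Returns (to_set, to_delete).
    """
    env = env_by_wwwroot.get(wwwroot)
    to_set, to_delete = {}, []
    for name, values in matrix.items():
        if env is None:
            # Unknown env: delete rather than keep a prod value.
            to_delete.append(name)
        else:
            # Known env: use its value, which defaults to empty.
            to_set[name] = values.get(env, "")
    return to_set, to_delete
```

A standalone cli could call this after each restore, apply `to_set` and `to_delete`, and then optionally drop the matrix itself so other envs' values are no longer visible.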
