Describe the bug
When I run pynonymizer and rely on its process control system, it does not go through the default process control flow. As I understand pynonymize.py, it is supposed to run the steps below, starting with CREATE_DB:
logger.info(actions.summary(ProcessSteps.CREATE_DB))
if not actions.skipped(ProcessSteps.CREATE_DB):
    db_provider.create_database()
logger.info(actions.summary(ProcessSteps.RESTORE_DB))
if not actions.skipped(ProcessSteps.RESTORE_DB):
    db_provider.restore_database(input_path)
logger.info(actions.summary(ProcessSteps.ANONYMIZE_DB))
if not actions.skipped(ProcessSteps.ANONYMIZE_DB):
    db_provider.anonymize_database(strategy)
logger.info(actions.summary(ProcessSteps.DUMP_DB))
if not actions.skipped(ProcessSteps.DUMP_DB):
    db_provider.dump_database(output_path)
logger.info(actions.summary(ProcessSteps.DROP_DB))
if not actions.skipped(ProcessSteps.DROP_DB):
    db_provider.drop_database()
In practice, when I run it as follows, it does not create the database and fails.
To Reproduce
Issue1:
pynonymizer.run(input_path="main_sys.sql", strategyfile_path="strategy_file1.yaml",
                db_host='< host >', db_name='main_sys', db_password='<password>', output_path='main_sys_anonymized.sql')
Does this imply that it did not run CREATE_DB by default and instead ran RESTORE_DB first? The logs show the restore progress, followed by the error Table 'main_sys.admins' doesn't exist.
So I tried to start explicitly from the CREATE_DB step, as shown in Issue2 below.
Error log:
mysql: [Warning] Using a password on the command line interface can be insecure.
Restoring: 100%|██████████| 233k/233k [00:00<00:00, 658kB/s]
["UPDATE `user` SET `first_name` = ('hello'),`last_name` = ('test');"]
["UPDATE `user` SET `first_name` = ('hello'),`last_name` = ('test');"]
Anonymizing user: 0%| | 0/1 [00:00<?, ?it/s] mysql: [Warning] Using a password on the command line interface can be insecure.
ERROR 1146 (42S02) at line 1: Table 'main_sys.admins' doesn't exist
Anonymizing user: 0%| | 0/1 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/Users/test/Documents/tools/pnonymizer/mask.py", line 3, in <module>
    pynonymizer.run(input_path="main_sys.sql", strategyfile_path="strategy.yaml",
  File "/Users/test/Documents/DataProcessor/venv/lib/python3.9/site-packages/pynonymizer/pynonymize.py", line 147, in pynonymize
    db_provider.anonymize_database(strategy)
  File "/Users/test/Documents/venv/lib/python3.9/site-packages/pynonymizer/database/mysql/__init__.py", line 159, in anonymize_database
    self.__runner.db_execute(statements)
  File "/Users/test/Documents/venv/lib/python3.9/site-packages/pynonymizer/database/mysql/execution.py", line 131, in db_execute
    self.__mask_subprocess_error(error)
  File "/Users/test/Documents/venv/lib/python3.9/site-packages/pynonymizer/database/mysql/execution.py", line 81, in __mask_subprocess_error
    raise error from None
  File "/Users/test/Documents/venv/lib/python3.9/site-packages/pynonymizer/database/mysql/execution.py", line 124, in db_execute
    subprocess.check_output(
  File "/usr/local/Cellar/python@3.9/3.9.9/Frameworks/Python.framework/Versions/3.9/lib/python3.9/subprocess.py", line 424, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
  File "/usr/local/Cellar/python@3.9/3.9.9/Frameworks/Python.framework/Versions/3.9/lib/python3.9/subprocess.py", line 528, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['mysql', '-h', '127.0.0.1', '-P', '3306', '-u', 'test', '-p******']' returned non-zero exit status 1.
Issue2:
pynonymizer.run(input_path="main_sys.sql", strategyfile_path="strategy_file1.yaml",
                db_host='< host >', db_name='main_sys', db_password='<password>', output_path='main_sys_anonymized.sql',
                start_at_step='CREATE_DB')
When I passed start_at_step='CREATE_DB' to pynonymizer.run() to make sure the database gets created and to avoid the error above, I got the error below. It suggests that pynonymizer is attempting to run RESTORE_DB before CREATE_DB, even though CREATE_DB is supposed to run first.
Error log:
Restoring: 100%|██████████| 307k/307k [00:00<00:00, 1.13MB/s]
Anonymizing user: 0%| | 0/1 [00:00<?, ?it/s] mysql: [Warning] Using a password on the command line interface can be insecure
["UPDATE `user` SET `last_name` = ( 'test' );"]
ERROR 1146 (42S02) at line 1: Table 'main_sys.admins' doesn't exist
Anonymizing user: 0%| | 0/1 [00:00<?, ?it/s]
Traceback (most recent call last):
    subprocess.check_output(
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/subprocess.py", line 415, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/subprocess.py", line 516, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['mysql', '-h', '127.0.0.1', '-P', '3306', '-u', 'root', '-p******']' returned non-zero exit status 1.
Can you please advise how I can achieve the basic process flow from the Python script, i.e.:
1. CREATE_DB
2. RESTORE_DB
3. ANONYMIZE_DB
4. DUMP_DB
Thank you @rwnx
The only way I am able to use the tool step by step is to pass only_step, calling pynonymizer.run once for each of the values below:
CREATE_DB, RESTORE_DB, ANONYMIZE_DB, DUMP_DB
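The per-step workaround above can be sketched as a small loop. This is only a sketch: run_step_by_step and STEPS are hypothetical names introduced here, and it assumes pynonymizer.run accepts only_step together with the other keyword arguments used in this issue on every call. The run callable is passed in so the loop can be tried with a stand-in function first.

```python
from typing import Callable, List, Sequence

# Step names as used in this issue, in the order I expect them to run.
STEPS: Sequence[str] = ("CREATE_DB", "RESTORE_DB", "ANONYMIZE_DB", "DUMP_DB")


def run_step_by_step(run: Callable[..., object],
                     steps: Sequence[str] = STEPS,
                     **kwargs) -> List[str]:
    """Call `run` once per step, passing only_step each time.

    `run` is expected to be pynonymizer.run (assumption: it accepts
    only_step alongside the other keyword arguments). An exception in
    any step stops the loop, so later steps never see a half-built DB.
    """
    completed: List[str] = []
    for step in steps:
        run(only_step=step, **kwargs)
        completed.append(step)
    return completed


# Intended usage (not executed here; needs pynonymizer and a MySQL server):
#
#   import pynonymizer
#   run_step_by_step(pynonymizer.run,
#                    input_path="main_sys.sql",
#                    strategyfile_path="strategy_file1.yaml",
#                    db_host="< host >", db_name="main_sys",
#                    db_password="<password>",
#                    output_path="main_sys_anonymized.sql")
```

This only automates the workaround; it does not explain why a single call does not appear to run CREATE_DB first.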
Expected behavior
As per the documentation and code, it should go through the steps in the order below as the default process control behaviour, i.e.:
logger.info(actions.summary(ProcessSteps.CREATE_DB))
if not actions.skipped(ProcessSteps.CREATE_DB):
    db_provider.create_database()
logger.info(actions.summary(ProcessSteps.RESTORE_DB))
if not actions.skipped(ProcessSteps.RESTORE_DB):
    db_provider.restore_database(input_path)
logger.info(actions.summary(ProcessSteps.ANONYMIZE_DB))
if not actions.skipped(ProcessSteps.ANONYMIZE_DB):
    db_provider.anonymize_database(strategy)
logger.info(actions.summary(ProcessSteps.DUMP_DB))
if not actions.skipped(ProcessSteps.DUMP_DB):
    db_provider.dump_database(output_path)
logger.info(actions.summary(ProcessSteps.DROP_DB))
if not actions.skipped(ProcessSteps.DROP_DB):
    db_provider.drop_database()
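To make the expectation concrete, here is a minimal model of the step selection I would expect from start_at_step and only_step. It is my own sketch, not pynonymizer's actual implementation: the Step enum and steps_to_run are hypothetical names, and the real actions.skipped logic may differ.

```python
from enum import IntEnum
from typing import List, Optional


class Step(IntEnum):
    # Ordered as in the pynonymize.py excerpt quoted in this issue.
    CREATE_DB = 1
    RESTORE_DB = 2
    ANONYMIZE_DB = 3
    DUMP_DB = 4
    DROP_DB = 5


def steps_to_run(start_at: Optional[Step] = None,
                 only: Optional[Step] = None) -> List[Step]:
    """Return the steps I would expect to execute, in order.

    - only: run exactly that one step.
    - start_at: run that step and every later one.
    - neither: run all steps, beginning with CREATE_DB.
    """
    if only is not None:
        return [only]
    first = start_at if start_at is not None else Step.CREATE_DB
    return [step for step in Step if step >= first]
```

Under this model, both the default call and start_at_step='CREATE_DB' begin with CREATE_DB, which is the behaviour I expected in Issue1 and Issue2.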
Additional context