ppomes / myanon Goto Github PK
View Code? Open in Web Editor NEWA mysqldump anonymizer
Home Page: https://ppomes.github.io/myanon/
License: Other
A mysqldump anonymizer
Home Page: https://ppomes.github.io/myanon/
License: Other
Given the following SQL:
DROP TABLE IF EXISTS `the_blobs`;
CREATE TABLE `the_blobs` (
`blob1` blob,
`blob2` tinyblob,
`blob3` mediumblob,
`blog4` longblob
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;
INSERT INTO `the_blobs` VALUES (
0x68656c6c6f,
0x68656c6c6f,
0x68656c6c6f,
0x68656c6c6f
);
And the following myanon configuration:
# Config file for test1.sql
secret = 'lapin'
stats = 'no'
tables = {
`the_blobs` = {
`blob1` = fixed '0x0000000000'
}
}
When I run build/main/myanon -f tests/test1.conf < tests/test1.sql
, I should expect to see:
DROP TABLE IF EXISTS `the_blobs`;
CREATE TABLE `the_blobs` (
`blob1` blob,
`blob2` tinyblob,
`blob3` mediumblob,
`blog4` longblob
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;
INSERT INTO `the_blobs` VALUES (
0x0000000000,
0x68656c6c6f,
0x68656c6c6f,
0x68656c6c6f
);
But I actually see:
DROP TABLE IF EXISTS `the_blobs`;
CREATE TABLE `the_blobs` (
`blob1` blob,
`blob2` tinyblob,
`blob3` mediumblob,
`blog4` longblob
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;
INSERT INTO `the_blobs` VALUES (
0x0000000000x68656c6c6f,
0x68656c6c6f,
0x68656c6c6f,
0x68656c6c6f
);
Dump parsing error at line 7: syntax error - Unexpected []
Would be nice to get some feedback if a column specified in the config file didn't exist rather than silently ignore... had misspelled one column so had the potential to leak PII data (which we are trying to avoid).
It would be a really nice feature for myself if it could parse a data-only dump (mysqldump --no-create-info
). I fiddled with it for a while and couldn't get it to work.
An easy workaround for now is to process the entire dump, and then just remove the create statements from the resulting script. This is a great little tool, thank you!
Given a field like id
and a field like username
, I'd like to keep id
the same but set username
to user<id>
.
I imagine something like:
tables = {
`people` = {
`id` = texthash 10
`username` = sql CONCAT('user', id);
}
}
Consider this a feature request. Thanks :)
When anonymizing a column that contains both NULL and non-NULL values the NULL values are hashed. I would expect only the non-null values to be hashed and the NULL rows to maintain their current NULL value.
I've attached a tar.gz with an example:
On line 48 of null-example-anonymized.sql you can see that rows 3 & 5 hash the NULL value to 'ahavykafkojauwmdriqpohobuuttmiif'. I would expect those values to remain NULL.
null-example.tar.gz
Please update the documentation for Ubuntu to include the following:
sudo apt-get install build-essential
Once the user runs that step on a fresh Ubuntu 20.04 server the ./configure process runs without error.
Is there an official myanon docker image on Docker Hub or elsewhere?
Can't parse set, like this:
CREATE TABLE some_table
(
some_field
int NOT NULL AUTO_INCREMENT,
some_field
timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
some_field
timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
some_field
int NOT NULL,
some_field
int NOT NULL,
some_field
decimal(28,8) NOT NULL,
flags
set('vat_included') CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NOT NULL DEFAULT '',
some_field
int NOT NULL,
fails with the error:
Dump parsing error at line 8: Unable to read table definition - Unexpected [s]
Could you fix it please?
we have a rule
``key = fixed '0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF'
64 chars, going into a varchar(64) column, but the last character is being truncated, which then effects all our other data. If we try and extend this, we get an error as > 64 characters. Really love a fix :) or push of new code if you have any
When using the mysqldump flag --insert-ignore
myanon will not anonymize anything.
Would be helpful to support it.
https://dev.mysql.com/doc/refman/8.0/en/mysqldump.html#option_mysqldump_insert-ignore
Using SQL:
CREATE TABLE `test_with_column_names` (
`a` int(10) unsigned NOT NULL
) ENGINE=InnoDB;
INSERT INTO `test_with_column_names` (`a`) VALUES (1);
And configuration:
# Config file for test1.sql
secret = 'lapin'
stats = 'no'
tables = {
`test_with_column_names` = {
`a` = inthash 2
}
}
Gives a syntax error, running with myanon -d
, the output is:
main/myanon -d -f tests/test1.conf
CREATE TABLE `test_with_column_names` (
`a` int(10) unsigned NOT NULL
) ENGINE=InnoDB;
INSERT INTO `test_with_column_names` (FOUND TABLE `test_with_column_names`
ENTERING STATE ST_TABLELOOKING FOR `test_with_column_names`:`a`
ENTERING STATE INITIALFOUND TABLE `test_with_column_names`
ENTERING STATE ST_VALUES
Dump parsing error at line 3: syntax error - Unexpected [(]
Process finished with exit code 1
After running this a few times, on different environments, I've noted that my first "texhhash" username is always the same random value - seems like need to add some randomisation onto the base?
I have field data in JSON Arrays and simple coma separated list in string, right now whole field is anonymized which "destroys" the array format of it.
Got some areas where I have to use fixed to set the value, but looks like if this was null it still replaces it with fixed (I believe?) text hash only seems to replace when it's not null, so maybe need a fixed leave null flag / option? fixed "1234567" true (not default, don't replace null)
The process now that generates random hash data works but it would be useful if we could pipe to a faker data generator to be able to take the production data and replace with test data that is human readable.
Given the following config:
# Config file for test1.sql
secret = 'lapin'
stats = 'no'
tables = {
`lottypes` = {
`int1` = inthash 2
`int2` = fixed '9'
`datetime1` = fixed '1970-01-01 12:00:00'
`text1` = texthash 5
`text2` = fixed null
`blob1` = fixed 'hello'
`blob2` = texthash 5
# `blob3` = fixed '\'hi\''
}
}
When I run
build/main/myanon -f tests/test1.conf < tests/test1.sql
I expect to see see
INSERT INTO `lottypes` VALUES (... ,'hello','migez', ...);
But I actually see
INSERT INTO `lottypes` VALUES (... ,hello,migez, ...);
I tried quoting/escaping (by uncommenting the config line for blob3
) , but I received the error:
Config parsing error at line 14: Syntax error - Unexpected [h]
A declarative, efficient, and flexible JavaScript library for building user interfaces.
๐ Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. ๐๐๐
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google โค๏ธ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.