Comments (6)

theprestigedog commented on August 23, 2024

Brilliant, cheers Jerome. This sorts our use case out perfectly.

Cracking Python package you've put together here.

from pynonymizer.

rwnx commented on August 23, 2024

To expand on this a little, I'd like to understand the problem being solved here. pynonymizer uses Faker to generate the seed data for anonymization, but I'd say the exact way it behaves (and its subsequent binding to Faker) is more of an implementation detail.

To be clear though, I support the case for adding more custom-provided generators. Adding support for third-party Faker providers raises implementation questions like:

  • Will this only cover published, packaged Faker providers? e.g. https://pypi.org/project/faker-microservice/
  • If it is packaged providers, how do we depend on the packages (peer dependencies, sub-dependencies, etc.)?
  • Alternatively, if the user specifies them, where do they do it (at the CLI or in the strategy file)?

Faker's CLI docs seem to indicate the following usage: https://github.com/joke2k/faker/#command-line-usage

-i {my.custom_provider other.custom_provider} list of additional custom providers to use. Note that is the import path of the package containing your Provider class, not the custom Provider class itself.

Now I'm not against that specifically, but I'd be happier knowing that we were adding a feature that enables extensibility in general and doesn't need to track Faker specifically. I'm curious to hear your thoughts here!
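
To illustrate the "import path of the package, not the class" convention quoted above, here is a minimal sketch of how a tool could resolve such a path. It is an assumption about the general approach, not Faker's or pynonymizer's actual code; `load_provider_class` and `my_custom_provider` are hypothetical names.

```python
import importlib
import sys
import types


def load_provider_class(import_path: str) -> type:
    """Import the module at `import_path` and return its class named `Provider`."""
    module = importlib.import_module(import_path)
    try:
        return module.Provider
    except AttributeError as exc:
        raise ImportError(
            f"{import_path!r} does not define a class named Provider"
        ) from exc


# Stand-in for a real file on the import path (hypothetical module name).
demo_module = types.ModuleType("my_custom_provider")


class Provider:
    """Placeholder provider; a real one would subclass Faker's BaseProvider."""

    def greeting(self) -> str:
        return "hello"


demo_module.Provider = Provider
sys.modules["my_custom_provider"] = demo_module

# The caller passes the module path; the class name `Provider` is implied.
provider_class = load_provider_class("my_custom_provider")
```

The point of the convention is that the user only names a module, so the same mechanism works for packaged providers and for a local file, as long as the module is importable.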

theprestigedog commented on August 23, 2024

Thanks for the response Jerome.

The problem we're trying to solve is that we have a large database with some non-traditional data located inside a single column. The format of the data looks something like this:

Firstname Lastname <[email protected]>

From what I've seen from the pynonymizer package, there doesn't seem to be a way to filter this column using the strategy.yml to provide randomised rows based on this format. I could well be wrong here though.

If that's the case, it seems the easiest way for me to achieve this might be to create a specialised Faker provider that I could hopefully point pynonymizer toward and be able to utilise it via a specific fake_type in the strategy.yml.
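
A minimal sketch of what such a specialised provider might look like, assuming Faker's standard `first_name`, `last_name` and `safe_email` generators. The method name `name_with_email` is hypothetical, and the class is called `Provider` on the assumption that the loader looks it up by that name. The fallback base class is only there so the sketch stays importable where Faker is not installed.

```python
try:
    from faker.providers import BaseProvider
except ImportError:
    # Fallback for environments without Faker: mirrors the one thing this
    # sketch relies on, namely that the provider keeps a `generator`.
    class BaseProvider:
        def __init__(self, generator):
            self.generator = generator


class Provider(BaseProvider):
    """Generates values shaped like `Firstname Lastname <email>`."""

    def name_with_email(self) -> str:
        # self.generator exposes the formatters of every registered provider.
        first = self.generator.first_name()
        last = self.generator.last_name()
        email = self.generator.safe_email()
        return f"{first} {last} <{email}>"
```

With Faker installed, `fake = Faker()` followed by `fake.add_provider(Provider)` should make `fake.name_with_email()` available, which is the kind of name a `fake_type` could then refer to.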

Happy for you to tell me there's a much easier way here though!

rwnx commented on August 23, 2024

There was some relevant discussion in #62 about combined fields from faker fields and data consistency. It doesn't look like it'll fit with this use case but it might help indicate where we're at.

As far as dirty workarounds go, you could reference data in the seed table in a literal, i.e. by concatenating stuff together. But it's gross and definitely brittle. 😅

I agree that the ability to use custom generators might be the answer here. It'll also bake in a code interface to custom data formats, and I think that can only be a good thing. I'll take a look at this as a feature and update here.

rwnx commented on August 23, 2024

I've added this feature in #75, which should release in 1.21.0.

rwnx commented on August 23, 2024

OK, so this is out with v1.21.0! You can check out the docs to see what the expected usage is, but take a look at the MySQL integration test as well, since I've based it on the use case here:

a custom provider: https://github.com/jerometwell/pynonymizer/blob/master/tests_integration/mysql/custom_provider.py
referencing that provider in the strategyfile: https://github.com/jerometwell/pynonymizer/blob/master/tests_integration/mysql/sakila.yml

Could you review and close the issue if it's resolved? Otherwise we can continue the discussion here 😇
