nuclearstar / k-anonymity

148.0 148.0 73.0 1017 KB

Anonymization methods for network security.

License: MIT License

Jupyter Notebook 96.92% Python 3.08%

jupyter-notebook k-anonymity l-diversity python3 t-closeness

k-anonymity's People

rohan2821999 jisangyu rohithavarada tv-ai visakhunnikrishnan reroes ryokugyu hrngok cryptosl apogiatzis sreenivas123456 shukritech ashayustech hahahannes muntazir110 olivier-lakoue lauradgnn firmai-research mazenode liboyuty singh31 kwon-jh wcbdata ccrxf elis0m dukecheung callmax xiaojunzaizai ishu8 xx-fighting ccastel38 harshithrs ragot005 abbasrzaidi alexplinio floatmaster amanrajputsingh dexin-zhang scli-csrg mboumireille nebulaf lpatel29 chamikara1986 hamedmx nquint16 chsafouane eviangel pooja-telavane frank1543179 red-gentlmen purplesmurf45 ricciardi kuisatz econds devonej manojmaurya007 ascidian-ai meshob2002 firequeen-3010 adisaputra10 kumaravinash44 simply-divine nokia-a6 joaolira-br chuckdud mavenor srinivasgonu eksi13 ak216punia jiashunzhang

k-anonymity's Issues

K anonymous

Please add a license and reference to original code

Hiya! I just came across this while looking for k-anonymity implementations, and I saw that you used our notebook from EuroPython 2018 (https://github.com/KIProtect/data-privacy-for-data-scientists). It would be nice if you could add a backlink to it and put the proper license in your code (our code was licensed under MIT). Thank you! If you have questions regarding the implementation, I'm happy to help.

error when running cell 777

    AttributeError Traceback (most recent call last)
    in
    ----> 1 dfn = build_anonymized_dataset(df, finished_partitions, feature_columns, sensitive_column)

    in build_anonymized_dataset(df, partitions, feature_columns, sensitive_column, max_partitions)
         14 grouped_columns = df.loc[partition].agg(aggregations, squeeze=False)
         15 sensitive_counts = df.loc[partition].groupby(sensitive_column).agg({sensitive_column : 'count'})
    ---> 16 values = grouped_columns.iloc[0].to_dict()
         17 for sensitive_value, count in sensitive_counts[sensitive_column].items():
         18 if count == 0:

    AttributeError: 'list' object has no attribute 'to_dict'
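
The traceback shows that `grouped_columns.iloc[0]` is a plain list here, so `.to_dict()` fails. One way to sidestep the dependence on what `agg` returns is to build the per-partition summary by hand. The following is only a minimal sketch, not the notebook's code: `summarize_partition` is a hypothetical helper, the `income` column and the sample records are made up, and it assumes numerical columns are generalized to their mean and categorical columns to a comma-joined set of values.

```python
from collections import Counter
from statistics import mean


def summarize_partition(records, feature_columns, categorical, sensitive_column):
    """Hypothetical helper: one output row per distinct sensitive value."""
    values = {}
    for column in feature_columns:
        column_values = [record[column] for record in records]
        if column in categorical:
            # Assumption: categorical columns become the joined distinct values.
            values[column] = ",".join(sorted(set(column_values)))
        else:
            # Assumption: numerical columns become their mean.
            values[column] = mean(column_values)

    rows = []
    counts = Counter(record[sensitive_column] for record in records)
    for sensitive_value, count in counts.items():
        row = dict(values)  # copy so each row is independent
        row[sensitive_column] = sensitive_value
        row["count"] = count
        rows.append(row)
    return rows


# Made-up records standing in for one partition of the dataset.
partition = [
    {"age": 30, "education-num": 10, "income": "<=50k"},
    {"age": 40, "education-num": 12, "income": "<=50k"},
    {"age": 50, "education-num": 14, "income": ">50k"},
]
rows = summarize_partition(partition, ["age", "education-num"], set(), "income")
```

With pandas, the records for one partition could be obtained with `df.loc[partition].to_dict("records")`, and the rows collected from all partitions passed to `pd.DataFrame`.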

Tried converting this python code to pyspark

I tried converting this Python code to PySpark and am running the same dataset with the PySpark code on an AWS EMR cluster.
For 200 records it took 9 minutes. For 30,000 records it took 22.5 hours. Is there any way to optimise the code? Please help me.
Thanks in advance.
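
As a quick sanity check on those two timings, the per-record cost can be worked out directly. The snippet below only does the arithmetic on the numbers quoted above; it says nothing about where the time is actually spent.

```python
# Reported timings from the post: record count -> runtime in seconds.
timings = {200: 9 * 60, 30_000: 22.5 * 3600}

# Seconds spent per record in each run.
per_record = {n: seconds / n for n, seconds in timings.items()}

for n, cost in per_record.items():
    print(f"{n} records: {cost:.2f} s per record")
```

Both runs come out at the same cost per record, so the runtime appears to grow linearly with the number of records. That would be consistent with a fixed overhead paid per record or per partition, rather than the algorithm itself blowing up, though two data points cannot confirm this.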

Error resolved

Replace line 173 with: values = {'age' : grouped_columns[0], 'education-num' : grouped_columns[0]}
This removes the error "'list' object has no attribute 'to_dict'".

Then, when printing the dataset, you get the error "unhashable type: 'list'".
To avoid it, replace every print of the k-anonymous, l-diverse and t-close datasets (named dfn, dfl and dft) with print(dfn.head())...

After that, the code works perfectly.

issue in cell 776, missing group by of feature columns

    grouped_columns = df.loc[partition].agg(aggregations, squeeze=False)
