coderwzw / arlib
An open-source framework for conducting data poisoning attacks on recommendation systems, designed to assist researchers and practitioners.
Thanks for your repo. It helps me a lot.
I found that the item to be attacked is given a rating of 1.0 in /ARLib/attack/Black/RandomAttack.py:
def posionDataAttack(self):
    uNum = self.fakeUserNum
    row, col, entries = [], [], []
    for i in range(uNum):
        # fillerItemid = random.sample(set(range(self.itemNum)) - set(self.targetItem),
        #                              self.maliciousFeedbackNum - len(self.targetItem))
        fillerItemid = random.sample(set(range(self.itemNum)) - set(self.targetItem),
                                     self.maliciousFeedbackNum)
        row += [i for r in range(len(fillerItemid + self.targetItem))]
        col += fillerItemid + self.targetItem
        entries += [1 for r in range(len(fillerItemid + self.targetItem))]
    fakeRat = csr_matrix((entries, (row, col)), shape=(uNum, self.itemNum), dtype=np.float32)
    return vstack([self.interact, fakeRat])
I assume this means that the attacked items are given a rating of 1.0.
However, I found that all items in the raw data are also given a rating of 1.0 in /ARLib/util/DataLoader.py:
def __create_sparse_interaction_matrix(self):
    """
    return a sparse adjacency matrix with the shape (user number, item number)
    """
    row, col, entries = [], [], []
    for pair in self.training_data:
        row += [self.user[pair[0]]]
        col += [self.item[pair[1]]]
        entries += [1.0]
    interaction_mat = sp.csr_matrix((entries, (row, col)), shape=(self.user_num,self.item_num),dtype=np.float32)
    return interaction_mat
Hence, our poisoned training data became this:
I'm wondering why the raw data is copied into the poisoned data with every rating rewritten to 1.0 instead of keeping its original rating. For example in clean/train.txt:
In clean/train.txt, there are various ratings such as 1, 2, 3, 4, and 5, but in the poisoned data there is only 1.0.
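The difference described above can be illustrated with a minimal sketch. This is not ARLib code: `build_triplets`, `keep_ratings`, and the sample data are hypothetical, and the sketch assumes each training record is a `(user, item, rating)` tuple. With `keep_ratings=False` every interaction is stored as 1.0, as in the quoted loader; with `keep_ratings=True` the original rating is kept.

```python
def build_triplets(training_data, user_index, item_index, keep_ratings=False):
    """Build (row, col, entries) lists for a sparse user-item matrix.

    Hypothetical helper, not part of ARLib. Each record in training_data
    is assumed to be a (user, item, rating) tuple.
    """
    row, col, entries = [], [], []
    for user, item, rating in training_data:
        row.append(user_index[user])
        col.append(item_index[item])
        # Implicit feedback stores 1.0; explicit feedback keeps the rating.
        entries.append(float(rating) if keep_ratings else 1.0)
    return row, col, entries


# Made-up sample data for illustration only.
data = [("u1", "i1", 5), ("u1", "i2", 2), ("u2", "i1", 3)]
users = {"u1": 0, "u2": 1}
items = {"i1": 0, "i2": 1}

implicit = build_triplets(data, users, items)
explicit = build_triplets(data, users, items, keep_ratings=True)
```

The three lists can then be passed to `scipy.sparse.csr_matrix((entries, (row, col)), shape=...)` in the same way as the quoted loader does.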
Thanks for your work and your patience. Looking forward to your response!
Thanks for the efforts and the interesting project. When I tried to test the PGA and RAPU_G attacks on the SGL model, I ran into the following errors:
For the PGA attack, I ran into 3 errors:
Line 46 in edc4cff
I tried changing it to s = torch.tensor(self.interact[self.controlledUser, :].toarray()).cuda()
and got the following error.
Line 50 in edc4cff
I tried changing it to ui_adj[self.controlledUser, self.userNum:] = np.array(s.cpu())
and got the following error.
Line 51 in edc4cff
Lines 243 to 245 in edc4cff
Lines 310 to 314 in edc4cff
I tried adding the line
self.sparse_norm_adj = sp.coo_matrix(self.sparse_norm_adj)
after Lines 243 to 244 in edc4cff, but then the training loss, computed via Line 45 in edc4cff, becomes nan:
training: 1 batch 0 rec_loss: nan cl_loss nan
training: 1 batch 100 rec_loss: nan cl_loss nan
evaluating the model...
Progress: [++++++++++++++++++++++++++++++++++++++++++++++++++]100%
Quick Ranking Performance (Top-50 Item Recommendation)
Current Performance
Epoch: 1, Hit Ratio:0.11035349004127042 | Precision:0.014352392065344223 | Recall:0.11574197836148567 | NDCG:0.053679494051791274
Best Performance
Epoch: 1, Hit Ratio:0.11035349004127042 | Precision:0.014352392065344223 | Recall:0.11574197836148567 | NDCG:0.053679494051791274
./util/DataLoader.py:77: RuntimeWarning: divide by zero encountered in power
d_inv = np.power(rowsum, -0.5).flatten()
./util/DataLoader.py:77: RuntimeWarning: divide by zero encountered in power
d_inv = np.power(rowsum, -0.5).flatten()
training: 2 batch 0 rec_loss: nan cl_loss nan
training: 2 batch 100 rec_loss: nan cl_loss nan
evaluating the model...
Progress: [++++++++++++++++++++++++++++++++++++++++++++++++++]100%
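The RuntimeWarning in the log above suggests that some rows of the adjacency matrix have a degree of zero, so `np.power(rowsum, -0.5)` yields `inf`. Whether this is the root cause of the nan loss is not confirmed here. The following is a minimal pure-Python sketch of a symmetric normalization that maps zero-degree nodes to 0 instead of `inf`; the function names and the edge-list representation are assumptions made for illustration and do not come from ARLib.

```python
import math


def inv_sqrt_degrees(rowsum):
    """Return d^(-1/2) per node, mapping zero-degree nodes to 0.0 instead of inf."""
    return [1.0 / math.sqrt(d) if d > 0 else 0.0 for d in rowsum]


def normalize_edges(edges, num_nodes):
    """Symmetrically normalize a weighted edge list: w_ij / sqrt(d_i * d_j).

    edges is a list of (i, j, weight) tuples; an undirected graph is
    assumed to list both directions of every edge.
    """
    rowsum = [0.0] * num_nodes
    for i, _, w in edges:
        rowsum[i] += w
    d_inv = inv_sqrt_degrees(rowsum)
    return [(i, j, d_inv[i] * w * d_inv[j]) for i, j, w in edges]


# Node 2 is isolated (degree 0) and must not produce inf or nan.
edges = [(0, 1, 1.0), (1, 0, 1.0)]
normalized = normalize_edges(edges, num_nodes=3)
safe = inv_sqrt_degrees([0.0, 4.0])
```

In NumPy-based code, a common equivalent is to set the infinite entries of `np.power(rowsum, -0.5)` to zero using `np.isinf` before building the diagonal matrix.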
Also, I tried FedRecAttack, which also calls the recommender.model._init_uiAdj() function. It works without error, so I guess something is wrong in the PGA code, but I could not find the root cause.
For the RAPU_G attack, the "higher" module is not included, which conflicts with
Line 84 in edc4cff
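One way a missing optional dependency like this could be reported more clearly is to check for it before running the attack. The sketch below uses only the standard library; `is_installed` and `require` are hypothetical helpers, not ARLib functions, and the install hint assumes the missing module is the `higher` package as published on PyPI.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be found without importing it."""
    return importlib.util.find_spec(module_name) is not None


def require(module_name, hint):
    """Raise a readable ImportError when an optional dependency is missing."""
    if not is_installed(module_name):
        raise ImportError(f"Missing optional dependency '{module_name}'. {hint}")


# Example check before running an attack that needs the dependency.
# The install command is an assumption about how the package is distributed.
if not is_installed("higher"):
    print("RAPU_G needs 'higher'; try: pip install higher")
```

Calling `require("higher", "Try: pip install higher")` at the start of the attack would then fail early with an explicit message rather than at the import line.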
Moreover, I think it would be awesome if you could provide a table of the attack effectiveness of the included attacks for reference. That way, users could check whether they are using the repository correctly.
Thanks for the code!
Is there any evaluation of the attack result, for example the recommendation rate for the target item? I ran the code and did not find any output for it.
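As an illustration of the kind of metric being asked about, here is a minimal sketch of a target-item hit ratio: the fraction of users whose top-K recommendation list contains at least one target item. The function name, the dictionary layout, and the sample data are hypothetical; this is not ARLib's evaluation code.

```python
def target_hit_ratio(top_k_lists, target_items):
    """Fraction of users whose top-K list contains at least one target item.

    top_k_lists maps each user to a list of recommended item ids.
    """
    if not top_k_lists:
        return 0.0
    targets = set(target_items)
    hits = sum(1 for items in top_k_lists.values() if targets & set(items))
    return hits / len(top_k_lists)


# Made-up recommendations for three users.
recs = {"u1": [3, 7, 9], "u2": [1, 2, 3], "u3": [4, 5, 6]}
ratio = target_hit_ratio(recs, target_items=[9, 5])
```

Comparing this value before and after poisoning is one simple way to gauge whether an attack promoted the target items.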
Hi, thanks for the efforts and the interesting project. When I tried to run some experiments on the ML-1M dataset, I found that the dataset is smaller than the raw data.
After tracing your code, I found that "ARLib/data/clean/ml-1M/split.py" contains a condition that keeps only ratings of 4 or higher.
with open('ratings.dat') as f:
    for line in f:
        items = line.strip().split('::')
        new_line = ' '.join(items[:-1])+'\n'
        if int(items[-2])<4:
            continue
        num=random.random()
        if num > 0.2:
            train.append(new_line)
        elif num > 0.1:
            val.append(new_line)
        else:
            test.append(new_line)
I'm wondering why you do that.
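To make the effect of the threshold easier to inspect, the quoted logic can be sketched as a function with the threshold as a parameter. This is an illustrative rewrite, not the repository's script: `split_ratings`, `min_rating`, and `seed` are hypothetical names, the sample lines are made up, and the sketch assumes the four-field `user::item::rating::timestamp` format.

```python
import random


def split_ratings(lines, min_rating=4, seed=0):
    """Filter and split 'user::item::rating::timestamp' lines."""
    rng = random.Random(seed)  # seeded so the split is reproducible
    train, val, test = [], [], []
    for line in lines:
        fields = line.strip().split("::")
        if int(fields[2]) < min_rating:
            continue  # ratings below the threshold are dropped entirely
        record = " ".join(fields[:3])  # keep user, item, rating; drop timestamp
        num = rng.random()
        if num > 0.2:
            train.append(record)
        elif num > 0.1:
            val.append(record)
        else:
            test.append(record)
    return train, val, test


# Made-up sample lines for illustration only.
sample = ["1::10::5::978300760", "1::11::3::978302109", "2::10::4::978301968"]
train, val, test = split_ratings(sample)
kept = train + val + test
```

Setting `min_rating=1` keeps every record, which shows how much of the raw data the default threshold removes.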
Thanks again for collecting these models and attack methods; it helps me a lot!!