coderwzw / arlib

An open-source framework for conducting data poisoning attacks on recommendation systems, designed to assist researchers and practitioners.

Python 100.00%

arlib's Issues

Some questions about the random attack

Thanks for your repo. It helps me a lot.
I found that the targeted items are given a rating of 1.0 in /ARLib/attack/Black/RandomAttack.py:

def posionDataAttack(self):
    uNum = self.fakeUserNum
    row, col, entries = [], [], []
    for i in range(uNum):
        # fillerItemid = random.sample(set(range(self.itemNum)) - set(self.targetItem),
        #                              self.maliciousFeedbackNum - len(self.targetItem))
        fillerItemid = random.sample(set(range(self.itemNum)) - set(self.targetItem),
                                     self.maliciousFeedbackNum)
        row += [i for r in range(len(fillerItemid + self.targetItem))]
        col += fillerItemid + self.targetItem
        entries += [1 for r in range(len(fillerItemid + self.targetItem))]
    fakeRat = csr_matrix((entries, (row, col)), shape=(uNum, self.itemNum), dtype=np.float32)
    return vstack([self.interact, fakeRat])

I take this to mean that the attacked items are given a rating of 1.0.

However, I found that every item in the raw data is also given a rating of 1.0 in /ARLib/util/DataLoader.py:

def __create_sparse_interaction_matrix(self):
    """
    return a sparse adjacency matrix with the shape (user number, item number)
    """
    row, col, entries = [], [], []
    for pair in self.training_data:
        row += [self.user[pair[0]]]
        col += [self.item[pair[1]]]
        entries += [1.0]
    interaction_mat = sp.csr_matrix((entries, (row, col)), shape=(self.user_num, self.item_num), dtype=np.float32)
    return interaction_mat

As a result, the poisoned training data contains only ratings of 1.0.

I'm wondering why the raw data is copied into the poisoned data with every rating rewritten to 1.0 instead of keeping its original value. clean/train.txt contains various ratings such as 1, 2, 3, 4, and 5, but the poisoned data contains only 1.0.
Thanks for your work and your patience. I look forward to your response!
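
For readers comparing the two encodings, here is a minimal sketch of the difference between keeping explicit ratings and binarizing them into implicit feedback (any observed interaction becomes 1.0). Treating the data as implicit feedback is only one possible reading of the loader above, not a confirmed design decision. The function name build_matrix and the toy triples are illustrative assumptions and are not part of ARLib.

```python
import numpy as np
from scipy.sparse import csr_matrix

def build_matrix(triples, n_users, n_items, binarize=True):
    """Build a user-item matrix from (user, item, rating) triples.

    With binarize=True every observed interaction is stored as 1.0
    (implicit feedback); otherwise the original rating is kept.
    """
    rows = [u for u, _, _ in triples]
    cols = [i for _, i, _ in triples]
    vals = [1.0 if binarize else float(r) for _, _, r in triples]
    return csr_matrix((vals, (rows, cols)), shape=(n_users, n_items), dtype=np.float32)

# Hypothetical toy data: (user index, item index, rating)
triples = [(0, 0, 5), (0, 2, 3), (1, 1, 1), (2, 2, 4)]
implicit = build_matrix(triples, 3, 3, binarize=True)   # only 1.0 entries
explicit = build_matrix(triples, 3, 3, binarize=False)  # original ratings kept
```

With binarize=True the matrix carries only the fact that an interaction happened, which matches what the poisoned data looks like.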

Errors with the PGA and RAPU_G attacks

Thanks for the efforts and the interesting project. When I tried to test the PGA and RAPU_G attacks on the SGL model, I ran into the following errors:

For the PGA attack, I hit three errors:

Error: TypeError: sparse matrix length is ambiguous; use getnnz() or shape[0].

ARLib/attack/PGA.py

Line 46 in edc4cff

s = torch.tensor(self.interact[self.controlledUser, :]).cuda()

I changed it to s = torch.tensor(self.interact[self.controlledUser, :].toarray()).cuda() and then hit the following error.
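
The first error can be reproduced without torch. The sketch below assumes the message comes from SciPy refusing len() on a sparse matrix, which is presumably what the tensor constructor triggers; the toy matrix and the controlled list are made up for illustration.

```python
import numpy as np
from scipy.sparse import csr_matrix

interact = csr_matrix(np.array([[1, 0, 1],
                                [0, 1, 0],
                                [1, 1, 0]], dtype=np.float32))
controlled = [0, 2]  # hypothetical indices of controlled users

rows = interact[controlled, :]  # row selection still returns a sparse matrix
try:
    len(rows)                   # SciPy sparse matrices do not define a length
    length_error = None
except TypeError as exc:
    length_error = str(exc)

dense_rows = rows.toarray()     # dense ndarray, one row per controlled user
```

Converting with toarray() yields an ordinary array of shape (number of controlled users, number of items), which is why the next error is about shapes rather than types.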

Error: ValueError: could not broadcast input array from shape (9,1412) into shape (942,1412)

ARLib/attack/PGA.py

Line 50 in edc4cff

ui_adj[:self.userNum, self.userNum:] = np.array(s.cpu())

I changed it to ui_adj[self.controlledUser, self.userNum:] = np.array(s.cpu()) and then hit the following error.
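
The shape mismatch in the second error follows from NumPy's assignment rules: a block with one row per controlled user cannot be broadcast into a slice that spans all users. The sketch below uses toy sizes and a hypothetical controlled array. It only illustrates the shape rule and does not establish which assignment the PGA code intends.

```python
import numpy as np

user_num, item_num = 6, 4        # toy sizes (the issue reports 942 users, 1412 items)
controlled = np.array([1, 4])    # hypothetical controlled-user indices
s = np.ones((len(controlled), item_num), dtype=np.float32)

ui_adj = np.zeros((user_num + item_num, user_num + item_num), dtype=np.float32)

# Writing into all user rows fails: shape (2, 4) cannot broadcast to (6, 4).
try:
    ui_adj[:user_num, user_num:] = s
    broadcast_failed = False
except ValueError:
    broadcast_failed = True

# Writing only into the controlled users' rows matches the shape of s.
ui_adj[controlled, user_num:] = s
```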

Error:
  File "./attack/PGA.py", line 51, in posionDataAttack
    recommender.model._init_uiAdj(ui_adj + ui_adj.T)
  File "./recommend/SGL.py", line 245, in _init_uiAdj
    self.sparse_norm_adj = TorchGraphInterface.convert_sparse_mat_to_tensor(self.sparse_norm_adj).cuda()
  File "./recommend/SGL.py", line 334, in convert_sparse_mat_to_tensor
    coo = X.tocoo()
AttributeError: 'numpy.ndarray' object has no attribute 'tocoo'

ARLib/attack/PGA.py

Line 51 in edc4cff

recommender.model._init_uiAdj(ui_adj + ui_adj.T)

ARLib/recommend/SGL.py

Lines 243 to 245 in edc4cff

    
self.sparse_norm_adj = sp.diags(np.array((1 / np.sqrt(ui_adj.sum(1)))).flatten()) @ ui_adj @ sp.diags(
    np.array((1 / np.sqrt(ui_adj.sum(0)))).flatten())
self.sparse_norm_adj = TorchGraphInterface.convert_sparse_mat_to_tensor(self.sparse_norm_adj).cuda()

ARLib/recommend/SGL.py

Lines 310 to 314 in edc4cff

    
def convert_sparse_mat_to_tensor(X):
    coo = X.tocoo()
    i = torch.LongTensor([coo.row, coo.col])
    v = torch.from_numpy(coo.data).float()
    return torch.sparse.FloatTensor(i, v, coo.shape)

I then tried adding the line

self.sparse_norm_adj = sp.coo_matrix(self.sparse_norm_adj)

after

ARLib/recommend/SGL.py

Lines 243 to 244 in edc4cff

self.sparse_norm_adj = sp.diags(np.array((1 / np.sqrt(ui_adj.sum(1)))).flatten()) @ ui_adj @ sp.diags(
    np.array((1 / np.sqrt(ui_adj.sum(0)))).flatten())

but then the training loss produced by the call at

ARLib/attack/PGA.py

Line 45 in edc4cff

grad = recommender.train(requires_adjgrad=True, Epoch=self.attackEpoch)

becomes nan:

training: 1 batch 0 rec_loss: nan cl_loss nan
training: 1 batch 100 rec_loss: nan cl_loss nan
evaluating the model...
Progress: [++++++++++++++++++++++++++++++++++++++++++++++++++]100%
Quick Ranking Performance (Top-50 Item Recommendation)
Current Performance
Epoch: 1, Hit Ratio:0.11035349004127042 | Precision:0.014352392065344223 | Recall:0.11574197836148567 | NDCG:0.053679494051791274
Best Performance
Epoch: 1, Hit Ratio:0.11035349004127042 | Precision:0.014352392065344223 | Recall:0.11574197836148567 | NDCG:0.053679494051791274
./util/DataLoader.py:77: RuntimeWarning: divide by zero encountered in power
d_inv = np.power(rowsum, -0.5).flatten()
./util/DataLoader.py:77: RuntimeWarning: divide by zero encountered in power
d_inv = np.power(rowsum, -0.5).flatten()
training: 2 batch 0 rec_loss: nan cl_loss nan
training: 2 batch 100 rec_loss: nan cl_loss nan
evaluating the model...
Progress: [++++++++++++++++++++++++++++++++++++++++++++++++++]100%

I also tried FedRecAttack, which calls recommender.model._init_uiAdj() as well. It works without errors, so I guess something is wrong in the PGA code, but I could not find the root cause.
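
One plausible but unconfirmed source of the nan values is the divide-by-zero warning in the log: a user or item with zero degree gives an infinite inverse square root, and multiplying infinity by zero yields nan. The dense sketch below illustrates that effect and a common guard (replacing infinite scaling factors with 0). The function sym_normalize is a hypothetical helper, not ARLib code, and it uses dense NumPy arrays rather than the sparse matrices in the repository.

```python
import numpy as np

def sym_normalize(adj, safe=True):
    """Return D_r^{-1/2} A D_c^{-1/2} for a dense adjacency matrix.

    With safe=True, zero-degree rows and columns get a scaling factor
    of 0 instead of infinity, so they stay all-zero after normalization.
    """
    row_deg = adj.sum(axis=1)
    col_deg = adj.sum(axis=0)
    with np.errstate(divide="ignore"):
        r_inv = 1.0 / np.sqrt(row_deg)
        c_inv = 1.0 / np.sqrt(col_deg)
    if safe:
        r_inv[np.isinf(r_inv)] = 0.0
        c_inv[np.isinf(c_inv)] = 0.0
    with np.errstate(invalid="ignore"):
        return np.diag(r_inv) @ adj @ np.diag(c_inv)

# Toy graph in which node 2 has no edges (degree 0).
adj = np.array([[0.0, 1.0, 0.0],
                [1.0, 0.0, 0.0],
                [0.0, 0.0, 0.0]])
unsafe = sym_normalize(adj, safe=False)  # contains nan entries
guarded = sym_normalize(adj, safe=True)  # finite everywhere
```

If the adjacency matrix passed to _init_uiAdj contains all-zero rows or columns, checking for them before normalization would be a quick way to confirm or rule out this explanation.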

For the RAPU_G attack, the "higher" module is not included, which conflicts with

ARLib/attack/RAPU_G.py

Line 84 in edc4cff

import higher

Moreover, I think it would be awesome if you could provide a reference table of the attack effectiveness of the included attacks. That way, users could check whether they are using the repository correctly.

Is the poisoning attack evaluation missing?

Thanks for the code!

I wonder whether there is any evaluation of the attack result, for example the recommendation rate of the target item. I ran the code and did not find any such output.
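
As an illustration of the kind of metric meant here, the sketch below computes a hit ratio for target items: the fraction of users whose top-K recommendation list contains at least one target item. The function target_hit_ratio, the user ids, and the recommendation lists are hypothetical; this is not a metric taken from ARLib's code.

```python
def target_hit_ratio(top_k_lists, target_items):
    """Fraction of users whose top-K list contains at least one target item."""
    targets = set(target_items)
    if not top_k_lists:
        return 0.0
    hits = sum(1 for items in top_k_lists.values() if targets & set(items))
    return hits / len(top_k_lists)

# Hypothetical top-3 recommendations per user, with item 7 as the target.
recs = {
    "u1": [7, 2, 3],
    "u2": [4, 5, 6],
    "u3": [1, 7, 9],
    "u4": [2, 3, 4],
}
ratio = target_hit_ratio(recs, [7])
```

Comparing this value before and after poisoning would show whether the attack actually promotes the target item.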

Why is the ml-1m data filtered to ratings >= 4 only?

Hi, thanks for the efforts and the interesting project. When I tried to run some experiments on the ML-1m dataset, I found that the dataset is smaller than the raw data.

After tracing your code, I found that "ARLib/data/clean/ml-1M/split.py" contains a check that keeps only ratings of 4 or higher.

with open('ratings.dat') as f:
    for line in f:
        items = line.strip().split('::')
        new_line = ' '.join(items[:-1])+'\n'
        if int(items[-2])<4:
            continue
        num=random.random()
        if num > 0.2:
            train.append(new_line)
        elif num > 0.1:
            val.append(new_line)
        else:
            test.append(new_line)

I'm wondering why you do that.
Thanks again for collecting these models and attack methods. It helps me a lot!
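
To make the effect of the filter concrete, here is a small self-contained sketch of the same logic: ratings below a threshold are dropped, and the remaining lines are split at random into roughly 80% train, 10% validation, and 10% test. The function split_ratings, its min_rating and seed parameters, and the three sample lines are illustrative assumptions. The user::item::rating::timestamp layout is the standard ML-1M ratings.dat format.

```python
import random

def split_ratings(lines, min_rating=4, seed=0):
    """Drop ratings below min_rating, then split the rest about 80/10/10."""
    rng = random.Random(seed)
    train, val, test = [], [], []
    for line in lines:
        fields = line.strip().split("::")      # user::item::rating::timestamp
        if int(fields[-2]) < min_rating:       # fields[-2] is the rating
            continue
        record = " ".join(fields[:-1]) + "\n"  # drop the timestamp
        draw = rng.random()
        if draw > 0.2:
            train.append(record)
        elif draw > 0.1:
            val.append(record)
        else:
            test.append(record)
    return train, val, test

sample = ["1::10::5::978300760", "1::11::3::978302109", "2::10::4::978301968"]
train, val, test = split_ratings(sample)
kept = train + val + test  # the rating-3 line is filtered out
```

Setting min_rating=1 in this sketch keeps every rating, which is how the full dataset size could be recovered for comparison.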

