xmaster6y / mulsi

🔏 Safe inference using representation engineering.

License: MIT License

Makefile 0.65% Python 99.26% Shell 0.09%

mulsi's Introduction

MUltimodal Llm Safe Inference

Safe inference using representation engineering.

mulsi's Issues

[EXPERIMENT] Detect concepts using images

Description

Detect concepts activated by the attack using images instead of text when building contrast vectors.

Acceptance Criterion

The code is documented
Utilities and class tests are written
The code was reviewed

Tasks

Select concepts to work on.
Gather a set of images for each selected concept.
Build the contrast vectors.
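
The last task can be sketched as follows, assuming a difference-of-means construction, which is one common way to build a contrast vector and not necessarily the one used in this repository. The function names `contrast_vector` and `concept_score` are hypothetical, and the synthetic arrays stand in for hidden states that would be extracted from the model for concept images and baseline images.

```python
import numpy as np


def contrast_vector(concept_acts: np.ndarray, baseline_acts: np.ndarray) -> np.ndarray:
    """Unit-norm difference between the mean concept and mean baseline activations."""
    diff = concept_acts.mean(axis=0) - baseline_acts.mean(axis=0)
    norm = np.linalg.norm(diff)
    if norm == 0.0:
        raise ValueError("concept and baseline activations have identical means")
    return diff / norm


def concept_score(activation: np.ndarray, vector: np.ndarray) -> float:
    """Projection of a single activation onto the contrast vector."""
    return float(activation @ vector)


# Synthetic stand-ins for model activations (assumption: shape is
# (number of images, hidden size), one row per image).
rng = np.random.default_rng(0)
direction = np.zeros(16)
direction[3] = 1.0  # the planted "concept" direction
baseline_acts = rng.normal(size=(32, 16))
concept_acts = rng.normal(size=(32, 16)) + 4.0 * direction

vector = contrast_vector(concept_acts, baseline_acts)
```

An activation recorded during an attack can then be scored with `concept_score(activation, vector)`; a higher projection suggests the concept is more strongly activated.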

Extend concept reading with datasets

Description

Extend concept reading with text datasets.

Acceptance Criterion

The code is documented
Utilities and class tests are written
The code was reviewed

Tasks

Collect datasets.
Compute dataset representation with class Representation.
Extend script to evaluate on a dataset.
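
A minimal sketch of the evaluation step, assuming the dataset has already been turned into an activation matrix with binary labels. It does not use the repository's `Representation` class, whose interface is not described here; `auroc` and `evaluate_reading` are hypothetical helpers, and AUROC is only one possible metric.

```python
import numpy as np


def auroc(scores, labels) -> float:
    """Probability that a random positive outscores a random negative.

    Ties count for one half. Computed by comparing every positive/negative pair.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    if len(pos) == 0 or len(neg) == 0:
        raise ValueError("need at least one positive and one negative example")
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return float((greater + 0.5 * ties) / (len(pos) * len(neg)))


def evaluate_reading(activations, labels, vector) -> float:
    """Score every example by projecting it on the reading vector, then compute AUROC."""
    scores = np.asarray(activations, dtype=float) @ np.asarray(vector, dtype=float)
    return auroc(scores, labels)


# Toy dataset (assumption): four examples with 2-dimensional activations.
activations = np.array([[2.0, 0.0], [1.5, 1.0], [0.1, 3.0], [-1.0, 0.5]])
labels = np.array([1, 1, 0, 0])
result = evaluate_reading(activations, labels, vector=[1.0, 0.0])
```

The pairwise comparison is quadratic in the dataset size, which is fine for a sketch; a rank-based implementation would scale better on large datasets.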

[EXPERIMENT] Expand the list of models to evaluate

Description

Expand the list of models to attack and to evaluate the detection method on. The goal is to verify the effectiveness of the detection method.
If possible, also check whether patterns emerge when different models are attacked with different methods.

Acceptance Criterion

The code is documented
Utilities and class tests are written
The code was reviewed

Tasks

List models and build processors.
Attack the models and perform detection.
Analyze results.

Attack Scripts

Description

Make some scripts and modules to easily reproduce attacks.

Acceptance Criterion

The code is documented
Utilities and tests are written
The code was reviewed

Tasks

Create a basic FGSM attack.
Create a script to scale the attack.
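
For reference, a basic FGSM step perturbs the input by `epsilon` in the direction of the sign of the loss gradient with respect to the input. The sketch below applies it to a small logistic-regression model so that the gradient can be written by hand; the model, weights, and input are illustrative assumptions, and the repository's attack would target a multimodal LLM through an autograd framework instead.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fgsm(x, y, w, b, epsilon):
    """One FGSM step against the model p = sigmoid(w . x + b).

    For binary cross-entropy, the gradient of the loss with respect to the
    input is (p - y) * w, so no autograd is needed in this toy setting.
    """
    p = sigmoid(x @ w + b)
    grad_x = (p - y) * w
    x_adv = x + epsilon * np.sign(grad_x)
    # Keep the result in the valid input range (assumption: inputs lie in [0, 1]).
    return np.clip(x_adv, 0.0, 1.0)


# Toy model and input (assumptions made for illustration only).
w = np.array([2.0, -1.0, 0.5])
b = -0.5
x = np.array([0.6, 0.2, 0.4])
y = 1.0  # true label

x_adv = fgsm(x, y, w, b, epsilon=0.3)
```

In this example the clean input is classified as class 1 and the perturbed input as class 0, while no coordinate moves by more than `epsilon`.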

Contrast Reading Script

Description

Make some scripts and modules to easily scale reading.

Acceptance Criterion

The code is documented
Utilities and tests are written
The code was reviewed

Tasks

Create a basic contrast reading.
Create a script to scale the reading.
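
One possible form of a basic contrast reading, sketched under the assumption that activations come in pairs (the same prompt with and without the concept): take the top singular direction of the paired differences. The function name `reading_vector` and the synthetic data are hypothetical, and the repository may compute its reading differently.

```python
import numpy as np


def reading_vector(pos_acts: np.ndarray, neg_acts: np.ndarray) -> np.ndarray:
    """Top right-singular vector of the (uncentered) paired differences."""
    diffs = pos_acts - neg_acts
    # Rows of vt are unit-norm directions sorted by decreasing singular value.
    _, _, vt = np.linalg.svd(diffs, full_matrices=False)
    direction = vt[0]
    # The sign of a singular vector is arbitrary: orient it so that the
    # positive examples project higher than the negative ones on average.
    if (diffs @ direction).mean() < 0:
        direction = -direction
    return direction


# Synthetic paired activations (assumption): the concept shifts coordinate 5.
rng = np.random.default_rng(1)
concept = np.zeros(8)
concept[5] = 1.0
neg_acts = rng.normal(size=(20, 8))
pos_acts = neg_acts + 3.0 * concept + 0.1 * rng.normal(size=(20, 8))

vector = reading_vector(pos_acts, neg_acts)
```

The differences are left uncentered on purpose: when every pair shifts in roughly the same direction, centering would remove that shared shift and leave only noise.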

Add BERT-Attack on text modality

Description

Attack the text modality with BERT-Attack. The attack was originally developed for BERT, so it must be adapted to the targeted LLM.

Acceptance Criterion

The code is documented
Utilities and class tests are written
The code was reviewed

Tasks

Develop the attack.
Write a test script for the attack.
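
The general shape of such an attack can be sketched as a greedy word substitution: rank words by how much the victim's score drops when each one is masked, then replace the most important words first until the decision flips. This is a simplified illustration and not the full method: BERT-Attack proposes replacements with a masked language model and handles sub-word tokens, whereas here the candidates come from a hypothetical dictionary and the victim is a toy keyword scorer.

```python
from typing import Callable, Dict, List


def substitution_attack(
    tokens: List[str],
    score: Callable[[List[str]], float],
    candidates: Dict[str, List[str]],
    threshold: float = 0.5,
    mask_token: str = "[MASK]",
) -> List[str]:
    """Greedy word substitution that attacks the most important words first."""
    base = score(tokens)
    # Importance of a position = drop in score when that word is masked.
    importance = [
        base - score(tokens[:i] + [mask_token] + tokens[i + 1:])
        for i in range(len(tokens))
    ]
    order = sorted(range(len(tokens)), key=lambda i: importance[i], reverse=True)

    adversarial = list(tokens)
    for i in order:
        best, best_score = adversarial[i], score(adversarial)
        for candidate in candidates.get(tokens[i], []):
            trial = adversarial[:i] + [candidate] + adversarial[i + 1:]
            trial_score = score(trial)
            if trial_score < best_score:
                best, best_score = candidate, trial_score
        adversarial[i] = best
        if best_score < threshold:
            break  # the victim's decision has flipped
    return adversarial


# Toy victim and candidate table (both hypothetical, for illustration only).
WEIGHTS = {"great": 0.4, "love": 0.3, "good": 0.2, "like": 0.05, "fine": 0.05}
CANDIDATES = {"great": ["good", "fine"], "love": ["like"]}


def toy_score(tokens: List[str]) -> float:
    """Pseudo-probability of the 'positive' class from keyword weights."""
    return min(1.0, 0.1 + sum(WEIGHTS.get(t, 0.0) for t in tokens))


original = "i love this great movie".split()
adversarial = substitution_attack(original, toy_score, CANDIDATES)
```

Adapting this to a targeted LLM mainly means swapping `toy_score` for the model's own output score and replacing the candidate table with model-generated substitutions.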
