Great code and paper, thanks. Is it suitable for a classification framework?

Is it suitable for a classification framework? (seq2pat issue, closed)

fidelity commented on June 18, 2024

Is it suitable for a classification framework?

from seq2pat.

Comments (7)

skadio commented on June 18, 2024

Exactly!

This is the approach presented in our Dichotomic Pattern Mining (DPM) framework; see Frontiers'2022 and AAAI'22.

If you look at the Quick Start Example in the Readme, you can see that we can mine patterns from POSITIVE and NEGATIVE outcomes. The nice thing is that you can constrain the POS/NEG mining models independently of each other. One can then look at frequent patterns that are unique to pos/neg, common to both, or their union. This is exactly what the dichotomic_pattern_mining() method returns.
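To illustrate the four aggregations mentioned above (unique to pos, unique to neg, in common, union), here is a minimal sketch using plain Python sets. It is not the seq2pat implementation: the function name `aggregate_patterns` and the example patterns are made up for illustration, and it assumes the frequent patterns for each outcome have already been mined.

```python
# Illustrative sketch only: plain sets stand in for mined pattern lists.
# aggregate_patterns and the example patterns are hypothetical.

def aggregate_patterns(pos_patterns, neg_patterns):
    """Combine frequent patterns mined separately from positive and negative data."""
    pos = {tuple(p) for p in pos_patterns}
    neg = {tuple(p) for p in neg_patterns}
    return {
        "unique_pos": pos - neg,    # frequent only in positive sequences
        "unique_neg": neg - pos,    # frequent only in negative sequences
        "intersection": pos & neg,  # frequent in both groups
        "union": pos | neg,         # frequent in at least one group
    }

pos_patterns = [["A", "B"], ["A", "C"], ["B", "D"]]
neg_patterns = [["A", "B"], ["C", "D"]]

result = aggregate_patterns(pos_patterns, neg_patterns)
for name, patterns in result.items():
    print(name, sorted(patterns))
```

Each aggregation yields a different candidate feature set, so the choice between them can be treated as a modeling decision.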

Then, we can take these patterns and "re-encode" the sequences as one-hot binary encodings denoting whether each sequence exhibits the frequent patterns found.

This reverse encoding process of turning sequences into feature vectors is actually quite complicated. That's why we provide the Pattern2Feature() functionality and the get_features() method.

These 0-1 encoded vectors can be used for downstream machine learning tasks for classification, as in your example.
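As a rough picture of what such 0-1 vectors look like, the sketch below marks a sequence with 1 when a pattern occurs in it as an ordered, possibly non-contiguous subsequence. This is a simplification under stated assumptions: it ignores any mining constraints, and the helpers `contains_pattern` and `encode` are hypothetical names, not the library's API.

```python
# Simplified sketch: constraints are ignored and all names are hypothetical.

def contains_pattern(sequence, pattern):
    """True if pattern's items appear in sequence in order (gaps allowed)."""
    remaining = iter(sequence)
    # "item in remaining" consumes the iterator up to the first match,
    # so later items must be found after earlier ones.
    return all(item in remaining for item in pattern)

def encode(sequences, patterns):
    """One row per sequence, one 0/1 column per pattern."""
    return [[1 if contains_pattern(s, p) else 0 for p in patterns]
            for s in sequences]

sequences = [["A", "B", "C", "D"], ["B", "A", "D"], ["C", "D"]]
patterns = [("A", "D"), ("B", "C"), ("C", "D")]

features = encode(sequences, patterns)
for row in features:
    print(row)
# prints [1, 1, 1], then [1, 0, 0], then [0, 0, 1]
```

The resulting rows are fixed-length vectors, so they could be passed to any tabular classifier together with the pos/neg labels.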

Our papers use this Dichotomic Pattern Mining framework to classify digital behavior, e.g., for intent prediction and intruder detection.

The cool/novel thing is that, thanks to this transformation, one can apply standard ML algorithms (e.g., XGBoost) that are not designed for sequential data to sequences. It turns out that this works quite well and is competitive with LSTMs, RNNs, etc., while remaining "interpretable".

I will add more annotations to the Quick Start DPM example; there is also a corresponding DPM notebook that you can use.

Hope this helps!

Serdar


Sandy4321 commented on June 18, 2024

Great, thanks.
It would be nice to add an XGBoost example, since the paper does not include one.
An end-to-end example would help to understand exactly what needs to be done.


Sandy4321 commented on June 18, 2024

For example, in real data, sequences may be very long, say 1000 tokens. Some limit is then needed to keep tokens located very far from each other out of the same pattern-building process. For example, if token a is located very far from token b in a given sequence, then a and b should not jointly influence the pattern calculations...


Sandy4321 commented on June 18, 2024

Then one can look at frequent patterns that are unique to pos/neg,

BUT let's say pattern ABC
occurs 1234 times in the positive data
and 5 times in the negative data.

Then why eliminate ABC from classifier tuning?


takojunior commented on June 18, 2024

For example, in real data, sequences may be very long, say 1000 tokens. Some limit is then needed to keep tokens located very far from each other out of the same pattern-building process. For example, if token a is located very far from token b in a given sequence, then a and b should not jointly influence the pattern calculations...

Yes, the limitation you describe can be applied with a maximum span constraint on the indices of the items in a pattern. We have added this as a default constraint, and users can set max_span to control the span limit when initializing seq2pat, see this line. Conversely, users may also set a minimum span to explore patterns whose items are spread further apart.
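The idea behind a maximum span on item indices can be sketched as follows. This is only an illustration, not how seq2pat implements it: the function `occurs_within_span` is hypothetical, and it assumes "span" means the index of the last matched item minus the index of the first, with the limit treated as inclusive.

```python
# Illustrative sketch; occurs_within_span is hypothetical, and the span
# definition (last index minus first index, inclusive limit) is an assumption.

def occurs_within_span(sequence, pattern, max_span):
    """True if pattern occurs in order with (last index - first index) <= max_span."""
    if not pattern:
        return True
    for start, item in enumerate(sequence):
        if item != pattern[0]:
            continue
        pos = start
        matched = True
        # Greedily match each remaining item as early as possible, which
        # gives the smallest span for this starting position.
        for target in pattern[1:]:
            try:
                pos = sequence.index(target, pos + 1)
            except ValueError:
                matched = False
                break
        if matched and pos - start <= max_span:
            return True
    return False

# A 1000-token sequence where "a" and "b" sit at opposite ends.
long_sequence = ["a"] + ["x"] * 998 + ["b"]

print(occurs_within_span(long_sequence, ("a", "b"), max_span=10))   # False
print(occurs_within_span(long_sequence, ("a", "b"), max_span=999))  # True
```

With a small limit, the distant pair a, b no longer counts as an occurrence of the pattern, which is the behavior asked for in the question.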


takojunior commented on June 18, 2024

Then one can look at frequent patterns that are unique to pos/neg,

BUT let's say pattern ABC occurs 1234 times in the positive data and 5 times in the negative data.

Then why eliminate ABC from classifier tuning?

Thanks @Sandy4321 for the question. This is basically hinting that the ABC pattern may be frequent in neither the positive nor the negative group, yet its occurrence counts are quite different, so there might still be value in including the pattern in the classifier. This might need further analysis when we compare patterns between the pos/neg groups. I agree that a sufficiently significant difference between a pattern's occurrences in the two groups might contribute to the classifier. But if the patterns are not frequent enough, the process would suffer from too many arbitrary patterns and/or try too hard to capture noise, which we also want to avoid.
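To make the trade-off concrete, the sketch below places a pattern into an aggregation group from its occurrence counts and per-group frequency thresholds. The function `classify_pattern` and the threshold of 100 are assumptions chosen for illustration; the counts 1234 and 5 come from the question above.

```python
# Illustrative sketch; classify_pattern and the thresholds are hypothetical.

def classify_pattern(pos_count, neg_count, min_freq_pos, min_freq_neg):
    """Place a pattern into an aggregation group based on frequency thresholds."""
    frequent_pos = pos_count >= min_freq_pos
    frequent_neg = neg_count >= min_freq_neg
    if frequent_pos and frequent_neg:
        return "intersection"
    if frequent_pos:
        return "unique_pos"
    if frequent_neg:
        return "unique_neg"
    return "not_frequent"

# ABC from the question: 1234 occurrences in positive, 5 in negative.
print(classify_pattern(1234, 5, min_freq_pos=100, min_freq_neg=100))  # unique_pos

# A pattern that differs between groups but is rare in both.
print(classify_pattern(40, 5, min_freq_pos=100, min_freq_neg=100))    # not_frequent
```

Under these assumed thresholds, ABC is frequent only in the positive group and therefore lands in the unique-to-positive set, while the second pattern is dropped because it is rare in both groups, which is the noise concern raised in the reply.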


Sandy4321 commented on June 18, 2024

Sure
Thanks

