michael-rapp / apriori
A Java implementation of the Apriori algorithm for finding frequent item sets and (optionally) generating association rules
License: Apache License 2.0
Hi Michael,
I have been trying to run your program in several different ways. Could you possibly provide a comprehensive, step-by-step list of commands (including prerequisites such as Kotlin) that I can use to execute the program?
This was a tricky bug to find:
My first ugly work-around looks like this:
Collection<Transaction<Article>> customers = getCustomers();
customers.add(null);
Iterator<Transaction<Article>> iterator = customers.iterator();
Output<Article> output = apriori.execute(iterator);
edit: and this work-around doesn't work. Iterators are also reused, see below
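The edit above notes that iterators are reused. As a general Java illustration (not the library's code), the sketch below shows why that matters: an Iterator can be walked only once, while an Iterable hands out a fresh iterator on every call. The class name `IteratorReuseDemo` and the helper `drain` are made up for this example.

```java
import java.util.Iterator;
import java.util.List;

public class IteratorReuseDemo {

    // Counts the elements an iterator still has to offer, consuming it.
    static int drain(Iterator<?> iterator) {
        int n = 0;
        while (iterator.hasNext()) {
            iterator.next();
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        List<String> transactions = List.of("bread butter sugar", "coffee milk");

        // A single Iterator is exhausted after the first pass.
        Iterator<String> once = transactions.iterator();
        System.out.println(drain(once)); // prints 2
        System.out.println(drain(once)); // prints 0

        // An Iterable creates a new Iterator for every pass.
        Iterable<String> fresh = transactions::iterator;
        System.out.println(drain(fresh.iterator())); // prints 2
        System.out.println(drain(fresh.iterator())); // prints 2
    }
}
```

Any code that needs more than one pass over the data would therefore see an empty second pass when it is given a bare Iterator.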
Junit test code is:
File inputFile = new File("c:/data1.txt");
double minSupport = 0.1;
double minConfidence = 0.2;
Apriori<NamedItem> apriori = new Apriori.Builder<NamedItem>(minSupport)
.generateRules(minConfidence).create();
Iterator<Transaction<NamedItem>> iterator = new DataIterator(inputFile);
Output<NamedItem> output = apriori.execute(iterator);
RuleSet<NamedItem> ruleSet = output.getRuleSet();
if (ruleSet != null) {
Iterator<AssociationRule<NamedItem>> iteratorItemSet = ruleSet.iterator();
while (iteratorItemSet.hasNext()) {
AssociationRule<NamedItem> itemSet = iteratorItemSet.next();
System.out.println("result ............." + itemSet.toString());
}
} else {
System.out.println("ruleSet is null");
}
data1.txt content is:
bread butter sugar
coffee milk sugar
bread coffee milk sugar
coffee milk
run result is :
ruleSet is null
The support of "coffee milk" is 3/4 = 0.75 and the confidence is 3/3 = 1. Both satisfy minSupport = 0.1 and minConfidence = 0.2, so the ruleSet should contain at least "coffee -> milk".
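The arithmetic in this report can be checked independently of the library. The following is a minimal sketch that recomputes support and confidence directly from the four transactions in data1.txt; the class `SupportConfidenceCheck` and its methods are illustrative names, not part of the apriori API.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class SupportConfidenceCheck {

    // The four transactions from data1.txt, one set of item names per line.
    static final List<Set<String>> TRANSACTIONS = List.of(
            Set.of("bread", "butter", "sugar"),
            Set.of("coffee", "milk", "sugar"),
            Set.of("bread", "coffee", "milk", "sugar"),
            Set.of("coffee", "milk"));

    // Fraction of transactions that contain every item of itemSet.
    static double support(List<Set<String>> transactions, Set<String> itemSet) {
        long hits = transactions.stream().filter(t -> t.containsAll(itemSet)).count();
        return (double) hits / transactions.size();
    }

    // confidence(body -> head) = support(body plus head) / support(body).
    static double confidence(List<Set<String>> transactions,
                             Set<String> body, Set<String> head) {
        Set<String> union = new HashSet<>(body);
        union.addAll(head);
        return support(transactions, union) / support(transactions, body);
    }

    public static void main(String[] args) {
        System.out.println("support({coffee, milk}) = "
                + support(TRANSACTIONS, Set.of("coffee", "milk")));   // 0.75
        System.out.println("confidence(coffee -> milk) = "
                + confidence(TRANSACTIONS, Set.of("coffee"), Set.of("milk"))); // 1.0
    }
}
```

Both values are above the thresholds used in the test, which supports the expectation that a non-empty rule set should be returned.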
I'm using the following dataset as input with a minSupport of 0.5 and a minConfidence of 1.0 (version 1.3.0):
0 1 2 3
0 1 2 3
0 1 3 4 5
0 1 4
0 1 4
This dataset produces the following NullPointerException:
java.lang.NullPointerException
    at de.mrapp.apriori.modules.FrequentItemSetMinerModule.generateInitialItemSets(FrequentItemSetMinerModule.java:70)
    at de.mrapp.apriori.modules.FrequentItemSetMinerModule.findFrequentItemSets(FrequentItemSetMinerModule.java:230)
    at de.mrapp.apriori.tasks.FrequentItemSetMinerTask.findFrequentItemSets(FrequentItemSetMinerTask.java:104)
    at de.mrapp.apriori.Apriori.execute(Apriori.java:830)
If one of the first two identical entries is removed, the algorithm works fine:
0 1 2 3
0 1 3 4 5
0 1 4
0 1 4
Hi. I get the following error: Cannot access de.mrapp.util.datastructure.SortedArraySet. Can you help me?
This is my code:
double minSupport = 0.5;
Apriori apriori = new Apriori.Builder(minSupport).create();
Iterable<Transaction> iterable = () -> new DataIterator(new File("sad.txt"));
Output output = apriori.execute(iterable);
FrequentItemSets frequentItemSets = output.getRuleSet();
Junit test code is:
File inputFile = new File("c:/data1.txt");
double minSupport = 0.2;
Apriori<NamedItem> apriori = new Apriori.Builder<NamedItem>(minSupport).create();
Iterator<Transaction<NamedItem>> iterator = new DataIterator(inputFile);
Output<NamedItem> output = apriori.execute(iterator);
SortedSet<ItemSet<NamedItem>> frequentItemSets = output.getFrequentItemSets();
System.out.println("frequentItemSets.size():" + frequentItemSets.size());
Iterator<ItemSet<NamedItem>> iteratorItemSet = frequentItemSets.iterator();
while (iteratorItemSet.hasNext()) {
ItemSet<NamedItem> itemSet = iteratorItemSet.next();
System.out.println("result ............."+ itemSet.toString());
}
data1.txt content is:
# Test data for the Apriori algorithm
# One transaction per line, items are separated with whitespaces
bread butter sugar
coffee milk sugar
bread coffee milk sugar
coffee milk
run result is :
result .............[milk]
result .............[coffee, milk, sugar]
result .............[bread, coffee, milk, sugar]
The support of bread is 2/4 = 0.5.
The support of coffee is 3/4 = 0.75.
The support of sugar is 3/4 = 0.75.
The support of milk is 3/4 = 0.75.
All of these are greater than minSupport = 0.2, but the only single-item set in the result is "milk".
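The per-item supports listed above can be reproduced with a few lines of plain Java. This is only a sketch under the file format described in data1.txt (one transaction per line, whitespace-separated items, `#` for comments); the class `ItemSupportCount` and the method `itemSupports` are hypothetical names and do not come from the library.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class ItemSupportCount {

    // Returns the support of every single item, keyed by item name.
    static Map<String, Double> itemSupports(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        int transactions = 0;
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue; // skip blank lines and comments
            }
            transactions++;
            // A set, so an item repeated within one line is counted once.
            Set<String> items = new LinkedHashSet<>(Arrays.asList(trimmed.split("\\s+")));
            for (String item : items) {
                counts.merge(item, 1, Integer::sum);
            }
        }
        Map<String, Double> supports = new TreeMap<>();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            supports.put(entry.getKey(), (double) entry.getValue() / transactions);
        }
        return supports;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "# Test data for the Apriori algorithm",
                "# One transaction per line, items are separated with whitespaces",
                "bread butter sugar",
                "coffee milk sugar",
                "bread coffee milk sugar",
                "coffee milk");
        // prints {bread=0.5, butter=0.25, coffee=0.75, milk=0.75, sugar=0.75}
        System.out.println(itemSupports(lines));
    }
}
```

With minSupport = 0.2, every item in this output clears the threshold, so each of them would be expected to appear as a one-element frequent item set.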
Junit test code is:
File inputFile = new File("c:/data1.txt");
int frequentItemSetCount=1;
Apriori<NamedItem> apriori = new Apriori.Builder<>(frequentItemSetCount)
.supportDelta(0.1).maxSupport(1.0).minSupport(0.0).create();
Iterator<Transaction<NamedItem>> iterator = new DataIterator(inputFile);
Output<NamedItem> output = apriori.execute(iterator);
SortedSet<ItemSet<NamedItem>> frequentItemSets = output.getFrequentItemSets();
System.out.println("frequentItemSets.size():"+frequentItemSets.size());
Iterator<ItemSet<NamedItem>> iteratorItemSet = frequentItemSets.iterator();
while (iteratorItemSet.hasNext()) {
ItemSet<NamedItem> itemSet = (ItemSet<NamedItem>) iteratorItemSet.next();
System.out.println("result ............."+ itemSet.toString());
}
data1.txt content is:
# Test data for the Apriori algorithm
# One transaction per line, items are separated with whitespaces
bread butter sugar
coffee milk sugar
bread coffee milk sugar
coffee milk
run result is :
frequentItemSets.size():4
result .............[coffee, milk]
result .............[sugar]
result .............[milk]
result .............[coffee]
frequentItemSetCount = 1, but frequentItemSets.size() = 4.
As pointed out in #2, it might be useful to provide functionality to sort and/or filter the frequent item sets contained in the Apriori algorithm's Output. This would require creating a custom implementation of the type SortedSet that provides sort and filter methods, as the class RuleSet does.
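One possible shape for such a collection is sketched below. It is not taken from the library: `SimpleItemSet` is a hypothetical stand-in for the real item set type, and `ItemSetCollection`, `sort` and `filter` are assumed names chosen only to illustrate the idea of a SortedSet that returns re-ordered or filtered copies of itself.

```java
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;
import java.util.function.Predicate;

public class SortableItemSets {

    // Hypothetical stand-in for the library's item set type.
    static final class SimpleItemSet {
        final List<String> items;
        final double support;

        SimpleItemSet(double support, String... items) {
            this.items = List.of(items);
            this.support = support;
        }

        @Override
        public String toString() {
            return items + " (support " + support + ")";
        }
    }

    // Descending support; item names break ties, because a TreeSet
    // silently drops elements that its comparator considers equal.
    static final Comparator<SimpleItemSet> BY_SUPPORT_DESC =
            Comparator.comparingDouble((SimpleItemSet s) -> s.support)
                    .reversed()
                    .thenComparing(s -> s.items.toString());

    // Descending number of items, with the same tie-breaker.
    static final Comparator<SimpleItemSet> BY_SIZE_DESC =
            Comparator.comparingInt((SimpleItemSet s) -> s.items.size())
                    .reversed()
                    .thenComparing(s -> s.items.toString());

    // A SortedSet that can hand out sorted or filtered copies of itself.
    static final class ItemSetCollection extends TreeSet<SimpleItemSet> {
        ItemSetCollection(Comparator<? super SimpleItemSet> order) {
            super(order);
        }

        ItemSetCollection sort(Comparator<? super SimpleItemSet> order) {
            ItemSetCollection sorted = new ItemSetCollection(order);
            sorted.addAll(this);
            return sorted;
        }

        ItemSetCollection filter(Predicate<? super SimpleItemSet> keep) {
            ItemSetCollection filtered = new ItemSetCollection(comparator());
            for (SimpleItemSet itemSet : this) {
                if (keep.test(itemSet)) {
                    filtered.add(itemSet);
                }
            }
            return filtered;
        }
    }

    // Sample data with supports computed by hand from data1.txt.
    static ItemSetCollection sample() {
        ItemSetCollection all = new ItemSetCollection(BY_SUPPORT_DESC);
        all.add(new SimpleItemSet(0.5, "bread"));
        all.add(new SimpleItemSet(0.75, "coffee"));
        all.add(new SimpleItemSet(0.75, "sugar"));
        all.add(new SimpleItemSet(0.75, "coffee", "milk"));
        return all;
    }

    public static void main(String[] args) {
        ItemSetCollection all = sample();
        System.out.println(all.filter(s -> s.support >= 0.75));
        System.out.println(all.sort(BY_SIZE_DESC));
    }
}
```

Returning new collections instead of re-ordering in place keeps the original result untouched, which is one reasonable design choice among several.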
It seems that the Condition class is missing. I cannot find it in the jar file or your repository either.