krishnapg / sofia-ml
Automatically exported from code.google.com/p/sofia-ml
What steps will reproduce the problem?
1. Grab source from SVN.
2. cd cluster-src/
3. make all_test
What is the expected output? What do you see instead?
Test fails with:
sf-kmeans-methods_test: sf-kmeans-methods_test.cc:50: int main(int, char**):
Assertion `cluster_centers_3->ClusterCenter(0).ValueOf(1) == 1.0' failed.
Adding some debug output just before the assert failure results in:
cluster_centers_3->ClusterCenter(0).ValueOf(1) : 0 (should be 1.0)
What version of the product are you using? On what operating system?
SVN version:
r25 | [email protected] | 2010-04-28 04:52:54 +1000 (Wed, 28 Apr 2010) | 1 line
Running on x86_64 linux with gcc version 4.7.2 (Debian 4.7.2-5).
Please provide any additional information below.
Original issue reported on code.google.com by [email protected]
on 17 Feb 2013 at 12:28
What steps will reproduce the problem?
Follow the instructions in the README "Quick Start" section (on Ubuntu
14.04):
1. svn checkout http://sofia-ml.googlecode.com/svn/trunk/ sofia-ml-read-only
2. cd sofia-ml-read-only/src
3. make clean
4. make all_test
What is the expected output? What do you see instead?
Expecting success in all tests
Seeing instead:
Test is failing immediately with:
g++ -O3 -lm -Wall -o sf-sparse-vector_test sf-sparse-vector_test.cc
sf-sparse-vector.cc
./sf-sparse-vector_test
sf-sparse-vector_test: sf-sparse-vector_test.cc:27: int main(int, char**):
Assertion `x1.GetGroupId() == "2"' failed.
make: *** [sf-sparse-vector_test] Aborted (core dumped)
make: *** Deleting file `sf-sparse-vector_test'
What version of the product are you using? On what operating system?
Latest source:
r31 | [email protected] | 2010-07-26 14:17:11 -0700 (Mon, 26 Jul 2010) | 1 line
On Ubuntu 14.04 (LTS)
Please provide any additional information below.
Original issue reported on code.google.com by [email protected]
on 4 May 2015 at 9:22
What steps will reproduce the problem?
1. make all_tests
What is the expected output?
PASS.
What do you see instead?
sf-weight-vector_test: sf-weight-vector_test.cc:95: int main(int, char**):
Assertion `w_6.ValueOf(3) == 1' failed.
What version of the product are you using? On what operating system?
Latest sofia-ml from svn, Debian 5, GCC 4.3.2.
Original issue reported on code.google.com by [email protected]
on 14 Feb 2010 at 3:22
Hello D.
I've started to work on the multi-label branch. I have made the following
changes:
- Parse comma-separated list of labels.
- Add a MultiplePassOuterLoop routine: it shuffles the dataset and makes
several passes over it. Specifying a number of passes is more intuitive, and
results can be more stable on some datasets.
- Add a MultiLabelWeightVector. It is compatible with other weight classes
(both API-wise and file-wise). It also has a bunch of additional methods such
as "SelectLabel".
- Add Multi-Label Passive-Aggressive. Strictly speaking, the learner optimizes
a label ranking (relevant labels should be ranked higher than irrelevant
labels). On the 20 newsgroup dataset, it gives 82% accuracy (liblinear gave
85%). (I didn't optimize the hyperparameters, though.)
- Add a "--prediction_type multi-label" option.
- Infer the number of dimensions from the training dataset when --dimensionality
is set to 0.
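The shuffle-and-repeat idea behind the MultiplePassOuterLoop item above can be sketched roughly as follows. This is only an illustration: the function name MultiplePassOrder, its parameters, and the use of std::mt19937 are assumptions made for the example, not the code from the branch.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical sketch: returns the order in which example indices would be
// visited when the data set is reshuffled before each of num_passes passes.
std::vector<std::size_t> MultiplePassOrder(std::size_t num_examples,
                                           int num_passes,
                                           unsigned seed) {
  std::vector<std::size_t> order(num_examples);
  std::iota(order.begin(), order.end(), std::size_t{0});  // 0, 1, ..., n-1
  std::mt19937 rng(seed);
  std::vector<std::size_t> visited;
  visited.reserve(num_examples * static_cast<std::size_t>(num_passes));
  for (int pass = 0; pass < num_passes; ++pass) {
    // Each pass sees every example exactly once, in a new random order.
    std::shuffle(order.begin(), order.end(), rng);
    visited.insert(visited.end(), order.begin(), order.end());
  }
  return visited;
}
```

A learner would apply one update per returned index, so the total number of updates is the number of examples times the number of passes.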
I wanted to add one-vs-all but unfortunately, the fact that the labels are
attached to the vectors makes it hard (or inefficient): I need to be able to
pass +1 or -1 instead of the real label to the update function.
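The relabeling that one-vs-all needs can be sketched as below. The function OneVsAllLabels and its integer class labels are hypothetical and are not part of sofia-ml; the sketch only shows the +1/-1 mapping, not how it would be passed to the update function.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sketch: for the binary sub-problem of target_class, examples
// of that class become +1 and all other examples become -1.
std::vector<float> OneVsAllLabels(const std::vector<int>& labels,
                                  int target_class) {
  std::vector<float> binary(labels.size());
  for (std::size_t i = 0; i < labels.size(); ++i) {
    binary[i] = (labels[i] == target_class) ? 1.0f : -1.0f;
  }
  return binary;
}
```

Building a separate label array like this avoids touching the labels stored on the vectors, at the cost of one extra array per class.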
Possible short-term plans could include optimizing the multi-class hinge loss
and the multinomial logistic loss by SGD.
Original issue reported on code.google.com by [email protected]
on 28 Apr 2011 at 8:38
Hi,
I have changed my training data into the sparse data format you mentioned.
./sofia-kmeans --k 1000 --init_type random --opt_type batch_kmeans --iterations
1000 --objective_after_init --training_file demo/SMLFAutoTrain1s512val.txt
--model_out demo/CSMLFAutoTrain1s512val.txt
However, I am getting the following errors:
Reading data from: demo/SMLFAutoTrain1s512val.txt
Error reading file demo/SMLFAutoTrain1s512val.txt
I opened your demo.train and saw a square box at the end of every
vector. How can I change my data to match your format, given that the square
box at the end may not be the only difference? I also tried to load your
demo.train file in MATLAB, and it doesn't let me do that either.
For the example of kmeans:
> ./sofia-kmeans --k 5 --init_type random --opt_type mini_batch_kmeans
--mini_batch_size 100 --iterations 500 --objective_after_init
--objective_after_training --training_file demo/demo.train --model_out
demo/clusters.txt
The above command will return the five centroid locations, right?
In this case, since it only produces the 5 cluster center locations, the class
labels in the training data (demo.train) can be assigned any values, right?
I chose, say, all 1 from among the values 1, 0, -1.
I look forward to your clarification.
Thank you,
Fred
Original issue reported on code.google.com by [email protected]
on 23 Sep 2011 at 3:56
I noticed in the demo that all the features have a value of "1". Does sofia-ml
support and/or make use of higher integer values (such as the number of times a
word is seen in a document) or floating-point numbers?
Original issue reported on code.google.com by [email protected]
on 25 Feb 2013 at 4:34
There is a problem with the source code. Many files forget to include standard
library headers, and some of the assertions in the unit tests fail.
What steps will reproduce the problem?
1. Follow the instructions on https://code.google.com/p/sofia-ml/
2. Run make all_test in src/
What is the expected output? What do you see instead?
I see lots of compile time errors.
What version of the product are you using? On what operating system?
Ubuntu 14.04, G++ 4.7, sofia-ml
Please provide any additional information below.
The following updates fixed everything for me:
sf-sparse-vector_test.cc
l27 //assert(x1.GetGroupId() == "2");
l75 //assert(x6.GetGroupId() == "3");
simple-cmd-line-helper.h
l68 #include <cstdlib>
l69 #include <stdio.h>
sofia-ml-methods_test.cc
l19 #include <cstdlib>
Original issue reported on code.google.com by [email protected]
on 16 Jul 2014 at 4:55
What steps will reproduce the problem?
cd "sofia-ml-read-only"
./sofia-kmeans --k 100 --init_type random --opt_type mini_batch_kmeans
--mini_batch_size 100 --iterations 1000 --cluster_mapping_type rbf_kernel
--test_file <test file location goes here> --cluster_mapping_out <cluster
mapping output location goes here>
What is the expected output? What do you see instead?
The expected output is a cluster mapping text file. Instead, I see:
cd "sofia-ml-read-only"
sofia-kmeans: sf-cluster-centers.cc:93: float
SfClusterCenters::SqDistanceToClosestCenter(const SfSparseVector&, int*) const:
Assertion `!cluster_centers_.empty()' failed.
What version of the product are you using? On what operating system?
I don't know where to find the product version. The most recent version is the
one I have been using.
Operating system: Ubuntu 12.04.5 LTS
Please provide any additional information below.
N/A
Original issue reported on code.google.com by [email protected]
on 23 Sep 2014 at 6:46
This affects large IDs.
The issue is in cluster-src/sofia-kmeans.cc
The solution diff is:
345c345
< << test_data->VectorAt(i).GetY() << std::endl;
---
> << (int)test_data->VectorAt(i).GetY() << std::endl;
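One plausible reading of why the cast matters: a float streamed with the default formatting is limited to six significant digits, so a large id switches to scientific notation. The sketch below illustrates this with two hypothetical helpers; it assumes the value being printed is a float holding an integer id, which the report implies but does not state.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: streams the value as a float, as the original line does.
std::string FormatAsFloat(float y) {
  std::ostringstream out;
  out << y;
  return out.str();
}

// Hypothetical helper: streams the value after casting to int, as in the diff.
std::string FormatAsInt(float y) {
  std::ostringstream out;
  out << static_cast<int>(y);
  return out.str();
}
```

With these helpers, FormatAsFloat(1234567.0f) gives "1.23457e+06", while FormatAsInt(1234567.0f) gives "1234567".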
Original issue reported on code.google.com by [email protected]
on 6 Mar 2013 at 2:37
Download & build, run the demo commands adding --hash_mask_bits to the
arguments. Training proceeds fine, but testing of the model gives the malloc
error:
$ ./sofia-ml --learner_type pegasos --loop_type stochastic --lambda 0.1
--iterations 100000 --dimensionality 150000 --training_file demo/demo.train
--model_out demo/model --hash_mask_bits 8
hash_mask_ 255
Reading training data from: demo/demo.train
Time to read training data: 0.061278
Time to complete training: 52.3639
Writing model to: demo/model
Done.
$ ./sofia-ml --model_in demo/model --test_file demo/demo.train --results_file
demo/results.txt --hash_mask_bits 8
hash_mask_ 255
sofia-ml(6235) malloc: *** error for object 0x800000: pointer being freed was
not allocated
*** set a breakpoint in malloc_error_break to debug
Reading model from: demo/model
Done.
Reading test data from: demo/demo.train
Time to read test data: 0.06114
Time to make test prediction results: 0.008274
Writing test results to: demo/results.txt
Done.
========
$ g++ --version
i686-apple-darwin10-g++-4.2.1 (GCC) 4.2.1 (Apple Inc. build 5659)
Original issue reported on code.google.com by [email protected]
on 18 Jun 2010 at 6:43
Run make all_test; an error occurs while testing sf-sparse-vector_test: the
assertion assert(x1.GetGroupId() == "2"); fails at line 27 of
sf-sparse-vector_test.cc.
Solution: add the line "group_id_c_string[end - position]=0;" at
sf-sparse-vector.cc line 145, because the string generated by strncpy is not
always '\0' terminated.
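The underlying strncpy behaviour can be shown in isolation: when the source holds at least n characters, strncpy copies exactly n of them and appends no terminator. The helper below is a minimal sketch; CopyRange, its buffer size, and the sample input used in the note after it are illustrative and are not taken from sf-sparse-vector.cc.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: returns the characters in [begin, end) as a string.
std::string CopyRange(const char* begin, const char* end) {
  assert(begin <= end);
  const std::size_t length = static_cast<std::size_t>(end - begin);
  char buffer[1000];
  assert(length < sizeof(buffer));
  // Fill with junk so that a missing terminator would be visible.
  std::memset(buffer, 'X', sizeof(buffer));
  // Copies at most `length` characters; adds no '\0' if the source is longer.
  std::strncpy(buffer, begin, length);
  // The explicit terminator is the fix proposed in the report.
  buffer[length] = '\0';
  return std::string(buffer);
}
```

For an illustrative line such as "1 qid:2 1:1.0", copying the single character at offset 6 yields "2". Without the terminator line, reading the buffer as a C string would run past that character into whatever the buffer already contained.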
Original issue reported on code.google.com by [email protected]
on 5 Nov 2012 at 4:22
For the k-means training, does the label (in my case, a face label) have an
influence on the clustering?
Original issue reported on code.google.com by [email protected]
on 24 Jun 2013 at 1:01
in sofia-ml.cc
337 float objective = sofia_ml::SvmObjective(training_data,
338 *w,
339 CMD_LINE_BOOLS["--lambda"]);
Note that lambda is read from CMD_LINE_BOOLS, not CMD_LINE_FLOATS, which
results in lambda=0. In TrainModel the correct value of lambda is used:
176 float lambda = CMD_LINE_FLOATS["--lambda"];
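A small sketch of how this kind of lookup can silently produce 0, assuming the two flag tables behave like std::map keyed by flag name. The Flags struct and both functions are hypothetical stand-ins, not the declarations from sofia-ml.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for the two flag tables named in the report.
struct Flags {
  std::map<std::string, bool> bools;
  std::map<std::string, float> floats;
};

// Looks up the flag in the wrong table: operator[] inserts a missing key
// with the value false, which converts to 0.0f.
float LambdaFromBools(Flags& flags) {
  return flags.bools["--lambda"];
}

// Looks up the flag in the table where the parsed float value is stored.
float LambdaFromFloats(Flags& flags) {
  return flags.floats["--lambda"];
}
```

Because operator[] default-constructs missing entries instead of failing, the wrong-table lookup compiles and runs without any diagnostic.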
Original issue reported on code.google.com by [email protected]
on 9 May 2013 at 1:20
Is there any example in sofia-ml for multilabel classification?
Original issue reported on code.google.com by [email protected]
on 20 Jan 2015 at 5:34
What steps will reproduce the problem?
1. run make with gcc version 4.4.3 20100127 (Red Hat 4.4.3-4) (GCC)
What is the expected output? What do you see instead?
a proper build
What version of the product are you using? On what operating system?
trunk on 2010-03-30 15:52
Please provide any additional information below.
gcc output:
:sofia-ml-read-only/src$ make
g++ -O3 -lm -Wall -o sofia-ml sofia-ml.cc sofia-ml-methods.cc
sf-weight-vector.cc sf-sparse-vector.cc sf-data-set.cc
sf-hash-weight-vector.cc sf-hash-inline.cc
sf-sparse-vector.cc: In member function ‘void SfSparseVector::Init(const
char*)’:
sf-sparse-vector.cc:132: error: ‘sscanf’ was not declared in this scope
sf-hash-weight-vector.cc: In constructor
‘SfHashWeightVector::SfHashWeightVector(int)’:
sf-hash-weight-vector.cc:40: error: ‘exit’ was not declared in this scope
sf-hash-weight-vector.cc: In constructor
‘SfHashWeightVector::SfHashWeightVector(int, const std::string&)’:
sf-hash-weight-vector.cc:54: error: ‘exit’ was not declared in this scope
sf-hash-weight-vector.cc: In member function ‘virtual void
SfHashWeightVector::AddVector(const SfSparseVector&, float)’:
sf-hash-weight-vector.cc:96: error: ‘exit’ was not declared in this scope
sf-hash-weight-vector.cc:111: error: ‘exit’ was not declared in this scope
make: *** [sofia-ml] Error 1
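Errors of this kind usually mean the headers that declare the functions are not included directly: sscanf is declared in <cstdio> and exit in <cstdlib>. The snippet below is a minimal illustration; ParseLabelOrDie is a hypothetical helper, not a function from sofia-ml.

```cpp
#include <cstdio>   // declares sscanf and fprintf
#include <cstdlib>  // declares exit

// Hypothetical helper: parses the leading numeric label of a line and
// terminates the program if the line does not start with a number.
float ParseLabelOrDie(const char* line) {
  float label = 0.0f;
  if (std::sscanf(line, "%f", &label) != 1) {
    std::fprintf(stderr, "Malformed line: %s\n", line);
    std::exit(1);
  }
  return label;
}
```

Older compilers often pulled these declarations in indirectly through other standard headers, which is why the same source can build on one GCC version and fail on a newer one.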
Original issue reported on code.google.com by [email protected]
on 30 Mar 2010 at 1:55
What steps will reproduce the problem?
1. Create this training file:
======= train.txt =======
1 1:1 2:.1 3:.1 200:1
1 1:1.2 2:.01 3:.01 200:1
1 1:3 2:.2 3:.41 200:1
-1 3:4 200:1
-1 2:3 200:1
-1 1:.1 2:3 3:2 200:1
====================
2. ./sofia-ml-read-only/sofia-ml --learner_type pegasos --loop_type stochastic
--lambda 0.1 --iterations 100000 --dimensionality 200 --training_file train.txt
--model_out debug-model.txt
3. debug-model.txt has:
-5.01486 -0.169397 -10.0628 -10.0518 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0\
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
The model should spit out 201 terms, the first being the bias term. Instead
it spits out 200, and clips off the last weight. When I set dimensionality to
201, I get what I would expect:
0.263645 0.561799 -0.509116 -0.382012 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 \
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 \
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0.263645
This was compiled from source a couple weeks ago. The program should probably
crash if you say dimensionality is 200 and there is a "200:x" term in the
sparse vector representation, unless the no-bias flag is set.
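The check suggested above could look roughly like the sketch below. It assumes that index 0 is reserved for the bias term, so that valid feature ids run from 1 to dimensionality - 1; both function names are hypothetical and the parsing is deliberately simplified.

```cpp
#include <cstdlib>
#include <sstream>
#include <string>

// Hypothetical helper: returns the largest feature id in a line of the form
// "label id:value id:value ...", or -1 if the line has no features.
int MaxFeatureId(const std::string& line) {
  std::istringstream tokens(line);
  std::string token;
  tokens >> token;  // Skip the label.
  int max_id = -1;
  while (tokens >> token) {
    const std::string::size_type colon = token.find(':');
    if (colon == std::string::npos) continue;  // Not an id:value pair.
    const int id = std::atoi(token.substr(0, colon).c_str());
    if (id > max_id) max_id = id;
  }
  return max_id;
}

// True if every feature id in the line fits a weight vector with
// `dimensionality` entries, where entry 0 is assumed to hold the bias.
bool FitsDimensionality(const std::string& line, int dimensionality) {
  return MaxFeatureId(line) < dimensionality;
}
```

Under that assumption, a line containing "200:1" fits a dimensionality of 201 but not 200, which matches the behaviour described in the report.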
Original issue reported on code.google.com by [email protected]
on 26 Feb 2013 at 3:24
What steps will reproduce the problem?
1. Run make in the src folder with gcc version 4.3
Adding ...
#include <cstring>
#include <cstdlib>
to the top of sf-sparse-vector.cc file fixed this problem for me.
Go Jumbos.
Original issue reported on code.google.com by [email protected]
on 24 Jan 2010 at 11:42
Hi there.
What steps will reproduce the problem?
./sofia-ml --learner_type pegasos --loop_type stochastic --lambda 0.1
--iterations 10000 --dimensionality 450000 --training_file ../data/m256
--model_out demo/model
What is the expected output?
What do you see instead?
Reading training data from: ../data/final/catted/train/m256
Segmentation fault (core dumped)
What version of the product are you using? On what operating system?
Ubuntu 13.10 64bit
Please provide any additional information below.
I guess it is because my training data (attached) is so sparse that in some
lines all features are zero. Can sofia-ml support such a dataset? Thank you!
Original issue reported on code.google.com by [email protected]
on 18 Mar 2014 at 3:34
What steps will reproduce the problem?
1. Create 2-dimensional data drawn from 2-dimensional multivariate Gaussian
distributions with different means and variance = 1, e.g. 21 different
distributions with 1000 draws each, for a total of 21,000 points. (I have
tried many different variations, and none has any positive effect on the
reported issue.)
2. Train sofia-kmeans with any batch size (tested 500:500:5000) and with any
number of k clusters (tested 64 128 256) using mini_batch_kmeans with fixed
random seed.
command line: sofia-kmeans --k 64 --dimensionality 3 --random_seed 124
--init_type random --opt_type mini_batch_kmeans --mini_batch_size 500
--iterations 10 --objective_after_init --objective_after_training
--training_file traindatafile.svmlight --model_out modelfile.sofia
3. Calculate the training error
command line: sofia-kmeans --model_in modelfile.sofia --test_file
traindatafile.svmlight --objective_on_test --cluster_assignments_out
trainingassignments.sofia
4. Run this in a loop as a function of the number of iterations. I ran [1 10
100e3 500e3 and 1000e3].
What is the expected output? What do you see instead?
I expect the training error to fall as a function of the number of
iterations used. Since the seed is fixed, the random initialization is the same.
This holds until 100e3, then it starts to diverge, i.e. the training error
starts increasing dramatically. The training error becomes even larger than the
random initialization. This is very puzzling to me.
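For context, the per-center update commonly described for mini-batch k-means moves a center toward each assigned point with a step size of 1 / (number of points that center has absorbed so far). The one-dimensional sketch below illustrates that rule only. Whether sofia-kmeans implements it exactly this way is an assumption, the type and function names are hypothetical, and the sketch is not offered as an explanation of the divergence reported here.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical 1-D model: center positions and how many points each has seen.
// Both vectors must have the same, nonzero size.
struct Centers1D {
  std::vector<double> position;
  std::vector<long> count;
};

// Index of the center closest to x.
std::size_t Nearest(const Centers1D& centers, double x) {
  std::size_t best = 0;
  for (std::size_t k = 1; k < centers.position.size(); ++k) {
    if (std::fabs(centers.position[k] - x) <
        std::fabs(centers.position[best] - x)) {
      best = k;
    }
  }
  return best;
}

// One mini-batch step: assign every point using the current centers, then
// apply the per-center update with step size 1 / count.
void MiniBatchStep(Centers1D* centers, const std::vector<double>& batch) {
  std::vector<std::size_t> assignment(batch.size());
  for (std::size_t i = 0; i < batch.size(); ++i) {
    assignment[i] = Nearest(*centers, batch[i]);
  }
  for (std::size_t i = 0; i < batch.size(); ++i) {
    const std::size_t k = assignment[i];
    centers->count[k] += 1;
    const double eta = 1.0 / static_cast<double>(centers->count[k]);
    centers->position[k] =
        (1.0 - eta) * centers->position[k] + eta * batch[i];
  }
}
```

With this step size each center equals the running mean of the points assigned to it, so individual updates shrink as the counts grow.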
What version of the product are you using? On what operating system?
svn checkout http://sofia-ml.googlecode.com/svn/trunk/sofia-ml
sofia-ml-read-only
performed 10/3-2015
OS: Ubuntu 14.04
Please provide any additional information below.
Attached are the commands and output from sofia-kmeans (sofia_kmeans.txt);
furthermore, all model, assignment and data files are provided to reproduce
these findings (tmp.zip).
Original issue reported on code.google.com by [email protected]
on 11 Mar 2015 at 12:36