Comments (5)
Thanks for your research & report. Unfortunately, we will always keep this dataset as it was originally published in 2017.
"since obviously you don't want to review the whole data set"
As the creator of Fashion-MNIST, I can share the reviewing procedure with you. I did review the whole dataset before publishing it. What I did: after obtaining an initial label mapping from the Zalando article database, I wrote a small program to lay out 100 images from the same class and picked out the anomalies one by one, manually. This process was repeated until my colleagues and I were satisfied with all training and test images. A tedious procedure, sure, but is it impossible? No.
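For readers who want to try a similar review pass, the paging step of the procedure above might look roughly like the sketch below. It is not the original program: the function name `review_pages`, the `page_size` parameter, and the toy labels are assumptions made for illustration, and rendering each page as an image grid is left out.

```python
from collections import defaultdict


def review_pages(labels, page_size=100):
    """Yield (class_label, indices) pages for manual review.

    Each page holds up to `page_size` sample indices that share one label,
    so a reviewer can scan them side by side and pick out anomalies.
    """
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    for label in sorted(by_class):
        indices = by_class[label]
        for start in range(0, len(indices), page_size):
            yield label, indices[start:start + page_size]


# Toy labels (hypothetical): 250 samples of class 0 and 120 of class 1.
toy_labels = [0] * 250 + [1] * 120
pages = list(review_pages(toy_labels))
for label, indices in pages:
    print(f"class {label}: {len(indices)} images to inspect")
```

Each page can then be drawn as a 10 x 10 grid with whatever plotting tool is at hand, and the loop repeated after relabeling until no anomalies remain.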
Mislabelling by nature exists in all datasets, as you showed in your paper. If humans made their best reasonable effort when producing the dataset, then algorithms have to live with it.
Over the years, quite a few published papers have used the Fashion-MNIST dataset as a benchmark, not only for classification but also in adversarial learning and many other domains. Fashion-MNIST will stay as it is.
from fashion-mnist.
@mueller91 also kindly have a look at this dataset explorer, which covers all the samples in the dataset and lets you explore the images flagged by your algorithm: https://observablehq.com/@stwind/exploring-fashion-mnist
@hanxiao thank you for your statement
@kashif that's a great tool, thank you!
Thanks @mueller91 for the insightful paper. It's true that these datasets have mislabels, as the process that generated the original labels had humans in the loop and errors creep in that way. Other times the classes in question are very similar and require a domain expert to disambiguate between them.
Also, looking briefly at your paper's Table 7, the classes in question do appear to be those that look visually similar. When you write that "we found a large number of mislabeled / incorrect instances", do you mean that a large number of the images your algorithm flagged turned out to be mislabeled?
The algorithm sorts all instances in the training data by descending likelihood that they are mislabeled. So it does not assign a binary label, but returns a scalar likelihood and gives you an ordered list of instances to review, which makes the best use of your time/manpower budget for reviewing (since obviously you don't want to review the whole data set).
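As a minimal sketch of that ordering step, assuming the scores have already been computed: the function name `review_order`, the `budget` parameter, and the toy scores below are made up for illustration and are not taken from the paper's code.

```python
def review_order(scores, budget=None):
    """Return sample indices sorted by descending mislabel likelihood.

    `scores[i]` is the scalar likelihood that sample i is mislabeled.
    If `budget` is given, only the top `budget` indices are returned.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order if budget is None else order[:budget]


# Hypothetical scores for five samples.
toy_scores = [0.10, 0.92, 0.35, 0.78, 0.05]
print(review_order(toy_scores, budget=3))  # [1, 3, 2]
```

The reviewer works through the returned list from the front and stops wherever the budget runs out, so no threshold on the score is needed.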
Table 7 includes 64 instances we found that way. What we did was the following:
We ran the algorithm, obtained the indexes, and looked at the first 180 of them. Within these, 64 were clearly mislabeled, about 35 percent. (This does not include ambiguous instances, i.e. we only listed an item in the table when we were fairly certain of a mislabel.)
Usually, the rate of 'hits' declines as you go through the instances, so in the first 1000 instances we'd expect fewer than 35% to be mislabeled in total. The idea is to keep reviewing until your budget / time / manpower is exhausted.
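To make the arithmetic concrete, here is a small sketch that computes the hit rate for the 64-out-of-180 figure and tracks the cumulative hit rate over a review session. The helper names and the toy review outcomes are assumptions for illustration only; only the numbers 64 and 180 come from the comment above.

```python
def hit_rate(confirmed, reviewed):
    """Fraction of reviewed instances that were confirmed as mislabeled."""
    return confirmed / reviewed


def cumulative_hit_rates(outcomes):
    """Hit rate after each reviewed instance, in review order.

    `outcomes[k]` is True if the k-th reviewed instance was mislabeled.
    """
    rates, hits = [], 0
    for k, is_mislabeled in enumerate(outcomes, start=1):
        hits += is_mislabeled
        rates.append(hits / k)
    return rates


# The figure quoted above: 64 clear mislabels among the first 180 reviewed.
print(f"{hit_rate(64, 180):.1%}")  # 35.6%

# Hypothetical outcomes showing the hit rate falling as the review continues.
toy_outcomes = [True, True, False, True, False, False, False, False]
rates = cumulative_hit_rates(toy_outcomes)
print([round(r, 2) for r in rates])
```

Watching the cumulative rate fall is one way to decide when further review stops being worth the effort.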