few-shot-ssl-public's Issues

Unpickling is not successful

Hi,

Thank you for the paper and for sharing the code with the community.

When running 'run_exp.py' on mini-imagenet, I get the following error and have not been able to resolve it:
No such file or directory: 'data/mini-imagenet/images/n0153282900000005.jpg'

I have downloaded and saved the .pkl files as instructed, but this error suggests that unpickling did not complete successfully. I would appreciate your help with this.
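
As a first sanity check, this snippet (my own, not part of the repo) verifies that the directory named in the error actually exists and was populated:

import os

# Path taken from the error message above.
img_dir = "data/mini-imagenet/images"
if not os.path.isdir(img_dir):
    print("missing directory:", img_dir)
else:
    print(len(os.listdir(img_dir)), "files found in", img_dir)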

FileNotFoundError

Hi, thanks for your excellent work! But when I download your code and run it on my own computer, an error occurs:
line 111, in read_vinyals_split
img_list = os.listdir(char_folder)
FileNotFoundError: [Errno 2] No such file or directory: 'data/omniglot/images_all/Angelic/character01'
Could you please tell me why that is?
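
For reference, I checked the layout with this small snippet (mine, not the repo's); the traceback suggests the loader expects data/omniglot/images_all/<alphabet>/<character>/ folders:

import os

root = "data/omniglot/images_all"
if os.path.isdir(root):
    alphabets = sorted(os.listdir(root))
    print(len(alphabets), "alphabets found; first few:", alphabets[:3])
else:
    print("missing directory:", root)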

Number of distractors

How to specify the number of distractors?
I tried the following modification in run_exp.py, line 329:
meta_train_dataset = get_dataset( FLAGS.dataset, train_split_name, nclasses_train, nshot, num_distractor=NUMBER_OF_DISTRACTORS,

but the batch size of unlabelled data became
nclasses_train*num_unlabel + num_unlabel*num_distractor.
According to your paper, the unlabelled batch size should be
nclasses_train*(num_unlabel + num_distractor).
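
To make the discrepancy concrete, a quick check with hypothetical numbers (the two expressions agree only when num_unlabel == nclasses_train, or when there are no distractors):

# Hypothetical episode sizes, chosen so the two formulas differ.
nclasses_train, num_unlabel, num_distractor = 5, 10, 3

observed = nclasses_train * num_unlabel + num_unlabel * num_distractor
expected = nclasses_train * (num_unlabel + num_distractor)
print(observed, expected)  # 80 65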

Any suggestions?

New text dataset

Hi, thanks for your work. This is a question, not an issue, so feel free to close it if you want.

I have a dataset of unlabelled call transcriptions, and I want to train a classifier for them. I'm wondering if I could use few-shot-ssl-public to train it (once part of the dataset is labelled manually).

I'm looking forward to your suggestions, thanks again!

Why using the labeled and unlabeled split?

You separate the images in each class into disjoint labeled and unlabeled sets. Why can't we sample the labeled images and then treat the remaining images as the unlabeled pool, sampling unlabeled data from there? That way we wouldn't need to create a fixed labeled/unlabeled split, since the prototype updates within one task do not influence the others.
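
Concretely, the alternative I have in mind looks like this per class and per episode (a rough sketch with hypothetical sizes):

import numpy as np

# Sample the labeled support set first, then draw the unlabeled set from
# whatever remains in the class, instead of using a fixed global split.
rng = np.random.default_rng(0)
n_per_class, nshot, num_unlabel = 20, 1, 5

idx = rng.permutation(n_per_class)
labeled_idx = idx[:nshot]
unlabeled_idx = idx[nshot:nshot + num_unlabel]
print(labeled_idx, unlabeled_idx)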

Question for Omniglot dataset setting

Hi, I have a question about the Omniglot dataset setting.
In the paper, for test episodes:

We used M = 5 for training and M = 20 for testing in most cases, thus measuring the ability of the models to generalize to a larger unlabeled set size.

But the Omniglot dataset has only 20 samples per class, so I think M (the number of unlabeled samples in the support set for one class) must be less than 20.
How did you set M for the Omniglot experiments?
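
For example (my own arithmetic, assuming the labeled support, unlabeled, and query images of a class are drawn disjointly from the same 20 images):

# With only 20 images per Omniglot class, nshot + M + num_query <= 20.
nshot, num_query, total = 1, 5, 20
max_M = total - nshot - num_query
print(max_M)  # 14, so M = 20 cannot fit under this assumption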

What does m_dist_1 += tf.to_float(tf.equal(m_dist_1, 0.0)) mean?

In clustering, I don't understand this code:

# Run clustering.
for tt in range(num_cluster_steps):
  protos_1 = tf.expand_dims(protos, 2)
  protos_2 = tf.expand_dims(h_unlabel, 1)
  pair_dist = tf.reduce_sum((protos_1 - protos_2)**2, [3])  # [B, K, N]
  m_dist = tf.reduce_mean(pair_dist, [2])  # [B, K]
  m_dist_1 = tf.expand_dims(m_dist, 1)  # [B, 1, K]
  m_dist_1 += tf.to_float(tf.equal(m_dist_1, 0.0))

Does m_dist_1 += tf.to_float(tf.equal(m_dist_1, 0.0)) mean that if the distance from the cluster center is 0, then 1 is added?
But why add 1?
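
For reference, here is what I think that line computes, re-created in plain NumPy (my own snippet, not the repo's code). My guess is that it guards against a division by zero in a later normalization step, but I would like confirmation:

import numpy as np

# Re-creation of the TF line: entries that are exactly 0 get 1 added,
# i.e. zero distances become ones; all other entries are unchanged.
m_dist_1 = np.array([[0.0, 2.5, 0.0, 4.0]])
m_dist_1 += (m_dist_1 == 0.0).astype(np.float32)
print(m_dist_1)  # [[1.  2.5 1.  4. ]]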

How to interpret the pkl files?

I read the files with the following code:

import pickle as pkl

with open("val_images_png.pkl", "rb") as f:
    data = pkl.load(f)

Then I found that data is a list of (n, 1) arrays. For example,

print(data[0].shape) # (18068, 1)

I think each array corresponds to an image. However, 18068 is not even divisible by 3, which means it cannot be a raw RGB image, can it?

How could I convert each array to an image with shape (H, W, 3)? Thanks.
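
Update (my guess, unverified): the "_png" in the filename suggests each (n, 1) array holds the raw bytes of a PNG-compressed image, so it would need to be decoded rather than reshaped. A sketch, assuming OpenCV is installed:

import pickle as pkl

import cv2
import numpy as np

with open("val_images_png.pkl", "rb") as f:
    data = pkl.load(f)

# Decode the byte buffer instead of reshaping it; returns None if the
# bytes are not actually a PNG (in which case my guess is wrong).
buf = data[0].astype(np.uint8).ravel()
img = cv2.imdecode(buf, cv2.IMREAD_COLOR)
print(None if img is None else img.shape)  # hoping for (H, W, 3)

That would also explain why the array lengths vary: PNG compression produces a different byte count for each image.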

Labels of tieredImageNet

Thanks for sharing the data and code. I have a question about the tieredImageNet data. When I loaded the .pkl file, I found the data was a list of numpy arrays without label information. Did I miss something?
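
In case it helps, a hedged sketch of what I would try (the filename train_labels.pkl is my assumption, not confirmed by the README; a later issue in this thread reads keys such as 'label_general' and 'label_general_str' from a dict like this):

import pickle as pkl

# Hypothetical companion file holding the label dict.
with open("train_labels.pkl", "rb") as f:
    labels = pkl.load(f)

print(labels.keys())  # expecting keys like 'label_general', 'label_general_str'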

Question about the number of unlabelled samples.

I am trying to reproduce your SSL experiments and am a bit confused about the exact number of unlabelled images.

In your paper you write:

We used M = 5 for training and M = 20 for testing in most cases, thus measuring the ability of the models to generalize to a larger unlabeled set size.

In your README.md under 'Core Experiments' it says:

Add additional flags --num_unlabel 20 --num_test 20 for testing mini-imagenet and tiered-imagenet models, so that each episode contains 20 unlabeled images per class and 20 query images per class.

Could you please specify which experiments were carried out with M = 5 and which with M = 20? Moreover, for which experiments is there a difference in M between meta-train and meta-test time?

Network architecture

Hi,
Where can I find a description of the network architectures used in the paper's experiments?
I don't think it is written in the paper.

Thanks!

Data Format in tiered imagenet

Hi,
Thank you for sharing the code and data. I have a question about the tieredImageNet data. When I loaded the .pkl file, I found the data was a list of numpy arrays, but each array does not have shape (img_size * img_size * 3) and cannot be reshaped into that form. Each numpy array should be an image, right? Besides, I find that the array sizes vary, from around 8000 to around 19000.

Cannot reshape array of size 0 into shape (0,newaxis)

When I run the baseline file, it shows an error: cannot reshape array of size 0 into shape (0, newaxis), in File "/home/xxx/few-shot/run_baselines_exp.py", line 413, in get_nn_fit:
x_test_ = x_test.reshape([x_test[ii].shape[0], -1])
Did I set the flags incorrectly? My settings are as follows:
flags.DEFINE_integer("nclasses_eval", default=5, help="Number of classes for testing")
flags.DEFINE_integer("nclasses_train", default=5, help="Number of classes for training")
flags.DEFINE_integer("nshot", default=1, help="1 nshot")
flags.DEFINE_integer("num_eval_episode", default=600, help="Number of evaluation episodes")
flags.DEFINE_integer("num_test", default=-1, help="-1 Number of test images per episode")
flags.DEFINE_integer("num_unlabel", default=5, help="5 Number of unlabeled for training")
flags.DEFINE_integer("seed", default=0, help="Random seed")

A doubt regarding the test data.

Hey @renmengye,
I have a doubt regarding the paper. Suppose we trained the model by taking 5 classes in each episode, and I have a total of 40 classes. After training, I have test data where I have to classify each image into one of the 40 categories. How can I do that? The most logical way seems to be to calculate a normalized probability for each class and then assign each image to the class with the highest probability. But this somehow does not look like the right way, as we trained the model with only 5 classes per episode. Kindly help me here.
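
To make my question concrete, here is a minimal NumPy sketch of the idea (my own code, assuming a prototypical-network-style model with one prototype per class; all names and sizes are hypothetical):

import numpy as np

# Stand-ins for per-class prototypes (e.g. means of embedded support
# examples) and for the embedding of one test image.
num_classes, dim = 40, 64
protos = np.random.randn(num_classes, dim)
query = np.random.randn(dim)

logits = -((protos - query) ** 2).sum(axis=1)  # negative squared distances
logits -= logits.max()                         # numerical stability
probs = np.exp(logits) / np.exp(logits).sum()  # normalized class probabilities
print(probs.argmax(), probs.max())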
Regards

Super-class Labels of tieredImageNet

The general labels in each split seem to have an extra, corrupted(?) value that many data points take on. For instance in the training set there are 20 labels {0, ..., 19} but there is a label value 20 used by 341546 data points.

Should all of these data points be excluded, or is there a way to generate correct labels for these?

Apologies if I am confused in my understanding of your dataset. Thank you for working on a hierarchical few-shot dataset.

Here is some illustrative code from my exploration of the data:

>>> train['label_general_str']
['garment',
 'musical instrument, instrument',
 'restraint, constraint',
 'feline, felid',
 'instrument',
 'hound, hound dog',
 'electronic equipment',
 'passerine, passeriform bird',
 'ungulate, hoofed mammal',
 'aquatic bird',
 'snake, serpent, ophidian',
 'primate',
 'protective covering, protective cover, protect',
 'terrier',
 'saurian',
 'building, edifice',
 'establishment',
 'tool',
 'craft',
 'game equipment']
>>> len(train['label_general_str'])
20
>>> train['label_general'].max()  # should be 19
20
>>> uniq, count = np.unique(train['label_general'], return_counts=True)
>>> count
array([  1300,   1300,   1216,   1300,   1300,   1300,   1300,   1300,
         1300,   2600,   1300,   2449,   2600,  11700,   2590,  10258,
        13587,  13000,  24158,  11291, 341546])  # many invalid points with last value
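
For now I am excluding those points; a rough sketch of the mask (my own workaround, using train as loaded above):

import numpy as np

# Keep only points whose general label is inside the declared range,
# pending clarification on what the extra value 20 encodes.
n_general = len(train['label_general_str'])
valid = np.asarray(train['label_general']) < n_general
print(valid.sum(), "of", valid.size, "points kept")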
