This code implements zero-shot learning based on the "bidirectional latent embedding" framework described in the following paper:

    Qian Wang, Ke Chen, Zero-Shot Visual Recognition via Bidirectional Latent Embedding. arXiv:1607.02104

Please cite this paper if you use the code or data in this package.

==================
Overview
==================

The algorithm is implemented in MATLAB. We have tested the code with MATLAB R2013a on Windows and R2015a on Linux. Running the code on different datasets requires different amounts of RAM: 24 GB is enough for UCF101, HMDB51 and CUB-200-2011, but AwA needs more.

For a quick start, run one of the example scripts for training and testing. We provide example scripts for different settings; please refer to the following sections for details.

==================
Codes
==================

example_[dataset]_[visRep]_[dataSplits].m
    Run this script file to start. You can modify the parameters and the source data in this file.

zsl_bidiLEL.m
    We also provide this interface function to run the experiment with different datasets/representations/splits.

    Input variables:
        dataset: a string specifying the dataset name, depending on how you name the .mat files (e.g., 'awa50', 'ucf101', 'hmdb51', 'cub200');
        visRep: a string specifying the type of visual representations (e.g., 'googlenet', 'vgg19', 'idt', 'c3d', 'mbh');
        semRep: a string specifying the type of semantic representations (e.g., 'wordvector', 'attributes', 'combination');
        dataSplits: a string specifying the data splits (e.g., '40_10' for awa50; '81_20' and '51_50' for ucf101);
        opts: specifies the hyper-parameter values (by default the optimal values used in the paper are applied):
            dy: subspace dimensionality
            alpha: regularisation parameter
            kG: kNN for graph construction in SLPP
            k_st: kNN for self-training
            gamma: for the combination of semantic representations
            initB: 'random' - random initialisation for the top-down embedding; 'preStored' - pre-stored, randomly generated initialisation for repeatable results

    Output variables:
        res: an n-dimensional vector of zero-shot accuracies, one per split, where n is the number of data splits;
        res_st: an n-dimensional vector of zero-shot accuracies with self-training, one per split.

    Usage:
        1. Use default hyper-parameters:
               [res, res_st] = zsl_bidiLEL('ucf101', 'c3d', 'attributes', '51_50');
        2. Specify hyper-parameters (see the sketch after this list):
               [res, res_st] = zsl_bidiLEL('ucf101', 'c3d', 'attributes', '51_50', opts);
        3. To combine multiple visual representations, simply add new feature matrices into feature.Data{}.
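    For illustration, here is a minimal sketch of usage 2, assuming opts is a
    plain struct whose field names match the hyper-parameter list above; the
    values below are placeholders for illustration, not the tuned values from
    the paper:

        % build the opts struct (illustrative values only)
        opts.dy    = 100;          % subspace dimensionality
        opts.alpha = 1;            % regularisation parameter
        opts.kG    = 5;            % kNN for graph construction in SLPP
        opts.k_st  = 3;            % kNN for self-training
        opts.gamma = 0.5;          % for combining semantic representations
        opts.initB = 'preStored';  % pre-stored initialisation for repeatable results

        [res, res_st] = zsl_bidiLEL('ucf101', 'c3d', 'attributes', '51_50', opts);
        fprintf('mean accuracy: %.4f (%.4f with self-training)\n', mean(res), mean(res_st));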
codes/bottomUp_subspace_learning.m
    The script file for latent subspace learning. Other processes, such as top-down learning and recognition, are also called from this file.

codes/bottomUp_subspace_learning_noKernel.m
    The kernel-free version of bottom-up learning. It is called when only one visual representation is used.

codes/bottomUp_alternative_ssl.m
    The script file for learning the subspace with alternative algorithms, i.e., LPP, PCA and LDA.

codes/topDown_latEmbeddings_learning.m
    Learns the latent embeddings for the test classes: B_te (i.e., B^u in the paper).

codes/constructW.m
codes/constructKernel.m
codes/EuDist2.m
    These function files are downloaded from:
    http://www.cad.zju.edu.cn/home/dengcai/Data/DimensionReduction.html

codes/semi_sammon.m
    The implementation of landmark-based Sammon's mapping, adapted from the original version:
    http://theoval.cmp.uea.ac.uk/~gcc/matlab/#sammon

codes/self_training.m
    The implementation of the self-training strategy. The adjusted latent embeddings are denoted as B_te_st.

codes/sp_zsr1.m
codes/sp_zsr2.m
    The implementation of a simple version of structured prediction. The difference between sp_zsr1.m and sp_zsr2.m lies in how the k-means clustering is initialised (B_te vs. random).

==================
Data
==================

There are 4 types of data files for each dataset. Details are as follows.

type 1: [dataset]_c3d.mat, [dataset]_idt.mat, [dataset]_googlenet.mat, [dataset]_vgg19.mat
    variables:
        feature.Data{m}: a representation matrix whose rows are instances;
        allLabels: the corresponding labels (1~numberOfClasses).

type 2: [dataset]_semantics.mat
    variables:
        attributes: the attribute-based semantic representations of the C classes used in the paper, one class per row;
        wordvector: the word-vector-based semantic representations of the C classes used in the paper, one class per row.

type 3: [dataset]_zsl_splits_x_y.mat (x and y denote the numbers of training and test classes respectively)
    variables:
        train_class_set: numberOfTrials rows, each with numberOfClass binary-valued elements, where 1 indicates a training class and 0 a zero-shot class;
        test_class_set: numberOfTrials rows, each with numberOfClass binary-valued elements, where 1 indicates a zero-shot class and 0 a training class.

type 4: fixedInitialB.mat
    variables:
        initialBForAllDatasets.[dataset]_[dataSplits]: a pre-stored, randomly generated initialisation of B for the top-down embedding, used only for repeatable experiment results.

==================
Apply to new datasets
==================

It is easy to experiment on new datasets: you just need to structure the necessary data in the proper format and call the function zsl_bidiLEL. Specifically, you need:
1. visual representations stored in the format feature.Data{m}, together with allLabels for all the instances, in a file named [dataset]_[visRep].mat;
2. semantic representations stored in the file [dataset]_semantics.mat (if you use a semRep other than attributes and wordvector, you might need to revise the code slightly);
3. data splits for zero-shot learning, i.e., the two variables train_class_set and test_class_set, stored in a file named [dataset]_zsl_splits_x_y.mat, where x and y are the numbers of training and test classes respectively.

A worked sketch of these three steps is given after the Contacts section.

==================
Contacts
==================

Qian Wang
qian.wang@manchester.ac.uk
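==================
Appendix: example of preparing a new dataset
==================

To make the three steps above concrete, here is a minimal sketch that creates
the three .mat files for a hypothetical dataset named 'mydata' with 30 classes,
using random placeholder data. The variable and file names follow the Data
section; the dataset name, sizes and values are illustrative assumptions, not
part of the package.

    % placeholder sizes for the hypothetical dataset
    N = 1000;  C = 30;  dVis = 1024;  dSem = 85;

    % type 1: visual representations and labels
    feature.Data{1} = randn(N, dVis);   % one instance per row
    allLabels = randi(C, N, 1);         % labels in 1~C
    save('mydata_googlenet.mat', 'feature', 'allLabels');

    % type 2: semantic representations, one class per row
    attributes = randn(C, dSem);
    wordvector = randn(C, 300);
    save('mydata_semantics.mat', 'attributes', 'wordvector');

    % type 3: splits with 20 training and 10 zero-shot classes per trial
    numberOfTrials = 5;
    train_class_set = zeros(numberOfTrials, C);
    for t = 1:numberOfTrials
        perm = randperm(C);
        train_class_set(t, perm(1:20)) = 1;   % 1 = training class
    end
    test_class_set = 1 - train_class_set;     % 1 = zero-shot class
    save('mydata_zsl_splits_20_10.mat', 'train_class_set', 'test_class_set');

    % run the experiment with default hyper-parameters
    [res, res_st] = zsl_bidiLEL('mydata', 'googlenet', 'attributes', '20_10');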