- Date:14 Sep 2020
- Views:1
- Downloads:0
- Pages:10
- Size:1.79 MB

Transcription:

Average quantization error 1 00 Average quantization error 0 9084 Average quantization error 0 9094. 1 0 1 1 0 1 1 0 1, a PCA aligned b ITQ rotation c KNNH transformation. Figure 1 Toy illustration of K Nearest Neighbors Hashing KNNH The basic idea of binary embedding is to quantize. data points to the closest vertex of the Hamming cube a PCA leaves out the binary repesentation and splits each cluster. to different vertices b ITQ found the optimized rotation in the context of lowest quantization error c KNN Hashing. endeavors to maintain the k nearest neighbors within the same subspace during rotation detailed in Section 2 4 Although. it yields even larger quantization error than ITQ the proposed transfomation is closer to ideal space partitioning. conditional entropy minimization which eludes analysis on neighbor search schemes 36 21 within a unified frame. sign by transforming the hashing problem into a space work X. partitioning problem With Kozachenko Leonenko estima E x d e x 22. tor we further prove that the conditional entropy minimiza x. tion encourages the data point and its k nearest neighbors to where e and d refers to encoder and decoder e g x. share the same hashing codes As illustrated in Figure 1 the is XR e is sign and d refers to scalar matrix in. proposed K Nearest Neighbors Hashing KNNH transfor ITQ 12 A quantizer that minimizes E should map any x. mation approaches optimum by preserving KNN within the to its nearest codeword in the codebook We argure that this. same subspaces i e the same codewords Extensive ex objective is simple and intuitive but may not straight to the. periments show that our method outperforms the state of hashing target. the arts on benchmark datasets which indicates the effec It is common practice to turn E into well known signal. tiveness of the proposed theory in real world applications to noise ratio or signal to quantization noise ratio 14. 2 Approach E x 2,SN R 10 log10,E x d e x 2,2 1 Preliminary. This target alone reflects the compressibility of data instead. Formally denote input matrix X Rn d as the con, of codewords similarity In another word distortion mini. catenation of n vectors X xi ni 1 The vertices of an. mization is a data compression system where E and SN R. axis aligned c dimensional hypercube is 1 1 c denot. focus on the minimization of reconstruction error However. ed as Bc In general the encoder bi sign xi maps a, hashing aims at the approximation of nearest neighbors us. vector xi Rd to the closest vertexes bj Bc hence we. ing codewords There is no guarantee that optimized com. split Rd into 2c disjoint subspaces S1 S2 S2c where. pression leads to the closest codewords since a cluster near. Sj x sign x bj Given i i d samples xi ni 1 from, the endpoints of a quantization interval can be split into d.

the underlying probability density function p x for x. ifferent codewords, Rd we apply KNN estimator and re substitution to non. parametric estimation,R described in Section 2 3,R Then we 2 3 Conditional entropy minimization. have p bj x p x sign x bj dx x Sj p x dx, Under the constraint of orthogonal transformation the. For a discrete random variable Y with probability mass. relation of nearest neighbors can be preserved during rota. function p Y Shannon entropy is defined as H Y, P tion However this benefit may bring us to another pitfal. E log p y y p y log p y, l orthogonality is the guarantee of order preserving This.

2 2 Mean square minimization condition holds when we relax our discrete hashing codes. to be a continuous variable b Rc It is obvious that. The authors of 11 formulated a variety of hashing Rxi Rxj 2 xi xj 2 when RT R I But the exis. methods 12 41 43 and some other approximate nearest tence of non smooth encoder makes e Rxi e Rxj 2. Estimated H B V 0 078 bit Estimated H B V 0 074 bit Estimated H B V 0 042 bit. 0 2 0 4 0 4,0 1 0 2 0 2,1 0 1 0 1 0,1 0 1 1 0 1 1 0 1. a Random rotation b ITQ rotation c KNNH transformation. Figure 2 The contribution i e ln v i, of each data point to H B V shown in colormap KNN based shrinkage leads. to lower conditional entropy than ITQ rotation due to less confusing points near boundary. xi xj 2 an open problem i j in a neighborhood To we utilize the alternate format of H B V. overcome this issue early works turn to relaxed hashing. function such as Sigmoid 42 and tanh 33 to ap H B H B V I B V H V H V B. proximate original problem We show that it is feasible H B V H B H V H V B. to create the direct relation between features and hashing. codes without approximation For the estimation of differential entropy we further intro. To directly model the connections between binary code duce the well known Kozachenko Leonenko estimator 23. words and real valued features we circumvent the non n. smooth sign function and become interested in the sub H V k n ln cd ln k vi 4. spaces partitioned by sign Note that the number of com n i 1. ponents in a space partition usually plays a central role in. where is the digamma function cd is the volume of the. probability and statistical learning theory and the relation. d dimensional unit ball and k vi is twice the disance from. ship between minimum mean square and mutual informa. vi to its kth nearest neighbor Compared with conventional. tion have been discussed in 38 15 16 All those works. estimators based on binnings Eq 4 has minimal bias and. inspire us to describe the hashing process in information. estimates only from KNN distances This property will ease. theoretic criteria, the later derivations and lead to our hashing method P. From Eq 4 we further estimate H V B j p v,min H B V V f X. Sj H V v Sj in disjoint subspaces which erases the. where B sign V the feature vi is from a continuous error made in the individual integration over definite bins. distribution and bj is the discrete output gallery codes This E q 2 so that. target minimizes the uncertainty in B when V is known In 2 c. other words the optimal feature representation should make. it easy to determine their codewords j 1, The objective conforms to our intuition but the plug in n.

method heavily relies on the approximation of probability dX. k n ln cd ln k vi,density function over bins Formally n i 1. H B V p v H B V v dv 1 k Sj ln cd,X pv b i j d X,pv b i j log 2. ln k vi v Sj,pv i Sj i 1, where pv i i i v dv the integrationRof estimated den. sity v over ith bin and pv b i j i i v Sj dv Here Sj refers to the number of data points in space Sj. Clearly it would be impractical to set the right side of E and k vi v Sj is twice the distance from vi to its KNN. q 1 as the optimization target To bridge over the obstacle in Sj given sign vi bj Note that k vi v Sj makes. Algorithm 1 Unconstrained KNN Hashing Algorithm 2 Orthogonal KNN Hashing. Data A set of data points Xi ni 1 d, R in the form Data A set of data points Xi ni 1 Rd in the form. of X Rn d of X Rn d, Result Codewords B 1 1 n c and Result Codewords B 1 1 n c and rotation.

transformation matrix W Rc c matrix R Rc c,X Rn c X X Rn c X. Dimensional reduction PCA in this work Dimensional reduction PCA in this work. while W not converged do Euclidean KNNSearch X Nn k. V XW Here f is a linear projection Return the indexes of KNN. Euclidean KNNSearch V Nn k for j 1 n do,Return the indexes of KNN Xj mean X j R1 c. for j 1 n do Update X i e KNN Shrinkage,Vj mean V j R1 c end. Update V i e KNN Shrinkage B R arg min sign XR XR 2F. end RT R I,W arg min sign V XW 2F Classical hashing problem. return B R,return B W, a simple heuristic method to construct V which reduces.

H B V from the view of KNN distance This idea begins. each feature point hold its KNN only when its KNN is in the with an unconstrained method and refined by an orthogonal. same space as vi In this case we simplify the double sum constraint. mation j i ln k vi v Sj to i ln k vi Svi where, Svi v sign v bj sign vi By substituting this Unconstrained method As shown in Eq 6 the outlier. term into Eq 5 and approximating digamma function as of the cluster is more likely to drop into different spaces. n ln n 2n 12n 2 O n4 2 H B V can be, and is far more sensitive to its KNN than center points In. further written as another word vi Svi of outliers may hugely change in a. d X k vi Svi small step towards cluster center especially for each feature. H B V ln dimension V i N 0 2 Hence we propose the reduc. n i 1 k vi, z tion in the volume of the cluster based on KNN i e KNN. term I shrinkage To alleviate the impact of outliers we recon. 1 m 1X 1 1 struct each feature point by the mean value of its k nearest. O 2 neighbors recursively from 2 nn to k 1 nn in case 1 nn. 2n n j 1 12 Sj n, z is still an outlier That is updated feature points v will be. term II used in the following iterations for loop in Algorithm 1 2. where m is the number of Sj satisfying Sj 6 0 For m In fact every feature point can be viewed as the weight. n it is clear that H B V dominated by term I with term II ed average with all the others As illustrated in Figure 2. approaching zero This result leaves us two insights 1 the shrinkage step plays a key role in the decrease of H B V. information loss of binarization comes from the data near Benefiting from this information gain min sign v v 2. the boundary as shown in Figure 2 where warm colors are will better preserve the relation between original feature s. distributed among coordinate axes and 2 H B V should pace and Hamming space So far we obtain the uncon. decrease when KNN of vi all come from the same space strained KNNH shown in Algorithm 1. As for m n the second term is still lower bounded by. 1 n n 12 7, 2n n 12 and we can define a larger k vi Svi Orthogonal method Although the unconstrained method.

when k Sj to penalize the singletons spaces with few has taken vi Svi into consideration there are two in. samples and to balance the contribution to the uncertainty evitable problems in practice 1 computational time e g. between two terms KNN search and shrinkage in while loop will take plenty of. time due to massive computation and memory read writes. 2 4 K nearest neighbors hashing, and 2 without constraint the only solution is the trivial one. Although Eq 6 has been more concrete than Eq 1 it is Transformation matrix W minimizes sign V XW F. still hard to put criteria into practice Therefore we propose at the cost of losing KNN relations where most data points. Table 1 Comparsions of different representative unsupervised hashing methods on the MNIST dataset Each image was. represented as a 784 D 28 28 gray scale feature vector by using its intensity. Hamming Ranking mAP precision N 1000 precision r 2. 16 32 64 16 32 64 16 32, LSH 1 20 88 25 83 31 71 37 77 50 16 61 73 25 10 55 61. SH 43 26 64 25 72 24 10 56 29 61 29 61 98 57 52 65 31. PCAH 41 27 33 24 85 21 47 56 56 59 99 57 97 36 36 65 54. SpH 18 25 81 30 77 34 75 49 48 61 27 69 85 51 71 64 26. KMH 17 32 12 33 29 35 78 60 43 67 19 72 65 61 88 68 85. ITQ 12 41 18 43 82 45 37 66 39 74 04 77 42 65 73 73 14. KNNH 47 33 53 25 56 03 67 95 75 89 79 04 71 82 69 08. are packed into few buckets Fortunately an orthogonal labels Places205 is a very challenging dataset due to its. constraint does address both problems simultaneously large size and number of categories which contains 2 5M. Since orthogonal transformation preserves lengths of images from 205 scene categories Following 3 we use. vectors and angles between vectors the KNN relation the CNN features extracted from the fc7 layer of ImageNet. should be maintained during iterations In this context pre trained AlexNet and reduce the dimensionality to 128. i j XR i is equivalent to i j Xi R and we can, move KNN search and shrinkage outside of the loop Note We use the following evaluation metrics to measure the. that the objective then becomes min sign v v 2 That performance of methods 1 mean Average Precision mAP. is a natural two stage scheme KNN based shrinkage and which evaluates the overall performance on different ob. classical hashing problem shown in Algorithm 2 Intuitive ject classes 2 precision of Hamming radius of 2 preci. ly one may argue that we just replace v by v in original sion r 2 which measures precision on retrieved images. hashing problem but on the other hand if we have V which having Hamming distance to query 2 we report zero pre. satisfies H B V 0 a single sign should solve the cision for the queries if no image satisfy 3 precision at. hashing problem At inference time we directly apply the N samples precision N which refers to the percentage of. learned linear projections to unseen data points i e testing true neighbors on top N retrieved samples In our experi. samples without KNN shrinkage ments we strictly follow the same comparison settings in. previous works most of the results are directly reported by. 3 Results the authors Besides in order to improve the statistical sta. bility we repeated the experiments 10 times and took the. 3 1 Datasets Evaluation protocol average as the final result Since the performance on small. We evaluate the proposed K Nearest Neighbors Hashing sample classes is more sensitive to the queries we execute. KNNH on three balanced benchmark datasets CIFAR the experiments on LabelMe 12 50K 50 times to get the re. 10 24 MNIST 27 and Places205 47 3 and we fur sults To prove that min H B X leads to better codewords. ther verify the performance on an extremely imbalanced than straightforward quantization minimization the hashing. K Nearest Neighbors Hashing Xiangyu He1 2 Peisong Wang1 Jian Cheng1 2 3 1 NLPR Institute of Automation Chinese Academy of Sciences Beijing China 2 University of Chinese Academy of Sciences Beijing China 3 Center for Excellence in Brain Science and Intelligence Technology CAS Beijing China xiangyu he peisong wang jcheng nlpr ia ac cn Abstract

Recent Views:

- Harris todaro model of urban unemployment
- Welcome to science 9 myp 4
- Lcd tv owner s manual
- A kompetensi inti dan kompetensi dasar pendidikan agama
- The essential guide to travel planning
- 1 7 21 e0 1 13du srxohv rxeohwwhv k lplwp j udlv
- An introduction to boundary layer meteorology gbv
- Honor code statement for exam sscc home
- Dnv classification note 62 1 guidance for the
- Machine learning for numeric prediction