Object recognition – technology in the field of
computer vision for finding and identifying objects in an image or video sequence. Humans recognize a multitude of objects in images with little effort, even though the appearance of an object may vary somewhat from different viewpoints, at many different sizes and scales, or when it is translated or rotated. Objects can even be recognized when they are partially obstructed from view. This task remains a challenge for computer vision systems, and many approaches have been implemented over multiple decades.
The general approach: extract features from the objects to be recognized and from the images to be searched, then
use a search to find feasible matches between object features and image features.
The primary constraint is that a single position of the object must account for all of the feasible matches.
One method for searching for feasible matches is to search through a tree.
Each node in the tree represents a set of matches.
The root node represents the empty set
Each other node is the union of the matches in the parent node and one additional match.
A wildcard is used for features with no match
Nodes are “pruned” when the set of matches is infeasible.
A pruned node has no children
Historically significant and still used, but less commonly
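The tree search above can be sketched in a few lines. This is an illustrative toy, not a production matcher: the names (`match_features`, `WILDCARD`) and the simplistic consistency test are assumptions; a real system would check that a single object pose can account for every pairing.

```python
WILDCARD = None  # stands in for "this image feature has no model match"

def consistent(matches):
    # Toy feasibility test: here we only require that no model feature
    # is used twice; a real system would test pose consistency.
    used = [m for _, m in matches if m is not WILDCARD]
    return len(used) == len(set(used))

def match_features(image_feats, model_feats, matches=()):
    """Depth-first search over the interpretation tree.

    Each node is a set of (image feature, model feature) matches; the
    root is the empty set, and each child adds one more match.
    Infeasible nodes are pruned, so they get no children.
    """
    if not image_feats:                       # leaf: every image feature assigned
        return [list(matches)]
    first, rest = image_feats[0], image_feats[1:]
    results = []
    for m in list(model_feats) + [WILDCARD]:  # wildcard = no match
        node = matches + ((first, m),)
        if consistent(node):                  # prune infeasible nodes
            results += match_features(rest, model_feats, node)
    return results

interpretations = match_features(["i1", "i2"], ["m1", "m2"])
```

With two image features, two model features, and the wildcard, the search keeps every assignment except the two that reuse a model feature, leaving seven feasible interpretations.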
Hypothesize and test
Hypothesize a correspondence between a collection of image features and a collection of object features
Then use this to generate a hypothesis about the projection from the object coordinate frame to the image frame
Use this projection hypothesis to generate a rendering of the object. This step is usually known as backprojection
Compare the rendering to the image, and, if the two are sufficiently similar, accept the hypothesis
There are a variety of different ways of generating hypotheses.
When camera intrinsic parameters are known, the hypothesis is equivalent to a hypothetical position and orientation –
pose – for the object.
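The hypothesize-and-test loop can be sketched with the simplest possible pose model, a pure 2D translation. This is an assumption made for brevity; real systems hypothesize a full projection from the object coordinate frame to the image frame.

```python
# Toy hypothesize-and-test: the "pose" is just a 2D translation.
model = [(0, 0), (1, 0), (0, 1)]            # object features (object frame)
image = [(5, 4), (6, 4), (5, 5), (9, 9)]    # image features (one is clutter)

def render(model, pose):
    """Backprojection: render the model features under the hypothesized pose."""
    tx, ty = pose
    return [(x + tx, y + ty) for x, y in model]

def score(rendering, image, tol=0.5):
    """Compare the rendering to the image: count rendered features
    that land near some image feature."""
    return sum(
        any(abs(rx - ix) <= tol and abs(ry - iy) <= tol for ix, iy in image)
        for rx, ry in rendering
    )

best = None
for m in model:                 # each (model feature, image feature)
    for i in image:             # correspondence yields one pose hypothesis
        pose = (i[0] - m[0], i[1] - m[1])
        s = score(render(model, pose), image)
        if best is None or s > best[0]:
            best = (s, pose)
```

The winning hypothesis is the translation that places all three model features on image features; a hypothesis is "accepted" when its rendering and the image are sufficiently similar.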
Utilize geometric constraints
Construct a correspondence for small sets of object features to every correctly sized subset of image points. (These are the hypotheses)
Three basic approaches:
Obtaining Hypotheses by Pose Consistency
Obtaining Hypotheses by Pose Clustering
Obtaining Hypotheses by Using Invariants
An expensive search that is also redundant, but it can be improved using randomization and/or grouping
Examine small sets of image features until the likelihood of missing the object becomes small
For each set of image features, all possible matching sets of model features must be considered.
(1 − W^c)^k = Z
W = the fraction of image points that are “good” (W ≈ m/n)
c = the number of correspondences necessary
k = the number of trials
Z = the probability of every trial using one (or more) incorrect correspondences
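Solving (1 − W^c)^k = Z for k gives the number of random trials needed before the probability that every trial used a bad correspondence drops to Z. A minimal sketch (the function name is illustrative):

```python
import math

def trials_needed(W, c, Z):
    """k = log(Z) / log(1 - W^c), rounded up to a whole number of trials."""
    return math.ceil(math.log(Z) / math.log(1 - W**c))

# e.g. half the image points are good, 3 correspondences per hypothesis,
# and we accept a 1% chance of never sampling an all-good trial:
k = trials_needed(W=0.5, c=3, Z=0.01)
```

Here W^c = 0.125, so each trial is all-good with probability 1/8, and 35 trials are enough to push the failure probability below 1%.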
If we can determine groups of points that are likely to come from the same object, we can reduce the number of hypotheses that need to be examined
Also called Alignment, since the object is being aligned to the image
Correspondences between image features and model features are not independent – Geometric constraints
A small number of correspondences yields the object position – the others must be consistent with this
If we hypothesize a match between a sufficiently large group of image features and a sufficiently large group of object features, then we can recover the missing camera parameters from this hypothesis (and so render the rest of the object)
Generate hypotheses using small number of correspondences (e.g. triples of points for 3D recognition)
Project other model features into the image (backproject) and verify additional correspondences
Use the smallest number of correspondences necessary to achieve discrete object poses
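The alignment idea can be sketched in pure Python under a simplifying assumption: the pose is a 2D affine transform, which is fixed exactly by a triple of point correspondences. The helper names are illustrative. We recover the transform from three correspondences, then project a remaining model feature into the image so it can be verified.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a*(e*i - f*h) - b*(d*i - f*g) + c*(d*h - e*g)

def solve3(rows, rhs):
    """Solve a 3x3 linear system by Cramer's rule."""
    D = det3(rows)
    return [det3([r[:j] + [rhs[k]] + r[j+1:] for k, r in enumerate(rows)]) / D
            for j in range(3)]

def affine_from_triple(src, dst):
    """Affine map (a, b, tx, c, d, ty) sending the 3 src points to dst."""
    rows = [[x, y, 1] for x, y in src]
    a, b, tx = solve3(rows, [x for x, _ in dst])
    c, d, ty = solve3(rows, [y for _, y in dst])
    return a, b, tx, c, d, ty

def project(T, p):
    """Backproject one model point into the image under transform T."""
    a, b, tx, c, d, ty = T
    return (a*p[0] + b*p[1] + tx, c*p[0] + d*p[1] + ty)

model = [(0, 0), (1, 0), (0, 1), (1, 1)]            # object features
T = affine_from_triple(model[:3], [(2, 3), (3, 3), (2, 4)])
projected = project(T, model[3])                     # should land near (3, 4)
```

The hypothesis came from three correspondences; the fourth model feature, once projected, must be consistent with an image feature for the hypothesis to survive verification.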
Keypoints of objects are first extracted from a set of reference images and stored in a database
An object is recognized in a new image by individually comparing each feature from the new image to this database and finding candidate matching features based on Euclidean distance of their feature vectors.
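The matching step can be sketched with brute-force nearest neighbours and a Lowe-style ratio test, which rejects a candidate unless its nearest database feature is clearly closer than the second-nearest. This is a toy: the feature vectors and the 0.8 ratio are assumptions, and a real system would use high-dimensional descriptors with an approximate nearest-neighbour index rather than brute force.

```python
def dist(u, v):
    """Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5

def match(new_features, database, ratio=0.8):
    """For each new feature, find its nearest database feature; keep the
    match only if it is clearly closer than the second-nearest."""
    matches = []
    for i, f in enumerate(new_features):
        ranked = sorted(range(len(database)), key=lambda j: dist(f, database[j]))
        best, second = ranked[0], ranked[1]
        if dist(f, database[best]) < ratio * dist(f, database[second]):
            matches.append((i, best))   # (new feature index, database index)
    return matches

db = [(0.0, 0.0), (10.0, 10.0), (10.0, 0.0)]   # stored reference keypoints
found = match([(0.1, 0.2), (5.0, 5.0)], db)
```

The first query is unambiguously closest to the first database feature and is kept; the second is equidistant from all three, so the ratio test discards it.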
Genetic algorithms can operate without prior knowledge of a given dataset and can develop recognition procedures without human intervention. A recent project achieved 100 percent accuracy on the benchmark motorbike, face, airplane and car image datasets from Caltech and 99.4 percent accuracy on fish species image datasets.