A technology called Photosynth, developed by Microsoft Research in collaboration with the University of Washington, has the potential to change the way we look at maps. “Photosynth takes a large collection of photos of a place or object, analyzes them for similarities, and displays them in a reconstructed 3-Dimensional space.”
Call me skeptical, but I really want to know how much “refining” had to be done in order to get a working model. Creating a point-cloud model is a tedious and time-consuming task, and I really wonder how in the world an algorithm is supposed to do that without any help. If it works, that would be an outstanding achievement, but I’m really skeptical about it. It’s difficult for a human being to fix the position of a few photos in a 3D space, let alone for a computer program.
They claim the point clouds are generated by the software, and I tend to believe them, because it is just a more sophisticated version of what “stitcher” programs already do.
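To make the comparison concrete: the first step in both a stitcher and something like Photosynth is finding the same distinctive features in overlapping photos. A minimal sketch with OpenCV’s ORB detector (the published system reportedly uses SIFT-style features; the filenames here are placeholders, and this is nothing like the actual pipeline):

```python
import cv2

# Two overlapping photos of the same building (placeholder filenames)
img1 = cv2.imread("notre_dame_1.jpg", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("notre_dame_2.jpg", cv2.IMREAD_GRAYSCALE)

# Detect distinctive keypoints and descriptors in each image
orb = cv2.ORB_create(nfeatures=2000)
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)

# Match descriptors between the two photos; each good match is a point
# seen in both images, which is the raw material for 3-D reconstruction
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)
print(f"{len(matches)} candidate correspondences")
```

A stitcher stops at warping the images onto a plane or cylinder; the 3-D reconstruction keeps going and solves for camera poses and point depths from those same correspondences.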
I don’t know. I have to admit that I wrote my first comment a bit in a rush, but I’m still skeptical. It sounds too good to be true. I really want to see the building app, not just the client. For example, how does the algorithm know the focal length or the horizontal tilt? How can it estimate distance when it doesn’t have any clue about the scale of the object or its dimensions?
By recognizing patterns. If patterns fit, adjusting scale is not that much of a problem.
What I find striking, though, is the fact that all the photos in their demo have the same color. That leads me to the conclusion that those photos were taken at the same time on the same day with cameras of the same type. Or do they average the color of all associated photos?
“By recognizing patterns. If patterns fit, adjusting scale is not that much of a problem.”
That’s true, but then perspective comes in and confuses things altogether. If two shots are taken from the same position the pattern would be easily understandable, but if the camera moves a bit the pattern gets distorted, and the computer “shouldn’t” be able to figure out the distance of the points in the pattern unless it knows the position of the camera and the focal length in the first place. As for the same colour, I guess they average the colours to compensate for the different exposures. I believe there are other programs doing that; think of the panorama-building apps. Anyway, I believe the photos were taken pretty well, maybe with a little extra care too (I don’t believe they’re random holiday shots).
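For what it’s worth, that is exactly the problem structure-from-motion solves: the camera positions and focal lengths are estimated jointly with the points, and once they are known, depth is just triangulation. A rough numpy sketch of the triangulation step alone, assuming the projection matrices have already been recovered somehow (hypothetical inputs, not Photosynth’s code):

```python
import numpy as np

def triangulate(P1, P2, uv1, uv2):
    """Linear (DLT) triangulation of one scene point seen in two views.

    P1, P2  : 3x4 projection matrices (intrinsics times [R|t]) -- in a real
              SfM system these are estimated from the image matches themselves.
    uv1, uv2: (u, v) pixel coordinates of the same point in each photo.
    """
    A = np.array([
        uv1[0] * P1[2] - P1[0],
        uv1[1] * P1[2] - P1[1],
        uv2[0] * P2[2] - P2[0],
        uv2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)      # least-squares solution of A X = 0
    X = Vt[-1]
    return X[:3] / X[3]              # back to ordinary 3-D coordinates
```

So the objection is fair: without the camera poses nothing works. The trick is that with enough overlapping photos, the poses and the points can be solved for together.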
The human brain deals with this problem by making canonical representations; adjusting what is seen to try to remove artifacts caused by the angle of view.
For example: let’s say you’ve got two pictures; one has a roughly rectangular object, and the other has a somewhat skewed trapezoidal object. What if you try to apply a perspective transformation to the trapezoid to make it into a rectangle? Does the image then roughly match the one with the rectangular object, after scaling? Likewise, you can try to interpret ellipses as circles seen at an angle, look for lines meeting at the horizon and treat them as parallel, and so on.
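In code terms, that “unskewing” is just a homography. A toy OpenCV sketch with made-up corner coordinates (purely illustrative, not anything from Photosynth itself):

```python
import cv2
import numpy as np

img = cv2.imread("facade_at_an_angle.jpg")   # placeholder filename

# Corners of the trapezoid as it appears in the photo (hypothetical pixels)
trapezoid = np.float32([[120, 80], [480, 60], [520, 400], [90, 380]])
# Where those corners should land if the facade were seen head-on
rectangle = np.float32([[0, 0], [400, 0], [400, 300], [0, 300]])

# 3x3 homography mapping the skewed view to a fronto-parallel one
H = cv2.getPerspectiveTransform(trapezoid, rectangle)
rectified = cv2.warpPerspective(img, H, (400, 300))
cv2.imwrite("facade_rectified.jpg", rectified)
```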
The UW site says that they needed weeks of number-crunching to generate each 3-D reconstruction.
“The human brain deals with this problem by making canonical representations; adjusting what is seen to try to remove artifacts caused by the angle of view.”
True, but let’s not forget stereoscopic vision altogether. Sure, it’s not required to figure out objects, proportion and perspective, but it definitely helps.
As for the GPS thing in another post, again that’s true, but I really doubt GPS resolution is enough for the accuracy required, unless of course you’re just taking pictures of large panoramas far, far away. Orientation definitely helps, though. As for the data inside each shot, true again, that helps. Fact is, not every picture comes with that data, and we’re still well short of the information required.
“The UW site says that they needed weeks of number-crunching to generate each 3-D reconstruction.”
Well, at least that doesn’t surprise me. But I still want to know how much refining had to be done by hand.
“how does the algorithm know the focal length or the horizontal tilt?”
Focal length: Digital cameras store that as metadata already. Panorama stitching programs already use that information. I’ve created breathtaking panoramas completely automatically. Only they depict a plane, not a 3D surface.
Exact position and heading: GPS. They’re surprisingly precise now. I’m not sure this is even needed. Panorama stitching programs don’t need this information either, but it might help.
Tilt: for example, using tilt sensors. Google for digital inclinometer.
Nevertheless, it’s an awesome technology.
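For the curious, reading that focal-length metadata takes only a few lines with Python’s Pillow library. A sketch using Pillow’s legacy EXIF helper (the filename is a placeholder, and not every camera writes every tag):

```python
from PIL import Image, ExifTags

exif = Image.open("notre_dame_1.jpg")._getexif() or {}
tags = {ExifTags.TAGS.get(k, k): v for k, v in exif.items()}

print(tags.get("Model"))                   # camera model
print(tags.get("FocalLength"))             # focal length in mm
print(tags.get("FocalLengthIn35mmFilm"))   # 35 mm-equivalent, if recorded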
I used to think the same thing until I used AutoStitch to automatically stitch my photos into a panorama.
One thing that people are forgetting is that some of today’s cameras now include the ability to record GPS coordinates and camera orientation information into a photograph’s metadata. That makes it much easier to do spatial analysis on the textures that can be extracted from a photograph.
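As an illustration, those GPS tags can be pulled out the same way as the focal length above. A hedged sketch (tag names follow the EXIF spec; whether they are present depends entirely on the camera, and the filename is again a placeholder):

```python
from PIL import Image, ExifTags

def to_degrees(dms, ref):
    """Convert an EXIF (degrees, minutes, seconds) triple to signed decimal degrees."""
    deg = float(dms[0]) + float(dms[1]) / 60 + float(dms[2]) / 3600
    return -deg if ref in ("S", "W") else deg

exif = Image.open("notre_dame_1.jpg")._getexif() or {}
gps_raw = exif.get(34853, {})                        # 34853 is the GPSInfo EXIF tag
gps = {ExifTags.GPSTAGS.get(k, k): v for k, v in gps_raw.items()}

if gps:
    lat = to_degrees(gps["GPSLatitude"], gps["GPSLatitudeRef"])
    lon = to_degrees(gps["GPSLongitude"], gps["GPSLongitudeRef"])
    heading = gps.get("GPSImgDirection")             # compass bearing of the shot, if stored
    print(lat, lon, heading)
```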
If you want more answers, definitely read the paper.
It’s really just an evolution of photo-recon. You still need ground truth data in the form of registered photos, and yes, the sample data is optimized.
There’s a long way between here and photo-tourism, but it is an interesting step in a useful direction.
Check out this video at Channel 9:
http://channel9.msdn.com/Showpost.aspx?postid=220870
It gives you technical details about how they do it.