Archive for the ‘new-stuff’ Category


Google announced Project Tango on February 20, 2014.  It’s a cell phone that captures and reconstructs the environment in 3D, wherever the user points the back cameras. There are two cameras: a color imaging camera and a depth camera (or Z-camera), much like the first-generation Kinect. But Project Tango is much more than the Kinect: it performs all the 3D-reconstruction computation in real time, using co-processors from Movidius.
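A depth camera reports, for each pixel, the distance to the nearest surface; turning that into 3D geometry is a straightforward pinhole-model back-projection. Here is a toy sketch of the idea (the intrinsics `fx`, `fy`, `cx`, `cy` are made-up illustrative values, not Tango’s actual calibration):

```python
import numpy as np

def depth_to_points(depth, fx, fy, cx, cy):
    """Back-project a depth image into a 3-D point cloud using the
    pinhole model: X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.dstack([x, y, depth])

# A flat wall 2 m in front of a tiny 4x4-pixel depth camera.
depth = np.full((4, 4), 2.0)
points = depth_to_points(depth, fx=2.0, fy=2.0, cx=2.0, cy=2.0)
# Pixel (0, 0) back-projects to X = -2, Y = -2, Z = 2.
```

Sweeping the device around and fusing many such point clouds, with the device’s own motion tracked, is what builds up the full 3D model.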

This reminds me of what Dr. Illah Nourbakhsh said in 2007, in the inaugural presentation of the IEEE RAS OEB/SCV/SF Joint Chapter: that some day we’d be able to wave a camera and capture the entire 3D image of our environment. Project Tango is just that simple: aim the cameras at an area to create the 3D reconstruction.  To complete a room, you’d walk around the whole room to capture all the information.

Using SLAM algorithms, aGPS, and orientation sensors, Project Tango is also able to localize the 3D reconstruction, both on Earth and relative to the device itself.

Project Tango runs a version of Android Jelly Bean rather than the latest KitKat release.  What’s more, it apparently uses a PrimeSense sensor, which is no longer available after Apple’s acquisition of PrimeSense. (Interesting that Google did not push to outbid Apple for PrimeSense. After all, there are plenty of alternative depth-sensor technologies out there.) Furthermore, battery life is very limited. These and other issues will need to be solved before real-world deployment.

Applications for real-time 3D reconstruction and mapping include augmented reality, architectural design, and many others. Most interesting would be its use in mobile robots maneuvering in the real world.  Just imagine: indoor drones armed with this capability could move autonomously and safely anywhere in a building, monitoring and transporting items from one location to another.  The applications are endless.

By demonstrating real-time 3D reconstruction and mapping in Project Tango, Google has advanced computing technology toward real interaction with the physical world.


Read Full Post »

On January 14, Dr. Gary Bradski gave a talk about Willow Garage‘s ROS and OpenCV at the IEEE RAS meeting.  It was the first time this RAS meeting had over 60 people, and the first time we held it in the brand-new auditorium at the CMU Silicon Valley campus.

As the creator of OpenCV, Bradski is still driving its development, now with five full-time programmers and additional part-time contributors, collaboration with others such as Michael Black, and contributions from David Lowe (though not SIFT).  New developments include (but are not limited to):

  • stereo (dense stereo at 30Hz)
  • 3D mapping
  • improved optical flow
  • learning algorithms incorporating random forest
  • rewrite of SURF
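To give a flavor of what dense stereo computes, here is a toy brute-force block matcher (nothing like OpenCV’s optimized 30 Hz implementation, just the core idea): for each pixel in the left image, find the horizontal shift that best aligns a small patch with the right image.

```python
import numpy as np

def block_match_disparity(left, right, block=3, max_disp=4):
    """For each pixel, pick the horizontal shift d that minimizes the
    sum-of-absolute-differences between small patches of the two images."""
    h, w = left.shape
    half = block // 2
    disp = np.zeros((h, w), dtype=np.int32)
    for y in range(half, h - half):
        for x in range(half + max_disp, w - half):
            patch = left[y - half:y + half + 1, x - half:x + half + 1]
            costs = [
                np.abs(patch.astype(int) -
                       right[y - half:y + half + 1,
                             x - d - half:x - d + half + 1].astype(int)).sum()
                for d in range(max_disp + 1)
            ]
            disp[y, x] = int(np.argmin(costs))
    return disp

# Synthetic stereo pair: the right image is the left shifted by 2 pixels,
# so the true disparity everywhere is 2.
rng = np.random.default_rng(0)
left = rng.integers(0, 256, (20, 20))
right = np.roll(left, -2, axis=1)
disp = block_match_disparity(left, right)
```

Real dense-stereo code replaces the brute-force loops with highly optimized sliding-window cost aggregation, which is what makes 30 Hz possible.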

One of Bradski’s applications of these new developments is enabling ROS to recognize doorknobs, a problem that is still unsolved.

Bradski’s presentation is here.

Copyright (c) 2009 by Waiming Mok

Read Full Post »


The next step to a better experience on TV and computers could be seeing images in 3D.  At CES, there were many demonstrations of 3D, from content creation to presentation.

Content creation usually involves two cameras.

However, if you can move or rotate an object, you can just use one camera:


Presentation usually requires a pair of 3D glasses.  However, Samsung demonstrated a 3D monitor that shows amazing 3D effects without any glasses: just look at and walk around the display to see the 3D effect.


To transmit and deliver 3D data, the following would be needed:

* At least 2× the image data; with multiple viewing angles, the amount of data would grow accordingly.

* Transmission bandwidth would increase accordingly, although compression algorithms might be able to take advantage of the redundancy between the views.

* Processing would be needed to present the image, which is why vendors like Nvidia are in this space.
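The 2× figure is easy to sanity-check with back-of-envelope arithmetic (1080p at 30 frames/s with 24-bit color, uncompressed; the numbers are illustrative, not from any broadcast standard):

```python
def raw_bandwidth_gbps(width, height, bytes_per_pixel, fps, views):
    """Uncompressed video bandwidth in gigabits per second."""
    return width * height * bytes_per_pixel * 8 * fps * views / 1e9

mono = raw_bandwidth_gbps(1920, 1080, 3, 30, views=1)    # ~1.49 Gbps
stereo = raw_bandwidth_gbps(1920, 1080, 3, 30, views=2)  # ~2.99 Gbps
```

Compression brings both numbers down dramatically, but the two-view stream still starts from twice the raw data, and each additional viewing angle adds another full view.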

Copyright (c) 2009 by Waiming Mok

Read Full Post »

Rethinking Cars

While manning the IEEE RAS SCV booth at the Robo Development Conference, I also had the opportunity to listen to Sebastian Thrun’s keynote.  He spoke for some time on the two DARPA Grand Challenges and the vehicles Stanley and Junior.  Toward the end, he made a call to action on changing the way we build cars, to address traffic fatalities and the inefficiencies of current automobile designs.  The problems and inefficiencies include:

  1. Traffic fatalities in the US cost 2% of GDP, at 42K deaths and 2.7M injuries each year.   Traffic is the leading cause of death for ages 3–33.  Yet the main response to vehicle safety has been bigger vehicles (SUVs), which are inefficient.
  2. We average 1.25hr / day in traffic.
  3. Of the 660 million cars in the world, the average utilization is 3%; most of the time, cars are parked.
  4. 22% of the nation’s energy is used by cars. 
  5. 30% of the energy used by each car is wasted on extra weight added for safety reasons.
  6. At peak, only 8% of a highway’s capacity is used; spacing is required to ensure human drivers have enough distance between cars.  (What if fast-moving cars could be packed closer together without accidents?)
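Thrun’s 8% figure is roughly what a back-of-envelope occupancy calculation gives (the car length, speed, and following gaps below are my own illustrative assumptions, not his exact numbers):

```python
def lane_occupancy(car_length_m, speed_ms, gap_s):
    """Fraction of lane length physically covered by cars when each
    driver keeps a time gap of gap_s seconds to the car ahead."""
    return car_length_m / (car_length_m + speed_ms * gap_s)

# 5 m cars at 30 m/s (~108 km/h).
human = lane_occupancy(5.0, 30.0, 2.0)  # ~2 s human gap  -> ~8% occupancy
auto = lane_occupancy(5.0, 30.0, 0.3)   # tighter automated gap -> ~36%
```

If autonomous cars could safely hold a 0.3 s gap instead of 2 s, the same lane would carry several times as many vehicles.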

Dr. Thrun suggested that the above problems could be addressed with new innovations involving more sensors and intelligence to make cars run autonomously.  He called on engineers to rethink how cars are designed and built, as well as how they are used and shared by people.

Copyright (c) 2008 by Waiming Mok

Read Full Post »

Follow the Money, First Cut

As part of the investigation into an event topic for VLAB, I started looking into government money (such as the $700B financial-bailout dollars), instead of private capital, as funding for new products and services.   Not sure where that will lead.  Using FreeMind, I’ve captured an evening’s investigation in a mind map:



Copyright (c) 2008 by Waiming Mok

Read Full Post »

Amazon Mechanical Turk

At the DM SIG of SFBay ACM, Rion Snow spoke about using Amazon Mechanical Turk as a cheap and fast way to do various tasks, such as collecting annotations to train machine learning and natural language processing algorithms.  He was paying as little as $1 for 1,000 labeling tasks.

MTurk shows Amazon’s foresight, along with EC2 and its other services.  MTurk was established in 2005 to leverage crowdsourcing: people online can request specific tasks to be done (such as paper review and editing) for a payment, and other people on MTurk do the tasks and get paid.  Most workers earn on average $1/month, while some really diligent folks earn up to $100/month.

Rion indicated that the results could be made close to what experts produce, at low cost and quickly, provided some of the biased/noisy data is eliminated.  The biased/noisy data includes labels generated by a few people who were just doing the job for the money (conjecture on my part).
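One standard way to suppress that noise is to collect several cheap labels per item and take a majority vote; a generic sketch (the task names and labels below are hypothetical, not Rion’s actual data):

```python
from collections import Counter

def majority_vote(annotations):
    """Pick the most common label among each item's noisy annotations."""
    return {item: Counter(labels).most_common(1)[0][0]
            for item, labels in annotations.items()}

# Hypothetical sentiment labels from five Turkers per sentence.
raw = {
    "sentence-1": ["pos", "pos", "neg", "pos", "pos"],
    "sentence-2": ["neg", "neg", "pos", "neg", "neg"],
}
merged = majority_vote(raw)
```

Even if each individual annotator is only modestly reliable, the aggregated label can approach expert quality as more votes are collected per item.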

Copyright (c) 2008 by Waiming Mok

Read Full Post »

Adam Coates of Stanford presented at the IEEE RAS meeting.  He and his colleagues, including Pieter Abbeel and Andrew Y. Ng, used machine learning to enable a two-processor computer to quickly learn the complicated flight patterns a skilled pilot flew with large model helicopters.  The pilot took 20 years to perfect some of these complicated and aggressive maneuvers, yet the computer learned them in a few hours.  Thereafter, the computer could repeatedly guide the model helicopters through the same flight patterns.  Adam indicated that they are looking into applying these techniques to other robot learning tasks.

Copyright (c) 2008 by Waiming Mok

Read Full Post »

Older Posts »