
Archive for the ‘robotics’ Category

At the IEEE CIS Chapter presentation, Dr. Jeannette Bohg presented her work on using a CNN to learn the parameters for the dynamics model of a multi-link robot that pushes an object on a flat table (2D). The CNN processes real-time camera images to determine the object's position on the table and its proximity to the robot's end-effector. The robot knows where its end-effector is on the table through its sensors.

The output of the CNN is processed into parameters for the dynamics model (the pusher's movement and position, friction parameters, and the object's linear and angular velocities). The dynamics model uses these parameters to compute the robot motion needed to move the object.
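To make the pattern concrete, here is a minimal sketch of the general idea: a CNN regressing dynamics parameters that an analytical pushing model then consumes. The network layout, the parameter layout, and the toy pushing model below are my own illustrative assumptions, not Dr. Bohg's implementation.

```python
# Illustrative sketch only: a small CNN maps a camera image to a vector of
# dynamics parameters, and a hand-written planar pushing model consumes them.
# Layer sizes and the 6-parameter layout are assumptions, not the talk's model.
import torch
import torch.nn as nn

class ParamNet(nn.Module):
    def __init__(self, n_params=6):  # e.g. friction terms plus linear/angular velocities
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(32, n_params)

    def forward(self, image):
        # image: (batch, 3, H, W) camera frame -> (batch, n_params) dynamics parameters
        return self.head(self.features(image).flatten(1))

def pushing_dynamics(obj_pose, params, dt=0.02):
    """Toy stand-in for the analytical model: advance the object's planar pose
    (x, y, theta) using the velocities predicted by the network."""
    x, y, theta = obj_pose
    vx, vy, omega = params[..., 0], params[..., 1], params[..., 2]
    return (x + vx * dt, y + vy * dt, theta + omega * dt)
```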


The results are good. However, a pure CNN framework that processes the images and directly predicts the robot motion produces even better results. On the other hand, the trained Bohg network works better than the trained CNN-only network when the object is changed, so her method is more robust to such changes.

Dr. Bohg's talk also covered other topics, such as scene flow estimation, all built on the idea of using data (via CNNs or other deep learning networks) to predict parameters for a dynamics model that accounts for the physics. The goal is to find a better solution by reducing the bias coming from the model-driven approach and the variance coming from the data-driven approach.

 

Read Full Post »

The developer who wrote TensorSwarm walked the group through the paper arXiv:1709.10082, which describes using Proximal Policy Optimization (PPO) in a system of robots that learn collision avoidance from scratch in simulation, scaling up to 100 robots, each armed with a 180-degree 2D LIDAR.

The developer implemented TensorSwarm based on the techniques described in the paper. Below is a video of the result of the implementation. He recommends training with a small number of actors and slowly increasing the number of actors.
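For context, the core of PPO is a clipped surrogate objective. Below is a minimal PyTorch sketch of that loss; the variable names and the clipping epsilon are illustrative and not taken from TensorSwarm or the paper's code.

```python
# Hedged sketch of PPO's clipped surrogate loss (Proximal Policy Optimization).
import torch

def ppo_clip_loss(new_log_probs, old_log_probs, advantages, eps=0.2):
    """All inputs are 1-D tensors gathered from the robots' simulated experience."""
    ratio = torch.exp(new_log_probs - old_log_probs)          # pi_new / pi_old
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - eps, 1.0 + eps) * advantages
    return -torch.min(unclipped, clipped).mean()              # negate to minimize
```

The advice to start with a few actors amounts to a curriculum over how many simulated robots feed experience into this update at once.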

Read Full Post »

I completed Andrew Ng's machine learning class on Coursera. While I've done other machine learning (ML) classes, I found that this class' Octave exercises enabled me to understand what's happening inside the algorithms. Many of my previous ML classes involved proofs of the mathematics, whereas this class focused on applying vector and matrix operations to work with datasets, whether m (the number of data points) is small or large. After all, we need to work with the data to derive any meaningful insights.

Much of the supervised learning was devoted to figuring out the cost function and applying gradient descent to minimize it, whether for linear regression, logistic regression, neural networks, or SVMs. Unsupervised learning included k-means, PCA, and collaborative filtering. Most important were the approaches for developing better algorithms specific to the data: diagnosing bias and variance, adding features to improve underfitting, regularizing to reduce overfitting, and using ceiling analysis to determine where to spend effort for the biggest improvements. These concepts all came to life for me in the programming exercises.
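To illustrate, here is the kind of vectorized gradient-descent update the exercises revolve around, translated from the course's Octave into a NumPy sketch for linear regression; the learning rate and iteration count are arbitrary placeholders.

```python
# Vectorized batch gradient descent for linear regression (illustrative sketch).
import numpy as np

def gradient_descent(X, y, alpha=0.01, iters=1000):
    """X: (m, n) design matrix with a leading column of ones; y: (m,) targets."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iters):
        error = X @ theta - y          # (m,) residuals
        grad = (X.T @ error) / m       # gradient of the mean squared-error cost
        theta -= alpha * grad
    return theta
```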


Read Full Post »


Google announced Project Tango on February 20, 2014. It's a cell phone that captures and reconstructs the environment in 3D wherever the user points the back cameras. There are two cameras, a color imaging camera and a depth camera (or Z-camera), very much like the first-generation Kinect. But Project Tango is much more than the Kinect: it performs all the computation for the 3D reconstruction in real time using co-processors from Movidius.
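As a rough illustration of what the depth camera provides (not Tango's actual pipeline, which runs on the Movidius co-processors), each depth pixel can be back-projected into a 3D point with the standard pinhole camera model; the intrinsics fx, fy, cx, cy below are placeholders, not Tango's calibration.

```python
# Back-project a depth image into a camera-frame point cloud (illustrative sketch).
import numpy as np

def depth_to_point_cloud(depth, fx, fy, cx, cy):
    """depth: (H, W) array of depths in meters -> (H*W, 3) points in the camera frame."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel coordinates
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1).reshape(-1, 3)
```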

This reminds me of what Dr. Illah Nourbakhsh said in 2007 in the inaugural presentation of the IEEE RAS OEB/SCV/SF Joint Chapter: that some day we'd be able to wave a camera around and capture an entire 3D image of our environment. Project Tango is just that simple: aim the cameras at the areas you want reconstructed in 3D. To complete a room, you'd walk around the whole room to capture all the information.

Using a SLAM algorithm, assisted GPS (aGPS), and orientation sensors, Project Tango is also able to localize the 3D reconstruction, both to its location on Earth and relative to the location of the device itself.
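Continuing the sketch, once SLAM (with aGPS and the orientation sensors) provides a device pose, camera-frame points can be placed in a fixed world frame with a rigid transform; the rotation R and translation t here are assumed inputs, not anything Tango exposes directly.

```python
# Map camera-frame points into a world frame given an estimated device pose.
import numpy as np

def camera_to_world(points_cam, R, t):
    """points_cam: (N, 3); R: (3, 3) rotation; t: (3,) camera position in the world."""
    return points_cam @ R.T + t
```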

Project Tango runs a version of Android Jelly Bean rather than the latest KitKat release. What's more, it apparently uses a PrimeSense sensor, which is no longer available after Apple's acquisition of PrimeSense. (Interesting that Google did not push to outbid Apple for PrimeSense; after all, there are plenty of alternative depth-sensor technologies out there.) Furthermore, battery life is very limited. These and other issues will eventually have to be solved for real-world deployment.

Applications for real-time 3D reconstruction and mapping include augmented reality, architectural design, and many others. Most interesting would be its use in mobile robots maneuvering in the real world. Just imagine: indoor drones armed with this capability could move autonomously and safely anywhere in a building, monitoring and transporting items from one location to another. The applications are endless.

By demonstrating real-time 3D reconstruction and mapping in Project Tango, Google has advanced the computing technology that enables real interaction with the physical world.

Read Full Post »

This past weekend at Code Camp, I saw Yosun Chang's presentation on hacking Google Glass. I liked how her slides flowed from a 10,000-foot view of the whole deck and zoomed into each slide. She was using Prezi.

I've seen Prezi presentations before but had not had the opportunity to use it. I had been using PowerPoint and Keynote.

I have to give a speech at Startup Speakers, and I turned it into an opportunity to use Prezi. I like how Prezi enforces structure on the presentation through just a simple graphical layout.

Here's my 6-minute speech on my evenings-and-weekends excursion into 3D printing over the last few years:

[Prezi snapshot of the 3D printing presentation]

Read Full Post »

Tried Glass

I finally got to try Glass. I attended Yosun Chang's Code Camp session on hacking Glass, and she let the audience test some of the hacks she did with Glass. It was rather light and felt comfortable on the head.

I saw up close the structure that holds the OMAP4430 processor, 500+ MB of memory, 16 GB of storage, GPS, camera, accelerometer, speaker, mic, light sensor, and touch sensor. There is a battery designed to hang behind the ear when Glass is worn. The most visible piece is the clear block of plastic with a built-in prism that reflects the light from the LCD screen into the eye's upper-right field of view.

I think it has huge potential for sensor fusion, where the user's intent could be surmised from the information sensed by the accelerometer, mic, light sensor, touch sensor, and other sensors.

But NOT for the use case of using the head as a mouse. One of the hacks had the wearer browse items projected on a virtual cylinder by moving the head. A person could easily get a neck cramp and maybe develop a new form of carpal tunnel. Using eye gaze to track the cursor position could be interesting, although currently there is no camera pointed at the eye. Opportunities for future versions of Glass.

In the meantime, I took a picture of myself wearing Glass.


Read Full Post »

I just received my certificate for the “Introduction to Artificial Intelligence” online course offered by Sebastian Thrun and Peter Norvig. I'm one of 23,000 students who received the certificate for doing well in the homework, midterm, and final exams. The class was one of three open online courses offered as an experiment by Stanford. Enrollment reached 160,000 students from 190 countries. The course started in September and ended in December.


I finally got to understand how probability and statistics can be applied to make sense of data. Dr. Thrun explained the applications of Bayesian statistics, and especially the particle filter, very well. I had had years and years of statistical theory in math, physics, finance, and computer science classes, but this class made it click.
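As a reminder of how simple the core idea is, here is a minimal one-dimensional particle filter step in NumPy; the motion model, measurement model, and noise levels are illustrative assumptions rather than anything from the course materials.

```python
# One predict-update-resample step of a 1-D particle filter (illustrative sketch).
import numpy as np

def particle_filter_step(particles, control, measurement,
                         motion_noise=0.1, meas_noise=0.5):
    """particles: (N,) state samples; returns the resampled particle set."""
    # Predict: apply the control input with some motion noise.
    particles = particles + control + np.random.normal(0, motion_noise, particles.shape)
    # Update: weight each particle by how well it explains the measurement.
    weights = np.exp(-0.5 * ((measurement - particles) / meas_noise) ** 2)
    weights /= weights.sum()
    # Resample: draw particles in proportion to their weights.
    idx = np.random.choice(len(particles), size=len(particles), p=weights)
    return particles[idx]
```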

These three classes launched the Massive Open Online Course (MOOC) revolution in education.

Read Full Post »