Welcome to the Vision repository documentation!

This project’s goal is to provide Roboy with extensive vision capabilities: recognizing, localizing and classifying objects in its environment, and providing localization data for other modules to process. The input is a RealSense camera device; the output should be high-level data about Roboy’s environment, provided through ROS messages and services.

For human interaction, the most important task in Vision is to detect and recognize faces, which is why it was given the highest priority in this project. The current main tasks of this project are:

Identification of Roboy Team Members
Pose estimation of a detected face and Roboy Motor Control
Tracking of detected objects
Person Talking detection
Mood Recognition
Gender Recognition
Remembering faces online
Age classification
Scene and object classification

What Roboy Vision can do:

Face detection.
Speaker detection.
Object detection.
Multitracking.
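
Multitracking means keeping a consistent identity for each detected object across frames. This page does not describe how Roboy Vision implements it, so the following is only a minimal sketch of one common baseline, greedy intersection-over-union (IoU) association. The class `IoUTracker`, its parameters `iou_threshold` and `max_missed`, and the `(x1, y1, x2, y2)` box format are hypothetical and not part of the repository.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical sketch, not the Roboy Vision implementation.
Box = Tuple[float, float, float, float]  # assumed format: (x1, y1, x2, y2) in pixels


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


@dataclass
class Track:
    track_id: int
    box: Box
    missed: int = 0  # consecutive frames without a matching detection


class IoUTracker:
    def __init__(self, iou_threshold: float = 0.3, max_missed: int = 5):
        self.iou_threshold = iou_threshold
        self.max_missed = max_missed
        self.tracks: List[Track] = []
        self._next_id = 0

    def update(self, detections: List[Box]) -> List[Track]:
        # Score every (track, detection) pair and match greedily, highest IoU first.
        pairs = sorted(
            ((iou(t.box, d), ti, di)
             for ti, t in enumerate(self.tracks)
             for di, d in enumerate(detections)),
            reverse=True,
        )
        matched_tracks, matched_dets = set(), set()
        for score, ti, di in pairs:
            if score < self.iou_threshold:
                break  # remaining pairs overlap even less
            if ti in matched_tracks or di in matched_dets:
                continue
            self.tracks[ti].box = detections[di]
            self.tracks[ti].missed = 0
            matched_tracks.add(ti)
            matched_dets.add(di)
        # Age unmatched tracks and drop those that have been missing too long.
        for ti, t in enumerate(self.tracks):
            if ti not in matched_tracks:
                t.missed += 1
        self.tracks = [t for t in self.tracks if t.missed <= self.max_missed]
        # Every detection that matched no track starts a new track with a fresh ID.
        for di, d in enumerate(detections):
            if di not in matched_dets:
                self.tracks.append(Track(self._next_id, d))
                self._next_id += 1
        return self.tracks
```

Calling `update` once per frame with that frame's detections returns the current tracks; a detection that overlaps an existing track strongly enough inherits that track's ID. Greedy matching is simpler than optimal (Hungarian) assignment and can mismatch objects in crowded scenes.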

Relevant Background Information and Prerequisites

Our approach to the tasks in Vision is based on machine learning. A basic understanding of machine learning, in particular deep neural networks and convolutional neural networks, is therefore necessary.

The following links are suggestions for getting started with machine learning:

Crash course on deep learning in the form of YouTube tutorials: DeepLearning.tv
A closer look at the implementation of neural networks: The Foundations of deep learning
An introduction to convolutional neural networks (CNNs): Deep learning in Computer vision
The machine learning framework used for the implementation: TensorFlow

Furthermore, a basic understanding of simpler machine learning approaches such as regression, tree learning, k-nearest neighbours (KNN), support vector machines (SVMs), Gaussian models and Eigenfaces will be helpful.
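
As an illustration of how a simple method like KNN can complement a deep model: FaceNet maps each face image to an embedding in which Euclidean distance corresponds to face similarity, so identifying team members can be reduced to a nearest-neighbour lookup in a gallery of labelled embeddings. The sketch below assumes such embeddings already exist. The function `knn_identify`, the toy 3-dimensional vectors and the `max_distance` threshold for rejecting unknown faces are hypothetical choices that would need tuning on real data; this is not the project's actual code.

```python
import math
from collections import Counter
from typing import List, Optional, Sequence, Tuple

Embedding = Sequence[float]


def euclidean(a: Embedding, b: Embedding) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_identify(
    query: Embedding,
    gallery: List[Tuple[str, Embedding]],
    k: int = 3,
    max_distance: float = 1.0,  # hypothetical threshold, must be tuned on real embeddings
) -> Optional[str]:
    """Return the majority name among the k nearest gallery embeddings that lie
    within max_distance, or None if no neighbour is close enough (unknown face)."""
    ranked = sorted((euclidean(query, emb), name) for name, emb in gallery)
    close = [name for dist, name in ranked[:k] if dist <= max_distance]
    if not close:
        return None
    # Counter breaks count ties by first occurrence, i.e. in favour of the nearer name.
    return Counter(close).most_common(1)[0][0]
```

Returning `None` instead of the nearest name lets faces that are far from every known team member be treated as strangers rather than forced onto the closest match.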

The papers on which the current implementation is based should be understood:

Joint Face Detection and Alignment using Multi-task Cascaded Convolutional Networks
FaceNet: A Unified Embedding for Face Recognition and Clustering
dlib: Facial landmarks and face recognition
You Only Look Once: Unified, Real-Time Object Detection (https://pjreddie.com/media/files/papers/yolo.pdf)
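
Both the MTCNN and YOLO papers post-process their candidate boxes with non-maximum suppression (NMS), which discards lower-scoring boxes that overlap a higher-scoring one too strongly. The following is a minimal, generic greedy NMS sketch, not the code used in this repository; the `(x1, y1, x2, y2)` box format and the default `iou_threshold` of 0.5 are assumptions.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # assumed format: (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(
    boxes: List[Box], scores: List[float], iou_threshold: float = 0.5
) -> List[int]:
    """Greedy NMS: return the indices of the kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap too much with any already-kept box.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep
```

For example, two nearly identical face candidates scored 0.9 and 0.8 collapse to the 0.9 box, while a distant third box survives. A lower `iou_threshold` suppresses more aggressively, which helps against duplicates but can merge genuinely adjacent faces.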

Furthermore, there are plans to extend the implementation based on the following paper:

An All-In-One Convolutional Neural Network for Face Analysis

Contents

Usage and Installation

Development

About arc42