Introducing DepthNet

DepthNet results visualization. Top left: input video with added reticle; bottom left: ground truth; bottom right: DepthNet output; top right: error map (green means zero error).

Abstract

We propose a depth map inference system for monocular videos, based on a novel navigation dataset that mimics aerial footage from a gimbal-stabilized monocular camera in rigid scenes. Unlike in most navigation datasets, the lack of rotation implies an easier structure-from-motion problem, which can be leveraged for different kinds of tasks such as depth inference and obstacle avoidance. We also propose an architecture for end-to-end depth inference with a fully convolutional network. Results show that, although tied to the camera's intrinsic parameters, the problem is locally solvable and leads to good-quality depth prediction.

We also propose a multi-range architecture for unconstrained UAV flight, leveraging flight data from sensors to produce accurate depth maps for uncluttered outdoor environments. We test our algorithm on both synthetic scenes and real UAV flight data. Quantitative results are given for synthetic scenes with a slightly noisy orientation, and show that our multi-range architecture improves depth inference.

Papers

Two papers were published for this project:

UAVg

End-to-end depth from motion with stabilized monocular videos

in UAVg 2017 (Oral)

Full Text / Slides / arXiv / HAL / Cite

ECMR

Multi range Real-time depth inference from a monocular stabilized footage using a Fully Convolutional Neural Network

in ECMR 2017

Full Text / Poster / arXiv / HAL / Cite

Additional Results

Code

Training code is available on GitHub.

Still Box Dataset

The Still Box Dataset was used to train our network. It is available to download here. It comes in 4 different image sizes, summarized in the following table.

Image size	Number of scenes	Total size (GB)	Compressed size (GB)
64x64	80K	19	9.8
128x128	16K	12	7.1
256x256	3.2K	8.5	5
512x512	3.2K	33	19
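
To help decide which subset to download, here is a minimal sketch that derives the compression ratio and the average uncompressed size per scene from the figures in the table. It is not part of the DepthNet or Still Box code: the names `STILL_BOX_SUBSETS`, `summarize` and `total_download_gb` are made up for illustration, "80K" is assumed to mean 80,000 scenes, and 1 GB is assumed to be 1000 MB.

```python
# Figures copied from the table: (image size, scenes, total GB, compressed GB).
# Assumption: "80K" means 80,000 scenes, "3.2K" means 3,200, and so on.
STILL_BOX_SUBSETS = [
    ("64x64", 80_000, 19.0, 9.8),
    ("128x128", 16_000, 12.0, 7.1),
    ("256x256", 3_200, 8.5, 5.0),
    ("512x512", 3_200, 33.0, 19.0),
]


def summarize(subsets):
    """Return, for each subset, its compression ratio and average scene size in MB."""
    rows = []
    for size, scenes, total_gb, compressed_gb in subsets:
        rows.append(
            {
                "size": size,
                "compression_ratio": total_gb / compressed_gb,
                # Assumption: 1 GB = 1000 MB (decimal units).
                "mb_per_scene": total_gb * 1000 / scenes,
            }
        )
    return rows


def total_download_gb(subsets):
    """Sum of the compressed sizes, i.e. the download volume for all subsets."""
    return sum(compressed_gb for _, _, _, compressed_gb in subsets)


if __name__ == "__main__":
    for row in summarize(STILL_BOX_SUBSETS):
        print(
            f"{row['size']:>8}: ratio {row['compression_ratio']:.2f}, "
            f"{row['mb_per_scene']:.2f} MB per scene"
        )
    print(f"All subsets, compressed: {total_download_gb(STILL_BOX_SUBSETS):.1f} GB")
```

Under these assumptions, the per-scene size grows with resolution while the scene count shrinks, which is why the 64x64 subset has the most scenes without being the largest download.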

More information is available on the official website: https://stillbox.ensta.fr/

Citation

If you use DepthNet in your research, please cite the following papers.

@Article{depthnet_uavg,
    AUTHOR = {Pinard, Cl{\'e}ment and Chevalley, Laure and
              Manzanera, Antoine and Filliat, David},
    TITLE = {End-to-end depth from motion with
             stabilized monocular videos},
    JOURNAL = {ISPRS Annals of Photogrammetry,
               Remote Sensing and Spatial Information Sciences},
    VOLUME = {IV-2/W3},
    YEAR = {2017},
    PAGES = {67--74},
    DOI = {10.5194/isprs-annals-IV-2-W3-67-2017}
  }

@inproceedings{depthnet_ecmr,
    TITLE = {{Multi range Real-time depth inference from a monocular stabilized
              footage using a Fully Convolutional Neural Network}},
    AUTHOR = {Pinard, Cl{\'e}ment and Chevalley, Laure
              and Manzanera, Antoine and Filliat, David},
    URL = {https://hal.archives-ouvertes.fr/hal-01587658},
    BOOKTITLE = {{European Conference on Mobile Robotics}},
    ADDRESS = {Paris, France},
    ORGANIZATION = {{ENSTA ParisTech}},
    YEAR = {2017},
    MONTH = Sep,
    KEYWORDS = {Deep CNN ;  HDR ;  Drone ;  Depth},
    PDF = {https://hal.archives-ouvertes.fr/hal-01587658/file/Article%20ECMR.pdf},
    HAL_ID = {hal-01587658},
    HAL_VERSION = {v1}
  }

Published Sep 10, 2017