Excavator Pose Estimation for Safety Monitoring by Fusing Computer Vision and RTLS Data


Soltani, Mohammad Mostafa (2017) Excavator Pose Estimation for Safety Monitoring by Fusing Computer Vision and RTLS Data. PhD thesis, Concordia University.

Text (application/pdf)
Soltani_PhD_S2018.pdf - Accepted Version
Available under License Spectrum Terms of Access.


The construction industry is considered hazardous because of its high accident and fatality rates. Safety is one of the main requirements on construction sites, since an unsafe site lowers the morale of the workers, which can in turn reduce productivity. To address safety issues, researchers and equipment manufacturers have introduced many proactive methods. A review of these methods shows that most of them use radio-based technologies that depend on the locations of sensors attached to the moving objects, which can be expensive and impractical for the large fleet of available construction equipment. Safety monitoring is a sensitive task, and avoiding collisions requires detailed information about the articulated equipment (e.g., excavators) and the motion of each part of that equipment. Therefore, location sensors must be installed on each moving part of the equipment to estimate its pose, which is a difficult, time-consuming, and expensive task. On the other hand, the application of Computer Vision (CV) techniques is growing and becoming more practical and affordable. However, most of the available CV-based techniques evaluate the proximity of the resources by treating each object as a single point regardless of its shape and pose. Moreover, manually collecting and annotating a large image dataset of different pieces of equipment is one of the most time-consuming tasks. Furthermore, relying on a single source of data may not only decrease the accuracy of the pose estimation system because of missing data or calculation errors, but may also increase the computation time. In addition, when there are multiple objects and pieces of equipment in the field of view of each camera, CV-based algorithms are at a higher risk of falsely recognizing the equipment and their parts.
Therefore, fusing the cameras’ data with data from a Real-Time Location System (RTLS) can help the pose estimation system by limiting the search area for the parts’ detectors, consequently reducing the processing time and improving the accuracy by reducing false detections.
This research aims to estimate the excavator pose by fusing CV and RTLS data for safety monitoring and has the following objectives: (1) improving the CV training by developing a method to automatically generate and annotate around-view synthetic images of equipment and their parts, using the 3D model of the equipment and real images of construction sites as backgrounds; (2) developing a guideline for applying a stereo vision system on construction sites using regular surveillance cameras with a long baseline at a high level; (3) improving the accuracy and speed of CV detection by fusing RTLS data with the cameras’ data; and (4) estimating the 3D pose of the equipment for detecting potential collisions, based on a pair of Two Dimensional (2D) skeletons of the parts from the views of two cameras.
To support these objectives, a comprehensive database of synthetic images of the excavator and its parts is generated, and multiple detectors from multiple views are trained for each part of the excavator using this image database. Moreover, the RTLS data, which provide the location of the equipment, are linked with the corresponding video frames from two cameras to fuse the location data with the video data. Given the overall size of the equipment and its location provided by the RTLS, a virtual cylinder defined around the equipment is projected onto the video frames, and the search scope of the object detection algorithm is limited to the projected cylinder, resulting in faster processing and higher detection accuracy. Additionally, knowing the equipment ID assigned to each RTLS device and the cameras’ locations and heights makes it possible to select the suitable detectors for each piece of equipment. After a part is detected, the background of the detected bounding box is removed to estimate the location and orientation of that part. The final skeleton of the excavator is derived by connecting the start and end points of the parts to their adjacent parts using the kinematic information of the excavator. The skeleton estimated from each camera view, together with the extrinsic and intrinsic parameters of all available cameras on the construction site, is used to estimate the 3D pose by triangulating the estimated skeletons from the cameras. In order to use the available collision avoidance systems, the 3D pose of the excavator is sent to a game environment, where potential collisions are detected and a warning is generated.
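The cylinder-based restriction of the search scope described above can be illustrated with a minimal sketch. It assumes an ideal pinhole camera without lens distortion, a world frame whose z axis points up, and a simple sampling of the cylinder rims; the function names, the `cam` dictionary layout, and the example camera values are hypothetical and are not taken from the thesis.

```python
import math

def project_point(point, R, t, fx, fy, cx, cy):
    """Pinhole projection of a 3D world point; returns (u, v) or None if behind the camera."""
    xc, yc, zc = (sum(R[i][j] * point[j] for j in range(3)) + t[i] for i in range(3))
    if zc <= 0:
        return None
    return (fx * xc / zc + cx, fy * yc / zc + cy)

def cylinder_roi(base, radius, height, cam, image_size, samples=16):
    """Image bounding box (u_min, v_min, u_max, v_max) of a vertical cylinder.

    base: (x, y, z) of the base centre, e.g. an RTLS position (world z is up).
    cam: dict with keys R, t, fx, fy, cx, cy (hypothetical container).
    Returns None when no sampled rim point projects into the image.
    """
    pixels = []
    for k in range(samples):
        theta = 2.0 * math.pi * k / samples
        x = base[0] + radius * math.cos(theta)
        y = base[1] + radius * math.sin(theta)
        # Sample the bottom and top rims of the cylinder.
        for z in (base[2], base[2] + height):
            uv = project_point((x, y, z), cam["R"], cam["t"],
                               cam["fx"], cam["fy"], cam["cx"], cam["cy"])
            if uv is not None:
                pixels.append(uv)
    if not pixels:
        return None
    width, height_px = image_size
    # Clamp the box to the image so it can be used directly as a search window.
    u_min = max(0.0, min(p[0] for p in pixels))
    v_min = max(0.0, min(p[1] for p in pixels))
    u_max = min(float(width), max(p[0] for p in pixels))
    v_max = min(float(height_px), max(p[1] for p in pixels))
    if u_min >= u_max or v_min >= v_max:
        return None
    return (u_min, v_min, u_max, v_max)

# Assumed example: camera 5 m above the origin, looking along world +x.
example_cam = {
    "R": [[0, -1, 0], [0, 0, -1], [1, 0, 0]],  # world-to-camera rotation
    "t": [0, 5, 0],                            # t = -R * camera_position
    "fx": 1000.0, "fy": 1000.0, "cx": 960.0, "cy": 540.0,
}
roi = cylinder_roi((20.0, 0.0, 0.0), 5.0, 4.0, example_cam, (1920, 1080))
```

A part detector would then be run only inside `roi` instead of over the full frame, which is the source of the reduced processing time and fewer false detections mentioned in the abstract.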
The contributions of this research are: (1) developing a method for creating and annotating synthetic images of construction equipment and their parts using the equipment 3D models and real images of construction sites; (2) creating and training HOG-based detectors for the excavator’s parts using the database of synthetic images developed earlier, together with negative samples produced automatically from the other excavator parts and from real images of different construction sites from which the target object is cut out; (3) developing a data fusion framework, after calibrating two regular surveillance cameras with a long baseline, that integrates the RTLS data received from GPS with the video data from the cameras to decrease the processing effort for detecting excavator parts while increasing the detection accuracy by limiting the search scope for the detectors; (4) developing a clustering technique to subtract the parts’ background, extract the 2D skeleton of the excavator in each camera’s view, and estimate the 3D pose of the excavator; and (5) transferring the 3D pose data of the excavator to the game environment over a TCP/IP connection and visualizing the near real-time pose of the excavator in the game engine for detecting potential collisions.
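The step from a pair of 2D skeletons to a 3D pose can be sketched for a single skeleton joint as follows. This is only an illustration under stated assumptions: it uses midpoint triangulation of two back-projected rays with ideal pinhole cameras, which is one common technique and not necessarily the one used in the thesis, and all function names and camera values are hypothetical.

```python
def pixel_ray(u, v, R, t, fx, fy, cx, cy):
    """Back-project a pixel to a world-space ray (origin, direction)."""
    d_cam = ((u - cx) / fx, (v - cy) / fy, 1.0)
    # World-to-camera is Xc = R X + t, so the camera centre is -R^T t
    # and camera-frame directions rotate into the world frame by R^T.
    origin = tuple(-sum(R[i][j] * t[i] for i in range(3)) for j in range(3))
    direction = tuple(sum(R[i][j] * d_cam[i] for i in range(3)) for j in range(3))
    return origin, direction

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def triangulate_midpoint(o1, d1, o2, d2, eps=1e-12):
    """Midpoint of the shortest segment between two rays, or None if they are parallel."""
    w = tuple(p - q for p, q in zip(o1, o2))
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w), dot(d2, w)
    denom = a * c - b * b
    if abs(denom) < eps:
        return None
    s = (b * e - c * d) / denom  # parameter along ray 1
    r = (a * e - b * d) / denom  # parameter along ray 2
    p1 = tuple(o + s * k for o, k in zip(o1, d1))
    p2 = tuple(o + r * k for o, k in zip(o2, d2))
    return tuple((x + y) / 2.0 for x, y in zip(p1, p2))

# Assumed example: two cameras with identity rotation, 5 m apart along x.
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
ray1 = pixel_ray(1060.0, 740.0, identity, [0, 0, 0], 1000.0, 1000.0, 960.0, 540.0)
ray2 = pixel_ray(560.0, 740.0, identity, [-5, 0, 0], 1000.0, 1000.0, 960.0, 540.0)
joint_3d = triangulate_midpoint(ray1[0], ray1[1], ray2[0], ray2[1])
```

Repeating this for each matched joint of the two 2D skeletons (for example, the start and end points of each part) would give a 3D skeleton that could then be sent to the game environment.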

Divisions:Concordia University > Gina Cody School of Engineering and Computer Science > Building, Civil and Environmental Engineering
Item Type:Thesis (PhD)
Authors:Soltani, Mohammad Mostafa
Institution:Concordia University
Degree Name:Ph. D.
Program:Building Engineering
Date:23 October 2017
Thesis Supervisor(s):Hammad, Amin and Zhu, Zhenhua
ID Code:983390
Deposited On:05 Jun 2018 14:52
Last Modified:05 Jun 2018 14:52
All items in Spectrum are protected by copyright, with all rights reserved. The use of items is governed by Spectrum's terms of access.
