İSTANBUL TECHNICAL UNIVERSITY
INSTITUTE OF SCIENCE AND TECHNOLOGY

CAMERA BASED VEHICLE DETECTION AND TRACKING

M.Sc. Thesis by
Burcu AYTEKİN (518051006)

Department : Mechatronics Engineering
Programme : Mechatronics Engineering

Date of submission : 25 December 2008
Date of defence examination : 20 January 2009

Supervisor (Chairman) : Assist. Prof. Dr. Erdinç ALTUĞ (ITU)
Members of the Examining Committee : Prof. Dr. Levent GÜVENÇ (ITU)
Assist. Prof. Dr. Tankut ACARMAN (GSU)

JANUARY 2009

FOREWORD

I would like to thank my advisor, Assist. Prof. Dr. Erdinç ALTUĞ, for his guidance and support during my M.Sc. studies. This work has been supported by ITU Mekar Mechatronics Research Labs and the Automotive Control and Mechatronics Research Center directed by Prof. Dr. Levent GÜVENÇ. I would also like to thank Prof. Dr. Levent GÜVENÇ for giving me the opportunity to work with him and to benefit from his broad vision.

This work marks the end of a period for me. Every end inevitably brings the excitement of a new beginning, along with a little fear (or perhaps a lot) of the unknown future. However, there is one thing I know quite well. I have a family that stands right behind me wherever I step, and the essence they instilled in me will always help me be a good person. I would like to thank my mother and father, Asiye and Mustafa AYTEKİN, who gave me the gift of life; my elder brother, Dr.
Murat AYTEKİN; and my one and only sister, Burçak AYTEKİN. They are the other side of my soul.

January 2009
Burcu AYTEKİN
Mechanical Engineer

TABLE OF CONTENTS

ABBREVIATIONS
LIST OF FIGURES
LIST OF SYMBOLS
SUMMARY
ÖZET
1. INTRODUCTION
   1.1 Purpose of the Thesis
   1.2 Background of Vision-Based Intelligent Vehicle Research
   1.3 Thesis Structure
2. VEHICLE DETECTION
   2.1 Approaches Proposed in Literature
      2.1.1 Knowledge-based methods
         2.1.1.1 Symmetry
         2.1.1.2 Color
         2.1.1.3 Vertical/horizontal edges
         2.1.1.4 Texture
         2.1.1.5 Vehicle lights
      2.1.2 Stereo-based methods
      2.1.3 Motion-based methods
   2.2 Critique of Vehicle Detection Approaches
      2.2.1 The first step: hypothesis generation
      2.2.2 The second step: hypothesis verification
   2.3 Objective
   2.4 The Implemented Methods for Vehicle Detection within the Thesis
      2.4.1 Road area finding
         2.4.1.1 Hough transform
         2.4.1.2 Lane detection
      2.4.2 Vehicle detection
         2.4.2.1 Hypothesis generation: shadow detection
         2.4.2.2 Hypothesis verification: vertical edges detection
3. VEHICLE TRACKING
   3.1 Literature Overview of Object Tracking
   3.2 Problem Conditions
   3.3 Objective
   3.4 The Theory of the Kalman Filter
      3.4.1 The process to be estimated
      3.4.2 The computational origins of the filter
      3.4.3 The probabilistic origins of the filter
      3.4.4 The summary of the discrete Kalman filter algorithm
   3.5 Dynamical System Formulation of the Implemented Vehicle Tracking
      3.5.1 The initialization of the Kalman filter
   3.6 The Implemented Algorithm
      3.6.1 To update the filter: horizontal and vertical edges detection
4. CONCLUSION AND RECOMMENDATIONS
REFERENCES
APPENDICES
CURRICULUM VITAE

ABBREVIATIONS

ACC : Adaptive Cruise Control
DAS : Driver Assistance Systems
ITS : Intelligent Transportation Systems
ROI : Region-of-interest
fps : Frames per second

LIST OF FIGURES

Figure 1.1 : Schematic overview of the objective of the thesis.
Figure 2.1 : Basler A601FC color camera.
Figure 2.2 : The theory of the Hough transform.
Figure 2.3 : (a) Detected lines in the left half (320 x 240) of the image; (b) Detected lines in the right half of the image.
Figure 2.4 : Two longitudinal edges, described as the transition from darker to brighter gray values or from brighter to darker gray values.
Figure 2.5 : (a) The original half image; (b) The half image filtered by the mask [-1 0 1].
Figure 2.6 : Detected lines on the same lane line.
Figure 2.7 : The output of the algorithm for the left half of the image: Left-most line and Left line.
Figure 2.8 : (a) Road area identification; (b) Besides scanning each lane independently, it is also possible to group the lane lines that can be detected in the current frame.
Figure 2.9 : Detected shadows.
Figure 2.10 : Successive shadow edges relating to the same vehicle.
Figure 2.11 : (a) The edges that could not be eliminated in the combining process; (b) An example of false hypotheses can also be seen at close range.
Figure 2.12 : Defining the region-of-interest (ROI).
Figure 3.1 : (a) Tracking the object without position prediction might be successful; (b) Tracking without position prediction will fail.
Figure 3.2 : Signal flow representation of a linear, discrete-time dynamical system.
Figure 3.3 : A complete description of the operation of the Kalman filter.
Figure 3.4 : The description of the bounding box and the control points.
Figure 3.5 : Assumed probability distribution of the acceleration u.
Figure 4.1 : The summary of the detection algorithm.
Figure 4.2 : The flow chart of the implemented algorithms.
Figure 4.3 : Low sun from the side causes vehicles to cast long shadows.
Figure A.1 : Detection and tracking of mid-range and distant vehicles.
Figure A.2 : Detection and tracking of the vehicles at close range.
Figure A.3 : Detection and tracking of the vehicle in a situation where an overpass casts shadows on the road.

LIST OF SYMBOLS

x_k : The state vector.
R_k : The measurement noise covariance matrix.
Q_{k-1} : The process noise covariance matrix.
P_k : The error covariance matrix.
K_k : The Kalman filter gain.
H_k : The measurement matrix.
Φ_{k-1} : The transition matrix.
w_{k-1} : The uncertainty in the process.
µ_k : The uncertainty in the measurement.

CAMERA-BASED VEHICLE DETECTION AND TRACKING

SUMMARY

In recent years, the development of on-board driver assistance systems (DAS) has become an active research area among automotive manufacturers, suppliers and universities. These systems alert drivers to the driving environment and to possible collisions with other vehicles. Robust and reliable vehicle detection and tracking are their basic steps.
These basic steps can be accomplished with one or more sensors, such as optical and radar sensors. Vision-based vehicle detection and tracking for intelligent driver assistance has received considerable attention over the last 15 years. There are at least three reasons for this attention:
1. The startling losses, both in human lives and in finance, caused by traffic accidents;
2. The technological growth during the last 30 years of computer vision research;
3. The exponential growth in processor speeds, which makes it possible to run computation-intensive video-processing algorithms.

Various projects have been launched worldwide with the ultimate goal of building autonomous vehicles that reduce accidents caused mainly by driver inattention. Monocular vision-based vehicle detection and tracking systems are particularly interesting because of their low cost and the high-fidelity information they provide about the driving environment.

The work presented in this master's thesis studies computer vision algorithms for automatic vehicle detection and tracking in monochrome images captured by a single camera. It focuses mainly on detecting and tracking vehicles viewed from behind in daylight conditions. The method first finds the road area with a lane detection algorithm, so that background objects do not cause false vehicle detections. Assuming that the lanes are detected successfully, vehicle presence inside the road area is hypothesized using "shadow" as a cue. The hypothesized vehicle locations are then verified using "vertical edges", and the shadow cue is also used again for verification. After the vehicles are extracted, a Kalman filter based tracking algorithm tracks them across successive frames of long image sequences.
In future work, the 2D image velocity of vehicles provided by the algorithms implemented in this thesis will be used for forward collision warning. It will serve to estimate the parameters of the real-world (3D) motion of vehicles relative to the host vehicle.

COMPUTER VISION BASED VEHICLE DETECTION AND TRACKING

ÖZET

The development of in-vehicle driver assistance systems is an increasingly widespread field of application among the automotive industry, its suppliers and universities. These systems warn the driver about driving conditions and the likelihood of a collision. Their foundation is vehicle detection and tracking, which is intended to be performed robustly and reliably. Vehicle detection and tracking are carried out with systems based on one or more sensors, such as optical or radar sensors. In the development of driver assistance systems, there has been a strong interest in vision-based vehicle detection and tracking over the last 15 years. The three main reasons for this interest are:
1. The loss of life caused by ever-increasing traffic accidents, and the damage they do to the national economy, have reached alarming levels;
2. The growth in technology during the last 30 years of computer vision research;
3. The steady increase in processor speed, which has made it possible to run video-processing algorithms in which processing speed is a priority.

Many projects have been carried out around the world to reduce accidents caused by driver-related factors such as inattention and fatigue. Their ultimate goal is to build driver-independent (autonomous) vehicles. Vision-based vehicle detection and tracking with a single camera, referred to as monocular imaging, attracts particular interest because of its low cost and the high-quality data it provides.
The work presented in this master's thesis aims at detecting and tracking vehicles in gray-level images acquired with a single camera. It mainly attempts to detect the rear views of vehicles and then to track them. The processed images were captured during daytime hours, and the developed algorithms are not designed for night images.

The application first determines the road surface observed directly in front of the camera with a lane detection algorithm. This prevents non-vehicle objects in the image background from causing errors in the detection process. Assuming that the lanes are detected reliably, the locations of possible vehicles on the road surface are estimated. The shadows underneath vehicles serve as the distinguishing feature for this step. The estimated vehicle locations are then verified using vertical edges and, again, the shadow underneath the vehicle. After detection is completed, a Kalman filter based algorithm tracks the detected vehicles across successive images, that is, it determines the changes in their positions.

The algorithms implemented in the thesis determine vehicle velocity in the two-dimensional image plane. The ultimate goal is to determine the three-dimensional distance and velocity of other vehicles on the road relative to the vehicle carrying the camera. This relative distance and velocity describe the actual motion of the vehicles in the ground coordinate system. Consequently, it will become possible to develop systems that warn drivers about vehicles that may pose a threat.

1. INTRODUCTION

The first vehicle that moved under its own power was built in Paris in the 18th century. Since then, technological and social developments have led to the dominant place that cars, trucks and buses hold in modern society.
Ever since, we have been confronted with the negative consequences of vehicles. Attempts were made to control these consequences by means of rules, infrastructure, and road and car design. To reduce the number of vehicles on the road, vehicle-related taxes were introduced and increased, and alternative means of transportation were promoted. Nowadays, on average, at least one person dies in a vehicle accident every minute. At least 10 million people are injured each year, two or three million of them seriously. The financial losses caused by vehicle accidents are also severe. This situation calls for new solutions.

Intelligent Transportation Systems (ITS) offer a modern, more drastic approach to the vehicle-related problems we face today. ITS works by (partially) automating driver tasks and by communication, both vehicle-to-vehicle and roadside-to-vehicle. It aims to:
1. Increase the capacity of highways: higher speed, closer spacing, fewer human errors;
2. Improve safety: warning systems, intelligent speed adaptation, fewer human errors;
3. Reduce fuel consumption: optimal speed, optimal acceleration, reduced drag force (platooning), cost reduction;
4. Reduce pollution: as a direct consequence of the first and third items.

Research within ITS can be classified as "road-side intelligence" and "in-car intelligence". Road-side intelligence systems provide more global information about the driving environment or the destination. Examples are systems that report on traffic flow, accidents and highway maintenance, dynamic navigation systems, and systems that provide parking space information. In-car intelligence systems consider the environment immediately around the vehicle. These systems can be ordered according to the level of autonomy of the vehicle. First, "advisory" and "warning" systems can be identified within this class of intelligence systems.
Examples are systems for blind-spot monitoring, collision warning, pedestrian warning, lane-departure warning, traffic sign recognition and driver monitoring. Next, "driver-assistance systems" can be identified within this class; a typical example is adaptive cruise control. Today's implementations mainly concern precrash sensing. Several national and international projects have been carried out over the past several years to investigate new technologies for improving safety. On-board driver assistance systems alert drivers to the driving environment and to possible collisions. Their development has attracted a lot of attention and is becoming an active research area among automotive manufacturers, suppliers and universities. Vehicle detection and tracking is the first step of these systems, so this thesis addresses a fundamental aspect of in-car intelligence systems.

1.1 Purpose of the Thesis

Determining the position of other vehicles on the road and their motion relative to one's own vehicle is essential for driver assistance systems such as adaptive cruise control (ACC) and platooning. The most important vehicle for a driver to watch is the preceding one, from which a safe distance should be kept. For this reason, an autonomous system that can determine the position of the preceding vehicle would be very useful for increasing driver safety.

As many researchers have done, the problem can be addressed with "direct range" sensors. These include millimeter-wave radars, laser radars (lidar) and stereo imaging. Radar and laser sensors measure the distance to obstacles with a high degree of accuracy. With them, however, it is difficult to obtain the lateral positions of obstacles, which are required for estimating the possibility of a collision. Vision is the most important sense humans use for driving, and optical sensors are passive and cheaper. Another option, therefore, is to apply computer vision techniques.
Moreover, optical sensors such as ordinary cameras are expected to estimate both the lateral positions of obstacles and their shape. A stereo imaging design includes the cost of an additional camera and extra processing power. A monocular visual processing system is easier to mass-produce and costs less as an end product. No 3D information about the position of other vehicles is directly available from a monocular camera. Studies have nevertheless investigated whether distance control can be performed with sufficient accuracy using a single camera. These studies rely on the laws of perspective and on constraints such as the assumption of a flat road. Estimating the real-world (3D) motion of other vehicles relative to one's own vehicle with vision requires the 2D image velocity. That is, the displacement of each vehicle in the image plane between successive frames must be computed. In the literature, this problem is generally addressed in two steps: vehicle detection and vehicle tracking. These steps are the basis for estimating the positions of vehicles in the scene and their relative motion.

This thesis focuses on vision-based on-road vehicle detection and tracking in monochrome (i.e., grayscale) images. The images come from a single camera mounted on the rear-view mirror of the vehicle. All algorithms are implemented in MATLAB. They are tested on data supplied by the experimental vehicle used for multi-modal data collection and processing within the Drive Safe Project. İstanbul Technical University Automotive Control and Mechatronics Research Center is a participant in this project.

1.2 Background of Vision-Based Intelligent Vehicle Research

A large number of government institutions, automotive manufacturers and suppliers, and R&D companies have launched various projects worldwide. These efforts have produced several prototypes and solutions based on rather different approaches [1-4].
Looking at research on intelligent vehicles worldwide, Europe pioneered the field, followed by Japan and the United States. In Europe, the PROMETHEUS project (Program for European Traffic with Highest Efficiency and Unprecedented Safety) started this exploration in 1986. A large number of vehicle manufacturers and research institutes from 19 European countries were involved. Several prototype vehicles and systems were designed as a result of the project.

In 1987, the UBM (Universitaet der Bundeswehr Munich) experimental vehicle VaMoRs demonstrated fully autonomous longitudinal and lateral guidance by computer vision. The demonstration took place on a 20 km free section of highway at speeds of up to 96 km/h. Vision provided the input for both lateral and longitudinal control, and this was the first milestone. Within the PROMETHEUS project, the Institute of Measurement Science developed real-time vision technology that may be used for a driver support system [5]. Freeways were chosen as the principal domain for testing and demonstrating the visual recognition of objects relevant to understanding traffic situations. The reason is that the complexity of traffic situations and the variety of objects are much lower on freeways than on other roads. In 1995, UBM's VaMP achieved long-range autonomous driving on a trip of more than 1,600 km [6].

Another experimental vehicle, the mobile laboratory (MOB-LAB), was also part of the PROMETHEUS project [7]. It was equipped with four cameras, several computers, monitors and a control panel to give visual feedback and warnings to the driver. One of the most important subsystems in the MOB-LAB was the Generic Obstacle and Lane Detection (GOLD) system. GOLD addressed both lane and obstacle detection using a stereo rig. The GOLD system was later ported to ARGO, a Lancia Thema passenger car with automatic steering capabilities [8].
In Japan, MITI, Nissan and Fujitsu pioneered the research with the "Personal Vehicle System" project [9]. In 1996, the Advanced Cruise-Assist Highway System Research Association (AHSRA) was established by automobile manufacturers and many research centers [1]. The Japanese Smartway concept car will implement some driver assistance features, such as lane keeping, intersection collision avoidance and pedestrian detection. A model deployment project was planned to be operational by 2003, with national deployment in 2015 [2].

In the United States, many initiatives have been launched to address this problem. The US government established the National Automated Highway System Consortium (NAHSC) in 1995. Several promising prototype vehicles and systems have been demonstrated within the last 15 years [10]. The Navlab group at Carnegie Mellon University has a long history of research on automated vehicles and intelligent driver assistance systems. Its work spans a series of 11 vehicles, Navlab 1 through Navlab 11. The latest model in the Navlab family is the Navlab 11, a robotic Jeep Wrangler equipped with a wide variety of sensors for short-range and mid-range obstacle detection [10-12]. Major motor companies, such as Ford and GM, have already demonstrated several promising vehicles. Recently, the US Department of Transportation (USDOT) launched a five-year, 35 million dollar project with GM to develop a rear-end collision avoidance system [2]. In March 2004 and November 2007, the US Defense Advanced Research Projects Agency (DARPA) organized the "Grand Challenge" and "Urban Challenge" competitions, which attracted worldwide attention. In these competitions, fully autonomous vehicles attempted to navigate within a fixed time period with no human intervention whatsoever. There was no driver and no remote control, only on-board computer processing and navigation.

Figure 1.1 : Schematic overview of the objective of the thesis.
1.3 Thesis Structure

This thesis is organized as follows. Chapter 2 explains the approaches to vehicle detection proposed in the literature. It then presents the vehicle detection algorithms developed in this thesis, which include road area finding. Chapter 3 gives a literature overview of object tracking and introduces the theory of the Kalman filter. It then explains in detail the implemented vehicle tracking algorithm based on the Kalman filter. Finally, Chapter 4 sums up the conclusions and presents the results of the evaluation of the developed algorithms.

2. VEHICLE DETECTION

From a general viewpoint, vehicle detection is a problem of object detection, which remains an open issue in computer vision. A vision-based vehicle detection system must be able to separate image data belonging to the background from data belonging to vehicles. Detection precedes vehicle tracking.

2.1 Approaches Proposed in Literature

The approaches proposed in the literature can be classified into three categories: 1) knowledge-based, 2) stereo-based and 3) motion-based.

2.1.1 Knowledge-based methods

Knowledge-based methods employ a priori information to extract vehicles. Different cues have been proposed in the literature, and systems often combine two or more of them to make detection more reliable.♣

2.1.1.1 Symmetry

Images of vehicles observed from the rear or front are, in general, symmetrical in the horizontal and vertical directions. This observation has been used as a cue in several studies [13, 14]. When symmetry is computed from intensity, uniform areas degrade the performance of the algorithm, because symmetry estimates in such areas are sensitive to noise. To avoid this problem, edge information was included in the symmetry estimation [15].
Edges might not always be visible, depending on the relation between object and background. Even when they are, this approach is still easily distracted by symmetrical background objects, such as houses.

♣ "Shadow" is also a cue within the knowledge-based methods used for vehicle detection. Using shadow as a cue for vehicle detection will be discussed in the following sections.

2.1.1.2 Color

Although color is rarely used in the literature, it is a very useful cue for obstacle detection and lane/road following [16-18]. Color is prone to false detections and is weak for non-colored vehicles. Nevertheless, it can help in some situations.

2.1.1.3 Vertical/horizontal edges

Constellations of vertical and horizontal line structures are one of the strongest cues for vehicle detection in the literature. The reason is that different views of a vehicle contain many horizontal and vertical line structures, such as the rear window and the bumper. In [19], the generalized Hough transform was used to identify rows and columns that might contain edges of the outer contour of a car. In [20], distant cars were identified using projected edge information. This projection extracted pronounced horizontal and vertical edges that might be part of a rectangular structure. The disadvantage of these line structures is that they depend on the relation between object and background intensity. The performance of the algorithm therefore decreases when, e.g., a dark vehicle is observed against a dark background.

2.1.1.4 Texture

The presence of a vehicle in an image causes local intensity fluctuations. Because vehicles are generally similar, these intensity changes create a characteristic texture pattern [21]. Two approaches have been suggested in the literature: 1) using entropy and 2) using co-occurrence matrices [22]. The major difficulty of using texture as a cue for vehicle detection is that the background is also very likely to be textured.
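The spatial cues above can be made concrete as simple scores computed over a candidate image window. The Python/NumPy sketch below is illustrative only; it is not the formulation used in [13-15] or [20-22], and it is not the method of this thesis. All function names, the histogram bin count and the synthetic patch are assumptions made for the example:

- `symmetry_score` is a hypothetical left-right mirror correlation.
- `edge_projections` sums absolute intensity differences along columns and rows, in the spirit of projected edge information.
- `window_entropy` computes the gray-level histogram entropy, one of the two texture measures mentioned above.

```python
import numpy as np


def symmetry_score(patch):
    """Left-right mirror correlation of a grayscale patch, in [-1, 1].

    Hypothetical helper: 1 means the patch equals its mirror image about the
    vertical center line. The mean is removed first; a perfectly uniform
    patch returns 0 because symmetry carries no information there.
    """
    p = np.asarray(patch, dtype=float)
    p = p - p.mean()
    energy = (p ** 2).sum()
    if energy == 0.0:
        return 0.0
    # p and its mirror image have equal energy, so this is a normalized correlation
    return float((p * p[:, ::-1]).sum() / energy)


def edge_projections(patch):
    """Projected edge information as (column profile, row profile).

    The column profile sums |horizontal intensity differences| down each
    column and peaks at vertical edges (e.g. vehicle sides). The row profile
    sums |vertical differences| along each row and peaks at horizontal edges
    (e.g. bumper, rear window).
    """
    p = np.asarray(patch, dtype=float)
    vertical_edges = np.abs(np.diff(p, axis=1))
    horizontal_edges = np.abs(np.diff(p, axis=0))
    return vertical_edges.sum(axis=0), horizontal_edges.sum(axis=1)


def window_entropy(patch, bins=16):
    """Shannon entropy (bits) of the 8-bit gray-level histogram of a patch."""
    hist, _ = np.histogram(patch, bins=bins, range=(0, 256))
    prob = hist[hist > 0] / hist.sum()
    return float(-(prob * np.log2(prob)).sum())


if __name__ == "__main__":
    # Synthetic 6 x 8 "vehicle": a dark band between two bright margins
    patch = np.full((6, 8), 200.0)
    patch[:, 2:6] = 50.0
    cols, rows = edge_projections(patch)
    print("symmetry:", symmetry_score(patch))
    print("strongest vertical-edge columns:", sorted(np.argsort(cols)[-2:].tolist()))
    print("entropy (bits):", window_entropy(patch))
```

In a multi-cue system, such scores would be computed for each candidate window and combined. The thresholds and weights for that combination are application-specific and are not shown here.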
2.1.1.5 Vehicle lights

Vehicle lights can be used as a salient visual feature for nighttime vehicle detection [23]. However, vehicle light detection should only be seen as a complement to other approaches. In many countries, using vehicle lights during the day is not compulsory, and daytime illumination is brighter. Together, these make the approach unsuitable for robust vehicle detection during daytime.

2.1.2 Stereo-based methods

Vehicle detection based on stereo vision uses two types of methods: the disparity map and Inverse Perspective Mapping. Disparity is the difference in position between corresponding pixels in the left and right images. The disparities of all image points form the disparity map, from which a disparity histogram can be calculated. The rear view of a vehicle is a vertical surface, so its points are at approximately the same distance from the camera. As a result, a peak should occur in the histogram [24]. Inverse Perspective Mapping transforms an image point onto a horizontal plane in 3D space. In [25], Inverse Perspective Mapping was used to predict the image seen by the right camera from the left image. Stereo vision has two drawbacks. Traditional implementations are time consuming. In addition, robust solutions to the vehicle detection problem can only be obtained if the camera parameters have been estimated accurately.

2.1.3 Motion-based methods

The cues discussed so far use spatial features to distinguish between vehicles and background. Another important cue for vehicle detection is relative motion. Because the sensor and the scene move relative to each other, pixels in the images appear to move. The vector field of this apparent motion is referred to as optical flow. Examples of approaches based on estimating the optical flow field can be found in [26, 27]. In [26], the possibilities and drawbacks of using optical flow for vehicle detection were discussed.
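One way to estimate a flow vector from two successive frames is the classic Lucas-Kanade method. The sketch below applies its least-squares solution of the brightness-constancy constraint I_x u + I_y v + I_t = 0 to a single image window. This is a swapped-in textbook technique, not the method of [26, 27]. The function name `window_flow` and the synthetic test pattern are assumptions. In a detection setting, such per-window flow would presumably be compared with the flow expected from the host vehicle's own motion; that comparison is not modeled here.

```python
import numpy as np


def window_flow(frame0, frame1):
    """Estimate one translation (u, v), in pixels per frame, for a window.

    Lucas-Kanade: stack the brightness-constancy constraint
    Ix*u + Iy*v + It = 0 for every interior pixel and solve it in the
    least-squares sense. u is motion along columns (x), v along rows (y).
    The linearization is only valid for small displacements.
    """
    f0 = np.asarray(frame0, dtype=float)
    f1 = np.asarray(frame1, dtype=float)
    # Spatial gradients of the mean frame; np.gradient returns (d/drow, d/dcol)
    iy, ix = np.gradient(0.5 * (f0 + f1))
    it = f1 - f0  # temporal derivative between the two frames
    # Drop the one-pixel border, where np.gradient uses one-sided differences
    inner = (slice(1, -1), slice(1, -1))
    a = np.column_stack([ix[inner].ravel(), iy[inner].ravel()])
    b = -it[inner].ravel()
    (u, v), *_ = np.linalg.lstsq(a, b, rcond=None)
    return float(u), float(v)


if __name__ == "__main__":
    # Smooth synthetic pattern moved 0.5 px to the right and 0.25 px down
    y, x = np.mgrid[0:40, 0:40].astype(float)
    frame0 = np.sin(0.3 * x) + np.cos(0.2 * y)
    frame1 = np.sin(0.3 * (x - 0.5)) + np.cos(0.2 * (y - 0.25))
    print("estimated flow (u, v):", window_flow(frame0, frame1))
```

The constraint is linearized, so the estimate degrades for larger displacements. Coarse-to-fine (pyramidal) variants are commonly used in that case.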
Optical flow can provide strong information for vehicle detection, but it is sensitive to even small rotations of the camera and other mechanical disturbances, and computing it is time consuming because of its complexity.

2.2 Critique of Vehicle Detection Approaches

All the cues discussed within “the knowledge-based methods” use spatial features to distinguish between vehicles and background. The major difficulties of these cues are caused by the background, since the background is also likely to have these features.

On the other hand, on-road vehicle detection requires faster processing than many other applications of optical sensors, and robustness to the host vehicle's movements and drifts must also be considered. These two issues are the major difficulties of the cues within “the stereo-based” and “the motion-based” approaches, respectively. Consequently, many different approaches to vehicle detection have been proposed in the literature, as described above.

Creating a robust system for vehicle detection using optical sensors is a very challenging problem. The specific difficulties can be itemized as:
1. Since both the camera and the objects are moving, the perceived size and pose of the objects change;
2. The objects exist in a changing environment; lighting and weather conditions vary substantially;
3. Vehicles might be occluded by other vehicles, buildings, etc.;
4. The appearance of vehicles varies widely;
5. For a precrash system to serve its purpose, it is crucial to achieve real-time performance.

To cope with these difficulties, approaches in the literature are generally based on two-step vehicle detection: Hypothesis Generation and Hypothesis Verification.♣

2.2.1 The first step: hypothesis generation

In the first step of vehicle detection, the probable locations of vehicles are hypothesized.
One or multiple cues are used within this step. Hypothesizing the locations of possible vehicles reduces the search area from the whole image to the image regions where vehicles probably exist. This reduction requires less processing time and therefore speeds up the process.

♣ Most of the information about the vehicle detection approaches in the literature was quoted from [28], where more detailed information can be found.

2.2.2 The second step: hypothesis verification

In the second step of vehicle detection, the existence of the located potential vehicles is verified. The cues discussed within “the knowledge-based methods” can be used for this step; this kind of verification is generally called “knowledge-based vehicle verification” or “template-based vehicle verification”. Another category of verification is called “appearance-based vehicle verification”. Appearance-based methods learn the characteristics of the vehicle class from a set of training images, which should capture the variability in vehicle appearance. Verification using appearance models is treated as a two-class pattern classification problem: vehicle versus non-vehicle. Usually, the variability of the non-vehicle class is also modeled to improve performance. Appearance-based verification methods are more accurate than template-based methods; however, they are more costly due to classifier training. Nevertheless, due to the exponential growth in processor speed, appearance-based methods are becoming popular.

2.3 Objective

Although solutions to the vehicle detection problem are becoming more reliable and robust as existing approaches are improved and new methods are proposed, it is necessary to strictly define and delimit the problem because of the difficult conditions described above.
Detecting all vehicles in every possible situation is not realistic. The work in this thesis focuses largely on detecting passenger vehicles, while also addressing the detection of trucks and buses. Detection under night illumination is not evaluated. Efforts were made to improve the designed algorithms so that they detect vehicles in various weather conditions and at any distance.

2.4 The Implemented Methods for Vehicle Detection within the Thesis

Template-based verification is used within the thesis in spite of the advantages of appearance-based verification mentioned above. The reason is that appearance-based verification requires composing a training dataset and a background in pattern classification, which would have been a tough process. Implementing appearance-based vehicle verification is planned as future work, with the aim of improving the quality of the vehicle detection algorithm.

In practical applications within the literature, although template-based verification can eliminate about two thirds of the image regions in which no vehicle exists, some backgrounds may still cause false detections. To avoid false detections caused by the background, the method implemented within the thesis includes road area finding and searches for possible vehicles only inside this area. The algorithms implemented for vehicle detection within the thesis can be classified as:
1. Road area finding: Lane detection,
2. Vehicle detection:
2.1. Hypothesis generation: Shadow detection,
2.2. Hypothesis verification: Vertical edge detection.

The optical sensor used for image data acquisition is a Basler A601FC color camera, shown in Figure 2.1. The resolution of the camera is 640 x 480 pixels and the frame rate is 30 frames per second (fps). The interface is the IEEE 1394 high-performance serial bus, also called FireWire.

Figure 2.1 : Basler A601FC color camera.
All algorithms are implemented in MATLAB, and monochrome images acquired from a single camera are processed within the thesis. The vision data is supplied by the experimental vehicle used for multi-modal data collection and processing within the Drive Safe Project, in which the İstanbul Technical University Automotive Control and Mechatronics Research Center is a participant. More detailed information on the Drive Safe Project can be found in [29, 30].

2.4.1 Road area finding

The road area is found by means of a simple algorithm that detects the free-driving-space of our vehicle, the host vehicle. The free-driving-space is defined as the road observed directly in front of the camera. Its estimation is based on a lane detection algorithm implemented with the Hough transform.

2.4.1.1 Hough transform

Edge detection methods yield pixels lying only on edges. In practice, the resulting pixels seldom characterize an edge completely because of noise, breaks in the edge caused by nonuniform illumination, and other effects that introduce spurious intensity discontinuities. Thus, edge detection algorithms are typically followed by linking procedures that assemble edge pixels into meaningful edges. One approach that can be used to find and link segments in an image is the Hough transform; in particular, it is used to extract lines, circles and ellipses. The Hough transform, illustrated in Figure 2.2, maps every point (x, y) in the image plane to a sinusoidal curve in the Hough space (ρθ-space) according to:

$$\rho = x\cos\theta + y\sin\theta \qquad (2.1)$$

where ρ can be interpreted as the perpendicular distance between the origin and a line passing through the point (x, y), and θ as the angle between the x-axis and the normal of the same line.

Figure 2.2 : The Hough transform transforms a point in the image plane to a sinusoidal curve in the Hough space. All image points on the same line will intersect in a common point in the Hough space [31].
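Equation (2.1) together with the accumulator voting described in this section can be sketched in a few lines. This is a minimal pure-Python illustration, not the MATLAB implementation used in the thesis; the function name `hough_lines`, the 1° θ step (`n_theta=181`) and the `min_votes` threshold are hypothetical choices.

```python
import math


def hough_lines(points, width, height, n_theta=181, min_votes=2):
    """Accumulator voting for Eq. (2.1): rho = x cos(theta) + y sin(theta).

    points: (x, y) edge pixels of a width x height image.
    Returns (votes, rho, theta_deg) for every cell with at least min_votes,
    strongest first. theta spans -90..90 degrees and rho spans -D..D.
    """
    d = math.ceil(math.hypot(width, height))             # image diagonal D
    thetas = [-90 + 180 * j / (n_theta - 1) for j in range(n_theta)]
    trig = [(math.cos(math.radians(t)), math.sin(math.radians(t))) for t in thetas]
    acc = [[0] * n_theta for _ in range(2 * d + 1)]      # row index = rho + D
    for x, y in points:
        for j, (c, s) in enumerate(trig):
            rho = round(x * c + y * s)                   # nearest rho cell
            acc[rho + d][j] += 1
    cells = [(votes, i - d, thetas[j])
             for i, row in enumerate(acc)
             for j, votes in enumerate(row) if votes >= min_votes]
    return sorted(cells, reverse=True)


vertical = [(5, y) for y in range(10)]       # ten edge pixels on the line x = 5
peaks = hough_lines(vertical, 20, 20, min_votes=5)
print(peaks[0][:2])                          # (10, 5): ten votes at rho = 5
```

Because of the rounding of ρ, cells with θ a few degrees away from 0° also collect all ten votes; this is why dominant lines are usually taken as local maxima of the accumulator, or nearby detections are merged, as in the lane detection step.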
The sinusoidal curves from different points along the same line in the image plane intersect in the same point in the Hough space, superimposing their values at that point. In the second graphic of Figure 2.2, the intersection point corresponds to the line that passes through both (x, y) and (u, v).

The computational attractiveness of the Hough transform arises from subdividing the ρθ parameter space into so-called accumulator cells. Usually the expected range of the parameters is $-90^\circ \le \theta \le 90^\circ$ and $-D \le \rho \le D$, where D is the length of the image diagonal. Initially the accumulator cells are set to zero. Then, for each feature point $(x_k, y_k)$ detected in the image plane, θ is set to each of the predefined values within the θ range and the corresponding ρ is computed using equation (2.1). The resulting ρ values are rounded off to the nearest value within the predefined ρ range, and the corresponding accumulator cell $A(i, j)$, with parameter-space coordinates $(\rho_i, \theta_j)$, is incremented. At the end of this procedure, a value of Q in $A(i, j)$ means that Q points in the xy-plane lie on the line $x\cos\theta_j + y\sin\theta_j = \rho_i$. By thresholding the accumulator, dominant line segments can be detected.♣

♣ Most of the information about the Hough transform was quoted from [32], pages 393-395, where more detailed information can be found.

2.4.1.2 Lane detection

Processing the whole image for lane detection is unnecessary and thus time consuming. To focus on the lines that mark the lanes, the image is divided into two halves, the left half and the right half, as shown in Figure 2.3. The Hough transform is applied to each half to detect lines. Each lane line has two longitudinal edges, which appear in monochrome images as a transition from darker to brighter gray values and a transition from brighter to darker gray values, as seen in Figure 2.4.
Because one of these edges is enough to define the lane line, both halves of the image are filtered with a simple mask such as [1 0 -1] or [-1 0 1] before the Hough transform is applied (see Figure 2.5). Naturally, many lines are detected on the same lane line, as seen in Figure 2.6, and these must be reduced to a single line per lane line. For each half image, the algorithm outputs two lines with a particular angle difference between them. These lines are called Left-most and Left for the left half of the image, and Right-most and Right for the right half of the image (as described in Figure 2.7).

Figure 2.3 : (a) Detected lines in the left half (320 x 240) of the image. (b) Detected lines in the right half of the image.

Figure 2.4 : Two longitudinal edges, described as the transition from darker to brighter gray values and the transition from brighter to darker gray values.

Figure 2.5 : (a) The original half image; (b) the half image filtered by the mask [-1 0 1].

Figure 2.6 : Many lines are detected on the same lane line.

Figure 2.7 : The output of the algorithm for the left half of the image: Left-most line and Left line.

Figure 2.7 illustrates the output of the algorithm for the left half of the image; the same approach is applied to the right half. The lines within GROUP 1 are reduced to one line, giving the “Left-most” line, and the lines within GROUP 2 are reduced to one line, giving the “Left” line. Lines unrelated to lanes may also be obtained; these are easily eliminated using the angle value that the Hough transform gives for each line.
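The two lane-specific steps above, keeping only one edge of each lane marking and reducing the many Hough lines on a marking to one, can be sketched as follows. The [-1 0 1] mask comes from the text; everything else is an assumption: the mask is applied as a correlation along one row, only positive (dark-to-bright) responses are kept, and lines are grouped by sorting their angles and splitting wherever neighbouring angles differ by more than a hypothetical `max_gap_deg`, with each group replaced by its mean (ρ, θ). The thesis does not state its exact reduction rule.

```python
def filter_row(row, mask=(-1, 0, 1)):
    """Correlate one image row with a 1 x 3 mask (border pixels are set to 0)."""
    out = [0] * len(row)
    for x in range(1, len(row) - 1):
        out[x] = mask[0] * row[x - 1] + mask[1] * row[x] + mask[2] * row[x + 1]
    return out


def one_edge_per_marking(row, mask=(-1, 0, 1)):
    """Keep only positive responses; with [-1 0 1] these are dark-to-bright
    transitions, so each bright lane marking keeps a single edge."""
    return [v if v > 0 else 0 for v in filter_row(row, mask)]


def merge_by_angle(lines, max_gap_deg=5.0):
    """Group Hough lines (rho, theta_deg) whose sorted angles differ by at most
    max_gap_deg from their neighbour, and replace each group by its mean line."""
    groups = []
    for rho, theta in sorted(lines, key=lambda line: line[1]):
        if groups and theta - groups[-1][-1][1] <= max_gap_deg:
            groups[-1].append((rho, theta))
        else:
            groups.append([(rho, theta)])
    return [(sum(r for r, _ in g) / len(g), sum(t for _, t in g) / len(g))
            for g in groups]


row = [10, 10, 200, 200, 200, 10, 10]      # bright marking on dark asphalt
print(one_edge_per_marking(row))           # [0, 190, 190, 0, 0, 0, 0]
print(merge_by_angle([(100, 40), (102, 42), (98, 41), (60, 65), (62, 66)]))
# [(100.0, 41.0), (61.0, 65.5)]
```

Averaging ρ is only meaningful because the lines within a group are nearly parallel; lines whose angles are implausible for lane markings could be discarded before grouping, as the text describes.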
Assuming that the lanes have been successfully detected, vehicle presence is hypothesized by scanning each lane from the bottom up to a certain image position corresponding to a predefined maximum distance in the real world. In practice, it is difficult to acquire lane information in every frame of an image sequence: the lane lines may not be clearly visible or may be interrupted by vehicles. Developing a lane tracking algorithm may solve this problem in some circumstances. Besides scanning each lane independently, it is also possible to group the lane lines that can be detected in the current frame in order to compensate for undetectable lane lines, as seen in Figure 2.8.

Figure 2.8 : (a) Road area identification. (b) Besides scanning each lane independently, it is also possible to group the lane lines that can be detected in the current frame.

2.4.2 Vehicle detection

As mentioned above, the vehicle detection process is realized in two steps: 1) hypothesis generation and 2) hypothesis verification. In the rest of Chapter 2, the feature extraction techniques used as the basis of the vehicle detection process are not explained in detail; detailed information about basic image processing operations and feature extraction techniques can be found in [32, 33].

2.4.2.1 Hypothesis generation – shadow detection

Vehicles may appear in many shapes and colors. Nevertheless, one feature they all have in common is that they cast a shadow on the road, so potential vehicle candidates can be extracted by detecting the shadows underneath vehicles. In the literature, potential shadow areas are defined as regions significantly darker than the road. In [34], a normal distribution is assumed for the intensity of the road surface, and the shadow threshold is defined based on the mean and variance of this distribution. However, the mean and deviation of different regions of a road may differ.
Hence, this approach might not always hold true. Another approach is based on searching the image for vertical transitions from brighter to darker gray values: instead of computing the mean of road pixels, pixels with negative vertical gradient values are considered local darker regions [35]. To detect the shadows underneath vehicles, vertical transitions from brighter to darker gray values are searched for while scanning the image bottom-up. For the problem addressed in this thesis, this approach is realized by applying an edge detection algorithm while scanning the predefined road area bottom-up.

The edges with vertical transitions, i.e. horizontal edges, are obtained by a vertical edge detector. The Sobel edge detector is implemented within the thesis, and negative vertical gradient values smaller than a predefined threshold are considered local darker regions, as seen in Figure 2.9. A systematic way to choose appropriate threshold values was not developed within the thesis. Since the intensity of the shadow depends on the illumination of the image, which in turn depends on weather conditions, this is a weakness of the implemented algorithm. The threshold was chosen as a fixed value that proved appropriate after testing on a series of different training samples.

Figure 2.9 : Detected shadows are plotted as red dots.

Shadow is used as an initial cue for vehicle detection within the thesis. Hence, false detections caused by applying a predefined, fixed threshold can be eliminated in the following steps of hypothesis generation as well as in hypothesis verification. Nevertheless, in weather conditions in which the shadows underneath vehicles are not distinctly visible, the predefined threshold might not be appropriate for detecting them. Therefore, developing a systematic way to choose appropriate threshold values is considered future work within this study.
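The shadow cue can be sketched with a Sobel vertical-gradient response computed as the weighted row above minus the weighted row below (the kernel [1 2 1; 0 0 0; -1 -2 -1] applied as a correlation). With this orientation, a dark shadow lying above bright road gives a negative value, matching the sign convention in the text. The kernel orientation, the function names and the threshold of -200 are assumptions for illustration; the thesis chose its fixed threshold empirically and restricts the search to the road area, which this sketch omits.

```python
# Sobel kernel [1 2 1; 0 0 0; -1 -2 -1] as correlation: row above minus row below.
SOBEL_WEIGHTS = (1, 2, 1)


def vertical_gradient(img):
    """Sobel vertical-gradient response; negative where a darker region lies
    above a brighter one (shadow under a car above the road). Borders stay 0."""
    h, w = len(img), len(img[0])
    g = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            g[y][x] = sum(k * (img[y - 1][x + dx] - img[y + 1][x + dx])
                          for k, dx in zip(SOBEL_WEIGHTS, (-1, 0, 1)))
    return g


def shadow_pixels(img, threshold=-200):
    """(x, y) pixels whose vertical gradient is below a fixed negative threshold."""
    g = vertical_gradient(img)
    return [(x, y) for y, row in enumerate(g) for x, v in enumerate(row)
            if v < threshold]


patch = [[40] * 5] * 3 + [[160] * 5] * 3   # dark shadow rows above bright road rows
print(shadow_pixels(patch))
# [(1, 2), (2, 2), (3, 2), (1, 3), (2, 3), (3, 3)]
```

Each interior pixel on the boundary rows scores 4 x (40 - 160) = -480, well below the hypothetical threshold; under weak, diffuse shadows the same difference shrinks, which illustrates the threshold sensitivity discussed above.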
Before the following steps of the hypothesis generation algorithm are applied, a simple preselection is performed: shadow edges shorter than a predefined pixel length are eliminated. This length can be selected in the range of 10-15 pixels. Edge lengths in this range are appropriate as potential bottom edges of ROIs (Regions of Interest) for both mid-range and distant vehicles in the further analysis, i.e. the hypothesis verification step.

As seen in Figure 2.10, there are naturally many shadow edges in successive rows relating to the same vehicle. These edges must be reduced to one edge representing the bottom edge of the potential vehicle.

Figure 2.10 : Successive shadow edges relating to the same vehicle.

Edges whose y coordinates differ by 2 pixels or less are combined, giving the bottom edge of the potential vehicle. A value of 2 pixels is appropriate for both mid-range and distant vehicles in this combining process. The detected shadow edges underneath a vehicle do not always have the same length as, or a length close to, the bottom edge of the vehicle, because shadow length changes with weather conditions and the time of day. In this case, combining shadow edges whose lengths exceed a reasonable value leads to a critical situation when defining the ROIs of the potential vehicles.

Defining an ROI that is considerably larger than the potential vehicle can cause false detections and thus verification errors in further analysis, because the background or visible features of other vehicles might fall inside the ROI defined for the hypothesized vehicle. Evaluating each lane independently during the hypothesis generation step, as a solution to the problem described above, might yield more reliable ROIs.
However, since not every lane is detectable in each frame of an image sequence, grouping the detected lanes into a reasonable road area might be necessary, as mentioned above. Moreover, if each lane is evaluated independently, detecting a vehicle while it is changing lanes might not be easy. Consequently, within the thesis the detected lanes are grouped to define a reasonable road area, which is then evaluated as a whole for the presence of vehicles. Under these circumstances, the problems of the hypothesis generation step are handled as follows.

The width of a vehicle in an image is related to the width of the lane in which the vehicle is currently located. Since the lane in which the potential vehicle is present and the width of that lane are known, a reasonable value for the width of the potential vehicle can be calculated. To define ROIs that represent the potential vehicles as well as possible for further analysis, this calculated value is used as a reference length for the bottom edge of the vehicle and, consequently, for the width of the potential vehicle.

The calculated width and the proposed approach for calculating it are more appropriate for passenger cars. ROIs defined with this approach for large vehicles do not sufficiently cover the area of the vehicle. However, this is not as critical a problem as defining ROIs that are considerably larger than the vehicle, since the ROIs defined for large vehicles still contain distinctive features for the hypothesis verification step even if they do not fully represent the related vehicles.
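The preselection and combining of shadow edges described in this section can be sketched as follows. The minimum length of 10 pixels (lower end of the 10-15 pixel range) and the 2-pixel row distance come from the text. The rest is assumed for illustration: shadow pixels are first grouped into horizontal runs, edges are merged only if they also overlap horizontally (so that vehicles side by side stay separate), successive rows may chain, and a merged edge keeps its lowest row and the union of the x extents.

```python
def row_runs(pixels):
    """Turn flagged (x, y) shadow pixels into horizontal runs (y, x_start, x_end)."""
    by_row = {}
    for x, y in pixels:
        by_row.setdefault(y, []).append(x)
    runs = []
    for y, xs in by_row.items():
        xs.sort()
        start = prev = xs[0]
        for x in xs[1:]:
            if x != prev + 1:              # gap in the row: close the current run
                runs.append((y, start, prev))
                start = x
            prev = x
        runs.append((y, start, prev))
    return runs


def merge_shadow_edges(runs, min_len=10, max_dy=2):
    """Drop runs shorter than min_len, then merge runs in nearby rows
    (<= max_dy apart) that overlap horizontally. Returns bottom edges
    (y, x_start, x_end), y being the lowest (largest) merged row."""
    edges = sorted((r for r in runs if r[2] - r[1] + 1 >= min_len),
                   key=lambda e: e[0], reverse=True)            # bottom-up
    groups = []                               # [y_bottom, y_top, x_start, x_end]
    for y, xs, xe in edges:
        for g in groups:
            if g[1] - y <= max_dy and xs <= g[3] and xe >= g[2]:
                g[1] = y                      # chain successive rows upwards
                g[2], g[3] = min(g[2], xs), max(g[3], xe)
                break
        else:
            groups.append([y, y, xs, xe])
    return [(g[0], g[2], g[3]) for g in groups]


runs = [(200, 100, 150), (199, 102, 148), (197, 101, 149),
        (200, 300, 305), (150, 400, 430)]
print(merge_shadow_edges(runs))    # [(200, 100, 150), (150, 400, 430)]
```

The 6-pixel run at x = 300-305 is dropped by the preselection, and the three rows 197-200 collapse into one bottom edge; a length check against the lane-based reference width could then reject edges that are too long.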
In spite of the combining process, there might still be more than one edge on the same vehicle, as seen in Figure 2.11. The final step of hypothesis generation reduces these edges to one bottom edge for each hypothesized vehicle.

Figure 2.11 : (a) The edges that could not be eliminated in the combining process. (b) An example of a false hypothesis can also be seen at close range.

Consequently, the final bottom edges that represent each potential vehicle are used to determine the width of the ROIs for the hypothesis verification step. In the hypothesis verification step, the hypothesized presence of vehicles is verified and false hypotheses (one of which can be seen in Figure 2.11) are eliminated.

2.4.2.2 Hypothesis verification – vertical edges detection

Potential vehicles can be detected and located using shadow, as discussed in the hypothesis generation step. Shadow can also be used for vehicle verification, since a located potential vehicle should have a shadow whose width matches the expected vehicle width at its location in the image. If the shadow is too wide or too narrow, the hypothesis is rejected. For each remaining potential vehicle, a region of interest is defined as described in Figure 2.12. The final bottom edge that represents a potential vehicle designates the width of a rectangular box hypothesized to cover the area of the vehicle. The bottom edge of the ROI is defined by enlarging the width of this box: 6 pixels are added to the x coordinate of the end point of the shadow edge, and 6 pixels are subtracted from the x coordinate of its start point. A value of 6 pixels is appropriate for the different ranges at which vehicles appear in the image. The side edge length of the ROI is set to half of the shadow edge length.
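The ROI rule above (6-pixel widening on each side, height equal to half the shadow edge length) and the refined search of equation (2.2) that follows it can be sketched together. The 6-pixel margin and the half-length height are from the text. The rest is assumed: `v_edges` is a precomputed vertical-edge strength image, the ROI is clipped to the image, "searching from the left and from the right" is simplified to taking the strongest column in each half of the projection, and `min_strength` is a hypothetical acceptance threshold.

```python
def define_roi(bottom_edge, img_w, img_h, margin=6):
    """ROI (left, top, right, bottom) from a bottom edge (y, x_start, x_end):
    the edge is widened by `margin` px on each side and the ROI height is half
    the edge length, clipped to the image."""
    y, xs, xe = bottom_edge
    height = (xe - xs + 1) // 2
    left, right = max(0, xs - margin), min(img_w - 1, xe + margin)
    return left, max(0, y - height), right, min(img_h - 1, y)


def column_projection(v_edges, roi):
    """Eq. (2.2): sum the vertical-edge strength over the ROI rows, one value per column."""
    left, top, right, bottom = roi
    return [sum(v_edges[y][x] for y in range(top, bottom + 1))
            for x in range(left, right + 1)]


def find_sides(w, min_strength):
    """Strongest column in the left half and in the right half of the projection.
    Returns (left_offset, right_offset), or None if either side is too weak."""
    half = len(w) // 2
    left_side = max(range(half), key=lambda i: w[i])
    right_side = max(range(half, len(w)), key=lambda i: w[i])
    if w[left_side] < min_strength or w[right_side] < min_strength:
        return None
    return left_side, right_side


print(define_roi((100, 20, 39), 640, 480))      # (14, 90, 45, 100)
v = [[1 if x in (2, 7) else 0 for x in range(10)] for _ in range(5)]
w = column_projection(v, (0, 0, 9, 4))
print(w)                                        # [0, 0, 5, 0, 0, 0, 0, 5, 0, 0]
print(find_sides(w, min_strength=3))            # (2, 7)
```

A hypothesis would then be accepted when the shadow (horizontal) edge and both side peaks are present, and the top of the box placed using a fixed aspect ratio, as the verification rule in the text states.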
Figure 2.12 : Defining the region-of-interest (ROI).

Once the ROI is determined, a refined search for the target vehicle is started in the ROI. In the refined search, the horizontal projection vector w of the vertical edges V (recall that the horizontal edge detector detects the vertical edges [32, 33]) in the region, defined as an $n \times m$ matrix, is computed as follows:

$$w(t) = (w_1, w_2, \ldots, w_m) = \left( \sum_{j=1}^{n} V(x_1, y_j, t), \ldots, \sum_{j=1}^{n} V(x_m, y_j, t) \right) \qquad (2.2)$$

The projection vector of the vertical edges is searched starting from the left and also from the right. The largest projection values found in both directions during the search determine the positions of the left and right sides of the potential vehicle. To verify that the potential object is a vehicle, a vehicle is considered to exist in the image if one horizontal edge and two vertical edges can be found in the same ROI. Since there are no consistent cues associated with the top of a vehicle, the top is located by assuming that the aspect ratio of any vehicle is a predefined, specific value.

3. VEHICLE TRACKING

One of the essential qualities of intelligent driver assistance systems is the ability to track other vehicles on the road. There are three key steps in video analysis: 1) detection of interesting moving objects, 2) tracking of such objects from frame to frame, and 3) evaluation of object tracks to recognize their behavior. Chapter 2 described how vehicles can be detected and recognized from a single image. However, since long image sequences are analyzed, information about objects identified in the current or previous frames can be used to help detect them in the next frame. In its simplest form, tracking can be defined as the problem of estimating the trajectory of an object in the image plane as it moves around a scene.
In other words, a tracker assigns consistent labels to the tracked objects in different frames of a video [36].♣ Vehicle tracking forms the basis for estimating parameters of the real-world (3D) motion of the vehicles on the road. In this chapter, the algorithm used to track vehicles and extract 2D motion parameters is presented.

3.1 Literature Overview of Object Tracking

In machine vision, visual tracking is the process of extracting geometric information about the motion of an object from image data. The goal of visual tracking is to analyze specific attributes of a target via measurements obtained from a sequence of images; for example, the aim may be to determine the image position (2D) of a target as it moves through the camera's field of view, or to obtain the pose (3D position and orientation) of an object. Visual tracking is known as the temporal correspondence problem: the problem of matching a target region through successive frames of an image sequence, typically taken at closely spaced intervals.

♣ Most of the information about object tracking and the problem conditions was quoted from [36], where more detailed information can be found.

The motion of an object in space causes changes in the image. The motion detected in the image, the visual motion, is related to the motion in space. The motion field is defined as the 2D vector field of velocities of the image points caused by the motion relative to the viewing camera; it can be thought of as the projection of the 3D velocity field onto the image plane. Determining the motion field provides the basic information from which the 3D motion of objects can be obtained. Methods for detecting 2D motion in the image are generally classified into two categories: 1) optical flow and 2) tracking. Optical flow, as mentioned in the motion-based detection methods, is based on estimating the apparent motion of the image brightness pattern.
Optical flow differs from the true motion field except where the image gradients are strong. Much work in tracking is therefore realized with the other category, the feature-based approach. The basis of the feature-based approach is processing the images to extract “features” (edges, regions of homogeneous color and/or texture, etc.). The feature-based approach has two advantages. First, feature extraction reduces the vast amount of data present in the image without necessarily eliminating salient information. Second, optical flow can only analyze the motion field along edges, so computing a dense flow field can be counter-productive and computationally expensive; in the feature-based method, feature extraction reduces the whole image to subimage regions, which gives comparatively better computational efficiency.

Feature-based tracking generally works as follows: an object template is prestored as the basis of recognition and localization, and the template is then matched in every subsequent frame. The matching is based on the output of a cost function: if the cost is less than a predefined threshold, the target is assumed to be present in the current frame. There are various cost functions, of which the most popular is the sum of squared differences (SSD). In [37], the detected vehicles are tracked using a combination of distance-based matching, SSD and the edge density of detected vehicle regions. In [38], recognition and localization of the preceding vehicle in the image is realized with a correlation-based approach.

Due to the constraint of real-time performance, the challenge in visual tracking is to match the amount of data to be processed to the available computational resources. This can be done in a number of ways: simplifying the problem, using specialized image processing hardware, designing clever algorithms, or all of these. To increase the efficiency of the algorithm, the target of interest is not searched for in the whole image frame.
The template is matched in the Region of Interest (ROI) where the target is likely to be found. The ROI is determined based on the assumption that the target cannot move too much between two consecutive frames; therefore, the ROI lies around the region where the object was last observed. However, the target's shape or orientation may change significantly in the next frame, and image changes due to motion, illumination and occlusions may cause errors in the measurements. In this case, the tracker starts losing the target.

To tackle this problem, a sufficiently rich and accurate predictive model is required. In [20], the position and size of the target of interest are determined by a simple recursive filter, with the aim of real-time multiple vehicle tracking from a moving vehicle. The Kalman filter is a suitable solution to this problem, since it handles noisy measurements (and also a noisy process). In [39], a real-time vision-based approach for detecting and tracking vehicles from a moving platform is developed; tracking is realized by combining a simple image processing technique with a 3D extended Kalman filter and a measurement equation that projects from the 3D model to image space. In [40, 41], the Kalman filter is used to produce optimal estimates of the state of a dynamic system for the motion estimation of vehicles in in-car systems.

3.2 Problem Conditions

Ideally, a tracking algorithm would be able to locate the object anywhere within the image at any point in time. Typically, however, only a limited region of the image is searched. The reasons for this are efficiency (especially necessary for real-time applications) and the possibility that there might be many other similar-looking objects in the image. The intuitive approach is to search within a region around the last position of the object.
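SSD template matching restricted to a search window around the last known position, the intuitive approach just described, can be sketched as follows. The window half-size `radius`, the acceptance threshold `max_cost`, the function names and the list-of-rows image format are hypothetical; [37] combines SSD with further cues that are not reproduced here.

```python
def ssd(image, template, x0, y0):
    """Sum of squared differences between the template and the patch at (x0, y0)."""
    return sum((image[y0 + j][x0 + i] - t) ** 2
               for j, row in enumerate(template) for i, t in enumerate(row))


def track_step(image, template, last_xy, radius, max_cost):
    """Search a window of +/- radius px around the last top-left position.

    Returns the best (x, y) if its SSD is below max_cost, otherwise None
    (the target is considered lost in this frame)."""
    th, tw = len(template), len(template[0])
    h, w = len(image), len(image[0])
    lx, ly = last_xy
    best = None
    for y in range(max(0, ly - radius), min(h - th, ly + radius) + 1):
        for x in range(max(0, lx - radius), min(w - tw, lx + radius) + 1):
            cost = ssd(image, template, x, y)
            if best is None or cost < best[2]:
                best = (x, y, cost)
    if best is None or best[2] >= max_cost:
        return None
    return best[:2]


frame = [[0] * 10 for _ in range(10)]
for yy in (4, 5):
    frame[yy][6] = frame[yy][7] = 9                  # 2 x 2 bright target at (6, 4)
template = [[9, 9], [9, 9]]
print(track_step(frame, template, (5, 3), radius=2, max_cost=50))   # (6, 4)
print(track_step(frame, template, (1, 1), radius=2, max_cost=50))   # None
```

The second call shows the failure mode discussed next: when the target has moved outside the window around its last position, no patch matches and the tracker loses it.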
But as seen in Figure 3.1, this approach will fail if the object moves outside the searched region. There are several possible reasons for this:
1. The object is moving too fast.
2. The frame rate is too low.
3. The searched region is too small.

Figure 3.1 : (a) Tracking the object without position prediction might be successful. (b) Tracking without position prediction will fail.

These problems are related to each other and can be avoided, for example, by ensuring a high enough frame rate. Given other constraints, however, they are often inevitable. In addition, even when the target can be accurately located, it seldom appears the same in all images: changes in orientation, lighting, occlusions and imperfections in the camera continuously affect the appearance of the same target. Essentially, observing the true location of the target with certainty is very difficult under usual circumstances.

The tracking problem can be simplified by imposing constraints on the motion and/or appearance of objects. For example, almost all tracking algorithms assume that the object motion is smooth, with no abrupt changes. The object motion can be further constrained to constant velocity or constant acceleration based on a priori information. Prior knowledge about the number and size of objects, or about object appearance and shape, can also be used to simplify the problem.

3.3 Objective

Summarizing the above discussion, two major problems can be identified:
1. The object can only be tracked if it does not move beyond the searched region.
2. Various factors such as lighting and occlusions can affect the appearance of the target, making accurate tracking complex.

To solve the first problem, the locations of the detected vehicles are predicted in successive frames of a long image sequence. In making these predictions, however, the second problem must be considered as well.
Thus the prediction method needs to be robust enough to handle this source of error. A Kalman filter, which estimates the positions and uncertainties of moving vehicles in the next frame, is used within this master's thesis. The Kalman filter determines how large a region around the predicted positions should be searched for each target in the next frame, i.e. where to look for the target objects, so that their locations are found with a certain confidence.

The region that covers a detected vehicle is called “the bounding box”. Two control points are considered for each bounding box, and the image coordinates of these control points are predicted for each subsequent frame of an image sequence using the Kalman filter. The width of the bounding box in the image plane is computed from the image coordinates of the predicted control points. The ROI in which the target is searched is defined by expanding the predicted width of the bounding box in the image plane by a predefined number of pixels.

3.4 The Theory of the Kalman Filter

The Kalman filter, rooted in the state-space formulation of linear dynamical systems, provides a recursive solution to the linear optimal filtering problem. The solution is recursive in that each updated estimate of the state is computed from the previous estimate and the new input data, so only the previous estimate requires storage. The Kalman filter is essentially a set of mathematical equations that implement a predictor-corrector type estimator that is optimal in the sense that it minimizes the estimated error covariance. In addition to eliminating the need to store the entire past observed data, the Kalman filter is computationally more efficient than computing the estimate directly from the entire past observed data at each step of the filtering process. The Kalman filter has been the subject of extensive research and application, particularly in the area of autonomous or assisted navigation.
The Kalman filter has also been used extensively for tracking in interactive computer graphics [42].

Consider a linear, discrete-time dynamical system described by the block diagram shown in Figure 3.2. The Kalman filter addresses the general problem of estimating the state of a discrete-time dynamical system governed by a linear stochastic difference equation. The state vector, or simply state, denoted by $x_k$, is defined as the minimal set of data that is sufficient to uniquely describe the unforced dynamical behavior of the system; the subscript $k$ denotes discrete time. In other words, the state is the least amount of data on the past behavior of the system that is needed to predict its future behavior. Typically, the state $x_k$ is unknown. To estimate it, a set of observed data, denoted by the vector $y_k$, is used. In mathematical terms, the block diagram of Figure 3.2 embodies the following pair of equations.

3.4.1 The process to be estimated

A discrete-time process governed by the linear stochastic difference equation is defined as

$$x_{k+1} = F_{k+1,k}\,x_k + w_k \tag{3.1}$$

with the measurement equation

$$y_k = H_k\,x_k + v_k \tag{3.2}$$

where $F_{k+1,k}$ is the transition matrix taking the state $x_k$ from time $k$ to time $k+1$, $y_k$ is the observable at time $k$, and $H_k$ is the measurement matrix. The random variables $w_k$ and $v_k$ represent two additive noise terms: the process noise and the measurement noise, respectively. They are assumed to be independent of each other, white, and normally distributed, with covariance matrices defined by

$$E\left[w_n w_k^T\right] = \begin{cases} Q_k, & n = k \\ 0, & n \neq k \end{cases} \tag{3.3}$$

where $Q_k$ is the process noise covariance matrix, and

$$E\left[v_n v_k^T\right] = \begin{cases} R_k, & n = k \\ 0, & n \neq k \end{cases} \tag{3.4}$$

where $R_k$ is the measurement noise covariance matrix. Because the noise samples are white, i.e., uncorrelated across time steps, the cross terms for $n \neq k$ vanish, as stated in equations (3.3) and (3.4).
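To make the roles of $F_{k+1,k}$, $H_k$, $w_k$ and $v_k$ concrete, the sketch below draws states and measurements from equations (3.1) and (3.2) for a hypothetical one-dimensional constant-velocity target. It assumes diagonal $Q$ and $R$ (independent noise components) and uses Python's standard `random` module; the function name `simulate` and all numbers are illustrative and not part of the implemented tracker.

```python
import random


def simulate(F, H, x0, q_std, r_std, steps, seed=0):
    """Draw a state trajectory and measurements from eqs. (3.1)-(3.2).

    F, H: transition and measurement matrices (lists of rows).
    q_std, r_std: per-component standard deviations of w_k and v_k, i.e. the
    square roots of the diagonals of Q and R (diagonal covariances assumed).
    The noise is white: a fresh, independent Gaussian sample at every step.
    """
    rng = random.Random(seed)
    x, states, measurements = list(x0), [], []
    for _ in range(steps):
        # y_k = H x_k + v_k
        y = [sum(h * s for h, s in zip(row, x)) + rng.gauss(0.0, r)
             for row, r in zip(H, r_std)]
        states.append(x)
        measurements.append(y)
        # x_{k+1} = F x_k + w_k
        x = [sum(f * s for f, s in zip(row, x)) + rng.gauss(0.0, q)
             for row, q in zip(F, q_std)]
    return states, measurements


# Hypothetical 1-D constant-velocity example: state = [position, velocity].
F = [[1.0, 1.0], [0.0, 1.0]]
H = [[1.0, 0.0]]
states, meas = simulate(F, H, x0=[0.0, 2.0], q_std=[0.0, 0.1], r_std=[1.0], steps=5)
```

Setting both standard deviations to zero reproduces the noise-free dynamics exactly, which is a convenient sanity check before tuning $Q$ and $R$.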
Most commonly, the noise processes are assumed to be stationary, i.e., their statistics do not vary with time, so the noise covariance matrices are assumed to be constant.

Figure 3.2 : Signal flow representation of a linear, discrete-time dynamical system [43].

The Kalman filtering problem, namely the problem of jointly solving the process and measurement equations for the unknown state in an optimal manner, may now be formally stated as follows: use the entire observed data, consisting of the vectors $y_1, y_2, \ldots, y_k$, to find for each $k \geq 1$ the minimum mean-square error estimate of the state $x_k$.

3.4.2 The computational origins of the filter

The a priori state estimate at step $k$, given knowledge of the process prior to step $k$, is defined as $\hat{x}_k^- \in \mathbb{R}^n$ (note the "super minus"), and the a posteriori state estimate at step $k$, given the measurement $y_k$, is defined as $\hat{x}_k \in \mathbb{R}^n$. The a priori and a posteriori estimate errors are then

$$e_k^- \equiv x_k - \hat{x}_k^-, \qquad e_k \equiv x_k - \hat{x}_k. \tag{3.5}$$

The a priori estimate error covariance is

$$P_k^- = E\left[e_k^- e_k^{-T}\right], \tag{3.6}$$

and the a posteriori estimate error covariance is

$$P_k = E\left[e_k e_k^T\right]. \tag{3.7}$$

In deriving the Kalman filter equations, the initial goal is to find an equation that computes the a posteriori state estimate $\hat{x}_k$ as a linear combination of the a priori state estimate $\hat{x}_k^-$ and a weighted difference between the actual measurement $y_k$ and the measurement prediction $H\hat{x}_k^-$, as shown in equation (3.8):

$$\hat{x}_k = \hat{x}_k^- + K\left(y_k - H\hat{x}_k^-\right) \tag{3.8}$$

The difference $\left(y_k - H\hat{x}_k^-\right)$ in equation (3.8) is called the measurement innovation, or the residual. The residual reflects the discrepancy between the predicted measurement $H\hat{x}_k^-$ and the actual measurement $y_k$. A residual of zero means that the two are in complete agreement.

The matrix $K$ in equation (3.8) is chosen to be the gain, or blending factor, that minimizes the a posteriori estimate error covariance of equation (3.7). The details of this minimization can be found in [30,31].
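For a scalar state with $H = 1$, the minimization can be checked directly. Writing the a posteriori error as $e_k = (1-K)e_k^- - K v_k$ gives $P_k(K) = (1-K)^2 P_k^- + K^2 R$, whose derivative vanishes at $K = P_k^-/(P_k^- + R)$. The hypothetical sketch below compares that closed form with a brute-force search over candidate gains; the values of $P_k^-$ and $R$ are arbitrary.

```python
def posterior_variance(gain, prior_var, meas_var):
    """Variance of x_hat = x_prior + K (y - x_prior) for scalar H = 1,
    assuming the prior error and the measurement noise are uncorrelated."""
    return (1 - gain) ** 2 * prior_var + gain ** 2 * meas_var


def optimal_gain(prior_var, meas_var):
    """Scalar gain that minimizes the a posteriori error variance."""
    return prior_var / (prior_var + meas_var)


prior_var, meas_var = 4.0, 1.0  # hypothetical P_k^- and R
k_star = optimal_gain(prior_var, meas_var)

# Brute-force check over a grid of candidate gains in [0, 1].
grid = [i / 1000 for i in range(1001)]
k_grid = min(grid, key=lambda k: posterior_variance(k, prior_var, meas_var))
print(k_star, k_grid)  # 0.8 0.8
```

Pushing `meas_var` toward zero drives the gain toward 1, and pushing `prior_var` toward zero drives it toward 0, which matches the limiting behaviour of the gain discussed for equation (3.9).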
One form of the resulting $K$ that minimizes equation (3.7) is given by

$$K_k = P_k^- H^T\left(H P_k^- H^T + R\right)^{-1} = \frac{P_k^- H^T}{H P_k^- H^T + R} \tag{3.9}$$

Equation (3.9) shows that as the measurement noise covariance $R$ approaches zero, the gain $K$ weights the residual more heavily. On the other hand, as the a priori estimate error covariance $P_k^-$ approaches zero, the gain $K$ weights the residual less heavily.

3.4.3 The probabilistic origins of the filter

The Kalman filter maintains the first two moments of the state distribution:

$$E\left[x_k\right] = \hat{x}_k, \qquad E\left[\left(x_k - \hat{x}_k\right)\left(x_k - \hat{x}_k\right)^T\right] = P_k. \tag{3.10}$$

The a posteriori state estimate of equation (3.8) reflects the mean (the first moment) of the state distribution, which is normally distributed if the conditions of equations (3.3) and (3.4) are met. The a posteriori estimate error covariance of equation (3.7) reflects the variance of the state distribution (the second central moment). More details on the probabilistic origins of the Kalman filter can be found in [42].♣

3.4.4 The summary of the discrete Kalman filter algorithm

The equations of the Kalman filter fall into two groups: time update equations and measurement update equations. The time update equations project the current state and error covariance estimates forward in time to obtain the a priori estimates for the next time step. The measurement update equations incorporate a new measurement into the a priori estimate to obtain an improved a posteriori estimate.

♣ Most of the information about the theory of the Kalman filter was adapted from [42], where more detailed information about the Kalman filter can also be found.

The time update equations can also be thought of as predictor equations, while the measurement update equations can be thought of as corrector equations. The specific equations for the time and measurement updates are presented below.
A complete description of the operation of the filter can also be found in Figure 3.3.

$$\hat{x}_k^- = F_{k,k-1}\,\hat{x}_{k-1} \tag{3.11}$$

$$P_k^- = F_{k,k-1}\,P_{k-1}\,F_{k,k-1}^T + Q_{k-1} \tag{3.12}$$

Equations (3.11) and (3.12) are the discrete Kalman filter time update equations. They show how the state and covariance estimates are projected forward from time step $k-1$ to step $k$.

$$K_k = P_k^- H_k^T\left(H_k P_k^- H_k^T + R_k\right)^{-1} \tag{3.13}$$

$$\hat{x}_k = \hat{x}_k^- + K_k\left(y_k - H_k\hat{x}_k^-\right) \tag{3.14}$$

$$P_k = \left(I - K_k H_k\right)P_k^- \tag{3.15}$$

Equations (3.13) to (3.15) are the discrete Kalman filter measurement update equations.

3.5 Dynamical System Formulation of the Implemented Vehicle Tracking

The attribute sought at any point in time is described by the state vector $x_k$. Often this state vector contains the coordinates of the target with respect to a chosen reference frame. Here, two control points of the bounding box of the vehicle in the image are considered; the bounding box is the rectangle that covers the area of the vehicle. These two control points are chosen as the bottom-left and bottom-right corners of the bounding box,

$$p_k = \left[x_{1,k},\; y_{1,k},\; x_{2,k},\; y_{2,k}\right]^T,$$

where the subscript $k$ denotes the frame of the sequence under consideration (see Figure 3.4). Within the image, the control points of the bounding box of the vehicle move with velocity

$$v_k = \left[v_{x,1,k},\; v_{y,1,k},\; v_{x,2,k},\; v_{y,2,k}\right]^T.$$

Figure 3.3 : A complete description of the operation of the Kalman filter [43].

A state vector

$$x_k = \left[x_{1,k},\; y_{1,k},\; x_{2,k},\; y_{2,k},\; v_{x,1,k},\; v_{y,1,k},\; v_{x,2,k},\; v_{y,2,k}\right]^T \tag{3.16}$$

can be chosen to describe the motion of the bounding box on the image plane.
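The time update of equations (3.11) and (3.12) and the measurement update of equations (3.13) to (3.15) can be sketched in plain Python, with matrices stored as lists of rows and column vectors as $n \times 1$ matrices. This is a minimal, dependency-free illustration, not the MATLAB code of the thesis; the helper names are hypothetical, and the Gauss-Jordan `inverse` is only meant for the small innovation covariance ($3 \times 3$ in the tracker).

```python
def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def add(A, B, sign=1.0):
    """A + sign * B, element-wise."""
    return [[a + sign * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def inverse(A):
    """Gauss-Jordan inversion with partial pivoting (small, non-singular matrices)."""
    n = len(A)
    M = [list(map(float, row)) + e for row, e in zip(A, identity(n))]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))  # pivot row
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [v / piv for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
    return [row[n:] for row in M]


def predict(x, P, F, Q):
    """Time update, eqs. (3.11)-(3.12): x^- = F x, P^- = F P F^T + Q."""
    x_prior = matmul(F, x)
    P_prior = add(matmul(matmul(F, P), transpose(F)), Q)
    return x_prior, P_prior


def update(x_prior, P_prior, y, H, R):
    """Measurement update, eqs. (3.13)-(3.15)."""
    S = add(matmul(matmul(H, P_prior), transpose(H)), R)   # H P^- H^T + R
    K = matmul(matmul(P_prior, transpose(H)), inverse(S))  # gain, eq. (3.13)
    residual = add(y, matmul(H, x_prior), sign=-1.0)       # y - H x^-
    x_post = add(x_prior, matmul(K, residual))             # eq. (3.14)
    I_KH = add(identity(len(P_prior)), matmul(K, H), sign=-1.0)
    P_post = matmul(I_KH, P_prior)                         # eq. (3.15)
    return x_post, P_post


# Hypothetical 1-D constant-velocity run with made-up position measurements.
F = [[1.0, 1.0], [0.0, 1.0]]
Q = [[0.0, 0.0], [0.0, 0.01]]
H = [[1.0, 0.0]]
R = [[1.0]]
x, P = [[0.0], [1.0]], [[10.0, 0.0], [0.0, 10.0]]
for z in (1.1, 1.9, 3.2):
    x, P = predict(x, P, F, Q)
    x, P = update(x, P, [[z]], H, R)
```

In a real implementation a linear-algebra library would replace the hand-written helpers; they are spelled out here only to keep the example self-contained.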
However, since the two tracked control points lie on the same horizontal edge (the bottom edge of the bounding box), they share the same image row, so $y_{2,k}$ and $v_{y,2,k}$ are redundant and the state vector is reduced to

$$x_k = \left[x_{1,k},\; y_{1,k},\; x_{2,k},\; v_{x,1,k},\; v_{y,1,k},\; v_{x,2,k}\right]^T. \tag{3.17}$$

The position and size of the region-of-interest, in other words the tracking window, in subsequent frames are determined by predicting this state vector with the Kalman filter. The state vector in equation (3.17) is therefore appropriate for predicting the position and size of the region-of-interest in subsequent frames.

Figure 3.4 : The description of the bounding box and the control points.

If a sufficiently small sampling interval $\delta t$ is assumed, the velocity can also be assumed constant between frames. The motion can then be expressed as

$$p_k = p_{k-1} + \delta t\, v_{k-1} + \xi_{k-1}, \qquad v_k = v_{k-1} + \eta_{k-1} \tag{3.18}$$

where $\xi_{k-1}$ and $\eta_{k-1}$ are the uncertainties in the model, usually taken to be zero-mean, white, Gaussian random processes. Rewriting this in terms of the state vector gives a dynamical model of the target motion:

$$x_k = \Phi_{k-1}\,x_{k-1} + w_{k-1} \tag{3.19}$$

where

$$\Phi_{k-1} = \begin{bmatrix} 1 & 0 & 0 & \delta t & 0 & 0 \\ 0 & 1 & 0 & 0 & \delta t & 0 \\ 0 & 0 & 1 & 0 & 0 & \delta t \\ 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \end{bmatrix} \tag{3.19a}$$

and

$$w_{k-1} = \begin{bmatrix} \xi_{k-1} \\ \eta_{k-1} \end{bmatrix}. \tag{3.19b}$$

Here $w_{k-1}$ is the uncertainty in the process, i.e., the process noise. Assuming $\xi_{k-1}$ to be zero, the process noise becomes

$$w_{k-1} = \begin{bmatrix} 0 & 0 & 0 & \eta_{k-1}^T \end{bmatrix}^T = \begin{bmatrix} 0 & 0 & 0 & u_{x,1,k-1} & u_{y,1,k-1} & u_{x,2,k-1} \end{bmatrix}^T. \tag{3.20}$$

As measurements, the positions of the bounding box control points, $p_k$, are evaluated in every frame of the sequence. The measurement model of the Kalman filter therefore becomes

$$z_k = H_k\,x_k + \mu_k \tag{3.21}$$

where

$$H_k = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 \end{bmatrix} \tag{3.21a}$$

and $\mu_k$ is the uncertainty in the measurement, i.e., the measurement noise (again often assumed to be a zero-mean, white, Gaussian random process).
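The matrices of equations (3.19a) and (3.21a) can be built and applied as follows. The sketch assumes the state ordering of equation (3.17) and a sampling interval of one frame; the control-point coordinates and velocities are made-up values, chosen only to show that one prediction step advances each coordinate by $\delta t$ times its velocity while $H_k$ selects the three measured coordinates.

```python
def transition_matrix(dt):
    """Phi_{k-1} of eq. (3.19a) for x = [x1, y1, x2, vx1, vy1, vx2]^T."""
    phi = [[1.0 if i == j else 0.0 for j in range(6)] for i in range(6)]
    for i in range(3):
        phi[i][i + 3] = dt  # position_i += dt * velocity_i
    return phi


def measurement_matrix():
    """H_k of eq. (3.21a): only the three control-point coordinates are measured."""
    return [[1.0 if j == i else 0.0 for j in range(6)] for i in range(3)]


def apply(M, v):
    """Matrix-vector product with v stored as a flat list."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


# Hypothetical state: x1 = 100, y1 = 240, x2 = 180 (pixels),
# vx1 = 2, vy1 = 1, vx2 = 3 (pixels per frame); dt = 1 frame.
state = [100.0, 240.0, 180.0, 2.0, 1.0, 3.0]
predicted = apply(transition_matrix(1.0), state)
print(predicted)                               # [102.0, 241.0, 183.0, 2.0, 1.0, 3.0]
print(apply(measurement_matrix(), predicted))  # [102.0, 241.0, 183.0]
```

The predicted bounding-box width follows directly as $x_{2} - x_{1}$ of the predicted state (81 pixels in this example).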
3.5.1 The initialization of the Kalman filter

The main difficulty with Kalman filtering is that statistical models are required for both the system and the measurement instruments. Unfortunately, such models are typically not available or are difficult to obtain. In the actual implementation of the filter, the measurement noise covariance $R$ is usually measured prior to operation of the filter. Measuring $R$ is generally practical: an off-line analysis of the measurement instruments prior to running the process (system identification) can be made to determine the variance of the measurement noise. Determining the process noise covariance $Q$ is generally more difficult, because the process cannot be observed directly. In other words, if the measurements in the off-line analysis also contain errors, the process cannot be accurately profiled. Sometimes a relatively simple (poor) process model can produce acceptable results if enough uncertainty is injected into the process via the selection of $Q$; in this case, however, the measurements must be reliable. Whether or not a rational basis exists for choosing the parameters, superior filter performance (statistically speaking) can often be obtained by tuning the filter parameters. These parameters can be pre-computed, for example, by determining the steady-state value under conditions where $Q$ and $R$ are in fact constant.

Since $H_k$ is a $3 \times 6$ matrix, the measurement noise $\mu_k$ has three components. These three additive noises are assumed to be zero-mean, white, and uncorrelated with each other, with variances $\sigma^2_{x_{1,k}}$, $\sigma^2_{y_{1,k}}$ and $\sigma^2_{x_{2,k}}$, respectively. The measurement noise covariance matrix needed for the Kalman filter implementation is thus

$$R_k = E\left[\mu_k \mu_k^T\right] = \begin{bmatrix} \sigma^2_{x_{1,k}} & 0 & 0 \\ 0 & \sigma^2_{y_{1,k}} & 0 \\ 0 & 0 & \sigma^2_{x_{2,k}} \end{bmatrix} \tag{3.22}$$

The process noise covariance matrix is formally defined as $Q_{k-1} \equiv E\left[w_{k-1} w_{k-1}^T\right]$.
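A minimal sketch of the off-line estimation of $R$ described above: the three control-point coordinates of a stationary target are measured repeatedly, and their sample variances fill the diagonal of equation (3.22). The measurement values below are hypothetical, the helper name is illustrative, and setting the off-diagonal terms to zero simply follows the uncorrelated-noise assumption stated above.

```python
from statistics import variance


def estimate_measurement_covariance(samples):
    """Diagonal R of eq. (3.22) from repeated (x1, y1, x2) measurements of a
    static target; off-diagonal terms are assumed to be zero."""
    columns = list(zip(*samples))
    diag = [variance(col) for col in columns]  # sample variance of each coordinate
    return [[diag[i] if i == j else 0.0 for j in range(3)] for i in range(3)]


# Hypothetical off-line measurements of a stationary bounding box (pixels).
samples = [
    (100.0, 240.0, 180.0),
    (101.0, 240.0, 179.0),
    (99.0, 241.0, 181.0),
    (100.0, 239.0, 180.0),
]
R = estimate_measurement_covariance(samples)
```

With only four samples the estimate is very rough; in practice many frames of a static scene would be used.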
Using the definition of the noise vector $w_{k-1}$ and the assumption that the process noise terms $u_{x,1,k-1}$, $u_{y,1,k-1}$ and $u_{x,2,k-1}$ are uncorrelated,

$$Q_{k-1} = \begin{bmatrix} 0&0&0&0&0&0 \\ 0&0&0&0&0&0 \\ 0&0&0&0&0&0 \\ 0&0&0&\sigma_1^2&0&0 \\ 0&0&0&0&\sigma_2^2&0 \\ 0&0&0&0&0&\sigma_3^2 \end{bmatrix} \tag{3.23}$$

where $\sigma_1^2 = E\left[u_{x,1,k-1}^2\right]$, $\sigma_2^2 = E\left[u_{y,1,k-1}^2\right]$ and $\sigma_3^2 = E\left[u_{x,2,k-1}^2\right]$ are the variances of the noise terms. Recall that these terms represent the change in velocity between frames (equation 3.18).

Specific values must be assigned to these variances in order to define the Kalman filter numerically. To do this, a model of vehicle acceleration that is simple and appears reasonable on physical grounds [44] is used in this thesis to model the 2D image motion. The vehicle acceleration $u$ in either of the two image directions ($x$ and $y$) is assumed to be random, equally likely to be positive or negative, with some maximum magnitude $A$, and uniformly distributed between $-A$ and $+A$. The probability density function of the acceleration in either direction is thus assumed to have the form of Figure 3.5. Three impulse functions representing discrete probabilities at $\pm A$ and at zero acceleration are superimposed to make the model more flexible. They express that there is a probability $P_2$ that the vehicle proceeds at constant image velocity, and a probability $P_1$ each that its acceleration equals $+A$ or $-A$ (maximum acceleration or deceleration). Since the total probability must equal one, the height of the uniform distribution is

$$a = \frac{1 - 2P_1 - P_2}{2A},$$

and the variance of the random variable $u$ is given by

$$\sigma_u^2 = \frac{A^2}{3}\left(1 + 4P_1 - P_2\right). \tag{3.24}$$

To find $\sigma_1^2$, $\sigma_2^2$ and $\sigma_3^2$, note that the velocity change over one interval is $uT$, so $\sigma_n = T\sigma_u$, where $T$ is the time interval between frames and $n = 1, 2, 3$. Thus,

$$\sigma_n^2 = T^2\sigma_u^2 = \frac{T^2 A^2}{3}\left(1 + 4P_1 - P_2\right), \qquad n = 1, 2, 3. \tag{3.25}$$

Figure 3.5 : Assumed probability distribution of the acceleration u.
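Equations (3.23) to (3.25) translate directly into code. The hypothetical sketch below evaluates $\sigma_u^2$ in closed form, cross-checks it by numerically integrating $u^2$ against the mixed density of Figure 3.5 (uniform part plus impulses at $\pm A$ and $0$), and assembles $Q_{k-1}$. The values of $A$, $P_1$, $P_2$ and $T$ are placeholders, not the ones used in the thesis.

```python
def accel_variance(A, p1, p2):
    """sigma_u^2 = A^2 / 3 * (1 + 4 P1 - P2), eq. (3.24)."""
    return A ** 2 / 3.0 * (1.0 + 4.0 * p1 - p2)


def accel_variance_numeric(A, p1, p2, steps=100000):
    """Second moment of the mixed pdf (its mean is zero by symmetry)."""
    height = (1.0 - 2.0 * p1 - p2) / (2.0 * A)  # uniform density level a
    du = 2.0 * A / steps
    # Midpoint rule for the integral of u^2 * a over [-A, A].
    uniform_part = sum(((-A + (i + 0.5) * du) ** 2) * height * du for i in range(steps))
    # Impulses of weight P1 at +A and -A; the impulse at 0 contributes nothing.
    return uniform_part + 2.0 * p1 * A ** 2


def process_noise_covariance(A, p1, p2, T):
    """Q_{k-1} of eq. (3.23) with sigma_n^2 = T^2 sigma_u^2, eq. (3.25)."""
    s2 = T ** 2 * accel_variance(A, p1, p2)
    return [[s2 if (i == j and i >= 3) else 0.0 for j in range(6)] for i in range(6)]


# Placeholder parameters: A in pixels/frame^2, T in frames.
Q = process_noise_covariance(A=2.0, p1=0.1, p2=0.3, T=1.0)
```

Because the same acceleration model is assumed in both image directions, the three non-zero diagonal entries are equal here; separate values of $A$ for the $x$ and $y$ directions would be a straightforward extension.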
Although the noise is assumed to be stationary, so that the variances do not vary with time, it may be possible to employ an algorithm that adjusts these process noise variances after each time step based on the observed measurements and the velocity changes evaluated from them.

Filter initialization requires an initial error covariance matrix as well as the noise covariance matrices. From its definition, the error covariance matrix is $P_k = E\left[\left(x_k - \hat{x}_k\right)\left(x_k - \hat{x}_k\right)^T\right]$; its diagonal terms are the mean-squared errors of the signal vector estimates. To initialize the filter, an initial estimate is required as well as the covariance matrix corresponding to that estimate. An initial estimate can be found in several ways [44]. In some estimation problems, an optimal (least mean-squared error) estimate can be found using the orthogonality principle or, equivalently, by starting with the prior estimate $\hat{x}_0 = 0$, which is indeed the optimal estimate of zero-mean signal components when no observations are available. In such cases, the corresponding error covariance matrix $P_0$ is simply the steady-state covariance matrix $C$ of the signal vector, since

$$P_0 = E\left[\left(x_0 - \hat{x}_0\right)\left(x_0 - \hat{x}_0\right)^T\right] = E\left[x_0 x_0^T\right] = E\left[x_k x_k^T\right] = C. \tag{3.26}$$

3.6 The Implemented Algorithm

The tracking algorithm implemented in this thesis uses the following steps:

1. After the vehicles are recognized in the vehicle detection process and the current state vector is determined, tracking starts from the next image.
2. Repeat for each frame in the image sequence:
• Use the dynamical model to predict the position of the detected vehicle in the image.
• Calculate the region-of-interest for the predicted vehicle position. The ROI is determined as described in the hypothesis verification step of the vehicle detection process explained in Chapter 2 (see Section 2.4.2.2).
• In the determined ROI, search for the corresponding vehicle using the pronounced horizontal (shadow) edge and the vertical edges.
• Once the tracked vehicle is found, obtain the optimal estimate of the tracked vehicle in the current frame.
• Update the position of the tracked vehicle based on the measurements of the vehicle position in the current frame.

The same tracking process is carried out for each vehicle recognized in the vehicle detection process. The vehicle detection algorithm is called every 10th frame to account for new vehicles that may have appeared. An object in the image may go undetected in one or two images. Hence, a detected and tracked vehicle may not be detected in the next call of the vehicle detection algorithm 10 frames later, even if it is still present. The capabilities of the tracking algorithm can be employed to avoid this problem. Before a vehicle that is no longer detectable in the current frame is eliminated, the sub-region given by its bounding box in the previous frame is correlated with the sub-region of the same size and position in the current frame. If the normalized correlation of the two image regions is high, it is inferred that the vehicle might still be there.

3.6.1 To update the filter: horizontal and vertical edges detection

In each determined ROI, a refined search is carried out to detect the horizontal edge (the shadow edge) and the vertical edges, that is, the vertical sides of the vehicle. The ROI is determined as explained in the hypothesis verification step of the vehicle detection algorithm, and the vertical sides of the vehicle are also extracted in the same way as in the hypothesis verification step.
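Two operations from the tracking loop above can be sketched in Python: building the search ROI from the predicted control points, as outlined in Section 3.3, and the normalized-correlation test applied before a missed vehicle is dropped. Several details are assumptions not fixed by the text: the ROI is taken as the predicted box with height equal to width (the aspect ratio of 1 used for plotting in Appendix A) grown by a hypothetical `margin` on every side (the thesis itself uses the ROI rule of Section 2.4.2.2), images are grayscale lists of rows, and the threshold 0.8 is only a placeholder for "high". All function names are illustrative.

```python
from math import sqrt


def search_roi(x1, y1, x2, margin, img_w, img_h):
    """Hypothetical ROI around a predicted bounding box.

    (x1, y1) and (x2, y1) are the predicted bottom-left and bottom-right
    control points. The box height is taken equal to its width (aspect
    ratio 1), the box is grown by `margin` pixels on every side, and the
    result is clipped to the image. Returns (left, top, right, bottom),
    with y growing downward.
    """
    width = x2 - x1
    left = max(0, round(x1 - margin))
    right = min(img_w - 1, round(x2 + margin))
    top = max(0, round(y1 - width - margin))
    bottom = min(img_h - 1, round(y1 + margin))
    return left, top, right, bottom


def normalized_correlation(a, b):
    """Zero-mean normalized cross-correlation of two equal-size patches."""
    fa = [p for row in a for p in row]
    fb = [p for row in b for p in row]
    ma, mb = sum(fa) / len(fa), sum(fb) / len(fb)
    num = sum((x - ma) * (y - mb) for x, y in zip(fa, fb))
    den = sqrt(sum((x - ma) ** 2 for x in fa) * sum((y - mb) ** 2 for y in fb))
    return num / den if den > 0 else 0.0


def crop(image, box):
    """Sub-region of a row-major image for an inclusive (left, top, right, bottom) box."""
    left, top, right, bottom = box
    return [row[left:right + 1] for row in image[top:bottom + 1]]


def still_present(prev_frame, curr_frame, prev_box, threshold=0.8):
    """Keep a vehicle the detector missed if the patch under its previous
    bounding box still correlates strongly with the same area now."""
    return normalized_correlation(crop(prev_frame, prev_box),
                                  crop(curr_frame, prev_box)) >= threshold


print(search_roi(102.0, 241.0, 183.0, margin=10, img_w=640, img_h=480))
# (92, 150, 193, 251)
```

The zero-mean normalization makes the test insensitive to a uniform brightness change between the two frames, which is why it is preferred here over a raw sum of products.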
However, in contrast with the vehicle detection algorithm, the horizontal edge (the shadow edge) must be detected in each determined ROI to locate the corresponding vehicle in the image. The horizontal edges are extracted with a Sobel edge detector. The projection vector of the horizontal edges in the ROI (an $n \times m$ matrix) is computed as follows:

$$v(t) = \left(v_1, v_2, \ldots, v_n\right) = \left(\sum_{i=1}^{m} H\left(x_i, y_1, t\right), \ldots, \sum_{i=1}^{m} H\left(x_i, y_n, t\right)\right) \tag{3.27}$$

where $H(x_i, y_j, t)$ denotes the horizontal edge image at pixel $(x_i, y_j)$ in frame $t$. Because the top horizontal edge of the vehicle is not sought, the projection vector of the horizontal edges is searched from the bottom of the vector up to its middle. The largest projection value determines the position of the bottom edge of the vehicle.

Another difference from the vertical edge detection in the hypothesis verification step is the selection of the threshold value. Even if the ROIs accurately represent the vehicles in the image, distinctive features of other vehicles may still be present in the same ROI, which can cause false detections in situations such as vehicle occlusions. To handle this problem (especially the false detections caused by occlusions), two threshold values are determined based on the literature survey and the observations made in this thesis: 1) half of the largest projection value and 2) the largest projection value. First, the projection vector of the vertical edges is searched from the left and from the right until an entry greater than or equal to half of the largest projection value is found. The maximum change in the image coordinates of the vertical sides of each vehicle is stored during the execution of the tracking algorithm.
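The edge-based measurement step can be sketched as follows, assuming the Sobel responses have already been computed and stored as lists of rows (one list per image row of the ROI). `horizontal_projection` implements equation (3.27), `bottom_edge_row` searches it from the bottom up to the middle, and `vertical_sides` applies the two thresholds to the column projection of the vertical edges, including the fallback to the largest projection value described in the next paragraph. The function names and the handling of the stored maximum change as a plain `max_change` parameter are assumptions.

```python
def horizontal_projection(hedges):
    """Eq. (3.27): v_j = sum over the m columns of row j of the horizontal-edge map."""
    return [sum(row) for row in hedges]


def bottom_edge_row(hedges):
    """Search rows from the bottom of the ROI up to its middle; the row with
    the largest projection is taken as the bottom (shadow) edge."""
    v = horizontal_projection(hedges)
    n = len(v)
    lower_half = range(n - 1, n // 2 - 1, -1)  # bottom row first
    return max(lower_half, key=lambda j: v[j])


def vertical_sides(vproj, prev_sides, max_change):
    """Left/right sides from the column projection of the vertical edges.

    First threshold: half of the largest projection value. If either side
    would move by more than max_change pixels relative to prev_sides, the
    search is repeated with the largest projection value as the threshold.
    """
    peak = max(vproj)
    for thr in (peak / 2.0, peak):
        left = next(i for i, s in enumerate(vproj) if s >= thr)
        right = next(i for i in range(len(vproj) - 1, -1, -1) if vproj[i] >= thr)
        if (abs(left - prev_sides[0]) <= max_change
                and abs(right - prev_sides[1]) <= max_change):
            break  # first threshold accepted (or second one reached)
    return left, right


# Hypothetical 6 x 4 horizontal-edge map: row 1 mimics a top edge (ignored),
# row 4 mimics the shadow edge underneath the vehicle.
hedges = [[0, 0, 0, 0], [1, 1, 1, 1], [0, 0, 0, 0],
          [0, 1, 0, 0], [1, 1, 1, 0], [0, 0, 0, 0]]
print(bottom_edge_row(hedges))  # 4
```

When the first threshold is rejected, the second search is used whatever its outcome, which mirrors the rule stated in the text.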
If the positions of the entries found with the first threshold (half of the largest projection value) cause a change in the coordinates larger than the stored maximum, a new search is started using the second threshold (the largest projection value). Otherwise, the positions of the entries selected with the first threshold determine the positions of the left and right sides of the vehicle.

4. CONCLUSION AND RECOMMENDATIONS

This thesis proposed a multiple vehicle detection and tracking system that includes road area finding. Video captured by a single camera is used to detect and track multiple vehicles. The system provides information about the surrounding traffic via a camera mounted on the rear-view mirror of the host vehicle.

In the first step of the detection algorithm, the locations of potential vehicles in the image are hypothesized using the shadows underneath vehicles as a distinctive feature, scanning the defined road area bottom-up to avoid false detections of delineators. The road area is defined using the lane information obtained by the Hough transform. In the second step of the detection algorithm, the hypothesized locations of the potential vehicles are verified using the vertical edges as well as the shadows underneath the vehicles. During verification, a vehicle is considered present if one horizontal edge and two vertical edges can be found. The vehicle detection algorithm is summarized in Figure 4.1.

After the vehicles are extracted, the developed tracking algorithm effectively tracks them through successive frames of a long image sequence using a Kalman filter based approach. Finally, the 2D image velocity relative to the host vehicle is provided for each detected vehicle. The flow chart of the implemented algorithms can be seen in Figure 4.2.
The shadow detection step of the vehicle detection algorithm can be considered a coarse search, while the remaining detection steps are applied only to the small regions representing each potential vehicle once an ROI has been defined for each of them. The coarse search is carried out over the defined road area, while the ROIs make it possible to employ a refined search over small, localized regions. Hence, the coarse search takes a substantial amount of time: about 1 to 1.5 seconds, depending on the number of detected shadows.

However, if the dynamics of the moving objects are known, predictions can be made about the positions of the objects in the current image, and their positions can be estimated in successive frames of an image sequence. Compared with executing the vehicle detection algorithm, the Kalman filter based tracking algorithm implemented in this thesis can reduce the processing time to approximately 0.02 seconds.

Figure 4.1 : The summary of the detection algorithm.

The algorithms developed in this thesis were implemented in MATLAB. Since MATLAB does not provide sufficient performance for this kind of vision application under real-time constraints, compared with environments in which C/C++ programs can be executed, and since real-time computer vision is the main aim, the developed algorithms should be translated from MATLAB to C/C++.

The most serious drawback of using the shadow cue for vehicle detection is scenes with a low sun, which makes vehicles cast long shadows (see Figure 4.3). The detected shadows become wider when the sun is from the side, or are ill-positioned when the camera faces the sun.
[Figure 4.1 panels: detected shadows; combining shadow edges in successive rows (for 2 pixels); reducing the shadow edges to one bottom edge for each hypothesized vehicle; two vertical edges and one horizontal edge that represent each vehicle.]

As mentioned in the hypothesis generation step (see Section 2.4.2.1), shadow lengths change with weather conditions and even with the time of day. This can lead to an ROI considerably larger than the potential vehicle. Such an ROI can cause false detections, and thus verification errors in further analysis, because of the background or the features of other vehicles that may fall in the same ROI. Such false detections are especially likely when vehicles are in adjacent lanes. Surprisingly, this problem has not received much attention in the literature.

As a solution to this problem, the information from the detected lanes is utilized in this thesis. Since the width of a vehicle in an image is related to the width of the lane in which the vehicle is currently located, a reference value for the width of a potential vehicle can be calculated according to its lane. If the potential shadow edge is too wide compared with the reference value, it is eliminated (see Section 2.4.2.1).

Most of the previous vehicle detection and tracking methods used lane information or a determined free driving space, as does the work in this thesis. However, such information is difficult to acquire if there are no lane markings, at an intersection, etc.
Although searching the road area defined by the lane information for distinctive cues reduces the computational cost compared with searching the whole image, hypothesizing vehicles over the whole image and verifying them with appearance-based methods could yield an algorithm that is more robust to the problems associated with lane detection.

Figure 4.2 : The flow chart of the implemented algorithms. [Flow chart steps: START; divide the current image into two half images; apply the Hough transform to each half to detect lanes; scan each lane or group of lanes bottom-up to detect the shadows underneath the vehicles; if a shadow edge is not long enough to represent a vehicle, remove it; combine the shadow edges in successive rows (for 2 pixels); reduce the shadow edges on the same vehicle to one bottom edge for each potential vehicle; if a located vehicle does not have a shadow of the width expected at its location in the image, remove the shadow edge; define the ROIs for each potential vehicle using the final bottom edges; if one horizontal edge and two vertical edges are not found in the same ROI, remove the hypothesized vehicle; track each recognized vehicle for 10 frames; when 10 frames have passed, return to detection; STOP.]

Despite the problems related to lane information, it should be taken into account that the lane in which the observed vehicle is moving is an important parameter. When the shadows underneath vehicles are used as a detection cue, lane information is especially important because the shadow length changes with weather conditions and time of day, as mentioned in the previous paragraphs. Hence, extracting lane information may be required in vehicle detection and tracking applications.
If the dynamics of the lanes due to the moving camera are known, developing a lane tracking algorithm might be a reasonable solution to the problems related to lane information in applications where this information is needed.

Figure 4.3 : A low sun from the side makes vehicles cast long shadows.

The experimental results of the implemented algorithms can be seen in Appendix A. The developed algorithms were executed on daylight images. The detected vehicles are tracked through the frames of an image sequence, and each vehicle is represented by a different color. The aspect ratio of every vehicle is assumed to be 1, and the bounding boxes of the recognized vehicles are plotted based on this aspect ratio. As future work, the algorithms could also be modified to detect and track vehicles at night.

In Figure A.1, mid-range and distant vehicles are detected and tracked through the frames of an image sequence. These frames also show the performance of the developed algorithms in detecting and tracking vehicles that perform a lane change maneuver.

In Figure A.2, vehicles at close range are detected and tracked during an image sequence. In these frames, the host vehicle is approaching another vehicle from the rear. This image sequence illustrates a dangerous situation well. In such a case, estimating the Time-to-Collision would make it possible to warn the driver about the distance to the leading vehicle and prompt the driver to take action to avoid a possible collision.

In Figure A.3, a drawback of the shadow-based algorithm is illustrated. The shadow that an overpass casts on the road causes false detections. The area underneath the vehicle is still distinctly darker than any other area underneath the overpass. Thus, the shadow underneath the vehicle can be detected when the vehicle passes underneath the overpass.
However, as seen in the following frames, ROIs that contain no vehicle cannot be eliminated using vertical edges as the verification cue. Using a combination of different cues in the hypothesis verification step might prevent such false detections; in the illustrated frames, a combination of vertical edges and texture patterns might be considered as a solution.

As a future objective, the 2D vehicle velocities provided by the algorithms implemented in this thesis are intended to be used to estimate parameters of the (3D) real-world motion of the vehicles relative to the host vehicle, with the aim of preventing possibly dangerous situations. Providing drivers with information about the driving environment, by estimating the Time-to-Collision, makes it possible to warn them about the time it takes for other vehicles to reach them, so that rear-end collisions or collisions caused by sudden lane changes might be avoided.

REFERENCES

[1] Bertozzi, M., Broggi, A., Cellario, M. and Fascioli, A., 2002. Artificial Vision in Road Vehicles, Proceedings of the IEEE, vol. 90, pp. 1258-1271.
[2] Bishop, R., 2000. Intelligent Vehicle Applications Worldwide, IEEE Intelligent Systems, vol. 15, pp. 78-81.
[3] Heimes, F. and Nagel, H., 2002. Towards Active Machine-Vision-Based Driver Assistance for Urban Areas, International Journal of Computer Vision, vol. 50, pp. 5-34.
[4] Franke, U. et al., 2001. From Door to Door – Principles and Applications of Computer Vision for Driver Assistant Systems, chapter 6 in Intelligent Vehicle Technologies, eds. L. Vlacic, F. Harashima and M. Parent, Butterworth Heinemann, Oxford, UK, pp. 131-188.
[5] Graefe, V., 1993. Vision for Intelligent Road Vehicles, Proceedings of IEEE Symposium on Intelligent Vehicles, Tokyo, pp. 135-140.
[6] Dickmanns, E., 2002. The Development of Machine Vision for Road Vehicles in the Last Decade, Proceedings of IEEE Intelligent Vehicle Symposium, vol. 1, pp. 268-281.
[7] Bertozzi, M. and Broggi, A., 1998. GOLD: A Parallel Real-Time Stereo Vision System for Generic Obstacle and Lane Detection, IEEE Trans. Image Processing, vol. 7, pp. 62-81.
[8] Bertozzi, M., Broggi, A. and Fascioli, A., 1997. Obstacle and Lane Detection on Argo Autonomous Vehicle, IEEE Intelligent Transportation Systems, pp. 1010-1015.
[9] Tsugawa, S., 1994. Vision-Based Vehicles in Japan: Machine Vision Systems and Driving Control Systems, IEEE Trans. Industrial Electronics, vol. 41, pp. 398-405.
[10] Thorpe, C., Carlson, J.D., Duggins, D., Gowdy, J., MacLachlan, R., Mertz, C., Suppe, A. and Wan, C., 2003. Safe Robot Driving in Cluttered Environments, Proceedings of 11th International Symposium of Robotics Research, Siena, Italy.
[11] Thorpe, C. and Kanade, T., 1985. Vision and Navigation for Carnegie-Mellon Navlab, Proceedings of DARPA Image Understanding Workshop.
[12] Thorpe, C., Hebert, M., Kanade, T. and Shafer, S., 1988. Vision and Navigation for Carnegie-Mellon Navlab, IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 10, pp. 362-373.
[13] Kuehnle, A., 1991. Symmetry-Based Recognition of Vehicle Rears, Pattern Recognition Letters, vol. 12, pp. 249-258.
[14] Zielke, T., Brauckmann, M. and von Seelen, W., 1993. Intensity and Edge-Based Symmetry Detection with an Application to Car-Following, Computer Vision, Graphics, and Image Processing: Image Understanding, vol. 58, pp. 177-190.
[15] Bertozzi, M., Broggi, A. and Fascioli, A., 2000. Vision-Based Intelligent Vehicles: State of the Art and Perspectives, Robotics and Autonomous Systems, vol. 32, pp. 1-16.
[16] Crisman, J. and Thorpe, C., 1988. Color Vision for Road Following, Proceedings of SPIE Conf. Mobile Robots, Cambridge, Massachusetts, pp. 246-249.
[17] Buluswar, S.D. and Draper, B.A., 1998. Color Machine Vision for Autonomous Vehicles, International Journal of Engineering Applications of Artificial Intelligence, vol. 1, no. 2, pp. 245-256.
[18] Guo, D., Fraichard, T., Xie, M. and Laugier, C., 2000. Color Modelling by Spherical Influence Field in Sensing Driving Environments, Proceedings of IEEE Intelligent Vehicles Symposium, Dearborn, MI, USA, pp. 249-254.
[19] Dellaert, F. and Thorpe, C., 1997. Robust Car Tracking using Kalman Filtering and Bayesian Templates, Proceedings of SPIE Conf. Intelligent Transportation Systems, vol. 3207, pp. 72-83.
[20] Betke, M., Haritaoglu, E. and Davis, L.S., 2000. Real-time multiple vehicle detection and tracking from a moving vehicle, Machine Vision and Applications, vol. 12, no. 2, pp. 69-83.
[21] Kalinke, T., Tzomakas, C. and von Seelen, W., 1998. A Texture-Based Object Detection and Adaptive Model-Based Classification, Proceedings of IEEE Intelligent Vehicles Symposium, Stuttgart, Germany, pp. 143-148.
[22] Haralick, R., Shanmugam, K. and Dinstein, I., 1973. Texture Features for Image Classification, IEEE Trans. Systems, Man, and Cybernetics, vol. 3, pp. 610-621.
[23] Kim, S., Kim, K. et al., 2005. Front and Rear Vehicle Detection and Tracking in the Day and Night Times using Vision and Sonar Sensor Fusion, IEEE/RSJ International Conference on Intelligent Robots and Systems, Alberta, Canada, pp. 2173-2178.
[24] Franke, U. and Kutzbach, I., 1996. Fast Stereo based Object Detection for Stop&Go Traffic, Proceedings of IEEE Intelligent Vehicles Symposium, Tokyo, Japan, pp. 339-344.
[25] Zhao, G. and Yuta, S., 1993. Obstacle Detection by Vision System for An Autonomous Vehicle, Intelligent Vehicles, pp. 31-36.
[26] Giachetti, A., Campani, M. and Torre, V., 1998. The use of optical flow for road navigation, IEEE Trans. on Robotics and Automation, vol. 14, no. 1, pp. 34-48.
[27] Morimoto, C., DeMenthon, D., Davis, L.S., Chellappa, R. and Nelson, R.C., 1995. Detection of independently moving objects in passive video, Proceedings of IEEE Intelligent Vehicles Symposium, Detroit, Michigan, pp. 270-275.
[28] Sun, Z., Bebis, G. and Miller, R., 2006. On-Road Vehicle Detection: A Review, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, no. 5, pp. 694-711.
[29] Erçil, A., Abut, H., Erzin, E., Göçmençelebi, A., Göktan, A., Güvenç, L., Özatay, E. and Tandoğdu, H., 2005. The drivesafe project, Proceedings of the 1st AUTOCOM Workshop on Preventive and Active Safety for Road Vehicles, İstanbul.
[30] Daniş, S., Aytekin, B., Dinçmen, E., Sezer, V., Ararat, Ö., Öncü, S., Güvenç, B.A., Acarman, T., Altuğ, E. and Güvenç, L., 2008. Framework for Development of Driver Adaptive Warning and Assistance Systems That Will Be Triggered by A Driver Inattention Monitor, Otekon’08 4th Automotive Technologies Congress, Bursa.
[31] Lundagards, M., 2008. Vehicle Detection in Monochrome Images, M.Sc. Thesis. Linköping University.
[32] Gonzalez, R.C., Woods, R.E. and Eddins, S.L., 2004. Digital Image Processing Using Matlab, Pearson Prentice Hall Press.
[33] Nixon, M. and Aguado, A., 2002. Feature Extraction and Image Processing, Butterworth Heinemann, Oxford.
[34] Tzomakas, C. and von Seelen, W., 1998. Vehicle Detection in Traffic Scenes Using Shadows, Technical Report 98-06, Institut für Neuroinformatik, Ruhr-Universität Bochum, Germany.
[35] Liu, W., Wen, X., Duan, B., Yuan, H. and Wang, N., 2007. Rear Vehicle Detection and Tracking for Lane Change Assist, Proceedings of IEEE Intelligent Vehicles Symposium, İstanbul, pp. 252-257.
[36] Yılmaz, A., Javed, O. and Shah, M., 2006. Object Tracking: A Survey, ACM Computing Surveys, vol. 38, no. 4.
[37] Srinivasa, N., 2002. Vision-based Vehicle Detection and Tracking Method for Forward Collision Warning in Automobiles, Proceedings of IEEE Intelligent Vehicle Symposium, vol. 2, pp. 626-631.
[38] Broggi, A., Cerri, P. and Ghidoni, S., 2005. A Correlation-Based Approach to Recognition and Localization of the Preceding Vehicle in Highway Environments, International Conference on Image Analysis and Processing, vol. 3617, pp. 1166-1173.
[39] Dellaert, F. and Thorpe, C., 1997. Robust Car Tracking Using Kalman Filtering and Bayesian Templates, Proceedings of SPIE, Intelligent Transportation Systems, vol. 3207, pp. 72-83.
[40] Liu, X., 2000. Development of A Vision-Based Object Detection and Recognition System for Intelligent Vehicle, Ph.D. Thesis. University of Wisconsin-Madison.
[41] van Leeuwen, M.B., 2002. Motion Estimation and Interpretation for In-Car Systems, Ph.D. Thesis. University of Amsterdam.
[42] Welch, G. and Bishop, G., 2001. An Introduction to the Kalman Filter, Lecture Notes. University of North Carolina, Department of Computer Science.
[43] Cuevas, E., Zaldivar, D. and Rojas, R., 2005. Kalman filter for vision tracking, Technical Report B 05-12, Freie Universität Berlin, Fachbereich Mathematik und Informatik.
[44] Schwartz, M. and Shaw, L., 1975. Signal Processing: Discrete Spectral Analysis, Detection, and Estimation, McGraw-Hill International Book Company.

APPENDICES

APPENDIX A : Experimental results of the implemented algorithms.

APPENDIX A

Figure A.1 : Detection and tracking of mid-range and distant vehicles. [Frames 11169, 11223]

Figure A.1 (contd.) : Detection and tracking of mid-range and distant vehicles. [Frames 11306, 11260, 11325]

Figure A.2 : Detection and tracking of the vehicles at close range. [Frames 35251, 35406, 35467]

Figure A.2 (contd.) : Detection and tracking of the vehicles at close range. [Frames 35498, 35620, 35650]

Figure A.3 : Detection and tracking of the vehicle in a situation where an overpass casts shadows on the road. [Frames 17, 203, 225]

Figure A.3 (contd.) : Detection and tracking of the vehicle in a situation where an overpass casts shadows on the road.
(Frames 247, 253, 265)

CURRICULUM VITAE
Candidate’s full name: Burcu AYTEKİN
Place and date of birth: İstanbul, 23.07.1982
Permanent Address: Şehitler Caddesi, Güldeniz Sitesi, No: 73/2, Tuzla/İstanbul
Universities and Colleges attended: Kadir Has High School and Kocaeli University
Publications:
- Daniş, S., Aytekin, B., Dinçmen, E., Sezer, V., Ararat, Ö., Öncü, S., Güvenç, B.A., Acarman, T., Altuğ, E. and Güvenç, L., 2008. Framework for Development of Driver Adaptive Warning and Assistance Systems That Will Be Triggered by a Driver Inattention Monitor, Otekon’08 4th Automotive Technologies Congress, Bursa.
- Aytekin, B. and Altuğ, E., 2009. Bilgisayarlı Görü Yöntemi ile Araç Belirleme ve Takibi [Vehicle Detection and Tracking Using Computer Vision], submitted to the 17th IEEE Signal Processing and Communications Applications Conference (IEEE 17. Sinyal İşleme ve İletişim Uygulamaları Kurultayı).