Automatic real-time road marking recognition using a feature driven approach
Alireza Kheyrollahi · Toby P. Breckon
Abstract Automatic road marking recognition is a key problem within the domain of automotive vision that lends support to both autonomous urban driving and augmented driver assistance such as situationally aware navigation systems. Here we propose an approach to this problem based on the extraction of robust road marking features via a novel pipeline of inverse perspective mapping and multi-level binarisation. A trained classifier combined with additional rule-based post-processing then facilitates the real-time delivery of road marking information as required. The approach is shown to operate successfully over a range of lighting, weather and road surface conditions.
Keywords Computer vision · Mobile robotics · Road marking recognition · Vanishing point detection · Intelligent vehicles
1 Introduction
Autonomous driving and road intelligence have been the focus of attention for many computer vision researchers over the last 20 years [1]. Although significant achievements have been made in developing vehicles that can perform some form of autonomous guided driving, progress has been slow because of the problems of speed, safety and the real-time complexity of the on-road situation. A human driver continuously gathers a wealth of visual information from the road and its surroundings, and the human brain is remarkably efficient at analysing this information and responding quickly with an appropriate course of action. For a computer vision system to display a similar ability, it must encompass various detection capabilities, each of which has been the subject of significant research activity [2–4].
Whilst work on lane detection and tracking is significant [5,22], the literature on road marking recognition is limited with no reported work for real-time on-road text recognition. While road marking (including arrows) and text recognition is a relatively simple task for human drivers, its automatic detection would be very useful—and perhaps essential in some cases—for an autonomous vehicle or as an aid to driver situational awareness in an increasingly complex road environment.
Here we propose a multi-step processing pipeline for robust recognition of road markings and text. First, image frames from an on-board camera are captured and pre-processed to remove the perspective effect via an inverse perspective mapping (IPM) driven by automatic vanishing point (VP) detection. After removing the effects of perspective, a multi-level thresholding approach is applied to extract bright on-road objects that contrast against the road surface. These objects are then simplified to a contour representation that is passed to an artificial neural network (ANN) classifier for recognition. The results of this per-symbol (i.e. glyph level) classification are post-processed for either driver display or potential use by an autonomous driving decision engine. This approach is shown to operate in real-time under a variety of driving, lighting and road conditions.
2 Previous work
Prior work in this area is limited; we briefly review the main seminal contributions [6,7,29,30].
Charbonnier et al. [6] report a marking recognition process that relies on finding stretches of lines using a horizontal scan line, then applying the Radon transform to find the two most probable lines marking the start and end of a rectilinear marking. Recognition of an arrow is based on comparing projections of the left and right halves of the identified rectilinear marking. In this work no perspective correction is performed prior to recognition, and the resulting performance is not real-time.
Rebut et al. [7] describe an extensive method for recognising four classes of arrows and linear markings. They initially use the Hough transform for linear marking detection and an arrow pointer template to locate arrow symbols. Once marking objects are located on the road surface, a Fourier descriptor is used to extract key features, from which a k-Nearest Neighbour (k-NN) classifier performs the final recognition. Training uses a database of sample arrow marking images, with further samples created by adding noise to the limited initial data set. A Fourier feature descriptor of degree 34 resulted in an overall global detection error of 6% but a significant false alarm rate of 30%. Again, real-time performance was not achieved, as processing was carried out off-line in a post-analysis application setting.
More recent work on this topic [29] follows a shape-based methodology similar to that proposed here, but is limited to on-road arrow recognition and uses a limited feature set driving a classifier poorly suited to the complex alpha-numeric character sequences, under degraded quality conditions, considered here. Other recent work [30] proposes an Eigenspace recognition approach, but it relies on good automated road glyph extraction and sample alignment (as per seminal Eigenspace recognition approaches [31]); unlike the methodology proposed here, [30] does not address either of these issues under in-situ on-vehicle operation in varying environmental conditions. Its detection rates (for a small set of isolated on-road glyphs only) are comparable to those achieved here, but neither [29] nor [30] considers complex sequence recognition in the presence of glyph extraction and road position related noise.
By contrast, our method uses a range of features, including invariant spatial moments, histogram projections and normalised angular measurements, that are input to a trained neural network for real-time symbol recognition. This approach offers a significantly lower false alarm rate, within the bounds of real-time performance, over a much larger set of symbol classes (6 on-road arrow types and 17 alpha-numeric characters) than prior work in the field [6,7,29,30]. In addition, it facilitates the recognition of complex multi-glyph sequences under varying road, lighting and marking quality conditions.
3 Perspective image correction
As a pre-processing stage to our feature extraction approaches we first perform perspective correction on the images obtained from the vehicle mounted, forward facing camera (e.g. Fig. 2). This is performed via a one-time calibration process of vanishing point (VP) detection and subsequent inverse perspective mapping (IPM).
3.1 Vanishing point detection
A vanishing point (VP) is a point in a perspective image to which parallel lines converge. Conventional 2D images are essentially a transformation of the 3D world onto a 2D plane (the image plane). Following a classical pinhole camera model [23], parallel lines (e.g. road edges) within the 3D scene appear to meet at a point within the 2D image that depends on the camera angle and lens characteristics [8,23]. An example can be seen in the road edges of Fig. 1.
Fig. 1 Temporal filtering of Canny edge detector output. Upper standard Canny edge detection output. Lower temporal filtering result of Canny edge image sequence.
In general, images that illustrate such a perspective (i.e. perspective images) can have up to three such vanishing points, located within the image boundary, outside the boundary (external) or at infinity (i.e. far distance within the image, denoted as the infinite vanishing point—e.g. Fig. 2).
Fig. 2 IPM Transform applied to example road image
The vanishing point closest to the centre of the image, the dominant vanishing point, is commonly used for perspective correction in road scene images via camera calibration. The first stage in this process is the detection of the VPs within the image.
Classical VP detection is based on mapping edge lines detected within the image onto a unit Gaussian sphere, as first described by Barnard [8]. Each line creates a circum-circle on the sphere, with the maximal accumulated intersecting region of these circum-circles defining the vanishing point locations. This approach was further developed by various authors [9–11], and its capability for boundary, external and infinite VP detection has made it popular.
However, recent studies show that such Gaussian sphere techniques, although simplifying the unbounded problem to a bounded search space, can produce spurious and false results, especially in the presence of noise or texture [12,13]. An alternative, less prevalent approach is the use of a polar space accumulator, as originally described by Nakatani [14]. As each point can be represented in polar space by a sinusoid, this refinement uses error minimisation over the sinusoids to find their convergence [15]. Clustering of lines in Hough space has also been proposed as an alternative method, whereby regular Hough line detection is followed by clustering of line segments to candidate vanishing points [16].
In this work we have used a variant on this latter approach [11,12] which also uses the classical Hough transform in the initial stages of VP detection. The output of Canny edge detection [32], performed on a down-scaled (320 × 240) and Gaussian smoothed version of the image, is fed into a temporal filter defined as follows:

$$T_t = \sum_{i=t-n+1}^{t} F_i \qquad (1)$$

From Eq. 1, this temporal filter $T_t$ at time $t$ operates over a number of images $i$ in a given calibration sequence. $F_i$ is the processed image frame at time $i$ and $n$ is the number of cumulative frames used to generate an accumulated edge output, $T_t$. This output, $T_t$, is then normalised before further processing for Hough-based line detection. As shown in Fig. 1, this temporal filtering attenuates edge fluctuations associated with noise (trees, shadows, etc., shown in Fig. 1 upper) in any given frame and enhances edges that are constant over $n$ frames, such as the road markings and boundaries we desire for VP detection (Fig. 1 lower). A short sequence of $n$ frames, readily obtainable at 25 Hz from a modern video source over a short distance of roadway, gives a suitably stable scene for such a multi-frame temporal approach to be applicable.
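As an illustration, a minimal sketch of this temporal filtering stage in Python with OpenCV is given below. The Canny thresholds, smoothing kernel size and default frame count n are assumed values, not those of the original implementation.

```python
import cv2
import numpy as np

def temporal_edge_filter(frames, n=25):
    """Accumulate Canny edge maps over the last n frames (Eq. 1) so that
    transient edges (trees, shadows) are attenuated and persistent edges
    (road markings, boundaries) are reinforced."""
    acc = None
    for frame in frames[-n:]:
        small = cv2.resize(frame, (320, 240))        # down-scale as in the paper
        grey = cv2.cvtColor(small, cv2.COLOR_BGR2GRAY)
        grey = cv2.GaussianBlur(grey, (5, 5), 0)     # Gaussian smoothing
        edges = cv2.Canny(grey, 50, 150)             # thresholds are assumed values
        acc = edges.astype(np.float32) if acc is None else acc + edges
    # Normalise the accumulated output before Hough-based line detection.
    return cv2.normalize(acc, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
```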
This output $T_t$ is then used to find linear edge features, for VP detection, using a classic Hough transform method [11] within each frame $t$ based on the previous $n$ frames. The $l$ maximally responding lines are extracted from each frame based on their Hough space accumulator value, after the exclusion of lines falling within an orientation threshold $l_t$ of the vertical or horizontal [11,12].
From this set of $l$ lines (here using $l = 60$), we then find the intersection points of all possible line pairings. These points are then clustered using a k-NN clustering approach in 2D image space (here using $k = 3$ for the maximal presence of 3 VPs in the image). Each resulting cluster is then given a suitability score as follows:

$$score(U) = \sum_{(x_i, y_i) \in U} \left( |x_i - x_c| + |y_i - y_c| \right) \qquad (2)$$

The score for a given cluster $U$ is calculated as the sum of Manhattan distances of all intersection points, $(x_i, y_i)$, within the cluster from the vanishing point of the previous frame, $(x_c, y_c)$, calculated using the same process with $T_t$ for frame $t - 1$ (Eq. 2). The Manhattan distance was empirically found to offer more stable results at reduced computational cost than the standard Euclidean approach. Where no previous VP is available, an arbitrary point (such as (0, 0)) is used. A simple averaging of the points from the winning cluster of this scoring method determines the final candidate vanishing point for the frame. This resulting VP is then further averaged with the VP of the previous frame $t - 1$.
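The line intersection, clustering and scoring stages might be sketched as follows. Note that OpenCV's k-means is used here as a stand-in for the paper's k-NN clustering, and the Hough vote threshold and orientation margin are assumed values.

```python
import cv2
import numpy as np

def estimate_vp(edge_img, prev_vp=(0.0, 0.0), l=60, angle_margin=0.1):
    """Estimate the dominant VP from a temporally filtered edge image by
    clustering line intersections and scoring clusters against the previous
    frame's VP via summed Manhattan distance (Eq. 2)."""
    lines = cv2.HoughLines(edge_img, 1, np.pi / 180, 100)
    if lines is None:
        return prev_vp
    # Keep the l strongest lines, excluding near-vertical/horizontal ones
    # (HoughLines returns lines sorted by accumulator votes).
    kept = []
    for rho, theta in lines[:, 0]:
        r = theta % (np.pi / 2)
        if min(r, np.pi / 2 - r) > angle_margin:
            kept.append((rho, theta))
        if len(kept) == l:
            break
    # Intersect all possible line pairings.
    pts = []
    for i in range(len(kept)):
        for j in range(i + 1, len(kept)):
            (r1, t1), (r2, t2) = kept[i], kept[j]
            A = np.array([[np.cos(t1), np.sin(t1)],
                          [np.cos(t2), np.sin(t2)]])
            if abs(np.linalg.det(A)) > 1e-6:         # skip near-parallel pairs
                pts.append(np.linalg.solve(A, np.array([r1, r2])))
    if len(pts) < 3:
        return prev_vp
    pts = np.float32(pts)
    # Cluster intersections into 3 candidate VPs, then score each cluster by
    # its summed Manhattan distance to the previous frame's VP.
    criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 20, 1.0)
    _, labels, _ = cv2.kmeans(pts, 3, None, criteria, 5, cv2.KMEANS_PP_CENTERS)
    best, best_score = None, np.inf
    for c in range(3):
        cluster = pts[labels.ravel() == c]
        if len(cluster) == 0:
            continue
        score = np.sum(np.abs(cluster - np.float32(prev_vp)))
        if score < best_score:
            best_score = score
            best = cluster.mean(axis=0)              # average the winning cluster
    # Further average with the previous frame's VP for stability.
    return tuple((np.float32(prev_vp) + best) / 2.0)
```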
This overall VP detection process converges to the correct vanishing point in approximately 100 frames (i.e. 4 s of video @ 25 fps) and acts as a one-time, computationally expensive calibration process for a given vehicle camera installation. The detected VP is then used to drive the inverse perspective mapping (IPM) of the on-vehicle camera image.
3.2 Inverse perspective mapping
As previously mentioned, the perspective effect within the 2D images introduces artefacts which can interfere with successful feature extraction and recognition. This issue is particularly prevalent for ground plane objects that appear at a ~45–90° angle to the image plane of the camera. This is illustrated in Fig. 2 (left) with regard to the speed limit symbol (40) on the roadway in front of the vehicle camera.
This effect of perspective can be largely, although not entirely, overcome by applying an inverse perspective transform using a technique known as inverse perspective mapping (IPM) [24]. The application of IPM requires six parameters [5]: (1) the focal length of the camera α, (2) the height of the camera above the ground plane h, (3) the distance of the camera from the middle of the road d, (4) the vertical position of the camera along the road axis l, (5) the pitch angle θ and (6) the yaw angle γ.
Whilst the first four parameters are obtainable empirically from the vehicle camera installation, the final pitch and yaw angle parameters are retrieved via the vanishing point (xvp, yvp) identified in the prior detection exercise via Eq. 3 [5].
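Eq. 3 itself is not reproduced here. As a hedged illustration, under a simple pinhole model with focal length f (in pixels) and principal point (cx, cy), pitch and yaw can be recovered from the vanishing point as sketched below; this is a common pinhole-geometry relation and not necessarily the exact form of Eq. 3 in [5].

```python
import numpy as np

def pitch_yaw_from_vp(x_vp, y_vp, f, cx, cy):
    """Recover pitch (theta) and yaw (gamma) from the dominant VP under a
    simple pinhole model; f is the focal length in pixels, (cx, cy) the
    principal point. Sign conventions depend on the chosen camera frame."""
    yaw = np.arctan2(x_vp - cx, f)     # horizontal offset of the VP
    pitch = np.arctan2(cy - y_vp, f)   # vertical offset of the VP
    return pitch, yaw
```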
The IPM transform [24] then maps a point (u, v) in the vehicle camera image (of dimension M × N) to a point (x, y) on the road plane (Eq. 4). This mapping of the image pixels to the road plane (flattened, as z is always zero; Eq. 4) can then be extracted as an image of the ground plane with perspective distortion effects removed [24].
The required mapping (Eq. 4), although computationally expensive, only requires calculation once for a given set of calibrated vanishing points. It can then be stored for use, as a real-time mapping, for all subsequent image frames from the on-vehicle camera.
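As Eq. 4 is not reproduced here, the sketch below illustrates the same compute-once, apply-per-frame idea using a planar homography estimated from four road-plane correspondences, a common practical stand-in for the analytic IPM of [24]. All coordinates are illustrative assumptions for a 640 × 480 camera image.

```python
import cv2
import numpy as np

# Hypothetical one-time calibration: four image points lying on the road
# plane and their desired positions in the bird's-eye view.
src = np.float32([[300, 400], [340, 400], [120, 480], [520, 480]])
dst = np.float32([[280, 0], [360, 0], [280, 480], [360, 480]])

H = cv2.getPerspectiveTransform(src, dst)   # computed once per installation

def ipm(frame):
    """Apply the stored road-plane mapping to each incoming frame in
    real-time, mirroring the reuse of the precomputed Eq. 4 mapping."""
    return cv2.warpPerspective(frame, H, (640, 480))
```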
An example of this inverse mapping transform is shown in Fig. 2, where we see an image frame from an on-vehicle camera (Fig. 2, left) transformed to an inverse perspective mapping image of the roadway ground plane (Fig. 2, right) based on the detection of vanishing points as outlined previously. As is apparent in Fig. 2, the on-road markings in the transformed image have had the effects of perspective apparent in the original partially removed. This mapped image is a significantly more viable input for constructing a robust method of road-marking extraction.
4 Road-marking extraction
Road-marking extraction involves the binarisation (i.e. threshold based extraction) of the road scene resulting from the application of the IPM transform to facilitate road surface glyph isolation using a contour-based approach.
Achieving robust thresholding in the presence of extreme light and shadow variations is a classical challenge in image processing that has plagued earlier work [5,6]. Numerous noise sources (shadows; sun, headlight and streetlight reflections; road surface debris and decay) interfere with the process. Broggi [5] proposed an image enhancement method using custom localised adaptive thresholding, which produced successful results albeit at a significant, non-real-time computational cost.
In this work we propose a related adaptive global thresholding approach driven by global histogram information on a per-image basis within any given road image sequence.
Fig. 3 Choosing four thresholds with p = 0.02 and q = 0.17 and a 256 bin cumulative histogram
4.1 Adaptive image thresholding
In general, an N-value adaptive global threshold approach is employed to create N separate binary images for subsequent shape isolation. The normalised cumulative histogram [17] of the resulting IPM transformed image (in grey scale) is used to establish these thresholds. Upper and lower borders, p and q, for the range of interest in this histogram are established as percentile offsets from the normalised cumulative histogram maximum value (1.0). This range within the histogram is then equally sub-divided into N − 1 subranges via the creation of N thresholds. Here, for road scenes, we use N = 4 and create four thresholds from three subranges. For example, if p = 0.02 and q = 0.17 (2nd and 17th percentile) then we choose cumulative threshold values k = {0.02, 0.07, 0.12, 0.17} (for N = 4) and find each corresponding image pixel value threshold as the lowest index i cumulative histogram bin Hi with a value greater than or equal to 1.0 − k.
Fig. 4 Adaptive thresholding under extreme lighting variations
As illustrated in Fig. 3 for p = 0.02 and q = 0.17 (2nd and 17th percentile), the corresponding upper and lower thresholds fall at 254 and 247, with the two intermediate thresholds (7th and 12th percentile) falling at the equidistant index positions of 252 and 250. Overall, this algorithmic approach isolates N boundaries based on the cumulative distribution of the pixel values within the IPM transformed image, from which N binary images, corresponding to differing shape isolation characteristics, can thus be extracted.
A remaining problem is that the intensity distribution of the IPM image varies substantially depending on the presence/absence and size of any road markings in the image frame. Using fixed values for p and q thus leads to spurious false-positive glyph detections due to poor threshold selection under certain conditions. This is dealt with by reference to the overall mean intensity of the grey scale image frame, avg(Image), and the scaling of p and q on a per-frame basis as i → C(i/avg(Image)), where the constant C is set empirically to C = 128 for 8-bit grey scale images and i = {p, q}.
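A minimal sketch of this multi-level thresholding, including the per-frame scaling of p and q, might read as follows; the function and parameter names are assumptions.

```python
import cv2
import numpy as np

def multilevel_thresholds(ipm_grey, p=0.02, q=0.17, n_levels=4, C=128):
    """Create N binary images from an 8-bit grey scale IPM image using
    thresholds chosen from its normalised cumulative histogram; p and q
    are first rescaled by the frame's mean intensity as described above."""
    scale = C / max(float(ipm_grey.mean()), 1e-6)    # per-frame scaling of p, q
    p, q = p * scale, q * scale
    hist = cv2.calcHist([ipm_grey], [0], None, [256], [0, 256]).ravel()
    cum = np.cumsum(hist) / hist.sum()               # normalised cumulative histogram
    ks = np.linspace(p, q, n_levels)                 # e.g. {0.02, 0.07, 0.12, 0.17}
    # Lowest bin index whose cumulative value reaches 1.0 - k.
    thresholds = [int(np.searchsorted(cum, 1.0 - k)) for k in ks]
    binaries = [cv2.threshold(ipm_grey, t, 255, cv2.THRESH_BINARY)[1]
                for t in thresholds]
    return thresholds, binaries
```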
The most challenging glyph extraction conditions are generally found in bright sunlight. In Fig. 4 (right) we see an example of thresholding using the proposed approach under such conditions. Of the four binary images produced (Fig. 4, left), we see that the arrow glyph easily becomes disconnected in all but one (Fig. 4, top leftmost). The use of a multi-level adaptive threshold approach thus facilitates robust connected glyph extraction even in the presence of extreme lighting variations and noise (e.g. the shadows of Fig. 4, right).
Overall, the approach performs well as a robust, real-time methodology for glyph extraction from the road surface that operates successfully under both daylight and night driving over a wide range of illumination conditions.
4.2 Shape isolation
From these binary images, a set of connected image contours is extracted using a backtracking approach [17] prior to simplification into a closed polygon shape representation using the Douglas–Peucker derivative of [18]. This is performed over all four versions of the road surface IPM transformed image resulting from the earlier multi-level adaptive thresholding.
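A sketch of this shape isolation step using OpenCV (≥ 4.x) is shown below; the approximation tolerance eps is an assumed value, and OpenCV's approxPolyDP is its implementation of Douglas–Peucker simplification.

```python
import cv2

def extract_glyph_polygons(binary_img, eps=3.0):
    """Extract connected contours from one binarised IPM image and simplify
    each into a closed polygon via Douglas-Peucker approximation."""
    contours, _ = cv2.findContours(binary_img, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    return [cv2.approxPolyDP(c, eps, True) for c in contours]
```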
Figure 5 shows some examples of the closed contour configurations extracted from these images for differing types of road-marking glyph. On the left (in Fig. 5) we see the IPM input image, whilst on the right we show the simplified polygon shapes extracted from the four levels of binary thresholding applied to the IPM input. Notably, the complexity of the extracted contours varies significantly.
Fig. 5 Examples of shape isolation in the post-adaptive threshold images
4.3 Shape post-processing
In order to simplify the later recognition task, and also as an initial method of false-positive glyph filtering we perform two additional stages of shape post-processing: complexity rejection and orientation normalisation.
Complexity rejection considers the complexity of the resulting simplified polygon representation extracted from the image contours [18], with a view to excluding overly simple or complex shape contours from further processing. At present this is performed using explicit minimal and maximal bounds on the number of segments each polygon contains. Currently, polygons with fewer than three segments or more than 35 segments are excluded (bounds selected empirically). Shapes below this complexity are commonly found to be the rectilinear lane markings on the road surface (e.g. Fig. 2, left/right) after the contour smoothing applied by [18]; those above it are similarly rejected as noise.
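Complexity rejection itself reduces to a simple filter on polygon vertex counts, sketched below using the bounds quoted above.

```python
def complexity_filter(polygons, min_segments=3, max_segments=35):
    """Retain only polygons within the empirically chosen complexity bounds;
    for a closed polygon the vertex count equals the segment count."""
    return [p for p in polygons if min_segments <= len(p) <= max_segments]
```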