Quantitative Evaluation of Overlaying Discrepancies in Mobile Augmented Reality Applications for AEC/FM

Augmented Reality (AR) is a trending technology that provides a live view of the real, physical environment augmented by virtual elements, enhancing the scene with digital information (sound, video, graphics, text or geo-location). Its application to architecture, engineering and construction, and facility management (AEC/FM) is straightforward and can be very useful to improve on-site work at different stages of a project. However, one of the most important limitations of Mobile Augmented Reality (MAR) is the lack of accuracy when the screen overlays the virtual models on the real images captured by the camera. The main sources of error are related to tracking (positioning and orientation of the mobile device) and to image capture and processing (projection and distortion issues).


Introduction
This approximation is valid for achieving the aim of the present work, but it is rough and does not take into consideration that the virtual flag is not really placed on the plane of the bullseye, but at some point of the sightline crossing that plane. As a result, the virtual flag could represent the projection of any of the points of that sightline, not only the one intersecting the bullseye.

3.1. CEsARe, the MAR application

This paper shows the benefits of its contributions applied to a new MAR application, CEsARe (Construction Engineering software for Augmented Reality). This is a software tool specifically designed to represent in AR, by means of a portable electronic device, the 3D model of a project on the construction site or in any other environment. As a result, the virtual model (and its attached attributes) can be seen superimposed on the real scenario of the construction site captured by the camera (Fig. 2). The application permits interaction with the virtual objects on the screen, which represent existing elements of the environment, already built elements of the project or future elements still to be erected. Therefore, it is possible to obtain a scene implemented with all the information available to the user.

The virtual models and the additional information (images, texts, web pages, documents, etc.) have to be stored on a web server that grants access to the authorized users of the application. Information and virtual data can be downloaded from the server in real time, in such a way that they can be previously added to the repository by another designer at the studio and, from then on, incorporated into the mobile device via 3G/4G or Wi-Fi. This quick-response function allows the user to ask the technical office for changes that can be visualized in the application almost immediately (see the sketch at the end of this section).

The mobile device can receive continuous information about its position via GPS, either directly through the internal GPS receiver (uncorrected location data) or indirectly via Bluetooth from an external GPS collector, providing higher accuracy (corrected location data). This auxiliary GPS device requires a data connection, which can be provided by the mobile device using tethering over Wi-Fi or directly by means of a 4G connection.

Therefore, to obtain an accurate superposition of the virtual models over the reality captured by the mobile camera, four main challenges had to be met: i) generation of the virtual scene in an AR platform after modelling it by means of CAD or BIM, ii) exact geo-location of the device, iii) correct orientation of the scene, and iv) precise overlaying or superposition of the virtual models over the real image through the camera lens.
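As an illustration of this quick-response cycle, the following is a minimal sketch of how a mobile client could request updated model data from the project web server using Unity's networking API. The endpoint URL and the payload handling are hypothetical placeholders, since CEsARe's source code is not reproduced here.

```csharp
using System.Collections;
using UnityEngine;
using UnityEngine.Networking;

// Minimal sketch of the quick-response download cycle: fetching updated model
// data from the project web server over Wi-Fi or 3G/4G. The endpoint URL and
// the payload handling are hypothetical placeholders.
public class ModelDownloader : MonoBehaviour
{
    // Hypothetical repository endpoint; CEsARe's real server layout is not published.
    const string ServerUrl = "https://example.org/cesare/projects/42/models";

    IEnumerator Start()
    {
        using (UnityWebRequest request = UnityWebRequest.Get(ServerUrl))
        {
            yield return request.SendWebRequest();

            if (request.result != UnityWebRequest.Result.Success)
            {
                Debug.LogWarning("Model download failed: " + request.error);
                yield break;
            }

            // Here the payload (e.g. an OBJ model or a list of attributes)
            // would be parsed and added to the AR scene.
            Debug.Log("Received " + request.downloadHandler.data.Length + " bytes");
        }
    }
}
```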

3.3. Generation of the AR scene

The need to create a multi-platform application led, among other factors, to the choice of Unity 3D (Unity Technologies 2015) as the AR engine for the development. Unity 3D allows code written in C# or JavaScript to be deployed to the full range of mobile, VR, desktop, web, console and TV platforms. Nevertheless, all the tests and trials for this work were performed on the Android operating system with a Samsung Galaxy Tab S2 9.7" tablet.

In order to produce the full virtual scene for the implementation of each project, it is necessary to generate and locate the 3D models beforehand, which can then be imported into the scene in different formats. For this project, Autodesk Civil 3D was used to create the BIM models of the linear infrastructures. Then, after a post-processing phase, they were segregated according to certain criteria, e.g. construction phase, material, type of infrastructure, etc. Subsequently, these virtual objects were converted to OBJ, because this format permits importing them into the engine platform before compilation or, after compilation, at run-time in the actual MAR application.
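Part of OBJ's convenience for run-time import is that it is a plain-text format. The following is a minimal loader sketch for triangulated OBJ files containing only vertex and face records; it is an assumption made for illustration, not CEsARe's actual importer.

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using UnityEngine;

// Minimal run-time OBJ import sketch, assuming a triangulated mesh with only
// "v" (vertex) and "f" (face) records.
public static class SimpleObjLoader
{
    public static Mesh Load(string path)
    {
        var vertices = new List<Vector3>();
        var triangles = new List<int>();

        foreach (string line in File.ReadLines(path))
        {
            string[] t = line.Split(new[] { ' ' }, System.StringSplitOptions.RemoveEmptyEntries);
            if (t.Length == 0) continue;

            if (t[0] == "v") // vertex position: "v x y z"
            {
                vertices.Add(new Vector3(
                    float.Parse(t[1], CultureInfo.InvariantCulture),
                    float.Parse(t[2], CultureInfo.InvariantCulture),
                    float.Parse(t[3], CultureInfo.InvariantCulture)));
            }
            else if (t[0] == "f") // triangular face: "f a b c" (1-indexed, possibly "a/b/c")
            {
                for (int i = 1; i <= 3; i++)
                    triangles.Add(int.Parse(t[i].Split('/')[0]) - 1);
            }
        }

        var mesh = new Mesh { indexFormat = UnityEngine.Rendering.IndexFormat.UInt32 };
        mesh.SetVertices(vertices);
        mesh.SetTriangles(triangles, 0);
        mesh.RecalculateNormals();
        return mesh;
    }
}
```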

3.4. Geo-location: accuracy test and assessment

The combination of position and orientation is referred to as the pose of an object or user.

Once the mobile device is correctly geo-located in place, it is necessary to know where it is aiming at. Therefore, one of the main challenges of MAR is the correct orientation of the mobile device in the real scene with regard to the six degrees of freedom, i.e. the position X, Y, Z and the rotations around these axes: pitch, yaw (or heading) and roll, respectively (Fig. 6). In this regard, two main kinds of inaccuracies can arise: i) the orientation is not perfectly aligned with the magnetic or true north, because magnetometers suffer from noise, jitter and temporary magnetic influences (Schall, Mulloni, and Reitmayr 2010), and ii) there can be a drift of the 3D models relative to the background camera image (Schall, Zollmann, and Reitmayr 2013).
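As an illustration, the following minimal sketch shows how both halves of the pose could be read in Unity on the device: the position from the location service (the uncorrected internal GPS data mentioned in section 3.1) and the orientation from the attitude sensor. This is a sketch under those assumptions, not the paper's code.

```csharp
using System.Collections;
using UnityEngine;

// Minimal sketch of reading the two halves of the pose on the device:
// position from the location service (uncorrected internal GPS) and
// orientation from the gyroscope/attitude sensor.
public class PoseReader : MonoBehaviour
{
    IEnumerator Start()
    {
        Input.gyro.enabled = true;   // enable attitude (orientation) updates
        Input.location.Start();      // start the uncorrected GPS service

        // Wait until the location service leaves the initializing state.
        while (Input.location.status == LocationServiceStatus.Initializing)
            yield return new WaitForSeconds(1f);
    }

    void Update()
    {
        if (Input.location.status != LocationServiceStatus.Running) return;

        LocationInfo fix = Input.location.lastData;  // latitude, longitude, altitude
        Quaternion attitude = Input.gyro.attitude;   // device orientation

        // The Euler angles of the attitude correspond to the three rotations
        // (pitch, yaw/heading and roll) that orient the virtual camera.
        Debug.Log($"Pose: {fix.latitude}, {fix.longitude}, {fix.altitude} / {attitude.eulerAngles}");
    }
}
```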

Orientation: evaluation of magnetometer and gyroscope
Regarding the first issue, the magnetometer of a high-end mobile device may have a precision of no better than ±2 degrees, which can be insufficient for some measuring purposes. Tests also showed that the heading reported by the magnetometer varies at fixed orientations (N20, N125, N170, N190 and N320, where N20 means heading 20° North) when the mobile device is rotated along its X axis, varying the pitch from 0° (looking forward) to 90° (looking downward). This variation is not consistent, increasing in some cases (N190 and N320) and decreasing in others (N20, N125 and N170), changing the North signal by only 15° (N20) or by up to 160° (N170). Finally, another limitation is that the signal from the magnetometer is not smooth and shows a lot of jitter, as can be appreciated in the recorded readings.
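A usual mitigation for this jitter is to low-pass filter the heading signal before using it to orient the virtual scene. The following minimal sketch shows one such filter in Unity, assuming the built-in compass; the smoothing factor is an illustrative value, not one reported in this work.

```csharp
using UnityEngine;

// Minimal sketch of low-pass filtering the magnetometer heading to tame its
// jitter; the smoothing factor is illustrative.
public class SmoothedCompass : MonoBehaviour
{
    [Range(0f, 1f)] public float smoothing = 0.1f; // weight of each new sample
    float heading;                                  // filtered heading, in degrees

    void Start()
    {
        Input.compass.enabled = true;   // enable the magnetometer
        Input.location.Start();         // required for trueHeading (declination)
    }

    void Update()
    {
        // Mathf.LerpAngle interpolates across the 0/360 wrap-around, so a jump
        // from 359 deg to 1 deg is treated as a 2 deg change, not 358 deg.
        heading = Mathf.LerpAngle(heading, Input.compass.trueHeading, smoothing);
    }
}
```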

Once the MAR device has been correctly geo-located and orientated, there may still be some misalignments between the contours of the real objects and the virtual objects, as shown in Fig. 10.

The virtual projection of a 3D scene onto a 2D plane in the AR engine is achieved through a perspective projection camera (Unity Technologies 2015). Therefore, it was necessary to apply the same projection model of the real camera to the virtual camera configured in the AR engine.

The first concern especially affects the angular field of view (AFoV). Even though some mobile devices report the optical characteristics of their built-in cameras, sometimes the specifications are not reliable or unambiguous enough to be used as input data in the MAR. Therefore, the AFoV was measured empirically by capturing with the camera a tabulated grid from different distances, thus obtaining the angular size of the cone of view. It was observed that the AFoV changed depending on the distance to the panorama captured by the real camera, being slightly wider when the tabulated grid was further away (Fig. 11). In fact, in all cases the squares of the tabulated grid appeared more expanded at the edges of the picture than at the center. It was thus concluded that the most influential deviation had to originate from the distortion produced by the lens, which is analyzed in the following section.

It is well known that optical lenses may produce deviations from the rectilinear projection, leading to a deformation of the image captured by the device camera. The most commonly encountered distortions are radially symmetric, classified as barrel, pincushion or moustache distortions, depending on the shape of the optical aberration. The deformation of the image, especially at its perimeter, modifies the theoretical AFoV and makes it impossible to measure angles and distances. Additionally, it creates misalignments between the real and virtual objects of the scene, which is more relevant for this application.

3.6.2. Test No. 6: Distortion of the camera lens

Therefore, it was necessary to characterize the distortion of the device camera and to apply it to the virtual camera. To do so, the Brown-Conrady distortion model (Brown 1966) was used, calculating the parameters that rule the radial and tangential distortions produced by the lens by means of a Matlab toolbox (Bouguet 2015). Fig. 12 shows the complete distortion model of the camera of the tablet Samsung Galaxy Tab S2 9.7", with its calibration parameters (focal length, principal point, and the skew, radial and tangential coefficients).
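For illustration, the following sketch shows how calibration results of this kind could be transferred to the virtual camera in Unity: the vertical AFoV derived from the calibrated focal length, and the Brown-Conrady radial and tangential terms applied to a normalized image point. All numeric values are placeholders, not the parameters of Fig. 12.

```csharp
using UnityEngine;

// Minimal sketch of transferring a camera calibration to the virtual camera:
// the vertical AFoV is derived from the calibrated focal length, and the
// Brown-Conrady terms reproduce the lens distortion on a normalized point.
// All numeric values are illustrative placeholders, not the Fig. 12 parameters.
[RequireComponent(typeof(Camera))]
public class CalibratedCamera : MonoBehaviour
{
    public float fy = 2430f;          // calibrated focal length along Y, in pixels (placeholder)
    public float imageHeight = 1536f; // image height in pixels

    public float k1, k2, k3;          // radial distortion coefficients (from calibration)
    public float p1, p2;              // tangential distortion coefficients (from calibration)

    void Start()
    {
        // Unity's fieldOfView is the vertical AFoV in degrees:
        // theta = 2 * atan(h / (2 * fy)).
        GetComponent<Camera>().fieldOfView =
            2f * Mathf.Atan(imageHeight / (2f * fy)) * Mathf.Rad2Deg;
    }

    // Brown-Conrady model: maps an undistorted normalized image point to its
    // distorted position (radial terms plus tangential terms).
    public Vector2 Distort(Vector2 p)
    {
        float r2 = p.x * p.x + p.y * p.y;
        float radial = 1f + k1 * r2 + k2 * r2 * r2 + k3 * r2 * r2 * r2;
        float xd = p.x * radial + 2f * p1 * p.x * p.y + p2 * (r2 + 2f * p.x * p.x);
        float yd = p.y * radial + p1 * (r2 + 2f * p.y * p.y) + 2f * p2 * p.x * p.y;
        return new Vector2(xd, yd);
    }
}
```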

3.7. Quantitative evaluation of overlaying discrepancies

The last experiment was carried out in the same test field, where virtual and real points were strategically positioned in order to calculate the overall accuracy of the superposition. Fig. 13 shows several scenes taken with CEsARe from different positions of the test field.

The unit vector n comes from the vector N (Eq. 5), being N = (Pprp − Pref) the direction of the Zv axis. V is the view-up vector, which in our case should be (0, 0, 1) if the camera is correctly balanced with null roll. Then, u is defined as a unit vector perpendicular to both V and n (Eq. 6). Finally, v is the cross product of n and u (Eq. 7).

The transformation from viewing to perspective-projection coordinates is defined by the following perspective matrix:

Eq. 8

where θ is the field-of-view angle of the cone of vision of the camera and AR is the aspect ratio (width/height) of the view plane. Znear and Zfar are the distances from the projection reference point (Pprp) to the near clipping plane and the far clipping plane of the view frustum.

The transformation from perspective-projection coordinates to screen pixels (referred to the center of the screen) is defined by the following matrix:

Eq. 9
Here xVmax = w/2, xVmin = −w/2, yVmax = h/2 and yVmin = −h/2 are the corner positions of the screen, defined by the resolution of the screen in pixels.

Finally, the last transformation changes coordinates from center-screen (Xcs, Ycs, Zcs) to upper-left-screen (Xul, Yul, Zul) referenced pixels, as measured in most image-editing software. It is necessary to translate the origin of coordinates and to mirror the Y axis, so the transformation matrix combines both operations.

In summary, the following steps are to be followed in order to perform this evaluation on any AR application:

8) Calculating the shortest distance, in 3D world coordinates, between the sightline (Pprp−Pv) and the position of the real element (P) (Eq. 14).
9) Applying steps 3 to 6 to several scenes, from different points of view, capturing one or several of the same points.
10) Calculating the least-squares intersection of the sightlines (Pprp−Pv) of each scene, to find the point that best fits the intersection of the sightlines of a same point from different points of view, one for each scene (Eq. 15 and 16). A sketch of these calculations is given at the end of this section.

This quantitative evaluation can be illustrated with the following example, taking into consideration three scenes (the first and third of which are shown in Fig. 13), taken with the tablet Samsung Galaxy Tab S2 (screen width w = 2048 pixels, height h = 1536 pixels, vertical field of view θ = 50°). The point P chosen for the estimation of discrepancies is P101 (435416.240, 4813495.555, 33.987), as this element is observable in the three screenshots. Table 1 presents the initial parameters, conditions and final results of the calculations for every scene. It should be remarked that V is not always exactly (0, 0, 1), as it depends on the levelling of the tripod. According to the outcomes, DL-SQ, the distance between the optimum point obtained from the least-squares solution (Ṗ) and the real position of the point (P), is 0.054 m (5.4 cm), while DM, the maximum distance between the sightlines (Pprp−Pv) and the real position of the point (P), is 0.085 m (8.5 cm).
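The following sketch illustrates steps 8 and 10 under stated assumptions: sightline directions are unit vectors, and plain double precision is used, since UTM coordinates of this magnitude would lose accuracy in single-precision floats. It is not the paper's implementation.

```csharp
using System;

// Sketch of steps 8 and 10: the shortest distance from a real point P to a
// sightline (Eq. 14) and the least-squares intersection of several sightlines
// (Eq. 15 and 16). Directions are assumed to be unit vectors.
static class SightlineEvaluation
{
    // Shortest 3D distance from point p to the line through a with unit direction d (Eq. 14).
    public static double DistanceToLine(double[] p, double[] a, double[] d)
    {
        double t = Dot(Sub(p, a), d);  // projection of (p - a) onto the line
        double[] closest = { a[0] + t * d[0], a[1] + t * d[1], a[2] + t * d[2] };
        return Math.Sqrt(Dot(Sub(p, closest), Sub(p, closest)));
    }

    // Point minimizing the summed squared distance to all sightlines:
    // solve [sum(I - d d^T)] x = sum(I - d d^T) a  (Eq. 15 and 16).
    public static double[] LeastSquaresIntersection(double[][] origins, double[][] dirs)
    {
        var S = new double[3, 3];
        var b = new double[3];
        for (int k = 0; k < origins.Length; k++)
            for (int i = 0; i < 3; i++)
                for (int j = 0; j < 3; j++)
                {
                    double m = (i == j ? 1.0 : 0.0) - dirs[k][i] * dirs[k][j];
                    S[i, j] += m;
                    b[i] += m * origins[k][j];
                }
        return Solve3x3(S, b);
    }

    // Cramer's rule for the 3x3 system S x = b.
    static double[] Solve3x3(double[,] S, double[] b)
    {
        double det = Det(S[0,0], S[0,1], S[0,2], S[1,0], S[1,1], S[1,2], S[2,0], S[2,1], S[2,2]);
        return new[]
        {
            Det(b[0], S[0,1], S[0,2], b[1], S[1,1], S[1,2], b[2], S[2,1], S[2,2]) / det,
            Det(S[0,0], b[0], S[0,2], S[1,0], b[1], S[1,2], S[2,0], b[2], S[2,2]) / det,
            Det(S[0,0], S[0,1], b[0], S[1,0], S[1,1], b[1], S[2,0], S[2,1], b[2]) / det
        };
    }

    static double Det(double a, double b, double c, double d, double e, double f,
                      double g, double h, double i)
        => a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g);

    static double Dot(double[] u, double[] v) => u[0]*v[0] + u[1]*v[1] + u[2]*v[2];
    static double[] Sub(double[] u, double[] v) => new[] { u[0]-v[0], u[1]-v[1], u[2]-v[2] };
}
```

In terms of the example above, running LeastSquaresIntersection on the sightlines (Pprp−Pv) of the three scenes yields the optimum point Ṗ, whose distance to P gives DL-SQ, while DistanceToLine applied to each sightline gives the per-scene distances whose maximum is DM.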

It has been stated that there are several sources of possible flaws that prevent a perfect superposition of the virtual models over their corresponding real entities. The synthesis of the results, including factors, methodology for contrast and evaluation, partial accuracy and remedial actions, is presented in Table 2.