Toward Understanding Crowd Mobility in Smart Cities through the Internet of Things

Understanding crowd mobility behaviors would be a key enabler for crowd management in smart cities, benefiting various sectors such as public safety, tourism and transportation. This article discusses the existing challenges and the recent advances to overcome them and allow sharing information across stakeholders of crowd management through Internet of Things (IoT) technologies. The article proposes the usage of the new federated interoperable semantic IoT platform (FIESTA-IoT), which is considered "a system of systems". The platform can support various IoT applications for crowd management in smart cities. In particular, the article discusses two integrated IoT systems for crowd mobility: 1) Crowd Mobility Analytics System, 2) Crowd Counting and Location System (from the SmartSantander testbed). Pilot studies are conducted in Gold Coast, Australia and Santander, Spain to fulfill various requirements such as providing online and offline crowd mobility analyses with various sensors in different regions. The analyses provided by these systems are shared across applications in order to provide insights and support crowd management in smart city environments.


I. INTRODUCTION
Sustainable development of cities is a major global challenge as more than half of the world population is living in urban areas. The smart city concept allows optimizing services for urban areas as a result of advances in new technologies ranging from very small devices to big data centers. These technologies can be considered in the context of IoT, where many objects, devices, machines, and data centers are connected. The usage of IoT technologies for crowd management in urban environments is promising for the future of smart cities.
IoT technologies can enable many improvements for crowd management, which spans sectors such as transportation services (e.g., operating public transport or directing pedestrian traffic), public safety (e.g., detection of fighting incidents), and tourism (e.g., event management for enhanced visitor experience). For instance, movement behaviors of crowds may indicate situations such as traffic congestion, emergency incidents, and panic situations during certain events such as large gatherings in city squares.
While cities aim to achieve smart urban services, new challenges arise due to the limitations and deficiencies of the current systems and technologies in terms of scalability of the connected systems, information transparency between different systems (i.e., semantic interoperability) or stakeholders, data federation, and information privacy. When mobility information must be shared across multiple stakeholders, a proprietary infrastructure cannot fulfill all the different requirements that they impose. For example, some of the stakeholders expect a real-time mobility monitoring service for event detection while others require historical mobility data analytics to analyze the efficiency of services in different urban environments (e.g., train station, stadium, city square). It is very difficult to re-design a one-size-fits-all IoT system when new requirements arise for various environments or different time periods. A better solution is to provide "a system of systems" in which a new service can be easily developed or set up to handle any new requirement by leveraging existing technologies and infrastructure. To make such a system of systems useful, semantic models based on an appropriate ontology are needed for transparently exchanging data and analytics results, and for sharing new insights from different crowd management applications. The federation of data, results, and learned insights is the key technical enabler to understand the crowd mobility behaviors in a smart city. Finally, privacy preservation is a problem of utmost importance for smart cities. While various data from a vast deployment of sensors travel through the IoT systems, preserving privacy at a level closer to the data contributors (providers) is an important challenge.
This article describes the recent advances in IoT for understanding crowd mobility in smart cities. The federated and interoperable semantic IoT (FIESTA-IoT) platform for smart cities is introduced from the specific perspective of crowd management applications. Fig. 1 illustrates the outlook of the smart city applications leveraging the smart city platform for sharing information across various stakeholders. While the platform is currently in use for several smart city testbeds, the article focuses on two IoT systems for crowd mobility, namely Crowd Mobility Analytics System (CMAS) and Crowd Counting and Location System (CCLS), and discusses the aspects related to the aforementioned limitations.
Two pilot studies are conducted in Gold Coast, Australia and Santander, Spain, where various sensors are deployed in urban areas. The first pilot study uses CMAS in Gold Coast for a medium-scale smart city deployment. The requirements of the pilot include analyzing heavy or light pedestrian traffic on streets with or without vehicles. The second pilot study uses CCLS in an indoor market in Santander. The requirements include detecting people (crowd size) and locating their positions in public buildings of a city and other critical infrastructures. In both pilots, data anonymisation limits the tracking of devices over long time periods. On the other hand, online and offline analytics information needs to be shared across various stakeholders such as city councils and visualized in several interfaces using IoT technologies and infrastructure to provide insights for crowd management in smart cities.

A. Federated and Interoperable IoT Platform
Smart city data is often gathered by solutions where dedicated networks of sensors or data sources produce observations to be consumed by specific applications. The systems usually differ from each other, serve distinct purposes, and are mostly not interoperable [1], [2]. In this regard, creating crowd management services that harness the abundant data from a smart city (e.g., environmental data, road traffic information) would require either ad-hoc integration or creation of new systems. This situation raises a new requirement of an integrated "system of systems" or "container of systems".
To overcome this challenge, we propose a crowd mobility-based instantiation of the FIESTA-IoT platform [3] and provide semantic interoperability from IoT deployments to the services (shown in Figure 2). The heterogeneous IoT deployments on the IoT Devices and Systems (bottom layer) are integrated into the Cloud and data is anonymised with salting and hashing. In this layer, in addition to the two crowd mobility systems, there are other FIESTA-IoT systems that can be leveraged. Currently more than 5000 sensors (from 11 integrated testbeds [4]) report environmental data (e.g., temperature, humidity, illuminance, noise level), road traffic information (e.g., vehicle speed, traffic intensity), car and bike parking spots, estimated arrival times of buses, and smart building information (e.g., human occupancy, power consumption). At the Federated Cloud Infrastructure (middle layer), the data from the bottom layer is modelled using the FIESTA-IoT Semantic Model and stored in the Linked-Data Storage. In particular, the semantic model for crowd mobility data is described in Section II-B. The data in the Cloud infrastructure is accessible through the Federated Context Management, which exposes NGSI and SPARQL interfaces. Our open source IoT Broker (Aeron Broker) component provides scalable federation for the context management, whereas IoT Discovery (NEConfMan) enables easy registration and discovery of resources with features such as geo-discovery.
The crowd management-related IoT data is harnessed by Crowd Management Applications (top layer), which contain IoT services provided by the platform and crowd mobility applications. These services enhance the crowd mobility data through reasoning by aggregating the semantic data and assessing the situations related to physical objects (i.e., Contextualization Service) at different levels of abstraction such as building level or street level. Assessment of the situations can be performed through: a) pre-defined thresholds, b) anomaly detection, c) time-series analysis, d) artificial intelligence. The obtained situations are displayed on the dashboard in Figure 1, named Smart City Magnifier, which reports alerts regarding traffic status, crowd flows, critical events (e.g., a fire outbreak), and so on. Moreover, crowd mobility applications such as Gold Coast Operation Center and SmartSantander Maps receive the results (generated by CMAS, CCLS or other IoT services) from the Cloud and provide visualizations.
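The first two assessment options listed above (pre-defined thresholds and anomaly detection) can be illustrated with a minimal sketch. It is not the Contextualization Service itself: the `Observation` type, the threshold values, the situation labels, and the z-score rule are all assumptions made for illustration.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class Observation:
    area: str          # hypothetical identifier of a street or building
    people_count: int  # aggregated crowd size reported for that area


# Hypothetical thresholds, highest first; real values would be set per area.
THRESHOLDS = [(200, "critical"), (100, "crowded")]


def assess_by_threshold(obs: Observation) -> str:
    """Option a): map a crowd size to a situation label via fixed thresholds."""
    for limit, label in THRESHOLDS:
        if obs.people_count >= limit:
            return label
    return "normal"


def is_anomalous(history: list, current: float, z_limit: float = 3.0) -> bool:
    """Option b): flag a value that deviates strongly from recent history.

    Uses a plain z-score as a stand-in for whatever detector a deployment
    would actually choose.
    """
    mu = mean(history)
    sigma = pstdev(history)
    if sigma == 0:
        return current != mu
    return abs(current - mu) / sigma > z_limit
```

The two checks are independent, so a service could raise an alert when either one fires.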

B. Crowd Mobility Semantic Model
In order to provide seamless interoperability and information transparency from IoT systems to the crowd management applications, the crowd mobility outcomes are semantically annotated following the FIESTA-IoT ontology [5] as shown in Fig. 3 (with a stress on the specific taxonomy of M3-lite for crowd mobility).
Rich and complex knowledge is represented with an ontology as things are connected to each other through relationships. Things are not identified as individuals, but as classes of individuals. Moreover, a class might have sub-classes. For example, peopleCounterX is an instance of PeopleCountSensor class which is a subclass of Counter (see Fig. 3). The classes can be defined and described in taxonomies and an ontology may use classes from different ontologies or taxonomies. Relationships between classes are known as properties and it is possible to define properties' cardinality. Each class and property used in an ontology is uniquely identified by a namespace prefix and the class or property specific name. For example, m3-lite:PeopleCountSensor is a class defined in the M3-lite ontology. For the sake of readability, in this paragraph we are omitting the namespace prefix while they are shown with prefix in Fig. 3.
The core concept is the SensingDevice, representing a sensor that produces an Observation, which is a measurement (or computation) of a phenomenon related to an object that happened at a specific Instant. For example, a crowd mobility detector can be seen as a Device composed of multiple SensingDevices. In this sense, such a detector can have one PeopleFlowCountSensor and one StayingPeopleCountSensor, which are subclasses of PeopleCountSensor. The Observation is expressed with a QuantityKind having a Unit. Following our example, the QuantityKind associated with the data generated by the PeopleFlowCountSensor is CountPeopleMoving (subclass of QuantityKind) with Item as its Unit and with the Direction property expressed either in geodetic DirectionAzimuth or as a generic DirectionHeading. The directions start from the Point that is the location of the physical Platform. Platform is meant as the supporting dock to which the Device is attached. The StayingPeopleCountSensor generates CountPeopleStaying values expressed in Item. The system also includes a PeopleStayDurationSensor that generates PeopleStayDurationAverage values measured in SecondTime. Each SensingDevice might have a Coverage, specified either as Polygon/Rectangle/Circle or as a simple Point. This indicates the geographic extent of the Observation.
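To make the model more concrete, the following sketch represents one people-flow observation as a list of subject-predicate-object triples in plain Python. The class names (PeopleFlowCountSensor, Observation, CountPeopleMoving, Item) come from the description above, with namespace prefixes omitted as in the text. The property names prefixed with `ex:` and the observation identifier scheme are hypothetical placeholders, not the actual FIESTA-IoT ontology properties.

```python
def annotate_flow_observation(sensor_id, count, azimuth_deg, timestamp):
    """Build triples describing one people-flow count.

    All "ex:" predicates are illustrative placeholders; a real annotation
    would use the properties defined by the FIESTA-IoT ontology.
    """
    obs = f"{sensor_id}/obs/{timestamp}"  # hypothetical identifier scheme
    return [
        (sensor_id, "rdf:type", "PeopleFlowCountSensor"),
        (obs, "rdf:type", "Observation"),
        (obs, "ex:observedBy", sensor_id),
        (obs, "ex:hasQuantityKind", "CountPeopleMoving"),
        (obs, "ex:hasUnit", "Item"),
        (obs, "ex:hasValue", count),
        (obs, "ex:hasDirectionAzimuth", azimuth_deg),
        (obs, "ex:atInstant", timestamp),
    ]


def objects_of(triples, predicate):
    """Return every object attached to the given predicate."""
    return [o for (_, p, o) in triples if p == predicate]


triples = annotate_flow_observation(
    "peopleCounterX", 42, 90.0, "2018-03-23T10:00:00Z"
)
```

Because every value is attached to a typed node, an application can retrieve, for example, all counts with `objects_of(triples, "ex:hasValue")` without knowing which system produced them.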
C. Integrated IoT Systems
1) Crowd Mobility Analytics System: The CMAS (extended from our system in [6]) is integrated with the platform via semantic annotation of the outcome. The developed system consists of Wi-Fi sniffers, stereoscopic cameras, IoT gateways, and data analytics modules. The Wi-Fi sniffers are capable of capturing wireless probes broadcast by mobile devices.
Based on the captured Wi-Fi probes, the system can count the mobile devices in these sensing areas. The cameras are co-located with specific Wi-Fi sniffers deployed at the dedicated calibration choke points. Built-in people counting software runs in the cameras. Both Wi-Fi device detection and people counting results are reported to the Cloud, where data analytics modules reside, through the IoT gateways. Three analytics modules are developed: crowd estimation, people flows, and stay duration. The crowd estimation module outputs the number of people by correlating the stereoscopic camera counts and the number of Wi-Fi enabled devices at the calibration points. Based on the correlation between the two data modalities, the calibration of the data analytics results is applied in other sensing areas without cameras. The module monitoring people flows infers crowd movement in these areas. Finally, the stay duration module estimates the waiting times and the number of waiting people. All analytics results are exported to the Federated Cloud Infrastructure so the crowd analytics results are discoverable and available for applications in the smart city platform.
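The calibration idea behind the crowd estimation module can be sketched as follows. This is a deliberately simplified stand-in, not the calibration algorithm of [6]: it assumes a single ratio between camera counts and Wi-Fi device counts at a choke point and applies it unchanged to a camera-less area. Function names and numbers are illustrative.

```python
def calibration_ratio(camera_count: int, wifi_devices: int):
    """Ratio of people seen by the camera to Wi-Fi devices at a choke point.

    Returns None when no Wi-Fi device was detected, since the ratio is
    undefined in that case.
    """
    if wifi_devices == 0:
        return None
    return camera_count / wifi_devices


def estimate_crowd(wifi_devices: int, ratio: float) -> int:
    """Estimate crowd size in an area that only has a Wi-Fi sniffer."""
    return round(wifi_devices * ratio)


# Hypothetical numbers: 120 people counted vs. 80 devices at the choke point.
ratio = calibration_ratio(120, 80)
estimate = estimate_crowd(40, ratio)  # area without a camera, 40 devices seen
```

A ratio above 1 means the Wi-Fi count underestimates the crowd (not everyone carries a detectable device), while a ratio below 1 means it overestimates it.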
2) Crowd Counting and Location System: Different from CMAS, CCLS aims at analysing crowd behaviour in public buildings of a city, as well as critical infrastructures. The system relies on the analysis of IEEE 802.11 frames to discover devices in the surroundings of the deployment, normally within the monitored areas. Similar to CMAS, the deployed nodes capture "Probe Request" frames sent by smartphones whose Wi-Fi interface is in "active search" mode, a feature incorporated in most of them. However, CCLS aims not only at detecting people but also at locating them. For this, the system stores the RSSI and sequence number from the captured frames. It is possible to locate people by processing this information using RSSI-based algorithms. All the post-processing is performed in an edge server, where all the measurements are sent after the corresponding anonymisation techniques are applied. Once the anonymised raw measurements are analyzed and the counting and location analytics are applied over them (i.e., the estimated crowd size and positions are obtained), these observations are semantically annotated and pushed to the Federated Cloud Infrastructure. For the semantic modelling, each crowd estimator is modelled as a PeopleCountSensor, with a specific Coverage (representing the area to which the estimations apply), that generates CountPeople observations expressed in Item.

D. Privacy Considerations
One of the essential requirements is dealing with tracked devices' privacy. Nowadays, privacy is one of the major public concerns. In this sense, data protection laws have to be observed when handling data that could be personal. Quite restrictive rules apply in most countries of the world, with the countries of the European Union (EU) among the most restrictive ones. These rules were recently updated through the General Data Protection Regulation (GDPR) [7], which states that "the principles of data protection should apply to any information concerning an identified or identifiable natural person". Therefore, Wi-Fi-based tracking services in public or private spaces can be performed only if the service obtains the user's opt-in permission, or if data is anonymised in such a manner that the user is no longer identifiable, as mentioned in Recital 26 of the aforementioned regulation. The Article 29 Working Party, recently replaced by the European Data Protection Board (EDPB), is in charge of analysing compliance with the privacy rules. In a document released to analyze the compliance of the ePrivacy regulation with GDPR [8], the Data Protection Working Party states that Wi-Fi tracking can only be performed either if there is consent or if the personal data is anonymised. Within the same document, four conditions are mentioned for the latter case to be compliant with the GDPR:
• The purpose of the data collection from terminal equipment is restricted to mere statistical counting.
• The tracking is limited in time and space to the extent strictly necessary for this purpose.
• The data will be deleted or anonymised immediately afterwards.
• There exist effective opt-out possibilities.
Considering that the users' permission is impossible to obtain under normal conditions within the scope of the experimentation, the only option is to anonymise the data related to MAC addresses. Thus, security measures must be undertaken in the experimentation to address both data integrity and anonymisation. Any type of experimentation or service provision must take this concern into account, although it is usually underestimated by system developers.

CCLS in Santander is based on the Spanish Personal Data Protection Laws and the Spanish Law Protection Office recommendations for data anonymisation [9]. The recommendation consists of the use of a cryptographic hash function with randomly generated hash keys. More precisely, the HMAC protocol, which provides such mechanisms, is recommended. In the SmartSantander deployment, we implement the HMAC algorithm along with the SHA256 hashing function, with a 12-byte randomly generated key. Finally, in order to ensure a non-reversible process, this implementation also comprises a procedure to destroy and renew the key during specific session periods. For CMAS in Gold Coast, the hashed and salted Wi-Fi probe data is sent to the Cloud. The stereoscopic cameras do not record video or perform face detection. The cameras simply count the passage of people through predefined lines at the choke points. The outputs of the camera are people count-in and count-out values. The main drawback of this procedure is the limitation of tracking devices throughout long periods (as in [10]) or longer travels within the city, but it is the price that must be paid to meet the privacy requirements.
The deployments target two regions. The sensors deployed in these areas are considered as Cluster 1 for (expected) heavy pedestrian traffic and Cluster 2 for light traffic places. Each cluster has a stereoscopic camera for the calibration. The collected data is sent to the Cloud where two virtual machines are created for the clusters. Clustering the areas allows applying CMAS at city scale by sharing the raw data load.
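The keyed-hash anonymisation described earlier in this section for the SmartSantander deployment (HMAC with SHA256, a 12-byte random key, and periodic key renewal) can be sketched with the Python standard library. The class name, the MAC normalisation step, and the way renewal is triggered are assumptions for illustration; the deployed implementation may differ.

```python
import hashlib
import hmac
import secrets


class MacAnonymiser:
    """Pseudonymises MAC addresses with HMAC-SHA256 under a session key."""

    KEY_BYTES = 12  # key length stated for the SmartSantander deployment

    def __init__(self) -> None:
        self._key = secrets.token_bytes(self.KEY_BYTES)

    def renew_key(self) -> None:
        """Replace the session key with a fresh random one.

        Pseudonyms produced before and after renewal cannot be linked.
        Note: Python cannot guarantee the old key is wiped from memory.
        """
        self._key = secrets.token_bytes(self.KEY_BYTES)

    def anonymise(self, mac: str) -> str:
        """Return the hex HMAC-SHA256 digest of a normalised MAC address."""
        # Normalisation (lower case, ':' separators) is an assumption so
        # that the same device always maps to the same pseudonym.
        normalised = mac.strip().lower().replace("-", ":")
        digest = hmac.new(self._key, normalised.encode("ascii"), hashlib.sha256)
        return digest.hexdigest()


anonymiser = MacAnonymiser()
pseudonym = anonymiser.anonymise("AA:BB:CC:DD:EE:FF")
```

Within one session the same device always yields the same pseudonym, which is enough for counting. After `renew_key()` the mapping changes, which is what prevents tracking over long periods.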
2) Pilot Operation: The pilot study activities started in September 2017 and CMAS has been in use starting from November 2017. Various types of pilot tests are conducted in the field during the operation of the pilot. Manual counting is performed using video footage taken from different deployment areas. In comparison to manual counting, the cameras provide an accuracy between 88% and 98%, which mainly depends on the weather and lighting conditions. Furthermore, field tests for heavy and light traffic areas resulted in 93% and 89% crowd size accuracy, respectively, compared to manual counting. The results obtained from outside the choke points give further confidence to treat stereoscopic camera results as near ground truth as proposed in [6].
The Gold Coast pilot successfully tests the crowd mobility analytics services by leveraging federation of clusters and interoperability using the semantic model to share the results with stakeholders. This shows that similar systems can be developed and leveraged by future crowd management applications using the smart city platform.
B. Pilot Deployment in Santander
1) Pilot Setup: CCLS is deployed in the "Mercado del Este" market, a restored symmetric building that contains shops, restaurants, a regional tourist office, and a museum. This building is particularly interesting as it usually receives significant numbers of visitors due to its central location, with exceptionally crowded periods.
The system is composed of 8 devices installed within the market building. These devices include a Wi-Fi interface aimed at detecting surrounding Wi-Fi enabled visitors' devices. Internet connectivity is provided through the Municipality Network, and the devices are powered using Power over Ethernet connected to the market's electrical grid. In addition to the wireless interfaces, half of the devices also include environmental sensors measuring temperature and humidity. Device deployment is carried out with the collaboration and supervision of the municipality and the market managers. Considering the main goal of monitoring people within the market, two parameters are considered in order to get market status snapshots over time: first, the number of visitors within the market in different time frames, and second, the location of the visitors in the different areas of the market.
2) Pilot Operation: Firstly, in order to monitor the visitors within the market, we follow a deterministic approach, in which we consider that a device is inside the building if a minimum of 6 sensor nodes detect it with a certain level of RSSI. In our deployment, this solution is feasible considering the particular symmetric distribution of the building and the location of the sensor nodes, covering the external wall of the building. Secondly, device locations are estimated using the Weighted Centroid Algorithm [11], which approximates the ground-truth positions to within about 5 meters without any ad-hoc calibration. For the cases that require more precision, these positioning methods can achieve an error of less than 2 meters if the system is calibrated in advance. Synthesized information including real-time visitor location and detected number of visitors per unit of time is provided through a web portal to the market managers and municipal officials. Fig. 4 shows the heat map of the market at a specific moment. Other parameters, such as the visitors' dwell time over different long-term periods, are not analysed due to the privacy safeguards that have to be addressed.
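The two steps described above, the deterministic presence rule and a weighted-centroid position estimate, can be sketched as follows. The RSSI threshold of -80 dBm is a hypothetical value (the text only says "a certain level of RSSI"), and weighting each node by its received power in linear scale is one common variant of weighted centroid localisation, assumed here rather than taken from [11].

```python
def is_inside(rssi_by_node: dict, min_nodes: int = 6,
              rssi_threshold: float = -80.0) -> bool:
    """Presence rule: enough nodes hear the device at sufficient strength.

    rssi_threshold is a hypothetical value chosen for illustration.
    """
    strong = [r for r in rssi_by_node.values() if r >= rssi_threshold]
    return len(strong) >= min_nodes


def weighted_centroid(node_positions: dict, rssi_by_node: dict):
    """Estimate (x, y) as the centroid of node positions weighted by RSSI.

    Weights are the received power in linear scale, 10 ** (dBm / 10), so
    nodes that hear the device more strongly pull the estimate closer.
    """
    weights = {
        node: 10 ** (rssi / 10.0)
        for node, rssi in rssi_by_node.items()
        if node in node_positions
    }
    total = sum(weights.values())
    if total == 0:
        raise ValueError("no usable RSSI measurements")
    x = sum(w * node_positions[n][0] for n, w in weights.items()) / total
    y = sum(w * node_positions[n][1] for n, w in weights.items()) / total
    return x, y


# Hypothetical layout: two nodes 10 m apart hearing the device equally well.
positions = {"a": (0.0, 0.0), "b": (10.0, 0.0)}
estimate = weighted_centroid(positions, {"a": -60.0, "b": -60.0})
```

With equal RSSI at both nodes the estimate falls midway between them; a stronger reading at one node shifts it toward that node.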

IV. CITY-SCALE EXPERIMENTS
This section discusses some of the experimental observations from the Gold Coast pilot with CMAS. Specifically, it includes the variance in the crowd estimation for the Wi-Fi sensors and cameras. Our focus in the experimental study is to observe the dynamic changes in the number of unique Wi-Fi devices detected and the correlation coefficient (or simply camera/Wi-Fi ratio), which is a dynamic parameter that is computed by the Adaptive Linear Calibration Algorithm [6]. The coefficient indicates the proportion of the number of people (count-in and count-out events) detected by the camera to the number of devices detected by the Wi-Fi sensors in each time interval. We analyze the hourly results for the two clusters.
Figure 5-a shows the average number of Wi-Fi devices detected over a one-week period. There is increased activity in the Cluster 1 region, especially on Friday (23/03/2018) and the following weekend. This can be due to crowdedness in the shopping street and the beach area contained in this region. Moreover, there is a peak on Saturday that can be due to an event or gathering. Figure 5-b shows the change of the coefficients (ratios). The ratios are computed at the calibration choke points (providing near-ground truth for the measurements). The hourly ratio is computed such that the number of people count-in and count-out events is divided by the number of Wi-Fi probes. First, for Cluster 2 with light traffic, the correlation coefficient is mostly (on almost all days) higher compared to Cluster 1. Second, correlation coefficient values lie mostly in the range of (0.2, 2), whereas the peak value is about 2.8. This indicates that the results based on Wi-Fi-only measurements are likely to be less accurate at most times of the day and that the correlation changes throughout the day.
Third, there exists a certain regularity in the correlation from one day to another, which can be learned over a time period and then applied to other time periods where the camera is temporarily inactive or removed. On the other hand, as seen in the peak hours of Cluster 2, the ratios do not lie within a narrow range. One reason can be events affecting the volume of pedestrians. Lastly, Fig. 5-b shows relatively higher variance of the coefficient for areas with light pedestrian traffic. Calibration could be necessary for shorter time intervals.
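The observation that the camera/Wi-Fi ratio shows day-to-day regularity suggests a simple fallback for periods without a camera: learn an average ratio per hour of day and reuse it. The sketch below illustrates that idea only. It is not the Adaptive Linear Calibration Algorithm of [6], and the sample values and function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def learn_hourly_ratios(samples):
    """Average the camera/Wi-Fi ratio per hour of day.

    samples: iterable of (hour_of_day, camera_events, wifi_devices), where
    camera_events is the sum of count-in and count-out events.
    """
    by_hour = defaultdict(list)
    for hour, camera_events, wifi_devices in samples:
        if wifi_devices > 0:  # the ratio is undefined without Wi-Fi detections
            by_hour[hour].append(camera_events / wifi_devices)
    return {hour: mean(ratios) for hour, ratios in by_hour.items()}


def estimate_without_camera(hour, wifi_devices, hourly_ratios):
    """Apply the learned ratio for this hour; None if the hour was never seen."""
    ratio = hourly_ratios.get(hour)
    if ratio is None:
        return None
    return wifi_devices * ratio


# Hypothetical training samples from days when the camera was active.
history = [(9, 50, 100), (9, 70, 100), (18, 300, 150)]
ratios = learn_hourly_ratios(history)
evening_estimate = estimate_without_camera(18, 100, ratios)
```

Given the higher variance reported for light-traffic areas, such a table would likely need to be relearned more often there, or kept per cluster rather than shared.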
Overall, it is observed that effective use of Wi-Fi sensing, combined with sensing by stereoscopic cameras, produces accurate sensing at large scale for both heavy and light pedestrian traffic areas. Moreover, the variance between heavy and light traffic shows the usefulness of the clustering approach, which treats these regions separately.

V. RELATED WORK
There are recent studies that focus on understanding human mobility through IoT devices such as wireless sensors. Jara et al. [12] observed the relation between traffic behavior and temperature conditions as a smart city application through deployment of IoT devices in Santander. Tong et al. [13] propose the usage of Wi-Fi sensors to understand passenger flows. Their evaluation through simulation shows high accuracy. Zhao et al. [14] survey the recent advances in understanding human mobility in urban environments. The study lists some of the existing urban human mobility datasets, such as GPS, GSM, Wi-Fi, and Bluetooth traces. Similarly, Zhou et al. [15] discuss the topic of human mobility in urban environments and present a taxonomy of crowdsensed input data types and application outcomes such as crowd density and flows within buildings, and people transportation mode identification (cycling, running, bus riding). Lastly, Montjoye et al. [10] focus on the privacy aspect by analyzing long period Wi-Fi traces and show that 95% of the individuals can be uniquely identified using spatiotemporal datasets.

VI. FUTURE WORK AND CHALLENGES
The current work focuses on finding insights behind crowd mobility such as detecting crowdedness. However, understanding more complex crowd mobility behaviour in a large-scale city area, such as movements of groups (e.g., families), could be helpful for crowd management and for enhancing smart mobility in the cities. The collected mobility information can serve as input to human mobility simulations to further study how city dynamics are affected by crowd mobility patterns. By combining real mobility datasets with a simulated environment, learning new mobility insights opens up opportunities for new crowd management strategies (e.g., congestion avoidance, evacuation planning, demand management) that can further improve public service and safety in smart cities.
In our future developments, the semantic interoperability through ontologies can be leveraged more extensively for cross-infrastructure communication and knowledge sharing. The new advancements of the NGSI protocol by the ETSI Industry Specification Group (ISG) on Context Information Management (CIM) are centered around the concepts of linked data. This opens a new horizon where knowledge graphs are shared among various infrastructures and, while their administrators own the produced data, it is still accessible seamlessly and transparently by all actors in the multi-infrastructure federation.

VII. CONCLUSIONS
This article discusses the new advancements towards understanding crowd mobility in smart cities using IoT. While there exist certain limitations, the CMAS and CCLS systems using the smart city platform offer improvements for more efficient crowd management. The pilot studies in Gold Coast and Santander show the capability to fulfill various requirements and share information across stakeholders by leveraging the IoT technologies and infrastructure.