MTR Travel Record and Coronavirus Analysis
![](https://cdn.arcgis.com/sharing/rest/content/items/58a6340a266b4222b91aec36ae30439b/resources/J_S5cv-xYQ_CmahrAh3PU.jpeg?w=20)
Project Background
This is the third year since the outbreak of COVID-19. New variants keep flooding the world. As one of the largest, most crowded and busiest cities in the world, Hong Kong is undergoing the fifth severe surge of the pandemic.
While citizens do their best to fight the virus, the MTR, the city's most extensive railway system, which carries millions of passengers every day, continues operating to sustain the essential routines of the whole megacity. Initiated by HKU and MTRC, this project aims to analyze the correlations between passenger travel behavior and coronavirus cases, based on the large-scale travel database provided by MTRC. With ArcGIS's advanced maps in hand, our team makes important visualizations available so that experts can gain a deeper understanding of the intricate interaction between passenger travel patterns, lockdown policies, and the coronavirus. This can help policymakers devise travel arrangements that balance travel convenience, infection risk, and economic development.
Here we introduce our analysis and visualization platform, including its structure and some highlighted visualization examples.
![](https://cdn.arcgis.com/sharing/rest/content/items/58a6340a266b4222b91aec36ae30439b/resources/k7BcIQiwvsZPK_KVIwjTH.jpeg?w=20)
Backend Infrastructure
Performance Analysis
The performance of the three metrics available on the platform was measured with Chrome DevTools, which automatically recorded the time taken to send each request, to wait for the response, and to download the content.
For travel pattern, the runtimes for timespans of 1 month, 3 months, 5 months, and 7 months, each with two starting dates, were recorded and are presented on the left of the figure below. A watershed appears at about 2 months: timespans shorter than 2 months completed rendering within 10 seconds, whereas longer timespans consistently took about 2 minutes. For station density, the runtimes for the same timespans and starting dates are presented in the middle of the figure. These tasks were all completed within 1 second regardless of timespan. For passenger volume, the runtimes for the same timespans and starting dates are presented on the right of the figure. These tasks were also all completed within 1 second, with the runtime increasing as the timespan lengthened.
Runtime to render travel pattern (left), station density (middle), and passenger volume (right)
Even though the three metrics were precomputed by day in tables, aggregation was still needed to sum the numbers from each day. The long runtime of the travel pattern likely resulted from the complexity of its grouping requirement. For travel pattern, the precomputed table had to be further grouped by two variables, namely the entry station and the exit station; for station density, grouping by station was enough; for passenger volume, card type was the grouping variable. There are currently 96 MTR stations, and considering the size of our trip records, grouping by station pairs, where the number of groups grows as O(n²) with the number of stations n, would likely be a heavy burden on the server. A possible reason for the lag is that the data and its computation could not fit into a buffer whose capacity was roughly equivalent to the memory needed for a 2-month timespan. Queries with a timespan shorter than 2 months could be served directly from the buffer, while longer ones had to read from the hard disk, resulting in a consistently longer runtime.
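To make the three grouping requirements concrete, here is a minimal Python sketch of summing daily precomputed rows over a date range. The row layout, the station and card-type names, and the `aggregate` helper are all hypothetical and only illustrate the idea; the platform's actual tables and queries are not shown in this article.

```python
from collections import Counter
from datetime import date

# Hypothetical daily precomputed rows:
# (day, entry_station, exit_station, card_type, trips)
DAILY_ROWS = [
    (date(2020, 3, 1), "Admiralty", "Central", "Adult", 120),
    (date(2020, 3, 1), "Central", "Admiralty", "Student", 30),
    (date(2020, 3, 2), "Admiralty", "Central", "Adult", 100),
    (date(2020, 3, 2), "Mong Kok", "Central", "Elderly", 15),
]


def aggregate(rows, start, end, key):
    """Sum trip counts for days in [start, end], grouped by key(entry, exit, card)."""
    totals = Counter()
    for day, entry, exit_, card, trips in rows:
        if start <= day <= end:
            totals[key(entry, exit_, card)] += trips
    return totals


start, end = date(2020, 3, 1), date(2020, 3, 2)

# Travel pattern: grouped by the (entry, exit) pair.
travel_pattern = aggregate(DAILY_ROWS, start, end, lambda e, x, c: (e, x))
# Station density: grouped by a single station (entry station only, as a simplification).
station_density = aggregate(DAILY_ROWS, start, end, lambda e, x, c: e)
# Passenger volume: grouped by card type.
passenger_volume = aggregate(DAILY_ROWS, start, end, lambda e, x, c: c)

# With 96 stations there are up to 96 * 96 ordered station pairs,
# compared with only 96 groups when grouping by a single station.
max_pair_groups = 96 * 96
```

The pair grouping can produce up to 9,216 groups per day, which is where the travel pattern differs most from the other two metrics.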
Database Migration to NoSQL
It was clear from the above analysis that the travel pattern had the longest runtime and that the grouping operation was a key bottleneck. Thus, we decided to migrate our data to a NoSQL database for its efficient retrieval and aggregation of data. The runtime of the travel pattern was recorded for 1 month, 3 months, 5 months, and 7 months by querying the MongoDB and MySQL servers directly, as presented in the graph below. In MySQL, the travel pattern of 1 month was computed quickly, but the runtime increased greatly to over two minutes when the timespan was more than 2 months, which is consistent with the findings in the previous section. In MongoDB, the runtime increased roughly linearly as the timespan lengthened, but every query completed within 10 seconds, a clear improvement over the MySQL implementation.
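As an illustration of how the travel pattern grouping might be expressed in MongoDB, the sketch below builds an aggregation pipeline as plain Python data. The stages `$match`, `$group`, `$sort`, and `$limit` are standard MongoDB aggregation stages, but the field names (`day`, `entry_station`, `exit_station`, `trips`) and the helper name are assumptions, not the project's actual schema.

```python
from datetime import datetime


def build_travel_pattern_pipeline(start, end, top_n=200):
    """Return a MongoDB aggregation pipeline (list of stages) for the top station pairs.

    All field names are hypothetical placeholders for the real collection schema.
    """
    return [
        # Keep only the daily documents inside the requested timespan.
        {"$match": {"day": {"$gte": start, "$lte": end}}},
        # Group by the (entry, exit) station pair and sum the daily trip counts.
        {"$group": {
            "_id": {"entry": "$entry_station", "exit": "$exit_station"},
            "trips": {"$sum": "$trips"},
        }},
        # Busiest pairs first, then keep the top N.
        {"$sort": {"trips": -1}},
        {"$limit": top_n},
    ]


pipeline = build_travel_pattern_pipeline(datetime(2020, 3, 1), datetime(2020, 3, 31))
```

With a driver such as PyMongo, a pipeline of this shape would be passed to `collection.aggregate(pipeline)`; the sketch stops short of connecting to a server so that it stays self-contained.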
Runtime to render travel pattern on MySQL/MongoDB server
Platform Functionality
The platform supports data filtering and querying for both the COVID-19 confirmed cases and the MTR passenger database. Some basic data visualization tasks can be carried out on the frontend, including travel pattern, station density, and passenger volume. It can also perform a few contact-based and behavior-based analyses.
Query
We provide data querying for both of our databases with multiple filters, including date, station, card type, time distribution, and station aggregation method.
Visualization
The platform supports 6 different visualization functionalities. Most of the visualizations allow users to select the time frame they want to study. Here is an example for station density:
Analysis
The platform can support data analysis topics such as "Someone like you" and "Sensor individuals". "Someone like you" helps identify passengers travelling with you in approximately the same period. "Sensor individuals" helps identify passengers having the most potential physical contact with other passengers.
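The two analyses can be sketched roughly as follows. This is only one possible reading: it treats two passengers as travelling together when they share the same entry and exit stations and enter within a few minutes of each other, and it ranks "sensor individuals" by how many distinct co-travellers they have. The trip layout, the 5-minute tolerance, and both function names are assumptions rather than the platform's actual method.

```python
from datetime import datetime, timedelta

# Hypothetical trip records: (passenger_id, entry_station, exit_station, entry_time)
TRIPS = [
    ("A", "Central", "Admiralty", datetime(2020, 3, 2, 8, 0)),
    ("B", "Central", "Admiralty", datetime(2020, 3, 2, 8, 3)),
    ("C", "Central", "Admiralty", datetime(2020, 3, 2, 8, 4)),
    ("D", "Mong Kok", "Central", datetime(2020, 3, 2, 8, 0)),
    ("B", "Mong Kok", "Central", datetime(2020, 3, 2, 8, 2)),
]


def co_travelers(trips, target_id, tolerance=timedelta(minutes=5)):
    """Passengers sharing a station pair with target_id at about the same entry time."""
    mine = [t for t in trips if t[0] == target_id]
    found = set()
    for pid, entry, exit_, t_in in trips:
        if pid == target_id:
            continue
        for _, my_entry, my_exit, my_in in mine:
            same_route = entry == my_entry and exit_ == my_exit
            if same_route and abs(t_in - my_in) <= tolerance:
                found.add(pid)
    return found


def sensor_ranking(trips, tolerance=timedelta(minutes=5)):
    """Rank passengers by number of distinct co-travellers (most contacts first)."""
    ids = {t[0] for t in trips}
    contacts = {pid: len(co_travelers(trips, pid, tolerance)) for pid in ids}
    return sorted(contacts.items(), key=lambda kv: (-kv[1], kv[0]))


like_a = co_travelers(TRIPS, "A")
ranking = sensor_ranking(TRIPS)
```

The pairwise comparison here is quadratic in the number of trips, so a real implementation over the full travel database would need indexing or time bucketing.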
Case Analysis
Equipped with Esri's ArcGIS maps, gigabytes of data can be visualized clearly. Users can combine different types of visualization to conduct analyses and case studies that help the city understand more about the ongoing pandemic.
Here we introduce three major research ideas. The first one is to monitor the pandemic development and the railway system's operation. The second one is to examine the effects of some control policies by MTR or the government. The third one is to identify and investigate passengers and stations with special identities.
Travel Pattern Analysis
With proper visualization, we can clearly see how the railway system operates in each period. Take the following travel pattern of a single day as an example:
Passengers' Top 200 Travel Pairs (Entry-Exit Station Pairs) of One Day in March 2020
We can identify the entry-exit station pairs with the highest passenger volume while also displaying the COVID cases. Station pairs with one end close to COVID cases, or even connecting two areas with COVID cases, may need more attention, since many passengers share the same travel pattern and therefore the same infection risk between those stations.
Essential Station Analysis
The map can also show changes after certain policies are implemented. For example, we define essential stations as the stations with comparatively little loss in passenger volume.
When the Government Advised Employees to Work From Home (Left) vs. A Normal Day in Pandemic (Right)
The two maps above show the top ten essential stations for two single days. Yellow marks stations that passengers kept using to enter the railway system despite the pandemic, and blue marks stations that passengers kept using to exit it. We can see from the map that Admiralty Station is normally a destination for many passengers, even during the pandemic. However, on the first day that the government advised employees to work from home in January 2020, Admiralty Station, the location of the Central Government Offices, disappeared from the top ten essential exit stations. This suggests, to some extent, that the advice was effective in reducing passenger flows in the railway system.
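One way to turn the essential-station definition into a ranking is to compute each station's relative loss in passenger volume against a baseline day and keep the stations with the smallest loss. The sketch below assumes this relative-loss formula and uses made-up volumes; the exact metric and baseline used on the platform are not stated in this article.

```python
def essential_stations(baseline, current, top_k=10):
    """Return the top_k stations with the smallest relative passenger volume loss.

    baseline and current map station name -> passenger count (entries or exits).
    Relative loss is (baseline - current) / baseline, an assumed definition.
    """
    losses = {}
    for station, base in baseline.items():
        if base <= 0:
            continue  # no meaningful loss ratio without baseline traffic
        losses[station] = (base - current.get(station, 0)) / base
    # Smallest loss first; station name breaks ties deterministically.
    return sorted(losses, key=lambda s: (losses[s], s))[:top_k]


# Made-up exit volumes for illustration only.
baseline_exits = {"Admiralty": 1000, "Central": 800, "Mong Kok": 500}
pandemic_exits = {"Admiralty": 400, "Central": 600, "Mong Kok": 450}

top_two = essential_stations(baseline_exits, pandemic_exits, top_k=2)
```

In this toy data Admiralty loses 60% of its exits and so drops out of the top two, mirroring the kind of change described for the work-from-home day.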
Commuter Analysis
Commuters are a particular group of MTR travelers. They are usually very dependent on public transportation and travel on a regular basis. We classify as commuters those who travel between two points each day, stay at their job station for at least 6 hours, and repeat this pattern at least three times a week. With visualization in ArcGIS, we can see the distribution of home and job stations for commuters in Hong Kong. The picture below shows the distribution of the top 20 home and job stations before and after the COVID outbreak. The pandemic caused the MTR to lose many of these travelers, possibly due to work-from-home policies.
Commuter density for home and job stations before COVID outbreak (left) and after COVID outbreak (right)
We can also identify some of the extreme riders and visualize their travel behavior. For example, early birds are defined as travelers who start their trip to the job station before 6 AM, and night owls are defined as travelers who start their trip back to the home station after 11 PM. The distributions of these two groups also show some differences.
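The commuter rule above can be sketched in Python as follows. The sketch reads "travel between two points each day" as a round trip between a home and a job station, and checks the 6-hour stay and the three-days-per-week repetition on one passenger's trips for one week. The trip layout, station names, and function names are hypothetical.

```python
from collections import Counter, defaultdict
from datetime import datetime, timedelta


def commuter_pair(trips, min_stay=timedelta(hours=6), min_days=3):
    """Return (home, job) if one passenger's weekly trips match the commuter rule, else None.

    trips: list of (entry_station, exit_station, entry_time, exit_time).
    """
    by_day = defaultdict(list)
    for trip in trips:
        by_day[trip[2].date()].append(trip)

    days_per_pair = Counter()
    for day_trips in by_day.values():
        day_trips.sort(key=lambda t: t[2])
        pairs_today = set()
        for i, (home, job, _, arrive) in enumerate(day_trips):
            # Look for a later trip back home after a long enough stay at the job station.
            for back_from, back_to, leave, _ in day_trips[i + 1:]:
                stayed = leave - arrive >= min_stay
                if home != job and back_from == job and back_to == home and stayed:
                    pairs_today.add((home, job))
        days_per_pair.update(pairs_today)  # each pair counts once per day

    if not days_per_pair:
        return None
    pair, days = days_per_pair.most_common(1)[0]
    return pair if days >= min_days else None


def workday(day, leave_hour=18):
    """Hypothetical round trip on a day of March 2020: arrive 09:00, leave at leave_hour."""
    return [
        ("Tuen Mun", "Central", datetime(2020, 3, day, 8, 0), datetime(2020, 3, day, 9, 0)),
        ("Central", "Tuen Mun", datetime(2020, 3, day, leave_hour, 0),
         datetime(2020, 3, day, leave_hour + 1, 0)),
    ]


week = workday(2) + workday(3) + workday(4)
result = commuter_pair(week)
```

Counting the resulting home and job stations across all classified passengers would then give the kind of station distribution shown on the map.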
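A small sketch of how such extreme riders could be labelled is given below. The cut-off times are parameters; the late cut-off is set to 23:00 on the assumption that night owls are late-evening travelers, and trips that start after midnight would need a service-day convention that this sketch does not handle. The function name and labels are hypothetical.

```python
from datetime import time


def label_commuter(to_job_start, to_home_start,
                   early_cutoff=time(6, 0), late_cutoff=time(23, 0)):
    """Label a commuter from the start times of the trip to work and the trip home.

    late_cutoff of 23:00 is an assumption; post-midnight returns are not handled.
    """
    labels = set()
    if to_job_start < early_cutoff:
        labels.add("early bird")
    if to_home_start > late_cutoff:
        labels.add("night owl")
    return labels


early = label_commuter(time(5, 30), time(18, 0))
late = label_commuter(time(8, 0), time(23, 30))
regular = label_commuter(time(8, 0), time(18, 0))
```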
Early bird commuters (left) and Night owl commuters (right) in a week
To further investigate the travel pattern of commuters, we analyze the travel records and visualize the routes most traveled by commuters. The visualization mostly aligns with our intuition, as most trips start in residential areas and end in business areas. The pattern remains mostly the same before and after the COVID-19 outbreak.
Top 100 traveled routes for commuters of one week in March 2020
Implication
As the COVID-19 pandemic remains a major public health burden for the world, this project aims to facilitate mobility research that takes the epidemic's development into account by building a scalable and secure one-stop platform for data querying and ArcGIS visualization. To increase the scalability of the backend, the team is working on replacing the current relational database with a new NoSQL database. Such improvements to both the frontend and the backend can make the platform more robust and scalable, which is of core importance as the number of concurrent operations on the platform increases. Our team hopes the platform can help researchers use mobility data to look into ways to contain the spread of the coronavirus and, most importantly, to derive knowledge that keeps the world prepared for future outbreaks.