Abstract
Sharing large-scale transportation data benefits transportation planning and policymaking; however, it raises privacy concerns, because such data can include identifiable personal information, such as individuals’ home locations. To address these concerns, synthetic data generation based on real transportation data offers a promising solution that protects privacy while potentially preserving data utility. Although various synthetic data generation techniques exist, they are often not tailored to the unique characteristics of transportation networks. In this paper, we use New York City taxi data as a case study to conduct a systematic evaluation of the performance of widely used tabular data generative models. In addition to traditional metrics such as distribution similarity, coverage, and privacy preservation, we propose a novel graph-based metric tailored specifically for transportation data. This metric evaluates the similarity between real and synthetic transportation networks, providing potentially deeper insights into their structural and functional alignment. We also introduce an improved privacy metric to address the limitations of current metrics. Our experimental results reveal that existing tabular data generative models often fail to perform as consistently as claimed in the literature, particularly when applied to transportation data. Furthermore, our novel graph metric reveals a significant gap between synthetic and real data. This work underscores the need to develop generative models that take advantage of the unique characteristics of transportation networks. Full version at https://www.arxiv.org/abs/2502.08856.
| Original language | English |
|---|---|
| Title of host publication | Unknown book |
| Publisher | Springer Science and Business Media Deutschland GmbH |
| DOIs | |
| State | Published - 2025 |
UN SDGs
This output contributes to the following UN Sustainable Development Goals (SDGs)
- SDG 11 Sustainable Cities and Communities
Fingerprint
Dive into the research topics of 'A Systematic Evaluation of Generative Models on Tabular Transportation Data'. Together they form a unique fingerprint.