Dataset: on the similarity of web measurements under different experimental setups

Abstract: Measurement studies are essential for research and industry alike to understand the Web's inner workings better and help quantify specific phenomena. Performing such studies is demanding due to the dynamic nature and size of the Web. An experiment's careful design and setup are complex, and many factors might affect the results. However, while several works have independently observed differences in the outcome of an experiment (e.g., the number of observed trackers) based on the measurement setup, it is unclear what causes such deviations. This work investigates the reasons for these differences by visiting 1.7M webpages with five different measurement setups. Based on this, we build "dependency trees" for each page and cross-compare the nodes in the trees. The results show that the measured trees differ considerably, that the cause of differences can be attributed to specific nodes, and that even identical measurement setups can produce different results.
TechnicalRemarks: This repository hosts the dataset corresponding to the paper "On the Similarity of Web Measurements Under Different Experimental Setups", which was published in the Proceedings of the 23rd ACM Internet Measurement Conference 2023.

Cite this as

Demir, Nurullah, Hörnemann, Jan, Große-Kampmann, Matteo, Urban, Tobias, Holz, Thorsten, Pohlmann, Norbert, Wressnegger, Christian (2023). Dataset: on the similarity of web measurements under different experimental setups. https://doi.org/10.35097/1719

DOI retrieved: 2023

Additional Info

Field Value
Imported on November 28, 2024
Last update November 28, 2024
License CC BY-NC 4.0 Attribution-NonCommercial
Source https://doi.org/10.35097/1719
Author Demir, Nurullah
Given Name Nurullah
Family Name Demir
More Authors
Hörnemann, Jan
Große-Kampmann, Matteo
Urban, Tobias
Holz, Thorsten
Pohlmann, Norbert
Wressnegger, Christian
Source Creation 2023
Publishers
Karlsruhe Institute of Technology
Production Year 2022
Publication Year 2023
Subject Areas
Name: Computer Science