Cookiescanner: An Automated Tool for Detecting and Evaluating GDPR Consent Notices on Websites (2309.06196v1)

Published 12 Sep 2023 in cs.CY

Abstract: The enforcement of the GDPR led to the widespread adoption of consent notices, colloquially known as cookie banners. Studies have shown that many website operators do not comply with the law and track users prior to any interaction with the consent notice, or attempt to trick users into giving consent through dark patterns. Previous research has relied on manually curated filter lists or automated detection methods limited to a subset of websites, making research on GDPR compliance of consent notices tedious or limited. We present cookiescanner, an automated scanning tool that detects and extracts consent notices via various methods and checks if they offer a decline option or use color diversion. We evaluated cookiescanner on a random sample of the top 10,000 websites listed by Tranco. We found that manually curated filter lists have the highest precision but recall fewer consent notices than our keyword-based methods. Our BERT model achieves high precision for English notices, which is in line with previous work, but suffers from low recall due to insufficient candidate extraction. While the automated detection of decline options proved to be challenging due to the dynamic nature of many sites, detecting instances of different colors of the buttons was successful in most cases. Besides systematically evaluating our various detection techniques, we have manually annotated 1,000 websites to provide a ground-truth baseline, which has not existed previously. Furthermore, we release our code and the annotated dataset in the interest of reproducibility and repeatability.
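
The abstract compares detection methods by precision and recall against a manually annotated baseline. As a rough illustration of how such a comparison can be computed, the following minimal sketch scores two detectors against a ground-truth set. The function name `precision_recall`, the domain names, and the hit sets are hypothetical and are not taken from the paper's code or dataset.

```python
def precision_recall(predicted, annotated):
    """Return (precision, recall) for detected sites versus annotated sites."""
    predicted, annotated = set(predicted), set(annotated)
    true_positives = len(predicted & annotated)
    # Guard against empty sets to avoid division by zero.
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(annotated) if annotated else 0.0
    return precision, recall


# Hypothetical data: sites a human annotator marked as showing a consent notice.
ground_truth = {"a.example", "b.example", "c.example", "d.example"}

# Hypothetical detector outputs (illustrative only, not results from the paper).
filter_list_hits = {"a.example", "b.example"}
keyword_hits = {"a.example", "b.example", "c.example", "e.example"}

for name, hits in [("filter list", filter_list_hits), ("keywords", keyword_hits)]:
    p, r = precision_recall(hits, ground_truth)
    print(f"{name}: precision={p:.2f} recall={r:.2f}")
```

In this toy data the filter list makes no false detections but misses half of the annotated notices, while the keyword detector finds more notices at the cost of one false positive. That mirrors the kind of trade-off the abstract describes, without reproducing any of its measured numbers.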
