Polisis: Automated Analysis and Presentation of Privacy Policies Using Deep Learning

Abstract

Privacy policies are the primary channel through which companies inform users about their data collection and sharing practices. In their current form, policies remain long and difficult to comprehend, and thus serve mainly to protect the companies legally. Short notices based on information extracted from privacy policies have been shown to be useful and more usable, but they face a significant scalability hurdle, given the number of policies and their evolution over time. Companies, users, researchers, and regulators still lack usable and scalable tools to cope with the breadth and depth of privacy policies. To address these hurdles, we propose Polisis, an automated framework for privacy policy analysis. It enables scalable, dynamic, and multi-dimensional queries on privacy policies. At the core of Polisis is a privacy-centric language model, built with 130K privacy policies, and a novel hierarchy of neural network classifiers that captures both the high-level aspects and the fine-grained details of privacy practices. We demonstrate Polisis's modularity and utility with two robust applications that support structured and free-form querying. The structured querying application is the automated assignment of privacy icons from privacy policies. With Polisis, we achieve an accuracy of 88.4% on this task when evaluated against earlier annotations by a group of three legal experts. The second application is PriBot, the first free-form question-answering system for privacy policies. We show that PriBot can produce a correct answer among its top-3 results for 82% of the test questions.
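The top-3 figure quoted for PriBot is a top-k accuracy: a question counts as answered if at least one of the k highest-ranked results is judged correct. The following is a minimal sketch of that metric only, not the authors' evaluation code; the function name `top_k_accuracy` and the toy judgments are hypothetical and chosen purely for illustration.

```python
from typing import Sequence


def top_k_accuracy(ranked_correct: Sequence[Sequence[bool]], k: int = 3) -> float:
    """Fraction of questions with at least one correct answer in the top k.

    ranked_correct[i][j] is True if the j-th ranked answer to question i
    was judged correct (hypothetical input format, assumed for this sketch).
    """
    if not ranked_correct:
        raise ValueError("need at least one question")
    # A question is a hit if any of its first k ranked answers is correct.
    hits = sum(1 for flags in ranked_correct if any(flags[:k]))
    return hits / len(ranked_correct)


# Made-up judgments for four questions (illustrative, not data from the paper).
judgments = [
    [False, True, False],         # correct answer at rank 2
    [False, False, False, True],  # correct answer only at rank 4
    [True],                       # correct answer at rank 1
    [False, False, True],         # correct answer at rank 3
]

top3 = top_k_accuracy(judgments, k=3)
top1 = top_k_accuracy(judgments, k=1)
```

On these toy judgments, three of the four questions have a correct answer within the first three results, so `top3` is 0.75, while `top1` is 0.25.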

Publication
To appear in the 27th USENIX Security Symposium.
Date