Scalable Privacy Analysis
Lead PI:
Serge Egelman
Abstract

One major shortcoming of the current "notice and consent" privacy framework is that the constraints for data usage stated in policies—be they stated privacy practices, regulation, or laws—cannot easily be compared against the technologies that they govern. To that end, we are developing a framework to automatically compare policy against practice. Broadly, this involves identifying the relevant data usage policies and practices in a given domain, then measuring the real-world exchanges of data restricted by those rules. The results of such a method will then be used to measure and predict the harms brought onto the data’s subjects and holders in the event of its unauthorized usage. In doing so, we will be able to infer which specific protected pieces of information, individual prohibited operations on that data, and aggregations thereof pose the highest risks compared to other items covered by the policy. This will shed light on the relationship between the unwanted collection of data, its usage and dissemination, and resulting negative consequences.

We have built infrastructure into the Android operating system, whereby we have heavily instrumented both the permission-checking APIs and included network-monitory functionality. This allows us to monitor when an application attempts to access protected data (e.g., PII, persistent identifiers, etc.) and what it does with it. Unlike static analysis techniques, which only detect the potential for certain behaviors (e.g., data exfiltration), executing applications with our instrumentation yields real-time observations of actual privacy violations. The only drawback, however, is that applications need to be executed, and broad code coverage is desired. To date, we have demonstrated that many privacy violations are detectable when application user interfaces are “fuzzed” using random input. However, there are many open research questions about how we can yield better code coverage to detect a wider range of privacy- related events, while doing so in a scalable manner. Towards that end, we plan to virtualize our privacy testbed and integrate crowd-sourcing. By doing this, we will develop new methods for performing privacy experiments that are repeatable, rigorous, and gen- eralizable. The results of these experiments can then be used to implement data-driven privacy controls, address gaps in regulation, and enforce existing regulations.

Serge Egelman

Serge Egelman is the Research Director of the Usable Security and Privacy group at the International Computer Science Institute (ICSI), which is an independent research institute affiliated with the University of California, Berkeley. He is also Chief Scientist and co-founder of AppCensus, Inc., which is commercializing his research by performing on-demand privacy analysis of mobile apps for compliance purposes. He conducts research to help people make more informed online privacy and security decisions, and is generally interested in consumer protection. This has included improvements to web browser security warnings, authentication on social networking websites, and most recently, privacy on mobile devices. Seven of his research publications have received awards at the ACM CHI conference, which is the top venue for human-computer interaction research; his research on privacy on mobile platforms has received the Caspar Bowden Award for Outstanding Research in Privacy Enhancing Technologies, the USENIX Security Distinguished Paper Award, and privacy research awards from two different European data protection authorities, CNIL and AEPD. His research has been cited in numerous lawsuits and regulatory actions, as well as featured in the New York Times, Washington Post, Wall Street Journal, Wired, CNET, NBC, and CBS. He received his PhD from Carnegie Mellon University and has previously performed research at Xerox Parc, Microsoft, and NIST.

Performance Period: 01/01/2018 - 01/01/2018
Institution: International Computer Science Institute