Deep Learning with Differential Privacy

ABSTRACT

Machine learning techniques based on neural networks are achieving remarkable results in a wide variety of domains. Often, the training of models requires large, representative datasets, which may be crowdsourced and contain sensitive information. The models should not expose private information in these datasets. Addressing this goal, we develop new algorithmic techniques for learning and a refined analysis of privacy costs within the framework of differential privacy. Our implementation and experiments demonstrate that we can train deep neural networks with non-convex objectives, under a modest privacy budget, and at a manageable cost in software complexity, training efficiency, and model quality.
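The core mechanism in the paper is differentially private SGD: clip each example's gradient to a fixed L2 norm, then add Gaussian noise calibrated to that clipping bound before averaging. A minimal numpy sketch of one such step (function and parameter names are mine, not the paper's; the real implementation also tracks the cumulative privacy cost via the moments accountant):

```python
import numpy as np

def dp_sgd_step(per_example_grads, clip_norm, noise_multiplier, rng):
    """One differentially private gradient step (simplified DP-SGD sketch).

    per_example_grads: array of shape (batch_size, dim), one gradient per example.
    """
    # Clip each example's gradient to L2 norm at most clip_norm.
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = per_example_grads * scale
    # Sum, add Gaussian noise scaled by the clipping bound, then average.
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=clipped.shape[1])
    return (clipped.sum(axis=0) + noise) / len(clipped)
```

Because no single example can move the sum by more than `clip_norm`, the added noise masks any individual's contribution; the `noise_multiplier` is what the privacy analysis turns into an (ε, δ) guarantee.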

Source: http://arxiv.org/pdf/1607.00133v1.pdf

Applying ML to InfoSec — Startup.ML Conf

This should be very cool, offering more details of the kind that Elie Bursztein talked about at Usenix Enigma. (Gmail’s spam and virus filters also use TensorFlow.)

There seems to be very little overlap currently between the worlds of infosec and machine learning. If a data scientist attended Black Hat and a network security expert went to NIPS, they would be equally at a loss. This is unfortunate, because infosec can definitely benefit from a probabilistic approach, but a significant amount of domain expertise is required to apply ML methods. Machine learning practitioners face several challenges in this domain, including understanding the datasets, doing feature engineering (in a generalizable way), and creating labels.

To address some of the issues unique to adversarial machine learning, Startup.ML is organizing a one-day special conference on September 10th in San Francisco. Leading practitioners from Google, Coinbase, Ripple, Stripe, Square, etc. will cover their approaches to solving these problems in hands-on workshops and talks. The conference will also include a hands-on, 90-minute tutorial on TensorFlow by Illia Polosukhin, one of the most active contributors to Google’s new deep learning library.

Source: Applying ML to InfoSec — Startup.ML Conf

OpenAI

OpenAI is a non-profit artificial intelligence group. Our goal is to advance digital intelligence in the way that is most likely to benefit humanity as a whole, unconstrained by a need to generate financial return.

…it’ll be important to have a leading research institution which can prioritize a good outcome for all over its own self-interest.

We’re hoping to grow OpenAI into such an institution. As a non-profit, our aim is to build value for everyone rather than shareholders. Researchers will be strongly encouraged to publish their work, whether as papers, blog posts, or code, and our patents (if any) will be shared with the world. We’ll freely collaborate with others across many institutions and expect to work with companies to research and deploy new technologies.

Emphasis mine.

Source: OpenAI

The Unofficial Google Data Science Blog: Causal attribution in an era of big time-series da…

By Kay Brodersen: “For the first time in the history of statistics, recent innovations in big data might allow us to estimate fine-grained c…”

Many solutions to large-volume log processing stop at visualization: I can sample the aggregated stream and show analysts a graph or other visual indication that lets them decide when some factor has changed substantially enough to warrant investigating the cause. But this isn’t sufficient; as the post states:

What would we gain from solving automatic and fine-grained causal inference? The answer is straightforward — a way of determining just how much each impression, click, or download contributed to the desired outcome, such as a website visit, an app download, or an account sign-up. A way of assessing, therefore, how much value each of these events provided, and to suggest how limited resources should be spent to provide the highest marginal return.

In addition to impressions, causal inference can be used to detect any sort of anomalous behavior and attribute it to a source, allowing automated response. No more waiting for the analyst to spot the trend, dig through logs, and initiate a response. So methods like this, along with stateless infrastructure like containers, will allow more robust systems that manage themselves, leaving the professionals time to address edge cases and other problems of ever increasing complexity.
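The counterfactual-forecasting idea behind this kind of attribution can be sketched in a few lines. This is a deliberately minimal stand-in for the Bayesian structural time-series model that Brodersen’s CausalImpact actually uses: plain least squares on a control series, with hypothetical function and variable names.

```python
import numpy as np

def estimate_effect(control, target, pre_len):
    """Estimate a causal effect by counterfactual forecasting (simplified sketch).

    Fit target ~ control on the pre-intervention period, predict what the
    target would have been afterward, and attribute the gap to the intervention.
    """
    # Regress the target on the control series before the intervention.
    X_pre = np.column_stack([np.ones(pre_len), control[:pre_len]])
    beta, *_ = np.linalg.lstsq(X_pre, target[:pre_len], rcond=None)
    # Forecast the counterfactual for the post-intervention period.
    X_post = np.column_stack([np.ones(len(control) - pre_len), control[pre_len:]])
    counterfactual = X_post @ beta
    # The average gap between observed and predicted is the estimated effect.
    return float(np.mean(target[pre_len:] - counterfactual))
```

The same skeleton works for automated anomaly attribution: if the observed series departs from its forecast by more than the model’s uncertainty allows, flag it and point at the intervention window, with no analyst in the loop.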