Survivable Key Compromise in Software Update Systems

Today’s software update systems have little or no defense
against key compromise. As a result, key compromises have
put millions of software update clients at risk. Here we identify
three classes of information whose authenticity and integrity
are critical for secure software updates. Analyzing
existing software update systems with our framework, we
find their ability to communicate this information securely
in the event of a key compromise to be weak or nonexistent.
We also find that the security problems in current software
update systems are compounded by inadequate trust revocation
mechanisms. We identify core security principles that
allow software update systems to survive key compromise.
Using these ideas, we design and implement TUF, a software
update framework that increases resilience to key compromise.

Source: https://www.freehaven.net/~arma/tuf-ccs2010.pdf

Canary Analysis Service – ACM Queue

In 1913, Scottish physiologist John Scott Haldane proposed the idea of bringing a caged canary into a mine to detect dangerous gases. More than 100 years later, Haldane’s canary-in-the-coal-mine approach is also applied in software testing.

In this article, the term canarying refers to a partial and time-limited deployment of a change in a service, followed by an evaluation of whether the service change is safe. The production change process may then roll forward, roll back, alert a human, or do something else. Effective canarying involves many decisions—for example, how to deploy the partial service change or choose meaningful metrics—and deserves a separate discussion.

Google has deployed a shared centralized service called CAS (Canary Analysis Service) that offers automatic (and often autoconfigured) analysis of key metrics during a production change. CAS is used to analyze new versions of binaries, configuration changes, data-set changes, and other production changes. CAS evaluates hundreds of thousands of production changes every day at Google.

Source: https://queue.acm.org/detail.cfm?id=3194655
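
The canarying loop described above (partial deployment, evaluation, then roll forward, roll back, or alert a human) can be illustrated with a minimal sketch. This is not how CAS works internally; the function name, the error-rate metric, and the 10% threshold are all assumptions chosen for illustration.

```python
from statistics import mean


def evaluate_canary(control, canary, max_relative_increase=0.10):
    """Compare mean error rates of a control group and a canary group.

    Hypothetical helper: `control` and `canary` are lists of error rates
    sampled during the canary window. The threshold is an assumed value.
    """
    if not control or not canary:
        # Without data from both groups we cannot decide automatically.
        return "alert_human"

    baseline = mean(control)
    observed = mean(canary)

    if baseline == 0:
        # Control saw no errors, so any canary error counts as a regression.
        return "roll_forward" if observed == 0 else "roll_back"

    relative_increase = (observed - baseline) / baseline
    if relative_increase > max_relative_increase:
        return "roll_back"
    return "roll_forward"
```

A real analysis would look at many metrics and use a statistical test rather than a single mean comparison; the sketch only shows the shape of the decision.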

DevOps and containerization at scale

Since the split from eBay in 2015, PayPal has turbocharged DevOps. A key example of this in action is adopting Docker as our containerization technology to enhance the developer experience, reduce drift between environments such as test and production, and use data-center resources more efficiently. This session will discuss PayPal’s journey to dockerize 2,500 apps and hundreds of thousands of container instances.

Making “Push On Green” a Reality: Issues & Actions Involved in Maintaining a Production Service

Updating production software is a process that may require dozens, if not hundreds, of steps. These include creating and testing new code, building new binaries and packages, associating the packages with a versioned release, updating the jobs in production datacenters, possibly modifying database schemata, and testing and verifying the results. There are boxes to check and approvals to seek, and the more automated the process, the easier it becomes. When releases can be made faster, it is possible to release more often, and, organizationally, one becomes less afraid to “release early, release often” [6, 7]. And that’s what we describe in this article—making rollouts as easy and as automated as possible. When a “green” condition is detected, we can more quickly perform a new rollout. Humans are still needed somewhere in the loop, but we strive to reduce the purely mechanical toil they need to perform.

Source: https://www.usenix.org/system/files/login/issues/login_1410_online.pdf
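
The "push on green" idea above can be sketched as a simple gate: a rollout starts only when every automated check reports success. The check names and function names below are hypothetical and are not taken from the article.

```python
def is_green(checks):
    """Return True only if at least one check exists and all of them passed.

    `checks` maps a check name (hypothetical, e.g. "unit_tests") to a bool.
    """
    return bool(checks) and all(checks.values())


def next_action(checks):
    """Decide whether an automated rollout may begin."""
    return "start_rollout" if is_green(checks) else "hold"


if __name__ == "__main__":
    status = {"unit_tests": True, "integration_tests": True, "canary": False}
    print(next_action(status))
```

Treating an empty set of checks as not green is a deliberate choice in this sketch: missing signals should hold a release rather than allow it.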

Borg, Omega, and Kubernetes – Lessons learned from three container-management systems over a decade

Though widespread interest in software containers is a relatively recent phenomenon, at Google we have been managing Linux containers at scale for more than ten years and built three different container management systems in that time. Each system was heavily influenced by its predecessors, even though they were developed for different reasons. This article describes the lessons we’ve learned from developing and operating them.

Source: https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/44843.pdf