Auctions in Do-Not-Track Compliant Internet Advertising ABSTRACT
easy to link tracking information with PII [16]. Concern
Online tracking of users in support of behavioral advertis-
over tracking continues to grow, and has led for instance to
ing is widespread. Several researchers have proposed non-
the Do-Not-Track initiative launched by the Federal Trace
tracking online advertising systems that go well beyond the
Commission (FTC), and proposals for a do-not-track reg-
requirements of the Do-Not-Track initiative launched by the
istry [23]. Do-not-track, however, requires a public that is
US Federal Trace Commission (FTC). The primary goal of
largely unaware of tracking to opt-out. It also forces a trade-
these systems is to allow for behaviorally targeted advertis-
off between tracking and behavioral targeting: a compromise
ing without revealing user behavior (clickstreams) or user
that industry is not prepared to make. Not surprisingly, do-
profiles to the ad network. Although these designs purport
not-track has not gained a lot of traction with the public,
to be practical solutions, none of them adequately consider
and ad networks are actively fighting it.
the role of the ad auctions, which today are central to the
Several recent research projects take a different tack (Ad-
operation of online advertising systems. This paper looks at
nostic [24], Nurikabe [18], and Privad [13]). Rather than
the problem of running auctions that leverage user profiles
simply opting out of tracking, or pushing for data-protection
for ad ranking while keeping the user profile private. We
after the fact, they propose alternative advertising technolo-
define the problem, broadly explore the solution space, and
gies that allow for behavioral targeting without requiring
discuss the pros and cons of these solutions. We analyze
tracking. These are meant to be practical alternatives that
the performance of our solutions using data from Microsoft
industry finds attractive. None of these systems, however,
Bing advertising auctions. We conclude that, while none of
adequately explore how to operate the auctions that are
our auctions are ideal in all respects, they are adequate and
critical to current advertising systems. Without this com-
ponent, these systems leave unanswered what revenue thebroker (i.e. an ad network like Google) can earn, therebyreducing the likelihood that a non-tracking1 advertising sys-
Categories and Subject Descriptors
tem will be of commercial interest. In this paper, we look at
the problem of running auctions that leverage a user profilefor ad ranking while keeping the user profile private. General Terms
Although the non-tracking advertising systems cited
above differ in significant ways, they all share several key de-
sign components. All systems propose a software agent thatruns at the client and generates a user profile. All systems at
Keywords
least share the privacy goal that this profile not be revealed. All these designs propose that the broker transmit multiple
Auctions, Online Advertising, Privacy, Targeting
ads to the client, not all of which match the user profile. Forinstance, the ads may all be within a given interest category. INTRODUCTION
The client then locally selects from among these ads which
Third-party tracking of users is widespread and increas-
best match the user profile, and display these to the user.
ing [17]. One of the primary purposes of user tracking is
The client reports the result of this selection anonymously,
behavioral advertising. Although many companies that par-
and without letting the broker link together different com-
ticipate in tracking and behavioral profiling claim to not
ponents of the user profile. The key privacy mechanisms are
gather Personally Identifying Information (PII), it is often
therefore anonymity and unlinkability.
common pricing model for online advertising systems to-day is Pay Per Click (PPC): the advertiser does not pay
Permission to make digital or hard copies of all or part of this work for
the broker for showing an ad to a user, rather it pays
personal or classroom use is granted without fee provided that copies are
only if the user clicks on an ad. The broker selects which
not made or distributed for profit or commercial advantage and that copies
ads to show through an auction whereby advertisers bid
bear this notice and the full citation on the first page. To copy otherwise, to
against each other. In a PPC system, the broker maxi-
republish, to post on servers or to redistribute to lists, requires prior specific
mizes revenue by ranking the competing ads according to
permission and/or a fee. CCS’11, October 17–21, 2011, Chicago, Illinois, USA.
Copyright 2011 ACM 978-1-4503-0948-6/11/10 .$10.00.
By non-tracking, we mean no third-party tracking.
the Bid × ClickP robability product, and transmitting the
ability. Denoting bid as B, the ranking then is in order of
highest ranking ads to the client where they are displayed in
rank order. Of course, the broker doesn’t know the precise
When a user clicks on an ad, the ad ID is transmitted to
click probability for every ad. Rather, the broker tries to
the broker. The broker computes Cost per Click (CPC), that
predict the click probability as best it can. This prediction
is, the price that the advertiser must pay, using a second-
is based on a number of inter-related factors such as the ad
price auction [8]. In this approach, the price paid is pegged
keywords, the landing page keywords, the user search terms
to the bid of the ad that is ranked immediately below the
or keywords associated with the web-page being browsed,
clicked ad. To give a simple example, suppose that adver-
stored user characteristics, and so on.
tiser A is willing to pay as much as $5 for a click, and ad-
The user profile has a strong effect on click probability. To
vertiser B is willing to pay $10. In a second-price auction,
give a simple example, say a user searches for “running shoe”.
A could go ahead and bid $5 and B could bid $10. B would
Whether the user is a man or a woman, or prefers brand-
win, but would only pay incrementally more than A’s (2nd-
name products or discount products, plays an important role
in which running shoe ad he or she is more likely to click on.
Second-price auctions allow bidders to bid the maximum
In a do-not-track compliant advertising system, the broker
that they are willing to pay, rather than frequently modify
does not know the user profile: if the auction takes place
their bid in search of the value incrementally higher than the
at the broker in the same way that it does today, then the
next lower bidder. Specifically, the CPC is computed as:
user profile will not be factored into the result. Therefore
the highest Bid × ClickP robability ads won’t be selected,
leading to less revenue than should otherwise be possible.
This paper characterizes the problem, and proposes three
where Bn and Qn are the bid and quality score of the next
basic solutions, one of which is a variant of a method pro-
lower ranked ad, and Qc is the quality score of clicked ad.
posed in [12]. Following the lead of the initial work on non-
This CPC formula captures the minimum amount the ad-
tracking advertising, this paper takes a pragmatic approach
vertiser would have had to bid to beat the next-ranked ad
to the problem. It looks for a good trade-off between strict
in a first-price auction. Note that it prevents the advertiser
privacy guarantees and practical business and deployment
from paying more than it bid, even when the next-ranked in
concerns. As such, this paper explores the pros and cons of
the three approaches in terms of not just privacy (both userand advertiser), but also revenue, overhead, and vulnerabil-
Abstract Non-tracking Advertising
ity to attack. It uses around 2TB of auction traces from
In this section, we describe an abstract non-tracking ad-
Microsoft Bing to guide and validate the design choices.
vertising system that captures key aspects of the three ex-
Altogether, this paper makes the following contributions:
isting non-tracking advertising designs.
1) It proposes two new non-tracking auction designs, and
The principle components of the abstract non-tracking ad-
a third based on the previous work but substantially im-
vertising system include those of today’s tracking systems,
proved. 2) It analyzes the trade-offs between these three
the broker, the client, and the advertiser. A user profile is
designs in terms of privacy properties, auction properties,
stored at the client (i.e. the user’s computer, or a device
and fraud resistance. 3) It analyzes the effect of bid churn
trusted by the user: the distinction is not important for our
and auction timing on revenue and ad ranking using a trace
purposes). Each ad is associated with targeting information.
of Bing search advertising auctions, and uses this analysis
The user profile is defined as that information needed to de-
to argue for the feasibility of the solutions.
termine how well an ad’s targeting matches the user. Toproduce the user profile, the client monitors user behavior
BACKGROUND
(i.e. the user’s searching, browsing, purchases, and so on).
The privacy goals of the abstract non-tracking system are:
This section describes how current online advertising sys-
tems such as Google and Microsoft work, and then describes
• Anonymity: the broker cannot associate any unit of
an abstract alternative advertising model patterned after ex-
learned information with any user PII (including net-
isting proposals for non-tracking advertising. In the process,
we establish terminology and define the basic components.
• Unlinkability: the broker cannot associate separate
Current 2nd Price Ad Systems
units of learned information with a single (anonymous)client. This prevents a broker from building up a user
In current ad systems [7,11,20], advertisers submit ads to
profile, and then associating it with a known user using
a broker. Associated with each ad is a bid, one or more
keywords, and optionally some targeting information likedemographics (location, age, gender) or interests. When
The broker is assumed to be honest-but-curious. We be-
a client computer does a search or receives a web page
lieve that this is close to reality (brokers like Google can
with adboxes (space to place an ad), the broker identifies
generally be trusted to do what they claim they are doing).
the ads that match the search terms or keywords associ-
Nevertheless, we believe it is wise to avoid making it pos-
ated with the web page, and runs an auction. The auction
sible for brokers to obtain high-value information through
ranks the selected ads in order of highest expected revenue
hard-to-detect cheating, and our designs reflect this belief.
(Bid × ClickP robability), and transmits some number of
Figure 1 illustrates the basic architecture and message ex-
ads to the client. As already discussed, many factors are
change of an abstract non-tracking advertising systems. The
considered in estimating click probability. In this paper, we
network layer address of all messages is anonymized, which
refer to all of these factors taken together as a quality score
we represent as an anonymizing proxy. Messages are en-
Q, where a higher value means higher expected click prob-
crypted to prevent viewing by the anonymizing proxy. The
less than 1 that raises or lowers the click probability propor-tionally to its effect on the click probability defined by G.
Section 7 briefly discusses how U may be computed.
In current tracking advertising systems, the click normally
takes place almost immediately after the view, and so CPCis normally computed shortly after the ranking. As a result,the parameters that go into determining the ranking (B and
Q) do not change much between ranking and CPC. In non-tracking advertising systems, as explained later, some time
Figure 1: Abstract non-tracking advertising system.
may pass between when B is set by the advertiser and when
B is the broker, C is the client. Communications is
the ad is ranked, or between when an ad is ranked and when
through an anonymizing proxy. [x] denotes encryp-
CPC is calculated. Therefore, we set the following goals with
• The B, G, and U used for CPC calculation are the
client requests a set of ads of a given type (i.e. for a given
same as the B, G, and U used for ranking. Note in
product or service). The request must be generic enough
particular that if they are not the same, then it is
that a substantial set of clients can have legitimately made
possible for instance for the CPC to be higher than
the request (i.e. K-anonymity and L-diversity). A set of ads
the submitted bid of the clicked ad.
matching the type, each with identifier A and associated tar-geting information, are transmitted to the client and stored.
• The delay between ranking and CPC calculation is
When an adbox is presented to the client, for instance on a
small enough that the churn in B, G, or U does not
web page, the client selects among the stored ads those that
have a significant impact on rankings, CPC values, and
best match the user profile, and puts them in the ad box for
viewing by the user. The client reports the view and click of
What exactly comprises user score U depends on the client
the ad A on a webpage with the given URL. There is noth-
profiler and can vary from system to system. We can, how-
ing in the messages that allows the broker to link different
ever, classify user information into three time frames. At
messages as coming from the same user.
the time frame of months or even years are user demograph-
Our model has two necessary channels of communications
ics like gender, location, language, age, salary, and so on.
between client and broker, ad delivery, and view and click
User interests can also last years (e.g. coin collecting), but
reporting. Both channels present opportunities for the bro-
more typically last weeks (a new car), days (a new pair of
ker to learn information contained in user profiles, and cur-
shoes), or minutes (a pizza). If we assume that matching
rent non-tracking advertising designs protect these channels.
ads to the content of a web page or search page increases
Any new information that must be conveyed for the purpose
click probability, then user score can change in seconds or
of the auction must also be protected.
less. For instance, a user might be interested in tennis andmusic, but the user score for tennis ads may increase while
AUCTION GOALS
the user is looking at a tennis website, and vice versa formusic ads.
The privacy goals for the auction component of a non-
We do not make any assumptions about the relative im-
tracking advertising system are the same as described in
portance of B, G, or U . An ideal auction design however
Section 2: anonymity and unlinkability. This section de-
scribes the goals of the auction itself.
Besides the basic goal of running an auction that lever-
The primary goal of the auction in a non-tracking adver-
ages the user profile while prohibiting the broker from re-
tising system is to provide a second-price auction mechanism
constructing it, there are a few additional related goals that
that achieves close-to-ideal ranking of ads (i.e., in order of
Bid × ClickP robability). For today’s tracking advertisingsystems, leveraging the user profile is straightforward, since
• to maintain the privacy offered to the advertisers them-
the broker itself accumulates and maintains this informa-
selves. In particular, to prevent advertisers from learn-
tion. In a non-tracking system, the broker does not have
ing each other’s bids and budgets.
user profile information, but does have other information
• to maintain the level of click-fraud defense in current
that goes into the quality score Q. In other words, part of
the information used to produce Q is in the broker, and partis in the client. Therefore, we define a user score U which
• to minimize the overhead of the auction.
directly reflects the effect of the user profile, when matchedagainst an ad’s targeting information, on click probability.
As will become apparent in subsequent sections, our de-
We define a second quality score G, that reflects the remain-
signs do not perfectly achieve all of these goals. Rather, our
ing “global” information known to the broker.
designs offer trade-offs between these goals.
Specifically, this results in an ideal ranking and CPC of:
Note finally that it is possible that the value of U may cor-
relate with the probability that the user will buy the prod-uct or service being advertised assuming that the user has
clicked. If this is the case, then the advertiser would want
to express multiple bids as a function of U , since different
U ’s would produce different revenues for the advertiser. We
do not address this capability in this paper, but rather leave
For example, U could be a positive real value greater or
it for future work should it turn out to be important.
an ad, then the client computes the following values and
Rank at Client (RaC)
(Bn × Gn) product of the next-ranked ad times the
ratio (Un/Uc) of the user score of the next-ranked adUn and the user score of the clicked ad Uc.
[Bc,Gc]: The encrypted B and G for the clicked ad as re-
Upon reception of this message, the broker decrypts
[Bc,Gc]. It uses Gc and ((Bn × Gn) × (Un/Uc)) to com-
Rank at Broker (RaB)
pute the CPC as shown in Equation 2. The broker alsocompares the resulting CPC with the decrypted Bc.
CP C > Bc, then the broker knows that the client is en-
gaged in click fraud, and the broker can ignore the message. If CP C ≤ Bc, then the broker can accept the message, al-
though this doesn’t mean that the client is not engaged in
click-fraud. Other mechanisms, such as statistical analysis,
must be used to detect it as is done today.
Variation: It may not be necessary to transmit the en-
crypted values [Bc,Gc]. This is because Bc and Gc can be
looked-up using the ad ID Ac. The danger here is that the
looked-up values may be different from the Bc and Gc val-ues used to rank the ads. How different depends on the
Rank at 3rd Party (Ra3)
level of churn in B and G values, which we found to beminimal in the Bing auction trace (Section 6). Therefore,
Figure 2: Three basic auction schemes. A is an ad
it may well suffice to use looked-up values rather than val-
ID unique to the ad, while ID is an Ad ID unique to
ues stored along with the ad at the client. Note that this
the combination of ad and client. The subscripts ‘c’
variation applies to RaB and Ra3 as well.
and ‘n’ refer to the clicked ad and the next ranked adrespectively. [x] denotes encryption of x. Messages
Rank-at-Broker (RaB)
between client and broker pass through anonymiz-
One concern with RaC is that the value (BxG) exposes
ing proxies. Messages through solid-line proxies are
information about the advertiser (see Section 5.1.3). This
encrypted. Messages though dotted-line proxies are
can be avoided if the ranking is done at the broker. The RaB
scheme presented here is similar to the approach proposedby Privad in [12]. We present it here for completeness.
Along with the ad, the broker transmits the following to
DETAILED DESIGNS
To run the auction specified in Section 3, the system doing
the ranking must have access to the bid B, the broker quality
score G, and the user score U . This means that either 1) B
ID: An identifier unique to this specific delivery of this ad
and G are sent to the client, 2) U is sent to the broker, or
(among all other deliveries). In other words, the same
3) B, G, and U are all sent to a 3rd party (Figure 2). These
ad delivered to other clients would have a different
basic approaches are explored in the following sections. Rank-at-Client (RaC)
[B × G]: A single value which is the product of (B × G),
In this approach, the following information is transmitted
encrypted with a symmetric key known only to the
with the ad to the client along with everything else required
by the advertising system (e.g. targeting information, not
[B, G]: The values B and G, encrypted with a symmetric
The client computes a user score U for each ad (in the
(B × G): A single value which is the product of (B × G).
absense of knowledge of what web page the ad may be shownon). In order to obscure the user profile, the client assigns
[B, G]: The individual values B and G, encrypted with a
a random value for U for those ads for which the client has
symmetric key known only to the broker.
a very low user score (for instance because the demographicdoesn’t match that of the user).
When a collection of ads arrive at the client, it ranks all
Clients transmit the ID,U tuples to the broker via a proxy.
ads using (B × G × U ). Note that in this case U is current,
These messages are not encrypted. The proxy also remem-
while (B × G) is more-or-less older. If the user clicks on
bers which IDs were received from which clients.
For each received ID, the broker looks up the current val-
in Sections 5.1.1 and 5.3.3. When a user clicks on an ad, the
ues of B and G. It then uses B, G, and U to rank each
client encrypts (Un/Uc) with the broker’s public key. In the
received ad among a large number of recently received ads
case of RaC, it also encrypts (Bn × Gn) with the broker’s
(say, those received over the last hour), and associates a rank
public key. In the case of RaB and Ra3, the broker provides
number Rank to each ad. Closely ranked ads may have the
the encrypted [Bn × Gn], but using its public key instead of
same ranking number. The broker transmits the ID,Rank
a symmetric key. For all three schemes, the broker provides
[1/Gc], again encrypted with the broker’s public key. Using
The proxy looks up which client is associated with each
homomorphic property of the encryption, the client is able
ID, and forwards the message on to the client. The client
disregards the ads related to the low user scores, and uses
the remaining ranking for selecting ads to put in ad boxes.
When a user clicks on an ad, it transmits the following
and transmit the resulting value in the click report. Uponreceiving a click report, broker decrypts the value to obtain
the CPC. Although homomorphic encryption is relatively
expensive, there is no need to do the operation in real-time.
n × Gn]: The encrypted (B × G) for the next-ranked ad
Rather, the client can do the operation when it has spareCPU cycles before transmitting, and the broker can likewise
(Un/Uc): A single value which is the ratio of the user score
run the operations later on as batch processing.
of the next-ranked ad Un and the user score of theclicked ad Uc. AUCTION ANALYSIS
This section analyzes the three types of auctions in terms
c ,Gc ]: The encrypted B and G for the clicked ad received
of privacy, auction quality, and attacks on the auction sys-tems.
Upon reception of this message, the broker decrypts [Bn ×
Privacy Properties
compute the CPC as shown in Equation 2. As with RaC, the
In this section, we look at the information that is conveyed
broker also compares the resulting CPC with the decrypted
for the sake of the auction between honest-but-curious play-
ers, and determine whether it constitutes a privacy threat.
The proxy prevents the broker from learning the identity
In Section 5.3 we relax this assumption.
of the client whose ads are being ranked. The per-client-per-ad unique ID prevents the proxy, which does know the
Broker analyzes (Un/Uc) (RaB and Ra3)
client identity (network address), from learning which ad,
In order to exploit this value to gather more information
and therefore what targeting information, is being referred
about the user profile, the broker would have to first tease
apart the values of Un and Uc, then use the value combined
Even with the noise added to low user scores, we are con-
with the ad targeting to reverse engineer the user profile, and
cerned that the non-noise user scores can be interpreted at
then use the user profile knowledge to link together multiple
the broker as a kind of fingerprint over the set of ads (i.e.,
reports. The first step may be made difficult by making
ads targeted to men should have uniformly higher scores for
(Un/Uc) relatively coarse-grained, thus making it harder to
men, and lower scores for women). In this way, the broker
uniquely factor out its components. The second step is made
could potentially tease out the profile of users.
difficult simply by the sheer number of clients that are likelyto have similar user scores. Thus, we conclude that exposure
Rank-at-3rd-Party (Ra3)
of (Un/Uc) does not constitute a serious threat.
This approach is similar to RaB, but prevents the finger-
printing mentioned above. The main difference between Ra3
Broker analyzes ((Bn × Gn) × (Un/Uc)) (RaC)
and RaB is that in Ra3, the broker additionally sends the
This value is more difficult to reverse engineer than
unique ad IDs and (B × G) products to a 3rd party system
(Un/Uc), and is therefore also not a threat.
which is trusted not to collude with the broker. This in-formation must be delayed long enough that the 3rd party
Client analyzes (B × G) (RaC)
system cannot use a timing attack to correlate the values
An advertiser can use this value to determine the broker
associated with a single user. This 3rd party also receives
quality score G assigned by the broker to its own ads. This
the user scores from the clients, and based on this informa-
can be done by the advertiser simply creating a client that
tion, ranks ads in the same way the broker does in the RaB
receives its own ads, and using the known value of B to factor
approach. Since, unlike the broker, it does not know which
out G. Whether this is a problem needs to be decided by
ads were transmitted to the same client, it cannot fingerprint
the broker, though we point out that today Google reveals
a coarse-grained quality score to its advertising customers.
The product (B × G) to the client also reveals the overall
Homomorphic Encryption Variant (RaC,
ranking of an ad to anyone running a client, including the
RaB, and Ra3)
advertiser’s competitors. From this, they can also roughly
A variation on all three auction designs is to use homomor-
estimate the advertiser’s bids. It is not clear that this is a
phic encryption (e.g., ElGamal [9]), which allows for multi-
problem, for two reasons. First, in today’s advertising sys-
plication operations on encrypted data. This may be used
tems, an advertiser can see how its competitors rank relative
to defend against certain attacks by the broker as described
to itself simply by observing how ads are displayed. RaC
makes it easier and cheaper to obtain this ranking infor-
the user score based on the number of clients. If the broker
mation, but does not fundamentally change an advertiser’s
predicts too high, then it can lower the minimum user score
ability to do so. Second, historically in traditional advertis-
and send this to clients, thus causing more clients to show
ing (print, TV, radio), advertisers can monitor how much
advertising their competitors do, and can generally knowthe cost of that advertising. While certainly all things being
equal advertisers would like to keep this information secret,
An important aspect of the auction is the scope of the
historically the inability to do so has not, for the most part,
auction, by which we mean the set of ads that compete in
prevented companies from advertising.
any given auction. As a general rule, the more ads that
If exposing the product (B × G) to the client is an accept-
compete, the higher the CPC. This is simply because the
able privacy loss, then RaC should be the preferred auction
more ads there are, the more probabilistically likely the next-
method for its overall simplicity and lower overhead. If it is
ranked ad will have a (B ×G×U ) closer to that of the clicked
not acceptable, then RaB and Ra3, which both avoid expos-
ad. On the other hand, the larger the auction scope, the less
ing this information, may be preferred.
fair it is in the sense that very different types of ads mustcompete. A local pizza store may not wish to compete with
Auction Properties
In this section, we discuss the various shortcomings of each
The auction scope for search or contextual systems like
of the approaches with respect to the auction properties,
Bing and Google is the set of ads whose keywords match that
especially ranking results and revenue.
of the search or web page. Today’s ad exchanges, where ad-vertisers bid in real time, typically for adboxes on premium
publishers, have a potentially much broader scope because
There are several potential delays in the non-tracking ad-
any advertiser can bid. The auction scope in a non-tracking
vertising systems that can change both the rankings and the
advertising system is tunable. It may be all ads in a client, or
computed CPC. With RaC, there is a delay between when
all ads within an ad type (i.e. an interest). What’s more, in-
the ad was transmitted and adbox time when the ranking
terests may be hierarchical (sports/tennis/clothing/shoes),
takes place. With RaB and Ra3, there is a delay between
and may be more general or more specific, thus allowing for
when the ranking occurs and adbox time when the ranking
substantial flexibility in auction scope.
is actually used. In either case, the bid B or the brokerquality score G used for ranking may no longer be correct,
and an out-of-date ranking takes place.
In this section, we relax our assumption of honest-but-
During the design of the auction approaches, these delays
curious players, and consider a number of malicious attacks
were a major concern. As it turns out, at least for the auc-
tion data from Bing search advertising auctions (Section 6),the delays have only a minor impact on both broker rev-
enue and advertiser costs, even when the delay is several
The client can commit a form of click fraud by lying about
hours or a day. Nevertheless, this may not be the case for
the value of ((Bn × Gn) × (Un/Uc)) (RaC) or (Un/Uc) (RaB
other systems or future systems, and so it remains important
or Ra3). By inflating or deflating these values, the client
that these delays are engineered to be minimal. This could
can cause advertisers to pay more or less, and cause pub-
be done, for instance, by having clients frequently request
lishers to earn more or less. At a high level, this is very
similar to normal forms of click fraud that occur today, andin this sense our auctions do not allow fundamentally new
forms of click fraud. Privad describes how to defend against
A problem encountered by non-tracking advertising is that
click-fraud even with anonymizing brokers [12]. The same
the broker does not know which clients are the best clients
method may be used here. The basic idea is that the proxy
to send an ad to. For instance, suppose that some number
tags reports with a per-report unique identifier. If the bro-
of clients M have requests ads for watches. The broker does
ker suspects click fraud, it informs the proxy of the report
not know which clients may be interested in cheap watches,
ID of the suspicious report. If a given client is suspected
and which in expensive. The advertiser, however, might not
more times than some threshold, its reports can be tagged
have enough budget to pay for all the clicks that would result
by the proxy as coming from a suspected client. In some
if all watch ads are sent to all interested users.
cases the broker may suspect click fraud simply because the
Lets assume that the broker knows the clicks per
2nd price is impossibly high (i.e. higher than the 1st price
delivered-ad rate. From this, it can determine the number of
bid). In most cases, however, the broker may suspect click
clients N that should receive the ad without exhausting the
fraud through statistical analysis of the reports for given
advertisers budget. If it randomly chooses N clients among
the interested clients, then it will not be sending all ads tothe most interested clients. Proxy fingerprints client user scores and re-
One way to solve this problem is for the broker to go ahead
and send the ad to all interested clients, but to also send
It is difficult but conceivable in RaB and Ra3 that the
a parameter giving the minimum user score that a client
proxy could determine user profiles through observation of
must have in order to show the ad. This way, only the
the client user scores and rankings. For instance, the proxy
best matching clients will show the ad. The broker may
could establish a number of fake clients that pretend to have
be able to establish the expected click per delivered-ad rate
various profile attributes, and establish fingerprints of the re-
for various user scores, and therefore predict the setting of
sulting user scores and rankings. One way to do this might
be to determine (B × G) given user scores and correspond-
can manually update the bid; second, the ad network can
ing ad ranks, and use these values as the fingerprint. The
automatically update the bid (as directed by the advertiser);
proxy could then compare these fingerprints with the cor-
third, a 3rd-party may update the bid on the advertiser’s
responding fingerprints of real clients. It could be that the
behalf. Each of these has different churn characteristics:
signal-to-noise ratio is high enough to successfully pull off
Advertiser: Manual updates, we believe, cause very little
this attack. One way to prevent this would be to encrypt
churn since they are reactive over a long feedback cycle.
user scores and rankings. The user scores could be encrypted
Advertisers receive updated campaign information (i.e. how
using the brokers (RaB) or 3rd-party’s (Ra3) public key, and
many clicks, actual amount charged, budget left) at fairly
the rankings could be encrypted using symmetric keys cre-
coarse intervals (few times a day). This limits the number
ated by the clients and conveyed securely to the broker or
of informed changes to their campaigns.
3rd-party. These symmetric keys would be frequently mod-
Ad Network: The advertiser can invoke functionality pro-
ified to prevent the broker or 3rd-party from linking user
vided by the ad network to optimize his bidding strategy.
scores with the same client, and possibly launching a simi-
For example, the ad network may allow the advertiser to
set a preferred rank (e.g. position 4), and the ad networkautomatically lowers or raises the bid to satisfy the request
Broker manipulates [Bn × Gn] (RaB, Ra3)
based on the market. Other examples may include auto-
A malicious broker could launch an attack on a non-
matically modifying bids to meet a target number of im-
tracking advertising system to identify clients by inserting
pressions per day (while still being charged only for clicks),
unique IDs into the encrypted fields [B, G] or [B × G]. Once
or modifying bids based on time of day etc. Some of this
a client is identified in this way, unlinkability is lost, and the
functionality (e.g. modify bids based on time-of-day) can be
broker can build up client profiles. The broker can then po-
implemented in the client and would therefore not result in
tentially identify the client through external means. There
any added churn. Other functionality (e.g. preferred rank)
is some cost to this approach, as the broker must “waste” ads
tends to be implemented today as a periodic update (once
to do the tracking2. The homomorphic encryption variant
described in Section 4.4 defends against this attack. Because
3rd Party: Search Engine Optimization (SEO) companies
the client multiplies the received encrypted fields with other
optimize their client’s bidding strategy in real-time [6] e.g.
fields, the values generated by the broker are obscured.
based on trending terms, real-time click-through rates, etc. This could potentially result in high bid churn, however, due
Discussion
to the premium nature of these SEO services, only a small
RaB, which was proposed in [12], is a weak system be-
cause it opens up a fingerprinting attack at the broker. Ra3
Aside from changes in B, G can also change. Recall G
solves this problem, though at the expense of requiring yet
in our model is a function of what the broker knows: G is
another administratively distinct and non-colluding system.
computed based on the ad (past CTR, landing page quality,
Nevertheless, we consider it to be better than RaB.
etc.). G is largely a property of the ad itself, which we
If exposing the (B × G) product to the client is not a
don’t expect to change quickly or dramatically. In any event,
problem for the broker and advertisers, then RaC is better
our Bing auction trace unfortunately does not allow us to
than Ra3 because it is simpler, incurs less overhead and
validate our assumption since it does not isolate user-derived
latency, and does not require the 3rd party. In addition, it
components of G from other components.
has no issues here with respect to user score churn, becauseranking takes place at ad view time. If exposing (B × G)
How Does Churn Affect Auctions?
is a problem, however, then Ra3 appears to be a reasonable
Today auctions take place at the time when an ad is dis-
played to the user; ranking and CPC calculations can im-mediately reflect any changes in B or G. Privacy preserving
EFFECT OF CHURN
auctions described in Section 4 are limited in terms of how
Section 5.2 describes how various delays in all three auc-
fast new B and G information can be incorporated. Since G
tion systems may distort rankings and CPC computation.
does not rapidly change over time or can be engineered to
How detrimental this delay is depends on how much churn
remain relatively stable (e.g., using U to reflect short-term
there is. Churn may affect rank, CPC value, and ultimately
changes in click probability), the main source of churn is
revenue. RaC is affected by churn in B and G, while RaB
the changes in B. To understand the effects of churn in B
and Ra3 are additionally affected by churn in U . In this
values, we simulate auctions that use stale B information
section, we use trace data from Microsoft’s Bing advertising
for ranking and CPC computation, and then compare the
platform to study in depth the effect of auction delays on B
resulting ranking and CPC computation with auctions that
and G from both the advertiser and broker perspective. We
find that, while churn exists, it has only a negligible impacton broker revenue and advertiser costs.
For our trace driven simulations, we sampled around 2TB
What Causes Churn?
of log data from Bing’s auction engine spanning a 48 hour
B × G for an ad changes when either B or G changes.
period starting September 1, 2010. The data covers over
B can change in one of three ways: first, the advertiser
150M auctions for over 18M unique ads shown to North
2One might argue that the same attack can be launched
American Bing search users across all search topics. The
simply be creating unique Ad IDs transmitted in the clear.
trace record for an auction lists all the ads that participated
However, this attack can at least be detected by third par-
in it (whether the ad was ultimately shown or not), the bids
ties, for instance running honey-farms of clients.
corresponding to each ad, the corresponding quality scores,
and which if any of the ads were ultimately clicked by the
Methodology
We re-compute auction rankings and the CPC for each
auction in our dataset using stale bid information; we varystaleness from 1 minute to 2 days.
Auction rankings are re-computed using bid and quality
data from the trace. Since our trace does not show when
the advertiser updated the bid, we infer the time based on
multiple auctions that a given ad participates in. If the bid
for the ad is the same for two consecutive auctions, we inferthat the bid did not change during that interval. If the bid
is different, we infer that the bid changed sometime between
the two auctions; we use the mid-point as the time of change.
To simulate an auction at time T with stale information fromd minutes ago, we simply use the bids current as of time
T − d in our trace. The quality score in the trace is based onuser features (e.g. search query), which correspond to U inprivate auctions; since the client always has the current value
advertisers or SEOs, may for instance, attempt to predict
of U we use the same quality score for simulated auctions as
what bid they might want to make 1hr hence, and enter it
into the system well in advance. Advance bidding would re-
CPCs are re-computed based on the re-computed auction
duce the effective staleness of information. For our purposes,
rankings (using the second-price formula of Equation 2). In
we assume the bidding strategy does not change.
other words, for an adbox at time T in the trace, we computethe ranking based on bid values recorded at time T − d
Simulation Results
and populate the adbox using resulting ranking. If the user
Overall our simulations show that there is no appreciable
clicked on an ad in this adbox, the bid of the next lower
change in broker revenue for using stale bid information;
ranked ad Bn that we use in the CPC computation is the
even in the most conservative cases, the revenue is within
±0.1% of today. For advertisers, while stale bids affect their
One limitation we face is that we cannot predict the
auction rankings, they do so in a balanced manner with cases
change in user behavior when auction rankings change. Con-
of higher-than-today rank canceling out cases of lower-than-
sider, for example, two ads A1 and A2 where in the trace
today rank resulting in zero net change.
they are ranked 1 and 2, while in the simulated stale auction
Figure 3 plots the change in broker revenue compared to
they are ranked 2 and 1 respectively. If the user clicked A2
today as a function of the staleness of information used and
in the trace, what might we expect the user to click in our
the user model. The x-axis varies the stateless of bids from
simulation? One option is to model the user as clicking the
1 minute to 2 days. The box-and-whisker plot varies the
same ad he clicked in the trace; thus in this case the user
user model with the top whisker showing the outcome where
clicks A2 in the simulation. Another option is to model the
the user clicks the same position, and the bottom whisker
user as clicking the same position he did in the trace; in this
showing when the user clicks the same ad; the top edge of
case the user clicks position 2 (A1 in the simulation). In
the box shows 75% same position and 25% same ad, and
reality, the user model is neither of these two extremes —
vice versa for the bottom edge of the box; the line in the
it is well-known that both ad content and rank effect click-
through rates (CTR) [10]. To account for this, we simulate
The first observation we make from Figure 3 is that under
5 user models: 1) same position, 2) 75% same position and
a 50-50 user model, change in revenue is practically 0% even
25% same ad, 3) 50%-50%, 4) 25% same position and 75%
with bid information as stale as up to 12 hours. Under the
same ad, and 5) same ad. Thus we establish an envelope of
75-25 and 25-75 models, the change is almost always between
possible user behavior to get a sense of the upper- and lower-
±0.05%, and only in the extreme cases 100-0 or 0-100 does
bounds of our simulation results. Note that always clicking
it pass ±0.1%. More importantly, the change increases very
on the same ad is a strictly conservative estimate. This is
gradually. This is good news since it means a private adver-
because an ad that was clicked in the trace but is not shown
tising system would not have a hard delay deadline beyond
to the user in our simulation (due to being ranked too low)
which there would be disproportionate change in revenue.
would not get clicked; at the same time, under the same-ad
Instead the system can strive to do the best it can, and re-
model, an ad that was not shown in the trace (due to being
duce revenue change proportionally. The extremely gradual
ranked too low) and was therefore not clicked would have
rate of change also means that system design trade-offs can
no chance of getting clicked even if it were to be shown to
be biased towards scalability and other engineering goals
the user in the simulation. This asymmetry biases the sim-
without much concern to revenue since it changes very little
ulation towards fewer clicks (and therefore lower revenues).
The only user model immune to this limitation is the same-
At first blush the effect of the “same-ad” user-model ap-
pears to be to reduce the revenue, but this is deceptive. As
A second limitation we face is that we cannot predict how
mentioned earlier, the more the user clicks on the same-ad
advertisers would change their bidding strategy in response
(going from 0% to 100% from the top whisker to bottom),
to auctions being based on stale information. Enterprising
the more biased the simulation is towards fewer clicks and
COMPUTING USER SCORE
So far, we have assumed the existence of a user score U
that, when multiplied with the quality score G produces the
expected click probability at the client for a given ad. Be-
cause clicks are relatively rare, it may be difficult to estimate
U at the client based purely on the click history of the client.
Therefore, we require that the broker anonymously and un-linkably gathers detailed click statistics from clients in order
to improve click probability estimates at individual clients.
In what follows, we sketch out an approach.
There are a number of measurable attributes X =
{x1, x2, .xL} at the client that may help in prediction ofclick probability. For instance, the level of interest (high
Average fraction of auctions affected for advertiser (%)
or low) in the ad’s product or service, the quality of the
match between the targeting and the user, the context ofthe webpage, as well as the user’s historic CTR. The idea
Figure 4: Fraction of auctions with modified rank-
is that each client reports this information anonymously to
the broker for each ad that it views and clicks. These re-ports contain: {Ad-ID, X, click}, where X is the values ofthe attributes, and ‘click’ indicates whether or not the ad
therefore less revenue. Recall that only the top-whisker is
was clicked. Given this information from many clients, the
broker can determine the effect of the attributes on click
The second observation we make from Figure 3 is the
probability, and convey this information to the client as a
slight upwards trend of the top-whisker signifying higher
function f of the attributes such that U = f (X), along with
revenues as more stale information is used. This suggests
the ad. This allows the client to compute U by measuring
a consistent trend of advertisers (as a whole) reducing their
the attributes and plugging them into the provided function.
bids over time. We don’t know the cause of this trend.
As mentioned, U in RaC can be computed at viewing time
Next we turn to the advertiser perspective. We compute
with the latest set of attributes without churn issues since
for each ad the fraction of auctions where the user-visible
that’s when the ranking takes place. The function f for a
simulated ranking increased or decreased compared to the
new ad can be initially set to that of similar existing ads
trace, and whether the ad became visible or invisible due
until enough data for the new ad is gathered. The details of
to being ranked high-enough or too-low as compared to the
trace. Figure 4 plots the average of these numbers across
One concern is that the set of attribute values X is unique
all ads as a function of the staleness of bid information
cern. First, the attributes may be fairly coarse-grained, thus
We first observe that both increased ranks and decreased
broadening the set of users to which they apply. Second,
ranks are roughly equal, and so average to nearly zero. The
some of the attributes may be hard to correlate using ex-
same is true for ads becoming visible or invisible. While this
ternal knowledge, such as the user’s CTR. Third, attributes
is consistent with the revenue change in Figure 3 averaging
like level of interest change from interest to interest, and
out to zero, we note that there are other ways the revenue
even within an interest over time, and therefore are hard to
could average out to zero while being unfair to advertisers.
link to the same user. Fourth, some attributes are not spe-
For instance, fewer increased ranks could have been com-
cific to the user, for instance webpage context. Finally, the
pensated by more cases where the ad became visible thus
only information beyond the attribute values that is leaked
still resulting in zero revenue change while being unfair to
is the ad viewed. In particular, the user’s click-stream is not
the advertiser; luckily, this is not the case.
exposed. We believe that it is reasonable to establish public
We observe next that there is a very small impact of stal-
policies that determine the nature of the attributes in such
eness on change in ranks; it begins with around 12% of
a way that meaningful privacy is preserved.
auctions for 1 minute stale data, and quickly converges toaround 16%. The reason this number is high is because ofthe cascade effect — if a single ad jumps from a low rank
RELATED WORK
to a high rank, it causes all the ads in between to register a
There is a substantial body of work on cryptographic pro-
“change” in rank; thus a single change in bid can affect up
tocols for privacy preserving auctions. Depending on the un-
to ten ads. The impact, however, is very little; the ad jump-
derlying security model these proposals can be classified into
ing from low to high might register a change of 10 ranks,
the following three categories. In the first category, there are
however, the other 10 ads would register a change of only 1
protocols that rely on computation that is distributed among
rank each (not captured in the graph). Overall we found a
auctioneers who jointly determine the outcome of an auction
median net change of 1 rank for every 820 auctions the ad
using threshold multi-party computation (e.g., [14, 15, 22]).
The second category of protocols introduces a semi-trusted
To summarize, based on extensive simulations across vary-
third party, aka an “auction issuer” or “auction authority”,
ing degrees of staleness and different user-models, there is
in addition to the auctioneer, and uses asymmetric multi-
little impact on broker revenue as compared to today, and
party computation technique, such as Yao’s garbled circuit
little impact on advertiser fairness as compared to today.
(e.g., [1, 2, 5, 19, 21]). Finally, protocols in the third categoryallow bidders to cooperatively compute the auction outcome
without relying on any trusted third party (e.g., [3, 4]). The
on Financial cryptography, pages 223–238. Springer-Verlag,
primary goal of all these proposals, and many others not
cited, is to keep bids and selling price secret from the auc-
[4] F. Brandt and T. Sandholm. Efficient privacy-preserving
tioneer and other auction participants. The problem that
protocols for multi-unit auctions. In Proceedings of the 9thinternational conference on Financial cryptography and
we address in this paper is different. This paper is primarily
Data Security, pages 298–312. Springer-Verlag, 2005.
concerned with protecting the user, not the bidder (i.e. ad-
[5] C. Cachin. Efficient private bidding and auctions with an
vertiser). Indeed the user does not exist in the prior work.
oblivious third party. In Proceedings of the 6th ACM
In any event, the high computational and communication
conference on Computer and communications security,
complexity imposed by aforementioned secure auction pro-
tocols make them impractical for our problem.
[6] S. Clifford. Instant Ads Set the Pace on the Web. The New
York Times, 2010. http://tinyurl.com/yl8dt29.
[7] DoubleClick. DART for Advertisers. http:
//www.doubleclick.com/products/dfa/index.aspx,
[8] B. Edelman, M. Benjamin, and M. Schwarz. Internet
This paper addresses the challenge of designing an on-
Advertising and the Generalized Second-Price Auction:
line advertising auction for a non-tracking advertising sys-
Selling Billions of Dollars Worth of Keywords. American
tem that leverages the user profile information while keep-
Economic Review, 97(1):242–259, Mar. 2007.
[9] T. ElGamal. A public key cryptosystem and a signature
ing the user profile private. We broadly explore the design
scheme based on discrete logarithms. Information Theory,
space, proposing three types of auctions, and analyzing their
IEEE Transactions on, 31(4):469–472, 2002.
properties with respect to privacy, auction quality and vul-
[10] J. Feng, H. Bhargava, and D. Pennock. Implementing
nerability to attack. Overall, we find that two of the sys-
sponsored search in web search engines: Computational
tems, Rank-at-Client (RaC) and Rank-at-3rd-Party (Ra3),
evaluation of alternative mechanisms. INFORMS Journal
are very acceptable designs. RaC is simpler and more effi-
[11] Google, Inc. AdWords: Advertise Your Business on Google.
cient, but has the drawback that information about ad qual-
ity and bid is leaked. On the other hand, this is information
[12] S. Guha, B. Cheng, and P. Francis. Privad: Practical
that can today be determined by placing ads and monitoring
Privacy in Online Advertising. In Proceedings of the 8th
the resulting ranking. Finally, noting that our auction de-
Symposium on Networked Systems Design and
signs suffer delays that cause out-of-date bid information to
Implementation (NSDI), Boston, MA, Mar 2011.
be used in rankings, we use Bing advertising system auction
[13] S. Guha, A. Reznichenko, K. Tang, H. Haddadi, and
P. Francis. Serving Ads from localhost for Performance,
trace to determine the effect of these delays. We find the
Privacy, and Profit. In Proceedings of HotNets ’09.
effect to be very minimal, and so conclude that our auction
[14] H. Kikuchi. (M+ 1) st-Price Auction Protocol. In
Proceedings of the 5th International Conference on
As future work, we plan to implement the auction sys-
Financial Cryptography, pages 351–363. Springer-Verlag,
tem to operate with a medium-scale deployment of Privad
planned for next year (several 10’s of thousands of users).
[15] H. Kikuchi, S. Hotta, K. Abe, and S. Nakanishi.
Distributed auction servers resolving winner and winning
Along with this, we plan to design and deploy mechanisms to
bid without revealing privacy of bids. In Parallel and
compute U , and measure their effectiveness. We also plan to
Distributed Systems: Workshops, Seventh International
do a measurement study of ads served by Bing to determine
Conference on, 2000, pages 307–312. IEEE, 2000.
to what extent advertisers can reverse-engineer each other’s
[16] B. Krishnamurthy and C. Wills. On the Leakage of
bids in today’s systems. This will quantify how much adver-
Personally Identifiable Information Via Online Social
tiser privacy loss is incurred by the Rank-at-Client scheme.
Networks. In Proceedings of WOSN ’09.
[17] B. Krishnamurthy and C. Wills. Privacy diffusion on the
Each of the non-tracking advertising schemes so far proposed
web: A longitudinal perspective. In Proceedings of the 18th
makes the tacit assumption that there is only a single bro-
International Conference on World Wide Web (WWW
ker, and a single profiler operating at each client. We are
interested in exploring what happens if there are multiple
[18] D. Levin, B. Bhattacharjee, J. R. Douceur, J. R. Lorch,
brokers with competing profilers in each client. In particu-
J. Mickens, and T. Moscibroda. Nurikabe: Private yet
lar, the profilers should be able to dynamically compete for
Accountable Targeted Advertising. Under submission. Contact [email protected] for copy, 2009.
ad boxes in real time, thus adding a new element to the auc-
[19] H. Lipmaa, N. Asokan, and V. Niemi. Secure Vickrey
tion that is not unlike the way ad exchanges operate today.
auctions without threshold trust. In Proceedings of the 6th
What’s more, these clients may also need to compete with
international conference on Financial cryptography, pages
existing tracking advertising systems in real-time auctions
run by existing ad exchanges. We plan on designing and
[20] Microsoft, Inc. Start advertising on Yahoo! Search and
Bing. https://adcenter.microsoft.com/.
[21] M. Naor, B. Pinkas, and R. Sumner. Privacy preserving
auctions and mechanism design. In Proceedings of the 1st
REFERENCES
ACM conference on Electronic commerce, pages 129–139.
[1] M. Abe and K. Suzuki. M+ 1-st Price Auction Using
Homomorphic Encryption. In Proceedings of the 5th
[22] K. Sako. An Auction Protocol Which Hides Bids of Losers.
International Workshop on Practice and Theory in Public
In Proceedings of the Third International Workshop on
Key Cryptosystems, pages 115–124. Springer-Verlag, 2002.
Practice and Theory in Public Key Cryptography, pages
[2] O. Baudron and J. Stern. Non-interactive Private
Auctions. In Proceedings of the 5th International
[23] Stanford. Do Not Track Universal Web Tracking Opt-Out.
Conference on Financial Cryptography, pages 364–378.
[24] V. Toubiana, A. Narayanan, D. Boneh, H. Nissenbaum,
[3] F. Brandt. Fully private auctions in a constant number of
and S. Barocas. Adnostic: Privacy Preserving Targeted
rounds. In Proceedings of the 7th international conference
Advertising. In Proceedings of NDSS ’10.
• Highly adaptable to meet ever-changing A broad range of data col ection and platform components including the latest Windows Mobile 6.1, and Cisco® Compatible Extensions (CCX) certification, ensuring wireless networks, to provide the robust • Latest generation of imaging technology performance metrics, simplify IT support and control operating costs. extend enterprise applica
Primary Hyperparathyroidism By Associate Professor Terry Diamond, Department of Endocrinology, University of New South Wales What is Primary Hyperparathyroidism? This is a syndrome comprising of an overproduction of parathyroid hormone (PTH) by one or more abnormal parathyroid glands resulting in hypercalcaemia (an elevated serum calcium level). Normal Parathyroid Anatomy: The para