
privacy-preserving-ads's Introduction

This is the repository for various privacy-preserving ads proposals.

Ad Selection API Proposal

Overview

The documents in this repository describe the Ad Selection API, a privacy-preserving ads (PPA) proposal that is substantially similar to other ad serving proposals in structure, flow, and syntax. It differs in its overall model and infrastructure in ways we believe provide critical capabilities for the open web ecosystem to move effectively to privacy-preserving ads APIs.

We plan to hold regular meetings under the auspices of the WICG to go through the details of this proposal and quickly respond to feedback. Please comment on the timing question in Issue #50 if you want to participate in these meetings to influence the direction of the proposal.

If you are ready to dive in, we recommend you start with the following content:

  • The Ad Selection API overview, which describes the proposal at a high level, including descriptions of the privacy model, infrastructure, and features.
  • The Auction & infrastructure design, which contains diagrams of the auction and ad serving flows and how they are similar to and different from other proposals in this space.

Since this API leverages similar concepts from other proposals, some of the concepts referenced are already well-described in those proposals' GitHub repositories. We aim to minimize duplicative explanations and definitions in favor of focusing on documenting key additions and differences.

Documents in this repository:

  1. Proposal overview
    1. Ad Selection API overview: an overview of the Ad Selection API, including the rationale for a new proposal.
    2. API differences: a high-level overview of differences between the Ad Selection API and other industry proposals.
  2. Data flows & examples
    1. Auction & infrastructure design
    2. Life of an ad request
  3. API specification

Background reading: the Protected Audience API

Since this proposal leverages many of the concepts and terms used in the Protected Audience API proposal, we recommend you review the following resources as part of reviewing this proposal:

Archived Proposals

This repo has hosted a variety of proposals intended to help with the effort to enable privacy-preserving advertising on the web. While these proposals are not actively being worked on, we offer historical links here for educational purposes.

privacy-preserving-ads's People

Contributors

brandr0id, caraitto, erik-anderson, jrmooring, keldaanders, mehulparsana, microsoftopensource, ptetali2, travisleithead


privacy-preserving-ads's Issues

K-anon check vs. dynamic URLs

Hello Edge folks! We're delighted to have you on board.

Can you explain a little more how you are planning to handle the combination of dynamically-generated ad URLs and checking for k-anonymity? This seemed challenging to us. WICG/turtledove#729 offered a partial solution, but it seems like you're going farther than that, and I'd like to understand the trade-offs.

The current Protected Audience flow is:

  1. The IG's generateBid gets called once with the full list of available ad URLs, i.e. both k-anon and non-k-anon ads.
  2. If the IG bids with one of the non-k-anon choices, then the same IG's generateBid is called a second time, but this time with only the ads in its list that are already k-anon. Now it has a guaranteed chance to bid with an ad that's over the k-anon threshold. (A sketch of this two-pass flow follows below.)
    Of course this flow only makes sense because the browser already knows (in advance of the auction) the k-anon status for every ad in the IG.
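
To make the two-pass flow concrete, here is a minimal sketch of the browser-side logic; generateBid, ads, and isKAnon are stand-ins for the real Protected Audience machinery, not its actual API.

// Pass 1: bid with the full ad list, k-anon or not.
// Pass 2: if the winner is not k-anon, re-run generateBid restricted to
// ads already known to be k-anonymous.
function runTwoPassBidding(ig, isKAnon) {
  let bid = ig.generateBid({ ads: ig.ads });
  if (bid && !isKAnon(bid.ad)) {
    const kAnonAds = ig.ads.filter(isKAnon);
    bid = kAnonAds.length ? ig.generateBid({ ads: kAnonAds }) : null;
  }
  return bid; // either references a k-anonymous ad, or there is no bid
}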

I understand that your choice to not support on-device bidding makes your life easier here because your k-anon checks are server-to-server, which makes step 1 feasible in real time. But I still don't know how you handle step 2.

MaskedLARK: No interaction between the helpers → no input validation?

In the MaskedLARK proposal, there is a claim that the helpers do not need to communicate. I think this opens up attacks by dishonest clients sending invalid secret shares that don't sum to proper ranges (binary, etc.). Adding interaction (via more complex MPC) can prevent this bad outcome and reduce the "blast radius" of a single corrupted record.

I think this should be considered as an extension to the proposal.
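
To illustrate the concern, here is a minimal sketch with additive secret sharing of a label that should be binary; all names are illustrative.

// Split a label into two shares, one per helper; each share alone looks
// like random noise, and only the sum reveals the label.
function share(label) {
  const r = Math.floor(Math.random() * 1000);
  return { helperA: r, helperB: label - r };
}

const honest = share(1);      // reconstructs to a valid binary label
const dishonest = share(100); // reconstructs to an out-of-range value

// Without interaction between the helpers (or a range proof), helperA
// cannot tell honest.helperA apart from dishonest.helperA; detecting
// the invalid record requires combining both shares.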

Reference to Parakeet as a previous proposal, probably a typo

The key difference between previously proposed approaches such as TURTLEDOVE and PARAKEET is multiple ad requests with segregated information vs. a combined anonymized ad request.

Is PARAKEET supposed to be FLEDGE or SPARROW in the above sentence?

Scheduled Calls for Ad Selection API

As a follow-up to sharing our Ad Selection API proposal, we will schedule and resume biweekly meetings to encourage open discussion, review technical details, and resolve open issues.

Note: The 8/8 meeting has been cancelled! We look forward to having more discussions in the future 😄

If you want to discuss a topic in the next meeting, please submit a GitHub issue here!

Upcoming Meeting: Thursday August 22nd, 08:00a PST/11:00a EST
Agenda & Meeting Notes

Join on your computer, mobile app, or room device
Meeting ID: 253 510 859 112
Passcode: iFpcJ7
Or call in (audio only): +1 323-849-4874,,690773170# (United States, Los Angeles)
Phone Conference ID: 690 773 170#

Following Meeting: Thursday August 22nd, 08:00a PST/11:00a EST

If you want to participate, please make sure you join the WICG: https://www.w3.org/community/wicg/

Clarify "contextual" vs "user" division of signals

In the discussion on 4/21, I realized I had the wrong model of the division between contextual and user signals.

Per your API flow diagram, I was under the impression that the user signals S were cross-domain signals, with the transformation from S to S' happening in advance — the top part of the Data Flows diagram.

But it sounds like your notion of user signals S includes first-party user data as well. That means that the browser doesn't have any technical means of ensuring it's passed through the Browser Targeting Storage — a publisher could add information about prior behavior on their own site to C, but you want them to add it to S instead.

It might be worth a paragraph explaining the details of the flow involving first-party audiences.

frequency capping support

Hi all,

Re-reading through the proposal, I couldn't find any mention of frequency/recency capping support.
Was that use case considered? Is it supported in the current version of either PARAKEET or MaCAW? If not, do you envision support in the future? If yes, how?

Thank you for your help and clarification on the topic

MaCAW with FLEDGE-style contextual signals C

Thank you for publishing MaCAW! This additional phase for MPC-based ad selection with higher-fidelity signals sounds extremely promising. In particular, it makes it seem more plausible that we can increase the privacy parameters for the lower-resolution (C', S') in the first PARAKEET round while using the second round to regain utility.

Now that the MPC is based on unmodified contextual signals C, I'd like to better understand what form those signals take. In PARAKEET you said:

We propose to remove page URL and page title from ad request. Websites can work with SSP or ad network prior to get page category and keywords to pass in an ad request.

So let's dig into what "prior" means (emphasis added).

Issues #4 and #5 make it sound like there could be direct interaction between the browser and some ad networks at page rendering time before contacting the PARAKEET service. This interaction doesn't involve any user features S, but its view of page context is unrestricted.

Could that direct interaction involve an ad network providing both purely-contextually-targeted ad candidates and the contextual signals C that feed into PARAKEET+MaCAW?

Of course I understand that I'm asking about making the PARAKEET flow look a little more like FLEDGE's with a purely-contextual request. But maybe this is what you expected was hiding in the word "prior" all along.

PARAKEET & retargeting

Hi,

I am trying to better understand how the PARAKEET proposal works with the retargeting use case.

Let's say a user visits an electronics store and is interested in a product - a red iPhone 10. The electronics store would like to run a retargeting campaign and advertise this exact product to the user.

Question 1: what could a call to joinAdInterestGroup look like in this case? Perhaps something like this?

navigator.joinAdInterestGroup({
  origin: 'electronicsstore.com',
  interests: 'red-iphone-10',
  (...)
}, (...))

Later on the user visits two other advertisers and they add two more interests: 'yellow-nike-air-force-gym-trainers' and 'pirelli-all-year-250/80/16'.

Question 2: I was wondering, how would these interests be passed to the buyers during an auction? Would they be passed as a tuple in the 'user-ad-interests' field?

'user-ad-interests': 'red-iphone-10,yellow-nike-air-force-gym-trainers,pirelli-all-year-250/80/16'

I have some follow up questions in my mind, but before I ask them, I'd be grateful if you could confirm if my understanding so far is correct.

Best regards,
Jonasz

PARAKEET with Noisy Ranking: Advertiser isolation

An advertiser might not want to provide signals to be potentially used to their competitors’ advantage. To satisfy the advertiser’s needs, the buyer could try to isolate users’ data and compute embeddings separately. During an auction, such isolated advertisers should be considered independently, even if represented by the same buyer.

PARAKEET with Noisy Ranking seems to lack mechanisms to support that, with the following scoped per buyer:

  • User profile
    • Shared storage
    • Browser embeddings
  • User privacy protection mechanisms
    • Throttling
    • Random delays
    • Caching

What do you think about this use case?

Call to discuss MPC and measurement

Per the discussion in this issue, we had a one-off call:

  • Date and time: Monday, November 8th at 9am PT/12pm ET.
  • Minutes

There's a lot of activity happening across various entities that are working to propose novel solutions to use multi-party computation to help solve measurement use cases in a privacy-preserving way:

There are higher-level questions across all of these for which we'd ideally have a forum to discuss how different folks want to do evaluations and comparisons, explore collaboration opportunities, discuss higher-level policy and threat model questions, etc.

There's already the every-other-week Attribution Reporting API meeting, which likely has many of the right participants, but that meeting naturally gravitates toward discussing the specific proposals in that repo and their implementation status; it's difficult there to structure some of the broader coordination on MPC proposals.

Given TPAC is ongoing, I'd propose that we start the week of November 1st or later. Given participant overlap, it may make sense to slot this in opposite the existing Attribution Reporting API meeting.

If you are interested in participating in this, please comment on this issue and state if every other Monday at 9am PT/12pm ET will generally work for you. If we get consensus on that time, then I'd like to have our first meeting November 8th.

If you'd like to propose an alternate set of possible times, we can evaluate that too.

PARAKEET: Clarify HTTPS requirements

The explainer does not mention whether there are any requirements on the URLs for the proxy, the reporting URLs, etc. It seems the proxy, at the very least, and most other endpoints should be required to use HTTPS.

PARAKEET origin

The PARAKEET spec says "an advertiser with the help of DSP (demand side platform) tools will be able to add the user to an ad interest group" and defines origin as "The advertiser domain that is adding the user to a specific interest group." This seems misleading in several ways: (1) typically the advertiser merely pays the DSP to place its ads and isn't actively involved in such transactions, which means the DSP will actually add the user to the interest group; (2) AFAICS there's no technical means to limit the use of the information gathered via joinAdInterestGroup() to a single advertiser or even to the DSP that gathers the information, e.g. the information could be shared with other entities in the ecosystem.

Ensuring Brand Safety with the PARAKEET model

Background
Brand safety in advertising is the function by which brands ensure that their ads show in safe environments. For example, an airline advertiser may not want its ads to show next to an article about the spread of COVID-19. Similarly, a mortgage lender or bank may not want its ads to show next to an article about a housing crisis. There are many reasons and triggers for brand safety, and today DSPs and SSPs manage brand safety by honoring an allow list or block list of sensitive domains, subdomains, URLs, or keywords. Keywords are the most granular brand safety signal and are useful where a site could carry content from many different categories and the details about the environment cannot be captured just by a URL.

Problems with Brand Safety in PARAKEET Model
PARAKEET protects user identity by anonymizing user interests and context when the ad request is sent to the SSP and DSP. The anonymization of the context makes it harder for the advertiser's brand safety signals to be applied. For example, if the only contextual signal is the main domain, then the SSP or DSP is unable to enforce brand safety at the sub-domain or keyword level.
This problem also exists for other privacy-preserving proposals, where the interest group ad request contains no contextual information, which hinders brand safety enforcement for those ad responses.

Proposals for Brand Safety in PARAKEET

Option 1 – Sending validated brand safety signals in the request path

Conceptually, the advertiser working with a DSP can define which context groups are to be avoided because of brand safety. This can be based on keywords or topics that are considered unsafe for their ads. PARAKEET can enable the sharing of brand safety categorization from the publisher context along with the ad request.

The brand safety tool provider reviews publisher-side context (URL, title, keywords on the page, etc.) and associates that context with a brand safety taxonomy classification.
For example, multiple references to the keyword ‘COVID’, or the URL of a page about the spread of COVID in a local region, may result in the context being classified as being about ‘public health’.

When a user later visits a publisher page, publisher provided code can obtain a signed brand safety category signal from the brand safety tool provider and send it on to PARAKEET as part of the request.

Alternatively, when PARAKEET gets an ad request from the publisher, it can ask the brand safety tool provider for the brand safety category for that context (URL). The signed response from the brand safety service can be cached for future use for that context (URL).

In either case, PARAKEET can validate the signature as coming from an authorized brand safety tool provider and check whether the brand safety signal could pose a privacy risk by enabling user identification. The privacy check would include how many users visited that page in the recent past, to minimize identification. If the check passes, the brand category is passed down: the SSP gets this signal and passes it downstream, and DSPs can read the brand safety signal and enforce exclusions as specified by the advertiser for their campaigns. If the check fails, or if the brand safety signal is not from an authorized tool provider, the ad request proceeds without the brand safety signal.
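
A minimal sketch of the publisher-side variant of Option 1 follows; the brand safety provider endpoint, the brandSafetySignal field, and the createAdRequest config shape are all assumptions for illustration, not part of the PARAKEET spec.

async function requestAdWithBrandSafety(pageUrl) {
  // Publisher-provided code obtains a signed category for this context
  // from an authorized brand safety tool provider (hypothetical endpoint).
  const resp = await fetch('https://brandsafety.example/classify?url=' +
                           encodeURIComponent(pageUrl));
  const { category, signature } = await resp.json(); // e.g. 'public-health'

  // The signed signal rides along with the ad request; PARAKEET validates
  // the signature and runs its privacy check before forwarding the
  // category to the SSP.
  return navigator.createAdRequest({
    // ...rest of the PARAKEET request config...
    brandSafetySignal: { category, signature },
  });
}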

We will need to work with the industry to define a taxonomy that is acceptable for ensuring brand safety and how large that taxonomy list needs to be while accounting for privacy concerns. We can review this proposal and define this within industry groups like IAB and W3C Web advertising group.

Option 2 – Using MPC and homomorphic encryption
Another option is to use secure multi-party computation. In this model, the first request through the browser has anonymized interest group and restricted context information (just the domain, for example) as proposed in PARAKEET. If the DSP or SSP examines the domain context and needs additional brand safety checks, it can respond with its ads and indicate that it needs a further brand safety check for this request. At that point, the browser will send an encrypted request with full context. The DSP or SSP can run brand safety checks on the encrypted context string. An evaluation model will need to be built to run on encrypted contextual data across the browser service and the DSP or SSP, such that it can send out a binary safe/unsafe signal without the browser service learning the model parameters for brand safety evaluation, and without the SSP or DSP learning the unencrypted, granular context detail used to enforce brand safety.
SSPs, DSPs, or third parties that specialize in brand safety can train the model offline on current web content and add it to the flow. The result for the browser service needs to be guaranteed to only return a binary brand safety signal.
The browser service can send the publisher a signal indicating that brand safety was the reason one or more ads were withheld, and it can optionally return a winning ad.

How do we expect Programmatic Guaranteed and other private auctions to work through Parakeet?

Typically, Private Marketplace deals between a publisher and a buyer work via a deal_id. The deal_id needs to be passed in the bid request, which lets a DSP bid differently since it is part of a known deal arranged beforehand between the publisher and advertiser.

Given that this deal_id needs to reach the DSP/SSP as-is, how could Parakeet send it while taking all the k-anonymity thresholds into account?

Reference to Sparrow probably a typo

As described earlier in this document, the most significant difference between SPARROW and TURTLEDOVE is that TURTLEDOVE completely separates contextual and interest group signals by making two separate uncorrelated requests.

I believe you mean "… difference between PARAKEET and TURTLEDOVE."

It could also be that you meant to mention and compare with SPARROW earlier, but you don't.

Encode identity for every individual on the earth using 23 binary ad interests?

For example, one can adversarially encode identity for every individual on the earth using 23 binary ad interests without microtargeting and exchange it with other parties.

2^23 ≈ 8.4 million. Are you considering additional fingerprinting bits to reach the necessary 33 bits or is this a typo? Or maybe I'm misinterpreting what "binary ad interests" are. Earth population ≈ 7.7 billion, 2^32 ≈ 4.3 billion, 2^33 ≈ 8.6 billion.

Parakeet: support simple comparison against ad server ads.

Parakeet's server-side solution for managing the privacy/utility tradeoff provides some benefits on both sides of the equation over FLEDGE.

However, integrating Parakeet into an ad serving stack will be more difficult, as publishers will need to choose between sending an ad impression to their ad server to fill direct-sold ads or sending it to Parakeet.

This could be solved by allowing publishers to first choose the best ad in their ad server and pass information in the Parakeet ad request config (such as price and possibly a simple priority). Parakeet could use this to compare against the winning Parakeet ad, returning null instead of a promise from navigator.createAdRequest when the Parakeet ad is lower value (similar to the FLEDGE API). The comparison could be done client-side, on the Parakeet server, or even passed to the SSP to ensure no bids are returned that do not beat the ad server ad.
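
A minimal sketch of how this might look from the publisher's side; the adServerWinner config field is hypothetical, and runAdServerAuction and renderAd are placeholders for the publisher's existing stack.

async function serveAd() {
  const direct = await runAdServerAuction(); // publisher's best direct-sold ad

  const adPromise = navigator.createAdRequest({
    // ...existing PARAKEET request config...
    adServerWinner: { price: direct.cpm, priority: direct.priority },
  });

  // Per the idea above, createAdRequest would return null (instead of a
  // promise) when the PARAKEET ad is lower value than the ad server ad.
  if (adPromise === null) {
    renderAd(direct);
  }
}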

MaskedLARK: Timing side-channel via only sending false reports at attributionexpiry

In MaskedLARK, the doc says:

In the existing aggregate reporting proposal, only actual conversions require reporting. To train a model, we require both positive samples and negative samples, meaning negative values (i.e., no conversion) also need to be sent to the ad server. We propose extending the scheduled reports framework to handle the negative cases, by sending a value when attributionexpiry is reached.

Unless real reports are also sent at the attributionexpiry, this allows distinguishing between false reports and real reports. In order to make this private we’d need to make sure both false reports and real reports are sent in a way that is indistinguishable.

In https://github.com/WICG/conversion-measurement-api/blob/main/event_attribution_reporting_views.md we do this by having a deterministic report time for both real and fake values. We could do the same in this proposal. However, this is very difficult to align with a desired use-case of getting reports quickly without much delay. If real reports are sent close to their conversion time, it is very difficult to make fake reports match that distribution. On the other hand, if attributionexpiry is short, we could suffer coverage loss.
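
For illustration only, here is a minimal sketch of the deterministic-report-time idea; the window schedule below is an assumption, not the actual Attribution Reporting configuration.

const DAY_MS = 24 * 60 * 60 * 1000;
const REPORT_WINDOWS = [2, 7, 30]; // days after click; illustrative only

// Both a real conversion and an attributionexpiry "negative" report are
// mapped onto the same fixed boundaries, so the send time alone does
// not reveal whether the report is real or fake.
function reportTime(clickTime, eventTime) {
  for (const days of REPORT_WINDOWS) {
    const boundary = clickTime + days * DAY_MS;
    if (eventTime <= boundary) return boundary;
  }
  // Events after the last window fall back to the final boundary.
  return clickTime + REPORT_WINDOWS[REPORT_WINDOWS.length - 1] * DAY_MS;
}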

Alternatively, you could allow some level of privacy loss if you submitted reports at a time that was protected with local differential privacy. One issue that comes up with this technique is that for true reports, the DP mechanism is only allowed to add positive noise (since time travel is not quite possible yet).

Allow malicious ads prevention scripts to scan Parakeet ads for malicious activities

It's understandable why Parakeet ads need to be rendered in a Fenced Frame. However, publishers are still responsible for the quality of such ads. I agree a lot of quality checks can be done on the server side, but malicious behaviors are typically exhibited by ads while rendering in the browser.

Such malicious ad detection is typically done by scripts provided by specialized vendors like GeoEdge, Clean.io, The Media Trust, Confiant, etc.

We propose that such malicious ad detection companies should be allowed to scan Parakeet ads rendering into fenced frames in the same way as is done today.

MaskedLARK & MACAW, FLEDGE (sharing the feature vector with helpers)

Hi,

Thank you Charles for the presentation during the recent W3C meeting!

In the Masked Gradient Model Training use case one of the assumptions seems to be that the feature vector can be shared with the helpers.

In my understanding, this solution would not be compatible with MACAW and FLEDGE, where the feature vector is constructed from context and user features ((c, s) in MACAW's notation), and it is important not to share that combined information outside the device.

I was wondering, is my understanding correct? If so, do you see any ways to adapt MaskedLARK to MACAW and FLEDGE?

Best regards,
Jonasz

Removing yourself from an IG

In your description of Interest Groups, you say:

The consumer is able to see the IGs they have joined and remove themselves if they don't wish to be targeted in that way anymore.

I like this sentiment a lot! But how does it work in the Edge design, where an IG's bidding behavior can draw on information from many other IGs as well?

I'm comparing with the Chrome design, with single-site bidding behavior. If I don't like an ad I saw because of a bid from IG-1 which I was added to on site-1, then I can delete all the IGs I was added to on site-1. Yay!

In the Edge approach, I might see an ad which looks like it came from IG-1 added on site-1. But if I delete that IG, or all site-1 IGs, the very same ad might appear based on the very same underlying data, this time coming from IG-2 joined on site-2.

Maybe the only way for my copy of Edge to know that it's deleting the underlying targeting data would be to delete all IGs owned by the same ad tech? Or, since advertisers buy through multiple ad techs, maybe you'd just need to wipe all IGs on the device entirely?

This seems like a big difference between Chrome's single-IG and Edge's many-IG bidding model, so I'm very interested in hearing your thinking on this UX question.

Clarification on seller use of IG

One of the primary problems with the https://github.com/WICG/turtledove proposal is the awkward position seller ATPs are put in. SSPs, retailers, and big media sellers may know the content and audience quite well, but have no idea how to bid. PAAPI makes the IG owner the only entity that can bid. However, there are well-known issues with that, e.g. WICG/turtledove#338 from a big retailer and WICG/turtledove#418 from the sandbox team. Proposals to solve this asymmetry with reporting or with delegation mechanisms, e.g. WICG/turtledove#399 (comment), are quite good but not available. In the meantime, sellers with knowledge of the audience are asked to pivot into being buyers or to deeply trust one.

We're curious if this proposal allows a use case we're quite familiar with: supply platform sets an audience segment (or IG as we call them now) on one site, knows the content in-depth on another, and delivers the combination as a bid request deal id. PAAPI allows for onboarding and supply platform audience trafficking quite awkwardly, with entities having to morph far into unfamiliar territory. If this proposal didn't require such leaps, its barriers to adoption would be quite a bit lower than the experience so far in PA.

PARAKEET adoption in Chrome

I would like to discuss the adoption of PARAKEET in Chrome.

Michael Kleber (Chrome) has previously said that #11 is currently blocking Chrome's adoption of PARAKEET.

Although that issue has been partly addressed, I would like to hear:

  1. How Chrome currently regards this issue
  2. What steps the Edge team is taking to address this

PARAKEET & retargeting - ad selection

Hi,

In PARAKEET the ad selection is based on anonymized signals (c', s'). This is also true for MACAW, which picks an ad set based on these anonymized signals.

In #34 (comment) I expressed some worries about how c' may not carry enough information for retargeting purposes.

I was wondering, could the ad selection process perhaps be moved to the time when joinAdInterestGroup is called?

const AdInterests = {'origin': 'www.advertiser.com',
                     'business': 'advertiser name',
                     'interests': ['athletic-shoes',...],
                     'representations': [ Model1, Model2 ],
                     'readers': ['first-ad-network.com',
                                 'second-ad-network.com'],
                     'adUrl': 'advertiser.com/ad-for-athletic-shoes',  // new field, optional
                };
navigator.joinAdInterestGroup(AdInterests, 30 * kSecsPerDay);

At the time of joinAdInterestGroup the advertiser has all the first party data necessary to pick an ad for the user, so the problem of c' potentially being too coarse for effective ad selection would be solved. This information would not be propagated to the ad server during an auction. Also note that this field may be completely optional.

Effectively, the advertiser would be able to say: "if I ever win an auction, I want to display the ad that was specified in adInterests.adUrl".

Best regards,
Jonasz

PARAKEET via OpenRTB Protocol

OpenRTB is the common standard for interacting between parties at request time in an advertising transaction. Adoption of Parakeet would be much easier for both SSPs and DSPs if the Parakeet server used OpenRTB to communicate.

Microsoft has indicated they are working towards adding full support for OpenRTB in both the polyfill and native implementations of PARAKEET. However, agreement on the location of key values within the OpenRTB object is needed to move forward. Could Microsoft publish the expected OpenRTB structure for comment and agreement?

Additionally, could any published structure clearly indicate where publisherAdUnit would be placed, and include a flag indicating that the given request is from a PARAKEET server in order to trigger PARAKEET-specific logic?
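
For discussion purposes only, here is one possible shape; since the expected structure has not been published, these ext placements are pure assumptions.

const bidRequest = {
  id: 'req-123',
  imp: [{
    id: '1',
    banner: { w: 300, h: 250 },
    ext: {
      parakeet: {
        publisherAdUnit: 'homepage-top',  // one candidate home for publisherAdUnit
      },
    },
  }],
  ext: {
    parakeet: { proxied: true },  // flag: request came via a PARAKEET server
  },
};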

MaskedLARK: Range restriction on conversion data restricts use-cases

In the MaskedLARK proposal, there is only a small enum of potential conversion-side values ("purchase", "visit time", etc). This is fairly restrictive for some reporting use-cases. For example, a large advertiser might want to report on precisely which product (out of thousands) in their catalog was purchased, which would likely require huge value vectors.

I think at the very least this should be documented in the proposal, although I totally see why it is left out of scope.

Detecting Invalid Traffic with Anonymization and Aggregated Reporting

Detecting invalid traffic (IVT) is an essential part of the ads ecosystem and often relies on:

  • Human Analysis for manual investigations, discovering new attacks and developing new IVT models
  • Training and deploying Automated IVT detection models

However, the use of anonymization and aggregated reporting within PARAKEET makes the above IVT use cases challenging. We’ve written up a proposal called ASTRAPIA to address these key IVT detection use cases (although the ASTRAPIA proposal is currently written for FLEDGE, it could be generalized to PARAKEET as well).

We look forward to further discussing ASTRAPIA or other ideas for supporting PARAKEET and IVT detection either here or as an issue in https://github.com/google/ads-privacy/issues. Thanks!

PARAKEET in JavaScript?

With FLEDGE we are building a JavaScript-only implementation (polyfill/shim), to allow experimentation, testing, and faster iteration. I see something along these lines was also raised at the 2021-03-15 meeting; is this something you've given more thought to?

Since the FLEDGE and PARAKEET APIs are similar and the JS implementations would also be similar, I was also wondering whether a joint implementation might make sense?

Publisher–SSP (or Publisher–DSP) collusion risk

It seems like at the time of the ad request, the publisher knows the true contextual signals C, and the SSP learns the anonymized version C'. But as you point out in the Timing correlation section of your threat analysis, the need for a real-time ad request and response lets both of them join their respective signals with a timestamp whose resolution is on the order of the duration of the ad request (surely a single-digit number of seconds, no matter how willing we are to inject latency).

That makes it seem inevitable that a colluding publisher and SSP would be able to learn the collection of all the user/etc S' signals associated with a single user.

I believe this means that over time, we need to assume the publisher could recover the un-noised (DP-free) user signals.

Advertiser's control over Interest Group merging

Hi all,

During the last two public calls there was discussion about the differences in the privacy model between the Ad Selection API and the Protected Audience API. More specifically, as mentioned in the API difference highlights, in the Ad Selection API Interest Groups are

Partitioned by domain on disk and in transit, merged in trusted/transient/opaque env.

One aspect discussed during the calls was what these differences mean from the perspective of the advertiser. For better visibility and tracking, I'd like to file this issue and describe RTB House's feature request for better advertiser control over Interest Group merging.

We see value in the advertiser being able to specify whether their interest group should be processed in isolation, or whether it can be merged and processed with other IGs (created when the user visited different top-level domains) when generateBid is run. We propose that this behavior be controlled by a dedicated parameter during a call to joinAdInterestGroup. It would also be valuable if the value of this parameter were easily inspectable in the created IGs (for example through DevTools). The value this brings is that advertisers can inspect their technical integration with AdTechs and make sure the browser will technically enforce that their IGs are isolated during generateBid.
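
A minimal sketch of the requested control, mirroring the joinAdInterestGroup shape used elsewhere in this repo; allowCrossDomainMerging is a hypothetical parameter, not part of any current spec.

const kSecsPerDay = 24 * 60 * 60;
navigator.joinAdInterestGroup({
  'origin': 'www.advertiser.com',
  'interests': ['athletic-shoes'],
  'allowCrossDomainMerging': false,  // ask the browser to keep this IG
                                     // isolated when generateBid runs
}, 30 * kSecsPerDay);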

Some more detailed discussion on this was held on 2024-03-21 (notes), and 2024-04-04 (notes).

Best regards,
Jonasz

Scheduled calls for PARAKEET

As a follow-up to sharing our PARAKEET proposal, we had scheduled biweekly discussions to encourage open discussion, review technical details and resolve open issues.

Meetings are currently paused

We are continuing to iterate on an updated proposal. We'll resume meetings when we're able to share something substantial.

If you want to participate, please make sure you join the WICG: https://www.w3.org/community/wicg/

New call-in info will be made available when calls resume.

Meeting Agenda, Notes, and Attendance will be recorded here

Previous Discussion Details:

Batching operation from Noisy Ranking

I'm trying to understand the "batching" and "caching" operations from the Noisy Ranking proposal, and in particular how they help with the re-identification risk that we've discussed before in #11.

I see that each batched request contains a list of browserEmbeddings represented as dense vectors (which have undergone noising). If I understand correctly, each of those embedding vectors is based on both the page the user is visiting ("Publisher-user input features") and the user's stored ads interests ("User-ads input features"), so of course the noising/batching/caching is key to the system not leaking browsing history.

Are the vectors in a particular batch supposed to share anything other than the choice of DSP and embedding model? In particular, is there any batching by publisher going on, or might they all come from different pubs? And if the pubs might be all different, then do you expect your DP noising to be large enough that a buyer would not even be able to figure out which embedding vector came from one particular "colluding publisher"?

I ask because if the DSP can figure out which embedding vector came from a particular publisher, then that seems to make it easy to undo the batching. But on the other hand, if a DSP cannot figure out the publisher of a particular vector, then I worry that the design will be very difficult to integrate into buyers' existing workflows, which of course has been a key PARAKEET design goal.

I have a similar question about the caching operation behind the throttling. Would a particular user's cache entry from their visit to pub1 be reused for a visit to pub2? If not, then the DSP could see many requests from the same user in which the user-specific signals stay the same (up to noise) but the publisher-specific signals vary, and we're back to a re-identification attack.

Apologies if these questions reveal that I'm misunderstanding a part of the proposal, which I readily admit I have not yet fully absorbed.

Request to outline the total expected latency before an ad is rendered via Parakeet

The way I understand the steps:

  1. The user visits the page and the browser calls the Parakeet service
  2. The Parakeet service generates the C', S' parameters and calls multiple SSPs for each interest group. The SSPs then call multiple DSPs to collect ads with bids
  3. The Parakeet service then does multiple round trips between itself and the same SSPs and DSPs it interacted with in step 2, to run secure multi-party computations for other quality controls expected by the sell side and buy side
  4. Finally, the Parakeet service chooses the final ad and sends it back to the browser

We understand that caching techniques between the SSPs, DSPs, and the Parakeet service could help, but the network call latency and multi-party computations seem heavy even when done on the server side.

We would love to understand how the Parakeet team thinks about cutting latency, which is critical.

direct sold ads and header bidding

Typically a publisher's page does header bidding to collect bids from multiple SSPs --> the bids pass into the ad server --> the ad server runs a competition between these bids and direct ads before the final ad is rendered on the page. The publisher page's call to the ad server typically passes hundreds of key-value pairs, including a few like browser, viewport size, and ad position, which are very specific to the publisher. How do we expect all of this to work in this proposal?

MaskedLARK: Additional Types of Data Labels

I have two questions/requests for the labels in the data the browser produces in the MaskedLARK proposal.

If I understand correctly, when a user clicks on an ad, this would be stored in the browser, and then later, if the user converts, the browser would generate the datapoint (x=features, label=1). Whereas if the user failed to convert within a given time frame, the browser would generate the datapoint (x=features, label=0).

  1. You describe how we could define arbitrary labels in the conversion pixel (e.g. use dollar amounts of the conversion instead of a binary is_conversion) but would it be possible to have the label include the delay between the click and the conversion? This would require help from the browser since the conversion pixel itself couldn't know the delay.

  2. In the case where the user clicks ad 1, then clicks ad 2, and only then converts, would it be possible to emit a data point for the first click in addition to the second (ideally with a negative label)? In the proposal you emphasize how a click that timed out could be treated as a negative data point but I don't think you cover the case where a click is superseded by a more recent click.

That is, ideally (for us) if a user's browsing history looked like click_1 -> click_2 -> conversion -> click_3 -> (no conversion before click_3 times out), we would have the following data points (stored in the browser, to be masked etc later):

click_1: (x=click_1_feats, label=(is_conversion=False, delay=T_click_2 - T_click_1))
click_2: (x=click_2_feats, label=(is_conversion=True, delay=T_conversion - T_click_2))
click_3: (x=click_3_feats, label=(is_conversion=False, delay=Timeout_TTL))

Multiple ads on a Page

Most pages display multiple ads. When the ad network cannot tell who the ad is being shown to, but only knows cohorts/interests/context, it is likely that the same ad (or ads from the same advertiser campaign) may win the bids for multiple ad locations on that page. This could be a real problem; it is more than simple frequency capping.

With PARAKEET, the page could request all ads in a single request, such that the ad network can coordinate the set of ads returned. The request should indicate ads that will be immediately visible (above the fold) and those that will only be visible if the user scrolls.
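
A minimal sketch of such a single batched request; the adUnits and aboveTheFold fields are hypothetical, and navigator.createAdRequest's real config shape may differ.

navigator.createAdRequest({
  adUnits: [
    { id: 'top-banner', aboveTheFold: true },
    { id: 'sidebar',    aboveTheFold: true },
    { id: 'footer',     aboveTheFold: false },  // visible only on scroll
  ],
  // A coordinated response can then avoid filling multiple slots on the
  // same page with the same ad or campaign.
});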

The main issue might be that even with some reasonable limit on the number of ads requested, it might make it easier to communicate some type of user ID within the metadata of the request. PARAKEET might address this by sometimes splitting this up into multiple requests to the ad network (but still more than a single ad per request) and then eliminating any duplicate ads.

While this issue applies to TURTLEDOVE, SPARROW, etc. as well, I think PARAKEET is better positioned to provide a good solution.
