
Comments (13)

jradcliff commented on September 18, 2024

The identifier field of a UserIdentifier is a union (oneof) field:

https://developers.google.com/google-ads/api/reference/rpc/v10/UserIdentifier#identifier

That means that only one of its fields can be set on a given instance of UserIdentifier. Here's the proto definition that may clarify things (note the "Exactly one must be specified" comment):

  // Exactly one must be specified. For OfflineUserDataJobService, Customer
  // Match accepts hashed_email, hashed_phone_number, mobile_id,
  // third_party_user_id, and address_info; Store Sales accepts hashed_email,
  // hashed_phone_number, third_party_user_id, and address_info.
  // ConversionUploadService accepts hashed_email and hashed_phone_number.
  // ConversionAdjustmentUploadService accepts hashed_email,
  // hashed_phone_number, and address_info.
  oneof identifier {
    // Hashed email address using SHA-256 hash function after normalization.
    // Accepted for Customer Match, Store Sales, ConversionUploadService, and
    // ConversionAdjustmentUploadService.
    string hashed_email = 7;

    // Hashed phone number using SHA-256 hash function after normalization
    // (E164 standard). Accepted for Customer Match, Store Sales,
    // ConversionUploadService, and ConversionAdjustmentUploadService.
    string hashed_phone_number = 8;

    // Mobile device ID (advertising ID/IDFA). Accepted only for Customer Match.
    string mobile_id = 9;

    // Advertiser-assigned user ID for Customer Match upload, or
    // third-party-assigned user ID for Store Sales. Accepted only for Customer
    // Match and Store Sales.
    string third_party_user_id = 10;

    // Address information. Accepted only for Customer Match, Store Sales, and
    // ConversionAdjustmentUploadService.
    OfflineUserAddressInfo address_info = 5;
  }

What this means in practice is that your 2nd example above is not possible. If you have an instance of UserIdentifier and set its hashed_email and then set its hashed_phone_number, the hashed_email will no longer be set on that instance -- it will only have the last field you set in the oneof, which would be hashed_phone_number.
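This last-setter-wins behavior can be illustrated with a small pure-Python simulation. To be clear, this is *not* the real protobuf-generated `UserIdentifier` class, just a sketch of how a oneof behaves: setting any member of the oneof clears whichever member was set before.

```python
class UserIdentifierSim:
    """Simulates the `identifier` oneof of UserIdentifier (sketch only)."""

    _ONEOF_FIELDS = ("hashed_email", "hashed_phone_number", "mobile_id",
                     "third_party_user_id", "address_info")

    def __init__(self):
        self._set_field = None   # which oneof member is currently set
        self._value = None       # its value

    def __setattr__(self, name, value):
        if name in self._ONEOF_FIELDS:
            # Setting any oneof member replaces whatever was set before.
            object.__setattr__(self, "_set_field", name)
            object.__setattr__(self, "_value", value)
        else:
            object.__setattr__(self, name, value)

    def __getattr__(self, name):
        # Unset oneof members read back as the proto default (empty string).
        if name in self._ONEOF_FIELDS:
            if self.__dict__.get("_set_field") == name:
                return self.__dict__.get("_value")
            return ""
        raise AttributeError(name)

    def which_oneof(self):
        return self._set_field


ident = UserIdentifierSim()
ident.hashed_email = "email-hash"          # first member set
ident.hashed_phone_number = "phone-hash"   # this clears hashed_email
print(ident.which_oneof())                 # hashed_phone_number
print(repr(ident.hashed_email))            # '' -- the email is gone
```

The real protobuf classes behave the same way: after the second assignment, only `hashed_phone_number` is set on the instance.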

Thanks,
Josh, Google Ads API Team

from customer-match-upload-script.

valerioditinno commented on September 18, 2024

Hi @ashishghai, sorry for the late response.

I send the data to the API using an Apache Beam pipeline (Python SDK) that uses the BatchElements PTransform: this transform attempts to optimize the batch size based on the total number of elements in the PCollection, so I do not have a fixed one.
For 15k elements the pipeline creates 3 batches, and it takes 7 minutes and 20 seconds to complete (the batches are processed sequentially).

Hope it helps a little

Have a nice day

Valerio


dliu9999 commented on September 18, 2024

Does this change significantly impact performance at all? I'm trying to upload about 1.4 million rows, each row with [Email, Phone, FirstName, LastName, CountryCode, ZipCode], using the existing script, but build_offline_user_data_job_operations ran for over 1.5 hours before I killed it.

Do you have any other suggestions?

EDIT: I just implemented your second version, and it's faster. But doing the math, it took 5 seconds for 10 rows, meaning it would take over 8 days for all 1.4 million rows...


jradcliff commented on September 18, 2024

Hi,

I would caution against extrapolating job run times from such a small sample, as there's overhead involved in every job and jobs are queued up before they run. See the first bullet point in our guide for expected job run times.

Even if using a separate UserData with multiple identifiers for each distinct member results in longer processing times, you should still use that approach since that's the correct usage.

Thanks,
Josh


valerioditinno commented on September 18, 2024

hello @jradcliff,

I had the same concern, and I basically refactored read_csv and build_offline_user_data_job_operations in order to send exactly one request to the API for each row in the CSV.
My question is: are you sure this won't impact the overall match rate?

Thanks
Valerio


jradcliff commented on September 18, 2024

Hi @valerioditinno ,

I'm part of the Google Ads API team and one of the owners of its Java library. By sending one UserData with all of the user_identifiers for a specific user, you'll be using the API as intended.

I noticed you said "one request to the api for each row in the csv". Did you actually mean one operation for each row? I ask because if you have thousands of rows in your CSV, it will be more efficient to batch those into multiple operations (one per CSV row) in a single AddOfflineUserDataJobOperationsRequest than to send a separate AddOfflineUserDataJobOperationsRequest for each operation. Just keep the following limit in mind from our guide:

The operations collection for each AddOfflineUserDataJobOperationsRequest can contain at most 100,000 identifiers across all of the UserData objects in the operations. If you need to submit more than 100,000 identifiers for a job, send multiple requests with the same job resource_name.
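The per-request limit above can be enforced client-side with a small chunking helper. This is a sketch under the assumption that you track how many identifiers each operation carries yourself (the parallel `identifier_counts` sequence is my own bookkeeping, not part of the client library):

```python
MAX_IDENTIFIERS_PER_REQUEST = 100_000  # documented per-request limit


def chunk_operations(operations, identifier_counts,
                     limit=MAX_IDENTIFIERS_PER_REQUEST):
    """Yield lists of operations whose identifier counts sum to <= limit.

    `operations` and `identifier_counts` are parallel sequences:
    identifier_counts[i] is the number of UserIdentifier objects inside
    operations[i]. Each yielded list is small enough to send as one
    AddOfflineUserDataJobOperationsRequest (reusing the same job
    resource_name for every request).
    """
    batch, total = [], 0
    for op, n in zip(operations, identifier_counts):
        # Start a new request once adding this operation would exceed the cap.
        if batch and total + n > limit:
            yield batch
            batch, total = [], 0
        batch.append(op)
        total += n
    if batch:
        yield batch


# Toy demo with a tiny limit: 5 operations of 3 identifiers each, cap of 6.
ops = ["op1", "op2", "op3", "op4", "op5"]
counts = [3, 3, 3, 3, 3]
print(list(chunk_operations(ops, counts, limit=6)))
# [['op1', 'op2'], ['op3', 'op4'], ['op5']]
```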

Thanks,
Josh, Google Ads API Team


valerioditinno commented on September 18, 2024

Yes, you are right, I meant operations. And sorry, I had no idea you were from the Google Ads API team :)

I have one last question: why do I need to create different user_identifiers for the same member of the list?

So basically why this:

operations {
  create {
    user_identifiers {
      hashed_email: "<hashed email address here>"
    }
    user_identifiers {
      hashed_phone_number: "<hashed phone number here>"
    }
    user_identifiers {
      address_info {
        hashed_first_name: "<hashed first name here>"
        hashed_last_name: "<hashed last name here>"
        country_code: "<country code>"
        postal_code: "<postal code>"
      }
    }
  }
}

instead of this:

operations {
  create {
    user_identifiers {
      hashed_email: "<hashed email address here>"
      hashed_phone_number: "<hashed phone number here>"
      address_info {
        hashed_first_name: "<hashed first name here>"
        hashed_last_name: "<hashed last name here>"
        country_code: "<country code>"
        postal_code: "<postal code>"
      }
    }
  }
}

Thank you for your time


ashishghai commented on September 18, 2024

Hi @dliu9999, did you manage to send batch requests of 10,000 each, and what was the response performance? Thank you!


ashishghai commented on September 18, 2024

Hi @valerioditinno ,

I have some questions regarding the numbers:

  • How many records did you add in one batch?
  • How long did a single API request with a batched payload take to complete?

This information would help me a lot. Thank you for your time.

Kind regards


ashishghai commented on September 18, 2024

Hi @valerioditinno, thank you very much! It gives an idea of the latency. You too have a great day.

Kind regards,
Ashish



RyHoLi commented on September 18, 2024

Hi @jradcliff or @valerioditinno,

How can we update the script so that we can get one UserData with multiple identifiers instead of two separate UserData objects for the same member of the list? I believe this is causing me to get an error that says "Maximum number of user identifiers allowed per request is 100,000 and per operation is 20."


jradcliff commented on September 18, 2024

Hi,

If you have multiple user identifiers for the same UserData object, you would call user_data.user_identifiers.append(...) multiple times. For example:

...
customer_data_operations = []
...

# Creates the operation and UserData objects.
user_data_operation = client.get_type('OfflineUserDataJobOperation')
user_data = user_data_operation.create

# Creates and adds an identifier for the hashed email to the UserData.
email_user_identifier = client.get_type('UserIdentifier')
email_user_identifier.hashed_email = item['hashed_email']
user_data.user_identifiers.append(email_user_identifier)

# Creates and adds an identifier for the hashed phone to the same UserData.
phone_user_identifier = client.get_type('UserIdentifier')
phone_user_identifier.hashed_phone_number = item['hashed_phone_number']
user_data.user_identifiers.append(phone_user_identifier)

# Adds the ONE operation for the UserData to the collection.
customer_data_operations.append(user_data_operation)

The error Maximum number of user identifiers allowed per request is 100,000 and per operation is 20 occurs if either of the following conditions holds:

  1. The total number of UserIdentifier objects across all operations in the request exceeds 100,000.
  • In this case, you can just split the operations across multiple requests.
  2. A single operation's UserData contains more than 20 elements in its user_identifiers collection.
  • This usually indicates an implementation error in how you are constructing your UserData objects, as it's unlikely you would have more than 20 identifiers for a single user.

See our blog post at https://ads-developers.googleblog.com/2021/10/userdata-enforcement-in-google-ads-api.html for more details.
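Both conditions can be caught before sending the request with a pre-flight check. The helper below is a hedged sketch (the function name and the counts-per-operation bookkeeping are mine, not part of the API client); it just mirrors the two documented limits:

```python
MAX_IDENTIFIERS_PER_OPERATION = 20      # documented per-operation limit
MAX_IDENTIFIERS_PER_REQUEST = 100_000   # documented per-request limit


def validate_identifier_counts(counts_per_operation):
    """Raise ValueError if the documented UserData limits would be hit.

    counts_per_operation[i] is the number of UserIdentifier objects in
    operation i of the request you are about to send.
    """
    for i, n in enumerate(counts_per_operation):
        if n > MAX_IDENTIFIERS_PER_OPERATION:
            # More than 20 identifiers in one UserData almost always means
            # rows for different users were merged into one operation.
            raise ValueError(
                f"operation {i} has {n} identifiers (max 20); check how "
                "the UserData objects are being constructed")
    total = sum(counts_per_operation)
    if total > MAX_IDENTIFIERS_PER_REQUEST:
        raise ValueError(
            f"request has {total} identifiers (max 100,000); split the "
            "operations across multiple requests for the same job")
```

Running this on the counts for each planned request turns the server-side error into a local, actionable exception before any quota is spent.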

Thanks,
Josh, Google Ads API Team

