
Comments (13)

jradcliff commented on September 18, 2024

The identifier field of a UserIdentifier is a union (oneof) field:

https://developers.google.com/google-ads/api/reference/rpc/v10/UserIdentifier#identifier

That means that only one of its fields can be set on a given instance of UserIdentifier. Here's the proto definition that may clarify things (note the "Exactly one must be specified" comment):

  // Exactly one must be specified. For OfflineUserDataJobService, Customer
  // Match accepts hashed_email, hashed_phone_number, mobile_id,
  // third_party_user_id, and address_info; Store Sales accepts hashed_email,
  // hashed_phone_number, third_party_user_id, and address_info.
  // ConversionUploadService accepts hashed_email and hashed_phone_number.
  // ConversionAdjustmentUploadService accepts hashed_email,
  // hashed_phone_number, and address_info.
  oneof identifier {
    // Hashed email address using SHA-256 hash function after normalization.
    // Accepted for Customer Match, Store Sales, ConversionUploadService, and
    // ConversionAdjustmentUploadService.
    string hashed_email = 7;

    // Hashed phone number using SHA-256 hash function after normalization
    // (E164 standard). Accepted for Customer Match, Store Sales,
    // ConversionUploadService, and ConversionAdjustmentUploadService.
    string hashed_phone_number = 8;

    // Mobile device ID (advertising ID/IDFA). Accepted only for Customer Match.
    string mobile_id = 9;

    // Advertiser-assigned user ID for Customer Match upload, or
    // third-party-assigned user ID for Store Sales. Accepted only for Customer
    // Match and Store Sales.
    string third_party_user_id = 10;

    // Address information. Accepted only for Customer Match, Store Sales, and
    // ConversionAdjustmentUploadService.
    OfflineUserAddressInfo address_info = 5;
  }

What this means in practice is that your 2nd example above is not possible. If you have an instance of UserIdentifier and set its hashed_email and then set its hashed_phone_number, the hashed_email will no longer be set on that instance -- it will only have the last field you set in the oneof, which would be hashed_phone_number.
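This last-setter-wins behavior can be illustrated with a small pure-Python simulation. To be clear, this is *not* the real protobuf-generated `UserIdentifier` class, just a sketch of how a oneof behaves: setting any member of the oneof clears whichever member was set before.

```python
class UserIdentifierSim:
    """Simulates the `identifier` oneof of UserIdentifier (sketch only)."""

    _ONEOF_FIELDS = ("hashed_email", "hashed_phone_number", "mobile_id",
                     "third_party_user_id", "address_info")

    def __init__(self):
        self._set_field = None   # which oneof member is currently set
        self._value = None       # its value

    def __setattr__(self, name, value):
        if name in self._ONEOF_FIELDS:
            # Setting any oneof member replaces whatever was set before.
            object.__setattr__(self, "_set_field", name)
            object.__setattr__(self, "_value", value)
        else:
            object.__setattr__(self, name, value)

    def __getattr__(self, name):
        # Unset oneof members read back as the proto default (empty string).
        if name in self._ONEOF_FIELDS:
            if self.__dict__.get("_set_field") == name:
                return self.__dict__.get("_value")
            return ""
        raise AttributeError(name)

    def which_oneof(self):
        return self._set_field


ident = UserIdentifierSim()
ident.hashed_email = "email-hash"          # first member set
ident.hashed_phone_number = "phone-hash"   # this clears hashed_email
print(ident.which_oneof())                 # hashed_phone_number
print(repr(ident.hashed_email))            # '' -- the email is gone
```

The real protobuf classes behave the same way: after the second assignment, only `hashed_phone_number` is set on the instance.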

Thanks,
Josh, Google Ads API Team

from customer-match-upload-script.

valerioditinno commented on September 18, 2024

Hi @ashishghai, sorry for the late response.

I send the data to the API using an Apache Beam pipeline (Python SDK) that uses the BatchElements PTransform: this transform attempts to optimize the batch size based on the total number of elements in the PCollection, so I do not have a fixed one.
For 15k elements the pipeline creates 3 batches, and it takes 7 minutes and 20 seconds to complete (the batches are processed sequentially).

Hope it helps a little

Have a nice day

Valerio


dliu9999 commented on September 18, 2024

Does this change significantly impact performance at all? I'm trying to upload about 1.4 million rows, each row with [Email, Phone, FirstName, LastName, CountryCode, ZipCode], using the existing script, but build_offline_user_data_job_operations ran for over 1.5 hours before I killed it.

Do you have any other suggestions?

EDIT: I just implemented your second version, and it's faster. But doing the math, it took 5 seconds for 10 rows, meaning it would take over 8 days for all 1.4 million rows...


jradcliff commented on September 18, 2024

Hi,

I would caution against extrapolating job run times from such a small sample, as there's overhead involved in every job and jobs are queued up before they run. See the first bullet point in our guide for expected job run times.

Even if using a separate UserData with multiple identifiers for each distinct member results in longer processing times, you should still use that approach since that's the correct usage.

Thanks,
Josh


valerioditinno commented on September 18, 2024

hello @jradcliff,

I had the same concern, and I basically refactored read_csv and build_offline_user_data_job_operations in order to send exactly one request to the API for each row in the CSV.
My question is: are you sure this won't impact the overall match rate?

Thanks
Valerio


jradcliff commented on September 18, 2024

Hi @valerioditinno ,

I'm part of the Google Ads API team and one of the owners of its Java library. By sending one UserData with all of the user_identifiers for a specific user, you'll be using the API as intended.

I noticed you said "one request to the api for each row in the csv". Did you actually mean one operation for each row? I ask because if you have thousands of rows in your CSV, it will be more efficient to batch those into multiple operations (one per CSV row) in a single AddOfflineUserDataJobOperationsRequest than to send a separate AddOfflineUserDataJobOperationsRequest for each operation. Just keep the following limit in mind from our guide:

The operations collection for each AddOfflineUserDataJobOperationsRequest can contain at most 100,000 identifiers across all of the UserData objects in the operations. If you need to submit more than 100,000 identifiers for a job, send multiple requests with the same job resource_name.
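The per-request limit above can be enforced client-side with a small chunking helper. This is a sketch under the assumption that you track how many identifiers each operation carries yourself (the parallel `identifier_counts` sequence is my own bookkeeping, not part of the client library):

```python
MAX_IDENTIFIERS_PER_REQUEST = 100_000  # documented per-request limit


def chunk_operations(operations, identifier_counts,
                     limit=MAX_IDENTIFIERS_PER_REQUEST):
    """Yield lists of operations whose identifier counts sum to <= limit.

    `operations` and `identifier_counts` are parallel sequences:
    identifier_counts[i] is the number of UserIdentifier objects inside
    operations[i]. Each yielded list is small enough to send as one
    AddOfflineUserDataJobOperationsRequest (reusing the same job
    resource_name for every request).
    """
    batch, total = [], 0
    for op, n in zip(operations, identifier_counts):
        # Start a new request once adding this operation would exceed the cap.
        if batch and total + n > limit:
            yield batch
            batch, total = [], 0
        batch.append(op)
        total += n
    if batch:
        yield batch


# Toy demo with a tiny limit: 5 operations of 3 identifiers each, cap of 6.
ops = ["op1", "op2", "op3", "op4", "op5"]
counts = [3, 3, 3, 3, 3]
print(list(chunk_operations(ops, counts, limit=6)))
# [['op1', 'op2'], ['op3', 'op4'], ['op5']]
```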

Thanks,
Josh, Google Ads API Team


valerioditinno commented on September 18, 2024

Yes, you are right, I meant operations. And sorry, I had no idea you were from the Google Ads API team :)

I have one last question: why do I need to create different user_identifiers for the same member of the list?

So basically why this:

operations {
  create {
    user_identifiers {
      hashed_email: "<hashed email address here>"
    }
    user_identifiers {
      hashed_phone_number: "<hashed phone number here>"
    }
    user_identifiers {
      address_info {
        hashed_first_name: "<hashed first name here>"
        hashed_last_name: "<hashed last name here>"
        country_code: "<country code>"
        postal_code: "<postal code>"
      }
    }
  }
}

instead of this:

operations {
  create {
    user_identifiers {
      hashed_email: "<hashed email address here>"
      hashed_phone_number: "<hashed phone number here>"
      address_info {
        hashed_first_name: "<hashed first name here>"
        hashed_last_name: "<hashed last name here>"
        country_code: "<country code>"
        postal_code: "<postal code>"
      }
    }
  }
}

Thank you for your time


ashishghai commented on September 18, 2024

Hi @dliu9999, did you manage to send batch requests of 10,000 each, and what was the response performance? Thank you!


ashishghai commented on September 18, 2024

Hi @valerioditinno ,

I have some questions regarding the numbers:

  • How many records did you add in one batch?
  • How long did a single API request with a batched payload take to complete?

This information would help me a lot. Thank you for your time.

Kind regards


ashishghai commented on September 18, 2024

Hi @valerioditinno, thank you very much! It gives an idea of the latency. You too have a great day.

Kind regards,
Ashish



RyHoLi commented on September 18, 2024

Hi @jradcliff or @valerioditinno,

How can we update the script so that we can get one UserData with multiple identifiers instead of two separate UserData objects for the same member of the list? I believe this is causing me to get an error that says "Maximum number of user identifiers allowed per request is 100,000 and per operation is 20."


jradcliff commented on September 18, 2024

Hi,

If you have multiple user identifiers for the same UserData object, you would call user_data.user_identifiers.append(...) multiple times. For example:

...
customer_data_operations = []
...

# Creates the operation and UserData objects.
user_data_operation = client.get_type('OfflineUserDataJobOperation')
user_data = user_data_operation.create

# Creates and adds an identifier for the hashed email to the UserData.
email_user_identifier = client.get_type('UserIdentifier')
email_user_identifier.hashed_email = item['hashed_email']
user_data.user_identifiers.append(email_user_identifier)

# Creates and adds an identifier for the hashed phone to the same UserData.
phone_user_identifier = client.get_type('UserIdentifier')
phone_user_identifier.hashed_phone_number = item['hashed_phone_number']
user_data.user_identifiers.append(phone_user_identifier)

# Adds the ONE operation for the UserData to the collection.
customer_data_operations.append(user_data_operation)

The error Maximum number of user identifiers allowed per request is 100,000 and per operation is 20 occurs if either of the following conditions holds:

  1. The total number of UserIdentifier objects across all operations in the request exceeds 100,000.
  • In this case, you can just split the operations across multiple requests.
  2. A single operation's UserData contains more than 20 elements in its user_identifiers collection.
  • This usually indicates an implementation error in how you are constructing your UserData objects, as it's unlikely you would have more than 20 identifiers for a single user.

See our blog post at https://ads-developers.googleblog.com/2021/10/userdata-enforcement-in-google-ads-api.html for more details.
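Both conditions can be caught before sending the request with a pre-flight check. The helper below is a hedged sketch (the function name and the counts-per-operation bookkeeping are mine, not part of the API client); it just mirrors the two documented limits:

```python
MAX_IDENTIFIERS_PER_OPERATION = 20      # documented per-operation limit
MAX_IDENTIFIERS_PER_REQUEST = 100_000   # documented per-request limit


def validate_identifier_counts(counts_per_operation):
    """Raise ValueError if the documented UserData limits would be hit.

    counts_per_operation[i] is the number of UserIdentifier objects in
    operation i of the request you are about to send.
    """
    for i, n in enumerate(counts_per_operation):
        if n > MAX_IDENTIFIERS_PER_OPERATION:
            # More than 20 identifiers in one UserData almost always means
            # rows for different users were merged into one operation.
            raise ValueError(
                f"operation {i} has {n} identifiers (max 20); check how "
                "the UserData objects are being constructed")
    total = sum(counts_per_operation)
    if total > MAX_IDENTIFIERS_PER_REQUEST:
        raise ValueError(
            f"request has {total} identifiers (max 100,000); split the "
            "operations across multiple requests for the same job")
```

Running this on the counts for each planned request turns the server-side error into a local, actionable exception before any quota is spent.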

Thanks,
Josh, Google Ads API Team

