Comments (13)
The identifier field of a UserIdentifier is a union (oneof) field:
https://developers.google.com/google-ads/api/reference/rpc/v10/UserIdentifier#identifier
That means that only one of its fields can be set on a given instance of UserIdentifier. Here's the proto definition, which may clarify things (note the "Exactly one must be specified" comment):
// Exactly one must be specified. For OfflineUserDataJobService, Customer
// Match accepts hashed_email, hashed_phone_number, mobile_id,
// third_party_user_id, and address_info; Store Sales accepts hashed_email,
// hashed_phone_number, third_party_user_id, and address_info.
// ConversionUploadService accepts hashed_email and hashed_phone_number.
// ConversionAdjustmentUploadService accepts hashed_email,
// hashed_phone_number, and address_info.
oneof identifier {
  // Hashed email address using SHA-256 hash function after normalization.
  // Accepted for Customer Match, Store Sales, ConversionUploadService, and
  // ConversionAdjustmentUploadService.
  string hashed_email = 7;

  // Hashed phone number using SHA-256 hash function after normalization
  // (E164 standard). Accepted for Customer Match, Store Sales,
  // ConversionUploadService, and ConversionAdjustmentUploadService.
  string hashed_phone_number = 8;

  // Mobile device ID (advertising ID/IDFA). Accepted only for Customer Match.
  string mobile_id = 9;

  // Advertiser-assigned user ID for Customer Match upload, or
  // third-party-assigned user ID for Store Sales. Accepted only for Customer
  // Match and Store Sales.
  string third_party_user_id = 10;

  // Address information. Accepted only for Customer Match, Store Sales, and
  // ConversionAdjustmentUploadService.
  OfflineUserAddressInfo address_info = 5;
}
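As an aside, the hashed_email and hashed_phone_number comments above call for SHA-256 hashing after normalization. A minimal Python sketch of that step (the normalization shown here, trim plus lowercase, is a simplification; the real Customer Match formatting guidelines have more rules, e.g. E.164 formatting for phone numbers):

```python
import hashlib

def normalize_and_hash(value: str) -> str:
    # Simplified normalization: strip surrounding whitespace and lowercase.
    # Check the Customer Match guidelines for the full normalization rules.
    normalized = value.strip().lower()
    # SHA-256 hex digest of the UTF-8 encoded, normalized value.
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

print(normalize_and_hash("  Alex@Example.com "))
```

Note that two differently formatted copies of the same email produce the same hash after normalization, which is what makes matching possible.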
What this means in practice is that your 2nd example above is not possible. If you have an instance of UserIdentifier, set its hashed_email, and then set its hashed_phone_number, the hashed_email will no longer be set on that instance -- the instance will only have the last field you set in the oneof, which would be hashed_phone_number.
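That last-writer-wins behavior can be illustrated with a toy stand-in class (this is NOT the real generated message class, just a simulation of the oneof semantics described above):

```python
class UserIdentifierSketch:
    """Toy simulation of the oneof: setting any identifier field clears the others."""

    _ONEOF_FIELDS = ("hashed_email", "hashed_phone_number", "mobile_id",
                     "third_party_user_id", "address_info")

    def __init__(self):
        for field in self._ONEOF_FIELDS:
            object.__setattr__(self, field, None)

    def __setattr__(self, name, value):
        if name in self._ONEOF_FIELDS:
            # Mimic oneof semantics: clear every member before setting one.
            for field in self._ONEOF_FIELDS:
                object.__setattr__(self, field, None)
        object.__setattr__(self, name, value)

ident = UserIdentifierSketch()
ident.hashed_email = "<hashed email>"
ident.hashed_phone_number = "<hashed phone>"
print(ident.hashed_email)         # None -- setting the phone number cleared it
print(ident.hashed_phone_number)  # <hashed phone>
```

This is why a user with both an email and a phone number needs two UserIdentifier objects, not one with both fields set.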
Thanks,
Josh, Google Ads API Team
from customer-match-upload-script.
Hi @ashishghai, sorry for the late response.
I send the data to the API using an Apache Beam pipeline (Python SDK) that uses the BatchElements PTransform: this transform tries to optimize the batch size based on the total number of elements in the PCollection, so I do not have a fixed one.
For 15k elements the pipeline creates 3 batches, and it takes 7 min and 20 sec to complete (the batches are processed sequentially).
Hope it helps a little.
Have a nice day,
Valerio
Does this change significantly impact performance at all? I'm trying to upload about 1.4 million rows, each row with [Email, Phone, FirstName, LastName, CountryCode, ZipCode], with the existing script, but build_offline_user_data_job_operations took over 1.5 hrs before I killed it.
Do you have any other suggestions?
EDIT: I just implemented your second version, and it's faster. But doing the math, it took 5 seconds for 10 rows, meaning it'll take over 8 days for all 1.4 million rows...
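For what it's worth, the extrapolation above is arithmetically sound, though, as the next reply notes, linear extrapolation from a 10-row sample ignores per-job overhead:

```python
rows_total = 1_400_000
rows_per_batch = 10
seconds_per_batch = 5    # observed: 5 seconds for 10 rows

est_seconds = rows_total / rows_per_batch * seconds_per_batch
est_days = est_seconds / 86_400  # seconds per day
print(f"~{est_days:.1f} days")   # ~8.1 days
```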
Hi,
I would caution against extrapolating job run times from such a small sample, as there's overhead involved in every job and jobs are queued up before they run. See the first bullet point in our guide for expected job run times.
Even if using a separate UserData with multiple identifiers for each distinct member results in longer processing times, you should still use that approach, since that's the correct usage.
Thanks,
Josh
Hello @jradcliff,
I had the same concern, and I basically refactored read_csv and build_offline_user_data_job_operations so that exactly one request is sent to the API for each row in the CSV.
My question is: are you sure that this won't impact the overall match rate?
Thanks
Valerio
Hi @valerioditinno ,
I'm part of the Google Ads API team and one of the owners of its Java library. By sending one UserData with all of the user_identifiers for a specific user, you'll be using the API as intended.
I noticed you said "one request to the api for each row in the csv". Did you actually mean one operation for each row? I ask because if you have thousands of rows in your CSV, it will be more efficient to batch those into multiple operations (one per CSV row) in a single AddOfflineUserDataJobOperationsRequest than to send a separate AddOfflineUserDataJobOperationsRequest for each operation. Just keep the following limit in mind from our guide:
The operations collection for each AddOfflineUserDataJobOperationsRequest can contain at most 100,000 identifiers across all of the UserData objects in the operations. If you need to submit more than 100,000 identifiers for a job, send multiple requests with the same job resource_name.
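That limit can be enforced client-side by chunking the operation list before building each request. A sketch (hypothetical helper; each operation is represented here just by its identifier count, not a real API object):

```python
def chunk_operations(identifier_counts, max_identifiers=100_000):
    """Split operations into request-sized chunks so that no chunk exceeds
    max_identifiers total user identifiers. Each operation is modeled by the
    number of identifiers in its UserData."""
    chunks, current, running = [], [], 0
    for count in identifier_counts:
        if current and running + count > max_identifiers:
            chunks.append(current)
            current, running = [], 0
        current.append(count)
        running += count
    if current:
        chunks.append(current)
    return chunks

# 50,000 operations x 3 identifiers each = 150,000 identifiers -> 2 requests.
requests = chunk_operations([3] * 50_000)
print(len(requests))  # 2
```

Each resulting chunk would then be sent as one AddOfflineUserDataJobOperationsRequest against the same job resource_name.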
Thanks,
Josh, Google Ads API Team
Yes, you are right, I meant operations, and sorry, I had no idea you were from the Google Ads API team :)
I have one last doubt: why do I need to create different user_identifiers for the same member of the list?
So basically, why this:
operations {
  create {
    user_identifiers {
      hashed_email: "<hashed email address here>"
    }
    user_identifiers {
      hashed_phone_number: "<hashed phone number here>"
    }
    user_identifiers {
      address_info {
        hashed_first_name: "<hashed first name here>"
        hashed_last_name: "<hashed last name here>"
        country_code: "<country code>"
        postal_code: "<postal code>"
      }
    }
  }
}
instead of this:
operations {
  create {
    user_identifiers {
      hashed_email: "<hashed email address here>"
      hashed_phone_number: "<hashed phone number here>"
      address_info {
        hashed_first_name: "<hashed first name here>"
        hashed_last_name: "<hashed last name here>"
        country_code: "<country code>"
        postal_code: "<postal code>"
      }
    }
  }
}
Thank you for your time
Hi @dliu9999, did you manage to send batch requests of 10,000 each, and what was the response performance, please? Thank you!!
Hi @valerioditinno ,
I have some questions regarding the numbers:
- How many records did you add in one batch?
- What was the time to complete a single API request with a batch payload?
This information would help me a lot. Thank you for your time.
Kind regards
Hi @valerioditinno, thank you very much!! It gives an idea about the latency. You too have a great day.
Kind regards,
Ashish
Hi @jradcliff or @valerioditinno,
How can we update the script so that we can get one UserData with multiple identifiers instead of two separate UserData objects for the same member of the list? I believe this is causing me to get an error that says "Maximum number of user identifiers allowed per request is 100,000 and per operation is 20."
Hi,
If you have multiple user identifiers for the same UserData object, you would call user_data.user_identifiers.append(...) multiple times. For example:
...
customer_data_operations = []
...
# Creates the operation and UserData objects.
user_data_operation = client.get_type('OfflineUserDataJobOperation')
user_data = user_data_operation.create
# Creates and adds an identifier for the hashed email to the UserData.
email_user_identifier = client.get_type('UserIdentifier')
email_user_identifier.hashed_email = item['hashed_email']
user_data.user_identifiers.append(email_user_identifier)
# Creates and adds an identifier for the hashed phone number to the same UserData.
phone_user_identifier = client.get_type('UserIdentifier')
phone_user_identifier.hashed_phone_number = item['hashed_phone_number']
user_data.user_identifiers.append(phone_user_identifier)
# Adds the ONE operation for the UserData to the collection.
customer_data_operations.append(user_data_operation)
The error "Maximum number of user identifiers allowed per request is 100,000 and per operation is 20" occurs if either of the following conditions holds:
- The total number of UserIdentifier objects across all operations in the request exceeds 100,000. In this case, you can just split the operations across multiple requests.
- A single operation's UserData contains more than 20 elements in its user_identifiers collection. This usually indicates an implementation error in how you are constructing your UserData objects, as it's unlikely you would have more than 20 identifiers for a single user.
See our blog post at https://ads-developers.googleblog.com/2021/10/userdata-enforcement-in-google-ads-api.html for more details.
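Both failure conditions can be caught client-side before sending anything. A sketch (hypothetical validator; as above, each operation is modeled by the number of identifiers in its UserData):

```python
MAX_IDENTIFIERS_PER_REQUEST = 100_000
MAX_IDENTIFIERS_PER_OPERATION = 20

def validate_request(identifier_counts):
    """Raise ValueError if the request would trip either documented limit."""
    if any(c > MAX_IDENTIFIERS_PER_OPERATION for c in identifier_counts):
        raise ValueError("an operation has more than 20 user_identifiers; "
                         "check how the UserData objects are being built")
    if sum(identifier_counts) > MAX_IDENTIFIERS_PER_REQUEST:
        raise ValueError("too many identifiers in one request; "
                         "split the operations across multiple requests")

validate_request([3, 2, 4])  # fine: 9 identifiers across 3 operations
try:
    validate_request([25])   # one operation with 25 identifiers
except ValueError as err:
    print(err)
```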
Thanks,
Josh, Google Ads API Team