microsoft / product-recommendations
Product Recommendations solution
License: Other
Hi,
I tried the ARM deployment; it's very useful.
By the way, I'd like to integrate multiple solutions on one Azure Web App. What should I do?
I manage three solutions and update each model every day; each one runs on its own Web App.
I'd like to consolidate them into a single Web App, because running several Web Apps is expensive.
I do not want to mix the models in one solution, because they are used in completely different use cases.
Do you have any ideas?
Thank you.
I would like to know if it is possible to have alternative products (ones that may be a better alternative, or carry a higher markup) come up in the recommendation results. These products may or may not be cold, but they need to be prioritized over standard recommendations.
Is this possible using features, or would the code here have to be changed?
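For illustration, the kind of post-processing I have in mind could look like this (a sketch only; the boost weights and item IDs are made up, not part of the service):

```python
# Sketch: re-rank service results with a per-item boost (e.g. markup).
# The recommendation payload shape and boost values are hypothetical.
def boost_recommendations(recommended, boosts, default=1.0):
    """Multiply each recommendation score by a per-item boost factor
    and re-sort, so higher-markup alternatives float to the top."""
    rescored = [
        (item, score * boosts.get(item, default))
        for item, score in recommended
    ]
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)

recs = [("sku-001", 0.9), ("sku-002", 0.8), ("sku-003", 0.5)]
boosts = {"sku-003": 2.5}  # hypothetical high-markup alternative
print(boost_recommendations(recs, boosts))
# sku-003 now ranks first: 0.5 * 2.5 = 1.25
```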
Hello,
I have been running some tests with a catalog of 50 unique items and a usage file of 200 unique users. After training the model, I can construct a 'user recommendation matrix' from the recommendations (k = 50) for each user. From this, and believing that SAR builds the user affinity matrix in the way that I think it does, I am able to fully reconstruct the item-item similarity matrix.
When I reconstruct the item-item similarity matrix, I find that ALL of the diagonal values are set to zero. This behavior is observed with cooccurrence, jaccard, and lift.
Please tell me if this is the desired behavior, and if so, why. I cannot find any reason to set these diagonal elements (essentially a measure of an item's frequency) to zero.
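For concreteness, here is a toy reconstruction of the behavior I'm describing (my own code, not SAR's); zeroing the diagonal removes self-similarity, presumably so the seed item is never its own top recommendation:

```python
# Sketch: Jaccard item-item similarity from a tiny usage set, with the
# diagonal zeroed (my reconstruction of the observed behavior, not SAR's code).
usage = {
    "u1": {"A", "B"},
    "u2": {"A", "C"},
    "u3": {"A", "B", "C"},
}
items = sorted({i for basket in usage.values() for i in basket})

def cooccurrence(i, j):
    # number of users whose basket contains both i and j
    return sum(1 for basket in usage.values() if i in basket and j in basket)

def jaccard(i, j):
    if i == j:
        return 0.0  # self-similarity removed, as observed in the results
    cij = cooccurrence(i, j)
    return cij / (cooccurrence(i, i) + cooccurrence(j, j) - cij)

sim = {(i, j): jaccard(i, j) for i in items for j in items}
print(sim[("A", "B")])  # 2 / (3 + 2 - 2)
```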
Thanks,
Ryan
In the docs, the default similarity function is stated to be Lift. As far as I can tell, it's actually Jaccard.
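For reference, the two functions differ only in how the co-occurrence count is normalized (standard definitions written from the counts, not the repo's code):

```python
# Standard definitions over co-occurrence counts (not the repo's code):
# c_ij = units (users or user+timestamp) containing both i and j,
# c_i, c_j = occurrence counts of each item alone.
def jaccard(c_ij, c_i, c_j):
    return c_ij / (c_i + c_j - c_ij)

def lift(c_ij, c_i, c_j):
    return c_ij / (c_i * c_j)

print(jaccard(2, 3, 2))  # 2 / (3 + 2 - 2) = 2/3
print(lift(2, 3, 2))     # 2 / (3 * 2) = 1/3
```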
Attempting to use the Deploy to Azure link with default values results in an error.
Steps to reproduce:
Result is this error:
{
  "code": "DeploymentFailed",
  "message": "At least one resource deployment operation failed. Please list deployment operations for details. Please see https://aka.ms/DeployOperations for usage details.",
  "details": [
    {
      "code": "BadRequest",
      "message": {
        "Code": "BadRequest",
        "Message": "The parameter 'remoteDebuggingVersion' has an invalid value. Details: Supported Versions: VS2017,VS2019.",
        "Target": null,
        "Details": [
          { "Message": "The parameter 'remoteDebuggingVersion' has an invalid value. Details: Supported Versions: VS2017,VS2019." },
          { "Code": "BadRequest" },
          {
            "ErrorEntity": {
              "ExtendedCode": "01033",
              "MessageTemplate": "The parameter '{0}' has an invalid value. Details: {1}.",
              "Parameters": [ "remoteDebuggingVersion", "Supported Versions: VS2017,VS2019" ],
              "Code": "BadRequest",
              "Message": "The parameter 'remoteDebuggingVersion' has an invalid value. Details: Supported Versions: VS2017,VS2019."
            }
          }
        ],
        "Innererror": null
      }
    }
  ]
}
{
  "code": "DeploymentFailed",
  "message": "At least one resource deployment operation failed. Please list deployment operations for details. Please see https://aka.ms/DeployOperations for usage details.",
  "details": [
    {
      "code": "BadRequest",
      "message": {
        "Code": "BadRequest",
        "Message": "The host name cognitive_servicesahqibdfiimvcews.azurewebsites.net is invalid.",
        "Target": null,
        "Details": [
          { "Message": "The host name cognitive_servicesahqibdfiimvcews.azurewebsites.net is invalid." },
          { "Code": "BadRequest" },
          {
            "ErrorEntity": {
              "ExtendedCode": "04003",
              "MessageTemplate": "The host name {0} is invalid.",
              "Parameters": [ "cognitive_servicesahqibdfiimvcews.azurewebsites.net" ],
              "Code": "BadRequest",
              "Message": "The host name cognitive_servicesahqibdfiimvcews.azurewebsites.net is invalid."
            }
          }
        ],
        "Innererror": null
      }
    }
  ]
}
It says "The host name cognitive_servicesahqibdfiimvcews.azurewebsites.net is invalid".
Please help!
Hi, is this solution able to improve an existing model without recreating a new model each time?
I was wondering whether the Swagger PUT call on models is meant to retrain a model? It appears to exist only for "default", so perhaps it just sets a model as the default?
If not, what is the correct way to update a model?
I have searched all of the documentation for an answer, but sorry if I missed it.
Thank you,
I am developing a script to add new products to my catalog file in my Azure blob container.
The blob type for my catalog file is an append blob, because in C# it supports appending new lines to an existing file. This is much simpler than using the standard block blob's write operation, which overwrites the previous entries.
When I try to train my model with my catalog file as type append blob, I get the error
'System.Exception: Failed downloading training files from storage. Model id: 198d7cdf-818c-4fd5-a8b3-c8c636cdabd8 ---> Microsoft.WindowsAzure.Storage.StorageException: Blob type of the blob reference doesn't match blob type of the blob. ---> System.InvalidOperationException: Blob type of the blob reference doesn't match blob type of the blob.
at Microsoft.WindowsAzure.Storage.Blob.CloudBlob.UpdateAfterFetchAttributes(BlobAttributes blobAttributes, HttpWebResponse response)
at Microsoft.WindowsAzure.Storage.Blob.CloudBlob
Is there a chance append blobs will be supported in the future?
In the catalog file schema ,,[,], I think the item category can have only one value.
So what is the best practice when an item has many categories?
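One workaround that comes to mind (an assumption on my part, not documented behavior) is to keep the primary category in the category column and push the extra categories into the features list, which does allow multiple key=value pairs:

```python
# Sketch of a workaround (assumption, not documented behavior): primary
# category in the category column, extra categories as feature key=value pairs.
# The feature names "category2", "category3", ... are hypothetical.
def catalog_line(item_id, name, categories, features=None):
    primary, *extra = categories
    feats = dict(features or {})
    for n, cat in enumerate(extra, start=2):
        feats[f"category{n}"] = cat
    feat_str = ", ".join(f"{k}={v}" for k, v in feats.items())
    return f"{item_id},{name},{primary},,{feat_str}"

print(catalog_line("AB123", "Blue Blazer", ["Clothing", "Formal", "Sale"],
                   {"Brand": "Armani"}))
```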
https://www.nuget.org/packages/Microsoft.MachineLearning.TLCRecommendations/ seems to be unlisted?
Also, Microsoft.MachineLearning.Recommend.Sar.Sar does not seem to be available in ML.NET (I am trying on .NET 6)...
Any thoughts on where the above namespace was relocated in ML.NET?
Any guidance would be very much appreciated.
Are there any plans to use ML.NET to implement SAR?
I have cloned this repo locally.
I created an App Service in our Azure account and set up my local solution to deploy to it. (In order to do this I needed to upgrade the WindowsAzure.Storage reference to 9.x; not sure if that's relevant.)
Now when I go to the app homepage, I get a Yellow Screen of Death:
[BadImageFormatException: Could not load file or assembly 'ManagedBlingSigned' or one of its dependencies. An attempt was made to load a program with an incorrect format.]
System.Reflection.RuntimeAssembly._nLoad(AssemblyName fileName, String codeBase, Evidence assemblySecurity, RuntimeAssembly locationHint, StackCrawlMark& stackMark, IntPtr pPrivHostBinder, Boolean throwOnFileNotFound, Boolean forIntrospection, Boolean suppressSecurityChecks) +0
System.Reflection.RuntimeAssembly.nLoad(AssemblyName fileName, String codeBase, Evidence assemblySecurity, RuntimeAssembly locationHint, StackCrawlMark& stackMark, IntPtr pPrivHostBinder, Boolean throwOnFileNotFound, Boolean forIntrospection, Boolean suppressSecurityChecks) +36
System.Reflection.RuntimeAssembly.InternalLoadAssemblyName(AssemblyName assemblyRef, Evidence assemblySecurity, RuntimeAssembly reqAssembly, StackCrawlMark& stackMark, IntPtr pPrivHostBinder, Boolean throwOnFileNotFound, Boolean forIntrospection, Boolean suppressSecurityChecks) +152
System.Reflection.RuntimeAssembly.InternalLoad(String assemblyString, Evidence assemblySecurity, StackCrawlMark& stackMark, IntPtr pPrivHostBinder, Boolean forIntrospection) +77
System.Reflection.RuntimeAssembly.InternalLoad(String assemblyString, Evidence assemblySecurity, StackCrawlMark& stackMark, Boolean forIntrospection) +21
System.Reflection.Assembly.Load(String assemblyString) +28
System.Web.Configuration.CompilationSection.LoadAssemblyHelper(String assemblyName, Boolean starDirective) +38
[ConfigurationErrorsException: Could not load file or assembly 'ManagedBlingSigned' or one of its dependencies. An attempt was made to load a program with an incorrect format.]
System.Web.Configuration.CompilationSection.LoadAssemblyHelper(String assemblyName, Boolean starDirective) +738
System.Web.Configuration.CompilationSection.LoadAllAssembliesFromAppDomainBinDirectory() +217
System.Web.Configuration.CompilationSection.LoadAssembly(AssemblyInfo ai) +130
System.Web.Compilation.BuildManager.GetReferencedAssemblies(CompilationSection compConfig) +170
System.Web.Compilation.BuildManager.GetPreStartInitMethodsFromReferencedAssemblies() +92
System.Web.Compilation.BuildManager.CallPreStartInitMethods(String preStartInitListPath, Boolean& isRefAssemblyLoaded) +290
System.Web.Compilation.BuildManager.ExecutePreAppStart() +157
System.Web.Hosting.HostingEnvironment.Initialize(ApplicationManager appManager, IApplicationHost appHost, IConfigMapPathFactory configMapPathFactory, HostingEnvironmentParameters hostingParameters, PolicyLevel policyLevel, Exception appDomainCreationException) +549
[HttpException (0x80004005): Could not load file or assembly 'ManagedBlingSigned' or one of its dependencies. An attempt was made to load a program with an incorrect format.]
System.Web.HttpRuntime.FirstRequestInit(HttpContext context) +10075108
System.Web.HttpRuntime.EnsureFirstRequestInit(HttpContext context) +95
System.Web.HttpRuntime.ProcessRequestNotificationPrivate(IIS7WorkerRequest wr, HttpContext context) +254
Help, please?
Am I thinking about this right with our IDs (bold) and features for the product catalog? Our categories have IDs, like the image below.
I think I will make one model per market (Sweden, Germany, etc.). Each market has its own translations of the SD1 etc. category names.
AB2065138, Blue Casual Tight fit Medium Armani Blazer,SD1M5S6,, Context=Casual, Gender=Male, Brand=Armani, Article Standard Colour=Blue, Size=Medium, Seasonality=Summer, Fabric appearence=Velvet
Another one in clothing would be
2b324086-85ba-4d68-b83d-8dae68db39f0, White Loose fit Small Adidas T-shirt,SD1M10S1,, Context=Casual, Gender=Women, Brand=Adidas, Article Standard Colour=White, Size=Small, Seasonality=All year, Fabric appearence=Textile
But then how do we do Accessories? Is that a different model?
b68bd680-5cc2-4220-98b6-be743943006c, S.c Iphone X/xs Silicone Black,SD2M20S1,, Gender=Unisex, Brand=The Case Factory, Article Standard Colour=Black, Size=Small
Some product attributes do not exist for Accessories the same way they do for Clothing.
Hi, I've been training the model successfully with different sources of data (belonging to different clients) for the past few weeks. But last week something strange happened.
On March 5th I successfully trained a model with 10.5 million lines in the usage files using a B3 instance (with 7GB of RAM). The model returns this info:
{
"id": "xxxxxxxx",
"description": "Client 1 recommendations model",
"creationTime": "2018-03-05T14:26:26.3215001Z",
"modelStatus": "Completed",
"modelStatusMessage": "Model Training Completed Successfully",
"parameters": {
"blobContainerName": "data-client1",
"catalogFileRelativePath": "3years/catalogue.csv",
"usageRelativePath": "3years/usage/3",
"supportThreshold": 4,
"cooccurrenceUnit": "User",
"similarityFunction": "Jaccard",
"enableColdItemPlacement": true,
"enableColdToColdRecommendations": true,
"enableUserAffinity": true,
"enableUserToItemRecommendations": true,
"allowSeedItemsInRecommendations": true,
"enableBackfilling": true,
"decayPeriodInDays": 60
},
"statistics": {
"totalDuration": "01:07:01.0929075",
"trainingDuration": "00:57:34.3260296",
"storingUserHistoryDuration": "00:57:13.0521714",
"catalogParsing": {
"duration": "00:00:13.3761182",
"successfulLinesCount": 104175,
"totalLinesCount": 104175
},
"usageEventsParsing": {
"duration": "00:09:13.3899645",
"errors": [
{
"count": 36,
"error": "UnknownItemId",
"sample": {
"file": "3years/usage/usage_part00.csv",
"line": 142809
}
}
],
"successfulLinesCount": 10499964,
"totalLinesCount": 10500000
},
"numberOfCatalogItems": 104175,
"numberOfUsageItems": 72983,
"numberOfUsers": 2967496,
"catalogCoverage": 0.70058075353971683,
"catalogFeatureWeights": {
"1": -0.0009318547,
"2": 0.01265376,
"3": 0.04137097,
"marca": 0.003024425
}
}
}
Since then I've been trying to train a model using the same parameters but with different data. This data belongs to the same client, and both datasets are in fact samples of the same larger data pool. The thing is that I can't seem to replicate the number of usage lines that worked in the previous example.
What's more, today I tried to replicate the model that had successfully trained, but it failed with the same error. This is the output when getting information about the model via its REST API:
{
"id": "yyyyyyyyy",
"description": "Client 1 recommendations 11 MM",
"creationTime": "2018-03-12T12:13:12.571664Z",
"modelStatus": "Failed",
"modelStatusMessage": "Core Training",
"parameters": {
"blobContainerName": "data-client1",
"catalogFileRelativePath": "3years/catalogue.csv",
"usageRelativePath": "3years/usage/1",
"supportThreshold": 4,
"cooccurrenceUnit": "User",
"similarityFunction": "Jaccard",
"enableColdItemPlacement": true,
"enableColdToColdRecommendations": true,
"enableUserAffinity": true,
"enableUserToItemRecommendations": true,
"allowSeedItemsInRecommendations": true,
"enableBackfilling": true,
"decayPeriodInDays": 60
}
}
What could be the problem here? It is the exact same configuration and data, but it fails!
Any help would be really appreciated.
Hi,
I am trying to train a model using REST API calls, but it is taking a lot of time, even though the usage file is only 170 MB.
The request looks like this:
headers
content-type : application/json
x-api-key:*******
Body
{
"description": "Test_RESTAPIBuild",
"blobContainerName": "input",
"catalogFileRelativePath": "Catalog/catalog_sample.csv",
"usageRelativePath": "Usage/usage_sample.csv",
"evaluationUsageRelativePath": "Usage",
"supportThreshold": 5,
"cooccurrenceUnit": "0",
"similarityFunction": "Jaccard",
"enableColdItemPlacement": true,
"enableColdToColdRecommendations": true,
"enableUserAffinity": true,
"enableUserToItemRecommendations": true,
"allowSeedItemsInRecommendations": true,
"enableBackfilling": true,
"decayPeriodInDays": 5
}
What exactly is the issue? I remember that when I trained through the C# application, it completed in seconds.
The response I am getting is:
{
"id": "XXXXXXX-d000-452a-947e-XXXXXXXXXXX",
"description": "Test_RESTAPIBuild",
"creationTime": "2018-01-10T10:31:36.0281983Z",
"modelStatus": "InProgress"
}
Hi,
I'm not sure if I missed something, but I think the text below from the /doc/sar.md page needs some corrections/revisions:
################## SAR DOC
Note that the recommendation score of an item is purely based on its similarity to Item 5 in this case (?? this seems to duplicate some text below). Assuming that the same item is not recommended again, items 1 and 4 have the highest score (did you mean items 4 and 5? The user has already seen Item 1) and would be recommended before items 2 and 3 (??).
Now, if this user adds Item 2 (was this meant to be Item 5?) to the shopping cart, the affinity vector (assuming weight 2 (weight 1?) for this transaction) will be
New User aff
Item 1 0
Item 2 0
Item 3 0
Item 4 0
Item 5 1
resulting in recommendation scores:
New User rec
Item 1 2
Item 2 1
Item 3 1
Item 4 2
Item 5 3
Note that the recommendation score of an item is purely based on its similarity to Item 5 in this case. Assuming that a same item is not recommended again, items 1 and 4 have the highest score and would be recommended before items 2 and 3. Now, if this user adds Item 2 to the shopping cart, affinity vector (assuming weight 2 for this transaction) will be
New User aff
Item 1 0
Item 2 2
Item 3 0
Item 4 0
Item 5 1
resulting in recommendation scores:
################################
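For what it's worth, the scoring step in the quoted passage is just a matrix-vector product. A sketch that reproduces the quoted numbers (only the Item 5 column of the similarity matrix is pinned down by the quoted scores; the rest is a placeholder assumption):

```python
# Sketch of the SAR scoring step: scores = similarity matrix @ affinity.
# Only the Item 5 column is fixed by the quoted scores [2, 1, 1, 2, 3];
# the other entries are placeholders, not values from the doc.
similarity = [
    [0, 0, 0, 0, 2],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 0, 2],
    [0, 0, 0, 0, 3],
]
affinity = [0, 0, 0, 0, 1]  # the new user has interacted with Item 5 only

scores = [sum(s * a for s, a in zip(row, affinity)) for row in similarity]
print(scores)  # [2, 1, 1, 2, 3], matching the quoted recommendation scores
```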
Thanks,
For the evaluation, the recommendation is user-based, right? Is the evaluation file evaluated line by line, or are the items first grouped by timestamp/user ID and then matched against the recommendation results?
I have a dataset with fewer than 100 items and over 1M customers. I'm wondering whether I can access all of the output for the evaluation file?
Thank you so much!
I am trying to retrain an existing model based on a new usage file, but there is no API call to train an existing model. Can you please suggest anything?
Training over large data-sets takes quite a while, even on the highest deployment plan. Is there a way to incrementally train the SAR model as new usage data becomes available?
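The service only seems to expose full retraining, but co-occurrence counts are additive, so an incremental update is possible in principle. A sketch of the idea (not something the API offers, as far as I can tell):

```python
from collections import Counter
from itertools import combinations

def cooccurrence_counts(baskets):
    """Count item pairs seen together in each basket (e.g. per user)."""
    counts = Counter()
    for basket in baskets:
        for i, j in combinations(sorted(basket), 2):
            counts[(i, j)] += 1
    return counts

# Counts are additive, so new usage can be folded into existing counts
# without re-reading all historical data.
old = cooccurrence_counts([{"A", "B"}, {"A", "C"}])
new = cooccurrence_counts([{"A", "B", "C"}])
merged = old + new
print(merged[("A", "B")])  # 2
```

The similarity step (Jaccard etc.) would still need recomputing from the merged counts, but that is cheap compared to re-parsing all historical usage files.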
Hi,
I've got the system working with up to 7 GB of training data. But as soon as I try 10 GB, the program falls over after approximately 4 hours with the following.
Any ideas?
Framework Version: v4.0.30319
Description: The process was terminated due to an unhandled exception.
Exception Info: System.IO.IOException
at System.IO.MemoryStream.Write(Byte[], Int32, Int32)
at Microsoft.MachineLearning.Data.IO.BinarySaver.WriteWorker(System.IO.Stream, System.Collections.Concurrent.BlockingCollection`1, ColumnCodec[], Microsoft.MachineLearning.Data.ISchema, Int32, Microsoft.MachineLearning.IChannelProvider, Microsoft.MachineLearning.Internal.Utilities.ExceptionMarshaller)
Exception Info: System.InvalidOperationException
at Microsoft.MachineLearning.Internal.Utilities.ExceptionMarshaller.ThrowIfSet(Microsoft.MachineLearning.IExceptionContext)
at Microsoft.MachineLearning.Data.IO.BinarySaver.SaveData(System.IO.Stream, Microsoft.MachineLearning.Data.IDataView, Int32[])
at Microsoft.MachineLearning.Recommend.ItemSimilarity.SimilarityMatrix.Save(Microsoft.MachineLearning.Model.ModelSaveContext)
at Microsoft.MachineLearning.Model.ModelSaveContext.SaveModel[[System.__Canon, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]](Microsoft.MachineLearning.Model.RepositoryWriter, System.__Canon, System.String)
at Microsoft.MachineLearning.Recommend.Sar.SarPredictor.Save(Microsoft.MachineLearning.Model.ModelSaveContext)
at Microsoft.MachineLearning.Model.ModelSaveContext.SaveModel[[System.__Canon, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]](Microsoft.MachineLearning.Model.RepositoryWriter, System.__Canon, System.String)
at Microsoft.MachineLearning.Data.TrainUtils.SaveModel(Microsoft.MachineLearning.IHostEnvironment, Microsoft.MachineLearning.IChannel, System.IO.Stream, Microsoft.MachineLearning.IPredictor, Microsoft.MachineLearning.Data.RoleMappedData, System.String)
at Microsoft.MachineLearning.EntryPoints.PredictorModel.Save(Microsoft.MachineLearning.IHostEnvironment, System.IO.Stream)
at Recommendations.Core.Train.TrainedModel.GetObjectData(System.Runtime.Serialization.SerializationInfo, System.Runtime.Serialization.StreamingContext)
at System.Runtime.Serialization.Formatters.Binary.WriteObjectInfo.InitSerialize(System.Object, System.Runtime.Serialization.ISurrogateSelector, System.Runtime.Serialization.StreamingContext, System.Runtime.Serialization.Formatters.Binary.SerObjectInfoInit, System.Runtime.Serialization.IFormatterConverter, System.Runtime.Serialization.Formatters.Binary.ObjectWriter, System.Runtime.Serialization.SerializationBinder)
at System.Runtime.Serialization.Formatters.Binary.WriteObjectInfo.Serialize(System.Object, System.Runtime.Serialization.ISurrogateSelector, System.Runtime.Serialization.StreamingContext, System.Runtime.Serialization.Formatters.Binary.SerObjectInfoInit, System.Runtime.Serialization.IFormatterConverter, System.Runtime.Serialization.Formatters.Binary.ObjectWriter, System.Runtime.Serialization.SerializationBinder)
at System.Runtime.Serialization.Formatters.Binary.ObjectWriter.Serialize(System.Object, System.Runtime.Remoting.Messaging.Header[], System.Runtime.Serialization.Formatters.Binary.__BinaryWriter, Boolean)
at System.Runtime.Serialization.Formatters.Binary.BinaryFormatter.Serialize(System.IO.Stream, System.Object, System.Runtime.Remoting.Messaging.Header[], Boolean)
at System.Runtime.Serialization.Formatters.Binary.BinaryFormatter.Serialize(System.IO.Stream, System.Object)
at Recommendations.Common.ModelsProvider.SerializeTrainedModel(Recommendations.Core.ITrainedModel, System.IO.Stream, System.Guid)
Exception Info: System.Exception
at Recommendations.Common.ModelsProvider.SerializeTrainedModel(Recommendations.Core.ITrainedModel, System.IO.Stream, System.Guid)
at Recommendations.Common.ModelsProvider+d__1.MoveNext()
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(System.Threading.Tasks.Task)
at System.Runtime.CompilerServices.TaskAwaiter`1[[System.__Canon, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]].GetResult()
at NativeTrainer.WebJobLogic+<TrainModelAsync>d__3.MoveNext()
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(System.Threading.Tasks.Task)
at System.Runtime.CompilerServices.TaskAwaiter`1[[System.Boolean, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]].GetResult()
at NativeTrainer.WebJobLogic+d__1.MoveNext()
Exception Info: System.AggregateException
at NativeTrainer.Program.Main(System.String[])
TLCRecommendations.dll in the Microsoft.MachineLearning.TLCRecommendations 3.8.51.1475 NuGet package consumed by this template project is delay-signed. It won't load unless we skip strong-name validation for the Microsoft public key used for the partial signing.
Can I change the user ID from int to string? We have a system keeping track of users today, but its identifier is of type string.
Same with products: our product identifier is a GUID. Is it possible to change the int product ID to a string, and how would I go about doing that?
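If changing the internal types turns out to be hard, a translation layer on the caller's side may be enough. A sketch (all names here are mine, not from the repo):

```python
# Sketch of a caller-side ID translator (names are mine, not the repo's):
# map arbitrary external IDs (GUID strings, etc.) to the integer IDs the
# solution expects, and back again when reading recommendations.
class IdTranslator:
    def __init__(self):
        self._to_int = {}
        self._to_ext = []

    def encode(self, external_id):
        if external_id not in self._to_int:
            self._to_int[external_id] = len(self._to_ext)
            self._to_ext.append(external_id)
        return self._to_int[external_id]

    def decode(self, internal_id):
        return self._to_ext[internal_id]

ids = IdTranslator()
n = ids.encode("b68bd680-5cc2-4220-98b6-be743943006c")
print(n, ids.decode(n))  # 0, then the original GUID back
```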
I already have existing resource groups and an existing Azure App Service.
I want to deploy this solution and connect it to existing Azure services (existing resource group, existing App Service, existing storage account). But the deployment creates new blank resources, and I am not able to repoint the Web App to my existing App Service later (same region etc.) because of a new limitation.
Please make it easier to deploy the pieces separately and to connect to existing Azure resources, rather than creating new ones.
Hi,
Are there any ways to filter and boost recommendations according to particular properties of items, such as view count, genres with non-linear boundaries, etc.? Also, I would like to know the future roadmap of this project as envisioned internally at Microsoft.
Thank you.
I have supplied a usage file for evaluation which contains 500,000 lines. In the diversity evaluation scores, I see that only a total of 160,000 items were recommended (600 unique items). Why is the total number smaller than the number of lines in the evaluation file? I thought several items would be recommended for each line in the usage file? Or is this per customer?
I am creating the model using a dataset of 9K transaction records, with almost 8K unique users and a total of 1,100 products.
I was getting a very low score for recommended items, which is understandable given the very low co-occurrence.
See below:
Then, when I change the similarity function from "Jaccard" to "Cooccurrence", my score jumps right up.
See below:
Does anyone know why?
Can I deploy this solution on an on-premises server rather than on the Azure cloud?
How can I do that?
Thanks
Is there a chance this solution will ever be rewritten to .NET Core?
This would open up new options for hosting it in Azure.
Alternatively - perhaps there is some new similar project in .NET Core worth considering?
Hi All!
I am trying to find a solution for the following:
I have transactional data connected to users (via loyalty cards) from supermarkets: roughly 30,000 items and 100,000 users with over a million transactions.
Now, training the model and getting recommendations is not a problem (the solution is great for that), but I cannot get my head around how to get individual scores for user and item combinations.
Basically, I have a weekly catalogue of around 50 items on sale. I would like to display these "coupons" in descending order in our mobile application, where the user is signed in, based on the user's recommendation score for each item.
Is there a way to input a user ID and product ID and get the score?
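Client-side, this could reduce to sorting the weekly coupon list by the scores returned from a user-to-item recommendation call. A sketch of what I mean (the payload field names are assumptions, not verified against the API):

```python
# Sketch: order this week's coupon items by the user's recommendation
# scores (payload shape assumed, not taken from the API reference).
def rank_coupons(user_recommendations, coupon_item_ids, default_score=0.0):
    scores = {r["recommendedItemId"]: r["score"] for r in user_recommendations}
    return sorted(coupon_item_ids,
                  key=lambda item: scores.get(item, default_score),
                  reverse=True)

recs = [{"recommendedItemId": "milk", "score": 0.9},
        {"recommendedItemId": "bread", "score": 0.4}]
print(rank_coupons(recs, ["bread", "milk", "eggs"]))
# ['milk', 'bread', 'eggs']
```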
Thanks in advance!
Mark
There are important files that all Microsoft projects should have that are not present in this repository. A pull request has been opened to add the missing file(s). When the PR is merged, this issue will be closed automatically.
Microsoft teams can learn more about this effort and share feedback within the open source guidance available internally.
Is there any way to ask for recommendations for a first-time user who has expressed interest in a specific category?
We recently tried to deploy this solution with a partner, using the sample C# console app client.
We faced several issues massaging the partner's data into the correct format for the SAR service, and found the error messages from the service not very helpful for tracking down data issues. For example, the input data looks like CSV, but the service doesn't seem to support standard CSV encoding of commas within columns.
As such, it would be super valuable to have a small tool that verifies the input data is in the correct format for the SAR service, so that data issues can be precisely located and fixed. Today you have to run the model and wait until the bad data is hit, which greatly slows down iteration.
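The checks could be as simple as this sketch (the line format and rules here are our reading of the docs, not an official validator):

```python
import re
from datetime import datetime

# Sketch of the validation tool we wished for (format and rules are our
# guesses from the docs, not an official checker): verify usage file lines
# before uploading, so bad rows are located up front.
ID_RE = re.compile(r"^[A-Za-z0-9_-]+$")

def check_usage_line(line, line_no):
    """Return a list of problems for one <userId>,<itemId>,<timestamp> row."""
    problems = []
    fields = line.rstrip("\n").split(",")
    if len(fields) < 3:
        return [f"line {line_no}: expected at least 3 fields, got {len(fields)}"]
    user_id, item_id, timestamp = fields[:3]
    if not ID_RE.match(item_id):
        problems.append(f"line {line_no}: illegal characters in item id {item_id!r}")
    try:
        datetime.strptime(timestamp, "%Y/%m/%dT%H:%M:%S")
    except ValueError:
        problems.append(f"line {line_no}: unparsable timestamp {timestamp!r}")
    return problems

print(check_usage_line("u1,item_42,2018/03/05T14:26:26", 1))  # []
print(check_usage_line("u1,bad id,2018/03/05T14:26:26", 2))   # one problem
```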
There appears to be zero traffic on this repository, nobody posting bugs, nobody asking or answering questions.
Does anyone in the real world use this product?
Or has Microsoft abandoned this field to AWS and Google?
How do you define the co-occurrence of items? Are items with the same timestamp in the usage file treated as the same transaction, or items within a certain time range?
Hi,
Is there a way to filter the recommendedItemId field by some constraint?
For example:
only return recommendedItemId results ending with the character "w"
&recommendedItemId=.*w$
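The API does not appear to support this, but filtering client-side is straightforward. A sketch (the field name is taken from the example above; the payload shape is an assumption):

```python
import re

# Sketch: client-side filter of recommendation results by a regex on
# recommendedItemId (the API itself does not appear to support this).
def filter_recommendations(results, pattern):
    rx = re.compile(pattern)
    return [r for r in results if rx.search(r["recommendedItemId"])]

results = [{"recommendedItemId": "item-w", "score": 0.8},
           {"recommendedItemId": "item-x", "score": 0.6}]
print(filter_recommendations(results, r"w$"))
# only 'item-w' survives
```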
Hi,
Let me know if you have a NodeJS implementation of this sample program. I would also like to know the availability of any libraries/documentation in NodeJS for the Product Recommendations service.
I tried to use the catalog.xlsx file and the three CSVs with timestamps in one folder, but got errors.
Catalog
Usage
Postman request
Postman response
{
"id": "922792be-1add-485f-af51-07513ad8cb88",
"description": "Simple recommendations model",
"creationTime": "2020-07-04T21:43:03.9316674Z",
"modelStatus": "Failed",
"modelStatusMessage": "Failed to parse catalog file or parsing found no valid items",
"parameters": {
"blobContainerName": "trainingdata",
"catalogFileRelativePath": "catalogs/catalog.xlsx",
"usageRelativePath": "usage2",
"supportThreshold": 6,
"cooccurrenceUnit": "User",
"similarityFunction": "Jaccard",
"enableColdItemPlacement": true,
"enableColdToColdRecommendations": false,
"enableUserAffinity": true,
"enableUserToItemRecommendations": false,
"allowSeedItemsInRecommendations": true,
"enableBackfilling": true,
"decayPeriodInDays": 30
},
"statistics": {
"totalDuration": "00:00:00",
"trainingDuration": "00:00:00",
"catalogParsing": {
"duration": "00:00:00.0295404",
"errors": [
{
"count": 65,
"error": "MissingFields",
"sample": {
"file": "catalogs/catalog.xlsx",
"line": 1
}
},
{
"count": 34,
"error": "IllegalCharactersInItemId",
"sample": {
"file": "catalogs/catalog.xlsx",
"line": 5
}
},
{
"count": 1,
"error": "ItemIdTooLong",
"sample": {
"file": "catalogs/catalog.xlsx",
"line": 14
}
},
{
"count": 1,
"error": "MalformedLine",
"sample": {
"file": "catalogs/catalog.xlsx",
"line": 53
}
}
],
"successfulLinesCount": 0,
"totalLinesCount": 101
},
"numberOfCatalogItems": 0,
"numberOfUsageItems": 0,
"numberOfUsers": 0
}
}
Then I tried the other sample data.
Postman request
This trained successfully, but...
When asking for recommendations, the numbers I get seem a bit too high; something seems wrong here. What are the latest tested files, and what setup is used for testing?
/api/models/fa2dd4a3-c783-4c0f-8a45-d801a2ee746b/recommend?itemId=2005018
Hi, I read ModelTrainingParameters.cs for the parameter setup and definitions.
From my limited study, I tend to think, or suggest, that the co-occurrence unit really refers to a level:
User level - measures co-occurrence of items for the same user (regardless of time/date)
Timestamp level - on top of the same user, it further requires the same date/time (for that user)
If my understanding of the /// comments is correct, then the wording or term "level" would be more appropriate, since "co-occurrence unit" on its own may be read as:
User - items bought by the same user (regardless of time/date)
Timestamp - items bought within the same timestamp (mostly by the same user, but if there are multiple checkout machines / POS terminals, they may come from different users)
So there is a subtle, potential difference, and this also puzzles beginners to this recommender system, especially since the use of the term here deviates slightly from the literature elsewhere (co-occurrence more often refers to a product/item matrix or dimension*).
*It may be the case that internally the software establishes such a matrix (a product co-occurrence matrix per user or per timestamp), but the term as presented causes confusion for serious learners.
Hence the wording "co-occurrence level" or "focus" is suggested (especially since the word "unit" may also suggest a quantity).
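To make the distinction concrete, a sketch of the two counting units as I read the /// comments (my reconstruction, not the repo's code):

```python
from collections import Counter
from itertools import combinations

# Sketch of the two co-occurrence units as I read them (my reconstruction):
# 'User' pools all of a user's events; 'Timestamp' splits them further by
# (user, timestamp), so only same-moment items co-occur.
events = [("u1", "09:00", "A"), ("u1", "09:00", "B"), ("u1", "17:00", "C")]

def cooccur(events, unit):
    baskets = {}
    for user, ts, item in events:
        key = user if unit == "User" else (user, ts)
        baskets.setdefault(key, set()).add(item)
    counts = Counter()
    for basket in baskets.values():
        for i, j in combinations(sorted(basket), 2):
            counts[(i, j)] += 1
    return counts

print(cooccur(events, "User"))       # A-B, A-C, B-C all co-occur
print(cooccur(events, "Timestamp"))  # only A-B co-occur
```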
We implement product recommendations solutions based on this project on many websites for our customers.
We've encountered websites which use different formats for their item IDs. Some use simple numbers, but strings like the following are also valid item IDs in some systems:
J4Z18-JSOM100-27M+31S+36S
[S4Z17-TSDLF701] TSDLF701 - kobalt
g.2017.12.x.xmas-gift-1
head&shoulders_2014_M7_G1
Charmine Rose_2015_M10_G2
It's not up to us to decide whether these formats are good or not; it's just how the real world looks.
Unfortunately, it is impossible to use these strings as item identifiers in Product-Recommendations, because it imposes character restrictions on the item ID in the catalog file. The allowed characters are letters, numbers, dashes, and underscores. There is a piece of code in CatalogFileParser.cs that performs the check:
// check for illegal characters in the item id
if (!itemId.All(UsageEventsFilesParser.IsAlphanumericDashOrUnderscoreCharacter))
{
parsingError = ParsingErrorReason.IllegalCharactersInItemId;
return null;
}
The same check is performed when parsing usage events file.
My question is: is it really necessary to be so restrictive about the product ID? Why not allow any string? We've seen all kinds of special characters, spaces, and even national characters in IDs.
Right now we would need to create some kind of ID translator and meticulously convert IDs back and forth every time we feed data to or retrieve data from Product-Recommendations.
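The translator we would have to build looks roughly like this (a sketch; the allowed-character rule mirrors the check in CatalogFileParser.cs, everything else is our own design):

```python
import re

# Sketch of the ID translator we'd need: map arbitrary external item IDs
# to IDs containing only letters, digits, dash and underscore (the rule
# enforced in CatalogFileParser.cs), and keep a table to map them back.
_FORBIDDEN = re.compile(r"[^A-Za-z0-9_-]")

class ItemIdMapper:
    def __init__(self):
        self._forward = {}
        self._reverse = {}

    def sanitize(self, external_id):
        if external_id in self._forward:
            return self._forward[external_id]
        candidate = _FORBIDDEN.sub("_", external_id)
        # disambiguate collisions ("a.b" and "a,b" both become "a_b")
        safe, n = candidate, 1
        while safe in self._reverse:
            safe = f"{candidate}-{n}"
            n += 1
        self._forward[external_id] = safe
        self._reverse[safe] = external_id
        return safe

    def restore(self, safe_id):
        return self._reverse[safe_id]

m = ItemIdMapper()
print(m.sanitize("head&shoulders_2014_M7_G1"))  # head_shoulders_2014_M7_G1
print(m.restore("head_shoulders_2014_M7_G1"))   # the original ID back
```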
Hi,
I'm the system architect for a company that produces a literacy product for students reading in schools. The product is basically an e-reader in which students can select and read books in class; we've gamified the experience while collecting a ton of data points.
The problem we're trying to solve now is that students are spending too much time browsing through the library of available books; we want to minimize their browsing time and maximize their reading time.
So we figure Product Recommendations might be the way to go. A few questions, though:
TIA for your help!