Viewing product-to-product recommendations
Once the ADF workflow has executed successfully, you will be able to see products that are similar or complementary on the demo retailer website.
1. To verify execution of the ADF workflow, go to your ProductRecommendations data factory at https://portal.azure.com.
2. Verify that all slices corresponding to the last produced dataset in the workflow (ProductsSimilaritySQLTable) have executed successfully. (The Ready checkmark will show next to the updated slice.)
Note: It can take anywhere from 15 minutes to an hour for all the slices in this output dataset to show as Ready. Keep monitoring the status during this time. (If you prefer to monitor from the command line, see the PowerShell sketch after these steps.)
3. View the product-to-product recommendations on the demo retailer website. Here, it’s the Contoso Music Store (http://<azurewebsitename>.azurewebsites.net).
4. To see artists who are similar to the artist “1 Giant Leap,” click 1 Giant Leap. This will display all other artists whom site visitors have listened to while also listening to 1 Giant Leap.
The recommendations screen based on 1 Giant Leap is as follows.
You can also see artists who are similar to the artist “50 Cent.” Click 50 Cent to display all other artists whom site visitors have listened to while also listening to 50 Cent.
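If you prefer to verify the slice status from the command line rather than the portal (step 2 above), the following is a minimal PowerShell sketch. It assumes the Azure Data Factory V1 cmdlets in the AzureRM modules; the resource group name and time window are placeholders, and parameter names such as -DatasetName can vary slightly between module versions (older releases use -TableName).

```powershell
# Sign in and select the subscription that hosts the data factory.
Login-AzureRmAccount

# Placeholder - replace with the resource group from your own deployment.
$resourceGroup = "<your-resource-group>"

# List the slices of the final output dataset and their status.
# Look for slices whose State shows Ready (property names may differ
# slightly between AzureRM.DataFactories versions).
Get-AzureRmDataFactorySlice -ResourceGroupName $resourceGroup `
    -DataFactoryName "ProductRecommendations" `
    -DatasetName "ProductsSimilaritySQLTable" `
    -StartDateTime (Get-Date).AddDays(-1) |
    Format-Table Start, End, State -AutoSize
```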
Deep dive into the solution
This section provides details for the stages depicted in Figure 2, the solution architecture for product recommendations. All six stages are listed below for context, but the deep-dive subsections that follow cover the last five:
1. Data sources: The raw data sources for this use case are the web log files containing information about customer behaviors on a retailer website.
2. Ingest: Use Azure Data Factory (ADF) to ingest the customer usage web logs data, product catalog, and customer information into Azure Blob storage.
3. Prepare: Use HDInsight to partition the web logs data using Hive, and then create input data for the machine learning recommendation model.
4. Analyze: Use the machine learning recommendation model (Mahout) to generate an item similarity matrix and a personalized recommendation matrix, and to provide recommendations to customers.
5. Publish: Use ADF to copy the results to a relational store (SQL Azure) for consumption by the retailer website.
6. Consume: Consume the personalized recommendations on the retailer website.
Ingest
The sample dataset for this use case corresponds to a large online music store and includes these elements:
Note: The dataset being used is publicly available.
1. Customer information
a. Customer information, along with the region
b. Columns: ID, Name, Region
2. Product catalog
a. Product catalog information, including the product name and pricing
b. Columns: ID, Name, Price
3. Customer usage web logs
a. Clickstream data in the form of web log files corresponding to shoppers’ current and historical behavior on the retailer website
b. Columns: CustomerID, ProductID, SessionDate
For this use case, the first step in the ADF workflow is the PrepareSampleDataPipeline pipeline, which generates sample customer usage web logs and prepares the data for consumption by the product recommendation workflow. It mimics the web logs that a large online retailer generates every day, along with the site’s customer and product information. In a real deployment, the data sources can be on-premises, SaaS, or cloud-based, and the volume of web logs for a large online retailer can be around 100 GB per week.
The pipeline is composed of an ADF custom .NET activity that generates the timestamp and customer usage data—CustomerId, ProductId (the artist ID in our case), isPlayed (Artist Played 1/0)—and uploads it to an Azure Blob location.
Note: A year's worth of customer usage data is being generated in the data preparation step.
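The data generation itself is done by the solution’s custom .NET activity. Purely as an illustration of the shape of that output, the PowerShell sketch below emits rows with the same fields (CustomerId, ProductId, isPlayed, plus a date) and uploads the file to blob storage. The ID ranges, row count, file name, and blob path are hypothetical placeholders, not the values used by the actual PrepareSampleDataPipeline.

```powershell
# Illustration only: emit rows shaped like the generated usage data
# (CustomerId, ProductId, isPlayed, SessionDate). Ranges and counts are hypothetical.
$rows = foreach ($i in 1..1000) {
    $customerId  = Get-Random -Minimum 1 -Maximum 500    # hypothetical customer IDs
    $productId   = Get-Random -Minimum 1 -Maximum 300    # hypothetical artist IDs
    $isPlayed    = Get-Random -Minimum 0 -Maximum 2      # 1 = artist played, 0 = not played
    $sessionDate = (Get-Date).AddDays(-(Get-Random -Maximum 365)).ToString("yyyy-MM-dd")
    "$customerId,$productId,$isPlayed,$sessionDate"
}
$rows | Set-Content -Path .\usage_sample.txt

# Upload the sample file to a blob. The actual pipeline writes
# rawusageevents/artist_customer_data.txt under the productrec container;
# the path below is a placeholder so the sketch does not touch that data.
$ctx = New-AzureStorageContext -StorageAccountName "<storageaccount>" -StorageAccountKey "<key>"
Set-AzureStorageBlobContent -File .\usage_sample.txt `
    -Container "<container>" `
    -Blob "samples/usage_sample.txt" `
    -Context $ctx
```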
Once the pipeline has executed successfully, you will see the folders shown in the screenshot below in your storage account under the productrec container. (See the demo/productrec-accounts.txt file for logon details.) Each folder holds a text file containing sample data generated as part of the data preparation process; for example, the rawusageevents folder holds artist_customer_data.txt.
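If you want to confirm the generated files without opening the portal, here is a short sketch using the classic Azure Storage PowerShell cmdlets. The storage account name and key are placeholders; use the values from demo/productrec-accounts.txt.

```powershell
# Placeholder storage account credentials - replace with the values
# from demo/productrec-accounts.txt for your deployment.
$ctx = New-AzureStorageContext -StorageAccountName "<storageaccount>" -StorageAccountKey "<key>"

# List everything the data preparation step wrote under the productrec
# container, including rawusageevents/artist_customer_data.txt.
Get-AzureStorageBlob -Container "productrec" -Context $ctx |
    Sort-Object Name |
    Format-Table Name, Length, LastModified -AutoSize
```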
