- Implementing a product recommendation solution using Microsoft Azure
- Overview
- Problem statement
- Solution
- Solution architecture
- Setting up the Azure services
- Prerequisites
- Deploying the Azure landscape using PowerShell
- Publishing the retailer demo website using Visual Studio
- Testing the product recommendation site
- Verifying execution of the Azure Data Factory workflow
- Viewing product-to-product recommendations
- Deep dive into the solution
- Prepare
- Analyze
- Publish
- Consume
- Next steps
- Useful resources
- Roll back Azure changes
- Terms of use
Solution architecture
The following sections reference the six stages outlined in Figure 2.
Figure 2: Solution architecture for personalized product recommendations
Data sources: The raw data sources for this use case are the web log files that record customer behavior on a retailer website. In Figure 2, we assume that the retailer website produces these customer usage web log files (capturing browsing behavior, buying patterns, and so on). The retailer also maintains a product catalog along with customer information.
Ingest: In this stage, the customer usage web log files, along with the customer and product catalog information, are ingested on a regular basis (for example, daily) into a Microsoft Azure Blob storage account. The Azure services used and their purpose in this stage are as follows:
Azure Data Factory (ADF) provides a data movement activity that copies data from various data sources (for example, an NTFS file share) to blob storage. ADF is also used to orchestrate, schedule, and monitor the subsequent data processing steps described throughout the rest of this section.
Azure Blob storage is a highly scalable service designed to store large volumes of unstructured and semi-structured data—such as log files and pictures—inexpensively, and it can be easily accessed by at-scale processing services in Azure, such as HDInsight, Azure Batch, and Azure ML.
Prepare: Customer usage web log files from the retailer website are semi-structured files, often with daily volumes in gigabytes. In this phase, the raw log files for the day are partitioned (by year, month) in blob storage for long-term storage. Data partitioning helps provide better manageability, increased performance, and high availability of website usage data. The partitioned web log data is then processed to extract the needed inputs to call a machine learning model and generate personalized product recommendations. The Azure services used and their purpose in this stage are as follows:
HDInsight is used to partition the raw log files in blob storage and process the ingested logs at scale. It generates the input data set for a machine learning model to produce personalized product recommendations. HDInsight is the Microsoft “Hadoop as a service” offering that works natively over data in blob storage.
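The year/month partitioning described above can be illustrated with a minimal Python sketch. This is not the HDInsight job used by the solution; it only shows the idea on a handful of lines. The log layout (comma-separated fields with a `YYYY-MM-DD HH:MM:SS` timestamp first) and the `year=.../month=...` folder naming are assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime


def partition_key(timestamp: str) -> str:
    """Map a log timestamp to a year/month partition folder (hypothetical naming)."""
    ts = datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S")
    return f"year={ts.year:04d}/month={ts.month:02d}"


def partition_log_lines(lines):
    """Group raw log lines by partition folder.

    Assumes each line is comma-separated with the timestamp as the first field.
    """
    partitions = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        timestamp = line.split(",", 1)[0]
        partitions[partition_key(timestamp)].append(line)
    return dict(partitions)


if __name__ == "__main__":
    sample = [
        "2015-03-14 09:26:53,user42,product17,browse",
        "2015-03-30 18:02:11,user7,product17,buy",
        "2015-04-01 00:00:05,user42,product3,browse",
    ]
    for folder, rows in sorted(partition_log_lines(sample).items()):
        print(folder, len(rows))
```

At scale, the same grouping would be done by a distributed job over blob storage rather than in memory; the sketch only makes the partition key explicit.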
Analyze: In this stage, the data set produced by HDInsight in the Prepare phase (stored in blob storage) is used as input to a machine learning recommender that generates the recommendations. Mahout, an open source machine learning library, is used to generate an item similarity matrix, from which product-to-product recommendations are produced (and written to blob storage). The goal is to predict the similarity between items on the retailer website. Mahout is also used to generate a personalized recommendation matrix, from which user-to-product recommendations are produced (also written to blob storage). The machine learning library used and its purpose in this stage are as follows:
Mahout is an open source project of the Apache Software Foundation (ASF) that provides tools for building a recommendation engine. It is used to generate an item similarity matrix that defines the similarity between each pair of items.
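To make the idea of an item similarity matrix concrete, the following Python sketch computes cosine similarity between items from binary (user, item) interactions. It is an illustration of the concept only, not Mahout's implementation, and the similarity measure chosen here is an assumption; the input pairs and identifiers are made up.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


def item_similarity(interactions):
    """Compute cosine similarity between items from (user_id, item_id) pairs.

    Each user/item interaction is treated as binary (seen or not seen).
    Returns a dict {(item_a, item_b): score} with item_a < item_b; pairs
    that never co-occur for any user are omitted.
    """
    items_by_user = defaultdict(set)
    users_per_item = defaultdict(int)
    for user, item in interactions:
        if item not in items_by_user[user]:
            items_by_user[user].add(item)
            users_per_item[item] += 1

    # Count how many users interacted with both items of each pair.
    co_counts = defaultdict(int)
    for items in items_by_user.values():
        for a, b in combinations(sorted(items), 2):
            co_counts[(a, b)] += 1

    return {
        (a, b): count / sqrt(users_per_item[a] * users_per_item[b])
        for (a, b), count in co_counts.items()
    }


if __name__ == "__main__":
    pairs = [
        ("u1", "A"), ("u1", "B"),
        ("u2", "A"), ("u2", "B"),
        ("u3", "A"), ("u3", "C"),
    ]
    for pair, score in sorted(item_similarity(pairs).items()):
        print(pair, round(score, 3))
```

Items that are frequently viewed or bought by the same users receive a higher score, which is what drives the product-to-product recommendations.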
Publish: In this stage, the result set produced by Mahout (personalized product recommendations and product-to-product recommendations) is moved to a relational data mart for consumption by the retailer website. The result set can also be accessed directly from blob storage by an app, or it can be moved to additional stores for other consumers or use cases. The Azure services used and their purpose in this stage are as follows:
ADF is used to move the result set from blob storage to the data mart (on-premises or cloud).
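In the solution this movement is performed by ADF. Purely to illustrate what landing the result set in a relational store involves, the sketch below loads item similarity rows into a table using Python's built-in `sqlite3` as a stand-in for the data mart. The tab-separated `item, similar item, score` line layout and the `item_similarity` table and column names are assumptions for illustration, not the schema used by the solution.

```python
import sqlite3


def publish_similarities(conn, lines):
    """Load tab-separated (item, similar_item, score) lines into a table.

    Lines that do not have exactly three fields are skipped.
    Returns the number of rows written.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS item_similarity ("
        " product_id TEXT NOT NULL,"
        " similar_product_id TEXT NOT NULL,"
        " score REAL NOT NULL,"
        " PRIMARY KEY (product_id, similar_product_id))"
    )
    rows = []
    for line in lines:
        parts = line.strip().split("\t")
        if len(parts) != 3:
            continue  # ignore malformed lines
        product_id, similar_id, score = parts
        rows.append((product_id, similar_id, float(score)))
    # INSERT OR REPLACE keeps reruns of the same batch idempotent.
    conn.executemany(
        "INSERT OR REPLACE INTO item_similarity VALUES (?, ?, ?)", rows
    )
    conn.commit()
    return len(rows)


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    written = publish_similarities(connection, ["A\tB\t0.82", "A\tC\t0.58"])
    print(written, "rows published")
```

Making the load idempotent is a design choice that suits a scheduled pipeline, where the same daily slice may be reprocessed.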
Consume: In this stage, the personalized product recommendations and product-to-product recommendations published in Azure SQL Database are consumed by the retailer website. Customers on the site see personalized recommendations while browsing products in the retailer's website catalog. These recommendations are based on customer interests and actions. Customers also see similar or complementary products that might be related based on website usage patterns (not related to any one user).
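As a rough sketch of the consumption step, the following Python function picks the most similar products for a product page from published similarity rows. The row shape `(product_a, product_b, score)`, the function name, and the assumption that similarity is symmetric are all hypothetical; the retailer website in the solution reads from the data mart rather than from an in-memory list.

```python
def similar_products(similarities, product_id, top_n=3):
    """Return up to top_n product IDs most similar to product_id.

    similarities: iterable of (product_a, product_b, score) rows,
    treated as symmetric (a row applies to both products).
    """
    candidates = []
    for a, b, score in similarities:
        if a == product_id:
            candidates.append((b, score))
        elif b == product_id:
            candidates.append((a, score))
    # Highest score first; the sort is stable, so ties keep input order.
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    return [pid for pid, _ in candidates[:top_n]]


if __name__ == "__main__":
    rows = [
        ("A", "B", 0.80),
        ("A", "C", 0.50),
        ("B", "C", 0.90),
        ("D", "A", 0.95),
    ]
    print(similar_products(rows, "A", top_n=2))
```

Because these scores come from aggregate usage patterns, the same list is shown to every visitor of a product page, in contrast to the per-user personalized recommendations.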
