Solution architecture

The following sections reference the six stages outlined in Figure 2.

Figure 2: Solution architecture for personalized product recommendations

Data sources: The raw data for this use case comes from web log files that record customer behavior on a retailer website. Figure 2 assumes that the retailer website produces these customer usage web log files (customer browsing behaviors, buying patterns, and so on). The store also maintains a product catalog along with the customer information.

Ingest: In this stage, the customer usage web log files, along with customer and product catalog information, are ingested on a regular basis (for example, daily) into a Microsoft Azure Blob storage account. The Azure services used and their purpose in this stage are as follows:

• Azure Data Factory (ADF) provides a data movement activity that moves data from various data sources (such as an NTFS share) to blob storage. ADF is also used to orchestrate, schedule, and monitor the subsequent data processing steps described throughout the rest of this section.

• Azure Blob storage is a highly scalable service designed to store large volumes of unstructured and semi-structured data (such as log files and pictures) inexpensively. It can be accessed easily by at-scale processing services in Azure, such as HDInsight, Azure Batch, and Azure ML.

Prepare: Customer usage web log files from the retailer website are semi-structured files, often with daily volumes in gigabytes. In this stage, the raw log files for the day are partitioned by year and month in blob storage for long-term storage. Data partitioning helps provide better manageability, increased performance, and high availability of website usage data. The partitioned web log data is then processed to extract the inputs needed to call a machine learning model and generate personalized product recommendations. The Azure services used and their purpose in this stage are as follows:

• HDInsight is used to partition the raw log files in blob storage and process the ingested logs at scale. It generates the input data set for a machine learning model to produce personalized product recommendations. HDInsight is the Microsoft “Hadoop as a service” offering that works natively over data in blob storage.
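The year and month partitioning described above can be illustrated with a small sketch. The container name, the `year=`/`month=` folder convention, and the function name `partition_path` are assumptions made for illustration; the source does not specify the actual path layout used in blob storage.

```python
from datetime import date


def partition_path(container: str, log_date: date, filename: str) -> str:
    """Build a year/month-partitioned path for one day's raw log file.

    The folder naming (year=YYYY/month=MM) is a hypothetical convention,
    not one prescribed by the architecture described here.
    """
    return (
        f"{container}/year={log_date.year:04d}/"
        f"month={log_date.month:02d}/{filename}"
    )


# The log for 15 March 2015 is placed in the 2015/03 partition.
print(partition_path("weblogs", date(2015, 3, 15), "2015-03-15.log"))
```

With a layout like this, a job that processes a single month only needs to read the files under that month's folder rather than scanning the full log history.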

Analyze: In this stage, the data set produced by HDInsight in the Prepare stage (stored in blob storage) is used as input to a machine learning model that generates recommendations. Mahout, an open source machine learning library, is used to generate an item similarity matrix, which predicts the similarity between items on the retailer website and yields product-to-product recommendations. Mahout is also used to generate a personalized recommendation matrix, which yields user-to-product recommendations. Both result sets are written to blob storage. The machine learning tool used and its purpose in this stage are as follows:

• Mahout is an open source project by the Apache Software Foundation (ASF) and provides tools for building a recommendation engine. It is used to generate an item similarity matrix that defines the similarity between two items.
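To make the idea of an item similarity matrix concrete, here is a minimal sketch in plain Python. It is not Mahout's implementation: it assumes binary interactions (a user either touched a product or did not) and uses cosine similarity over co-occurrence counts, which is only one of several similarity measures a recommender might use. The function name and the sample data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


def item_similarity(interactions):
    """Cosine similarity between items over binary user-item interactions.

    interactions: iterable of (user_id, item_id) pairs.
    Returns {(item_a, item_b): similarity} with item_a < item_b; pairs that
    never co-occur for any user are omitted.
    """
    items_by_user = defaultdict(set)
    users_by_item = defaultdict(set)
    for user, item in interactions:
        items_by_user[user].add(item)
        users_by_item[item].add(user)

    # Count how many users interacted with both items of each pair.
    co_counts = defaultdict(int)
    for items in items_by_user.values():
        for a, b in combinations(sorted(items), 2):
            co_counts[(a, b)] += 1

    # Normalize by the geometric mean of each item's user count.
    return {
        (a, b): n / sqrt(len(users_by_item[a]) * len(users_by_item[b]))
        for (a, b), n in co_counts.items()
    }


# Hypothetical sample data: three users and three products.
events = [
    ("u1", "tent"), ("u1", "stove"),
    ("u2", "tent"), ("u2", "stove"),
    ("u3", "tent"), ("u3", "lamp"),
]
sim = item_similarity(events)
for pair, score in sorted(sim.items()):
    print(pair, round(score, 3))
```

In this sample, "stove" and "tent" score higher than "lamp" and "tent" because two of the three tent users also interacted with the stove, while only one interacted with the lamp.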

Publish: In this stage, the result set (personalized product recommendations and product-to-product recommendations) produced by Mahout is moved to a relational data mart for consumption by the retailer website. The result set can also be accessed directly from blob storage by an app, or it can be moved to additional stores for other consumers or use cases. The Azure services used and their purpose in this stage are as follows:

• ADF is used to move the result set from blob storage to the data mart (on-premises or cloud).

Consume: In this stage, the personalized product recommendations and product-to-product recommendations published in Azure SQL Database are consumed by the retailer website. Customers on the site see personalized recommendations while browsing products in the retailer's website catalog. These recommendations are based on customer interests and actions. Customers also see similar or complementary products, which are related through overall website usage patterns rather than through the activity of any one user.
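The two kinds of recommendations shown on a product page can be sketched as a simple lookup. In this sketch, in-memory dictionaries stand in for the published tables in the data mart; the function name, the dictionary shapes, and the sample data are all hypothetical and are not taken from the source.

```python
def recommendations_for_page(user_id, product_id, personalized, similar, limit=3):
    """Return (personal, related) product lists for one product page view.

    personalized: {user_id: [product, ...]} user-to-product recommendations.
    similar: {product_id: [product, ...]} product-to-product recommendations.
    """
    # Personalized picks depend on the user; skip the product being viewed.
    personal = [p for p in personalized.get(user_id, []) if p != product_id][:limit]
    # Related picks depend only on the product, not on who is viewing it.
    related = similar.get(product_id, [])[:limit]
    return personal, related


# Hypothetical published results.
personalized = {"u1": ["lamp", "tent", "boots"]}
similar = {"tent": ["stove", "lamp"]}

print(recommendations_for_page("u1", "tent", personalized, similar))
print(recommendations_for_page("new-visitor", "tent", personalized, similar))
```

A visitor with no history still sees related products, because the product-to-product list does not depend on the individual user.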
