3.3 Generating rules with the Apriori algorithm.
The minimum support threshold was 2% for users with orders and 1% for the other subsets.
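Apriori first finds every itemset whose support reaches the threshold (0.02 or 0.01 here). It grows itemsets one item at a time and prunes any candidate that has an infrequent subset. Rules are then read off the frequent itemsets. The project used Orange's implementation. The following is an independent minimal sketch of the frequent-itemset step, written for illustration only and not taken from Orange. The function name `apriori` and the toy baskets are hypothetical.

```python
from itertools import combinations

def apriori(baskets, min_support):
    """Hypothetical sketch: return {frozenset(itemset): support} for every
    itemset whose support (fraction of baskets containing it) >= min_support."""
    n = len(baskets)
    # Level 1: count single items.
    counts = {}
    for basket in baskets:
        for item in basket:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c / n for s, c in counts.items() if c / n >= min_support}
    result = dict(frequent)
    k = 2
    while frequent:
        prev = list(frequent)
        candidates = set()
        # Join step: unions of two frequent (k-1)-itemsets that have exactly k items.
        for i in range(len(prev)):
            for j in range(i + 1, len(prev)):
                union = prev[i] | prev[j]
                # Prune step: every (k-1)-subset of a candidate must be frequent.
                if len(union) == k and all(
                    frozenset(sub) in frequent for sub in combinations(union, k - 1)
                ):
                    candidates.add(union)
        # Count the surviving candidates in one pass over the baskets.
        frequent = {}
        for cand in candidates:
            support = sum(1 for basket in baskets if cand <= basket) / n
            if support >= min_support:
                frequent[cand] = support
        result.update(frequent)
        k += 1
    return result

# Toy baskets (hypothetical), one set of viewed categories per session.
baskets = [
    {"botins", "pumpseopentoes"},
    {"botins", "pumpseopentoes", "sabrinasemocassins"},
    {"botins"},
    {"jogos", "tecnologia"},
]
freq = apriori(baskets, 0.5)
```

With a 50% threshold only `botins` (0.75), `pumpseopentoes` (0.5) and the pair of the two (0.5) survive. Lowering the threshold, as was done for the subsets without orders, lets more and rarer itemsets through.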
Below is the first (strongest) rule of each report. Columns: supp = support, conf = confidence, lift, leve = leverage, cove = coverage.
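All five columns can be computed from the baskets. The sketch below uses the standard definitions and assumes Orange's report columns follow them. Support is the fraction of baskets that contain both antecedent and consequent. Coverage is the fraction that contain the antecedent. Confidence is support / coverage. Lift is confidence divided by the fraction that contain the consequent. Leverage is support minus coverage times that fraction. The function name `rule_metrics` is hypothetical.

```python
def rule_metrics(baskets, antecedent, consequent):
    """Hypothetical helper: measures of the rule antecedent -> consequent
    over a list of baskets (sets of items)."""
    n = len(baskets)
    a, b = frozenset(antecedent), frozenset(consequent)
    n_a = sum(1 for t in baskets if a <= t)         # baskets with the antecedent
    n_b = sum(1 for t in baskets if b <= t)         # baskets with the consequent
    n_ab = sum(1 for t in baskets if (a | b) <= t)  # baskets with both
    supp = n_ab / n
    cove = n_a / n
    conf = n_ab / n_a
    p_b = n_b / n
    return {
        "supp": supp,
        "conf": conf,
        "lift": conf / p_b,            # > 1 means positive association
        "leve": supp - cove * p_b,     # observed minus expected co-occurrence
        "cove": cove,
    }

baskets = [
    {"botins", "pumpseopentoes"},
    {"botins", "pumpseopentoes", "sabrinasemocassins"},
    {"botins"},
    {"jogos", "tecnologia"},
]
m = rule_metrics(baskets, ["pumpseopentoes"], ["botins"])
```

Up to rounding, the reports satisfy cove ≈ supp / conf. For the first rule below, 0.012 / 0.623 ≈ 0.019, against the reported 0.020.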
All users without orders:
Sorted by lift:
supp | conf | lift | leve | cove | rule
0.012 | 0.623 | 20.260 | 0.012 | 0.020 | divertimento -> tecnologia
Sorted by leverage:
supp | conf | lift | leve | cove | rule
0.036 | 0.523 | 7.215 | 0.031 | 0.068 | pumpseopentoes -> botins
Sorted by confidence:
supp | conf | lift | leve | cove | rule
0.012 | 0.854 | 12.525 | 0.011 | 0.014 | sabrinasemocassins botins -> pumpseopentoes
Subscribed users:
Sorted by lift:
supp | conf | lift | leve | cove | rule
0.012 | 0.403 | 21.484 | 0.011 | 0.029 | tecnologia -> divertimento
Sorted by leverage:
supp | conf | lift | leve | cove | rule
0.034 | 0.480 | 6.962 | 0.029 | 0.071 | botins -> pumpseopentoes
Sorted by confidence:
supp | conf | lift | leve | cove | rule
0.011 | 0.871 | 12.198 | 0.010 | 0.012 | botasdesalto pumpseopentoes -> botins
Users with orders:
Sorted by lift:
supp | conf | lift | leve | cove | rule
0.021 | 0.729 | 25.658 | 0.020 | 0.028 | jogos -> divertimento tecnologia
Sorted by leverage:
supp | conf | lift | leve | cove | rule
0.034 | 0.514 | 9.745 | 0.030 | 0.066 | pumpseopentoes -> botins
Sorted by confidence:
supp | conf | lift | leve | cove | rule
0.021 | 0.921 | 23.219 | 0.020 | 0.022 | jogos tecnologia -> divertimento
Sorting by leverage gives the strongest rules, because their support, confidence, coverage and lift are also high enough. Sorting by confidence also gives very interesting results: the top rules there contain more than two items. All three subsets show the same visitor behavior: visitors view and order many shoes.
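Each report is the same rule list sorted by a different measure, and the "more than two items" observation is a filter on rule size. The sketch below is a minimal illustration of both. It uses only the three top rules from the "Users with orders" reports above. In a real run the list would hold every induced rule, and these three are the leaders of their reports by construction. The `Rule` class and `report` function are hypothetical and are not Orange's API.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    antecedent: tuple
    consequent: tuple
    supp: float
    conf: float
    lift: float
    leve: float
    cove: float

    @property
    def n_items(self):
        # Total number of items on both sides of the rule.
        return len(self.antecedent) + len(self.consequent)

def report(rules, key):
    """Rules sorted by one measure, strongest first (one report)."""
    return sorted(rules, key=lambda r: getattr(r, key), reverse=True)

# Values from the "Users with orders" reports.
orders_rules = [
    Rule(("jogos",), ("divertimento", "tecnologia"), 0.021, 0.729, 25.658, 0.020, 0.028),
    Rule(("pumpseopentoes",), ("botins",), 0.034, 0.514, 9.745, 0.030, 0.066),
    Rule(("jogos", "tecnologia"), ("divertimento",), 0.021, 0.921, 23.219, 0.020, 0.022),
]

for key in ("lift", "leve", "conf"):
    best = report(orders_rules, key)[0]
    print(key, " ".join(best.antecedent), "->", " ".join(best.consequent))

# Rules that involve more than two items.
multi_item = [r for r in orders_rules if r.n_items > 2]
```

The loop prints `lift jogos -> divertimento tecnologia`, `leve pumpseopentoes -> botins` and `conf jogos tecnologia -> divertimento`, which matches the first line of each report.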
SUMMARY.
We implemented an association rule inducer using the Orange framework and used it to extract knowledge from the dataset that “We-Commerce” gave us. The dataset was processed and analyzed, noise was reduced, and three relevant subsets of data were built: all visitors, visitors that have a user_gui (subscribed users), and visitors with at least one order. Separating the data this way lets us see common patterns of behavior for different types of visitors. The rules were sorted three times, by three measures: lift, confidence and leverage. Sorting by leverage shows the strongest rules, because in that report all the other measures are relatively high, above their mean values. In the other reports some measures may fall below the mean.
The best subset was the visitors with orders: all rules induced from it have the highest leverage, most have confidence above 50%, and their support is above 2%. The other subsets contain rules with support above 1%, most of them between 1% and 2%.
APPENDIX A.
CREATE VIEW v_commerce_without_noise_product AS
SELECT *
FROM commerce c
-- navigation and service pages, matched exactly
WHERE c.product_gui NOT IN ('open', 'home', '', '/onestepcheckout/',
                            '/checkout/cart/index/', '/lon-about-us', '/lon-contacts',
                            'display.category*homepage', '/ljv-contacts',
                            '/sales/order/history/')
-- URL families, matched by prefix
AND c.product_gui NOT LIKE '/customer%'
AND c.product_gui NOT LIKE '/?gclid%'
AND c.product_gui NOT LIKE '/?t%'
AND c.product_gui NOT LIKE '/sales/order/view/%'
AND c.product_gui NOT LIKE 'order.%'
AND c.product_gui NOT LIKE 'orderonline%'
AND c.product_gui NOT LIKE 'ordermailphone%'
AND c.product_gui NOT LIKE 'pddtitle%'
AND c.product_gui NOT LIKE 'pddimage%'
AND c.product_gui NOT LIKE 'intro%'
;
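The noise filter removes two kinds of rows: pages matched exactly, and URL families matched by prefix. Every LIKE pattern in the view ends in `%` and contains no `_` wildcard, so each one is a plain prefix test. That test is case-sensitive, as LIKE is in PostgreSQL. A row with a NULL product_gui is also dropped, because comparisons with NULL never evaluate to true. The following is a hypothetical Python mirror of the predicate, which could be used to check raw exports. The names `EXACT_NOISE`, `PREFIX_NOISE` and `is_product_event` are illustrative only.

```python
# Hypothetical mirror of the WHERE clause of v_commerce_without_noise_product.
EXACT_NOISE = {
    "open", "home", "", "/onestepcheckout/", "/checkout/cart/index/",
    "/lon-about-us", "/lon-contacts", "display.category*homepage",
    "/ljv-contacts", "/sales/order/history/",
}
# Each NOT LIKE 'xxx%' pattern becomes a case-sensitive prefix test.
PREFIX_NOISE = (
    "/customer", "/?gclid", "/?t", "/sales/order/view/", "order.",
    "orderonline", "ordermailphone", "pddtitle", "pddimage", "intro",
)

def is_product_event(product_gui):
    """True if a row with this product_gui would survive the view's filter."""
    if product_gui is None:
        # In SQL, NULL != 'open' is NULL (not true), so NULL rows are filtered out.
        return False
    return product_gui not in EXACT_NOISE and not product_gui.startswith(PREFIX_NOISE)

pages = ["botins", "home", "/customer/account/", None, "introduction", "Home"]
kept = [p for p in pages if is_product_event(p)]
```

`kept` is `["botins", "Home"]`. The capitalized `Home` survives because neither `!=` nor LIKE ignores case in PostgreSQL.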
CREATE VIEW v_visitors_with_orders AS
SELECT DISTINCT cookie_id FROM commerce
-- AND binds tighter than OR; the parentheses make that grouping explicit
WHERE (product_gui LIKE '%order%'
AND product_gui NOT LIKE 'display.%'
AND product_gui NOT LIKE '/sales/%')
OR product_gui LIKE '%checkout%'
;
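The WHERE clause of v_visitors_with_orders combines three AND conditions with one OR. AND binds tighter than OR, so the clause is evaluated as "(an order page that is neither a display event nor a /sales/ page) OR a checkout page". A visitor who only reached a checkout URL therefore counts as ordering. The sketch below checks this in SQLite on hypothetical rows. SQLite's LIKE ignores ASCII case, unlike PostgreSQL's, but all toy values are lowercase, so the results are the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE commerce (cookie_id TEXT, product_gui TEXT)")
conn.executemany("INSERT INTO commerce VALUES (?, ?)", [
    ("c1", "orderonline"),          # order page
    ("c2", "/sales/order/view/7"),  # order history, excluded by '/sales/%'
    ("c3", "/onestepcheckout/"),    # checkout page, no 'order' in the URL
    ("c4", "display.orderbox"),     # display event, excluded by 'display.%'
])

CONDITIONS = {
    # The condition as written in the appendix, without parentheses.
    "as_written": """product_gui LIKE '%order%'
        AND product_gui NOT LIKE 'display.%'
        AND product_gui NOT LIKE '/sales/%'
        OR product_gui LIKE '%checkout%'""",
    # How SQL groups it: the AND chain first, then OR.
    "and_first": """(product_gui LIKE '%order%'
        AND product_gui NOT LIKE 'display.%'
        AND product_gui NOT LIKE '/sales/%')
        OR product_gui LIKE '%checkout%'""",
    # A different grouping, shown only for contrast.
    "or_first": """product_gui LIKE '%order%'
        AND product_gui NOT LIKE 'display.%'
        AND (product_gui NOT LIKE '/sales/%'
        OR product_gui LIKE '%checkout%')""",
}

results = {
    name: [row[0] for row in conn.execute(
        f"SELECT DISTINCT cookie_id FROM commerce WHERE {cond} ORDER BY cookie_id")]
    for name, cond in CONDITIONS.items()
}
```

`as_written` and `and_first` both return `["c1", "c3"]`. The contrasting grouping drops the checkout-only visitor `c3` and returns `["c1"]`.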
CREATE VIEW v_num_of_events_per_transaction AS
SELECT cookie_id,session_id,count(*) number_of_events_per_session
FROM commerce
GROUP BY session_id,cookie_id
;
CREATE VIEW v_number_of_events_per_cookie AS
SELECT cookie_id, count(*) sessions_count, sum(number_of_events_per_session) number_of_events
FROM v_num_of_events_per_transaction
GROUP BY cookie_id
ORDER BY cookie_id
;
CREATE VIEW v_count_of_visitors_and_sessions_per_event AS
SELECT number_of_events_per_session numberofeventspersession, count(number_of_events_per_session) numberofvisitors
FROM v_num_of_events_per_transaction
GROUP BY number_of_events_per_session
ORDER BY number_of_events_per_session ASC
;
CREATE VIEW v_count_of_visitors_and_sessions_per_cookie AS
SELECT sessions_count numberofsessions, count(*) numberofvisitors
FROM v_number_of_events_per_cookie
GROUP BY sessions_count
ORDER BY sessions_count
;
CREATE VIEW v_relevant_subset_cookies AS
SELECT DISTINCT cookie_id
FROM v_number_of_events_per_cookie
WHERE sessions_count<30
AND sessions_count>5
;
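The relevant visitors are selected in two aggregation steps. First, events are counted per (cookie, session). Then sessions are counted per cookie, and only cookies with more than 5 and fewer than 30 sessions are kept. The sketch below re-creates the three views in SQLite on hypothetical toy data. The sum column is given the name number_of_events for readability.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE commerce (cookie_id TEXT, session_id TEXT, product_gui TEXT)")

# Hypothetical data: cookie 'a' has 3 sessions, 'b' 6, 'c' 30; two events per session.
rows = []
for cookie, n_sessions in [("a", 3), ("b", 6), ("c", 30)]:
    for s in range(n_sessions):
        session = f"{cookie}-{s}"
        rows += [(cookie, session, "botins"), (cookie, session, "pumpseopentoes")]
conn.executemany("INSERT INTO commerce VALUES (?, ?, ?)", rows)

conn.executescript("""
CREATE VIEW v_num_of_events_per_transaction AS
SELECT cookie_id, session_id, count(*) AS number_of_events_per_session
FROM commerce
GROUP BY session_id, cookie_id;

CREATE VIEW v_number_of_events_per_cookie AS
SELECT cookie_id, count(*) AS sessions_count,
       sum(number_of_events_per_session) AS number_of_events
FROM v_num_of_events_per_transaction
GROUP BY cookie_id;

CREATE VIEW v_relevant_subset_cookies AS
SELECT DISTINCT cookie_id
FROM v_number_of_events_per_cookie
WHERE sessions_count < 30 AND sessions_count > 5;
""")

# cookie_id -> (sessions_count, number_of_events)
per_cookie = {c: (s, e) for c, s, e in conn.execute(
    "SELECT cookie_id, sessions_count, number_of_events FROM v_number_of_events_per_cookie")}
relevant = [r[0] for r in conn.execute("SELECT cookie_id FROM v_relevant_subset_cookies")]
```

Both bounds are exclusive. Only `b` is kept: `a` has too few sessions, and `c`, with exactly 30, is also dropped.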
CREATE VIEW v_relevant_subset_without_orders_all AS
SELECT c.session_id,c.product_gui
FROM v_relevant_subset_cookies r,v_commerce_without_noise_product c
WHERE r.cookie_id = c.cookie_id
ORDER BY c.session_id
;
CREATE VIEW v_relevant_subset_without_orders_subscribed_users AS
SELECT c.session_id,c.product_gui
FROM v_relevant_subset_cookies r,v_commerce_without_noise_product c
WHERE r.cookie_id = c.cookie_id
AND c.user_gui != ''
ORDER BY c.session_id
;
CREATE VIEW v_count_of_products_in_relevant_subset_all AS
SELECT count(product_gui) number_of_product,product_gui product_name
FROM v_relevant_subset_without_orders_all
GROUP BY product_gui
ORDER BY number_of_product DESC
;
CREATE VIEW v_count_of_products_in_relevant_subset_subscribed_users AS
SELECT count(product_gui) number_of_product,product_gui product_name
FROM v_relevant_subset_without_orders_subscribed_users
GROUP BY product_gui
ORDER BY number_of_product DESC
;
CREATE VIEW v_relevant_subset_orders_only AS
SELECT p.session_id, p.product_gui
FROM v_visitors_with_orders o,v_commerce_without_noise_product p
WHERE o.cookie_id=p.cookie_id
ORDER BY p.session_id
;
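v_relevant_subset_orders_only joins on cookie_id rather than session_id. For any visitor who ordered at least once, it therefore keeps every session of that visitor, including browsing sessions without an order. The order and checkout pages themselves are removed by the noise filter. The DISTINCT in v_visitors_with_orders also matters: without it, a visitor with several order events would have every row repeated. The sketch below shows both effects in SQLite on hypothetical rows. The noise view here is a simplified two-rule stand-in, not the full filter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE commerce (cookie_id TEXT, session_id TEXT, product_gui TEXT)")
conn.executemany("INSERT INTO commerce VALUES (?, ?, ?)", [
    ("x", "s1", "botins"),             # x browses in one session ...
    ("x", "s2", "orderonline"),        # ... and orders in another
    ("x", "s2", "/onestepcheckout/"),
    ("y", "s3", "botins"),             # y never orders
])
conn.executescript("""
CREATE VIEW v_visitors_with_orders AS
SELECT DISTINCT cookie_id FROM commerce
WHERE (product_gui LIKE '%order%'
       AND product_gui NOT LIKE 'display.%'
       AND product_gui NOT LIKE '/sales/%')
   OR product_gui LIKE '%checkout%';

-- Simplified stand-in for v_commerce_without_noise_product.
CREATE VIEW v_commerce_without_noise_product AS
SELECT * FROM commerce
WHERE product_gui NOT LIKE 'orderonline%' AND product_gui != '/onestepcheckout/';

CREATE VIEW v_relevant_subset_orders_only AS
SELECT p.session_id, p.product_gui
FROM v_visitors_with_orders o, v_commerce_without_noise_product p
WHERE o.cookie_id = p.cookie_id
ORDER BY p.session_id;
""")
orders_subset = conn.execute("SELECT * FROM v_relevant_subset_orders_only").fetchall()

# Same join with a non-DISTINCT (and simplified) order list: x matches two rows,
# so each of x's rows appears twice.
duplicated = conn.execute("""
SELECT p.session_id, p.product_gui
FROM (SELECT cookie_id FROM commerce
      WHERE product_gui LIKE '%order%' OR product_gui LIKE '%checkout%') o,
     v_commerce_without_noise_product p
WHERE o.cookie_id = p.cookie_id
""").fetchall()
```

`orders_subset` is `[("s1", "botins")]`. The browsing session of the ordering visitor is kept, while visitor `y` and the order pages are gone. `duplicated` contains that row twice.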
COPY (SELECT * FROM v_relevant_subset_without_orders_all) TO '/sorted_relevant_subset_all.tab' csv header
;
COPY (SELECT * FROM v_relevant_subset_without_orders_subscribed_users) TO '/sorted_relevant_subset_subscribed.tab' csv header
;
COPY (SELECT * FROM v_relevant_subset_orders_only) TO '/sorted_relevant_subset_orders_only.tab' csv header
;
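Each exported file has a header row and one (session_id, product_gui) pair per line. For association-rule mining these rows have to be grouped into one basket per session. The report does not describe how the files were converted for Orange, so the following is only a hypothetical sketch of that grouping, and `load_baskets` is an illustrative name. PostgreSQL's COPY ... csv writes comma-separated values despite the .tab extension, so the delimiter defaults to a comma but can be changed.

```python
import csv
import io
from collections import defaultdict

def load_baskets(lines, delimiter=","):
    """Group exported (session_id, product_gui) rows into {session_id: set of products}."""
    reader = csv.DictReader(lines, delimiter=delimiter)
    baskets = defaultdict(set)
    for row in reader:
        baskets[row["session_id"]].add(row["product_gui"])
    return dict(baskets)

# Inline sample standing in for a file such as sorted_relevant_subset_all.tab.
sample = io.StringIO(
    "session_id,product_gui\n"
    "1,botins\n"
    "1,pumpseopentoes\n"
    "1,botins\n"
    "2,jogos\n"
)
baskets = load_baskets(sample)
```

For a real export, pass an open file instead, e.g. `with open(path, newline="") as f: baskets = load_baskets(f)`. A set is used because a basket records whether a category was seen in the session, not how often, so the repeated `botins` row counts once.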
