Big data analytics uncovers hidden patterns, unknown correlations, market trends, customer preferences, and other useful business information in large datasets. The technology is now applied in many units of FPT Corporation, such as FPT Retail, FPT Telecom, and FPT Online.
This article focuses on the application of big data analytics at Sendo.vn.
Most e-commerce sites offer recommendations, for example suggesting products similar to those a customer has already viewed or bought.
Previously, this task was usually carried out by listing products of the same type. Today, a recommendation algorithm finds products that a customer might be interested in by looking at the products that similar customers like. For example, suppose user A likes iPhone, Casio watches, and lingerie VNS236. If user B likes iPhone, Casio watches, and lingerie VNS236 and also likes a Babolat tennis racket, then the Babolat tennis racket may be recommended to user A.
To describe the relationship between users and products, we save text files with three fields: user ID, product ID, and interest level (possibly determined by the number of times the user viewed the product). The input of the algorithm is described in the graph (bold lines: the user is very interested in the product; dashed lines: the user is not very interested but has been linked to that product).
By inspection we can see that users 1 and 5 have the same "taste": both like product 101 and are slightly interested in 102 and 103. Users 1 and 4 are also quite similar, as both like product 101 and somewhat like 103. Meanwhile, users 4 and 5 like 104 and 106 very much, so products 104 and 106 can be recommended to user 1.
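The reasoning above can be sketched as a small user-based collaborative filter. This is only an illustration, not the code used at Sendo.vn: the triples below are hypothetical values chosen to mirror the description of users 1, 4, and 5 (user 2 is an invented, unrelated user), and cosine similarity is one assumed choice of similarity measure among several.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical (user_id, product_id, interest) triples; 5.0 = strong, 2.0 = weak.
TRIPLES = [
    (1, 101, 5.0), (1, 102, 2.0), (1, 103, 2.0),
    (4, 101, 5.0), (4, 103, 2.0), (4, 104, 5.0), (4, 106, 5.0),
    (5, 101, 5.0), (5, 102, 2.0), (5, 103, 2.0), (5, 104, 5.0), (5, 106, 5.0),
    (2, 102, 1.0), (2, 105, 4.0),
]


def build_profiles(triples):
    """Group the triples into {user_id: {product_id: interest}}."""
    profiles = defaultdict(dict)
    for user, product, interest in triples:
        profiles[user][product] = interest
    return profiles


def cosine(a, b):
    """Cosine similarity between two sparse interest vectors."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[p] * b[p] for p in shared)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)


def recommend(profiles, user, top_n=2):
    """Score products the user has not seen by similarity-weighted interest."""
    seen = profiles[user]
    scores = defaultdict(float)
    for other, items in profiles.items():
        if other == user:
            continue
        sim = cosine(seen, items)
        if sim <= 0.0:
            continue
        for product, interest in items.items():
            if product not in seen:
                scores[product] += sim * interest
    # Highest score first; ties are broken by product ID for a stable order.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [product for product, _ in ranked[:top_n]]


profiles = build_profiles(TRIPLES)
print(recommend(profiles, 1))
```

With this sample data, products 104 and 106 rank highest for user 1, because the two most similar users (4 and 5) both rate them strongly.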
When the number of users and products reaches the millions, the data becomes massive. Two popular approaches to recommendation are Frequent Pattern mining and Collaborative Filtering, both available in ready-made frameworks such as Mahout and Spark MLlib. Spark MLlib in particular is on the rise because it supports new algorithms and simplifies distributed programming.
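To give a feel for the Frequent Pattern side, here is a deliberately simplified sketch that counts only frequent product pairs in browsing sessions. It is not FP-Growth and not the Mahout or Spark MLlib API; the session data, the `min_support` threshold, and the function names are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical sessions: each list holds the products viewed together.
SESSIONS = [
    ["iphone", "case", "charger"],
    ["iphone", "case"],
    ["iphone", "charger"],
    ["watch", "case"],
    ["iphone", "case", "watch"],
]


def frequent_pairs(sessions, min_support):
    """Return product pairs that co-occur in at least min_support sessions."""
    counts = Counter()
    for session in sessions:
        # Sorting makes each pair canonical, so (a, b) and (b, a) count once.
        for pair in combinations(sorted(set(session)), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}


def also_viewed(pairs, product):
    """List products frequently seen with `product`, most frequent first."""
    related = {}
    for (a, b), n in pairs.items():
        if a == product:
            related[b] = n
        elif b == product:
            related[a] = n
    ranked = sorted(related.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked]


pairs = frequent_pairs(SESSIONS, min_support=2)
print(also_viewed(pairs, "iphone"))
```

A production system would mine larger itemsets with a distributed algorithm, but the idea is the same: items that often appear together become candidates for recommendation.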
To implement this task for Sendo.vn, two modules have been developed.
- Builder Module: preprocesses raw data, then applies the two algorithms, Frequent Pattern and Collaborative Filtering, to viewed and purchased items. The results are cached in Redis and saved to MongoDB.
- Web API Module: on each client request, the system retrieves the cached data from Redis. If Redis returns an error, the result is taken from MongoDB. For data that has not yet been processed (unavailable in both Redis and MongoDB), the system loads the model and computes the result on the fly.
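The lookup order of the Web API Module can be sketched as follows. This is a minimal sketch under stated assumptions: `cache_get`, `store_get`, and `compute` are hypothetical callables standing in for a Redis client, a MongoDB query, and the on-the-fly model, and `CacheUnavailable` is an invented exception type. None of these names come from the actual system.

```python
class CacheUnavailable(Exception):
    """Hypothetical error raised when the cache backend cannot be reached."""


def get_recommendations(user_id, cache_get, store_get, compute):
    """Try the cache, then the persistent store, then compute on the fly.

    Returns a (recommendations, source) tuple so callers can see which
    tier answered the request.
    """
    try:
        cached = cache_get(user_id)
        if cached is not None:
            return cached, "cache"
    except CacheUnavailable:
        pass  # Fall through to the persistent store.

    stored = store_get(user_id)
    if stored is not None:
        return stored, "store"

    # Nothing precomputed for this user: run the model directly.
    return compute(user_id), "model"


def broken_cache(user_id):
    """Stand-in for a cache that is down."""
    raise CacheUnavailable("cache backend not reachable")


# In-memory stand-ins for the two storage tiers (assumed sample data).
cache = {"u1": [104, 106]}
store = {"u1": [104, 106], "u2": [101]}

print(get_recommendations("u1", cache.get, store.get, lambda u: []))
print(get_recommendations("u2", broken_cache, store.get, lambda u: []))
print(get_recommendations("u3", cache.get, store.get, lambda u: [999]))
```

Passing the three tiers in as callables keeps the fallback logic independent of any particular client library, which also makes it easy to test without running servers.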
Determining customers' demographic information is crucial for implementing marketing strategies in e-commerce. From viewing and purchasing actions, we can infer a customer's gender, location, and the types of products they may buy in the future with acceptable accuracy.
In deployment, daily log data is processed with Hadoop MapReduce. The system uses neural networks to predict gender and the products likely to be purchased in the future, and uses IP2Location to determine the address from the IP address. Data for the calculations is stored in HBase and processed incrementally with MapReduce. Data from HBase is then indexed in Solr so that clients can query it.
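The incremental step can be pictured with a small, single-machine imitation of a map and reduce pass. This is only a sketch: the tab-separated log format (`user_id`, `action`, `category`) and all function names are assumptions, and a plain `Counter` stands in for the totals that the real system keeps in HBase.

```python
from collections import Counter


def map_log_line(line):
    """Map one log line to a ((user_id, category), 1) pair."""
    user_id, _action, category = line.strip().split("\t")
    return (user_id, category), 1


def reduce_daily(lines):
    """Reduce one day's log lines to per-(user, category) counts."""
    counts = Counter()
    for line in lines:
        if not line.strip():
            continue  # Skip blank lines.
        key, one = map_log_line(line)
        counts[key] += one
    return counts


def merge_incremental(totals, daily):
    """Add one day's counts to the running totals without reprocessing old logs."""
    merged = Counter(totals)
    merged.update(daily)  # Counter.update adds counts rather than replacing them.
    return merged


# Hypothetical logs for two consecutive days.
DAY_1 = ["u1\tview\tphones", "u1\tview\tphones", "u2\tbuy\twatches"]
DAY_2 = ["u1\tbuy\tphones"]

totals = Counter()
for day in (DAY_1, DAY_2):
    totals = merge_incremental(totals, reduce_daily(day))
print(totals)
```

Only the new day's log is scanned on each run; the per-user totals could then serve as input features for the prediction models.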
The deployment significantly increased Sendo.vn's ability to reach customers and understand their needs. Recommendation systems are not only applied in e-commerce; they also improve the online user experience in many other fields, such as entertainment, multimedia, and online search.
Nguyen Viet Cuong – FPT HO
(Published in the FPT Technology Magazine, FPT TechInsight No.1)