In online journalism, Big Data is used to analyze readers and published content, making it possible to build new features that better connect the two. VnExpress, Vietnam’s No. 1 online newspaper, is leading the way in applying Big Data technology to improve the reader experience and the newspaper’s performance.
Big Data – Breakthrough Technology
In early 2014, Big Data was a technology trend that was believed to create new opportunities for companies in digital communications. The technology is expected to bring breakthrough features to online journalism, including personalization for readers, automation of content production, automatic content sorting, more effective advertising, a deeper understanding of users, and many other capabilities. Big Data is commonly defined by four elements:
- Volume – the amount of data;
- Velocity – data processing speed;
- Variety – the diversity of data;
- Value – the value of the data.
Data, the most important element of Big Data, is collected from a variety of sources: the enterprise’s internal databases, information channels, social networks, and data provided by customers. Over time, the volume of stored data grows rapidly to terabytes, petabytes, zettabytes, or more, depending on the size of the business. Organizing the data and selecting a storage solution are therefore very important strategic decisions for any enterprise that starts using Big Data.
Data collected is normally divided into two basic types: structured and unstructured. Structured data includes types predefined by the business model, such as transaction records and log data. Unstructured data accounts for 80% of Big Data and includes video, audio, email, social network feedback, and more. This huge amount of data makes it a major challenge for the data processing team to extract useful information.
Great opportunity for VnExpress
Every second, VnExpress collects over 1 MB of data from all online activities across its services, including content, advertising, and the transactions that arise. To make good use of this data, the VnExpress Technical Center has applied Big Data in its research and delivered four solutions to give users and editors the best experience:
- Recommendation system;
- Personalization;
- Automation of content production;
- Performance monitoring system.
Recommendation system
When readers view any content on VnExpress, the system analyzes it and suggests further content the reader may be interested in, based on factors such as the content of the article, the hashtags, the owner, the article title, and the publication time. In 2017, the VnExpress Technical Center researched and successfully deployed the recommendation system in its publishing activities, initially reaching a click-through rate of 15-20%.
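The factors listed above can be combined into a single relevance score. The following is a minimal sketch of that idea, not the VnExpress implementation: the `Article` fields, the Jaccard similarity measure, the weights, and the three-day freshness half-life are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Article:
    # Hypothetical article record; field names are illustrative only.
    id: str
    title: str
    hashtags: frozenset
    published: datetime


def jaccard(a, b):
    """Share of common items between two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def title_tokens(title):
    """Lower-cased, whitespace-separated words of a title."""
    return frozenset(title.lower().split())


def score(current, candidate, now, half_life_days=3.0):
    """Combine hashtag overlap, title overlap, and freshness into one score."""
    tag_sim = jaccard(current.hashtags, candidate.hashtags)
    title_sim = jaccard(title_tokens(current.title), title_tokens(candidate.title))
    age_days = max((now - candidate.published) / timedelta(days=1), 0.0)
    freshness = 0.5 ** (age_days / half_life_days)  # halves every half_life_days
    # Assumed weights; a real system would tune them against click data.
    return 0.6 * tag_sim + 0.3 * title_sim + 0.1 * freshness


def recommend(current, candidates, now, k=3):
    """Return up to k candidates ranked by descending score, excluding current."""
    others = [c for c in candidates if c.id != current.id]
    return sorted(others, key=lambda c: score(current, c, now), reverse=True)[:k]
```

Calling `recommend(current_article, all_articles, datetime.now())` returns the highest-scoring articles other than the one being read. The click-through rate mentioned above is the kind of signal that could be used to adjust the weights.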
Looking ahead, the team’s goal is for the system to rely only on the content of the article, without auxiliary components such as a graph database (Neo4j), caching (Redis), or Python, and to offer suggestions to editors during editing through the content management system (CMS), helping the content match the editor’s needs.
Personalization
Using more than 16 years of collected data, the Technical Center has carried out research and analysis to build a profile of each reader: gender, interests, activities, favorite categories, and so on. Based on that analysis, the content of the home page, the category pages, and the other content elements shown next to an article will be displayed differently for each reader. This will be deployed on both the desktop version and the mobile apps (Android and iOS).
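One simple way to picture per-reader display is to reorder a page by how often the reader has opened each category. The sketch below assumes a reading history given as a plain list of category names and articles given as `(title, category)` pairs; both shapes are hypothetical and stand in for whatever profile data the real system holds.

```python
from collections import Counter


def category_affinity(read_history):
    """Turn a list of category names the reader opened into normalized weights."""
    counts = Counter(read_history)
    total = sum(counts.values())
    return {cat: n / total for cat, n in counts.items()} if total else {}


def personalize(articles, affinity):
    """Order (title, category) pairs so the reader's preferred categories come first.

    sorted() is stable, so articles in equally weighted categories keep
    their original editorial order.
    """
    return sorted(articles, key=lambda a: affinity.get(a[1], 0.0), reverse=True)
```

A reader with no history gets an empty affinity map, so the page falls back to the editors' original ordering.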
Automation of content production
Automating content production is the goal of the VnExpress CMS for the next three years. The features the CMS team is working toward range from automatic suggestions for hashtags, related topics, and related news once the editor completes an article to something as advanced as automatically aggregating collected data into a complete article.
The difficulties in implementing these features are natural language processing for Vietnamese and compliance with the media standards that VnExpress follows.
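To make the hashtag suggestion feature concrete, here is a deliberately naive sketch based on word frequency. It is an assumption-laden illustration, not the CMS's method: the stop-word list is a tiny hypothetical one, and splitting on word boundaries works for English but is not adequate for Vietnamese, where a single word often spans several syllables separated by spaces. That gap is one reason the language processing mentioned above is hard.

```python
import re
from collections import Counter

# Hypothetical, tiny stop-word list for illustration only.
STOP_WORDS = frozenset({"the", "a", "an", "and", "of", "to", "in", "is", "for", "on", "with"})


def suggest_hashtags(text, k=3, min_length=3):
    """Suggest up to k hashtags from the most frequent non-stop-words in text."""
    words = re.findall(r"\w+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS and len(w) >= min_length)
    # most_common orders by count; equal counts keep first-seen order.
    return ["#" + word for word, _ in counts.most_common(k)]
```

A production version would replace the tokenizer with a Vietnamese word segmenter and would likely weight terms against the whole archive rather than a single article.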
Performance monitoring system
VnExpress serves a very large number of users and is therefore under continuous attack. Detecting and tracking these attacks requires a real-time search and analysis system that can process large-scale data extremely quickly and help keep the system stable.
Application Performance Monitoring (APM) is an integrated system of monitoring functions, with features such as:
- Error warning system;
- Monitoring of the number of users across the system;
- Tracking of the number of connections to servers, with automatic blocking when a DDoS attack is detected;
- Monitoring of VnExpress access times by location and ISP, which helps detect and handle errors quickly.
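The connection tracking and automatic blocking in the list above can be pictured as a sliding-window counter per client. The sketch below is a simplified assumption of how such a rule might work, not a description of the VnExpress APM: the class name, the thresholds, and keying by IP address are all hypothetical, and real DDoS mitigation usually also happens at the network layer.

```python
from collections import defaultdict, deque


class ConnectionLimiter:
    """Block a client once it opens too many connections within a time window."""

    def __init__(self, max_connections=100, window_seconds=1.0):
        self.max_connections = max_connections
        self.window_seconds = window_seconds
        self._recent = defaultdict(deque)  # client IP -> timestamps of recent connections
        self.blocked = set()

    def allow(self, ip, now):
        """Record a connection from ip at time now (seconds); return False if blocked."""
        if ip in self.blocked:
            return False
        recent = self._recent[ip]
        recent.append(now)
        # Drop timestamps that have fallen out of the sliding window.
        while recent and now - recent[0] > self.window_seconds:
            recent.popleft()
        if len(recent) > self.max_connections:
            self.blocked.add(ip)
            return False
        return True
```

Passing the timestamp in as an argument, rather than reading the clock inside `allow`, keeps the rule easy to replay against recorded logs.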
Within this system, Big Data is responsible for collecting all log data, system metrics, and user behavior in real time. Algorithms then extract from it the data needed to monitor the system. Being able to view the system as a whole helps administrators upgrade it, extend it, and resolve incidents.
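As a small illustration of extracting monitoring data from raw logs, the sketch below groups access-log records by ISP and derives an average access time and an error rate for each. The record keys (`isp`, `latency_ms`, `status`) and the one-second slowness threshold are assumptions made for the example, not the actual log schema.

```python
from collections import defaultdict
from statistics import mean


def summarize_access_logs(records, slow_ms=1000.0):
    """Group log records by ISP and report average latency and error rate.

    Each record is a dict with hypothetical keys: "isp", "latency_ms", "status".
    """
    by_isp = defaultdict(list)
    for record in records:
        by_isp[record["isp"]].append(record)

    summary = {}
    for isp, rows in by_isp.items():
        avg_latency = mean(r["latency_ms"] for r in rows)
        errors = sum(1 for r in rows if r["status"] >= 500)  # server-side errors
        summary[isp] = {
            "requests": len(rows),
            "avg_latency_ms": avg_latency,
            "error_rate": errors / len(rows),
            "slow": avg_latency > slow_ms,
        }
    return summary
```

At the volumes a large news site handles, the same grouping would run incrementally over a stream rather than over an in-memory list, but the derived metrics are the same.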
Currently, the VnExpress APM system processes a log volume of more than 100 million rows, and system logs generated by concurrent users (CCU) arrive at about 100,000-200,000 per second every day. In the future, VnExpress will build a system that uses this data to resolve errors automatically.
About the author:
FPT Online technical center
The VnExpress technical center now has more than 80 members and four functional units. The technical center manages, operates, and develops the online newspaper system, the advertising system, and all other technical systems of FPT Online. The R&D department focuses on researching and developing new technology and delivers product applications to the development team, while the Technical Solution team takes part in designing an architecture suited to the Core Platform.
(Published on the FPT Technology Magazine, FPT TechInsight No.2)