Last week, the Talk show “Data Science: The sexiest field of the 21st century”, hosted by Nguyen Hai Nam, Chief Mentor of xSeries FUNiX, took place, aiming to give participants an overview of Data Science.

Mentor Nguyen Hai Nam received his master's degree in Computer Science from UNICAS University (Italy). He was a technical consultant at Lumi Smart Home, priority engineer at VNPT and AI R&D team leader at Asilla Japan, a start-up in healthcare.

Data Science definition and process

Mentor Nguyen Hai Nam started the Talk show with the definition of Data Science. Data Science is all about handling and manipulating data. Data Scientists collect, analyze and model data, make predictions from it, turn the results into APIs or services, and make decisions based on data. The boom in Data Science in recent years has been attributed to rapidly growing data volumes and to advances in algorithms and scientific papers.

The report on “The IT Industry in Vietnam” shows that Machine Learning/AI and Big Data/Data Science are two of the most sought-after skills for those who will be looking for a job in the near future. However, mentor Nguyen Hai Nam also emphasized that these are not yet the most in-demand skills in Vietnam. As Vietnam tends to focus on software outsourcing, the most in-demand skills are still front-end, web and mobile programming.

At the event, Nguyen Hai Nam shared 5 basic stages of a data science process:

  • Obtain – Collect and/or retrieve data: Gather data from relevant sources;
  • Scrub – Clean data: Clean the data into formats that machines can understand;
  • Explore – Data mining: Find significant patterns and trends using statistical methods;
  • Model – Modeling data: Construct models to predict and forecast. For example, when you need to decide which route to take to avoid traffic congestion, you need a model whose inputs are the traffic flow on each road and the number of vehicles on the surrounding roads, and whose output is the fastest route. Google Maps is an example of such modeling: it has been suggesting the fastest route as an alternative for the past 5-7 years;
  • Interpret – Data interpretation: Put the results to good use.
5 basic stages of a data science process.
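
The five stages above can be sketched as a tiny pipeline. This is only an illustrative sketch, not the method presented at the Talk show: the data, the function names and the choice of a straight-line (least squares) model are all assumptions made for the example, which loosely follows the traffic scenario described under "Model".

```python
from statistics import mean

# Hypothetical raw observations: (vehicles on a road, travel time in minutes).
# Values arrive as strings and some are missing, as real data often is.
RAW = [("120", "14"), ("200", "22"), ("", "18"),
       ("80", "10"), ("160", "n/a"), ("240", "26")]


def obtain():
    """Obtain: in practice this would read from a file, sensor feed or API."""
    return list(RAW)


def scrub(rows):
    """Scrub: keep only rows whose two fields both parse as numbers."""
    clean = []
    for vehicles, minutes in rows:
        try:
            clean.append((float(vehicles), float(minutes)))
        except ValueError:
            continue  # drop rows with missing or malformed values
    return clean


def explore(rows):
    """Explore: summarize the cleaned data with simple statistics."""
    return {
        "rows": len(rows),
        "mean_vehicles": mean(v for v, _ in rows),
        "mean_minutes": mean(m for _, m in rows),
    }


def model(rows):
    """Model: fit minutes = slope * vehicles + intercept by least squares."""
    xs = [v for v, _ in rows]
    ys = [m for _, m in rows]
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in rows)
             / sum((x - x_bar) ** 2 for x in xs))
    return slope, y_bar - slope * x_bar


def interpret(fit, routes):
    """Interpret: use predicted travel times to choose the fastest route."""
    slope, intercept = fit
    predicted = {name: slope * vehicles + intercept
                 for name, vehicles in routes.items()}
    return min(predicted, key=predicted.get), predicted


if __name__ == "__main__":
    rows = scrub(obtain())
    print(explore(rows))
    # Hypothetical current vehicle counts on two candidate routes.
    best, predicted = interpret(model(rows), {"A": 150, "B": 100})
    print(best, predicted)
```

A real project would replace each function with something far more substantial, but the order of the stages and the way each one feeds the next stay the same.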

In more detail, the process is divided into 10 specific steps. A Data Science project begins with Business understanding, followed by Analytic approach, Data requirements, Data collection, Data understanding, Data preparation, Modeling, Evaluation, Deployment and Feedback. Mentor Nguyen Hai Nam specifically emphasized the importance of Feedback, as it is what drives software updates. If software is released without any bugs, it was released too late. Developers aim to release a useful product, not a perfect one.

Detailed process of a Data Science project.

Major Data Science jobs

Mentor Nguyen Hai Nam focused on four major sub-disciplines in Data Science and the position of each in a project.

  • Data Analyst: This role is responsible for the five steps from Business understanding to Data understanding. A Data Analyst works with data analytics tools such as Python and Tableau, and needs skills in data processing, working with data tables, math and Machine Learning.
  • Machine Learning/Deep Learning Engineer: This role is responsible for three steps: Data preparation, Modeling and Evaluation. The job of an ML/DL Engineer is to build a model for the problem posed by the Data Analyst and find a good enough solution to it. Deep Learning is a subset of Machine Learning, but over the past 10 years it has become a separate branch as it has grown into two fields of its own: natural language processing and computer vision. Machine Learning relies heavily on domain experts to analyze which data fields and features are useful for the model. While Machine Learning requires tools related to probability and critical thinking, Deep Learning uses a structure called a neural network, a network that simulates the human brain, along with other tools related to the way the human brain operates, thinks, and makes decisions.
  • Data Engineer: The job of a Data Engineer is to define data requirements and to collect, store, retrieve and process data. For companies with huge amounts of data, such as Viettel or Shopee, this is not an easy job. This is the position with the most job opportunities in Data Science.
  • Data Scientist: This position is at a higher “level” than the positions above. Data Scientists need to master the entire cycle of a Data Science project, with a particular focus on three steps: Business understanding (understanding the specific problems of the project), Data understanding (understanding the data) and Feedback (understanding where problems are occurring). This position requires a broad vision that covers the entire project.
4 Data Science positions and essential skills at specific levels: L – large: highly skilled, M – medium: medium skilled, S – small: low skilled.
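
The division of the 10 steps among the four positions can be written down as a small lookup table. This is a sketch based on the descriptions above, not an official mapping: the names `STEPS`, `ROLE_STEPS` and `roles_for` are made up for the example, and the steps assigned to the Data Engineer are an interpretation of "define data requirements and collect data".

```python
# The 10 steps of a Data Science project, in order.
STEPS = [
    "Business understanding", "Analytic approach", "Data requirements",
    "Data collection", "Data understanding", "Data preparation",
    "Modeling", "Evaluation", "Deployment", "Feedback",
]

# Which steps each position covers, as described in the list above.
ROLE_STEPS = {
    "Data Analyst": STEPS[0:5],    # Business understanding to Data understanding
    "ML/DL Engineer": STEPS[5:8],  # Data preparation, Modeling, Evaluation
    # Assumption: mapped from "define data requirements and collect data".
    "Data Engineer": ["Data requirements", "Data collection"],
    "Data Scientist": STEPS,       # needs to master the entire cycle
}


def roles_for(step):
    """Return the positions involved in a given step."""
    return [role for role, steps in ROLE_STEPS.items() if step in steps]


if __name__ == "__main__":
    for step in STEPS:
        print(f"{step}: {', '.join(roles_for(step))}")
```

Laid out this way, it is easy to see that Deployment and Feedback are covered only by the Data Scientist in this reading, which matches the emphasis on that position needing a view of the whole project.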

The Talk show ended with many questions from the audience for mentor Nguyen Hai Nam. Here are some of the questions that drew the audience's attention:

Questions about Data Science jobs

1. What does a Data Scientist work day look like? Do you have to code a lot?

Answer: In addition to coding, you will communicate with customers, your boss and your colleagues. However, coding is still a very important task. You write the same kind of code as other developers, but you will be paid a higher salary because of the value you add to your company.

2. It is said that newly graduated Data Science students cannot work as Data Scientists immediately, but start out as Data Analysts or Data Engineers. Do you agree with this opinion?

Answer: Definitely. Graduates must be excellent, or very lucky, to land a Data Scientist job right away. Normally, they start in one of three positions: Data Engineer, Data Analyst or Machine Learning Engineer. You need to acquire the skills of these positions to become a Data Scientist.

3. What jobs are most in-demand right now? AI or Data Science jobs?

Answer: There are more job openings for Data Scientists than for AI Engineers. In Vietnam, basic science receives less attention than applied science.

4. To become a Data Scientist, should I start working as a Data Engineer or a Data Analyst? Would Machine Learning Engineer be more suitable for me? Or should I try all of these jobs?

Answer: You should learn about Data Science for 3-6 months and then choose a branch. You can take 1-2 subjects first to see what suits you and what you like. Ask yourself whether you prefer an in-demand job such as Data Engineer, or a low-stress yet highly skilled position. If it is the latter, you need to accept the risk of narrower career opportunities. However, to get a job, you have to go deep. If you go wide, it will take 2-5 years before you can apply for a job.

5. How many Data Scientists are needed to build a Data Science team?

Answer: In Vietnam, only around 100 businesses have Data Science positions. Most businesses hire service providers instead. There are numerous service providers, for example those that run data services from mobile carriers.

6. What is the average length of a Data Scientist’s career?

Answer: For an emerging industry like Data Science, it is difficult to give a specific length. In my opinion, age does not matter. For example, I have observed that the “peak age” of a programmer abroad is 35-40.

Questions about Data Science learning

1. How to get started with Data Science and how to master it?

Answer: You should start by searching for the phrase “How to become a data scientist” on Google, and you will find a lot of resources. xSeries's Machine Learning/Data Science courses will support your self-study. But if you want to go it alone, you can still explore and reach your destination. To master Data Science, you need a firm grasp of math and programming. These are the two basic foundations of Data Science.

2. I have 3 years of experience in statistics, programming and databases. Am I a good fit for Data Science?

Answer: Math and programming are the two essential skills in Data Science, as you will face math and programming problems when working as a Data Scientist. If you do not have a firm understanding of math, you will not be able to move up to other positions, and you will not go far in the industry.

3. What areas of mathematics are needed for Data Science?

Answer: Probability and statistics (deep and firm grasp), linear algebra (average level), calculus (basic level).

4. Should beginners start by practicing algorithms on Codewars and HackerRank, or focus on learning analysis tools like NumPy and pandas?

Answer: Programming is a fundamental skill in Data Science. Personally, before switching to Data Science, I spent 3 months practicing on HackerRank. If you are already very good at programming, then just learn the analysis skills.

5. If I am starting out as a programmer, what skills should I acquire to become a Data Scientist?

Answer: If you are a good developer with a solid knowledge of programming, algorithms and databases, you need to review math in order to switch to Data Science. It is important to assess how strong your programming skills really are. When your boss or the team's BA raises a problem, how long does it take you to turn their idea into something that can run on a computer? That time will show how good you are.

6. As you mentioned, AI is a small area within Data Science, with very different problems from Data Science. I want to learn more about AI. Does FUNiX's Data Science course offer in-depth knowledge about AI?

Answer: If you are interested in AI, you should learn Machine Learning. It is the most widely used tool in AI and outperforms other techniques in this field. You can study Machine Learning to get an overview of AI and then choose your own direction. Data Science is very different from AI. Data Science is the problem of using data to create benefits, while AI is the problem of creating entities on a computer that can simulate human behavior or intelligence.

For example, analyzing the real estate market or building Google Maps is a Data Science problem, while getting machines to do tasks that people can do well, such as self-driving cars or AlphaGo, is AI. Of course, AI uses data as a great tool, like a vehicle that needs fuel. But data is only one part. AI has many other parts, such as the science of thinking, consciousness and the brain. There are problems in AI (such as implanting a chip in the brain, as with Elon Musk's Neuralink) that are not part of Data Science.

7. I want to ask about xSeries Machine Learning and Data Science. I want to become a Machine Learning/AI Engineer or a Data Scientist. Which certificate should I study first, or should I study both at the same time?

Answer: You should not study the two certificates at the same time, because the amount of knowledge in one certificate is already larger than that of a master's program. If you only want to work with data and learn how to benefit from it, choose Data Science. If you are interested in AI, that is, problems that machines are needed to solve, go for Machine Learning.

8. After graduating from FUNiX, what other certifications do I need to strengthen my Data Science resume?

Answer: Degrees (including Google professional certificates) are just entrance tickets. Employers focus on your knowledge, your experience and the number of Data Science projects you have done. If you want to impress recruiters, you should have achievements in this industry to show, such as past activities or your own open projects and repos.

9. Where could I find projects for entry level Data Scientists?

Answer: There are many competitions in Vietnam that you can find on Google. Internationally, you can use the Kaggle website to work on problems and sharpen your skills.

Source: FUNiX
