Image Recognition and Speech Recognition – Machine Learning Applications in Real World

As a subfield of artificial intelligence (AI), machine learning is a method of data analysis that builds analytical models automatically. It is a promising technology for businesses, with a variety of real-world applications such as speech recognition and image recognition.

Machine learning uses iterative algorithms to learn from data, allowing the computer to find hidden patterns without being explicitly programmed where to look. The iterative aspect of machine learning is important because models can adapt independently as they are exposed to new data. Machine learning systems can quickly apply what they have learned from large datasets to perform face recognition, speech recognition, and more.
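To make "iterative" concrete, here is a minimal sketch of one such algorithm, gradient descent, fitting a straight line to a handful of points. The function name `fit_line`, the learning rate, and the toy data are assumptions chosen for illustration and do not come from any particular product or library.

```python
def fit_line(xs, ys, learning_rate=0.05, steps=2000):
    """Fit y = w*x + b by repeatedly nudging w and b to reduce squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Prediction errors under the current parameters.
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Gradients of the mean squared error with respect to w and b.
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2.0 / n * sum(errors)
        # Step against the gradient; each pass improves the fit a little.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Toy data generated from y = 2x + 1 (hypothetical example values).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x + 1.0 for x in xs]
w, b = fit_line(xs, ys)
print(round(w, 3), round(b, 3))
```

Nothing in the code states that the slope is 2 or the intercept is 1. The loop recovers values close to them from the data alone, which is the sense in which the model "learns".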

Image Recognition

One of the most common uses of machine learning is image recognition. There are many situations where an object can be classified from a digital image. For digital images, the measurements describe the output of each pixel in the image.

In the case of a black and white image, the intensity of each pixel serves as one measurement. So if a black and white image has N × N pixels, the total number of pixels, and hence of measurements, is N².

In a color image, each pixel provides 3 measurements: the intensities of the 3 main color components, i.e. red, green, and blue (RGB). So an N × N color image yields 3N² measurements.
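The counts above can be checked with a short sketch that flattens a tiny image into the list of measurements a learning algorithm would receive. The helper names `measurement_count` and `flatten`, and the 2 × 2 sample images, are made up for illustration.

```python
def measurement_count(n, channels=1):
    """Number of measurements in an n x n image with the given channel count."""
    return channels * n * n


def flatten(image):
    """Turn nested rows of pixels into one flat list of measurements."""
    features = []
    for row in image:
        for pixel in row:
            if isinstance(pixel, (tuple, list)):
                features.extend(pixel)   # color pixel: one value per channel
            else:
                features.append(pixel)   # grayscale pixel: a single intensity
    return features


# Hypothetical 2 x 2 images: grayscale intensities and (R, G, B) triples.
gray = [[0, 128],
        [255, 64]]
rgb = [[(255, 0, 0), (0, 255, 0)],
       [(0, 0, 255), (10, 20, 30)]]

print(len(flatten(gray)), measurement_count(2))             # N^2 with N = 2
print(len(flatten(rgb)), measurement_count(2, channels=3))  # 3 * N^2 with N = 2
```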

  • For face detection – The categories might be face versus no face present. There might be a separate category for each person in a database of several individuals.
  • For character recognition – We can segment a piece of writing into smaller images, each containing a single character. The categories might consist of the 26 letters of the English alphabet, the 10 digits, and some special characters.
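As a rough illustration of the character recognition case, the sketch below builds the 36 letter and digit categories and assigns a small character image to the category whose template it most resembles. Nearest-template matching is a deliberately simple stand-in chosen for this example, not the method used by any real system, and the 3 × 3 templates are invented.

```python
import string

# 26 letters plus 10 digits; special characters are left out of this sketch.
CATEGORIES = list(string.ascii_uppercase + string.digits)


def squared_distance(a, b):
    """Sum of squared differences between two equal-length pixel vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def classify(features, templates):
    """Return the label whose template vector is closest to `features`."""
    return min(templates, key=lambda label: squared_distance(features, templates[label]))


# Hypothetical 3 x 3 binary templates, flattened row by row.
templates = {
    "I": [0, 1, 0,  0, 1, 0,  0, 1, 0],
    "L": [1, 0, 0,  1, 0, 0,  1, 1, 1],
    "O": [1, 1, 1,  1, 0, 1,  1, 1, 1],
}

# An "L" with one pixel missing, as a segmented character might look.
noisy_l = [1, 0, 0,  1, 0, 0,  1, 1, 0]
print(len(CATEGORIES), classify(noisy_l, templates))
```

A practical system would learn its decision rule from many labeled examples per category rather than from a single hand-drawn template.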

Google uses image recognition based on machine learning in products such as Google Photos, Google Search, and Google Drive to improve how images are found through users' keyword searches.

Speech Recognition

Speech recognition (SR) is the translation of spoken words into text. It is also known as “automatic speech recognition” (ASR), “computer speech recognition”, or “speech to text” (STT).

In speech recognition, a software application recognizes spoken words. The measurements in this application might be a set of numbers that represent the speech signal. We can segment the signal into portions that contain distinct words or phonemes. In each segment, we can represent the speech signal by the intensities or energy in different time-frequency bands.
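The sketch below shows one way such a representation could be computed: the signal is cut into fixed-size segments, and each segment is summarized by its energy in a few frequency bands. The frame size, the number of bands, the naive DFT, and the synthetic two-tone signal are all assumptions made for illustration. Real systems typically use overlapping windows and more refined features.

```python
import cmath
import math


def frames(signal, size):
    """Split the signal into consecutive non-overlapping frames of `size` samples."""
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, size)]


def band_energies(frame, n_bands):
    """Energy of one frame in `n_bands` equal-width frequency bands."""
    n = len(frame)
    half = n // 2
    # Naive DFT power spectrum over the lower half of the frequency bins.
    power = []
    for k in range(half):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame))
        power.append(abs(coeff) ** 2)
    width = half // n_bands
    return [sum(power[b * width:(b + 1) * width]) for b in range(n_bands)]


# Synthetic signal: a low tone followed by a high tone, 32 samples each.
low = [math.sin(2 * math.pi * 2 * t / 32) for t in range(32)]
high = [math.sin(2 * math.pi * 12 * t / 32) for t in range(32)]
signal = low + high

# One row of real values per segment, one column per frequency band.
features = [band_energies(f, 4) for f in frames(signal, 32)]
for row in features:
    print([round(e, 1) for e in row])
```

The first segment concentrates its energy in the lowest band and the second in the highest, so the two tones are distinguishable from these numbers alone.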

Although the details of signal representation are outside the scope of this article, we can represent the signal by a set of real values.

Speech recognition applications include voice user interfaces such as voice dialing, call routing, and domotic (home automation) appliance control. Speech recognition can also be used for simple data entry, preparation of structured documents, speech-to-text processing, and voice input in aircraft.

Using machine learning, Baidu's research and development department has created a tool called Deep Voice, a deep neural network capable of producing artificial voices that are difficult to distinguish from real human voices. This network can "learn" features of rhythm, voice, pronunciation, and vocalization to recreate the voice of a speaker. In addition, Google uses machine learning in other voice and translation products such as Google Translate, Google Text-to-Speech, and Google Assistant.

Besides speech recognition and image recognition, machine learning is also applied to tasks such as medical analysis, sorting and classification, and data analysis and forecasting, in fields such as healthcare, financial services, transportation, and marketing and sales. In the near future, devices and applications based on machine learning may appear in all aspects of human life.

FPT.AI – New Generation Conversation Platform and Virtual Assistant

To keep up with modern technology trends, FPT has been using machine learning in most of its applications and technology products, such as FPT.AI – New Generation Conversation Platform and Virtual Assistant, people identification in FPT Shop, autonomous cars, and human-machine interfaces (TTS and STT).

Source: data-flair.training
