Synthetic Data: A New ML Technology for the Automotive Industry

The new century comes with new challenges and innovations. To compete with giants such as Tesla, BMW, and Audi, smaller companies need to go the extra mile. That’s why the German automobile manufacturer “Vivat” hired talented engineers through an outsourcing agency to help with a new invention.
After an extensive brainstorming session, they came up with an exciting idea… Welcome the one and only voice control system for cars with built-in safety features!
It would certainly be a strong competitive advantage in the car market, so let’s look at its features.
The voice control system is an AI algorithm that lets you control the car with your voice and understands commands spoken in any accent. Its safety features also determine from the voice whether the driver is drunk or stressed, so that the car can take the necessary safety measures.
However, while developing the proof of concept (POC), the tech team faced the following problems:
- Detecting from voice commands when the driver is drunk and blocking control of the car;
- Determining when the driver is not drunk but very stressed. The system must also recognize voice commands despite changes in tone and background noise;
- Understanding voice commands in all types of accents. For example, Australian and New Zealand accents differ noticeably, as do British and American ones;
- Understanding the user’s speech when a person with a rare accent is stressed and/or intoxicated.
The tech team then realized that collecting the datasets needed to train the voice assistant to recognize all kinds of accents would take too long and be far too expensive.
Additionally, it would be difficult to collect recordings of people who are under stress or very drunk, and doing so may not be ethical.
The tech lead then said: “Alright, team! Let’s code our own algorithm for generating synthetic data, such as:
- Voices of non-existent people with the right accents;
- Voices of non-existent people with strong emotions;
- Voices of non-existent people under the influence of alcohol.”
The tech team created an algorithm that generates synthetic data, which was then used to train a second algorithm. After this training, the second algorithm could recognize all types of accents, the speaker’s emotional tone, and even the speaker’s condition.
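The two-stage idea (one algorithm generates labeled synthetic data, a second one learns from it) can be illustrated with a toy sketch. This is not the team's actual system: the features (`pitch_hz`, `words_per_min`, `slur`), the numbers in `PROFILES`, and the nearest-centroid classifier are all hypothetical stand-ins chosen to keep the example small and runnable.

```python
import random
import statistics

# Hypothetical voice profiles: (mean, std dev) for three made-up features.
# These values are illustrative assumptions, not measurements.
PROFILES = {
    "sober":    {"pitch_hz": (120, 10), "words_per_min": (150, 12), "slur": (0.1, 0.05)},
    "stressed": {"pitch_hz": (170, 12), "words_per_min": (190, 15), "slur": (0.2, 0.05)},
    "drunk":    {"pitch_hz": (115, 12), "words_per_min": (105, 15), "slur": (0.7, 0.10)},
}
FEATURES = ["pitch_hz", "words_per_min", "slur"]


def generate_samples(label, n, rng):
    """Stage 1: draw n synthetic feature vectors for one condition."""
    profile = PROFILES[label]
    return [[rng.gauss(*profile[f]) for f in FEATURES] for _ in range(n)]


def train_centroids(dataset):
    """Stage 2: fit a nearest-centroid classifier on synthetic data.

    dataset maps each label to a list of feature vectors.
    """
    all_vectors = [v for vectors in dataset.values() for v in vectors]
    # Per-feature spread, used to put all features on a comparable scale.
    scales = [statistics.pstdev(column) for column in zip(*all_vectors)]
    centroids = {
        label: [statistics.fmean(column) for column in zip(*vectors)]
        for label, vectors in dataset.items()
    }
    return centroids, scales


def predict(model, vector):
    """Return the label whose centroid is closest to the given vector."""
    centroids, scales = model

    def distance(centroid):
        return sum(((x - m) / s) ** 2 for x, m, s in zip(vector, centroid, scales))

    return min(centroids, key=lambda label: distance(centroids[label]))


rng = random.Random(0)
training_data = {label: generate_samples(label, 200, rng) for label in PROFILES}
model = train_centroids(training_data)
print(predict(model, [118.0, 100.0, 0.75]))  # a slow, slurred, low-pitched sample
```

A real system would replace the Gaussian profiles with a learned generator of audio and the centroid classifier with a speech model, but the flow of data between the two stages stays the same.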
Synthetic data for machine learning (ML) is a great way to obtain a dataset when real data is very difficult, time-consuming, expensive, or unethical to collect. One common way to generate it is a generative adversarial network (GAN), which can be trained to produce images, videos, or sounds that fit your business case. Synthetic data generated by computer simulations or algorithms is an inexpensive alternative to real-world data and is increasingly used to build accurate AI models.
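As a small illustration of the simulation route (not a GAN), the sketch below creates noisy variants of a clean signal by mixing in background noise at a chosen signal-to-noise ratio (SNR). The helper names `rms` and `mix_at_snr`, the 16 kHz sample rate, and the sine tone standing in for a voice recording are assumptions made for this example.

```python
import math
import random


def rms(signal):
    """Root-mean-square amplitude of a list of samples."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))


def mix_at_snr(clean, noise, snr_db):
    """Add noise to a clean signal, scaled to reach the requested SNR in dB."""
    if len(clean) != len(noise):
        raise ValueError("clean and noise must have the same length")
    # SNR_dB = 20 * log10(rms_clean / rms_scaled_noise), solved for the gain.
    gain = rms(clean) / (rms(noise) * 10 ** (snr_db / 20))
    return [c + gain * n for c, n in zip(clean, noise)]


# Assumption: one second of 16 kHz audio; a 200 Hz tone stands in for a voice.
SAMPLE_RATE = 16_000
clean = [math.sin(2 * math.pi * 200 * t / SAMPLE_RATE) for t in range(SAMPLE_RATE)]

rng = random.Random(0)
noise = [rng.gauss(0, 1) for _ in range(SAMPLE_RATE)]

# One clean recording becomes several training examples of varying difficulty.
noisy_variants = {snr_db: mix_at_snr(clean, noise, snr_db) for snr_db in (20, 10, 0)}
```

Lower SNR values mean louder noise relative to the signal, so a single clean recording can yield training examples that range from nearly clean to heavily masked.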