Hello again, it's been a long time since my last message. I'd like to give a brief summary of the progress of our project on ASR and TTS. We have successfully implemented a pilot application using the Nuance ASR engine. Although it is still far from perfect, it helped us secure the financing for a follow-up project next year.
ASR is a very complex topic. Merely calling the API of an ASR framework is not enough to get reliable recognition results. Just as important as the API usage is understanding how speech recognition works: how the voice is recorded, what the hardware parameters are, and how the signal is analyzed. These are the crucial questions that decide between success and failure.
Speaking of Nuance ASR Vocon in particular, there are dozens of parameters that can be tuned, and they can move the recognition rate anywhere between 0 and 100 percent. The problem is that there is no single correct list of settings that satisfies all needs. The right settings are very specific to the business case and the application; they depend, for example, on whether voice is used for data collection or for application control. It's an illusion to think one can get good recognition results in a reasonable time without vendor support. And under cost pressure and with project deadlines looming, just forget about it.
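To make the tuning problem a bit more concrete, here is a minimal sketch of how one could compare settings against a set of recorded test utterances. It is not the Vocon API: `recognize`, the parameter names `sensitivity` and `timeout_ms`, and the sample data are all hypothetical stand-ins, and a real engine would be called where `fake_recognize` is used.

```python
from itertools import product


def recognition_rate(recognize, samples, settings):
    """Fraction of samples whose recognized text equals the expected text."""
    hits = sum(1 for audio, expected in samples
               if recognize(audio, settings) == expected)
    return hits / len(samples)


def best_settings(recognize, samples, grid):
    """Try every combination in the grid; return (settings, rate) of the best one."""
    names = sorted(grid)
    best, best_rate = None, -1.0
    for values in product(*(grid[name] for name in names)):
        settings = dict(zip(names, values))
        rate = recognition_rate(recognize, samples, settings)
        if rate > best_rate:  # strict comparison keeps the first best combination
            best, best_rate = settings, rate
    return best, best_rate


def fake_recognize(audio, settings):
    """Hypothetical stand-in for an engine call, NOT a real ASR API.

    'audio' is a (text, noise_level) pair; recognition only succeeds when the
    assumed 'sensitivity' setting is at least as high as the noise level.
    """
    text, noise_level = audio
    return text if settings["sensitivity"] >= noise_level else ""
```

With a handful of labeled recordings per use case (one set for data collection, another for application control), the same loop can be run separately for each set, which is one way to see that the best settings differ between them. With dozens of parameters a full grid quickly becomes too large, which is part of why vendor guidance on the relevant parameters matters.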
But I think we now have sufficient know-how to build voice-enabled applications. Next year's project is to improve the recognition results, bring them to a good confidence level, and enhance the solution with TTS so that the user gets a spoken confirmation of what was recognized. That would complete a truly hands-and-eyes-free solution.
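The confirmation idea could look roughly like the sketch below. This is only an illustration under assumptions: the `Result` type, the 0.0 to 1.0 confidence scale, the two thresholds, and the `speak` callback (standing in for whatever TTS engine is used) are hypothetical and not taken from any vendor API.

```python
from dataclasses import dataclass


@dataclass
class Result:
    text: str
    confidence: float  # assumed to be normalized to the range 0.0..1.0


def handle_result(result, speak, accept_at=0.8, reject_below=0.4):
    """Give spoken feedback for a recognition result.

    Returns 'accepted', 'confirm' or 'rejected'. The threshold values are
    placeholders and would have to be tuned per application.
    """
    if result.confidence >= accept_at:
        # High confidence: just echo what was understood.
        speak(result.text)
        return "accepted"
    if result.confidence >= reject_below:
        # Medium confidence: ask the user to confirm by voice.
        speak(f"Did you say {result.text}?")
        return "confirm"
    # Low confidence: do not guess, ask again.
    speak("Sorry, please repeat.")
    return "rejected"
```

Passing `speak` in as a plain function keeps the decision logic independent of the TTS engine, so it can be tested with a list's `append` method before any audio output is wired up.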