6:4 Real-time voice translation for events
How can real-time translation make events more accessible and inclusive? Arkadiy Vershebenyuk, from Palabra AI, joins Lee to discuss how voice-to-voice translation can revolutionise the way we communicate at events, breaking down language barriers for attendees from around the globe.
Arkadiy shares the journey of Palabra AI, a startup that has developed the first real-time voice-to-voice translation technology for live streaming and events. He explains how their goal is to help the 4 billion people who only speak their native language by providing seamless translation capabilities. Palabra's technology allows speakers to communicate naturally in their own language while attendees receive instant translations, creating a more inclusive environment for global audiences.
In this episode, Lee and Arkadiy explore the challenges of building reliable AI-driven translation systems, the differences between their deep, focused approach and the broader scope of general AI models, and the practical applications for event organisers. They discuss use cases from live conferences to one-on-one video calls and the potential for using AI to minimise human bias during translations.
If you're interested in how cutting-edge technology can help make your events more accessible, this episode is full of inspiring ideas and practical insights.
Video
We recorded this podcast with video as well! You can watch the conversation with Arkadiy Vershebenyuk on YouTube.
Key Takeaways
Here are some of the key takeaways from our conversation with Arkadiy:
- Breaking down language barriers: Real-time translation can make events accessible to non-English speakers by allowing them to hear the content in their own language.
- AI-driven translation vs. human interpreters: AI can provide 24/7 availability, is less prone to fatigue-related errors, and can be trained to understand specific industry vocabularies, making it an excellent tool for many scenarios.
- Customisable for specific events: Palabra AI's technology can be integrated into event apps, allowing attendees to select their preferred language for real-time interpretation. This flexibility can be particularly useful for diverse audiences.
- Practical use cases: Beyond conferences, the technology can be used in various settings, such as corporate communication, video calls, and other live interactions where multilingual accessibility is required.
- Focused development approach: Unlike general AI models that aim to be broad, Palabra AI focuses deeply on perfecting one use case—real-time voice translation—ensuring high accuracy and relevance.
Transcription
We harness AI and voice recognition to generate transcripts, which we subsequently review and edit. However, due to conversational nuances and technical jargon, absolute accuracy cannot be guaranteed.
Lee:
Welcome to the Event Engine podcast. You're here with me, Mr. Lee Jackson, and we have the one and only, Arkadiy. How are you, mate?
Arkadiy Vershebenyuk:
Good, good. How are you, Lee?
Lee:
I am tip, top, and champion, especially now that you're here, and we're having this conversation. So mate, I would love for you, first of all, for the folks who don't know you, to just give us a little introduction as to who you are and where you're from.
Arkadiy Vershebenyuk:
Cool. I'm Ukrainian by origin. I've lived in London for several years, and I'm part of the Palabra team, Palabra AI. We're a startup building the first real-time voice-to-voice translation technology for live streaming, for events, and for general communication between people. We're solving a very simple issue. There are 4 billion people who don't speak any language other than their own, and they have to communicate. They need to communicate. We give them that opportunity.
Lee:
That's incredible. Let's jump in the time machine then, mate, right back to the very beginning. How did this all start?
Arkadiy Vershebenyuk:
Good question. Initially, the team behind it did AI when no one was even talking about AI, which was 10 years ago, and created one of the best face recognition algorithms in the world at that time, which, among other things, was used at the World Cup and in several other prominent use cases. We thought, what else can we solve? What other big, big, big issue? And language has been a big issue for quite some time, but there were no instruments until recently to put it all together. Again, as I said, it's very simple: 4 billion people who don't speak any other language, or speak one poorly. Right now, obviously, lots of communication is online, in business communication and beyond. We need to solve that. We decided that problem was big enough to tackle.
Lee:
I often say that as somebody who speaks English, I tend to be very lazy because most people seem to be able to speak English, so I don't have to make much of an effort. I have tried, though, to make an effort. I've tried to learn both Dutch and French. I've been learning French for three years, and I still can't hold a half-decent conversation. I've been doing Dutch for a year, and with either, I still can't have a conversation. I could maybe tell somebody that the door is white. Exactly. All of the basic stuff. But it's going to take me, as a human with only very limited time, an awful lot of time to become fluent and practised in either one of those languages.
Arkadiy Vershebenyuk:
Same with me. My Norwegian or Romanian, I can do some basic stuff. But yeah, I mean, in all seriousness, if you think about it, English is the most universally used language, obviously, for a lot of business communication. When you have a native English speaker talking to a non-native, it's more or less okay. The problem is when one non-native starts talking to another non-native and they use English as a medium. Because first you have this non-native speech, which is distorted by the level of language that person has. Then you have the other way around, which is again distorted by the level of language the other person has. We're basically taking out that medium.
Lee:
I've never thought of it that way. Because you've got certain inflexions in your accent which make it obvious where you're from. If somebody from, say, France is speaking English to you with their very obvious pronunciation, and you're not used to that, you're going to struggle more.
Arkadiy Vershebenyuk:
Not only pronunciation, it's also about knowledge of the words, vocabulary, giving different meanings to those words. Sure. We're basically giving them a direct bridge. Instead of using a third-party solution, which is English, we're just telling them, Look, you can communicate in your own language, and the other person will understand you in their own language as well.
Lee:
I mean, that sounds like the Holy Grail to me. How have you been able to get accurate translation, then, through this AI model without accidentally offending people? Because if I rewind maybe 10 years to all the jokes and the memes about Google Translate, you could be telling somebody that their mother looks like a goat when all you actually meant to say was, It's lovely to see you. How have things changed, and how are you able to confidently say, Yes, we can say what you're saying in another language?
Arkadiy Vershebenyuk:
Short answer: it's our own secret sauce, to be honest. Fair enough. Yeah. I think it's now 14, maybe even more, world-class ML engineers working simply on that one problem. Everything is done in-house. All of the models, all of the training, everything, we're basically fine-tuning. Do we have hiccups like that? Occasionally, we still can in some of the outlier cases. But yeah, I mean, short answer, there is a lot of background work in there. There is a lot of effort to make sure that, yeah, exactly, you hear exactly what I want you to hear.
Lee:
I suppose as well with Google Translate, it was an algorithm that was just churning words and meanings around and trying to piece things together with basic grammar rules. Whereas with, say, large language models and with AI, et cetera, they have the ability to say, does this even make sense? And they also have context, don't they? So we can say, all right, well, in the context of this conversation, does this sentence now make sense based on the previous ones, or should we reevaluate this translation? I assume stuff like that would be happening as well.
Arkadiy Vershebenyuk:
Yeah, that's one. Then you go one level deeper, because the LLMs that exist right now, the Claudes, the ChatGPTs, et cetera, they're going broad. Basically, the goal of most of those companies is going towards general AI, ideally. They're going broad; we're going deep. We're solving just one particular problem, and solving it very well. That's another thing. We are focused solely on fully understanding the context, the emotions, everything, just like in human communication. Being focused on that very narrow, particular use case, but being able to solve it very well, gives us a very clear advantage.
Lee:
What are some of the use cases for your service? Well, first of all, for folks who aren't aware, could you just describe how it could be used, and then perhaps share where, in an event context, it might be most useful?
Arkadiy Vershebenyuk:
Absolutely, yes. I mean, in very simple terms, it's an API which any of our partners can use. They can go through the documentation, we can onboard them very easily, the magic happens, and then you can build it in. In the case of conferences, there are several use cases. One is for people present at the event: if there is a large speaking event and not everyone speaks English, the conference organisers can take our stream. The speaker will speak as he or she normally does, they will give the audio stream to us, and then, for example, they can put us in the event application, and the listeners will just choose the language they need to hear and they will hear it in that language. That's amazing. If it's streamed online, it's the same thing. Basically, a listener from Brazil will hear it in Brazilian Portuguese, a person tuning in in Turkey will hear it in Turkish, et cetera, et cetera. That's a typical use case.
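If you're curious what that kind of integration might look like from the event app's side, here is a minimal sketch in TypeScript. It is illustrative only and makes assumptions: the WebSocket endpoint, query parameters, and message format below are placeholders we've invented for the example, not Palabra's actual API; their documentation describes the real interface.

```typescript
// Hypothetical client-side sketch: an event app subscribing to a real-time
// translated audio stream. The URL, parameters, and message shape are
// placeholders, NOT Palabra's actual API.

type TranslatedChunk = {
  language: string;   // e.g. "pt-BR", "tr"
  audio: ArrayBuffer; // translated speech for the listener's chosen language
};

function listenToTranslatedStream(
  sessionId: string,
  targetLanguage: string, // the language the attendee picked in the app
  onAudio: (chunk: TranslatedChunk) => void,
): WebSocket {
  // Placeholder endpoint, assumed for illustration only.
  const socket = new WebSocket(
    `wss://api.example-translation.dev/v1/sessions/${sessionId}?lang=${targetLanguage}`,
  );
  socket.binaryType = "arraybuffer";

  socket.onmessage = (event: MessageEvent<ArrayBuffer>) => {
    // Assumption: each message carries one chunk of translated audio.
    onAudio({ language: targetLanguage, audio: event.data });
  };

  socket.onerror = (err) => console.error("translation stream error", err);
  return socket;
}

// Usage: an attendee selects Turkish in the event app and hears the talk in Turkish.
const socket = listenToTranslatedStream("demo-session", "tr", ({ audio }) => {
  // Hand the chunk to the app's playback pipeline (Web Audio API, etc.).
  console.log(`received ${audio.byteLength} bytes of translated audio`);
});
```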
Lee:
I imagine the use cases go way beyond just that in theory, because I'm already starting to think, does this mean that two people who cannot speak each other's language could, in theory, have a video call?
Arkadiy Vershebenyuk:
Yeah, absolutely. We're doing that right now. We're running several design projects right now, for example, with a potential client who's built a corporate communications and task management platform. They built this specifically for the Arabic segment, Saudi Arabia. Some of their clients just don't speak good English, but they need a very good understanding. They want us to come in and help them. That's incredible. The other thing is we can do it on-premises. You cannot use any of the general AI models on-premises; you cannot use OpenAI or any other model on-premises.
Lee:
I was talking to someone earlier about human bias and biases, and I don't know if this is true, but say we have a human interpreter helping someone. Not only is there a delay, because I've got to listen to the person, understand the context, and then translate it to somebody else, but I might also introduce some of my own biases into that conversation. What comes across may not necessarily be what that person was trying to convey. Would it be fair to say that something like AI could strip out that issue?
Arkadiy Vershebenyuk:
Unless at some point AI starts having its own biases, of course. But yes, in all seriousness, yes. Ideally, of course, the most professional simultaneous interpreters are trying, but we're all human. You touched on two very important aspects where we come in. Obviously, we will not substitute 100% of human interpreters, for high-level negotiations between two presidents, et cetera.
Lee:
Yeah
Arkadiy Vershebenyuk:
You don't want that, for various reasons. But in the vast majority of use cases, there are several aspects where AI is better. One, it's 24/7. Two, it doesn't get tired. Normally, a human interpreter needs to change every 30 minutes because it's very intense; they need to switch. Tiring means not only that they need to change, it's also that a human may make mistakes because they're tired. The machine does not do that. Third, you can train the machine on very specific vocabularies. If you have a very specific need, like, say, pharma, or some very niche needs or niche events, we can train our machine specifically for that event or that use case.
Lee:
That's incredible. That will also help reduce mistranslations, et cetera, because you've already fed it the vocabulary of the industry, of the topic, et cetera.
Arkadiy Vershebenyuk:
The machine just learns much faster than a human. It's amazing, isn't it? This is the thing, yeah.
Lee:
I was just saying the other day, I could feed a 90-page document into an AI and it could give me a summary in a second.
Arkadiy Vershebenyuk:
It's out, yeah. In a second.
Lee:
And is your translation almost real-time as well?
Arkadiy Vershebenyuk:
Yes, it's very low latency. Our goal is ideally to be zero latency. I mean, near zero.
Lee:
That would be impossible, I imagine, but no problem.
Arkadiy Vershebenyuk:
No, but right now we are definitely on par with human simultaneous interpreters. With a good human interpreter, we're completely on par. Yeah, that's amazing. You don't have to... This was one of the questions I just got: do we need to stop? No. Once you start talking, the machine basically picks it up. You don't have to stop and then wait for the machine to translate. You just talk naturally, and the machine interprets it on the fly. Amazing.
Lee:
Mate, that's incredible. How can people find out more? Then we shall say goodbye.
Arkadiy Vershebenyuk:
Palabra.ai. There is a button where you can book a live demo, and we will take it from there. We're just coming out of stealth mode and onboarding our first clients, so we're very happy to talk.
Lee:
Amazing. Well, I'll tell you what would be great then. Maybe in a few weeks' time, if we reconnect, what we can do is an online demo on the YouTube channel. Perfect, yes, we can do that. That'd be great. Then we'll put a link in the show notes for this episode so people can go and watch. Mate, put it there.
Arkadiy Vershebenyuk:
Thank you.
Lee:
Thank you so much.
Arkadiy Vershebenyuk:
Thank you.
Lee:
Cheers.