I’ve been using Speech Note (github link) for months, but it often gets things wildly wrong.
I thought it was my mic, so I got one that’s crystal clear. I also tried a ton of different models, and apart from some being slower or faster, their accuracy is usually pretty similar.
But I still have to spend a lot of time editing the results, and I wonder if there’s something I should be doing to get better ones.
On other speech-to-text platforms (like Futo keyboard on Android), the results are fast and very accurate. I have a hard time believing that Speech Note can’t be as good.
Can any other users share their experience?
UPDATE: Ok, the best model that I’ve found for Speech Note is the WhisperCpp FUTO English-244, which, funnily enough, is the model I use on Futo Keyboard for Android. It’s not the fastest, but it’s fast enough. It is quite accurate, and that means less time editing text.
- I’m not a native English speaker, but neither people nor other robots have problems understanding me, in person or over a microphone. Unfortunately, Speech Note hasn’t shown good results. I really wanted to use it, because on my Android phone I use voice input all the time.
  - That’s why I’m thinking it’s a problem with Speech Note and not my mic, or how I’m speaking to it.
  - That’s a real shame. I can type quite fast, but my hand joints called it quits quite a while ago. 😵
 
- Had enough issues with it to not find it helpful. But I’m not a native English speaker and support for my local language is so-so, so it might well be me that’s the problem.
- I haven’t used Speech Note, but I have been using Whisper with great success. I run it via Docker. 
- Try a few different accents out - but I’ve never had better than a 95% success rate myself 
- I’ve used it for a short while to test it out. Accuracy was pretty good, as was punctuation. Response time was also good. It’s using my Nvidia GPU to do the LLM thing, so that may be the difference.
  - This could be!
  - Interestingly enough, I was playing around with LLama, as they have speech to text to interact with their chat bot, and it converts in near real-time with very good accuracy. So I do know that things can be fast and accurate, but I wish it was in Speech Note. LOL
  - For now, I may just do STT through my phone in a document shared with my laptop.
 



