Using TurboScribe to extract the text from an audio or video file

  • Posted on: 9 May 2024
  • By: ocarcamob

I wanted to prepare a video based on a problem I faced with Moodle. When I tried to log in to my blended learning Moodle website, the entrance page showed this message: “Error for site owner: invalid domain site key”. This was a very worrying problem, so I decided to prepare a video explaining the solution in case others ran into the same problem.

As a first step, I decided to prepare a draft video as a way of brainstorming ideas for a definitive video script.

Then I had to extract the text of the draft video and start improving it into a better script for the definitive version of the video.

This is the method I followed in past videos. First, I filmed the video with a video camera or cellphone. Then, I opened the video in CapCut and had it create the automatic captions. After that, I exported the automatic captions in .srt format to work with them in other programs such as Adobe Premiere. I have to admit that CapCut has very powerful speech-to-text technology.

The problem

This time, in April 2024, CapCut no longer offered automatic captions in the free version, only in the paid Pro version. Since I do not do this very often, I thought it was not worth paying for a subscription. Consequently, I had to look for another way of extracting captions from the speech in the video.

I did a Google search and found that Microsoft has a free video editing program that generates captions automatically without requiring a paid subscription. The program is called Clipchamp, and you can download it for free.

Clipchamp did extract the captions automatically and allowed me to export them as an .srt file. However, like any subtitle file, the .srt file includes a line of time marks above each caption's text. This makes the extracted text very hard to work with. I tried to keep just the text by deleting the time marks by hand, but removing every line of time marks was a very lengthy process.

Then I opened the .srt file in Aegisub to export the captions as plain text without the time marks. Remember that my goal was to obtain a first draft text for writing a better video script. However, although Aegisub lets you export as text, the exported text still came with all the time marks I didn't want.
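For readers comfortable with a little scripting, the time marks can also be removed automatically. This is not the method used in this post; it is a minimal Python sketch, assuming a standard .srt layout (a cue number, a timing line such as `00:00:01,000 --> 00:00:03,500`, one or more lines of caption text, then a blank line). The function name `srt_to_text` and the file names in the usage comment are hypothetical.

```python
import re
import sys
from pathlib import Path

# Matches an .srt timing line such as "00:00:01,000 --> 00:00:03,500"
TIMING_LINE = re.compile(
    r"^\d{1,2}:\d{2}:\d{2}[,.]\d{3}\s*-->\s*\d{1,2}:\d{2}:\d{2}[,.]\d{3}"
)


def srt_to_text(srt_content: str) -> str:
    """Drop cue numbers, timing lines, and blank lines; keep only the caption text."""
    kept = []
    for raw in srt_content.splitlines():
        # Remove a UTF-8 byte-order mark (if any) and surrounding whitespace
        line = raw.lstrip("\ufeff").strip()
        if not line:
            continue  # blank separator between cues
        if line.isdigit():
            continue  # cue number: "1", "2", ...
        if TIMING_LINE.match(line):
            continue  # start --> end time marks
        kept.append(line)
    # Join the caption lines into running text, which reads more like a script draft
    return " ".join(kept)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python srt_to_text.py captions.srt > draft.txt
    print(srt_to_text(Path(sys.argv[1]).read_text(encoding="utf-8-sig")))
```

One trade-off of this simple approach: a caption line that consists only of digits (for example, a spoken number on its own line) would be dropped as if it were a cue number.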

The solution

I solved this problem as follows. I opened the video in Adobe Premiere. Then, from inside Adobe Premiere, I sent the audio of the video to Adobe Audition. Once in Adobe Audition, I exported the audio as an MP3 file.
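Adobe Premiere and Adobe Audition are paid programs. As a free alternative that was not used in this post, the open-source tool ffmpeg can extract the audio track of a video into an MP3 file. The Python sketch below assumes ffmpeg is installed, built with the LAME MP3 encoder (`libmp3lame`), and available on your PATH; the helper names and file names are hypothetical.

```python
import shlex
import subprocess


def build_ffmpeg_audio_command(video_path: str, mp3_path: str) -> list[str]:
    """Return an ffmpeg argument list that drops the video and encodes the audio as MP3."""
    return [
        "ffmpeg",
        "-i", video_path,          # input video file
        "-vn",                     # do not include any video stream in the output
        "-codec:a", "libmp3lame",  # encode the audio with the LAME MP3 encoder
        "-q:a", "2",               # variable bitrate, high quality (lower = better)
        mp3_path,                  # output MP3 file
    ]


def extract_audio(video_path: str, mp3_path: str) -> None:
    """Run ffmpeg to write mp3_path; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(build_ffmpeg_audio_command(video_path, mp3_path), check=True)


if __name__ == "__main__":
    # Print the equivalent shell command for a hypothetical draft video
    print(shlex.join(build_ffmpeg_audio_command("draft.mp4", "draft.mp3")))
```

Building the argument list separately from running it makes it easy to check the command before calling `extract_audio`, and passing a list (instead of one shell string) avoids quoting problems with file names that contain spaces.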

Then I looked for a speech-to-text website. There are many paid speech-to-text transcription services. I didn't want to pay because, I have to admit, this work does not earn me any money. It is meant to be posted on social media with no financial interest, just for sharing ideas, and that is why it is not worth investing in an expensive subscription. In other words, I needed a free speech-to-text website.

I googled again and found TurboScribe.

This website offers very clean and precise speech-to-text transcriptions. It is a paid service, of course, but it allows you to produce up to 3 free transcriptions every day. This was enough for me, a professional who doesn't make money out of speech-to-text work and who needs speech-to-text technology just for fun and for editing and posting videos on social networks.

You can log in to TurboScribe using your Google account. Then you just drag and drop your audio file, and TurboScribe generates the text of your audio.


I wanted to extract the text of a draft video I filmed to generate ideas, as a kind of brainstorming exercise before preparing a more formal video. I also wanted to have that draft video to check things like posture, facial expressions, lighting, background, and so on.
I managed to extract the text of my first draft video by using TurboScribe. Then, based on that text, I prepared a better, definitive script for my video, which I think I am going to call “A problem in Moodle: 'Error for site owner: invalid domain site key'”.