
TranscribeMe: How to create a speech recognition service

Tired of waiting for a convenient product that turns spoken words into written text to appear on the market, Kiev native Aleksey Dunaev invented the TranscribeMe service, which combines software with human transcribers, and set about looking for customers.


Voice transcription




He began by fulfilling orders manually while building the platform with his team, guided by the requests of existing users. After six months of work, Dunaev received $1 million in angel investment, and he now hopes that TranscribeMe's turnover will exceed $10 million within two years.

Starting budget

$100,000



founder of TranscribeMe

How it all began

I was born in Ukraine. As a child I moved with my parents to Russia and then to England, and I studied programming and management in New Zealand. I started working on startups 10 years ago, while I was still studying. In the first two companies, my friends and I wrote software for automating financial transactions, but the projects did not work out: the market was not yet ready. When the first startup failed, we took its main idea and built the second one on it. Then, when the second failed, we reworked the idea again and built the third.

The third business was AXI Web Solutions, which provided web design services to large companies. I grew and sold this business within two years. I was 27 then, and I came to the USA to get an MBA at Stanford under the Fulbright government grant program.

Finding a Niche

For the past 20 years, the computer industry has kept promising that automatic voice recognition is just around the corner: five years away, or ten. My partner Igor Firer (co-founder of Green Light Energy Solutions, an American company that develops and manufactures waste processing equipment) and I decided to try to develop our own technology. Demand for transcription services in the United States is very high; the market is worth approximately $5-6 billion. But competition is stiff: it ranges from small crowdsourcing companies such as Scribie, VerbalizeIt and Rev to major players such as GMR Transcription and TransPerfect.

There are different niches in this market. Services such as Nuance and Siri rely on purely automated transcription: it is cheap and fast but of poor quality, and if people speak with an accent or there is background noise, the transcript will be inaccurate. There are also people who transcribe recordings manually, but that is expensive and slow.

We came up with a hybrid approach: combine the latest voice recognition technology with human work. A computer transcribes the speech with an accuracy of “from 0 to X%,” and then people bring that accuracy up to 100%.

The hardest part is finding a market and working out what customers need. Any technical problem can be solved, especially with a hybrid approach, where part of the work is done by people.

Project start

In June 2011, we started working on the idea: we wrote a business plan and began testing different options for developing the company. Three months later, we officially registered it and assembled a team of smart, driven people.

At that point we did not know how all this would work technically or who our customers were. We understood that the market was profitable and that existing solutions did not satisfy demand. We compiled a list of 20 sectors of the economy in which our service could be used, including education, the media, publishing and marketing. Then we started calling everyone we thought might need it. In effect, we decided to sell a product that did not yet exist. We called potential customers from San Francisco to London at three or four in the morning and said, "We have a product, would you like to buy it?" At first, we thought our product would be useful to journalists, but we quickly realized that they did not have the money, especially those working at large newspapers.

In addition, the act of typing matters a great deal to journalists: they think through ideas while transcribing an interview. Marketers, however, really liked the product. Highly successful writers were interested too: working the usual way they write four books a year, but by dictating the text they can manage five.


We invited Viktor Obolonkin to be our technical director. He had created a technology for tracking rare birds in the forest through sound analysis. Viktor also became a co-founder. For the first three to four months, we carried out all orders manually: Igor, Viktor and I transcribed the recordings ourselves. Once we had settled on the technology, we automated our processes, first partially and then completely, and connected them to the transcription platform.

Our platform is based on the patented TranscribeMe technology, which automatically cuts the audio or video files received from a customer into fragments 7-10 seconds long, transcribes them using our engine and sends them to transcribers in different parts of the world. Several transcribers listen to these fragments simultaneously and check the text that the computer produced. For privacy reasons, no single transcriber sees the entire order.

Then staff in the quality assessment department read the full text, and it is sent to the client. Because hundreds of people work on a single order at once, it gets done quickly.
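The splitting and distribution steps described above can be sketched roughly as follows. This is a minimal illustration, not TranscribeMe's actual implementation: the function names, the equal-length splitting rule, the round-robin assignment and the reviewer names are all assumptions made for the example. Only the 10-second upper bound on fragment length, the idea of several people checking each fragment, and the rule that nobody sees the whole order come from the text.

```python
import math


def split_into_fragments(duration_s, max_len=10.0):
    """Cut a recording into equal fragments no longer than max_len seconds.

    Hypothetical helper: returns (start, end) pairs in seconds.
    For recordings of 21 seconds or more, the equal pieces fall in the
    7-10 second range mentioned in the article.
    """
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    count = math.ceil(duration_s / max_len)
    step = duration_s / count
    return [(round(i * step, 3), round((i + 1) * step, 3)) for i in range(count)]


def assign_reviewers(n_fragments, reviewers, per_fragment=2):
    """Give each fragment several distinct reviewers, round-robin.

    Assumption for this sketch: requiring at least 2 * per_fragment
    reviewers means neighbouring fragments never share a reviewer, so
    nobody can see an entire order of two or more fragments.
    """
    if len(reviewers) < 2 * per_fragment:
        raise ValueError("need at least 2 * per_fragment reviewers")
    assignment = []
    for i in range(n_fragments):
        start = i * per_fragment
        assignment.append(
            [reviewers[(start + j) % len(reviewers)] for j in range(per_fragment)]
        )
    return assignment


# Example: a 95-second recording checked by a pool of five people.
fragments = split_into_fragments(95.0)
pool = ["ann", "bo", "cy", "di", "ed"]  # made-up reviewer names
plan = assign_reviewers(len(fragments), pool, per_fragment=2)
```

A real system would also need to cut on pauses rather than at fixed offsets, so that words are not split between fragments; the sketch ignores that.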

Transcribers register on our website, we test their skills, and when an order arrives, we show them fragments of it. The minimum working time is a few seconds.

We do not actively look for transcribers. They either come on the recommendation of other workers or find out about us on forums. Our industry has many scammers who write “Work from home and get paid” but then do not pay. For us, the main task is to pay on time. It all started when our first transcriber wrote on a forum: "TranscribeMe paid me on time, here is a screenshot from PayPal." Now more than 10,000 people in more than 70 countries are involved in transcription. Most of our transcribers are in America; they work with English, Spanish and Chinese.

Prototype and product

A company develops in three stages: from an idea to a prototype, from a prototype to a product, and from a product to the market. At the first stage, with nothing but an idea in hand, we raised $100,000 from everyone who could give us money. Most of it, though, was my own money and Igor's. With it, by February 2012, we had built a prototype ourselves.

It was slow and ran only in Chrome, its main components were not connected to each other, and everything looked ugly, but it worked. We also spent this first money on marketing, advertising and the salary of our first programmer. He took our prototype and began turning it into a commercial product.


In the spring and fall of 2012, we raised about $1 million in an angel round. With this money, we turned the prototype into a commercial product within a year. Now we are at the stage of bringing the product to market. Our annual turnover is slightly more than $1 million.

Personal contacts played no part in the search for investors, because we simply had none. I knew almost none of my investors beforehand, even though I was finishing Stanford. So we took the longest route: a business plan by e-mail, a selection committee, a 10-minute telephone conversation, an invitation to dinner, a presentation. Then the investors ask you questions and make a decision.

There are two groups of investors in Silicon Valley: some want the company to get on its feet and grow, while others want it to raise investment and then be bought by a large company. We are now considering both directions.


The team now has 25 people, all working full-time. Our main office is in San Francisco, where our sales and marketing departments are based. Core business management and crowd management are in Auckland, New Zealand. We have just opened an office in Minsk, where five developers now work. We could have hired programmers in California, but access to people is a problem there. The best ones have to be paid huge salaries: they are all already working comfortably at Google and Apple, where they are happy with everything. It is the same in Moscow and St. Petersburg.

Of all the cities in the CIS, Minsk seemed to us the best place to open an office for a technology company. In Belarus, Russia and Ukraine it is very difficult to explain what a startup is. People have a completely different perception of risk. For example, most of the programmers in Eastern Europe to whom we offered options said that they did not need them and wanted a large salary instead. It is like factory thinking: I come to work, work out a shift, go home.


We have both corporate and private customers, in a ratio of approximately 60/40. We support English, Spanish, Portuguese, Chinese and Japanese. Russian and Ukrainian are currently in beta: we carry out a small number of orders in these languages. Demand for transcription services in Russia is still taking shape.

Transcribing one minute of audio costs $1 if the recording has a single speaker and $2 if two or more people take part in it. Orders are completed within two business days.
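As a quick illustration of the pricing above, a cost estimate could be computed like this. The function name is hypothetical, and charging for fractions of a minute pro rata is an assumption; the article gives only the per-minute rates, not the rounding rules.

```python
def transcription_price(minutes, speakers):
    """Estimate the price in US dollars for one recording.

    Rates from the article: $1 per minute for a single speaker,
    $2 per minute for two or more. Pro-rata billing of partial
    minutes is an assumption of this sketch.
    """
    if minutes < 0:
        raise ValueError("minutes must not be negative")
    if speakers < 1:
        raise ValueError("a recording needs at least one speaker")
    rate = 1.0 if speakers == 1 else 2.0
    return round(minutes * rate, 2)


# Example: a 30-minute dictation and a 45-minute three-person interview.
dictation_cost = transcription_price(30, speakers=1)
interview_cost = transcription_price(45, speakers=3)
```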


We will continue to develop our technology, and we plan to add new languages to the platform. We have many languages in beta now. Whether they become full products will depend on demand.

By 2016, we plan to bring our turnover to more than $10 million. In two or three years, when the value of the TranscribeMe brand has grown, the company will either be bought or raise a VC round.


Bear in mind that a new business will need three times more investment than you initially expect.

Raising $100,000 takes as much energy and effort as raising a million. Why spend time you do not have raising $100,000 if you can raise a million straight away?

Author: Galina Shmeleva
