Adobe experiments with VoCo to ‘Photoshop’ voice and speech

November 9, 2016

Last Friday at its annual MAX conference, Adobe presented VoCo, an experimental software project that edits recorded speech much as Photoshop edits images. Developer Zeyu Jin demonstrated how VoCo works using a voice track of Keegan-Michael Key.

The VoCo demo was part of the Sneak Peeks section of this year’s Adobe MAX conference. Adobe also gave glimpses of other projects in progress, such as Stylit for style transfer in artistic design software and CloverVR for editing VR video footage directly.

Adobe also exhibited new additions to its premium suite of Creative Cloud applications. The company showcased Project Felix for 3D image editing and Adobe Sensei, an AI framework that runs across its flagship products.

VoCo lets you edit what people say to make them say new things

During the Sneak Peeks panel, co-host Jordan Peele of ‘Key and Peele’ introduced Adobe developer Zeyu Jin, who demonstrated how VoCo works and what it can do.

To do this, Jin used an audio sample of Keegan-Michael Key talking about his reaction to being nominated for an Emmy. The developer said he would rearrange Key’s words to make him say something different.

Adobe knows users are accustomed to Photoshop’s intuitive, straightforward interface for photo editing, but audio is another matter. Sound waves are not the friendliest thing to display and manipulate on a screen, so the company developed a new interface to make the process easier.

Zeyu Jin showed how VoCo displays the audio clip’s waveform at the top of the window and a text transcription of what is said at the bottom. Image Source: Adobe

VoCo users need only copy and paste words in the transcription to rearrange them in the order they want the speaker to say them on the track. They can also type in new words to create short phrases, even if those words do not appear in the recording.

VoCo’s applications raise concerns among users

VoCo’s convincing ability to generate new words and phrases based on the speaker’s speech patterns is certainly incredible, but to some, it is also worrying.

Technology like this could be put to questionable uses. Recordings manipulated in VoCo could pose significant security threats to voice-activated services and platforms, and could enable identity theft and forged testimony.

Adobe is aware of the issue and addressed it during the showcase at Adobe MAX. Image Source: MAX

Zeyu Jin said the development team has taken these concerns into account and has included security measures such as watermarking to prevent potentially harmful uses.

The developer further reassured the audience by saying the final product would include detection mechanisms to distinguish fabricated recordings from real ones.

Adobe’s VoCo is still in development in collaboration with a team from Princeton University, and the company has not yet disclosed a possible release date or price for the finished software.

Source: Adobe