A screen capture of AI-composed songs listed on Udio on April 10, 2024.

Benj Edwards

From 2002 to 2005, I ran a music website where visitors could submit song titles, and I would write and record a silly song for each one. In the liner notes for my first album release in 2003, I wrote about a day when computers might outpace me, producing music automatically at a speed I couldn’t match. I no longer actively post music on that site, but that day is almost here.

A team of former DeepMind employees has launched Udio, a new AI music-generation service. Udio can generate novel, high-fidelity musical audio from written prompts, including lyrics provided by users. It’s similar to Suno, which we covered previously. With some key human input, Udio can imitate human-made music in genres like country, barbershop quartet, German pop, classical, hard rock, hip hop, show tunes, and more. It’s currently free to use during its beta period.

Udio has also caused a stir among musicians on Reddit. As mentioned in our Suno piece, Udio is precisely the type of AI-driven music-generation service that had over 200 music artists worried when they signed an open letter of protest last week.

Although the initial impression of Udio’s songs may seem remarkable from a technical AI-generation perspective (not necessarily judged based on musical quality), its generation capability isn’t flawless. We tested its composing tool, and the outcomes appeared less impressive compared to those produced by Suno. The superb music samples showcased on Udio’s site likely resulted from a substantial amount of imaginative human input (like human-penned lyrics) and selecting the best sections of songs from numerous iterations. In fact, Udio outlines a five-step process to create a 1.5-minute track in an FAQ.

For instance, we created an Ars Technica “Moonshark” track on Udio using the same prompt we used previously with Suno. In its raw state, the result sounds unfinished and somewhat nightmarish (here is the Suno version for comparison). By default, it is also much shorter: 32 seconds, compared to 1 minute and 32 seconds for Suno’s version. However, Udio lets you extend a track or try again with a different prompt for different results.

Upon registering a Udio account, anyone can produce a track by submitting a textual prompt that may include lyrics, a narrative direction, and music genre tags. Udio then handles the task in two stages. Initially, it employs a large language model (LLM) akin to ChatGPT to generate lyrics (if required) based on the supplied prompt. Following this, it creates music using a technique undisclosed by Udio, but potentially a diffusion model, similar to Stability AI’s Stable Audio.
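To make the two-stage flow concrete, here is a minimal Python sketch of how such a pipeline could be wired together. It is purely illustrative: Udio has not published its architecture or an API, so every name here (`write_lyrics`, `render_audio`, `generate_track`, `Clip`) is hypothetical, and both stages are stand-ins that produce no real lyrics or audio. The 32-second default simply mirrors the clip length we observed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Clip:
    """A generated clip; holds metadata only, since no audio is synthesized here."""
    lyrics: str
    genre_tags: list[str]
    seconds: int


def write_lyrics(prompt: str) -> str:
    # Hypothetical stage 1: a real system would call a large language model.
    return f"(placeholder lyrics about {prompt})"


def render_audio(lyrics: str, genre_tags: list[str], seconds: int = 32) -> Clip:
    # Hypothetical stage 2: stand-in for the undisclosed music-generation model.
    return Clip(lyrics=lyrics, genre_tags=list(genre_tags), seconds=seconds)


def generate_track(prompt: str, genre_tags: list[str],
                   user_lyrics: Optional[str] = None) -> Clip:
    # Lyrics are generated only "if required", i.e. when the user supplied none.
    lyrics = user_lyrics if user_lyrics is not None else write_lyrics(prompt)
    return render_audio(lyrics, genre_tags)


clip = generate_track("a shark on the moon", ["surf rock"])
print(clip.seconds, clip.genre_tags)
```

The point of the sketch is the ordering: text is settled first, either by the user or by a language model, and only then handed to the audio stage.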

From the given prompt, Udio’s AI model generates two distinct song clips for you to choose from. You can then publish the track to the Udio community, download the audio or video file to share on other platforms, or post it directly on social media. Other Udio users can also remix or build on existing tracks. Udio’s terms of service state that the company claims no rights over the musical creations and that they can be used for commercial purposes.

While the Udio team has not disclosed the specific details of its model or training data (which likely includes copyrighted material), it told Tom’s Guide that the system has built-in measures to identify and block tracks that too closely resemble the work of specific artists, which the company says keeps the generated music original.

This raises concerns among some people who do not welcome the arrival of AI-generated music. “Honestly, this is quite disheartening,” one Redditor commented in a thread about Udio. “I’m cautiously optimistic that music will endure somehow in the long run. But why pursue this avenue? Why automate art?”

We can surmise that replicating art is a key target for AI research because the results can be imperfect and imprecise yet still seem remarkable or awe-inspiring, a key characteristic of generative AI. It’s flashy and superficially impressive while tolerating a general lack of rigor. AI’s encroachment into still images, video, and text has produced varying results in terms of representational accuracy. Fully composed musical recordings appear to be the next challenge for AI to tackle, and the competition is gradually intensifying.