Generating AI music for videos to help content creators find the right audio-fit
My Role
Design Researcher & Product Designer - Primary+Secondary Research, UX Research, Visual Design, User Flows, Prototyping, User Testing
Mentors
Jim Wicks
Avani Agarwal
Amy Schwartz
Timeline
6 months, Sep 2023 - Mar 2024
Background
For my graduation thesis for my M.Sc. in Product Design at Northwestern University, I designed muse.studio, a mobile app that helps content creators quickly generate high-quality, AI-powered music tailored to their videos. By eliminating the frustration of searching for the perfect soundtrack, muse.studio provides creators with customized soundtracks that match the mood and style of their content, all while remaining affordable and easy to use.
Over a six-month design thinking process, I worked closely with mentors from Google and Sony to conduct extensive user research, prototype iterations, and usability testing. This approach allowed me to refine the app’s functionality and ensure it effectively meets the needs of aspiring content creators while simplifying the music creation process.
The app’s AI technology allows creators to input natural language descriptions, producing professional-grade music in seconds, making it an ideal solution for creators looking to enhance their content without the need for musical expertise.
SHOWCASE
One Platform, Endless Possibilities: Content Creation Powered by AI Music Generation
0.1
Landing and New Project
0.2
Prompting and Specification - Generating Tracks
0.3
Switching Video Timeline and Browsing Experience
0.4
Customizing Tracks and Recomposing
CONTEXT
Setting the stage
Why does music creation have to be so difficult?
Inspiration
Have you ever tried to learn how to make music? It’s really hard! Unless of course, you are a musical prodigy, which I am certainly not...
As a creative individual who has dabbled in many forms of art, like sketching, painting, dancing and even theatre, I always wanted to learn how to make music, but I failed miserably at it. That's because in order to create music, you need three important things:
You need knowledge of music theory
You have to learn DAW (Digital Audio Workstation) software
And it takes a lot of time!
GOAL
I wanted to make music creation Enjoyable, Easy and Efficient
Why not? Why does music creation have to be limited to professionals? A lot of people might want to make music but be constrained in terms of knowledge, skill, and time!
As a designer, I strongly believe in breaking constraints and helping people achieve their goals by empowering them with meaningful tools.
Q. But who might even need help with music production every day?
A. Content Creators
As I delved deeper into the music production space, I came across one profession in particular that needs help with music almost every day.
Since they aren't exactly musicians, they find it rather annoying to hunt for music and sound effects for their content.
Travel Vloggers
Podcasters
Short-film Makers
Art/Film Students
Social Media Influencers
THE RESEARCH
I found design energy here
Opportunity
Content creation is really hot at the moment! Moreover, almost 75% of the market is owned by giants like YouTube, Instagram and TikTok (so this space could do with a little competition).
There are about 207 million content creators worldwide right now.
The creator economy market size is currently estimated to be $250 Billion.
So I talked to a bunch of content creators...
In order to really understand the music needs of content creators, I needed to get down to the root level and discuss the workflow of different kinds of creators.
32 content creators responded to my survey
I interviewed 9 content creators one-on-one
I also did secondary research through social listening on platforms like Reddit, X and YouTube.
Here's what I found
Even though there are millions of content creators, not all become successful overnight.
My research informed me that the path to success for a content creator starts with a lot of limitations.
Most identify as aspiring creators who have little to no income
They often work alone and on a low-budget as a side hustle
They frequently shoot content on their phone due to the lack of adequate gear
THE ROLE OF MUSIC IN CONTENT CREATION
“Music defines the pulse of a scene”
Simone, 22, Amateur Filmmaker
Music plays an indispensable role in enhancing video content by creating emotional impact and depth of meaning. Most content creators really value making good content, and they believe that good music helps create it.
But...
Entire Reddit communities are raging about finding and editing audio for content...
2.1
Social Listening - Issues faced by Redditors
THE PROBLEM
Finding the root-cause
The pursuit of finding "good music"
Passionate content creators actually go to great lengths to find the right music for their scenes.
However, there are significant challenges that make this process complex.
3.1
Journey-map of content creation - musical perspective
Insights
Everyone seems to be having similar issues with the whole process. So what exactly is the problem here?
Music search engines don’t understand human needs
Creators want to describe their content in human language instead of musical terms, but search engines don't handle natural-language descriptions of music well enough.
For example: They might need something like -
"Intro music for a climate change podcast"
Custom-made audio is too expensive for amateur creators.
Royalty-free music is not customized for their content - but hiring a music producer is not viable.
“when you're doing something as low budget as I am, you have to let it go and settle for whatever you find”
Bishshayon, 23, aspiring film-maker
Browsing and sampling different kinds of free music is a super lengthy process.
Since there is no way to know in advance how well a sound bite matches your needs, you can only judge by listening to it and playing it against your video to check how it fits the vibe.
“It takes me anywhere between a few hours to multiple days to find the right music for a 10 min video”
Bridget, 22, podcaster
Music editing software has a very steep learning curve
This is what a typical music editing interface looks like - its complexity deters content creators from making their own music, which requires editing skills like sound mixing.
SPECIFYING THE PROBLEM
Music or sound effects that fit your content perfectly are really difficult to find. Acquiring them requires patience, music editing skills, and hours of searching!
Audio-fit, therefore, means finding the right sound or music, customized specifically for your video.
“Finding an audio-fit for content is honestly a pain in the a** ”
Laurie, 26, Journalist & Video Editor
PROBLEM SOLVING
Uncovering requirements
Design choices
Based on hours of collaborative brainstorming with research participants on possible design directions, I finally identified the fundamental design decisions I needed to make for a solution that addresses their needs -
Mobile-first Approach
To enable creation on-the-go and support aspiring creators with limited gear.
All in One Solution
That combines music creation with video editing, eliminating the need to learn multiple products.
Generative AI Music
A mechanism that understands human descriptions of music needs to achieve audio-fit.
Collaboratively shaping the experience and user-flow with participant feedback
Setup video environment
Creators are used to video editing apps like Adobe Rush, Filmora and Instagram. By keeping the UI similar and enabling basic video editing, I could keep the focus on defining the music-generation interaction.
4.1
New project setup
Users prefer an indicator that keeps track of the order of the uploaded videos.
Creators usually work with several layers of video clips. The "add audio" CTA would be pushed out of view by the video layers, so it is better to separate video and audio into two collapsible layer groups.
Prompting for specific audio
The backbone of the app was the actual prompting and generation of specific AI music. This needed a well-thought-out interaction of its own.
4.2
Lack of prompt education
While there are instructions for writing a prompt to generate music, this interface doesn't actually teach the user how to write one. A structure was needed to help every user get started.
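To illustrate the kind of scaffold I mean, here is a minimal sketch of a guided prompt builder: the user fills in a few plain-language fields and the app assembles them into one generation prompt. The field names and phrasing are my own illustrative assumptions, not the final design.

```python
# Minimal sketch of a structured prompt builder. The four guided fields
# (mood, genre, instruments, effects) are illustrative assumptions meant
# to help users who don't know music vocabulary get started.

def build_prompt(mood="", genre="", instruments=(), effects=()):
    """Combine guided fields into one natural-language prompt string."""
    parts = []
    if mood:
        parts.append(f"{mood} mood")
    if genre:
        parts.append(f"{genre} style")
    if instruments:
        parts.append("featuring " + ", ".join(instruments))
    if effects:
        parts.append("with " + " and ".join(effects) + " sound effects")
    return ", ".join(parts)

print(build_prompt(
    mood="suspenseful and eerie",
    genre="funky hip hop",
    effects=["rain", "dog barking"],
))
# → suspenseful and eerie mood, funky hip hop style, with rain and dog barking sound effects
```

A scaffold like this lets the app still send a single free-text prompt to the generation model while giving novices a structure to fill in rather than a blank text box.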
Browsing and customization
A behavior that is universal among content creators is repeatedly playing different tracks to identify which one suits their content best before editing the chosen track.
4.3
Missing video playback experience
One thing I discovered while experimenting with wireframes is that creators benefit greatly from video playback alongside their generated tracks: having both audio and visual feedback lets them make better and faster judgements.
In order to fit a video preview, the customization panel needed to be accommodated somewhere else. The most appropriate solution was to present the customization options in a scrollable bottom nav instead.
Track placement and layering
Once the desired track is generated, it was best to treat it just like another layer so users can place it anywhere they want.
4.4
Viewport constraints
The problem, however, was that the limited screen size and low RAM of mobile devices make it hard to deal with several different layers, turning audio/video manipulation into a rigid and slow experience.
INTERACTIVE PROTOTYPE
The following section loads Figma prototypes and AI-generated music, so please be patient if things take time!
Put on some earphones and just click through...
Generating AI music with prompts
Prompt
A dark shady street, raining, rising suspense, eerie and funky hip hop style
Look out for
- Mood and genre: Suspenseful, hip-hop style
- Sound effects: Rain + dog barking
5.1
Prompt - Music Generation
Recompose generated music with customizations for audio-fit
Customizations
- Increase tempo
- Add drums
- Add bass
- Change barking sound effect
5.2
Customize AI track
REFLECTION
Thought ahead of the big players
There are lots of players in the race for AI-powered content, but I designed this before them
6.1
Differentiation table
Generating AI sounds with Meta's Audiocraft API
Taking Wizard of Oz prototyping quite seriously, I generated my own AI music variations with Meta's AI music creation tool in order to create a realistic experience for user testing.
I merged different tracks and sound effects in Audacity to create the initial versions and the final versions after customizations.
This helped me work under time constraints to create a working prototype.
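For context, the generation step can be sketched with Meta's open-source audiocraft library and its MusicGen model. This is a minimal sketch under the assumption that audiocraft and its PyTorch dependency are installed; the helper names, prompts and file layout are my own illustrations, not the exact workflow from the project.

```python
# Sketch of the Wizard-of-Oz generation step, assuming Meta's audiocraft
# package (pip install audiocraft) and its MusicGen model are available.
# Function names, prompts and the file layout are illustrative.

def slugify(prompt, max_len=40):
    """Turn a free-text prompt into a safe file name."""
    safe = "".join(c if c.isalnum() else "-" for c in prompt.lower())
    return "-".join(filter(None, safe.split("-")))[:max_len]

def generate_variations(prompts, duration_s=15, out_dir="takes"):
    """Generate one track per prompt with MusicGen and save each as a WAV."""
    # Lazy imports: audiocraft pulls in torch, so keep them out of module load.
    from audiocraft.models import MusicGen
    from audiocraft.data.audio import audio_write

    model = MusicGen.get_pretrained("facebook/musicgen-small")
    model.set_generation_params(duration=duration_s)
    wavs = model.generate(prompts)  # one waveform tensor per prompt
    for prompt, wav in zip(prompts, wavs):
        # e.g. writes takes/a-dark-shady-street-raining.wav
        audio_write(f"{out_dir}/{slugify(prompt)}", wav.cpu(),
                    model.sample_rate, strategy="loudness")

# Intended call (downloads the model weights, so not run here):
# generate_variations(["a dark shady street, raining, eerie funky hip hop"])
```

Generating several variations per prompt this way, then layering sound effects in Audacity, is enough to fake a responsive "generate" button convincingly during user testing.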
6.2
Audiocraft AI API Interface
Ensuring multi-platform design
While mobile video editing is becoming the preferred mode for Gen Z creators, the tech available to us is not ideal - specifically due to -
Limited screen size - making it hard to keep track
Low processing power - AI processing takes up a lot of RAM
Any piece of creative work requires less clutter and more flexibility. It needs to be accessed on multiple end points and should be saved in the cloud.