Spotify Song Visualization
Spotify is a digital music service that provides access to millions of songs. One of the main reasons for Spotify's rapid growth is its ability to provide personalized song recommendations that are often actually useful! A novel input to its recommendation algorithm is a set of metrics derived from the raw audio of each track, such as danceability and energy. When a person tends to like songs with a specific set of audio properties, Spotify will recommend songs with similar properties.
The SongAttributes Excel file contains the set of metrics for approximately 225,000 songs on Spotify. (The full list of attributes and definitions is at the end of this document.) You will probably find this dataset overwhelming. There are so many possible questions to be asked and relationships to be explored. Let’s see if we can create visualizations in Tableau to reveal some insights about songs on Spotify.
For this activity, you will need to create several visualizations, as well as provide some written explanations. Start by opening Tableau and extracting the SongAttributes data.
1. (15 pts.) A few of the attributes in this dataset are straightforward physical measurements (e.g. duration_ms, tempo). However, most of them do not have such clear quantitative interpretations. Create a histogram of danceability to illustrate how frequently various levels of danceability are observed in the dataset, and set the bin size to 0.05.
2. (15 pts.) Create a scatterplot showing the relationship between valence and energy, where each point in the scatterplot represents one song. Change the Shape of the Marks to a solid circle, and decrease the Size of the Marks to make the individual points in the plot visible. Describe the relationship that you observe between these two attributes (30-50 words).
3. (20 pts.) It is possible that the relationship between valence and energy might differ by genre; the overall relationship in the scatterplot might not be true for each individual genre. Create a single scatterplot of valence and energy that includes only songs from the genres Ska and R&B, with a different color for each genre. (Use Genre as both a Filter and Color.) How does the relationship between valence and energy differ between these two genres?
4. (50 pts.) Create a dashboard that includes 4 charts, where each chart includes Genre and one measure. (Use all genres in each chart; do not filter by genre.) Choose measures and chart types that reveal differences across genres that you believe are noteworthy. Based on your visualizations, identify at least 3 genres that appear to be unique. Provide a brief explanation of how and why each of them is unique based on the attributes in the dataset.
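If you want to sanity-check the binning from question 1 outside Tableau, the same 0.05-wide bins can be sketched in Python. The data below is synthetic; only the column name danceability and the [0, 1] range come from the dataset description, everything else is illustrative:

```python
import numpy as np

# Synthetic stand-in for the danceability column (assumption: values lie
# in [0, 1], per Spotify's definition; the real data stays in Tableau).
rng = np.random.default_rng(0)
danceability = rng.beta(5, 3, size=1000)

# A bin size of 0.05 over the [0, 1] range yields 20 bins.
bin_edges = np.linspace(0.0, 1.0, 21)
counts, _ = np.histogram(danceability, bins=bin_edges)

print(len(counts))   # 20 bins
print(counts.sum())  # every one of the 1000 songs falls in exactly one bin
```

This mirrors what Tableau does when you convert danceability to a bin dimension with size 0.05: each bar of the histogram is one of these 20 intervals.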
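The core idea in question 4, pairing Genre with one aggregated measure per chart, can also be sketched in pandas. The tiny table below is entirely hypothetical (the real file has roughly 225,000 rows); it only illustrates the shape of the summary behind each chart:

```python
import pandas as pd

# Hypothetical stand-in for SongAttributes: a few made-up songs per genre.
df = pd.DataFrame({
    "genre":        ["Ska", "Ska", "R&B", "R&B", "Opera", "Opera"],
    "energy":       [0.9,   0.8,   0.6,   0.5,   0.3,     0.2],
    "acousticness": [0.1,   0.2,   0.4,   0.5,   0.9,     0.8],
})

# One aggregated measure per genre -- the tabular equivalent of a bar chart
# with Genre on one axis and an averaged measure on the other.
by_genre = df.groupby("genre")[["energy", "acousticness"]].mean()
print(by_genre)
```

A genre that stands out in a chart will stand out the same way in this summary table, e.g. a genre whose mean acousticness is far above the others.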
Once you have finished the visualizations and written explanations, combine all of it into a Word file that includes the full names of all of your group members. Then, go to the Activities page on Blackboard and click on the Spotify activity. On the submission page, submit the Word file. Only one submission per group is needed.
Definitions of attributes (from Spotify):
acousticness: A confidence measure from 0.0 to 1.0 of whether the track is acoustic. 1.0 represents high confidence the track is acoustic.
danceability: Danceability describes how suitable a track is for dancing based on a combination of musical elements including tempo, rhythm stability, beat strength, and overall regularity. A value of 0.0 is least danceable and 1.0 is most danceable.
duration_ms: The duration of the track in milliseconds.
energy: Energy is a measure from 0.0 to 1.0 and represents a perceptual measure of intensity and activity. Typically, energetic tracks feel fast, loud, and noisy. For example, death metal has high energy, while a Bach prelude scores low on the scale.
instrumentalness: Predicts whether a track contains no vocals. “Ooh” and “aah” sounds are treated as instrumental in this context. Rap or spoken word tracks are clearly “vocal”. The closer the instrumentalness value is to 1.0, the greater the likelihood that the track contains no vocal content. Values above 0.5 are intended to represent instrumental tracks, but confidence is higher as the value approaches 1.0.
liveness: Detects the presence of an audience in the recording. Higher liveness values represent an increased probability that the track was performed live. A value above 0.8 provides strong likelihood that the track is live.
loudness: The overall loudness of a track in decibels (dB). Loudness values are averaged across the entire track and are useful for comparing the relative loudness of tracks. Loudness is the quality of a sound that is the primary psychological correlate of physical strength (amplitude). Values typically range between -60 and 0 dB.
speechiness: Speechiness detects the presence of spoken words in a track. The more exclusively speech-like the recording (e.g. talk show, audio book, poetry), the closer to 1.0 the attribute value. Values above 0.66 describe tracks that are probably made entirely of spoken words. Values between 0.33 and 0.66 describe tracks that may contain both music and speech, either in sections or layered, including such cases as rap music. Values below 0.33 most likely represent music and other non-speech-like tracks.
tempo: The overall estimated tempo of a track in beats per minute (BPM). In musical terminology, tempo is the speed or pace of a given piece and derives directly from the average beat duration.
valence: A measure from 0.0 to 1.0 describing the musical positiveness conveyed by a track. Tracks with high valence sound more positive (e.g. happy, cheerful, euphoric), while tracks with low valence sound more negative (e.g. sad, depressed, angry).
(the dataset also includes genre, artist name, track name, and track id)