In this project, we want to use Billboard's Top 100 Songs from 1950-2012 to create a topic modeling algorithm to group songs by theme. From there, we want to analyze the lyrics of songs to find popular themes or genres. We ultimately want to use this information to determine the negative or positive connotation of songs and look at these trends over time.
For the final Milestone project, Sarah Fox and Emily Powers will be working together. To collaborate, we will use Google Colab to work on the code simultaneously and then download it as an .ipynb file for submission. We plan to meet once a week to discuss progress and meet over Zoom as needed. We will also schedule our meetings close to office hours so we can quickly address issues as they arise.
The dataset we are interested in is a collection of song lyrics from the Billboard Top 100 songs from 1950 to 2012. Link to Dataset
We think this is a unique dataset because song data is usually tied to commercial success, such as tracking a song's ranking on the Billboard chart; this data instead examines the content of the songs. Music has changed so much over this period that it would be interesting to compare summary statistics on word usage in the 1950s versus the 2000s. Through Natural Language Processing, we can find the most popular words overall and per decade. For our main question, we want to design a topic modeling algorithm to cluster songs by theme. This will tell us the most popular theme and expose other underlying genres or associations. Finally, the same pipeline can be used for sentiment analysis to classify songs as negative or positive. Our initial prediction is that songs have gotten more negative over time, but our project will give us a better understanding of the songs' content.
We loaded in the Billboard Top 100 dataset and faced only one minor issue: read_csv identified 7 columns instead of 4, so we dropped the three extra columns, which contained no data. The dataset contains 4 columns (Year, Artist, Song Name, and Lyrics) and 5148 rows. This is a great dataset for analysis because it is large and the Lyrics column is well suited to NLP.
We want to use the Song Name, Artist, and Lyrics columns to find trends over time. We want to explore average word counts and the most common words over time so that we can build an NLP model of the lyrics and their meanings. We also want to potentially explore the artists themselves, comparing the Billboard rankings against other factors that could determine an artist's success.
Our hypothesis is that songs have gotten more negative in sentiment over time.
pip install nltk
pip install --upgrade gensim
pip install -U spacy
import pandas as pd
import string, re, nltk, gensim
nltk.download('stopwords')
import spacy
import numpy as np
from gensim.models import CoherenceModel, Phrases
url = 'https://raw.githubusercontent.com/sfox2819/DSMilestone/main/Billboard_Top_100_Data.csv'
df = pd.read_csv(url)
df = df.drop(columns= ['Unnamed: 4', 'Unnamed: 5', 'Unnamed: 6'])
df["words"] = df["Lyrics"].str.split(" ")
display(df)
Year | Artist | Song Name | Lyrics | words | |
---|---|---|---|---|---|
0 | 1950 | Gordon Jenkins | Goodnight Irene | Gordon Jenkins Miscellaneous Goodnight Irene G... | [Gordon, Jenkins, Miscellaneous, Goodnight, Ir... |
1 | 1950 | Nat King Cole | Mona Lisa | Mona Lisa Mona Lisa men have named you Youre s... | [Mona, Lisa, Mona, Lisa, men, have, named, you... |
2 | 1950 | Anton Karas | Third Man Theme | When a zither starts to play youll remember ye... | [When, a, zither, starts, to, play, youll, rem... |
3 | 1950 | Gary | Sam’s Song | Ah heres a happy tune youll love to croon They... | [Ah, heres, a, happy, tune, youll, love, to, c... |
4 | 1950 | Gary | Simple Melody | Wont you play some simple melody Like my mothe... | [Wont, you, play, some, simple, melody, Like, ... |
... | ... | ... | ... | ... | ... |
5143 | 2012 | Rihanna | Diamonds | Shine bright like a diamond Shine bright like ... | [Shine, bright, like, a, diamond, Shine, brigh... |
5144 | 2012 | Miguel | Adorn | These lips cant wait to taste your skin baby n... | [These, lips, cant, wait, to, taste, your, ski... |
5145 | 2012 | Jason Aldean | Fly Over States | A couple guys in first class on a flight From ... | [A, couple, guys, in, first, class, on, a, fli... |
5146 | 2012 | Eli Young Band | Even If It Breaks Your Heart | Way back on the radio dial a fire got lit insi... | [Way, back, on, the, radio, dial, a, fire, got... |
5147 | 2012 | Linkin Park | Burn It Down | The cycle repeated as explosions broke in the ... | [The, cycle, repeated, as, explosions, broke, ... |
5148 rows × 5 columns
We first thought it would be interesting to explore the average number of words per song over time. We found that the average number of words peaked in the early 2000s and has steeply declined since. Songs may have grown longer up to that point, or genres in which words are sung faster, such as rap, may have gained popularity around the peak.
#Vectorized word count; avoids chained assignment in a loop
df["NumWords"] = df["words"].str.len()
df.groupby("Year")["NumWords"].mean().plot.line(ylabel="Average Number of Words per Song")
We also wanted to explore who the best artists were. We counted the number of times an artist had a song in the top 100 and selected the top 10. With this information, we could match against the top-selling artists of the same period to see whether the Billboard rankings line up with the top sales. We could take this in the direction of verifying the validity of the Billboard rankings against artists' other achievements, whether sales, awards, or streams, to see whether the rankings have valuable meaning behind them.
top_artists = df["Artist"].value_counts().head(10)
top_artists.plot.bar(title="Top 10 Artists", xlabel="Artist Name", ylabel="Number of Songs in Top 100")
If we pull in a list of the Top 10 best-selling artists, we can see how well the rankings match up.
source: https://ledgernote.com/blog/interesting/best-selling-artists-of-all-time/
best_selling = ["Beatles", "Elvis Presley", "Michael Jackson", "Elton John", "Madonna", "Led Zeppelin", "Rihanna", "Pink Floyd", "Eminem", "Taylor Swift"]
match_count = 0
for i in best_selling:
    #Check membership against the index (artist names), not the counts
    if i in top_artists.index:
        print(i)
        match_count += 1
print(match_count)
Beatles Elvis Presley Michael Jackson Elton John Madonna Rihanna 6
We can see that 6 of the 10 best-selling artists match the Billboard rankings; the artists that matched are printed above. The rankings are evidently not based solely on sales, which is expected, so the question becomes: what other factors could we pull in to match these rankings better?
Another avenue we wanted to explore was which songs appeared in the top charts over multiple years. This could give us insight into sustained popularity versus fad songs that are only popular for a year. Are these songs considered classics? Should they be considered classics just because of their repeated ranking, or do we apply a more cultural lens to what counts as a "classic" song?
df["Song Name"].value_counts().head(10).plot.bar(title="Top Songs on Chart over Multiple Years", xlabel="Song Name", ylabel="Number of Years on Chart")
We wanted to explore the most frequently occurring words in songs to set us up for NLP, where we can begin to draw conclusions about the themes behind these words. We removed common filler words (stopwords) that have little to no impact on the context of the lyrics in order to reduce the corpus size. We are looking for words with more thematic significance so that we can draw conclusions about song themes over time.
sources: https://stackoverflow.com/questions/22219004/how-to-group-dataframe-rows-into-list-in-pandas-groupby
https://stackoverflow.com/questions/952914/how-to-make-a-flat-list-out-of-a-list-of-lists
https://docs.python.org/3/library/collections.html#collections.Counter
https://pandas.pydata.org/pandas-docs/stable/user_guide/options.html
import itertools
from collections import Counter
words_per_year = df.groupby("Year")["words"].apply(list).to_frame()
words_per_year["all words"] = ""
words_per_year["most popular"] = ""
stopword = nltk.corpus.stopwords.words('english')
#Words that don't give valuable insights
other_stop = ["im", "like", "oh", "yeah", "know", "la", "ill", "dont", "da"]
for index, row in words_per_year.iterrows():
    flat_list = list(itertools.chain.from_iterable(row["words"]))
    #Remove common words with no contextual meaning (stopwords)
    flat_list = [word for word in flat_list if word.lower() not in stopword]
    flat_list = [word for word in flat_list if word.lower() not in other_stop]
    words_per_year.at[index, "all words"] = flat_list
    #Count from flat_list directly; row is a copy and does not see the assignment above
    words_per_year.at[index, "most popular"] = Counter(flat_list).most_common(5)
pd.set_option("display.max_colwidth", 400)
words_per_year["most popular"].to_frame()
most popular | |
---|---|
Year | |
1950 | [(love, 39), (boom, 38), (heart, 33), (goose, 29), (never, 20)] |
1951 | [(love, 58), (dab, 42), (boom, 38), (heart, 34), (truly, 29)] |
1952 | [(poke, 25), (love, 23), (heart, 22), (slow, 21), (little, 20)] |
1953 | [(love, 39), (heart, 20), (tell, 19), (go, 15), (story, 15)] |
1954 | [(love, 38), (boom, 30), (heart, 27), (loves, 25), (house, 23)] |
... | ... |
2008 | [(br, 363), (got, 317), (love, 290), (cant, 201), (wanna, 179)] |
2009 | [(love, 296), (go, 217), (got, 194), (get, 187), (wanna, 168)] |
2010 | [(love, 368), (baby, 246), (got, 204), (say, 177), (want, 172)] |
2011 | [(go, 234), (baby, 209), (got, 199), (love, 188), (get, 181)] |
2012 | [(baby, 274), (love, 264), (never, 161), (one, 155), (go, 137)] |
63 rows × 1 columns
We can see that the word "love" appears a lot. For further analysis, let's track its usage over time. To account for the increase in average song length over time, we used the count of occurrences of "love" as a proportion of the total number of words for the year.
words_per_year["love_present"] = False
words_per_year["love_prop"] = 0.0
for index, row in words_per_year.iterrows():
    for i in row["most popular"]:
        if i[0] == "love":
            words_per_year.at[index, "love_present"] = True
            #Find proportion to account for change in song length over time
            words_per_year.at[index, "love_prop"] = i[1] / len(row["all words"])
words_per_year[words_per_year["love_present"] == True]["love_prop"].plot.line(ylabel="Proportion of Occurrences of 'Love'")
We can see that the usage of the word "love" peaked around 1980 but was at an all-time low in the early 2000s. This is interesting because average song length was highest in the early 2000s; since we measured a proportion, the low value cannot be explained away by longer songs, which makes it significant.
With this data, we plan to use NLP to build our sentiment analysis. Sentiment analysis identifies the positivity or negativity of lyrics, which can answer the question of whether songs have gotten more negative over time. Each word is classified as positive (e.g., love) or negative (e.g., death), which can be used to predict the feeling of the song.
We used the Natural Language Toolkit (nltk) to run this experiment, specifically the Sentiment Intensity Analyzer.
To quantify the sentiment analysis, we used VADER (Valence Aware Dictionary and Sentiment Reasoner) scores in conjunction with the Sentiment Intensity Analyzer. This method returns a dictionary with a positive, negative, neutral, and compound score (a normalization of the positive, negative, and neutral scores) for each song. We used the compound score, which gives the overall sentiment as well as the intensity of the emotion. Songs are classified as follows:
Positive if compound >= 0.5
Neutral if -0.5 < compound < 0.5
Negative if compound <= -0.5
source: https://www.analyticsvidhya.com/blog/2021/06/rule-based-sentiment-analysis-in-python/
from nltk.sentiment.vader import SentimentIntensityAnalyzer
import nltk
nltk.download('vader_lexicon')
analyzer = SentimentIntensityAnalyzer()

def vader(text):
    #Compound score summarizes overall sentiment and its intensity
    score = analyzer.polarity_scores(text)
    return score['compound']

def sentiment(compound):
    if compound >= 0.5:
        return 'Positive'
    elif compound <= -0.5:
        return 'Negative'
    else:
        return 'Neutral'

df['Vader Score'] = df['Lyrics'].apply(vader)
df['Sentiment'] = df['Vader Score'].apply(sentiment)
df.head(10)
Year | Artist | Song Name | Lyrics | words | NumWords | Vader Score | Sentiment | |
---|---|---|---|---|---|---|---|---|
0 | 1950 | Gordon Jenkins | Goodnight Irene | Gordon Jenkins Miscellaneous Goodnight Irene Goodnight Irene Gordon Jenkins his orchestra The Weavers chorus Irene goodnight Irene goodnight Goodnight Irene Goodnight Irene Ill see you in my dreams Last Saturday night I got married Me and my wife settle down Now me and my wife are parted Im gonna take another stroll in town repeat chorus Sometimes I live in the country Sometimes I live in town... | [Gordon, Jenkins, Miscellaneous, Goodnight, Irene, Goodnight, Irene, Gordon, Jenkins, his, orchestra, The, Weavers, chorus, Irene, goodnight, Irene, goodnight, Goodnight, Irene, Goodnight, Irene, Ill, see, you, in, my, dreams, Last, Saturday, night, I, got, married, Me, and, my, wife, settle, down, Now, me, and, my, wife, are, parted, Im, gonna, take, another, stroll, in, town, repeat, chorus,... | 92 | 0.4939 | Neutral |
1 | 1950 | Nat King Cole | Mona Lisa | Mona Lisa Mona Lisa men have named you Youre so like the lady with the mystic smile Is it only cause youre lonely they have blamed you For that Mona Lisa strangeness in your smile Do you smile to tempt a lover Mona Lisa Or is this your way to hide a broken heart Many dreams have been brought to your doorstep They just lie there and they die there Are you warm are you real Mona Lisa Or just a c... | [Mona, Lisa, Mona, Lisa, men, have, named, you, Youre, so, like, the, lady, with, the, mystic, smile, Is, it, only, cause, youre, lonely, they, have, blamed, you, For, that, Mona, Lisa, strangeness, in, your, smile, Do, you, smile, to, tempt, a, lover, Mona, Lisa, Or, is, this, your, way, to, hide, a, broken, heart, Many, dreams, have, been, brought, to, your, doorstep, They, just, lie, there,... | 145 | 0.8638 | Positive |
2 | 1950 | Anton Karas | Third Man Theme | When a zither starts to play youll remember yesterday in its haunting strain vienna lives again free and bright and gay in your mind a sudden gleam of a half forgotten dream seems to glimmer when you hear the third man theme Once again there comes to mind someone that you left behind love that somehow didnt last in that happy city of the past does she still recall the dream that rapture so sup... | [When, a, zither, starts, to, play, youll, remember, yesterday, in, its, haunting, strain, vienna, lives, again, free, and, bright, and, gay, in, your, mind, a, sudden, gleam, of, a, half, forgotten, dream, seems, to, glimmer, when, you, hear, the, third, man, theme, Once, again, there, comes, to, mind, someone, that, you, left, behind, love, that, somehow, didnt, last, in, that, happy, city, ... | 396 | 0.9973 | Positive |
3 | 1950 | Gary | Sam’s Song | Ah heres a happy tune youll love to croon They call it Sams Song Its catchy as can be The melody They call it Sams Song Nothing on your mind then you find youre humming Sams Song Why it makes you grim gets under your skin As only a song can do The people that you meet Out on the street all whistling Sams Song Everyone you see will soon agree that its a Grand Song So forget your troubles and we... | [Ah, heres, a, happy, tune, youll, love, to, croon, They, call, it, Sams, Song, Its, catchy, as, can, be, The, melody, They, call, it, Sams, Song, Nothing, on, your, mind, then, you, find, youre, humming, Sams, Song, Why, it, makes, you, grim, gets, under, your, skin, As, only, a, song, can, do, The, people, that, you, meet, Out, on, the, street, all, whistling, Sams, Song, Everyone, you, see,... | 288 | 0.9733 | Positive |
4 | 1950 | Gary | Simple Melody | Wont you play some simple melody Like my mother sang to me One with a good old fashioned harmony Play some simple melody Musical demon set your honey a dreamin Wont you play me some rag Just change that classical nag to some sweet beautiful drag If you will play from a copy of a tune that is choppy Youll get all my applause And that is simply because I wanna listen to rag Musical demon set you... | [Wont, you, play, some, simple, melody, Like, my, mother, sang, to, me, One, with, a, good, old, fashioned, harmony, Play, some, simple, melody, Musical, demon, set, your, honey, a, dreamin, Wont, you, play, me, some, rag, Just, change, that, classical, nag, to, some, sweet, beautiful, drag, If, you, will, play, from, a, copy, of, a, tune, that, is, choppy, Youll, get, all, my, applause, And, ... | 199 | 0.9696 | Positive |
5 | 1950 | Guy Lombardo | Third Man Theme | The Third Man Theme Instrumental version by Anton Karas hit 1 for 11 weeks in 1950 Instrumental version by Guy Lombardo ALSO hit 1 for 11 weeks in 1950 four other versions also charted that year Freddy Martin 17 Hugo Winterhalter 21 Victor Young 22 and Owen Bradley 23 Title song from the Orson Welles film co starring Joseph Cotton Words by Walter Lord Music by Anton Karas When a zither starts ... | [The, Third, Man, Theme, Instrumental, version, by, Anton, Karas, hit, 1, for, 11, weeks, in, 1950, Instrumental, version, by, Guy, Lombardo, ALSO, hit, 1, for, 11, weeks, in, 1950, four, other, versions, also, charted, that, year, Freddy, Martin, 17, Hugo, Winterhalter, 21, Victor, Young, 22, and, Owen, Bradley, 23, Title, song, from, the, Orson, Welles, film, co, starring, Joseph, Cotton, Wo... | 469 | 0.9973 | Positive |
6 | 1950 | Red Foley | Chattanoogie Shoe Shine Boy | Have you ever passed the corner of Forth and Grand Where a little ball o rhythm has a shoe shine stand People gather round and they clap their hands Hes a great big bundle o joy He pops the boogie woogie rag The Chattanoogie shoe shine boy He charges you a nickel just to shine one shoe He makes the oldest kind o leather look like new You feel as though you wanna dance when he gets through Hes ... | [Have, you, ever, passed, the, corner, of, Forth, and, Grand, Where, a, little, ball, o, rhythm, has, a, shoe, shine, stand, People, gather, round, and, they, clap, their, hands, Hes, a, great, big, bundle, o, joy, He, pops, the, boogie, woogie, rag, The, Chattanoogie, shoe, shine, boy, He, charges, you, a, nickel, just, to, shine, one, shoe, He, makes, the, oldest, kind, o, leather, look, lik... | 246 | 0.9933 | Positive |
7 | 1950 | Sammy Kaye | Harbor Lights | I saw the harbor lights They only told me we were parting The same old harbor lights That once brought you to me I watched the harbor lights How could I help if tears were starting Goodbye to tender nights Beside the silvery sea I long to hold you near And kiss you just once more But I was on the ship And you were on the shore Now I know lonely nights For all the while my heart is whispering S... | [I, saw, the, harbor, lights, They, only, told, me, we, were, parting, The, same, old, harbor, lights, That, once, brought, you, to, me, I, watched, the, harbor, lights, How, could, I, help, if, tears, were, starting, Goodbye, to, tender, nights, Beside, the, silvery, sea, I, long, to, hold, you, near, And, kiss, you, just, once, more, But, I, was, on, the, ship, And, you, were, on, the, shore... | 138 | 0.5423 | Positive |
8 | 1950 | Sammy Kaye | It Isn’t Fair | It isnt fair for you to taunt me How can you make me care this way It isnt fair for you to want me If its just for a day It isnt fair for you to thrill me Why do you do the things you do It isnt fair for you to fill me With those dreams that cant come true dear Why is it that you came into my life And made it complete You gave me just a taste of high life If this is love then I repeat It isnt ... | [It, isnt, fair, for, you, to, taunt, me, How, can, you, make, me, care, this, way, It, isnt, fair, for, you, to, want, me, If, its, just, for, a, day, It, isnt, fair, for, you, to, thrill, me, Why, do, you, do, the, things, you, do, It, isnt, fair, for, you, to, fill, me, With, those, dreams, that, cant, come, true, dear, Why, is, it, that, you, came, into, my, life, And, made, it, complete, ... | 125 | 0.7876 | Positive |
9 | 1950 | Kay Starr | Bonaparte’s Retreat | Met the man I love In a town way down in Dixie Neath the stars above He was the sweetest man you ever did see When he held me in his arms And told me of my many charms He kissed me while the fiddles played The Bonapartes Retreat All the world was bright When he held me on that night And I heard him say Please dont ever go away When he held me in his arms And told me of my many charms He kissed... | [Met, the, man, I, love, In, a, town, way, down, in, Dixie, Neath, the, stars, above, He, was, the, sweetest, man, you, ever, did, see, When, he, held, me, in, his, arms, And, told, me, of, my, many, charms, He, kissed, me, while, the, fiddles, played, The, Bonapartes, Retreat, All, the, world, was, bright, When, he, held, me, on, that, night, And, I, heard, him, say, Please, dont, ever, go, a... | 180 | 0.9947 | Positive |
Now that we have a VADER score for each song, we can group the songs by year and take the average score to get a general idea of how positive or negative lyrics were in that year. We chose the mean VADER score because it gives us the average sentiment of the top 100 popular songs for that year.
from scipy import stats
import matplotlib
import matplotlib.pyplot as plt
df_year = pd.DataFrame(df.groupby("Year")['Vader Score'].mean())
df_year = df_year.reset_index()
df_year.plot.line(x='Year', y='Vader Score', ylabel = "Average Compound Score")
slope, intercept, r_value, p_value, std_err = stats.linregress(df_year.Year, df_year['Vader Score'])
plt.plot(df_year.Year, slope*df_year.Year + intercept, color='red')
print("Slope of trendline:", slope)
Slope of trendline: -0.002654143317592896
We can see from the trendline that the compound sentiment score has gone down over time which means that according to the VADER score analysis songs have gotten more negative since 1950.
Additionally, we want to break down this compound score to see the trendlines over time for songs categorized as Positive (compound >= 0.5) and Negative (compound <= -0.5).
pos = df[df["Sentiment"] == "Positive"]
pos_counts = pos.groupby("Year")["Sentiment"].count()
year_counts = df["Year"].value_counts()
#Use proportion to account for differences in number of songs per year
prop_pos = pos_counts / year_counts
df_pos = pd.DataFrame(prop_pos)
df_pos = df_pos.reset_index()
df_pos = df_pos.rename(columns={'index':'Year', 0:'Pos'})
df_pos.plot.line(x='Year', y='Pos', ylabel = "Positive Sentiment")
slope, intercept, r_value, p_value, std_err = stats.linregress(df_pos.Year, df_pos['Pos'])
plt.plot(df_pos.Year, slope*df_pos.Year + intercept, color='red')
print("Slope of trendline:", slope)
Slope of trendline: -0.0007946728593223154
We can see a negative trendline, showing that the proportion of songs classified as Positive has decreased over time.
neg = df[df["Sentiment"] == "Negative"]
neg_counts = neg.groupby("Year")["Sentiment"].count()
year_counts = df["Year"].value_counts()
prop_neg = neg_counts / year_counts
df_neg = pd.DataFrame(prop_neg)
df_neg = df_neg.reset_index()
df_neg = df_neg.rename(columns={'index':'Year', 0:'Neg'})
df_neg = df_neg.dropna()
df_neg.plot.line(x='Year', y='Neg', ylabel = "Negative Sentiment")
slope, intercept, r_value, p_value, std_err = stats.linregress(df_neg.Year, df_neg['Neg'])
plt.plot(df_neg.Year, slope*df_neg.Year + intercept, color='red')
print("Slope of trendline:", slope)
Slope of trendline: 0.001605484420070733
Looking at the negative side, the proportion of songs classified as Negative has increased over time, again suggesting songs have gotten more negative (note that we are measuring how many top songs each year are classified negative, not the intensity of their negativity).
Since part of our exploratory data analysis tracked top artists, we were curious what the sentiment breakdown of these top artists was.
#source: https://seaborn.pydata.org/generated/seaborn.countplot.html
import seaborn as sns
import matplotlib.pyplot as plt
art = df[df["Artist"].isin(top_artists.index)]
plt.figure(figsize=(15,5))
sns.countplot(x="Artist", hue="Sentiment", data=art, hue_order=["Positive", "Neutral", "Negative"], palette="Set2")
Interestingly enough, all of the top artists have overwhelmingly more positive songs than negative or neutral ones. However, looking deeper into this chart, we can see that most of these artists were popular decades ago. Rihanna, a more current artist, shows far fewer positive songs than someone like Elvis Presley who peaked in popularity in the 50s and 60s.
Our original hypothesis with our sentiment analysis was that songs have gotten more negative over time. Through our NLP model we found this to be true. In order to verify these results, we visualized not only the compound scores but also the trends over time for the songs already categorized into being positive or negative. We calculated a simple regression on each of these plots and found the slopes of those lines supported our hypothesis.
We have several intuitions as to why this might be the case (psychological priming, cultural acceptance of negativity, remembering negative stimuli better than positive ones, etc.). We could expand upon this study to see whether we can attribute a cause to the increased lyrical negativity we observed.