I have a column in a data frame that lists the languages each person has worked with. Each row is one individual, and the languages are separated by a delimiter (;).
Is there a way to count the occurrences of each language across the entire column,
e.g., Python occurs n times, JavaScript occurs m times, etc.?
I tried this, but it counts whole strings, and I'm not sure how to count each language individually:
df['LanguageHaveWorkedWith'].value_counts()
I also tried get_dummies to one-hot encode the column, but how would I then count the occurrences of each language?
df['LanguageHaveWorkedWith'].str.get_dummies(sep = ';')
Split each string on the separator, explode the resulting lists into one row per language, then count the occurrences of each value:
import pandas as pd
data = {'Languages': ['Python;JavaScript;Java', 'Python;C++;Python;JavaScript', 'JavaScript;C++']}
df = pd.DataFrame(data)
# Split the 'Languages' column on ';', then explode each list into one row per language
df['Languages'] = df['Languages'].str.split(';')
df = df.explode('Languages')
# Count how many times each language occurs
language_counts = df['Languages'].value_counts().reset_index()
language_counts.columns = ['Language', 'Count']
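As a possible alternative, here is a minimal sketch that does the same count in one chain without overwriting the original column, and also finishes the get_dummies idea from the question by summing the one-hot columns. It reuses the same made-up sample data; the names occurrence_counts and row_counts are just illustrative. Note that get_dummies only marks whether a language is present in a row, so a language repeated within one row is counted once there, while the explode approach counts every occurrence.

```python
import pandas as pd

# Same made-up sample data as in the answer; the second row repeats 'Python'
df = pd.DataFrame({'Languages': ['Python;JavaScript;Java',
                                 'Python;C++;Python;JavaScript',
                                 'JavaScript;C++']})

# Count every occurrence of each language, leaving df unchanged
occurrence_counts = df['Languages'].str.split(';').explode().value_counts()

# Count how many rows (people) mention each language at least once
row_counts = (
    df['Languages']
    .str.get_dummies(sep=';')   # one 0/1 column per language
    .sum()                      # column sums = number of rows containing it
    .sort_values(ascending=False)
)

print(occurrence_counts)
print(row_counts)
```

With this data the two results differ only for Python, because it appears twice in the second row. Which one is right depends on whether you want total mentions or the number of people per language.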