Counting occurrences of each value in a data frame column containing separators

I have a column in a data frame that lists the languages each person has worked with. Each row is a different individual, and the languages within a row are separated by a delimiter (;).

Is there any way to count the occurrences of each language across the entire column,
e.g., Python occurs n times, JavaScript occurs m times, etc.?

I tried the following, but I'm not sure how to get from it to a count for each language across the entire column:
df['LanguageHaveWorkedWith'].value_counts()

I also tried one-hot encoding the column with get_dummies, but how would I count the occurrences of each language from that?
df['LanguageHaveWorkedWith'].str.get_dummies(sep = ';')

Split each value on the separator, explode the resulting lists so that each language gets its own row, and then count the unique items with value_counts:

import pandas as pd

data = {'Languages': ['Python;JavaScript;Java', 'Python;C++;Python;JavaScript', 'JavaScript;C++']}
df = pd.DataFrame(data)

# Split the 'Languages' column on ';', then explode each list into one row per language
df['Languages'] = df['Languages'].str.split(';')
df = df.explode('Languages')

# Count how many times each language occurs, as a two-column data frame
language_counts = df['Languages'].value_counts().reset_index()
language_counts.columns = ['Language', 'Count']
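
If you would rather not overwrite df, the same steps can be chained on the column itself. The sketch below also shows how the get_dummies attempt from the question can be turned into counts by summing the indicator columns. The sample data is made up for illustration; only the column name LanguageHaveWorkedWith comes from the question. Note that the two approaches differ when a language is repeated within one row: explode counts every occurrence, while get_dummies produces 0/1 indicators, so each row contributes at most once per language.

```python
import pandas as pd

# Hypothetical sample data; only the column name comes from the question.
df = pd.DataFrame({
    "LanguageHaveWorkedWith": [
        "Python;JavaScript;Java",
        "Python;C++;Python;JavaScript",  # 'Python' appears twice in this row
        "JavaScript;C++",
    ]
})

col = df["LanguageHaveWorkedWith"]

# Every occurrence counts, including repeats within a single row.
occurrence_counts = col.str.split(";").explode().value_counts()

# Indicator columns are 0/1, so each row counts at most once per language.
row_counts = col.str.get_dummies(sep=";").sum()

print(occurrence_counts.to_dict())
print(row_counts.to_dict())
```

With this sample, Python is counted 3 times by the first approach and 2 times by the second. Which one you want depends on whether a repeated entry in a row should count more than once.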
