This Python project focuses on analyzing the mental health status of college students, particularly looking into anxiety, depression, and panic attacks. The aim is to understand the prevalence of these issues and their impact on students' academic performance and well-being.
- Data Loading and Cleaning: The dataset includes responses from college students on various aspects such as age, course, CGPA, and mental health status. The script includes cleaning operations like renaming columns, handling missing data, and correcting data formats.
- Exploratory Data Analysis (EDA): The project performs a comprehensive EDA, including distribution of gender, age, course enrollment, and academic year. It utilizes visualizations to understand the patterns in mental health issues among the student population.
- Mental Health Insights: Analysis focuses on the relationship between students' academic year, course of study, and their mental health status, providing insights into anxiety, depression, and panic attacks prevalence.
- Pandas for data manipulation
- Matplotlib and Seaborn for visualization
- Load the dataset using Pandas.
- Follow the cleaning steps to prepare the data.
- Run the EDA code blocks to generate insights and visualizations.
This project provides valuable insights into the mental health challenges faced by college students and can aid in the development of targeted support strategies.
Importing Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
original_data = pd.read_csv("../data/Student Mental health.csv")
original_data.head()
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
</style>
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
Timestamp | Choose your gender | Age | What is your course? | Your current year of Study | What is your CGPA? | Marital status | Do you have Depression? | Do you have Anxiety? | Do you have Panic attack? | Did you seek any specialist for a treatment? | |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | 8/7/2020 12:02 | Female | 18.0 | Engineering | year 1 | 3.00 - 3.49 | No | Yes | No | Yes | No |
1 | 8/7/2020 12:04 | Male | 21.0 | Islamic education | year 2 | 3.00 - 3.49 | No | No | Yes | No | No |
2 | 8/7/2020 12:05 | Male | 19.0 | BIT | Year 1 | 3.00 - 3.49 | No | Yes | Yes | Yes | No |
3 | 8/7/2020 12:06 | Female | 22.0 | Laws | year 3 | 3.00 - 3.49 | Yes | Yes | No | No | No |
4 | 8/7/2020 12:13 | Male | 23.0 | Mathemathics | year 4 | 3.00 - 3.49 | No | No | No | No | No |
- Renaming column head.
- Handling Missing data
- Handling Duplicates
original_data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 101 entries, 0 to 100
Data columns (total 11 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Timestamp 101 non-null object
1 Choose your gender 101 non-null object
2 Age 100 non-null float64
3 What is your course? 101 non-null object
4 Your current year of Study 101 non-null object
5 What is your CGPA? 101 non-null object
6 Marital status 101 non-null object
7 Do you have Depression? 101 non-null object
8 Do you have Anxiety? 101 non-null object
9 Do you have Panic attack? 101 non-null object
10 Did you seek any specialist for a treatment? 101 non-null object
dtypes: float64(1), object(10)
memory usage: 8.8+ KB
original_data.nunique()
Timestamp 92
Choose your gender 2
Age 7
What is your course? 49
Your current year of Study 7
What is your CGPA? 6
Marital status 2
Do you have Depression? 2
Do you have Anxiety? 2
Do you have Panic attack? 2
Did you seek any specialist for a treatment? 2
dtype: int64
original_data['What is your course?'].unique()
array(['Engineering', 'Islamic education', 'BIT', 'Laws', 'Mathemathics',
'Pendidikan islam', 'BCS', 'Human Resources', 'Irkhs',
'Psychology', 'KENMS', 'Accounting ', 'ENM', 'Marine science',
'KOE', 'Banking Studies', 'Business Administration', 'Law',
'KIRKHS', 'Usuluddin ', 'TAASL', 'Engine', 'ALA',
'Biomedical science', 'koe', 'Kirkhs', 'BENL', 'Benl', 'IT', 'CTS',
'engin', 'Econs', 'MHSC', 'Malcom', 'Kop', 'Human Sciences ',
'Biotechnology', 'Communication ', 'Diploma Nursing',
'Pendidikan Islam ', 'Radiography', 'psychology', 'Fiqh fatwa ',
'DIPLOMA TESL', 'Koe', 'Fiqh', 'Islamic Education', 'Nursing ',
'Pendidikan Islam'], dtype=object)
original_data['What is your CGPA?'].unique()
array(['3.00 - 3.49', '3.50 - 4.00', '3.50 - 4.00 ', '2.50 - 2.99',
'2.00 - 2.49', '0 - 1.99'], dtype=object)
original_data['Your current year of Study'].unique()
array(['year 1', 'year 2', 'Year 1', 'year 3', 'year 4', 'Year 2',
'Year 3'], dtype=object)
original_data[original_data['Age'].isna()]
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
</style>
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
Timestamp | Choose your gender | Age | What is your course? | Your current year of Study | What is your CGPA? | Marital status | Do you have Depression? | Do you have Anxiety? | Do you have Panic attack? | Did you seek any specialist for a treatment? | |
---|---|---|---|---|---|---|---|---|---|---|---|
43 | 8/7/2020 15:07 | Male | NaN | BIT | year 1 | 0 - 1.99 | No | No | No | No | No |
- We will rename all the column heading (shorten them).
- Timestamp column is not required in our analysis, so we will drop this column.
- Age column has one missing value, we will either fill it with mean of age column or remove that particular row.
- Course Name has formatting issues, this has to be handcorrectedled. ( Laws is same as law)
- Current year of study has formatting issues, this has to be corrected. ( year 1 is same as Year 1)
- Few rows in CGPA column has space in end, has to be corrected.
# Making a copy of original data so original data is kept intact.
working_data = original_data.copy()
# Dropping timestamp column
working_data.drop(['Timestamp'], axis=1, inplace=True)
working_data.head()
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
</style>
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
Choose your gender | Age | What is your course? | Your current year of Study | What is your CGPA? | Marital status | Do you have Depression? | Do you have Anxiety? | Do you have Panic attack? | Did you seek any specialist for a treatment? | |
---|---|---|---|---|---|---|---|---|---|---|
0 | Female | 18.0 | Engineering | year 1 | 3.00 - 3.49 | No | Yes | No | Yes | No |
1 | Male | 21.0 | Islamic education | year 2 | 3.00 - 3.49 | No | No | Yes | No | No |
2 | Male | 19.0 | BIT | Year 1 | 3.00 - 3.49 | No | Yes | Yes | Yes | No |
3 | Female | 22.0 | Laws | year 3 | 3.00 - 3.49 | Yes | Yes | No | No | No |
4 | Male | 23.0 | Mathemathics | year 4 | 3.00 - 3.49 | No | No | No | No | No |
# Renaming column headings
working_data.rename(columns={
'Choose your gender' : 'gender',
'Age' : 'age',
'What is your course?' : 'course',
'Your current year of Study' : 'current_year',
'What is your CGPA?' : 'cgpa',
'Marital status' : 'marital_status',
'Do you have Depression?' : 'has_depression',
'Do you have Anxiety?' : 'has_anxity',
'Do you have Panic attack?' : 'has_panic_attack',
'Did you seek any specialist for a treatment?' : 'visited_specialist'}, inplace=True)
working_data.head()
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
</style>
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
gender | age | course | current_year | cgpa | marital_status | has_depression | has_anxity | has_panic_attack | visited_specialist | |
---|---|---|---|---|---|---|---|---|---|---|
0 | Female | 18.0 | Engineering | year 1 | 3.00 - 3.49 | No | Yes | No | Yes | No |
1 | Male | 21.0 | Islamic education | year 2 | 3.00 - 3.49 | No | No | Yes | No | No |
2 | Male | 19.0 | BIT | Year 1 | 3.00 - 3.49 | No | Yes | Yes | Yes | No |
3 | Female | 22.0 | Laws | year 3 | 3.00 - 3.49 | Yes | Yes | No | No | No |
4 | Male | 23.0 | Mathemathics | year 4 | 3.00 - 3.49 | No | No | No | No | No |
# dropping row where value of age is null
working_data.dropna(inplace=True)
working_data.isna().sum()
gender 0
age 0
course 0
current_year 0
cgpa 0
marital_status 0
has_depression 0
has_anxity 0
has_panic_attack 0
visited_specialist 0
dtype: int64
# Correcting format of course names
working_data.replace({
'Engine':'Engineering',
'engin' : 'Engineering',
'Islamic education' : 'Islamic Education',
'BIT': 'IT',
'Laws' : 'Law',
'Pendidikan islam': 'Pendidikan Islam',
'Pendidikan Islam ': 'Pendidikan Islam',
'KIRKHS':'Irkhs',
'Kirkhs':'Irkhs',
'Marine science' : 'Marine Science',
'koe': 'Koe',
'KOE': 'Koe',
'Biomedical science' : 'Biomedical Science',
'Benl':'BENL',
'Econs' : 'Economics',
'Human Sciences ':'Human Sciences',
'psychology' : 'Psychology',
'Fiqh fatwa ' : 'Fiqh Fatwa',
'DIPLOMA TESL': 'Diploma TESL',
'Fiqh' : 'Fiqh Fatwa',
'Accounting ':'Accounting',
'Communication ':'Communication',
'Nursing ':'Nursing'}, inplace=True)
working_data['course'].unique()
array(['Engineering', 'Islamic Education', 'IT', 'Law', 'Mathemathics',
'Pendidikan Islam', 'BCS', 'Human Resources', 'Irkhs',
'Psychology', 'KENMS', 'Accounting', 'ENM', 'Marine Science',
'Koe', 'Banking Studies', 'Business Administration', 'Usuluddin ',
'TAASL', 'ALA', 'Biomedical Science', 'BENL', 'CTS', 'Economics',
'MHSC', 'Malcom', 'Kop', 'Human Sciences', 'Biotechnology',
'Communication', 'Diploma Nursing', 'Radiography', 'Fiqh Fatwa',
'Diploma TESL', 'Nursing'], dtype=object)
# Correcting format of current year
working_data.replace({'year 1' : 'Year 1',
'year 2' : 'Year 2',
'year 3' : 'Year 3',
'year 4' : 'Year 4'}, inplace=True)
working_data['current_year'].unique()
array(['Year 1', 'Year 2', 'Year 3', 'Year 4'], dtype=object)
# Removing trailing white spaces from cgpa column
working_data['cgpa'] = working_data['cgpa'].apply(lambda x:x.strip())
working_data['cgpa'].unique()
array(['3.00 - 3.49', '3.50 - 4.00', '2.50 - 2.99', '2.00 - 2.49',
'0 - 1.99'], dtype=object)
plt.figure(figsize=(14, 5))
plt.title('Gender Distribution')
plt.pie(working_data['gender'].value_counts(),labels=working_data['gender'].value_counts().index, autopct='%1.1f%%', explode=(0.025,0.025), labeldistance=0.5)
plt.show()
- The survey exhibits a higher participation rate among women, which may introduce a potential gender bias in the resulting analysis.
plt.figure(figsize=(4,4))
plt.title('Age Distribution')
plt.bar(x=working_data['age'].value_counts().index, height=working_data['age'].value_counts().values)
plt.show()
plt.figure(figsize=(4,4))
plt.title('Age Distribution')
plt.bar(x=working_data['current_year'].value_counts().index.tolist(), height=working_data['current_year'].value_counts().values.tolist())
plt.show()
- The survey predominantly includes college freshmen, who are primarily aged 18 and 19, which may introduce a potential bias towards the experiences and perspectives of this specific group.
fig, axis = plt.subplots(2,2, figsize=(12,12))
fig.suptitle("Course Distribution across Student Cohorts",fontweight="bold", size=12)
current_year_list = sorted(working_data['current_year'].unique().tolist())
counter = 0
for i in range(2):
for j in range(2):
x = working_data[working_data['current_year'] == current_year_list[i]].groupby('course')['gender'].count().sort_values(ascending=False).head().index.to_list()
y = working_data[working_data['current_year'] == current_year_list[i]].groupby('course')['gender'].count().sort_values(ascending=False).head().values.tolist()
axis[i,j].bar(x, y)
axis[i,j].set_title(current_year_list[counter])
axis[i,j].set_xticklabels(x, rotation=20)
counter += 1
/var/folders/n4/669_syrx0qzf2gktz416h1d00000gn/T/ipykernel_12231/3018482063.py:14: UserWarning: FixedFormatter should only be used together with FixedLocator
axis[i,j].set_xticklabels(x, rotation=20)
/var/folders/n4/669_syrx0qzf2gktz416h1d00000gn/T/ipykernel_12231/3018482063.py:14: UserWarning: FixedFormatter should only be used together with FixedLocator
axis[i,j].set_xticklabels(x, rotation=20)
/var/folders/n4/669_syrx0qzf2gktz416h1d00000gn/T/ipykernel_12231/3018482063.py:14: UserWarning: FixedFormatter should only be used together with FixedLocator
axis[i,j].set_xticklabels(x, rotation=20)
/var/folders/n4/669_syrx0qzf2gktz416h1d00000gn/T/ipykernel_12231/3018482063.py:14: UserWarning: FixedFormatter should only be used together with FixedLocator
axis[i,j].set_xticklabels(x, rotation=20)
working_data[working_data['current_year'] == current_year_list[i]].groupby('course')['gender'].count().sort_values(ascending=False).head().index.to_list()
['Engineering', 'BCS', 'Koe', 'Pendidikan Islam', 'Biomedical Science']
x = working_data.groupby('course')['gender'].count().sort_values(ascending=False).head().index.to_list()
y = working_data.groupby('course')['gender'].count().sort_values(ascending=False).head().values.tolist()
plt.title("Course Distribution across Student Cohorts")
plt.bar(x, y)
plt.show()
- The survey primarily focuses on the participation of students studying Engineering and Computer Science disciplines, which may result in a potential bias towards the perspectives and experiences of individuals within these fields.
sns.catplot(data=working_data, x="current_year", hue='has_anxity', kind="count")\
.set(title="Mental Health and Academic Years in College: Anxiety", \
xlabel='Current Year',\
ylabel='Count of Students')
<seaborn.axisgrid.FacetGrid at 0x7ff15070e6e0>
- Analyzing Anxiety Levels among College Students: A Majority with No Reported Symptoms.
sns.catplot(data=working_data, x="current_year", hue='has_depression', kind="count")\
.set(title="Mental Health and Academic Years in College: Depression", \
xlabel='Current Year',\
ylabel='Count of Students')
<seaborn.axisgrid.FacetGrid at 0x7ff14eea2890>
- Examining Depression Rates among College Students: Majority Reporting No Symptoms
sns.catplot(data=working_data, x="current_year", hue='has_panic_attack', kind="count")\
.set(title="Mental Health and Academic Years in College: Panic Attack", \
xlabel='Current Year',\
ylabel='Count of Students')
<seaborn.axisgrid.FacetGrid at 0x7ff1506c4f40>
- Exploring the Prevalence of Panic Attacks among College Students: Majority Reporting No Incidents
sns.catplot(data=working_data, x="age", hue='has_panic_attack', kind="count")\
.set(title="The Impact of Age on Panic Attack", \
xlabel='Age',\
ylabel='Count of Students')
<seaborn.axisgrid.FacetGrid at 0x7ff155495450>
- Exploring the Prevalence of Panic Attacks among College Students: Majority Reporting No Incidents
sns.catplot(data=working_data, x="age", hue='has_anxity', kind="count")\
.set(title="The Impact of Age on Anxity", \
xlabel='Age',\
ylabel='Count of Students')
<seaborn.axisgrid.FacetGrid at 0x7ff15266e530>
- Examining Anxiety Levels among College Students: Majority Report No Symptoms
sns.catplot(data=working_data, x="age", hue='has_depression', kind="count")\
.set(title="The Impact of Age on Anxity", \
xlabel='Age',\
ylabel='Count of Students')
<seaborn.axisgrid.FacetGrid at 0x7ff154592c20>
- Depression Prevalence among College Students: Majority Report No Symptoms
print('has_depression')
print(working_data[working_data['has_depression'] == 'Yes']['has_anxity'].value_counts())
print('\nno_depression')
print(working_data[working_data['has_depression'] == 'No']['has_anxity'].value_counts())
has_depression
has_anxity
Yes 18
No 17
Name: count, dtype: int64
no_depression
has_anxity
No 49
Yes 16
Name: count, dtype: int64
sns.catplot(data=working_data, x="has_depression", hue='has_anxity', kind="count")
<seaborn.axisgrid.FacetGrid at 0x7ff154575000>
- Students with depression have an equal likelihood of experiencing and not experiencing anxiety.
- Students without depression have a lower probability of having anxiety.
- The majority of college students neither have anxiety nor depression. These points summarize the relationship between anxiety, depression, and college students, highlighting the equal chance of anxiety with depression, a lower chance without depression, and the majority being unaffected by either condition.
sns.catplot(data=working_data, x="has_panic_attack", hue='has_anxity', kind="count")
<seaborn.axisgrid.FacetGrid at 0x7ff15266f130>
print('has_panic_attack')
print(working_data[working_data['has_panic_attack'] == 'Yes']['has_anxity'].value_counts())
print('\nno_panic_attack')
print(working_data[working_data['has_panic_attack'] == 'No']['has_anxity'].value_counts())
has_panic_attack
has_anxity
No 20
Yes 13
Name: count, dtype: int64
no_panic_attack
has_anxity
No 46
Yes 21
Name: count, dtype: int64
- Students who experience panic attacks have a slightly higher than 50% chance of also having anxiety.
- Students who do not experience panic attacks have a less than 50% chance of having anxiety.
- The majority of college students neither have anxiety nor panic attacks.
print('has_depression')
print(working_data[working_data['has_depression'] == 'Yes']['has_panic_attack'].value_counts())
print('\nno_depression')
print(working_data[working_data['has_depression'] == 'No']['has_panic_attack'].value_counts())
has_depression
has_panic_attack
No 18
Yes 17
Name: count, dtype: int64
no_depression
has_panic_attack
No 49
Yes 16
Name: count, dtype: int64
sns.catplot(data=working_data, x="has_depression", hue='has_panic_attack', kind="count")
<seaborn.axisgrid.FacetGrid at 0x7ff1545903a0>
- Students with depression have an equal likelihood of experiencing and not experiencing panic attacks.
- Students without depression have a lower probability of having panic attacks.
- The majority of college students neither have depression nor panic attacks.
fig, axis = plt.subplots(1,3, figsize=(10, 6))
fig.suptitle("Anxiety, Depression, Panic Attacks, and Doctor Visits: A Comparative Analysis",fontweight="bold", size=12)
column_list = ['has_anxity', 'has_depression', 'has_panic_attack']
for i in range(len(column_list)):
x = working_data[working_data[column_list[i]] == 'Yes']['visited_specialist'].value_counts().index.tolist()
y = working_data[working_data[column_list[i]] == 'Yes']['visited_specialist'].value_counts().values.tolist()
axis[i].bar(x, y)
axis[i].set_xlabel(column_list[i])