This Python project focuses on analyzing the mental health status of college students, particularly looking into anxiety, depression, and panic attacks. The aim is to understand the prevalence of these issues and their impact on students' academic performance and well-being.
- Data Loading and Cleaning: The dataset includes responses from college students on various aspects such as age, course, CGPA, and mental health status. The script includes cleaning operations like renaming columns, handling missing data, and correcting data formats.
- Exploratory Data Analysis (EDA): The project performs a comprehensive EDA, including distribution of gender, age, course enrollment, and academic year. It utilizes visualizations to understand the patterns in mental health issues among the student population.
- Mental Health Insights: Analysis focuses on the relationship between students' academic year, course of study, and their mental health status, providing insights into anxiety, depression, and panic attacks prevalence.
- Pandas for data manipulation
- Matplotlib and Seaborn for visualization
- Load the dataset using Pandas.
- Follow the cleaning steps to prepare the data.
- Run the EDA code blocks to generate insights and visualizations.
This project provides valuable insights into the mental health challenges faced by college students and can aid in the development of targeted support strategies.
Importing Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
original_data = pd.read_csv("../data/Student Mental health.csv")
original_data.head()
 | Timestamp | Choose your gender | Age | What is your course? | Your current year of Study | What is your CGPA? | Marital status | Do you have Depression? | Do you have Anxiety? | Do you have Panic attack? | Did you seek any specialist for a treatment? |
0 | 8/7/2020 12:02 | Female | 18.0 | Engineering | year 1 | 3.00 - 3.49 | No | Yes | No | Yes | No |
1 | 8/7/2020 12:04 | Male | 21.0 | Islamic education | year 2 | 3.00 - 3.49 | No | No | Yes | No | No |
2 | 8/7/2020 12:05 | Male | 19.0 | BIT | Year 1 | 3.00 - 3.49 | No | Yes | Yes | Yes | No |
3 | 8/7/2020 12:06 | Female | 22.0 | Laws | year 3 | 3.00 - 3.49 | Yes | Yes | No | No | No |
4 | 8/7/2020 12:13 | Male | 23.0 | Mathemathics | year 4 | 3.00 - 3.49 | No | No | No | No | No |
- Renaming column headings
- Handling missing data
- Handling duplicates
original_data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 101 entries, 0 to 100
Data columns (total 11 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Timestamp 101 non-null object
1 Choose your gender 101 non-null object
2 Age 100 non-null float64
3 What is your course? 101 non-null object
4 Your current year of Study 101 non-null object
5 What is your CGPA? 101 non-null object
6 Marital status 101 non-null object
7 Do you have Depression? 101 non-null object
8 Do you have Anxiety? 101 non-null object
9 Do you have Panic attack? 101 non-null object
10 Did you seek any specialist for a treatment? 101 non-null object
dtypes: float64(1), object(10)
memory usage: 8.8+ KB
original_data.nunique()
Timestamp 92
Choose your gender 2
Age 7
What is your course? 49
Your current year of Study 7
What is your CGPA? 6
Marital status 2
Do you have Depression? 2
Do you have Anxiety? 2
Do you have Panic attack? 2
Did you seek any specialist for a treatment? 2
dtype: int64
original_data['What is your course?'].unique()
array(['Engineering', 'Islamic education', 'BIT', 'Laws', 'Mathemathics',
'Pendidikan islam', 'BCS', 'Human Resources', 'Irkhs',
'Psychology', 'KENMS', 'Accounting ', 'ENM', 'Marine science',
'KOE', 'Banking Studies', 'Business Administration', 'Law',
'KIRKHS', 'Usuluddin ', 'TAASL', 'Engine', 'ALA',
'Biomedical science', 'koe', 'Kirkhs', 'BENL', 'Benl', 'IT', 'CTS',
'engin', 'Econs', 'MHSC', 'Malcom', 'Kop', 'Human Sciences ',
'Biotechnology', 'Communication ', 'Diploma Nursing',
'Pendidikan Islam ', 'Radiography', 'psychology', 'Fiqh fatwa ',
'DIPLOMA TESL', 'Koe', 'Fiqh', 'Islamic Education', 'Nursing ',
'Pendidikan Islam'], dtype=object)
original_data['What is your CGPA?'].unique()
array(['3.00 - 3.49', '3.50 - 4.00', '3.50 - 4.00 ', '2.50 - 2.99',
'2.00 - 2.49', '0 - 1.99'], dtype=object)
original_data['Your current year of Study'].unique()
array(['year 1', 'year 2', 'Year 1', 'year 3', 'year 4', 'Year 2',
'Year 3'], dtype=object)
original_data[original_data['Age'].isnull()]
 | Timestamp | Choose your gender | Age | What is your course? | Your current year of Study | What is your CGPA? | Marital status | Do you have Depression? | Do you have Anxiety? | Do you have Panic attack? | Did you seek any specialist for a treatment? |
43 | 8/7/2020 15:07 | Male | NaN | BIT | year 1 | 0 - 1.99 | No | No | No | No | No |
- All column headings will be renamed to shorter names.
- The Timestamp column is not needed for the analysis, so it will be dropped.
- The Age column has one missing value; we will either fill it with the mean of the column or drop that row.
- Course names have formatting issues that have to be corrected (for example, "Laws" is the same as "Law").
- Current year of study has formatting issues that have to be corrected ("year 1" is the same as "Year 1").
- A few values in the CGPA column have a trailing space, which has to be removed.
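The two options mentioned for the missing age value can be compared side by side. The sketch below uses a small made-up frame (the `toy` data and its values are illustrative assumptions, not rows from the survey) to show what dropping the row and filling with the column mean each produce.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the survey data (illustrative values only)
toy = pd.DataFrame({
    "age": [18.0, 21.0, np.nan, 19.0],
    "course": ["Engineering", "BIT", "BIT", "Laws"],
})

# Option 1: drop every row whose age is missing
dropped = toy.dropna(subset=["age"])

# Option 2: fill the gap with the mean of the non-null ages
filled = toy.assign(age=toy["age"].fillna(toy["age"].mean()))

print(len(dropped))            # rows remaining after the drop
print(filled["age"].tolist())  # ages with the gap filled
```

With a single missing value out of about a hundred rows, either choice changes the results very little; dropping avoids inventing an age, while filling keeps the respondent's other answers.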
# Making a copy of original data so original data is kept intact.
working_data = original_data.copy()
# Dropping timestamp column
working_data.drop(['Timestamp'], axis=1, inplace=True)
working_data.head()
 | Choose your gender | Age | What is your course? | Your current year of Study | What is your CGPA? | Marital status | Do you have Depression? | Do you have Anxiety? | Do you have Panic attack? | Did you seek any specialist for a treatment? |
0 | Female | 18.0 | Engineering | year 1 | 3.00 - 3.49 | No | Yes | No | Yes | No |
1 | Male | 21.0 | Islamic education | year 2 | 3.00 - 3.49 | No | No | Yes | No | No |
2 | Male | 19.0 | BIT | Year 1 | 3.00 - 3.49 | No | Yes | Yes | Yes | No |
3 | Female | 22.0 | Laws | year 3 | 3.00 - 3.49 | Yes | Yes | No | No | No |
4 | Male | 23.0 | Mathemathics | year 4 | 3.00 - 3.49 | No | No | No | No | No |
# Renaming column headings
# Note: the 'has_anxity' spelling is the name used by all later cells.
working_data.rename(columns={
    'Choose your gender' : 'gender',
    'Age' : 'age',
    'What is your course?' : 'course',
    'Your current year of Study' : 'current_year',
    'What is your CGPA?' : 'cgpa',
    'Marital status' : 'marital_status',
    'Do you have Depression?' : 'has_depression',
    'Do you have Anxiety?' : 'has_anxity',
    'Do you have Panic attack?' : 'has_panic_attack',
    'Did you seek any specialist for a treatment?' : 'visited_specialist'}, inplace=True)
working_data.head()
 | gender | age | course | current_year | cgpa | marital_status | has_depression | has_anxity | has_panic_attack | visited_specialist |
0 | Female | 18.0 | Engineering | year 1 | 3.00 - 3.49 | No | Yes | No | Yes | No |
1 | Male | 21.0 | Islamic education | year 2 | 3.00 - 3.49 | No | No | Yes | No | No |
2 | Male | 19.0 | BIT | Year 1 | 3.00 - 3.49 | No | Yes | Yes | Yes | No |
3 | Female | 22.0 | Laws | year 3 | 3.00 - 3.49 | Yes | Yes | No | No | No |
4 | Male | 23.0 | Mathemathics | year 4 | 3.00 - 3.49 | No | No | No | No | No |
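# Dropping the row where the value of age is null
working_data.dropna(subset=['age'], inplace=True)
# Confirming that no null values remain
working_data.isnull().sum()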
gender 0
age 0
course 0
current_year 0
cgpa 0
marital_status 0
has_depression 0
has_anxity 0
has_panic_attack 0
visited_specialist 0
dtype: int64
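The cleaning checklist also lists duplicate handling, which the cells shown here do not demonstrate. A minimal sketch of how it could be done with `DataFrame.duplicated` and `DataFrame.drop_duplicates` follows; the `toy` frame and its values are made up for illustration.

```python
import pandas as pd

# Toy frame with one fully repeated row (illustrative values only)
toy = pd.DataFrame({
    "Timestamp": ["8/7/2020 12:02", "8/7/2020 12:04", "8/7/2020 12:04"],
    "gender": ["Female", "Male", "Male"],
    "age": [18.0, 21.0, 21.0],
})

# Count rows that repeat an earlier row in every column
n_duplicates = int(toy.duplicated().sum())

# Keep only the first occurrence of each repeated row
deduplicated = toy.drop_duplicates().reset_index(drop=True)

print(n_duplicates)
print(len(deduplicated))
```

One caveat: once the Timestamp column is dropped, two different students who gave identical answers would look like duplicates, so it is probably safer to run this check on the original data that still has the timestamps.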
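# Correcting format of course names
working_data.replace({
    # The opening of this mapping was lost; the first five entries are
    # inferred from the cleaned values listed in the output below.
    'Engine' : 'Engineering',
    'KIRKHS' : 'Irkhs',
    'Kirkhs' : 'Irkhs',
    'Benl' : 'BENL',
    'DIPLOMA TESL' : 'Diploma TESL',
    'engin' : 'Engineering',
    'Islamic education' : 'Islamic Education',
    'BIT': 'IT',
    'Laws' : 'Law',
    'Pendidikan islam': 'Pendidikan Islam',
    'Pendidikan Islam ': 'Pendidikan Islam',
    'Marine science' : 'Marine Science',
    'koe': 'Koe',
    'KOE': 'Koe',
    'Biomedical science' : 'Biomedical Science',
    'Econs' : 'Economics',
    'Human Sciences ':'Human Sciences',
    'psychology' : 'Psychology',
    'Fiqh fatwa ' : 'Fiqh Fatwa',
    'Fiqh' : 'Fiqh Fatwa',
    'Accounting ':'Accounting',
    'Communication ':'Communication',
    'Nursing ':'Nursing'}, inplace=True)
working_data['course'].unique()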
array(['Engineering', 'Islamic Education', 'IT', 'Law', 'Mathemathics',
'Pendidikan Islam', 'BCS', 'Human Resources', 'Irkhs',
'Psychology', 'KENMS', 'Accounting', 'ENM', 'Marine Science',
'Koe', 'Banking Studies', 'Business Administration', 'Usuluddin ',
'TAASL', 'ALA', 'Biomedical Science', 'BENL', 'CTS', 'Economics',
'MHSC', 'Malcom', 'Kop', 'Human Sciences', 'Biotechnology',
'Communication', 'Diploma Nursing', 'Radiography', 'Fiqh Fatwa',
'Diploma TESL', 'Nursing'], dtype=object)
# Correcting format of current year
working_data.replace({'year 1' : 'Year 1',
'year 2' : 'Year 2',
'year 3' : 'Year 3',
'year 4' : 'Year 4'}, inplace=True)
working_data['current_year'].unique()
array(['Year 1', 'Year 2', 'Year 3', 'Year 4'], dtype=object)
# Removing trailing white spaces from cgpa column
working_data['cgpa'] = working_data['cgpa'].str.strip()
working_data['cgpa'].unique()
array(['3.00 - 3.49', '3.50 - 4.00', '2.50 - 2.99', '2.00 - 2.49',
'0 - 1.99'], dtype=object)
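Much of the formatting cleanup above comes down to stray whitespace and inconsistent capitalization, which the pandas `.str` accessor can handle without a hand-written mapping. The sketch below is one possible approach on a made-up `toy` frame (an assumption for illustration); it is not the method used in the cells above.

```python
import pandas as pd

# Toy frame reproducing the kinds of inconsistencies seen in the survey
toy = pd.DataFrame({
    "current_year": ["year 1", "Year 1", "year 3"],
    "cgpa": ["3.50 - 4.00 ", "3.50 - 4.00", "0 - 1.99"],
    "course": ["Accounting ", "psychology", "koe"],
})

# Strip leading and trailing whitespace from the text columns
toy["cgpa"] = toy["cgpa"].str.strip()
toy["course"] = toy["course"].str.strip()

# Normalize 'year 1' / 'Year 1' to a single spelling
toy["current_year"] = toy["current_year"].str.strip().str.capitalize()

print(sorted(toy["current_year"].unique()))
print(toy["cgpa"].nunique())
```

String methods only fix spacing and case. Synonyms such as "Laws" and "Law", or abbreviations such as "engin", still need an explicit mapping like the one used for the course column.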
plt.figure(figsize=(14, 5))
plt.title('Gender Distribution')
plt.pie(working_data['gender'].value_counts(),labels=working_data['gender'].value_counts().index, autopct='%1.1f%%', explode=(0.025,0.025), labeldistance=0.5)
- Women make up the larger share of respondents, which may bias the analysis toward their experiences.
plt.title('Age Distribution')
plt.bar(x=working_data['age'].value_counts().index, height=working_data['age'].value_counts().values)
plt.show()
plt.title('Current Year Distribution')
plt.bar(x=working_data['current_year'].value_counts().index.tolist(), height=working_data['current_year'].value_counts().values.tolist())
plt.show()
- Most respondents are first-year students aged 18 or 19, which may bias the results toward the experiences of this group.
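Remarks about sample composition are easier to judge when the shares are stated as numbers. A minimal sketch using `value_counts(normalize=True)` is shown below; the `toy` frame is a made-up assumption, so the printed shares are not the survey's actual figures.

```python
import pandas as pd

# Toy frame standing in for the cleaned survey data (illustrative values only)
toy = pd.DataFrame({
    "gender": ["Female", "Female", "Female", "Male"],
    "current_year": ["Year 1", "Year 1", "Year 2", "Year 3"],
})

# Share of respondents in each category, as fractions that sum to 1
gender_share = toy["gender"].value_counts(normalize=True)
year_share = toy["current_year"].value_counts(normalize=True).sort_index()

print(gender_share)
print(year_share)
```

Running the same two calls on `working_data` would give the actual proportions behind the pie and bar charts.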
fig, axis = plt.subplots(2, 2, figsize=(12, 12))
fig.suptitle("Course Distribution across Student Cohorts", fontweight="bold", size=12)
current_year_list = sorted(working_data['current_year'].unique().tolist())
counter = 0
for i in range(2):
    for j in range(2):
        # Top five courses for the cohort shown in this panel
        # (indexing by counter, not i, so each panel gets a different year)
        top_courses = (working_data[working_data['current_year'] == current_year_list[counter]]
                       .groupby('course')['gender'].count()
                       .sort_values(ascending=False).head())
        x = top_courses.index.to_list()
        y = top_courses.values.tolist()
        axis[i, j].bar(x, y)
        axis[i, j].set_title(current_year_list[counter])
        axis[i, j].tick_params(axis='x', labelrotation=20)
        counter += 1
working_data[working_data['current_year'] == current_year_list[i]].groupby('course')['gender'].count().sort_values(ascending=False).head().index.to_list()
['Engineering', 'BCS', 'Koe', 'Pendidikan Islam', 'Biomedical Science']
# Top five courses across all cohorts
x = working_data.groupby('course')['gender'].count().sort_values(ascending=False).head().index.to_list()
y = working_data.groupby('course')['gender'].count().sort_values(ascending=False).head().values.tolist()
plt.title("Course Distribution across Student Cohorts")
plt.bar(x, y)
- Respondents are concentrated in Engineering and Computer Science courses, which may bias the results toward the experiences of students in these fields.
sns.catplot(data=working_data, x="current_year", hue='has_anxity', kind="count")\
.set(title="Mental Health and Academic Years in College: Anxiety", \
xlabel='Current Year',\
ylabel='Count of Students')
- Anxiety by academic year: most students report no symptoms.
sns.catplot(data=working_data, x="current_year", hue='has_depression', kind="count")\
.set(title="Mental Health and Academic Years in College: Depression", \
xlabel='Current Year',\
ylabel='Count of Students')
- Depression by academic year: most students report no symptoms.
sns.catplot(data=working_data, x="current_year", hue='has_panic_attack', kind="count")\
.set(title="Mental Health and Academic Years in College: Panic Attack", \
xlabel='Current Year',\
ylabel='Count of Students')
- Panic attacks by academic year: most students report no incidents.
sns.catplot(data=working_data, x="age", hue='has_panic_attack', kind="count")\
.set(title="The Impact of Age on Panic Attack", \
ylabel='Count of Students')
- Panic attacks by age: most students report no incidents.
sns.catplot(data=working_data, x="age", hue='has_anxity', kind="count")\
.set(title="The Impact of Age on Anxiety", \
ylabel='Count of Students')
- Anxiety by age: most students report no symptoms.
sns.catplot(data=working_data, x="age", hue='has_depression', kind="count")\
.set(title="The Impact of Age on Depression", \
ylabel='Count of Students')
- Depression by age: most students report no symptoms.
print(working_data[working_data['has_depression'] == 'Yes']['has_anxity'].value_counts())
print(working_data[working_data['has_depression'] == 'No']['has_anxity'].value_counts())
Yes 18
No 17
Name: count, dtype: int64
No 49
Yes 16
Name: count, dtype: int64
sns.catplot(data=working_data, x="has_depression", hue='has_anxity', kind="count")
- Students with depression are about as likely to report anxiety as not (18 of 35, roughly 51%).
- Students without depression report anxiety less often (16 of 65, roughly 25%).
- The largest single group, 49 of 100 respondents, reports neither anxiety nor depression; this is just under half of the sample rather than a majority.
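The conditional rates behind these points can be read directly from a normalized cross-tabulation. The sketch below rebuilds a frame from the four counts printed above (18, 17, 16, 49) and uses `pd.crosstab` with `normalize="index"`; the `toy` frame is a reconstruction for illustration, and on the real data the same call would be applied to `working_data`.

```python
import pandas as pd

# Rebuild one row per respondent from the counts printed above:
# (has_depression, has_anxity) -> number of students
counts = {("Yes", "Yes"): 18, ("Yes", "No"): 17,
          ("No", "Yes"): 16, ("No", "No"): 49}
rows = [
    {"has_depression": dep, "has_anxity": anx}
    for (dep, anx), n in counts.items()
    for _ in range(n)
]
toy = pd.DataFrame(rows)

# Raw counts, then row-wise proportions (each row sums to 1)
table = pd.crosstab(toy["has_depression"], toy["has_anxity"])
rates = pd.crosstab(toy["has_depression"], toy["has_anxity"], normalize="index")

print(table)
print(rates.round(3))
```

Each row of `rates` answers "given this depression status, what fraction reports anxiety?", which is the comparison the bullet points describe.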
sns.catplot(data=working_data, x="has_panic_attack", hue='has_anxity', kind="count")
print(working_data[working_data['has_panic_attack'] == 'Yes']['has_anxity'].value_counts())
print(working_data[working_data['has_panic_attack'] == 'No']['has_anxity'].value_counts())
No 20
Yes 13
Name: count, dtype: int64
No 46
Yes 21
Name: count, dtype: int64
- Students who experience panic attacks report anxiety in 13 of 33 cases (roughly 39%), which is below 50%.
- Students who do not experience panic attacks report anxiety in 21 of 67 cases (roughly 31%).
- The largest single group, 46 of 100 respondents, reports neither anxiety nor panic attacks; this is just under half of the sample rather than a majority.
print(working_data[working_data['has_depression'] == 'Yes']['has_panic_attack'].value_counts())
print(working_data[working_data['has_depression'] == 'No']['has_panic_attack'].value_counts())
No 18
Yes 17
Name: count, dtype: int64
No 49
Yes 16
Name: count, dtype: int64
sns.catplot(data=working_data, x="has_depression", hue='has_panic_attack', kind="count")
- Students with depression are about as likely to report panic attacks as not (17 of 35, roughly 49%).
- Students without depression report panic attacks less often (16 of 65, roughly 25%).
- The largest single group, 49 of 100 respondents, reports neither depression nor panic attacks; this is just under half of the sample rather than a majority.
fig, axis = plt.subplots(1, 3, figsize=(10, 6))
fig.suptitle("Anxiety, Depression, Panic Attacks, and Doctor Visits: A Comparative Analysis", fontweight="bold", size=12)
column_list = ['has_anxity', 'has_depression', 'has_panic_attack']
for i in range(len(column_list)):
    # Specialist visits among students who answered 'Yes' for this condition
    visits = working_data[working_data[column_list[i]] == 'Yes']['visited_specialist'].value_counts()
    x = visits.index.tolist()
    y = visits.values.tolist()
    axis[i].bar(x, y)
    axis[i].set_title(column_list[i])
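The bar charts show raw counts, so a single number per condition can make the comparison easier: the share of affected students who saw a specialist. The sketch below computes it on a made-up `toy` frame (an assumption for illustration, so the printed shares are not survey results), reusing the column names from the cells above.

```python
import pandas as pd

# Toy frame with the same column names as the cleaned data (illustrative values)
toy = pd.DataFrame({
    "has_anxity": ["Yes", "Yes", "No", "Yes"],
    "has_depression": ["Yes", "No", "No", "Yes"],
    "has_panic_attack": ["No", "Yes", "No", "Yes"],
    "visited_specialist": ["Yes", "No", "No", "No"],
})

column_list = ["has_anxity", "has_depression", "has_panic_attack"]

# For each condition, the fraction of 'Yes' respondents who visited a specialist
share = {
    col: (toy.loc[toy[col] == "Yes", "visited_specialist"] == "Yes").mean()
    for col in column_list
}

for col, value in share.items():
    print(f"{col}: {value:.2f}")
```

Comparing booleans with `.mean()` gives the proportion of `True` values, which avoids counting and dividing by hand.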