Skip to content

Praanshu98/student_mental_health

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

6 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Student Mental Health Analysis - README

Overview

This Python project focuses on analyzing the mental health status of college students, particularly looking into anxiety, depression, and panic attacks. The aim is to understand the prevalence of these issues and their impact on students' academic performance and well-being.

Features

  • Data Loading and Cleaning: The dataset includes responses from college students on various aspects such as age, course, CGPA, and mental health status. The script includes cleaning operations like renaming columns, handling missing data, and correcting data formats.
  • Exploratory Data Analysis (EDA): The project performs a comprehensive EDA, including distribution of gender, age, course enrollment, and academic year. It utilizes visualizations to understand the patterns in mental health issues among the student population.
  • Mental Health Insights: Analysis focuses on the relationship between students' academic year, course of study, and their mental health status, providing insights into anxiety, depression, and panic attacks prevalence.

Technologies Used

  • Pandas for data manipulation
  • Matplotlib and Seaborn for visualization

How to Use

  1. Load the dataset using Pandas.
  2. Follow the cleaning steps to prepare the data.
  3. Run the EDA code blocks to generate insights and visualizations.

This project provides valuable insights into the mental health challenges faced by college students and can aid in the development of targeted support strategies.

Analysis

Importing Libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns

Data Loading

original_data = pd.read_csv("../data/Student Mental health.csv")
original_data.head()
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
Timestamp Choose your gender Age What is your course? Your current year of Study What is your CGPA? Marital status Do you have Depression? Do you have Anxiety? Do you have Panic attack? Did you seek any specialist for a treatment?
0 8/7/2020 12:02 Female 18.0 Engineering year 1 3.00 - 3.49 No Yes No Yes No
1 8/7/2020 12:04 Male 21.0 Islamic education year 2 3.00 - 3.49 No No Yes No No
2 8/7/2020 12:05 Male 19.0 BIT Year 1 3.00 - 3.49 No Yes Yes Yes No
3 8/7/2020 12:06 Female 22.0 Laws year 3 3.00 - 3.49 Yes Yes No No No
4 8/7/2020 12:13 Male 23.0 Mathemathics year 4 3.00 - 3.49 No No No No No

Data Cleaning

  • Renaming column head.
  • Handling Missing data
  • Handling Duplicates
original_data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 101 entries, 0 to 100
Data columns (total 11 columns):
 #   Column                                        Non-Null Count  Dtype
---  ------                                        --------------  -----
 0   Timestamp                                     101 non-null    object
 1   Choose your gender                            101 non-null    object
 2   Age                                           100 non-null    float64
 3   What is your course?                          101 non-null    object
 4   Your current year of Study                    101 non-null    object
 5   What is your CGPA?                            101 non-null    object
 6   Marital status                                101 non-null    object
 7   Do you have Depression?                       101 non-null    object
 8   Do you have Anxiety?                          101 non-null    object
 9   Do you have Panic attack?                     101 non-null    object
 10  Did you seek any specialist for a treatment?  101 non-null    object
dtypes: float64(1), object(10)
memory usage: 8.8+ KB
original_data.nunique()
Timestamp                                       92
Choose your gender                               2
Age                                              7
What is your course?                            49
Your current year of Study                       7
What is your CGPA?                               6
Marital status                                   2
Do you have Depression?                          2
Do you have Anxiety?                             2
Do you have Panic attack?                        2
Did you seek any specialist for a treatment?     2
dtype: int64
original_data['What is your course?'].unique()
array(['Engineering', 'Islamic education', 'BIT', 'Laws', 'Mathemathics',
       'Pendidikan islam', 'BCS', 'Human Resources', 'Irkhs',
       'Psychology', 'KENMS', 'Accounting ', 'ENM', 'Marine science',
       'KOE', 'Banking Studies', 'Business Administration', 'Law',
       'KIRKHS', 'Usuluddin ', 'TAASL', 'Engine', 'ALA',
       'Biomedical science', 'koe', 'Kirkhs', 'BENL', 'Benl', 'IT', 'CTS',
       'engin', 'Econs', 'MHSC', 'Malcom', 'Kop', 'Human Sciences ',
       'Biotechnology', 'Communication ', 'Diploma Nursing',
       'Pendidikan Islam ', 'Radiography', 'psychology', 'Fiqh fatwa ',
       'DIPLOMA TESL', 'Koe', 'Fiqh', 'Islamic Education', 'Nursing ',
       'Pendidikan Islam'], dtype=object)
original_data['What is your CGPA?'].unique()
array(['3.00 - 3.49', '3.50 - 4.00', '3.50 - 4.00 ', '2.50 - 2.99',
       '2.00 - 2.49', '0 - 1.99'], dtype=object)
original_data['Your current year of Study'].unique()
array(['year 1', 'year 2', 'Year 1', 'year 3', 'year 4', 'Year 2',
       'Year 3'], dtype=object)
original_data[original_data['Age'].isna()]
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
Timestamp Choose your gender Age What is your course? Your current year of Study What is your CGPA? Marital status Do you have Depression? Do you have Anxiety? Do you have Panic attack? Did you seek any specialist for a treatment?
43 8/7/2020 15:07 Male NaN BIT year 1 0 - 1.99 No No No No No

From above analysis we can deduce that

  1. We will rename all the column heading (shorten them).
  2. Timestamp column is not required in our analysis, so we will drop this column.
  3. Age column has one missing value, we will either fill it with mean of age column or remove that particular row.
  4. Course Name has formatting issues, this has to be handcorrectedled. ( Laws is same as law)
  5. Current year of study has formatting issues, this has to be corrected. ( year 1 is same as Year 1)
  6. Few rows in CGPA column has space in end, has to be corrected.
# Making a copy of original data so original data is kept intact.
working_data = original_data.copy()
# Dropping timestamp column

working_data.drop(['Timestamp'], axis=1, inplace=True)
working_data.head()
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
Choose your gender Age What is your course? Your current year of Study What is your CGPA? Marital status Do you have Depression? Do you have Anxiety? Do you have Panic attack? Did you seek any specialist for a treatment?
0 Female 18.0 Engineering year 1 3.00 - 3.49 No Yes No Yes No
1 Male 21.0 Islamic education year 2 3.00 - 3.49 No No Yes No No
2 Male 19.0 BIT Year 1 3.00 - 3.49 No Yes Yes Yes No
3 Female 22.0 Laws year 3 3.00 - 3.49 Yes Yes No No No
4 Male 23.0 Mathemathics year 4 3.00 - 3.49 No No No No No
# Renaming column headings

working_data.rename(columns={
                            'Choose your gender' : 'gender',
                            'Age' : 'age',
                            'What is your course?' : 'course',
                            'Your current year of Study' : 'current_year',
                            'What is your CGPA?' : 'cgpa',
                            'Marital status' : 'marital_status',
                            'Do you have Depression?' : 'has_depression',
                            'Do you have Anxiety?' : 'has_anxity',
                            'Do you have Panic attack?'  : 'has_panic_attack',
                            'Did you seek any specialist for a treatment?' : 'visited_specialist'}, inplace=True)
working_data.head()
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
gender age course current_year cgpa marital_status has_depression has_anxity has_panic_attack visited_specialist
0 Female 18.0 Engineering year 1 3.00 - 3.49 No Yes No Yes No
1 Male 21.0 Islamic education year 2 3.00 - 3.49 No No Yes No No
2 Male 19.0 BIT Year 1 3.00 - 3.49 No Yes Yes Yes No
3 Female 22.0 Laws year 3 3.00 - 3.49 Yes Yes No No No
4 Male 23.0 Mathemathics year 4 3.00 - 3.49 No No No No No
# dropping row where value of age is null

working_data.dropna(inplace=True)
working_data.isna().sum()
gender                0
age                   0
course                0
current_year          0
cgpa                  0
marital_status        0
has_depression        0
has_anxity            0
has_panic_attack      0
visited_specialist    0
dtype: int64
# Correcting format of course names

working_data.replace({
                        'Engine':'Engineering',
                        'engin' : 'Engineering',
                        'Islamic education' : 'Islamic Education',
                        'BIT': 'IT',
                        'Laws' : 'Law',
                        'Pendidikan islam': 'Pendidikan Islam',
                        'Pendidikan Islam ': 'Pendidikan Islam',
                        'KIRKHS':'Irkhs',
                        'Kirkhs':'Irkhs',
                        'Marine science' : 'Marine Science',
                        'koe': 'Koe',
                        'KOE': 'Koe',
                        'Biomedical science' : 'Biomedical Science',
                        'Benl':'BENL',
                        'Econs' : 'Economics',
                        'Human Sciences ':'Human Sciences',
                        'psychology' : 'Psychology',
                        'Fiqh fatwa ' : 'Fiqh Fatwa',
                        'DIPLOMA TESL': 'Diploma TESL',
                        'Fiqh' : 'Fiqh Fatwa',
                        'Accounting ':'Accounting',
                        'Communication ':'Communication',
                        'Nursing ':'Nursing'}, inplace=True)
working_data['course'].unique()
array(['Engineering', 'Islamic Education', 'IT', 'Law', 'Mathemathics',
       'Pendidikan Islam', 'BCS', 'Human Resources', 'Irkhs',
       'Psychology', 'KENMS', 'Accounting', 'ENM', 'Marine Science',
       'Koe', 'Banking Studies', 'Business Administration', 'Usuluddin ',
       'TAASL', 'ALA', 'Biomedical Science', 'BENL', 'CTS', 'Economics',
       'MHSC', 'Malcom', 'Kop', 'Human Sciences', 'Biotechnology',
       'Communication', 'Diploma Nursing', 'Radiography', 'Fiqh Fatwa',
       'Diploma TESL', 'Nursing'], dtype=object)
# Correcting format of current year

working_data.replace({'year 1' : 'Year 1',
                    'year 2' : 'Year 2',
                    'year 3' : 'Year 3',
                    'year 4' : 'Year 4'}, inplace=True)
working_data['current_year'].unique()
array(['Year 1', 'Year 2', 'Year 3', 'Year 4'], dtype=object)
# Removing trailing white spaces from cgpa column

working_data['cgpa'] = working_data['cgpa'].apply(lambda x:x.strip())
working_data['cgpa'].unique()
array(['3.00 - 3.49', '3.50 - 4.00', '2.50 - 2.99', '2.00 - 2.49',
       '0 - 1.99'], dtype=object)

Exploratory Data Analysis

plt.figure(figsize=(14, 5))
plt.title('Gender Distribution')
plt.pie(working_data['gender'].value_counts(),labels=working_data['gender'].value_counts().index, autopct='%1.1f%%', explode=(0.025,0.025), labeldistance=0.5)
plt.show()

png

Analysis from above chart

  1. The survey exhibits a higher participation rate among women, which may introduce a potential gender bias in the resulting analysis.
plt.figure(figsize=(4,4))
plt.title('Age Distribution')
plt.bar(x=working_data['age'].value_counts().index, height=working_data['age'].value_counts().values)
plt.show()

png

plt.figure(figsize=(4,4))
plt.title('Age Distribution')
plt.bar(x=working_data['current_year'].value_counts().index.tolist(), height=working_data['current_year'].value_counts().values.tolist())
plt.show()

png

Analysis from above chart

  1. The survey predominantly includes college freshmen, who are primarily aged 18 and 19, which may introduce a potential bias towards the experiences and perspectives of this specific group.
fig, axis = plt.subplots(2,2, figsize=(12,12))
fig.suptitle("Course Distribution across Student Cohorts",fontweight="bold", size=12)

current_year_list = sorted(working_data['current_year'].unique().tolist())

counter = 0

for i in range(2):
    for j in range(2):
        x = working_data[working_data['current_year'] == current_year_list[i]].groupby('course')['gender'].count().sort_values(ascending=False).head().index.to_list()
        y = working_data[working_data['current_year'] == current_year_list[i]].groupby('course')['gender'].count().sort_values(ascending=False).head().values.tolist()
        axis[i,j].bar(x, y)
        axis[i,j].set_title(current_year_list[counter])
        axis[i,j].set_xticklabels(x, rotation=20)

        counter += 1
/var/folders/n4/669_syrx0qzf2gktz416h1d00000gn/T/ipykernel_12231/3018482063.py:14: UserWarning: FixedFormatter should only be used together with FixedLocator
  axis[i,j].set_xticklabels(x, rotation=20)
/var/folders/n4/669_syrx0qzf2gktz416h1d00000gn/T/ipykernel_12231/3018482063.py:14: UserWarning: FixedFormatter should only be used together with FixedLocator
  axis[i,j].set_xticklabels(x, rotation=20)
/var/folders/n4/669_syrx0qzf2gktz416h1d00000gn/T/ipykernel_12231/3018482063.py:14: UserWarning: FixedFormatter should only be used together with FixedLocator
  axis[i,j].set_xticklabels(x, rotation=20)
/var/folders/n4/669_syrx0qzf2gktz416h1d00000gn/T/ipykernel_12231/3018482063.py:14: UserWarning: FixedFormatter should only be used together with FixedLocator
  axis[i,j].set_xticklabels(x, rotation=20)

png

working_data[working_data['current_year'] == current_year_list[i]].groupby('course')['gender'].count().sort_values(ascending=False).head().index.to_list()
['Engineering', 'BCS', 'Koe', 'Pendidikan Islam', 'Biomedical Science']
x = working_data.groupby('course')['gender'].count().sort_values(ascending=False).head().index.to_list()
y = working_data.groupby('course')['gender'].count().sort_values(ascending=False).head().values.tolist()
plt.title("Course Distribution across Student Cohorts")
plt.bar(x, y)
plt.show()

png

Analysis from above chart

  1. The survey primarily focuses on the participation of students studying Engineering and Computer Science disciplines, which may result in a potential bias towards the perspectives and experiences of individuals within these fields.
sns.catplot(data=working_data, x="current_year", hue='has_anxity', kind="count")\
    .set(title="Mental Health and Academic Years in College: Anxiety", \
        xlabel='Current Year',\
        ylabel='Count of Students')
<seaborn.axisgrid.FacetGrid at 0x7ff15070e6e0>

png

Analysis from above chart

  1. Analyzing Anxiety Levels among College Students: A Majority with No Reported Symptoms.
sns.catplot(data=working_data, x="current_year", hue='has_depression', kind="count")\
    .set(title="Mental Health and Academic Years in College: Depression", \
        xlabel='Current Year',\
        ylabel='Count of Students')
<seaborn.axisgrid.FacetGrid at 0x7ff14eea2890>

png

Analysis from above chart

  1. Examining Depression Rates among College Students: Majority Reporting No Symptoms
sns.catplot(data=working_data, x="current_year", hue='has_panic_attack', kind="count")\
    .set(title="Mental Health and Academic Years in College: Panic Attack", \
        xlabel='Current Year',\
        ylabel='Count of Students')
<seaborn.axisgrid.FacetGrid at 0x7ff1506c4f40>

png

Analysis from above chart

  1. Exploring the Prevalence of Panic Attacks among College Students: Majority Reporting No Incidents
sns.catplot(data=working_data, x="age", hue='has_panic_attack', kind="count")\
    .set(title="The Impact of Age on Panic Attack", \
        xlabel='Age',\
        ylabel='Count of Students')
<seaborn.axisgrid.FacetGrid at 0x7ff155495450>

png

Analysis from above chart

  1. Exploring the Prevalence of Panic Attacks among College Students: Majority Reporting No Incidents
sns.catplot(data=working_data, x="age", hue='has_anxity', kind="count")\
    .set(title="The Impact of Age on Anxity", \
        xlabel='Age',\
        ylabel='Count of Students')
<seaborn.axisgrid.FacetGrid at 0x7ff15266e530>

png

Analysis from above chart

  1. Examining Anxiety Levels among College Students: Majority Report No Symptoms
sns.catplot(data=working_data, x="age", hue='has_depression', kind="count")\
    .set(title="The Impact of Age on Anxity", \
        xlabel='Age',\
        ylabel='Count of Students')
<seaborn.axisgrid.FacetGrid at 0x7ff154592c20>

png

Analysis from above chart

  1. Depression Prevalence among College Students: Majority Report No Symptoms
print('has_depression')
print(working_data[working_data['has_depression'] == 'Yes']['has_anxity'].value_counts())
print('\nno_depression')
print(working_data[working_data['has_depression'] == 'No']['has_anxity'].value_counts())
has_depression
has_anxity
Yes    18
No     17
Name: count, dtype: int64

no_depression
has_anxity
No     49
Yes    16
Name: count, dtype: int64
sns.catplot(data=working_data, x="has_depression", hue='has_anxity', kind="count")
<seaborn.axisgrid.FacetGrid at 0x7ff154575000>

png

Analysis from above chart

  1. Students with depression have an equal likelihood of experiencing and not experiencing anxiety.
  2. Students without depression have a lower probability of having anxiety.
  3. The majority of college students neither have anxiety nor depression. These points summarize the relationship between anxiety, depression, and college students, highlighting the equal chance of anxiety with depression, a lower chance without depression, and the majority being unaffected by either condition.
sns.catplot(data=working_data, x="has_panic_attack", hue='has_anxity', kind="count")
<seaborn.axisgrid.FacetGrid at 0x7ff15266f130>

png

print('has_panic_attack')
print(working_data[working_data['has_panic_attack'] == 'Yes']['has_anxity'].value_counts())
print('\nno_panic_attack')
print(working_data[working_data['has_panic_attack'] == 'No']['has_anxity'].value_counts())
has_panic_attack
has_anxity
No     20
Yes    13
Name: count, dtype: int64

no_panic_attack
has_anxity
No     46
Yes    21
Name: count, dtype: int64

Analysis from above chart

  1. Students who experience panic attacks have a slightly higher than 50% chance of also having anxiety.
  2. Students who do not experience panic attacks have a less than 50% chance of having anxiety.
  3. The majority of college students neither have anxiety nor panic attacks.
print('has_depression')
print(working_data[working_data['has_depression'] == 'Yes']['has_panic_attack'].value_counts())
print('\nno_depression')
print(working_data[working_data['has_depression'] == 'No']['has_panic_attack'].value_counts())
has_depression
has_panic_attack
No     18
Yes    17
Name: count, dtype: int64

no_depression
has_panic_attack
No     49
Yes    16
Name: count, dtype: int64
sns.catplot(data=working_data, x="has_depression", hue='has_panic_attack', kind="count")
<seaborn.axisgrid.FacetGrid at 0x7ff1545903a0>

png

Analysis from above chart

  1. Students with depression have an equal likelihood of experiencing and not experiencing panic attacks.
  2. Students without depression have a lower probability of having panic attacks.
  3. The majority of college students neither have depression nor panic attacks.
fig, axis = plt.subplots(1,3, figsize=(10, 6))
fig.suptitle("Anxiety, Depression, Panic Attacks, and Doctor Visits: A Comparative Analysis",fontweight="bold", size=12)

column_list = ['has_anxity', 'has_depression', 'has_panic_attack']

for i in range(len(column_list)):
    x = working_data[working_data[column_list[i]] == 'Yes']['visited_specialist'].value_counts().index.tolist()
    y = working_data[working_data[column_list[i]] == 'Yes']['visited_specialist'].value_counts().values.tolist()
    axis[i].bar(x, y)
    axis[i].set_xlabel(column_list[i])

png

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published