Univariate analysis
It is analysis done on a single column (variable) at a time, to understand the information contained in that column.
For categorical data
1. Countplot / bar graph
It is simply a bar graph of how many times each category occurs.
For example, to find out how many Titanic passengers were male and how many were female, or how many survived and how many died:
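import seaborn as sns

# df is the Titanic DataFrame (e.g. loaded with pandas.read_csv)
sns.countplot(x='Sex', data=df)  # passengers of each sex
df['Survived'].value_counts()  # 0 = died, 1 = survived
df['Embarked'].value_counts().plot(kind='bar')  # passengers per port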
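Under the hood, a countplot or a `value_counts()` bar graph only needs the number of rows per category. A minimal sketch of that counting step in plain Python, assuming a small hypothetical list of `Embarked` values (toy data, not the real Titanic column):

```python
from collections import Counter

# Hypothetical toy sample of the 'Embarked' column (S, C, Q are port codes)
embarked = ["S", "C", "S", "Q", "S", "C", "S"]

# Count rows per category, most frequent first, like value_counts()
counts = Counter(embarked).most_common()
print(counts)  # [('S', 4), ('C', 2), ('Q', 1)]

# Each (category, count) pair becomes one bar of the bar graph
for category, count in counts:
    print(f"{category}: {'#' * count}")
```

The bar heights are just these counts, so the plot and `value_counts()` always agree.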
2. Pie chart
To see the same counts as a pie chart:
df['Embarked'].value_counts().plot(kind='pie')
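A pie chart shows each category's share of the whole rather than its raw count. A small sketch of the proportions behind the slices, using a hypothetical toy list (in pandas the same numbers come from `value_counts(normalize=True)`):

```python
from collections import Counter

# Hypothetical toy sample of the 'Embarked' column
embarked = ["S", "C", "S", "Q", "S", "C", "S", "S"]

counts = Counter(embarked)
total = sum(counts.values())

# A slice's share of the pie is its count divided by the total
shares = {category: count / total for category, count in counts.items()}
for category, share in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{category}: {share:.1%}")
```

The shares always add up to 100%, which is why a pie chart suits a column with only a few categories.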
For numerical data
1. Histogram
It shows how the values in the column are spread out: how many values fall into each range (bin).
For age:
import matplotlib.pyplot as plt
plt.hist(df['Age'],bins=5)
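You can also leave out bins (matplotlib then uses 10 bins by default); bins only sets how many intervals the data is grouped into, to make the plot easier to read.
From the resulting histogram you can conclude that there were few infants and young children on the Titanic, the oldest passengers were also few in number, and middle-aged passengers were the most numerous.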
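To see what the bins parameter does, here is a minimal sketch of equal-width binning over the data range, similar in spirit to what `plt.hist` draws. The helper `histogram` and the toy ages are hypothetical, not part of matplotlib or the real dataset:

```python
def histogram(values, bins=10):
    """Count values in `bins` equal-width intervals from min to max.

    Returns (counts, edges); the last bin also includes its right edge.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    edges = [lo + i * width for i in range(bins + 1)]
    counts = [0] * bins
    for v in values:
        # Index of the interval v falls into; the maximum goes into the last bin
        i = int((v - lo) / width) if width else 0
        counts[min(i, bins - 1)] += 1
    return counts, edges


# Hypothetical toy ages (not the real Titanic data)
ages = [2, 5, 18, 22, 24, 27, 29, 31, 35, 38, 41, 45, 52, 63, 71]
counts, edges = histogram(ages, bins=5)
print(counts)  # [2, 5, 4, 2, 2]
```

Changing `bins` only regroups the same values: fewer bins give a coarser shape, more bins a more detailed but noisier one.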
2. Distplot (distribution plot)
It shows the data in terms of probability.
For example: how likely is it that a passenger on the Titanic was 40 years old? From the data:
sns.histplot(df['Age'], kde=True, stat='density')  # sns.distplot is deprecated since seaborn 0.11
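From the graph you can read the density curve around age 40. Note that the y-axis shows a probability density, not a probability: the chance that a passenger is about 40 is the area under the curve over a small range (for example 39.5 to 40.5), not the height of the curve at 40.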
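The smooth curve is a kernel density estimate (KDE). A minimal sketch of a 1-D Gaussian KDE, showing the difference between the curve's height and a probability (an area). The helper name `gaussian_kde`, the toy ages, and the bandwidth rule (Scott's rule, h = sample std × n^(-1/5), which is also scipy's default) are assumptions for illustration; seaborn's exact settings may differ:

```python
import math
import statistics


def gaussian_kde(data):
    """Build a 1-D Gaussian kernel density estimate of `data`.

    Returns (density, prob_between). Bandwidth: Scott's rule (assumption).
    """
    n = len(data)
    h = statistics.stdev(data) * n ** (-1 / 5)

    def density(x):
        # Average of normal bell curves of width h centred on each data point
        total = sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in data)
        return total / (n * h * math.sqrt(2 * math.pi))

    def normal_cdf(z):
        return 0.5 * (1 + math.erf(z / math.sqrt(2)))

    def prob_between(a, b):
        # Area under the density curve between a and b
        return sum(normal_cdf((b - xi) / h) - normal_cdf((a - xi) / h) for xi in data) / n

    return density, prob_between


# Hypothetical toy ages (not the real Titanic data)
ages = [2, 5, 18, 22, 24, 27, 29, 31, 35, 38, 41, 45, 52, 63, 71]
density, prob_between = gaussian_kde(ages)

print(f"height of the curve at 40: {density(40):.4f}")
print(f"P(39.5 <= age <= 40.5): {prob_between(39.5, 40.5):.4f}")
print(f"total area under the curve: {prob_between(-1000, 1000):.4f}")  # ~1.0
```

For a 1-year-wide range the area is close to height × 1, which is why the two numbers look similar; for any other width they differ, and only the area is a probability.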
3. Boxplot
A box plot consists of:
i. Median (Q2): it divides the data in half.
ii. 1st quartile (Q1): 25% of the data lies to its left (below it) and 75% to its right.
iii. 3rd quartile (Q3): 75% of the data lies to its left and 25% to its right.
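iv. Minimum: calculated with the formula Q1 - 1.5*IQR, where IQR = Q3 - Q1 is the interquartile range
- lies on the left-hand side, in front of Q1
v. Maximum: calculated with the formula Q3 + 1.5*IQR
- lies on the right-hand side, beyond Q3
Note: all of these appear in a box plot: the box runs from Q1 to Q3 with a line at the median, and the whiskers reach towards the minimum and maximum (they stop at the most extreme data points inside these limits).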
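The numbers a box plot draws can be computed directly. A sketch using Python's `statistics.quantiles` with `method="inclusive"`, which interpolates linearly between data points in the same way as pandas' default `Series.quantile`; the toy ages are hypothetical:

```python
import statistics

# Hypothetical toy ages (not the real Titanic data)
ages = [2, 5, 18, 22, 24, 27, 29, 31, 35, 38, 41, 45, 52, 63, 80]

# Q1, median (Q2) and Q3 with linear interpolation
q1, median, q3 = statistics.quantiles(ages, n=4, method="inclusive")
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr  # the box plot "minimum"
upper_fence = q3 + 1.5 * iqr  # the box plot "maximum"

# Whiskers stop at the most extreme data points still inside the fences
lower_whisker = min(a for a in ages if a >= lower_fence)
upper_whisker = max(a for a in ages if a <= upper_fence)

print(q1, median, q3, iqr)  # 23.0 31.0 43.0 20.0
print(lower_fence, upper_fence)  # -7.0 73.0
print(lower_whisker, upper_whisker)  # 2 63
```

Here the upper whisker ends at 63, not at the fence of 73, because no age lies between 63 and 73; the value 80 is beyond the fence.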
Outliers: data points that lie outside the range between the minimum and maximum, i.e. values that don't fit the rest of your data. They are often noisy data; you can spot them with a box plot and remove them if they are not needed.
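Removing outliers with the same rule the box plot uses can be sketched as below. The helper `split_outliers` and the toy ages are hypothetical; `k=1.5` matches the Q1 - 1.5*IQR / Q3 + 1.5*IQR limits described above:

```python
import statistics


def split_outliers(values, k=1.5):
    """Split values into (kept, outliers) using the Q1 - k*IQR and Q3 + k*IQR fences."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    kept = [v for v in values if low <= v <= high]
    outliers = [v for v in values if v < low or v > high]
    return kept, outliers


# Hypothetical toy ages (not the real Titanic data)
ages = [2, 5, 18, 22, 24, 27, 29, 31, 35, 38, 41, 45, 52, 63, 80]
kept, outliers = split_outliers(ages)
print(outliers)  # [80]
```

With a DataFrame, the same filter can be written as `df[df['Age'].between(low, high)]`; be aware that this also drops rows where `Age` is missing. Whether an outlier is really noise is a judgement call, so look at it before deleting it.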
sns.boxplot(x=df['Age'])
In this plot you can see outliers on the upper side (high ages); you can remove them if they are not needed.
You can also calculate the mean, median, standard deviation, etc. of a particular column:
df['Age'].mean()
df['Age'].median()
df['Age'].std()
df['Age'].min(), df['Age'].max()
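The Age column has missing values, and pandas skips them by default when computing these statistics. A plain-Python sketch of the same summary on a hypothetical toy list, where `None`/NaN stand for missing ages; `statistics.stdev` is the sample standard deviation (divides by n - 1), matching the default of pandas' `std()`:

```python
import math
import statistics

# Hypothetical toy ages; None and NaN stand for missing values
ages = [22, 38, 26, None, 35, float("nan"), 54, 2, 27]

# Skip missing values, as pandas does by default
present = [a for a in ages if a is not None and not math.isnan(a)]

summary = {
    "count": len(present),
    "mean": statistics.mean(present),
    "median": statistics.median(present),
    "std": statistics.stdev(present),  # sample standard deviation (n - 1)
    "min": min(present),
    "max": max(present),
}
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

In pandas, `df['Age'].describe()` reports count, mean, std, min, the quartiles and max in one call.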