Univariate analysis
Univariate analysis examines a single column (variable) at a time to summarize the information in that column.
For categorical data:
1. Bar graph: simply plot the count of each category as a bar.
For example, you can find out how many passengers on the Titanic were male and how many were female.
import seaborn as sns
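A minimal sketch of the bar graph. It assumes a pandas DataFrame `df` with a `sex` column, as in the Titanic data. The rows below are made-up placeholders, not the real dataset. `sns.countplot` draws one bar per category, and `value_counts()` gives the same counts as numbers.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical sample standing in for the Titanic DataFrame
df = pd.DataFrame({"sex": ["male", "female", "male", "male", "female"]})

# Bar graph: one bar per category, bar height = number of rows in that category
ax = sns.countplot(x="sex", data=df)
ax.set_title("Passengers by sex")

# The same counts as numbers
counts = df["sex"].value_counts()
print(counts)
```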
2. Pie chart: shows each category's share of the whole.
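A small sketch of the pie chart using pandas' built-in plotting, again on a made-up `sex` column. `autopct` is passed through to matplotlib and labels each slice with its percentage.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample standing in for the Titanic DataFrame
df = pd.DataFrame({"sex": ["male", "female", "male", "male", "female"]})

counts = df["sex"].value_counts()
# One slice per category; autopct prints each slice's share as a percentage
counts.plot.pie(autopct="%.1f%%")
plt.ylabel("")  # hide the default label (the column name)

# The same shares as numbers
shares = counts / counts.sum() * 100
print(shares)
```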
For numerical data:
1. Histogram
It shows how the values are spread across the range of the column.
For example, for age:
import matplotlib.pyplot as plt
You can also leave out bins (matplotlib then uses its default of 10): bins only sets how many bars the range is split into, for ease of reading the plot.
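A minimal sketch of the age histogram with `plt.hist`. The ages are hypothetical. Missing ages are dropped first because `plt.hist` cannot bin NaN values, and the real Titanic `age` column does contain missing values.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical ages; None marks a missing value
df = pd.DataFrame({"age": [2, 22, 25, 28, 30, 31, 35, 38, 40, 45, 54, 71, None]})

ages = df["age"].dropna()  # drop missing ages before binning
# bins controls how many bars the age range is split into
counts, edges, _ = plt.hist(ages, bins=5)
plt.xlabel("age")
plt.ylabel("number of passengers")
```

`counts` holds the height of each bar and `edges` the bin boundaries, so you can check the plot numerically as well as visually.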
From the plot you can conclude that there were few infants and few very old passengers on the Titanic, while passengers in the middle of the age range were the most numerous.
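2. Probability: to analyze the data in terms of probability.
For example, what is the probability that a passenger on the Titanic was 40 years old?
From the graph you can read a value of almost 15% at age 40 and take it as the probability that a passenger was 40 years old. Strictly, if the graph is a density plot, its height is a probability density rather than a probability; the probability itself is the area under the curve over an age range (for example 39.5 to 40.5 years).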
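A minimal sketch with made-up ages (not the real Titanic data). The first part counts the share of passengers aged 40 directly from the column. The second uses `numpy.histogram(..., density=True)` to show that on a density scale it is the bar area (height times width), not the height, that gives a probability. The bin edges are arbitrary choices for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical ages; None marks a missing value
df = pd.DataFrame({"age": [2, 22, 25, 28, 30, 31, 35, 38, 40, 40, 45, 54, 71, None]})
ages = df["age"].dropna()

# Empirical probability: fraction of passengers whose age lies in [39.5, 40.5)
p_40 = ((ages >= 39.5) & (ages < 40.5)).mean()
print(f"P(age = 40) is about {p_40:.3f}")

# Density-scaled histogram: each bar's area is the probability of that bin
heights, edges = np.histogram(ages, bins=[0, 20, 40, 60, 80], density=True)
bin_probs = heights * np.diff(edges)  # area = height * bin width
print(bin_probs, bin_probs.sum())  # the areas add up to 1
```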
3. Boxplot
A boxplot consists of:
i. Median (Q2): it divides the data in half.
ii. 1st quartile (Q1): 25% of the data lies to its left and 75% to its right.
iii. 3rd quartile (Q3): 75% of the data lies to its left and 25% to its right.
iv. Minimum: the lower limit, calculated as Q1 - 1.5*IQR, where IQR = Q3 - Q1 (the interquartile range).
-lies to the left of Q1
v. Maximum: the upper limit, calculated as Q3 + 1.5*IQR.
-lies to the right of Q3
Note: every box plot shows these parts.
What does a box plot look like?
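A minimal sketch of drawing a box plot of age with seaborn, on hypothetical ages. By default seaborn extends the whiskers to the furthest data points within 1.5*IQR of the box and draws points beyond that as individual outlier markers. Missing values are ignored.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical ages; None marks a missing value
df = pd.DataFrame({"age": [2, 22, 25, 28, 30, 31, 35, 38, 40, 45, 54, 71, 80, None]})

# Box spans Q1 to Q3, the line inside is the median,
# whiskers stop at the furthest points within 1.5*IQR, dots beyond are outliers
ax = sns.boxplot(x=df["age"])
ax.set_xlabel("age")
```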
There may be outliers in your data that are not required: they are noisy data, which you can spot in the box plot and remove.
Outliers: values that do not fit in the normal range of your data, i.e. below the minimum or above the maximum limit of the box plot.
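A sketch of finding and removing outliers with the same Q1 - 1.5*IQR and Q3 + 1.5*IQR limits, assuming a hypothetical `age` column. `quantile` uses pandas' default linear interpolation. For these made-up values Q1 = 28, Q3 = 45 and IQR = 17, so the limits are 2.5 and 70.5.

```python
import pandas as pd

# Hypothetical ages
df = pd.DataFrame({"age": [5, 22, 25, 28, 30, 31, 35, 38, 40, 45, 54, 71, 80]})

q1 = df["age"].quantile(0.25)
q3 = df["age"].quantile(0.75)
iqr = q3 - q1
lower = q1 - 1.5 * iqr  # "minimum" limit
upper = q3 + 1.5 * iqr  # "maximum" limit

# Rows outside the limits are outliers; keep only the rows inside them
outliers = df[(df["age"] < lower) | (df["age"] > upper)]
cleaned = df[df["age"].between(lower, upper)]
print(outliers)
```

Here only upper outliers (71 and 80) fall outside the limits. Whether to drop them is a judgment call about the data, not something the box plot decides for you.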
In the data you can see there are upper outliers, and you can remove them since they are not needed. You can also calculate the mean, median, standard deviation, etc. of a particular column:
1. df['age'].mean()
2. df['age'].min()
3. df['age'].max()
etc.
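The same Series methods also give the median and standard deviation, and `describe()` returns count, mean, standard deviation, min, the three quartiles and max in one call. A small sketch on made-up ages:

```python
import pandas as pd

# Hypothetical ages; None marks a missing value, which these methods skip
df = pd.DataFrame({"age": [22, 38, 26, 35, 35, None]})

print(df["age"].mean())    # average
print(df["age"].median())  # middle value
print(df["age"].std())     # sample standard deviation (ddof=1 by default)

summary = df["age"].describe()  # count, mean, std, min, 25%, 50%, 75%, max
print(summary)
```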