Procedure for data analysis: some basic functions

1. df.shape: gives the number of rows and columns as a tuple (it is an attribute, so no parentheses)

2. df.head(): shows the first 5 rows by default

3. df.sample(5): shows randomly chosen rows; the 5 means 5 rows are returned
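
A quick sketch of the first three steps. The DataFrame and its column names are made up for illustration, and random_state is only there so the sample is repeatable.

```python
import pandas as pd

# Hypothetical toy dataset, used only for illustration
df = pd.DataFrame({
    "name": ["Ann", "Bob", "Cara", "Dev", "Eli", "Fay"],
    "age": [23, 35, 31, 42, 28, 35],
    "salary": [50000.0, 62000.5, 58000.0, 71000.0, 54000.0, 62000.5],
})

# shape is an attribute (no parentheses) holding (rows, columns)
n_rows, n_cols = df.shape
print(n_rows, n_cols)

# head() returns the first 5 rows unless another number is passed
first_rows = df.head()
print(first_rows)

# sample(5) returns 5 random rows; random_state fixes the random choice
random_rows = df.sample(5, random_state=0)
print(random_rows)
```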

4. df.info(): - shows the data type of each column

                             object: string (or mixed) data

                             integer: int64 (8 bytes per value, the same as float64; smaller types such as int32 use less memory)

                             float: float64

                   - shows the number of non-null values in each column

                   - shows the memory used by the data
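
A small sketch of what df.info() reports, on made-up data. The int32 conversion at the end is my own addition, to show where a memory saving can come from.

```python
import pandas as pd

# Hypothetical data with a few missing values
df = pd.DataFrame({
    "name": ["Ann", "Bob", None],
    "age": [23, 35, 31],
    "salary": [50000.0, None, 58000.0],
})

# Prints each column's dtype, its non-null count, and total memory usage
df.info()

# The same facts are also available as objects
dtypes = df.dtypes        # dtype of each column
non_null = df.count()     # non-null values per column

# int64 and float64 both take 8 bytes per value;
# converting to a smaller integer type halves that
bytes_int64 = df["age"].memory_usage(index=False)
bytes_int32 = df["age"].astype("int32").memory_usage(index=False)
print(bytes_int64, bytes_int32)
```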

5. df.isnull().sum()

                   - to find the number of missing values in each column
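
For example, on a made-up DataFrame (column names are only for illustration):

```python
import pandas as pd

# Hypothetical data with missing entries in both columns
df = pd.DataFrame({
    "city": ["Pune", None, "Delhi", None],
    "temp": [31.5, 29.0, None, 30.2],
})

# isnull() marks each missing cell as True; sum() counts them per column
missing_per_column = df.isnull().sum()
print(missing_per_column)

# Summing again gives the total for the whole DataFrame
total_missing = int(missing_per_column.sum())
print(total_missing)
```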

6. df.describe(): to get summary statistics of the numeric columns: count, mean, standard deviation, min, quartiles, max
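
A minimal sketch on invented data. With default settings the text column is left out of the summary.

```python
import pandas as pd

# Hypothetical data: one numeric column and one text column
df = pd.DataFrame({
    "age": [20, 30, 40, 50],
    "city": ["Pune", "Delhi", "Pune", "Agra"],
})

# By default describe() summarizes only the numeric columns
summary = df.describe()
print(summary)

# Individual statistics can be read by row label and column name
mean_age = summary.loc["mean", "age"]
std_age = summary.loc["std", "age"]
```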

7. df.duplicated().sum()

       - to find the number of duplicate rows in the dataset

     if there are any, remove them with df.drop_duplicates()
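
A short sketch of counting and then dropping duplicates, on made-up data. Resetting the index afterwards is optional and just keeps the row labels tidy.

```python
import pandas as pd

# Hypothetical data where some rows repeat exactly
df = pd.DataFrame({
    "id": [1, 2, 2, 3, 3, 3],
    "value": ["a", "b", "b", "c", "c", "c"],
})

# duplicated() marks every repeat of an earlier row as True
n_duplicates = int(df.duplicated().sum())
print(n_duplicates)

# drop_duplicates() keeps the first occurrence of each row
deduped = df.drop_duplicates().reset_index(drop=True)
print(deduped)
```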

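8. df.corr()

          to find the correlation between each pair of numeric columns:

          it gives the Pearson correlation by default: -1 <= c <= 1: the closer to +1, the stronger the positive linear relationship (the columns rise together);

           the closer to -1, the stronger the negative linear relationship (one rises as the other falls)

         values very close to 0, e.g. 0.0000034, mean the column has almost no linear relationship with the output, so those columns are candidates for removal (correlation only measures linear relationships, so check before dropping)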
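
A rough sketch of reading the correlation matrix and picking out weakly correlated columns. The data, the target column name "score", and the 0.1 cutoff are all assumptions for illustration, not fixed rules.

```python
import pandas as pd

# Hypothetical data: "score" is the output column we care about
df = pd.DataFrame({
    "hours": [1, 2, 3, 4, 5],
    "score": [52, 54, 56, 58, 60],   # rises in step with hours
    "errors": [10, 8, 6, 4, 2],      # falls as score rises
    "noise": [3, 1, 4, 1, 3],        # no linear link to score
})

# Pearson correlation between every pair of columns
corr = df.corr()
print(corr)

target = "score"     # assumed output column
threshold = 0.1      # assumed cutoff, tune for your data

# Correlation of every other column with the target
corr_with_target = corr[target].drop(target)

# Columns whose absolute correlation with the target is below the cutoff
weak_columns = corr_with_target[corr_with_target.abs() < threshold].index.tolist()
print(weak_columns)

reduced = df.drop(columns=weak_columns)
```

If the DataFrame also has text columns, recent pandas versions need df.corr(numeric_only=True).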

 
