Posts

ordinal and label encoding

Categorical data normally comes in two types:

1. Ordinal data: data whose values have a natural order (rank). For example, in a degree column a bachelor's degree is higher than high school, and a master's is higher than a bachelor's. In such a case we can label master as 3, bachelor as 2, high school as 1 and school as 0.
2. Nominal data: data whose values have no order. E.g. car transmission type, i.e. automatic and manual; gender: either male or female...
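A minimal sketch of the degree example, assuming a pandas DataFrame with a hypothetical `degree` column. The toy rows and the `degree_rank` name are made up for illustration; only the rank values (0 to 3) come from the text above.

```python
import pandas as pd

# Toy data, made up for illustration (not from the original post).
df = pd.DataFrame({"degree": ["high school", "master", "bachelor", "school"]})

# Hand-written ranks following the levels described in the text.
degree_rank = {"school": 0, "high school": 1, "bachelor": 2, "master": 3}

# map() replaces each label with its rank; a label missing from the
# dictionary would become NaN.
df["degree_encoded"] = df["degree"].map(degree_rank)

print(df)
```

A nominal column such as gender should not be encoded this way, because the numbers would suggest an order that does not exist.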

Feature scaling-Standardization

Standardization rescales every feature to mean = 0 and standard deviation = 1, so the data is centered at the origin (mean = 0). The rescaled values are called z-scores, and the method is also known as Z-score normalization. Refer: untitled7 in Jupyter notebook.
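A minimal sketch of the idea, assuming a toy `age` column made up for illustration. Each value has the column mean subtracted and is then divided by the column standard deviation. pandas computes the sample standard deviation by default (`ddof=1`), so `ddof=0` is passed explicitly to get a result with standard deviation exactly 1.

```python
import pandas as pd

# Toy column, made up for illustration.
df = pd.DataFrame({"age": [20.0, 30.0, 40.0, 50.0]})

# z = (x - mean) / std, using the population standard deviation (ddof=0).
mean = df["age"].mean()
std = df["age"].std(ddof=0)
df["age_scaled"] = (df["age"] - mean) / std

scaled_mean = df["age_scaled"].mean()      # approximately 0
scaled_std = df["age_scaled"].std(ddof=0)  # approximately 1
print(scaled_mean, scaled_std)
```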

bivariate analysis

 numerical-numerical combination

univariate analysis

Univariate analysis is analysis done on a single column, to learn about that column alone.

For categorical data:

1. countplot / bar graph: simply a bar graph of category counts. E.g. to find out how many Titanic passengers were male and how many were female:

       import seaborn as sns
       sns.countplot(x=df['sex'])

   You can get the counts numerically, or plot them with pandas:

       df['Survived'].value_counts()
       df['Embarked'].value_counts().plot(kind='bar')

2. Pie chart:

       df['Embarked'].value_counts().plot(kind='pie')

   This doesn't show percentages inside the slices. To show percentages:

       df['Embarked'].value_counts().plot(kind='pie', autopct='%.2f')

For numerical data:

1. histo...
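The percentages that `autopct` draws inside the pie chart can also be read off numerically. A minimal sketch, assuming a toy `Embarked` column made up for illustration (not the real Titanic data):

```python
import pandas as pd

# Toy column, made up for illustration.
df = pd.DataFrame({"Embarked": ["S", "S", "C", "Q", "S", "C", "S", "S"]})

# Absolute counts per category.
counts = df["Embarked"].value_counts()

# normalize=True returns fractions; multiply by 100 for percentages.
percent = (df["Embarked"].value_counts(normalize=True) * 100).round(2)

print(counts)
print(percent)
```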

procedure for data analysis: some basic functions

 1. df.shape: to print the number of rows and columns (an attribute, so no parentheses)
 2. df.head()
 3. df.sample(5): to print a random sample of the data; 5 is the number of rows to print
 4. df.info():
    - to find the data type of each column
        object: string
        integer: int64 (8 bytes per value, the same as float64)
        float: float64
    - to find the non-null count in each column
    - to find the memory occupied by the data
 5. df.isnull().sum(): to find the total number of missing values in each column
 6. df.describe(): to summarize data by mean, std. deviation ...
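The calls listed above can be tried on a small frame. This is a sketch on toy data made up for illustration; the column names `name` and `age` are hypothetical.

```python
import pandas as pd

# Toy data with one missing value, made up for illustration.
df = pd.DataFrame({
    "name": ["a", "b", "c"],
    "age": [25.0, None, 31.0],
})

print(df.shape)           # (rows, columns); shape is an attribute, not a method
print(df.head())          # first rows
print(df.sample(2))       # 2 randomly chosen rows
df.info()                 # dtypes, non-null counts, memory usage

missing = df.isnull().sum()   # missing values per column
print(missing)

print(df.describe())      # count, mean, std, min, quartiles, max
```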

working with json and sql data

 To read JSON data:

     df = pd.read_json("train.json")

 To read SQL data:

 SQL data consists of queries such as INSERT commands; to fetch their data into your Python IDE you can use a XAMPP server.

 Step 1: download the data to your local machine
 Step 2: open the XAMPP server
 Step 3: create a database and upload the downloaded file
 Step 4: install the SQL connector from your IDE by typing:
     !pip install mysql-connector-python
 Step 5: import mysql.connector
 Step 6: conn = mysql.connector.connect(host='localhost', user='root', password='', database='world')
     This creates a connection object. host = host name or IP, user = database user name, password = that user's password, database = database name
 Step 7: pd.read_sql_query("SELECT * FROM CITY", conn)
     SELECT * FROM CITY is the query whose result you want to show; you can run any query you want a...
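The `pd.read_sql_query` call can be tried without XAMPP or MySQL. This sketch swaps in Python's built-in `sqlite3` module with an in-memory database as a stand-in, so it is not the MySQL setup described above. The `city` table, its columns and its rows are made up for illustration.

```python
import sqlite3
import pandas as pd

# In-memory SQLite database used as a stand-in for the MySQL server.
conn = sqlite3.connect(":memory:")

# Hypothetical table and rows, made up for illustration.
conn.execute("CREATE TABLE city (name TEXT, population INTEGER)")
conn.executemany(
    "INSERT INTO city (name, population) VALUES (?, ?)",
    [("A", 100), ("B", 200)],
)
conn.commit()

# Same pattern as step 7: pass a query string and an open connection.
df = pd.read_sql_query("SELECT * FROM city", conn)
print(df)

conn.close()
```

With a real MySQL server only the connection line changes; the query call stays the same.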

working with csv files

 1. CSV files
    i. reading a CSV file from local storage
         df = pd.read_csv('healthcare-dataset-stroke-data.csv')
    ii. using one of your own columns as the index instead of the pandas default index
         df = pd.read_csv('healthcare-dataset-stroke-data.csv', index_col='id')
    iii. reading a file from a link
         import requests
         from io import StringIO

         url = "link name"
         headers = {"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:66.0) Gecko/20100101 Firefox/66.0"}
         req = requests.get(url, headers=headers)
         data = StringIO(req.text)
         pd.read_csv(data)
    iv. if the column names are wrong because the real header is in the second row of the file (header=1 uses the row at index 1 as the column names and skips the row before it)
         pd.read_csv('test.csv', header=1)
    v. if you want to select only some of the colum...
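The `index_col` and `header` options can be checked without a file on disk by reading from an in-memory string. This is a sketch on toy CSV text made up for illustration; the columns `id`, `age` and `stroke` are hypothetical stand-ins, not taken from the real dataset.

```python
from io import StringIO
import pandas as pd

# Toy CSV text: the first line is a placeholder row, the real column
# names are on the second line (index 1).
raw = "0,1,2\nid,age,stroke\n1,67,1\n2,61,0\n"

# header=1 takes the column names from the line at index 1 and skips line 0.
# index_col='id' uses the id column as the index instead of 0, 1, 2, ...
df = pd.read_csv(StringIO(raw), header=1, index_col="id")

print(df)
```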