Working with CSV files

March 31, 2021

1. CSV files

i. Reading a CSV file from local storage

import pandas as pd

df = pd.read_csv('healthcare-dataset-stroke-data.csv')

ii. Using one of your own columns as the index instead of the pandas default index

df=pd.read_csv('healthcare-dataset-stroke-data.csv',index_col='id')
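To see what index_col changes, here is a small sketch. The three-row CSV below is made-up sample data standing in for the real stroke dataset, and it is read from memory with io.StringIO so the snippet runs without any file on disk.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for the real file.
csv_text = "id,gender,age\n101,Male,67\n102,Female,61\n103,Male,80\n"

# Default: pandas adds its own 0, 1, 2, ... index and keeps id as a column.
default_df = pd.read_csv(io.StringIO(csv_text))

# index_col='id': the id column becomes the index and is no longer a column.
indexed_df = pd.read_csv(io.StringIO(csv_text), index_col="id")

print(default_df.index.tolist())
print(indexed_df.index.tolist())
print(indexed_df.loc[102, "gender"])  # look a row up by its id
```

With the id as the index, rows can be looked up directly by id using .loc.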

iii. Reading a CSV file from a URL

import requests

from io import StringIO

url = "link name"  # placeholder: put the URL of the CSV file here

headers = {"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:66.0) Gecko/20100101 Firefox/66.0"}

req = requests.get(url, headers=headers)

data = StringIO(req.text)

pd.read_csv(data)
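The same steps can be wrapped in a small helper. This is only a sketch: the function name read_csv_from_url and the timeout default are my own choices, not part of pandas or requests. It also adds raise_for_status(), so that a failed download raises an error instead of an HTML error page being parsed as CSV.

```python
from io import StringIO

import pandas as pd
import requests


def read_csv_from_url(url, headers=None, timeout=10):
    """Download a CSV file over HTTP and load it into a DataFrame.

    Hypothetical helper for illustration; name and defaults are assumptions.
    """
    response = requests.get(url, headers=headers, timeout=timeout)
    # Raise an exception on 4xx/5xx responses instead of parsing them.
    response.raise_for_status()
    # read_csv accepts any file-like object, so wrap the text in StringIO.
    return pd.read_csv(StringIO(response.text))
```

It would be called as read_csv_from_url(url, headers=headers), with url and headers defined as above.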

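iv. If the file has no usable column names, or the real column names end up as the first row of values, tell pandas which row holds the header. header=1 uses the second row of the file as the column names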

pd.read_csv('test.csv',header=1)
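A quick sketch of both situations, using made-up in-memory data instead of test.csv. The first sample has a row of placeholder names (0, 1) sitting above the real header. The second has no header row at all, which is handled with header=None plus names (the column names there are my own).

```python
import io

import pandas as pd

# Hypothetical file: a row of placeholder names above the real header.
csv_text = "0,1\nname,score\nAsha,91\nBen,78\n"

# Default read: '0' and '1' become the column names,
# and the real header shows up as the first data row.
wrong = pd.read_csv(io.StringIO(csv_text))

# header=1: the second line is the header, the line above it is dropped.
fixed = pd.read_csv(io.StringIO(csv_text), header=1)

# File with no header row at all: supply the names yourself.
no_header = pd.read_csv(
    io.StringIO("Asha,91\nBen,78\n"),
    header=None,
    names=["name", "score"],
)

print(wrong.columns.tolist())
print(fixed.columns.tolist())
print(no_header.columns.tolist())
```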

v. If you want to read only some of the columns

pd.read_csv('aug_train.csv',usecols=['enrollee_id','gender','education_level'])
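Here is a minimal sketch of usecols on made-up data. The column names match the example above, but the rows and the extra city column are invented.

```python
import io

import pandas as pd

# Hypothetical sample with one column (city) that we do not need.
csv_text = (
    "enrollee_id,city,gender,education_level\n"
    "1,city_103,Male,Graduate\n"
    "2,city_40,Female,Masters\n"
)

df = pd.read_csv(
    io.StringIO(csv_text),
    usecols=["enrollee_id", "gender", "education_level"],
)

print(df.columns.tolist())  # city is never loaded
```

Skipping columns at read time saves memory on wide files, compared with loading everything and dropping columns afterwards.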

vi. If you want to load only the first few rows

pd.read_csv('aug_train.csv',nrows=100)
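As a small illustration, the sketch below builds a 1,000-row CSV in memory (invented data) and reads only the first 100 rows of it.

```python
import io

import pandas as pd

# Build a hypothetical 1,000-row CSV in memory.
lines = ["row_id,value"] + [f"{i},{i * 2}" for i in range(1000)]
csv_text = "\n".join(lines)

# nrows counts data rows; the header line is not included in the count.
preview = pd.read_csv(io.StringIO(csv_text), nrows=100)

print(preview.shape)
```

This is handy for a quick look at a very large file before deciding how to load all of it.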

vi. If the data shows garbled or unexpected characters, or reading fails with a UnicodeDecodeError, the file is probably not UTF-8 encoded. Pass its actual encoding with the encoding parameter

pd.read_csv('zomato.csv',encoding='latin-1')
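The sketch below reproduces the problem on a tiny scale. The restaurant row is invented and is deliberately encoded as Latin-1, so the default UTF-8 read is expected to fail while the read with encoding='latin-1' succeeds.

```python
import io

import pandas as pd

# Hypothetical restaurant data saved as Latin-1 rather than UTF-8.
raw_bytes = "name,city\nCafé Noir,São Paulo\n".encode("latin-1")

try:
    pd.read_csv(io.BytesIO(raw_bytes))  # pandas assumes UTF-8 by default
except UnicodeDecodeError as exc:
    print("default read failed:", exc)

# Telling pandas the real encoding decodes the accented characters correctly.
df = pd.read_csv(io.BytesIO(raw_bytes), encoding="latin-1")
print(df.loc[0, "name"])
```

One caveat: Latin-1 can decode any sequence of bytes, so it never raises an error. If the file is actually in some other encoding, the read will succeed but some characters may come out wrong, so it is worth checking a few rows by eye.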

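vii. If you get a ParserError because some lines contain more values than there are columns, skip the bad lines. pandas 1.3 and later use on_bad_lines='skip' for this; the older error_bad_lines=False was removed in pandas 2.0

pd.read_csv('BX-Books.csv', sep=';', encoding='latin-1', on_bad_lines='skip')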
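A minimal sketch of a bad line and how skipping it behaves. The book rows are invented, and the snippet assumes pandas 1.3 or later, where the option is spelled on_bad_lines='skip' (older versions used error_bad_lines=False).

```python
import io

import pandas as pd

# Hypothetical semicolon-separated data with one malformed line.
csv_text = (
    "title;author;year\n"
    "Book A;Ann;1999\n"
    "Book B;Bob;2001;extra;field\n"  # 5 values where 3 are expected
    "Book C;Cy;2005\n"
)

try:
    pd.read_csv(io.StringIO(csv_text), sep=";")
except pd.errors.ParserError as exc:
    print("strict read failed:", exc)

# Requires pandas 1.3 or later: drop the malformed line and keep the rest.
df = pd.read_csv(io.StringIO(csv_text), sep=";", on_bad_lines="skip")
print(df)
```

Keep in mind that skipped lines are lost silently. Passing on_bad_lines='warn' instead also skips them but reports each one, which helps in judging how much data is being dropped.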

viii. If you want to set the data type of a column while reading the file

pd.read_csv('aug_train.csv',dtype={'target':int})
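Two typical uses of dtype are sketched below on invented data: storing a 0/1 target in a smaller integer type, and keeping an identifier with leading zeros as text. The zip_code column is hypothetical and not part of the dataset above.

```python
import io

import pandas as pd

# Hypothetical sample; zip_code is an invented column for illustration.
csv_text = "enrollee_id,zip_code,target\n1,02139,1\n2,10001,0\n"

# Default read: zip_code is inferred as a number and loses its leading zero.
default_df = pd.read_csv(io.StringIO(csv_text))

# Explicit dtypes: keep zip_code as text, store target as an 8-bit integer.
typed_df = pd.read_csv(
    io.StringIO(csv_text),
    dtype={"zip_code": str, "target": "int8"},
)

print(default_df["zip_code"].tolist())
print(typed_df["zip_code"].tolist())
print(typed_df["target"].dtype)
```

An integer dtype only works if the column has no missing values; a column containing NaN cannot be read as a plain int.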

ix. If your dataset has a date column, pandas reads it as plain strings by default. To have it parsed as real datetime values, list the column in parse_dates:

pd.read_csv('IPL Matches 2008-2020.csv',parse_dates=['date_col_name'])
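The difference is easy to see on a small example. The two match rows below are made up, and the column is simply called date.

```python
import io

import pandas as pd

# Hypothetical sample with an ISO-formatted date column.
csv_text = "id,date,city\n1,2008-04-18,Bangalore\n2,2020-11-10,Dubai\n"

# Without parse_dates the column is read as plain strings.
as_text = pd.read_csv(io.StringIO(csv_text))

# With parse_dates the column becomes a datetime column.
as_dates = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])

print(as_text["date"].dtype)
print(as_dates["date"].dtype)
print(as_dates["date"].dt.year.tolist())  # .dt only works on datetime columns
```

Once the column is a datetime, the .dt accessor gives year, month, day of week and so on, and rows can be filtered or sorted by date correctly.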

x. Some values in your dataset, such as NaN-, NaN% and similar placeholders, really stand for missing data and cause problems later. To convert them into real null values (actual NaN) while reading, list them in na_values:

pd.read_csv('aug_train.csv', na_values=['Male'])

Here, every 'Male' entry is read as NaN.
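A more typical use is sketched below, with invented rows where '-' and 'unknown' are used as placeholders for missing data.

```python
import io

import pandas as pd

# Hypothetical sample where '-' and 'unknown' mark missing data.
csv_text = (
    "enrollee_id,gender,experience\n"
    "1,Male,5\n"
    "2,-,3\n"
    "3,Female,unknown\n"
)

df = pd.read_csv(io.StringIO(csv_text), na_values=["-", "unknown"])

print(df.isna().sum())         # one missing value in gender, one in experience
print(df["experience"].dtype)  # numeric, because 'unknown' is no longer text
```

A useful side effect: once the placeholder text is treated as NaN, a column like experience is read as numbers instead of strings.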
