Working with CSV files
1. CSV files
i. reading a CSV file from local storage
import pandas as pd
df=pd.read_csv('healthcare-dataset-stroke-data.csv')
ii. using one of your own columns as the index instead of the default pandas index
df=pd.read_csv('healthcare-dataset-stroke-data.csv',index_col='id')
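To see what index_col changes, here is a small sketch. The CSV text is made up and only stands in for the real file, so the values are assumptions for illustration.

```python
import pandas as pd
from io import StringIO

# Made-up sample standing in for the real file (illustrative values only).
csv_text = "id,gender,age\n9046,Male,67\n51676,Female,61\n"

# Without index_col, pandas adds its own 0, 1, 2, ... index.
default_df = pd.read_csv(StringIO(csv_text))

# With index_col='id', the 'id' column becomes the row index.
indexed_df = pd.read_csv(StringIO(csv_text), index_col="id")

print(default_df.index.tolist())
print(indexed_df.index.tolist())
print(indexed_df.loc[9046, "age"])  # rows can now be looked up by id
```

After this, 'id' is no longer a regular column, so rows are selected with .loc and the id value.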
iii. reading a CSV file from a URL
import requests
from io import StringIO
url = "link name"
headers = {"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:66.0) Gecko/20100101 Firefox/66.0"}
req = requests.get(url, headers=headers)
data = StringIO(req.text)
pd.read_csv(data)
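The StringIO step works because read_csv accepts any file-like object, not only a path. A minimal sketch of just that step, where the hard-coded string is an assumption standing in for req.text so that no network access is needed:

```python
import pandas as pd
from io import StringIO

# Stand-in for req.text: the body of an HTTP response holding CSV data (made up).
response_text = "name,score\nalice,90\nbob,85\n"

# Wrap the string so it behaves like an open text file, then parse it.
buffer = StringIO(response_text)
url_df = pd.read_csv(buffer)

print(url_df.shape)
```

In the real request it can help to call req.raise_for_status() before building the StringIO, so that an HTTP error page is not parsed as if it were CSV.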
iv. if the real column names show up as the first row of values instead of as the header, tell pandas which row holds them (header=1 means the second line of the file)
pd.read_csv('test.csv',header=1)
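A small sketch of the header option. The sample text is made up: its first line holds meaningless labels and the real column names sit on the second line, which is one situation where header=1 could help.

```python
import pandas as pd
from io import StringIO

# Made-up file: line 0 is a junk label row, line 1 holds the real column names.
csv_text = "0,1\nname,score\nalice,90\nbob,85\n"

# Default header=0: the junk row becomes the header and the real
# column names end up as the first row of data.
wrong_df = pd.read_csv(StringIO(csv_text))

# header=1: use the second line as the header; the line before it is dropped.
header_df = pd.read_csv(StringIO(csv_text), header=1)

print(wrong_df.columns.tolist())
print(header_df.columns.tolist())
```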
v. if you want to read only some of the columns
pd.read_csv('aug_train.csv',usecols=['enrollee_id','gender','education_level'])
vi. if you want to load only the first few rows
pd.read_csv('aug_train.csv',nrows=100)
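usecols and nrows can be combined in one call. A rough sketch on made-up data (the column names follow the example above, the rows are invented):

```python
import pandas as pd
from io import StringIO

# Made-up sample with one more column than we want to keep.
csv_text = (
    "enrollee_id,city,gender,education_level\n"
    "1,city_1,Male,Graduate\n"
    "2,city_2,Female,Masters\n"
    "3,city_3,Male,Graduate\n"
    "4,city_4,Female,Phd\n"
    "5,city_5,Male,Masters\n"
)

# Keep three of the four columns and stop after the first three data rows.
subset_df = pd.read_csv(
    StringIO(csv_text),
    usecols=["enrollee_id", "gender", "education_level"],
    nrows=3,
)

print(subset_df.shape)
```

This is handy for a quick look at a large file, since the unwanted columns and the remaining rows are never loaded into the DataFrame.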
vi. if the text shows garbled characters or the file fails to decode, pass the encoding the file was saved in so pandas decodes the bytes correctly
pd.read_csv('zomato.csv',encoding='latin-1')
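A minimal sketch of what the encoding argument does. The bytes below are made up and are encoded as Latin-1 on purpose, standing in for a file saved in that encoding:

```python
import pandas as pd
from io import BytesIO

# Made-up file content, saved as Latin-1 bytes (the 'é' is the single byte 0xE9).
raw_bytes = "name,city\nCafé Noir,Bengaluru\n".encode("latin-1")

# Tell pandas how the bytes were encoded so the accented letter is decoded correctly.
latin_df = pd.read_csv(BytesIO(raw_bytes), encoding="latin-1")

print(latin_df.loc[0, "name"])
```

The right value depends on how the file was written, so 'latin-1' is only correct when the file really uses that encoding.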
vii. if there is a parser error because some lines have more fields than expected, skip the bad lines (on_bad_lines replaces error_bad_lines, which was removed in pandas 2.0)
pd.read_csv('BX-Books.csv', sep=';', encoding='latin-1', on_bad_lines='skip')
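A small sketch of skipping a malformed line. It assumes pandas 1.3 or later, where the option is spelled on_bad_lines='skip' (older releases used error_bad_lines=False). The data is made up, with one row that has an extra field:

```python
import pandas as pd
from io import StringIO

# Made-up sample: the second data row has four fields instead of three.
csv_text = (
    "isbn;title;year\n"
    "111;Book A;1999\n"
    "222;Book B;extra;2001\n"
    "333;Book C;2005\n"
)

# Rows with too many fields are dropped instead of raising a parser error.
clean_df = pd.read_csv(StringIO(csv_text), sep=";", on_bad_lines="skip")

print(clean_df["isbn"].tolist())
```

Skipped rows are lost silently, so it is worth comparing the row count with the file when the data matters.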
viii. if you want to change the datatype of a column while reading the CSV file
pd.read_csv('aug_train.csv',dtype={'target':int})
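dtype takes a dictionary, so several columns can be typed at once. A minimal sketch on made-up rows; reading enrollee_id as a string is an extra assumption added here for illustration:

```python
import pandas as pd
from io import StringIO

# Made-up sample (illustrative values only).
csv_text = "enrollee_id,target\n8949,1\n29725,0\n"

# Map each column name to the type it should be read as.
typed_df = pd.read_csv(
    StringIO(csv_text),
    dtype={"target": int, "enrollee_id": str},
)

print(typed_df.dtypes)
```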
ix. a date column is read as a string by default; to have it parsed as a datetime while reading, use parse_dates:
pd.read_csv('IPL Matches 2008-2020.csv',parse_dates=['date_col_name'])
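A quick sketch comparing the two reads. The rows and the column name 'date' are made up for illustration:

```python
import pandas as pd
from io import StringIO

# Made-up sample with an ISO-formatted date column.
csv_text = (
    "id,date,city\n"
    "335982,2008-04-18,Bangalore\n"
    "335983,2008-04-19,Chandigarh\n"
)

# Without parse_dates the 'date' column holds plain strings.
plain_df = pd.read_csv(StringIO(csv_text))

# With parse_dates the column is converted to datetime values while reading.
dated_df = pd.read_csv(StringIO(csv_text), parse_dates=["date"])

print(plain_df["date"].dtype)
print(dated_df["date"].dtype)
print(dated_df["date"].dt.year.tolist())  # the .dt accessor needs datetime values
```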
x. some datasets mark missing data with placeholder values that pandas does not recognise as null; to convert them into actual NaN values, list them in na_values:
pd.read_csv('aug_train.csv',na_values=['Male',])
here, every 'Male' entry is read as NaN
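In practice the list usually holds placeholder strings rather than a real category. A minimal sketch, where the placeholders '?' and 'unknown' and the rows are assumptions made up for illustration:

```python
import pandas as pd
from io import StringIO

# Made-up sample: '?' and 'unknown' stand for missing data.
csv_text = (
    "enrollee_id,gender,experience\n"
    "1,Male,5\n"
    "2,?,unknown\n"
    "3,Female,12\n"
)

# Every value listed in na_values is read as NaN, in any column.
na_df = pd.read_csv(StringIO(csv_text), na_values=["?", "unknown"])

print(na_df.isna().sum())
```

Because 'unknown' becomes NaN, the experience column can be read as numbers instead of text.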