

I have been thinking about writing some more useful posts from a data wrangling perspective. In our daily work as data scientists, data wrangling takes most of the time (I would dare say about 80-90% of the time is gone before I even consider fitting any models at all), as you might have quickly realized already. >_<

With the intention of giving you a head start on how to deal with raw data in general, I created the post below with R script in it, so you can copy and paste it directly using, of course, your own dataset :)

In all cases, I tried to have a flow and a goal to work toward (i.e. fitting a neuralnet as the end goal). This should make the steps below a bit easier to follow, at least I hope so :)

Note: if you have not yet installed R + RStudio, you should visit the link below (for Windows) to install them, as well as all the packages used in this post.

Step 1: import/load the raw data into your workspace

You can of course use RStudio's native import function to import your dataset. Or you can use R script like this:

    library(readr)  # the library you need to import data from csv

The first six rows of the data:

    1 1 (none) 0 mobile  1 21 519 1 13 NonPurchaser
    2 2 (none) 0 tablet  1 15 429 1  5 NonPurchaser
    3 3 (none) 0 desktop 1  7 116 1  4 NonPurchaser
    4 4 (none) 0 mobile  1  8 174 1  7 NonPurchaser
    5 5 (none) 0 desktop 1  7 825 1  3 NonPurchaser
    6 6 (none) 0 desktop 1  0 298 1 10 NonPurchaser

Notice that you have 3 categorical columns (data type = chr; columns = medium, Device and Label). It would be interesting to see the unique values inside each categorical column, starting with medium as an example.

    unique(data$medium)  # use unique to find out what the values in this column are

    "(none)" NA "Display" "Email" "cpc" "email" "organic" "referral" "banner" "partner"

Step 3: clean up the column 'medium' a little bit

Change all capital Email -> email (or email -> Email); here I choose to move Email -> email, and (none) -> direct to avoid brackets, as the R script below shows.

Note: data of type chr is actually 'characters' (or strings). You only know the column is categorical after you examine it. While the column is still chr, you can run the following R script (tip: run it before you change the column type to factor).

    # replace (none) with direct and Email with email; use this while the column is of class chr
    data$medium[data$medium %in% "(none)"] <- 'direct'
    data$medium[data$medium %in% "Email"] <- 'email'
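The inspection and cleanup of the categorical columns described in this post can be sketched end to end as follows. This is only a minimal sketch under assumptions: the small data frame is a made-up stand-in for the imported dataset (it reuses a few values from the post and keeps only the three chr columns), and `chr_cols` is a helper name introduced here for illustration.

```r
# Toy stand-in for the imported data (assumption: illustrative rows only,
# restricted to the three character columns mentioned in the post).
data <- data.frame(
  medium = c("(none)", NA, "Display", "Email", "cpc", "email", "organic"),
  Device = c("mobile", "tablet", "desktop", "mobile", "desktop", "desktop", "mobile"),
  Label  = rep("NonPurchaser", 7),
  stringsAsFactors = FALSE
)

# Find every character column, then list the unique values of each one.
chr_cols <- names(data)[sapply(data, is.character)]
lapply(data[chr_cols], unique)

# Recode while the column is still chr. %in% returns FALSE for NA,
# so missing values are left untouched.
data$medium[data$medium %in% "(none)"] <- "direct"
data$medium[data$medium %in% "Email"]  <- "email"

# Only after the recoding, convert the categorical columns to factors.
data[chr_cols] <- lapply(data[chr_cols], factor)

levels(data$medium)
```

The order matters: if `medium` were already a factor, assigning "direct" would give NA with a warning, because "direct" is not one of the existing levels. Recoding the strings first and converting afterwards avoids that.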
