Saturday, November 19, 2011

R-blog post: fine-tuning read.csv for NAs and other issues

The blog indiacrunchin has a helpful (if terse) post on how to deal with tricky data read-in issues.

The key bits of code:

"stringsAsFactors = FALSE mitigates character columns being turned into factor type"
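To see what that argument changes, here is a minimal sketch that reads the same made-up data twice. The inline text and the names csv_text, as_factor and as_char are my own placeholders, not from the original post. Note that since R 4.0.0 the default is already stringsAsFactors = FALSE, so setting it explicitly mainly matters on older versions.

```r
# Hypothetical inline data standing in for a CSV file
csv_text <- "name,value
alpha,1
beta,2"

# Same data read twice, toggling only stringsAsFactors
as_factor <- read.csv(text = csv_text, stringsAsFactors = TRUE)
as_char   <- read.csv(text = csv_text, stringsAsFactors = FALSE)

class(as_factor$name)  # "factor"
class(as_char$name)    # "character"
```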

Tell R which values in the dataset should be treated as NA, e.g. "na.strings = c("","999","—-","MISS")"
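A small sketch of the effect, using invented data and my own placeholder codes ("", "999" and "MISS") rather than the original post's file. Without na.strings, the stray "MISS" forces the whole score column to character; with it, the column is read as numbers with NAs. One thing to keep in mind: na.strings applies to every column, so a legitimate 999 anywhere in the file would also become NA.

```r
# Hypothetical inline data: 999 and MISS are placeholder missing-value codes
csv_text <- "id,score,group
1,10,a
2,999,b
3,MISS,
4,20,c"

# Default handling: only the literal string "NA" is treated as missing
raw <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Declare the extra missing-value codes up front
clean <- read.csv(text = csv_text, stringsAsFactors = FALSE,
                  na.strings = c("", "999", "MISS"))

sapply(raw, class)      # score comes in as character because of "MISS"
sapply(clean, class)    # score is now read as a number
colSums(is.na(clean))   # count of NAs per column
```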

Not sure exactly what he means here:
"function argument colClasses predefine the column types in the input file"
inData <- read.csv("inputData.csv", header = TRUE, colClasses = c("numeric", "character"))

sapply(inData,class)
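My reading of the colClasses quote: instead of letting read.csv guess each column's type from its contents, you state the types yourself, one per column in file order. A sketch with made-up data (csv_text and the zip/amount columns are my own illustration, not from the original post) shows why that can matter: a guessed ZIP code column is read as a number and loses its leading zero.

```r
# Hypothetical inline data: ZIP codes with a leading zero
csv_text <- "zip,amount
02139,12.5
10001,7"

# Let read.csv guess the column types
guessed <- read.csv(text = csv_text)

# Declare the types, in column order
declared <- read.csv(text = csv_text,
                     colClasses = c("character", "numeric"))

sapply(guessed, class)   # zip is guessed to be an integer
sapply(declared, class)  # zip stays character, amount is numeric
declared$zip             # leading zero preserved
```

If the file has more columns than colClasses has entries, an unnamed vector is recycled across them, so it is safer to list every column.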


