Why do we dummify variables for linear regressions?
1) Why do you want to convert race into numbers? I'm assuming you want to fit something like a regression model, is that correct? I'm going to assume you're asking how to handle "categorical data" (categories such as different races) in regression. You do need numerical variables, and you could just assign a number to each race. But if you choose White=1, Black=2, Asian=3, does it really make sense that the distance between White and Black is exactly half the distance between White and Asian? And is that ordering even correct? Probably not. Instead, what you do is create dummy variables. Let's say you have just those three races. Then you create two dummy variables: White and Black. You could also use White and Asian, or Black and Asian; the key is that you always create one fewer dummy variable than there are categories, and the category left out serves as the baseline. Now, the White variable is 1 if the individual is white and is 0 otherwise, and the Black variable is 1 if the individual is bl...
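The dummy coding described above can be sketched in a few lines of plain Python. This is only an illustration: the function name `dummy_encode`, the sample list `races`, and the choice of Asian as the omitted (reference) category are all assumptions made for the example, not part of the original answer.

```python
def dummy_encode(values, reference):
    """Return one 0/1 column per category, skipping the reference category.

    With k distinct categories this yields k - 1 dummy columns.
    """
    categories = sorted(set(values))
    if reference not in categories:
        raise ValueError(f"reference {reference!r} is not among the categories")
    # Every category except the reference gets its own indicator column.
    kept = [c for c in categories if c != reference]
    return {c: [1 if v == c else 0 for v in values] for c in kept}


# Hypothetical sample data: three categories, so two dummy variables.
races = ["White", "Black", "Asian", "Black", "White"]
dummies = dummy_encode(races, reference="Asian")

for name, column in dummies.items():
    print(name, column)
```

An individual in the reference category has 0 in every dummy column, so no information is lost by leaving that category out.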
