Today I had a crash course in statistical analysis from Yanwen, a member of the team who recently immigrated here from China and is basically a computer genius. She is starting to analyze the data set I spent the last two weeks helping collect, organize, and clean. The program she uses, SAS, has its own programming language full of semicolons and weird abbreviations. Apparently there are much more user-friendly programs, but SAS is for “hardcore” statisticians who need to make lots of comparisons across different variables and be very precise in judging the quality of the data.
So, this afternoon she took me through the process of checking for inconsistencies in the data. She started by comparing the individuals who never got fully interviewed with those who did, to see if there were significant differences based on several factors (age, race, gender, etc.). The only real difference was in the grade of people’s tumors, which makes sense: people with more severe tumors are less likely to agree to sit through a forty-five-minute interview, and even if they do agree, they might not get the chance before they (as the medical charts put it) “expire.” She judged the significance of the difference using a chi-squared test, which we covered in AP Bio, so I actually kind of knew what was going on at that point. Then Yanwen started doing “logistic regression” and I was totally lost. But tomorrow she says she will let me try out some of the coding, much of which consists of pretty straightforward if/then statements. The program is friendly in that it says “ERROR!” in bright red if you make a typo, so that should be helpful.
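For anyone curious what a chi-squared test actually computes, here is a minimal Python sketch of Pearson's chi-squared test of independence on a 2×2 table. It is not SAS and not Yanwen's code, and the counts are invented purely for illustration: the rows stand for hypothetical "fully interviewed" and "not fully interviewed" groups, and the columns for hypothetical "lower grade" and "higher grade" tumors. The function name chi_squared_test is my own. The p-value shortcut using math.erfc only works for one degree of freedom (a 2×2 table); bigger tables would need a general chi-squared distribution function.

```python
import math


def chi_squared_test(table):
    """Pearson chi-squared test of independence for a contingency table.

    `table` is a list of rows of observed counts. Returns the statistic,
    the degrees of freedom, and a p-value (only when there is exactly one
    degree of freedom; otherwise None).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count we would expect if group and tumor grade were unrelated
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected

    dof = (len(table) - 1) * (len(table[0]) - 1)
    # With one degree of freedom, P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(statistic / 2)) if dof == 1 else None
    return statistic, dof, p_value


# Hypothetical counts, invented for illustration only:
#                  lower grade  higher grade
# interviewed          40           10
# not interviewed      20           30
stat, dof, p = chi_squared_test([[40, 10], [20, 30]])
print(f"chi-squared = {stat:.2f}, df = {dof}, p = {p:.6f}")
```

The bigger the gap between the observed and expected counts, the bigger the statistic. With one degree of freedom, anything above about 3.84 counts as significant at the usual 5% level.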
It was cool to see more of the purely mathematical side of the project. This team spans such a wide range of skill sets, from interacting with grieving family members to interacting with spreadsheets, and it’s great to get to see how all the different elements work together.