This report corresponds to the project for the class Stat597A Bioinformatics for High Throughput Experiments, taught in the Spring semester of 2014. The project is a collaboration between Yong Jung (YJ) and Hannier Pulido (HP).
The data analysis workflow is outlined below, together with the person or people responsible for each step:
- Accessing the CEL files (YJ & HP)
- Quality assessment of the raw data (HP)
    - RNA degradation plots
    - Boxplots
    - Density plots
- Data preprocessing (YJ & HP)
    - Background correction and normalization
    - Filtering of absent genes
    - Removal of outliers and unidentified groups
- Preliminary analysis (YJ)
    - K-means clustering
- Quality assessment of the filtered data (HP)
    - Boxplots
    - Density plots
    - MVA (M versus A) plots
- Exploratory analysis (YJ & HP)
    - K-means clustering
    - Hierarchical clustering analysis (HCA)
    - Principal component analysis (PCA)
- Random forest for feature selection (HP)
- Visualization of results (YJ & HP)
- Functional enrichment analysis using Gene Ontology (YJ)
- Literature review using the GO terms (YJ & HP)
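The report does not show its code at this point. As a rough illustration of what three of the steps above involve, here is a minimal Python/NumPy sketch. It is not the authors' pipeline. It makes three assumptions. Normalization is done by quantile normalization, the normalization step used in RMA. "Filtering absent genes" is approximated by a hypothetical intensity threshold, whereas proper absent/present filtering relies on detection calls computed from the CEL files. Exploratory PCA is done via SVD. The function names, the threshold, and the toy data are all hypothetical.

```python
import numpy as np

def quantile_normalize(x):
    """Quantile-normalize the columns (arrays) of a genes x arrays matrix.

    Every column is mapped onto the mean of the sorted columns, so all arrays
    end up with the same empirical distribution. Ties are broken by row order.
    """
    order = np.argsort(x, axis=0, kind="stable")    # row indices that sort each column
    ranks = np.argsort(order, axis=0)               # rank of each entry within its column
    reference = np.sort(x, axis=0).mean(axis=1)     # mean sorted distribution
    return reference[ranks]

def filter_low_expression(x, threshold, min_arrays):
    """Hypothetical stand-in for absent-gene filtering: keep genes whose
    log2 expression exceeds `threshold` on at least `min_arrays` arrays."""
    keep = (x > threshold).sum(axis=1) >= min_arrays
    return x[keep], keep

def pca_scores(x, n_components=2):
    """Project the arrays (columns of x) onto the top principal components via SVD."""
    samples = x.T                                   # arrays as observations, genes as variables
    centered = samples - samples.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    explained = s**2 / np.sum(s**2)                 # fraction of variance per component
    return u[:, :n_components] * s[:n_components], explained[:n_components]

# Toy data: 500 probe sets x 6 arrays of simulated intensities (not real CEL data).
rng = np.random.default_rng(0)
raw = rng.lognormal(mean=6, sigma=1, size=(500, 6))
expr = quantile_normalize(np.log2(raw))
filtered, kept = filter_low_expression(expr, threshold=8.0, min_arrays=3)
scores, var = pca_scores(filtered)
print(filtered.shape, scores.shape, var)
```

Quantile normalization is used here because it is simple to express with two `argsort` calls. Background correction and probe-level summarization, the other parts of RMA, are omitted. The PCA scores (one row per array) are what would typically be plotted to look for outlying arrays or groups.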