Some Examples From My Experience
Data Flow
Gene sequencing --> base calling

Data organization -- terabytes of data, annual doubling, various users

Sequence/quality score extraction, filtering, cleanup
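
As a rough illustration of the extraction and filtering step, here is a minimal Python sketch. It assumes reads arrive in four-line FASTQ records with Phred+33 quality encoding; the function names and the thresholds (mean quality 20, minimum length 4) are hypothetical choices for the example, not the pipeline actually used.

```python
from statistics import mean


def parse_fastq(lines):
    """Yield (name, sequence, quality_string) from 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        header = header.strip()
        if not header:
            continue  # skip blank lines between records
        seq = next(it, "").strip()
        next(it, "")  # the '+' separator line, ignored here
        qual = next(it, "").strip()
        yield header[1:], seq, qual  # drop the leading '@' from the name


def phred_scores(qual, offset=33):
    """Convert a quality string to integer scores (Phred+33 assumed)."""
    return [ord(c) - offset for c in qual]


def filter_reads(records, min_mean_q=20, min_len=4):
    """Keep reads that are long enough and have adequate mean quality."""
    for name, seq, qual in records:
        scores = phred_scores(qual)
        if len(seq) >= min_len and mean(scores) >= min_mean_q:
            yield name, seq, scores


if __name__ == "__main__":
    example = [
        "@read1", "ACGTACGT", "+", "IIIIIIII",  # high quality, kept
        "@read2", "ACGTACGT", "+", "########",  # low quality, dropped
        "@read3", "AC", "+", "II",              # too short, dropped
    ]
    for name, seq, scores in filter_reads(parse_fastq(example)):
        print(name, seq, round(mean(scores), 1))
```

A real cleanup stage would typically also trim low-quality ends and remove vector or adapter sequence; this sketch only shows the whole-read filter.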

Assembly of short sequences into models of genetic expression & variability

  • between individuals (e.g., SNPs)
  • between tissues (splice variants)
  • between healthy & diseased tissue

Annotation -- what is it or what is it like?
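
To make the "between individuals (e.g., SNPs)" comparison concrete, here is a toy Python sketch that lists single-base differences between two sequences. It assumes the sequences are already aligned to equal length, with "-" marking gaps; the function name is hypothetical. Real SNP detection works from many overlapping reads and their quality scores, which this example does not attempt.

```python
def snp_candidates(seq_a, seq_b, gap="-"):
    """Return (position, base_a, base_b) for each single-base mismatch.

    Positions are 0-based. Columns containing a gap are skipped, since
    they represent insertions or deletions rather than substitutions.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (i, a, b)
        for i, (a, b) in enumerate(zip(seq_a, seq_b))
        if a != b and a != gap and b != gap
    ]


if __name__ == "__main__":
    individual_1 = "ACGTACGT"
    individual_2 = "ACCTACGA"
    for pos, a, b in snp_candidates(individual_1, individual_2):
        print(f"position {pos}: {a} vs {b}")
```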

