TCGA data mining
- TCGA data is re-aligned to a ref genome (eg. hg38) before being stored in to repo.
- Project means different cancers. Case means different patients. File means sequencing measurements. So each case can have many files. Projects are the highest level of data model.
- Apply different filters to select the kind of data we are intersted.
- Metadata and Manifest can be downloaded and used in gdc-client or python API or Bioconductor API