• TCGA data is re-aligned to a ref genome (eg. hg38) before being stored in to repo.
  • Project means different cancers. Case means different patients. File means sequencing measurements. So each case can have many files. Projects are the highest level of data model.
  • Apply different filters to select the kind of data we are intersted.
  • Metadata and Manifest can be downloaded and used in gdc-client or python API or Bioconductor API