Data section:
Table view - All 32384 entries of the RNA-Seq database are listed in a table. By default, 50 entries are displayed per page; this can be changed to 100, 500 or 1000 entries per page. Furthermore, the table can be sorted by Symbol, Transcript, Entrez Gene ID or Ensembl Gene ID.
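The paging and sorting behaviour of the table view can be sketched in Python as follows. This is a minimal illustration, not the atlas's server-side implementation. The `Entry` record and its field names are hypothetical stand-ins for the four sortable columns.

```python
from dataclasses import dataclass

PAGE_SIZES = (50, 100, 500, 1000)  # page sizes offered by the table view


@dataclass
class Entry:
    """Hypothetical row type; fields mirror the four sortable columns."""
    symbol: str
    transcript: str
    entrez_gene_id: int
    ensembl_gene_id: str


# Column label shown to the user -> attribute used as the sort key.
SORT_COLUMNS = {
    "Symbol": "symbol",
    "Transcript": "transcript",
    "Entrez Gene ID": "entrez_gene_id",
    "Ensembl Gene ID": "ensembl_gene_id",
}


def page(entries, page_number=1, page_size=50, sort_by="Symbol"):
    """Return one 1-based page of entries, sorted by the chosen column."""
    if page_size not in PAGE_SIZES:
        raise ValueError(f"page_size must be one of {PAGE_SIZES}")
    attr = SORT_COLUMNS[sort_by]
    ordered = sorted(entries, key=lambda e: getattr(e, attr))
    start = (page_number - 1) * page_size
    return ordered[start:start + page_size]


def page_count(n_entries, page_size=50):
    """Number of pages needed; a partially filled last page still counts."""
    return -(-n_entries // page_size)  # ceiling division
```

With the default of 50 entries per page, the 32384 entries span 648 pages. At 1000 entries per page, they span 33 pages.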
Full text search - Single or multiple gene names or IDs can be entered into the query form. After the query is submitted, the matching gene(s) are displayed in the result table.
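A multi-term lookup of this kind can be sketched in Python as below. The text only states that one or more gene names or IDs may be entered. The accepted separators, the exact (rather than partial) matching and the field names are assumptions made for illustration.

```python
import re

# Hypothetical field names for the searchable columns.
SEARCH_FIELDS = ("symbol", "transcript", "entrez_gene_id", "ensembl_gene_id")


def parse_query(query: str) -> list[str]:
    """Split the query into terms separated by commas, semicolons or whitespace."""
    return [term for term in re.split(r"[,;\s]+", query.strip()) if term]


def search(entries: list[dict], query: str) -> list[dict]:
    """Return entries whose symbol or any ID exactly matches a query term (case-insensitive)."""
    terms = {term.upper() for term in parse_query(query)}
    return [entry for entry in entries
            if any(str(entry.get(field, "")).upper() in terms for field in SEARCH_FIELDS)]
```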
Comparison of specific tissue profiles - Explore the individual expression patterns of genes in normal or pathological tissues. RNA-Seq expression values are given as RPKM, and microarray expression values as Z-Scores. A Z-Score >= 5 may suggest that the gene is expressed in the corresponding tissue.
Explore common (and diverse) gene expression profiles between tissues - This query interface provides information about genes shared between different tissues.
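One way to compute shared and tissue-restricted genes is with set operations, as in the Python sketch below. Treating a gene as "expressed" above an RPKM cut-off is an assumption. The default of 1.0 RPKM is an arbitrary illustrative value, not a threshold taken from the atlas.

```python
def expressed_genes(rpkm: dict[str, dict[str, float]], tissue: str,
                    min_rpkm: float = 1.0) -> set[str]:
    """Genes whose RPKM in `tissue` reaches the (hypothetical) cut-off."""
    return {gene for gene, by_tissue in rpkm.items()
            if by_tissue.get(tissue, 0.0) >= min_rpkm}


def compare_tissues(rpkm: dict[str, dict[str, float]], tissues: list[str],
                    min_rpkm: float = 1.0):
    """Return (genes expressed in all selected tissues,
    {tissue: genes expressed only there among the selected tissues})."""
    if not tissues:
        raise ValueError("select at least one tissue")
    sets = {t: expressed_genes(rpkm, t, min_rpkm) for t in tissues}
    common = set.intersection(*sets.values())
    unique = {t: genes - set().union(*(other for u, other in sets.items() if u != t))
              for t, genes in sets.items()}
    return common, unique
```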
Explore translational profile - Three lists are provided, containing all entries of the gene, pathway and ontology categories represented in RNA-Seq Atlas. Single or multiple entries of one list can be selected, and selections can also be combined across categories.
For example, the user can select one or more KEGG pathways to obtain a list of the genes involved.
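The selection logic can be sketched in Python with plain sets. The text does not specify how the atlas combines selections from different categories. This sketch assumes that entries selected within one category are merged (union) and that the per-category results are then intersected. The `annotations` layout is likewise hypothetical.

```python
def genes_for_selection(annotations: dict[str, dict[str, set[str]]],
                        selection: dict[str, list[str]]) -> set[str]:
    """
    annotations: category -> entry -> genes, e.g. {"pathway": {"<KEGG pathway>": {...}}}
    selection:   category -> entries chosen by the user
    Within a category the chosen entries are merged (union); across categories
    the results are intersected. Both rules are assumptions for illustration.
    """
    per_category = []
    for category, entries in selection.items():
        genes: set[str] = set()
        for entry in entries:
            genes |= annotations[category][entry]
        per_category.append(genes)
    return set.intersection(*per_category) if per_category else set()
```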
Download section:
Download the RNA-Seq Atlas as a tab-separated text file. All data are open access.
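A downloaded file can be read with Python's standard `csv` module. The column names used in this sketch are hypothetical, because the text does not specify the layout of the export. Only the tab delimiter is taken from the description.

```python
import csv
from typing import Iterable


def read_atlas_tsv(lines: Iterable[str]) -> list[dict[str, str]]:
    """Read a tab-separated export into one dict per row, keyed by the header line."""
    return list(csv.DictReader(lines, delimiter="\t"))


def column_as_float(rows: list[dict[str, str]], column: str) -> list[float]:
    """Convert one numeric column (for example a tissue's RPKM values) to floats."""
    return [float(row[column]) for row in rows]
```

To read the file, open it with `newline=""`, as the `csv` documentation recommends, and pass the file object to `read_atlas_tsv`.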
RNA-Seq:
The provided genome-wide expression compendium originates from eleven healthy human tissue samples, each pooled from multiple donors. It spans 32384 transcripts corresponding to 21399 unique genes. The tissues include adipose, colon, heart, hypothalamus, kidney, liver, lung, ovary, skeletal muscle, spleen and testes. Total RNA from the reference tissues was purchased from Ambion (Austin, USA). Libraries were prepared as described in Armour et al. (2009), including both poly[A]+ and poly[A]- fractions.
Sequencing was performed on an Illumina GA-II sequencer, generating an average of 50 million reads per tissue with read lengths of 36 nt or 50 nt, depending on the tissue. The reads were deposited at EMBL (ENA ERP000257; ArrayExpress E-MTAB-305). The reads were trimmed to a common length of 28 nt to avoid aligning sequences of amplified primers. They were then aligned to the human hg18 genome assembly using BWA (Li et al., 2009). For mRNAs, RefSeq transcript coordinates and associated gene symbols were downloaded from the UCSC genome browser. Only reads mapping to a single gene were used. Next, we determined the reads overlapping each transcript in the correct genomic orientation. Expression levels were estimated by mapping and counting reads against single-gene sequences derived from the UCSC genome browser, followed by normalization to RPKM (reads per kilobase of transcript per million mapped reads). For ncRNAs, the corresponding genomic coordinates were downloaded from two tracks of the UCSC genome browser (assembly hg18): RNA Genes and sno/miRNAs. For this analysis, pseudogenes, miRNAs, tRNAs and rRNAs were removed. Genes labeled as “related”, such as 7SK and 7SK-related, were combined into a single cluster while preserving all genomic locations (Castle et al., 2010).
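The counting and normalization steps can be illustrated with a simplified Python sketch. It makes the following assumptions:
- Alignments are already available as start positions of trimmed 28 nt reads.
- Primer-derived bases sit at the 3' end of a read, so trimming keeps the first 28 nt.
- Overlap is tested against the whole transcript span rather than individual exons.

The `Alignment` and `Transcript` types are hypothetical. The sketch uses the standard RPKM definition, RPKM = 10^9 × C / (N × L), where C is the number of reads assigned to a transcript, N is the total number of mapped reads and L is the transcript length in nucleotides.

```python
from collections import Counter
from dataclasses import dataclass

READ_LENGTH = 28  # common read length after trimming, as stated above


@dataclass(frozen=True)
class Alignment:
    """Hypothetical record for one aligned, trimmed read."""
    chrom: str
    start: int   # 0-based leftmost position
    strand: str  # "+" or "-"


@dataclass(frozen=True)
class Transcript:
    """Hypothetical annotation record (e.g. built from UCSC RefSeq coordinates)."""
    name: str
    gene: str
    chrom: str
    start: int   # 0-based, half-open span [start, end)
    end: int
    strand: str
    length: int  # exonic length in nt, used for RPKM


def trim(seq: str, length: int = READ_LENGTH) -> str:
    """Keep the first `length` nt, assuming primer-derived bases sit at the 3' end."""
    return seq[:length]


def overlapping(aln: Alignment, transcripts: list[Transcript]) -> list[Transcript]:
    """Transcripts on the same chromosome and strand whose span overlaps the read."""
    read_end = aln.start + READ_LENGTH
    return [t for t in transcripts
            if t.chrom == aln.chrom and t.strand == aln.strand
            and aln.start < t.end and read_end > t.start]


def count_reads(alignments, transcripts) -> Counter:
    """Count reads per transcript, skipping reads that touch more than one gene."""
    counts = Counter()
    for aln in alignments:
        hits = overlapping(aln, transcripts)
        if hits and len({t.gene for t in hits}) == 1:
            for t in hits:
                counts[t.name] += 1
    return counts


def rpkm(count: int, length_nt: int, total_mapped: int) -> float:
    """Reads per kilobase of transcript per million mapped reads."""
    return count * 1e9 / (length_nt * total_mapped)
```

For example, 500 reads on a 2,000 nt transcript in a library of 50 million mapped reads give an RPKM of 5.0. In this sketch, a read that overlaps several isoforms of the same gene is counted for each isoform. The text does not say how that case was handled in the atlas.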
Microarray:
Multiple microarray experiments, representing normal and disease (cancer) states, were included in RNA-Seq Atlas to enable an integrative, detailed comparison between RNA-Seq and microarray expression profiles. The BioGPS and 'Normal Tissue Gene Expression Study' gene profiles were used as the basis for the normal state, and the NCI60 gene profiles for the pathological state.
To ensure homogeneous and comparable gene expression profiles, only microarrays from the Affymetrix Human Genome U133A Array platform were integrated into RNA-Seq Atlas. The Human Genome U133 (HG-U133) Set contains almost 45,000 probe sets representing more than 39,000 transcripts derived from approximately 33,000 well-substantiated human genes. The data were downloaded in CEL file format from NCBI's Gene Expression Omnibus (GEO). Furthermore, the normal tissue arrays were limited to tissues represented in RNA-Seq Atlas. In total, 16 gene expression profiles from the BioGPS project, 9 from the 'Normal Tissue Gene Expression Study' and 59 from NCI60 were included (the LC:NCI_H23 microarray experiment failed; Supplementary Table 1). The normal-state tissues are colon, heart, hypothalamus, kidney, liver, lung, ovary, skeletal muscle, spleen and testes. The cancer tissues are breast, CNS, colon, kidney, leukemia, lung, melanoma, ovary and prostate.
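The array selection described above amounts to a few filters over the sample annotations, sketched here in Python. The sample-sheet keys are hypothetical. GPL96 is the GEO platform accession of the HG-U133A array.

```python
# Normal tissues shared with the RNA-Seq compendium, as listed above.
NORMAL_TISSUES = {"colon", "heart", "hypothalamus", "kidney", "liver", "lung",
                  "ovary", "skeletal muscle", "spleen", "testes"}

HG_U133A = "GPL96"  # GEO platform accession of the Affymetrix HG-U133A array


def select_arrays(samples, failed=frozenset()):
    """
    samples: iterable of dicts with hypothetical keys
             'id', 'platform', 'study', 'state' ('normal' or 'cancer') and 'tissue'.
    Keeps HG-U133A arrays, drops failed experiments and, for the normal state,
    drops tissues that are not part of the RNA-Seq compendium.
    """
    kept = []
    for sample in samples:
        if sample["platform"] != HG_U133A or sample["id"] in failed:
            continue
        if sample["state"] == "normal" and sample["tissue"] not in NORMAL_TISSUES:
            continue
        kept.append(sample)
    return kept
```

For the selection described above, counting the kept arrays per study should give 16 BioGPS, 9 'Normal Tissue Gene Expression Study' and 59 NCI60 arrays, 84 in total.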
The analyses were implemented in R using the Bioconductor packages affy, hgu133a.db and fRMA (McCall et al., 2011; McCall et al., 2010).
Processing Affymetrix HG U133A:
After the gene expression profiles were loaded into an AffyBatch object, the phenotype data were assigned. Next, background correction, normalization and summarization were performed by applying the frma function of the fRMA package to the AffyBatch with default options.
Z-Score Transformation:
The Z-Score transformation was calculated using the barcode function of the fRMA package (McCall et al., 2011; McCall et al., 2010) to standardize the gene values of the microarray data. The barcode options were set to the corresponding platform, and the output method was set to 'z-score'. The Z-Scores were then summarized as the mean for each tissue and each state (normal, cancer) and stored in the PostgreSQL database. A Z-Score > 5 suggests that the gene is expressed in that tissue.
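The per-group averaging and the cut-off can be sketched in Python. The sketch assumes that the barcode Z-Scores have already been exported from R as one (gene, tissue, state, z) record per array. That record layout is hypothetical. The cut-off is applied as a strict comparison, following this paragraph.

```python
from collections import defaultdict
from statistics import mean

Z_CUTOFF = 5.0  # cut-off named in the text above


def summarize_z(records):
    """
    records: iterable of (gene, tissue, state, z) tuples, one per array,
             with state in {"normal", "cancer"}.
    Returns {(gene, tissue, state): mean Z-Score over the arrays of that group}.
    """
    groups = defaultdict(list)
    for gene, tissue, state, z in records:
        groups[(gene, tissue, state)].append(z)
    return {key: mean(values) for key, values in groups.items()}


def expressed(summary, cutoff=Z_CUTOFF):
    """Groups whose mean Z-Score exceeds the cut-off (strict comparison)."""
    return {key for key, z in summary.items() if z > cutoff}
```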
Use Case 1: Biomarker re-evaluation:
Suppose a researcher wants to evaluate an identified biomarker (e.g. a cancer biomarker). A keyword search in the RNA-Seq Atlas gives a quick, integrative overview of the tissue-specific expression of the gene of interest in cancer and normal tissues.
This query will provide important information on different layers of translational biology:
Query pipeline:
Thus, the RNA-Seq Atlas reveals important information on both the diagnostic and the therapeutic potential of the identified marker, as a basis for subsequent, more detailed investigations.
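The overview returned by such a keyword query can be sketched as a join of the two data sources. In this minimal Python sketch, `rpkm` and `z_summary` are hypothetical in-memory stand-ins for the RNA-Seq values and the per-tissue mean Z-Scores. The `cancer_only` flag is an illustrative heuristic, not a feature of the atlas.

```python
def biomarker_overview(gene, rpkm, z_summary, z_cutoff=5.0):
    """
    rpkm:      gene -> tissue -> RPKM from the RNA-Seq compendium
    z_summary: (gene, tissue, state) -> mean microarray Z-Score, state in {"normal", "cancer"}
    Returns {tissue: {...}} with None wherever a source has no value for that tissue.
    """
    by_tissue = rpkm.get(gene, {})
    tissues = set(by_tissue) | {t for (g, t, _state) in z_summary if g == gene}
    overview = {}
    for tissue in sorted(tissues):
        z_normal = z_summary.get((gene, tissue, "normal"))
        z_cancer = z_summary.get((gene, tissue, "cancer"))
        overview[tissue] = {
            "rpkm": by_tissue.get(tissue),
            "z_normal": z_normal,
            "z_cancer": z_cancer,
            # Illustrative flag: above the cut-off in cancer but not in normal tissue.
            "cancer_only": (z_cancer is not None and z_cancer > z_cutoff
                            and (z_normal is None or z_normal <= z_cutoff)),
        }
    return overview
```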
Use Case 2: Identification of liver-specific genes:
Suppose a researcher is interested in a transcriptional profile that is highly specific for a healthy tissue and could serve as a diagnostic tool to distinguish healthy from diseased tissue. The researcher can query the RNA-Seq Atlas for such a profile.
This query provides important information on tissue-specific genes that represent normal cell behavior.
Query pipeline:
Thus, the RNA-Seq Atlas provides integrative information on genes that are physiologically and specifically expressed in a given tissue. This enables a context-specific view of genes that can be used to study disease states. Furthermore, pathway-level and functional analyses are available by navigating to the details section.
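A query for tissue-specific genes can be sketched in Python as a filter on the RNA-Seq values. The specificity rule used here has two parts: a minimum RPKM in the target tissue, and a minimum fold change over the highest value in any other tissue. This rule and its default thresholds are hypothetical choices for illustration. The text does not state the criterion the atlas applies.

```python
def tissue_specific_genes(rpkm: dict[str, dict[str, float]], target: str = "liver",
                          min_rpkm: float = 10.0, fold: float = 5.0):
    """
    Return (gene, target RPKM, ratio to the highest other tissue) tuples.
    Hypothetical criterion: RPKM in `target` is at least `min_rpkm` and at least
    `fold` times the highest RPKM among all other tissues.
    """
    hits = []
    for gene, by_tissue in rpkm.items():
        value = by_tissue.get(target, 0.0)
        background = max((v for t, v in by_tissue.items() if t != target), default=0.0)
        if value >= min_rpkm and value >= fold * background:
            ratio = value / background if background > 0 else float("inf")
            hits.append((gene, value, ratio))
    # Most specific first; ties broken by expression level in the target tissue.
    return sorted(hits, key=lambda h: (h[2], h[1]), reverse=True)
```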