RNA-Seq Atlas - Help

Help Sections

Data access and query tools

Material / Data Sources

Methods

Use Cases


Data access and query tools

Data section:
Table view of all 32384 entries in the RNA-Seq database. By default, 50 entries are displayed per page; this can be changed to 100, 500 or 1000 entries per page. The table can also be sorted by Symbol, Transcript, Entrez Gene ID or Ensembl Gene ID.
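The paging and sorting behaviour of the Data section can be illustrated with a small Python sketch. This is not the Atlas implementation. Rows are assumed to be dictionaries, and the field names in `SORT_KEYS` are hypothetical stand-ins for the four sortable columns.

```python
import math
from typing import Any

# Page sizes offered by the Data section (50 is the default).
PAGE_SIZES = (50, 100, 500, 1000)
# Hypothetical field names for the sortable columns
# (Symbol, Transcript, Entrez Gene ID, Ensembl Gene ID).
SORT_KEYS = ("symbol", "transcript", "entrez_gene_id", "ensembl_gene_id")


def page_count(n_entries: int, page_size: int = 50) -> int:
    """Number of pages needed to display n_entries rows."""
    return math.ceil(n_entries / page_size)


def get_page(rows: list[dict[str, Any]], page: int, page_size: int = 50,
             sort_by: str = "symbol") -> list[dict[str, Any]]:
    """Sort rows by one column and return the requested 1-based page."""
    if page_size not in PAGE_SIZES:
        raise ValueError(f"page_size must be one of {PAGE_SIZES}")
    if sort_by not in SORT_KEYS:
        raise ValueError(f"sort_by must be one of {SORT_KEYS}")
    if page < 1:
        raise ValueError("page numbers start at 1")
    ordered = sorted(rows, key=lambda row: row[sort_by])
    start = (page - 1) * page_size
    return ordered[start:start + page_size]
```

At the default page size, the 32384 entries fill `page_count(32384)` = 648 pages. At 1000 entries per page they fill 33 pages.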

Search section:

  • Full-text search - Enter one or more gene names or IDs into the query form. After the query is submitted, the matching gene(s) are displayed in the result table.

  • Comparison of specific tissue profiles - Explore the expression patterns of individual genes in normal or pathological tissues. RNA-Seq expression values are given as RPKM and microarray expression values as Z-Scores. A Z-Score >= 5 suggests that the gene is expressed in the corresponding tissue.

  • Explore common (and differing) gene expression profiles between tissues - This query interface shows which genes are shared between different tissues.

  • Explore translational profile - Three lists are provided, containing all entries of the gene, pathway and ontology categories represented in RNA-Seq Atlas. Single or multiple entries can be selected from one list, and selections can also be combined across categories.

  • All gene, pathway and ontology entries are listed in this query form. For example, selecting one or more KEGG pathways returns a list of the genes involved.
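A Python sketch can show the idea behind the "common and differing profiles" query: find the genes expressed in every selected tissue and the genes expressed in only one of them. The profile layout (gene mapped to per-tissue values), the cutoff and the gene names are hypothetical. The toy values are invented and are not Atlas data.

```python
def expressed_in(profile: dict[str, dict[str, float]], tissue: str,
                 cutoff: float) -> set[str]:
    """Genes whose value in `tissue` reaches the cutoff (missing = 0)."""
    return {gene for gene, values in profile.items()
            if values.get(tissue, 0.0) >= cutoff}


def shared_and_distinct(profile: dict[str, dict[str, float]],
                        tissues: list[str], cutoff: float):
    """Return (genes expressed in all tissues, genes unique to each tissue)."""
    if not tissues:
        raise ValueError("select at least one tissue")
    expressed = {t: expressed_in(profile, t, cutoff) for t in tissues}
    shared = set.intersection(*expressed.values())
    distinct = {}
    for tissue, genes in expressed.items():
        # Union of genes expressed in all the *other* selected tissues.
        others = set().union(*(g for t, g in expressed.items() if t != tissue))
        distinct[tissue] = genes - others
    return shared, distinct


# Invented toy values for illustration only.
toy = {
    "GENE_A": {"liver": 900.0, "kidney": 1.0},
    "GENE_B": {"liver": 500.0, "kidney": 400.0},
    "GENE_C": {"liver": 0.0, "kidney": 300.0},
}
shared, distinct = shared_and_distinct(toy, ["liver", "kidney"], cutoff=10.0)
```

With these toy values, `shared` is `{"GENE_B"}`. In `distinct`, liver maps to `{"GENE_A"}` and kidney to `{"GENE_C"}`.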

Download section:
Download RNA-Seq Atlas as a tab-separated text file. All data are open access.
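The downloaded file can be loaded with Python's standard `csv` module, as in the minimal sketch below. The column names used here are hypothetical because the exact header of the download is not described on this page. The sketch also assumes that the first line holds the column names. Only the columns you list are converted to numbers, so identifiers such as Entrez Gene IDs stay as text.

```python
import csv
import io
from typing import TextIO


def read_atlas(stream: TextIO, numeric_columns: set[str]) -> list[dict[str, object]]:
    """Parse a tab-separated table, converting the named columns to floats."""
    rows: list[dict[str, object]] = []
    for record in csv.DictReader(stream, delimiter="\t"):
        row: dict[str, object] = dict(record)
        for column in numeric_columns:
            if column in row:
                value = record[column]
                # Empty cells become None instead of raising ValueError.
                row[column] = float(value) if value else None
        rows.append(row)
    return rows


if __name__ == "__main__":
    # Hypothetical two-line sample standing in for the real download.
    sample = "symbol\tentrez_gene_id\tliver\nGENE_A\t123\t12.5\n"
    print(read_atlas(io.StringIO(sample), {"liver"}))
```

For the real file, open it with `open(path, newline="")` and pass the file handle instead of the `io.StringIO` sample.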

Material / Data Sources

RNA-Seq:
The genome-wide expression compendium originates from eleven healthy human tissue samples and spans 32384 transcripts corresponding to 21399 unique genes. The tissues are adipose, colon, heart, hypothalamus, kidney, liver, lung, ovary, skeletal muscle, spleen and testes. Total RNA from the reference tissues was purchased from Ambion (Austin, USA) and represents a pool of RNA from multiple donors. Libraries were prepared as described in Armour et al., 2009, including both poly[A]+ and poly[A]- fractions.

Sequencing was performed on an Illumina GA-II sequencer. It generated an average of 50 million reads per tissue, with read lengths of 36 nt or 50 nt depending on the tissue. The reads were deposited at EMBL (ENA ERP000257; ArrayExpress E-MTAB-305). To avoid aligning amplification primer sequences, the reads were trimmed to a common length of 28 nt. They were then aligned to the human hg18 genome assembly using BWA (Li et al., 2009).

For mRNAs, RefSeq transcript coordinates and the associated gene symbols were downloaded from the UCSC genome browser. Only reads mapping to a single gene were used, and for each transcript we determined the reads overlapping it in the correct genomic orientation. Expression levels were estimated by counting the reads mapped to each gene sequence derived from the UCSC genome browser. The counts were then normalized to RPKM (reads per kilobase of transcript per million mapped reads).

For ncRNAs, the genomic coordinates were downloaded from two tracks of the UCSC genome browser (assembly hg18): RNA Genes and sno/miRNAs. Pseudogenes, miRNAs, tRNAs and rRNAs were removed for this analysis. Genes labeled as “related” (e.g. 7SK and 7SK-related) were combined into a single cluster while preserving all genomic locations (Castle et al., 2010).
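The mRNA counting and normalization steps can be illustrated with the minimal Python sketch below. It uses the standard RPKM definition: reads per kilobase of transcript per million mapped reads, i.e. count × 10^9 / (transcript length in bp × total mapped reads). The sketch assumes that orientation checking happens upstream, so that each read arrives as the list of genes it overlaps in the correct orientation. Which read total served as the library size in the original pipeline is not stated here, so `total_mapped_reads` is an assumption.

```python
from collections import Counter


def count_single_gene_reads(read_hits: list[list[str]]) -> Counter[str]:
    """Count reads per gene, keeping only reads that hit exactly one gene.

    read_hits[i] lists the genes that read i overlaps in the correct
    genomic orientation (assumed to be computed upstream).
    """
    return Counter(genes[0] for genes in read_hits if len(genes) == 1)


def rpkm(read_count: int, transcript_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of transcript per million mapped reads."""
    if transcript_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("transcript length and library size must be positive")
    return read_count * 1e9 / (transcript_length_bp * total_mapped_reads)
```

For example, 500 reads on a 2,000 bp transcript in a library of 50 million reads gives `rpkm(500, 2000, 50_000_000)` = 5.0.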

Microarray:
Multiple microarray experiments representing normal and disease (cancer) states were included in RNA-Seq Atlas to enable a detailed, integrative comparison between RNA-Seq and microarray expression profiles. The BioGPS and 'Normal Tissue Gene Expression Study' gene profiles were used as the basis for the normal state and the NCI60 gene profiles for the pathological state.
To ensure homogeneous and comparable gene expression profiles, only microarrays from the Affymetrix Human Genome U133A Array platform were integrated into RNA-Seq Atlas. The Human Genome U133 (HG-U133) Set contains almost 45,000 probe sets representing more than 39,000 transcripts derived from approximately 33,000 well-substantiated human genes. The data were downloaded in CEL file format from the Gene Expression Omnibus (GEO) web interface of the NCBI. Normal tissue arrays were further limited to tissues represented in RNA-Seq Atlas. In total, 16 gene expression profiles from the BioGPS project, 9 from the 'Normal Tissue Gene Expression Study' and 59 from the NCI60 were included (Supplementary Table 1); the LC:NCI_H23 microarray experiment failed. The normal-state tissues are colon, heart, hypothalamus, kidney, liver, lung, ovary, skeletal muscle, spleen and testes. The cancer tissues are breast, CNS, colon, kidney, leukemia, lung, melanoma, ovary and prostate.

Methods

The analyses were implemented in R using the Bioconductor packages affy, hgu133a.db and fRMA (McCall et al., 2010; McCall et al., 2011).

Processing Affymetrix HG U133A:
After the gene expression profiles were loaded into an AffyBatch object, the phenotype data were assigned. Background correction, normalization and summarization were then performed by applying the frma function of the fRMA package to the AffyBatch with default options.

Z-Score Transformation:
The Z-Score transformation was calculated using the barcode function of the fRMA package (McCall et al., 2010; McCall et al., 2011) to standardize the gene values of the microarray data. The barcode options were set to the corresponding platform, and the output method was set to 'z-score'. The Z-Scores were then averaged for each tissue and each state (normal, cancer), and the resulting means were stored in the PostgreSQL database. A Z-Score > 5 suggests that the gene is expressed in that tissue.
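The per-tissue summarization can be illustrated in Python. In RNA-Seq Atlas, the Z-Scores themselves come from the R barcode function and are not recomputed here. The sketch only shows how per-array Z-Scores are averaged for each gene, tissue and state, and then flagged with the > 5 threshold from the text. The record layout `(gene, tissue, state, z)` and the toy values are hypothetical.

```python
from collections import defaultdict
from collections.abc import Iterable
from statistics import mean


def summarize_z(records: Iterable[tuple[str, str, str, float]],
                threshold: float = 5.0) -> dict[tuple[str, str, str], tuple[float, bool]]:
    """Average Z-Scores per (gene, tissue, state) and flag means above threshold."""
    groups: dict[tuple[str, str, str], list[float]] = defaultdict(list)
    for gene, tissue, state, z in records:
        groups[(gene, tissue, state)].append(z)
    summary = {}
    for key, zs in groups.items():
        m = mean(zs)
        summary[key] = (m, m > threshold)  # (mean Z-Score, "expressed" flag)
    return summary


# Invented toy records (two liver arrays, one lung array), not Atlas data.
toy = [
    ("GENE_A", "liver", "normal", 4.0),
    ("GENE_A", "liver", "normal", 8.0),
    ("GENE_A", "lung", "cancer", 2.0),
]
summary = summarize_z(toy)
```

With these values, the liver/normal mean is 6.0 and is flagged as expressed. The lung/cancer mean of 2.0 is not flagged.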

Use Cases

Use Case 1: Biomarker re-evaluation:

Suppose a researcher wants to evaluate an identified biomarker (e.g. a cancer biomarker). A keyword search in RNA-Seq Atlas gives a quick, integrative overview of the tissue-specific expression of the gene of interest in cancer and normal tissue.

This query provides information on different layers of translational biology:

  1. By comparing the expression of the marker in normal and cancerous samples of the tissue of interest, the researcher immediately sees whether the biomarker can help assess pathological states.
  2. By comparing the expression of the biomarker across different tissues, the researcher can estimate its tissue specificity.

Query pipeline:

  1. Open the ‘Fulltext Search’ of RNA-Seq Atlas and enter the biomarker of interest (e.g. EPCAM).
  2. Execute the query and hover the mouse over the expression charts in the result table for a quick preview of the expression.
  3. Proceed to the details section by clicking the magnifying glass icon and compare the expression of the biomarker across tissues.

Thus, RNA-Seq Atlas provides information on both the diagnostic and therapeutic potential of the identified marker, as a basis for subsequent, more detailed investigations.

Use Case 2: Identification of liver specific genes:

A researcher may need a transcriptional profile that is highly specific for a healthy tissue and could serve as a diagnostic tool to distinguish healthy from diseased tissue. Such a profile can be obtained by querying RNA-Seq Atlas.

This query provides information on tissue-specific genes representing normal cell behavior:

  1. By comparing expression in the tissue of interest with expression in the reference tissues, the researcher immediately obtains the genes expressed exclusively in the tissue of interest.

Query pipeline:

  1. Open the ‘Compare specific tissue profile’ search of RNA-Seq Atlas and define appropriate cutoff values (e.g. for a highly specific liver transcriptional profile: liver >= 10 and reference tissues <= 2).
  2. Execute the query.
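The cutoff logic of this query can be illustrated with a Python sketch. It keeps genes whose value in the target tissue is at least `target_min` and whose value in every other tissue is at most `reference_max`, using the 10 and 2 cutoffs from the example above. It is not the Atlas query itself. The profile layout, gene names and toy values are hypothetical.

```python
def tissue_specific(profile: dict[str, dict[str, float]], target: str,
                    target_min: float = 10.0, reference_max: float = 2.0) -> list[str]:
    """Genes at or above target_min in `target` and at or below reference_max elsewhere."""
    hits = []
    for gene, values in profile.items():
        in_target = values.get(target, 0.0) >= target_min
        # Every reference tissue (all tissues except the target) must stay low.
        low_elsewhere = all(v <= reference_max
                            for tissue, v in values.items() if tissue != target)
        if in_target and low_elsewhere:
            hits.append(gene)
    return sorted(hits)


# Invented toy values for illustration only.
toy = {
    "GENE_A": {"liver": 25.0, "lung": 1.0, "heart": 0.5},
    "GENE_B": {"liver": 30.0, "lung": 5.0, "heart": 0.0},
    "GENE_C": {"liver": 3.0, "lung": 0.0, "heart": 0.0},
}
liver_specific = tissue_specific(toy, "liver")
```

Here only `GENE_A` passes. `GENE_B` exceeds the reference cutoff in lung, and `GENE_C` is below the liver cutoff.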

Thus, RNA-Seq Atlas provides integrative information on genes that are physiologically and specifically expressed in a given tissue. This enables a context-specific view of genes that can be used to study disease states. Pathway- and function-level information can be explored further by navigating to the details section.