The project was funded by NSF EAGER grant #1248090.
The hypothesis for this project is presented in "Goff SA (2011) A unifying theory for general multigenic heterosis: energy efficiency, protein metabolism, and implications for molecular breeding. New Phytol 189: 923-937."
Download
- The dataset is 20 libraries (5 strains x 4 tissues).
The strains are two mouse inbreds, C57BL/6J (B6) and BALB/cByJ (Bc), and three B6/Bc hybrids (two young and one old).
The tissues are brain, kidney, liver and muscle.
- The libraries were sequenced by postdoc Qi Cai (Goff lab) in collaboration with the
Arizona Genomics Institute.
- The reads were aligned using the iPlant
cyberinfrastructure and transferred to the Soderlund lab, where further processing was performed with the
Allele Workbench (AW) and the results entered into the mouse AW database.
- The AW software was used to select ASE (allele-specific expression) transcripts,
and the sequence pairs (i.e. the two inbred sequences) were sent to the
Cheng lab to test for folding.
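The 5 strains x 4 tissues design described above can be written out as a short Python sketch. The B6 and Bc labels and the tissue names come from the text; the three hybrid labels and the `strain_tissue` naming scheme are hypothetical and are not the library names used in the AW or TCW databases.

```python
from itertools import product

# B6 and Bc are the inbred labels used in the text.
# The three hybrid labels below are hypothetical placeholders.
STRAINS = ["B6", "Bc", "hybrid_young_1", "hybrid_young_2", "hybrid_old"]
TISSUES = ["brain", "kidney", "liver", "muscle"]


def library_names(strains=STRAINS, tissues=TISSUES):
    """Return one library name per strain/tissue combination."""
    return [f"{strain}_{tissue}" for strain, tissue in product(strains, tissues)]


if __name__ == "__main__":
    libraries = library_names()
    print(len(libraries))  # 20
```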
The data can be queried from the following two Java applets:
- Execute mouse AW v1.0 (Allele Workbench) -- Allele-Specific Expression
- Execute mouse TCW v1.6 (Transcriptome Computational Workbench) -- Differential Expression between libraries
Updated with UniProt and GO Oct-2016. See overview of TCW mouse database.
Resources
- C57BL/6J is the reference sequence (GRCm38.p2), and was downloaded from
GenBank.
- The reference annotation was downloaded from
Ensembl.
- The alternative BALB/cJ (a strain closely related to cByJ) SNPs and Indels were downloaded from the
Sanger Centre.
- The C57BL/6J protein sequences were downloaded from
Ensembl.
Software
- fastx_trimmer and Sickle were used for trimming.
- TopHat aligned the reads to the genome for subsequent variant processing.
Trapnell C, Pachter L, Salzberg SL (2009) TopHat: discovering splice junctions with RNA-Seq. Bioinformatics 25: 1105-1111.
- SAMtools mpileup determined the read coverage for the variants.
Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R (2009) The Sequence Alignment/Map format and SAMtools. Bioinformatics 25: 2078-2079.
- GATK was used to find unique SNPs for cByJ, since the downloaded SNPs and Indels are from the BALB/cJ genome.
DePristo MA, Banks E, Poplin R, Garimella KV, Maguire JR, Hartl C, Philippakis AA, del Angel G,
Rivas MA, Hanna M, McKenna A, Fennell TJ, Kernytsky AM, Sivachenko AY, Cibulskis K, Gabriel SB, Altshuler D,
Daly MJ (2011) A framework for variation discovery and genotyping using next-generation DNA sequencing data.
Nat Genet 43: 491-498.
- Ensembl Variant Effect Predictor determined the effect of the SNPs: (1) consequence (e.g.
missense), (2) SIFT and PolyPhen scores for changes to protein sequences (referred to as 'damaging'
in HW).
McLaren W, Pritchard B, Rios D, Chen Y, Flicek P, Cunningham F (2010) Deriving the consequences of genomic variants with the Ensembl API and SNP Effect Predictor. Bioinformatics 26: 2069-2070.
- STAR aligned the reads to the genome for subsequent read calling, as it reports the
multiply mapped reads that eXpress needs.
Dobin A, Davis CA, Schlesinger F, Drenkow J, Zaleski C, Jha S, Batut P, Chaisson M, Gingeras TR (2013) STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29: 15-21.
- eXpress assigned reads to transcripts, where it called both total reads (for TCW)
and transcript allele reads (for HW).
Roberts A, Pachter L (2013) Streaming fragment assignment for real-time analysis of sequencing experiments. Nat Methods 10: 71-73.
- edgeR computed the differential expression between the reference and alternative counts
for both the HW SNPs and reads, and between TCW transcript libraries.
Robinson MD, McCarthy DJ, Smyth GK (2010) edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26: 139-140.
- The alternative transcripts were created using a script in the RSEM package (the reference transcripts
were downloaded from Ensembl).
Li B, Dewey CN (2011) RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC Bioinformatics 12: 323.
- BEDTools was used for various reformatting tasks.
Quinlan AR, Hall IM (2010) BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26: 841-842.
- The AW pipeline was used to perform the parts of the processing not
covered by the above software packages, e.g. masking the SNPs in the reference genome and converting
the mpileup output to variant coverage numbers. The AW Java build interface was used to compute allele
imbalance and build the database. The AW Java query interface was used to analyze the allele-specific expression.
Freely available at AW.
Soderlund C, Nelson W, Goff S (2014) Allele Workbench: transcriptome pipeline and interactive graphics
for allele-specific expression. PLoS ONE.
- TCW was used to find differential expression between libraries.
Freely available at TCW.
Soderlund C, Nelson W, Willer M, Gang DR (2013) TCW: transcriptome computational workbench. PLoS One 8: e69401.
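Two of the AW pipeline steps mentioned above, masking SNPs in the reference and converting mpileup output to variant coverage numbers, can be illustrated with a minimal Python sketch. This is not the AW implementation: the function names `mask_snps` and `count_alleles` are hypothetical, masking with `N` is an assumption, and the parser only assumes the standard SAMtools pileup read-base encoding (`.` and `,` for reference matches, letters for mismatches, `^` plus a mapping-quality character at read starts, `$` at read ends, and `+n`/`-n` followed by the indel sequence).

```python
import re


def mask_snps(sequence, snp_positions, mask_char="N"):
    """Replace the base at each 1-based SNP position with mask_char."""
    bases = list(sequence)
    for pos in snp_positions:
        if not 1 <= pos <= len(bases):
            raise ValueError(f"SNP position {pos} is outside the sequence")
        bases[pos - 1] = mask_char
    return "".join(bases)


def count_alleles(read_bases, alt):
    """Count reference and alternative allele reads in one pileup base string."""
    # Drop read-start markers ('^' plus one mapping-quality character)
    # and read-end markers ('$'); neither represents a base call.
    cleaned = re.sub(r"\^.", "", read_bases).replace("$", "")
    ref_count = alt_count = 0
    i = 0
    while i < len(cleaned):
        ch = cleaned[i]
        if ch in "+-":
            # Indel: sign, length digits, then that many indel bases to skip.
            j = i + 1
            while j < len(cleaned) and cleaned[j].isdigit():
                j += 1
            i = j + int(cleaned[i + 1:j])
            continue
        if ch in ".,":
            ref_count += 1  # match to the reference on either strand
        elif ch.upper() == alt.upper():
            alt_count += 1  # mismatch equal to the alternative allele
        i += 1
    return ref_count, alt_count


if __name__ == "__main__":
    print(mask_snps("ACGTACGT", [2, 5]))         # ANGTNCGT
    print(count_alleles("..,T^]t$.+2AG,", "T"))  # (5, 2)
```

Bases that match neither the reference nor the given alternative allele (other mismatches, `*` deletion placeholders) are ignored, so the two counts can sum to less than the read depth reported by mpileup.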