Protein domains and color-coded functional SNP predictions in JBrowse

JBrowse is a fast, embeddable genome browser built with JavaScript and HTML5. A JBrowse instance can display many kinds of genomic data, which facilitates research and development in plant breeding. Genetic variation is of particular importance, as it underlies the different phenotypes that exist. Most genetic variation, however, is ‘neutral’, meaning that it does not alter the proteins that genes encode. Some genetic variants do change the amino acids in a protein, which can disrupt or alter gene function.

Using the translated DNA sequence, that is, the amino acid sequence, it is possible to search for conserved domains with a known function using NCBI’s Conserved Domain Database (CDD) search. Knowing which conserved domains are present in a gene aids in determining the function of the gene. For this purpose, I wrote scripts that translate the output of NCBI CD-Search to GFF3 files suitable for display in JBrowse. For this, we need to ‘map’ the domain positions in the protein sequence back to the nucleotide sequence of the gene.
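To make the mapping step concrete, here is a minimal Python sketch. It is not the script used for this instance; the function names, the `protein_match` feature type and the `CDD` source label are assumptions chosen for illustration. It assumes 1-based inclusive coordinates, a protein that starts at the first base of the CDS, and a domain given as an amino acid range. A domain that spans an intron comes out as several genomic pieces, which GFF3 allows to share one ID.

```python
def domain_to_genomic(cds_segments, strand, aa_start, aa_end):
    """Map a 1-based inclusive amino acid range onto genomic intervals.

    cds_segments: list of (start, end) genomic CDS segments, 1-based inclusive.
    Returns genomic (start, end) pieces sorted by position.
    """
    # Offsets into the spliced CDS, 0-based and half-open.
    nt_start = (aa_start - 1) * 3
    nt_end = aa_end * 3
    # Walk the segments in the direction of transcription.
    ordered = sorted(cds_segments, reverse=(strand == "-"))
    pieces = []
    consumed = 0  # number of CDS bases in the segments already walked
    for seg_start, seg_end in ordered:
        seg_len = seg_end - seg_start + 1
        lo = max(nt_start, consumed)
        hi = min(nt_end, consumed + seg_len)
        if lo < hi:  # the domain overlaps this segment
            if strand == "+":
                g_start = seg_start + (lo - consumed)
                g_end = seg_start + (hi - consumed) - 1
            else:
                g_end = seg_end - (lo - consumed)
                g_start = seg_end - (hi - consumed) + 1
            pieces.append((g_start, g_end))
        consumed += seg_len
    return sorted(pieces)


def gff3_lines(seqid, strand, feature_id, name, pieces, source="CDD"):
    """Format the pieces as GFF3 lines; all pieces share one ID."""
    lines = []
    for start, end in pieces:
        attributes = f"ID={feature_id};Name={name}"
        lines.append("\t".join([
            seqid, source, "protein_match",
            str(start), str(end), ".", strand, ".", attributes,
        ]))
    return lines


if __name__ == "__main__":
    # Hypothetical gene with two CDS segments on the minus strand.
    pieces = domain_to_genomic([(101, 109), (201, 209)], "-", 2, 5)
    for line in gff3_lines("ch01", "-", "gene1.domain1", "NB-ARC", pieces):
        print(line)
```

Domain names taken from a search result may contain characters such as `;` or `=` that need escaping in the GFF3 attributes column; the sketch leaves that out.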

Here, I describe a JBrowse instance containing tomato data, with all domain features shown per gene. Let’s take a look at a typical plant resistance gene, or R gene.

Tomato putative R gene

Besides these domain features, I have also included the genetic variation between 360 tomato accessions described in the paper “Genomic analyses provide insights into the history of tomato breeding”. As mentioned previously, it is important to distinguish genetic variants that have an impact on genes from those that are neutral. SnpEff is a software package that predicts the functional effect of variants and categorizes them by their predicted impact on the protein as high, moderate, low or “modifier”, the last of which is neutral. I have color-coded these variants by potential impact:

  • High: different splicing or stop codon gained
  • Moderate: different amino acid
  • Low & modifier: synonymous variant (i.e. not resulting in a changed amino acid sequence)*

(Credits: I got a script for getting this to work from Richards D. Hayes)
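As an illustration of the idea, and not the script credited above, the following Python sketch picks the most severe predicted impact for a variant and maps it to a color. It assumes the VCF was annotated with SnpEff's `ANN` INFO field, in which the impact is the third `|`-separated subfield of each annotation. The color names are placeholders, not the colors used in this instance.

```python
# Severity order of the SnpEff impact categories.
IMPACT_RANK = {"HIGH": 3, "MODERATE": 2, "LOW": 1, "MODIFIER": 0}

# Placeholder colors (assumption); low and modifier share one color,
# following the grouping in the list above.
IMPACT_COLOR = {
    "HIGH": "red",
    "MODERATE": "orange",
    "LOW": "green",
    "MODIFIER": "green",
}


def worst_impact(info):
    """Return the most severe impact found in a VCF INFO string.

    Falls back to "MODIFIER" when no ANN annotation is present.
    """
    for entry in info.split(";"):
        if not entry.startswith("ANN="):
            continue
        impacts = []
        # One variant can carry several comma-separated annotations.
        for annotation in entry[len("ANN="):].split(","):
            subfields = annotation.split("|")
            if len(subfields) > 2 and subfields[2] in IMPACT_RANK:
                impacts.append(subfields[2])
        if impacts:
            return max(impacts, key=IMPACT_RANK.get)
    return "MODIFIER"


def color_for(info):
    """Color for a variant, based on its most severe predicted impact."""
    return IMPACT_COLOR[worst_impact(info)]


if __name__ == "__main__":
    # Hypothetical INFO column with two annotations for the same variant.
    info = ("DP=10;ANN=T|synonymous_variant|LOW|g1|g1|transcript|t1|"
            "protein_coding|1/2|c.3A>T|p.Ala1Ala|3/18|3/18|1/6||,"
            "T|stop_gained|HIGH|g1|g1|transcript|t2|"
            "protein_coding|1/2|c.4C>T|p.Gln2*|4/18|4/18|2/6||")
    print(worst_impact(info), color_for(info))
```

Taking the most severe annotation is a design choice: a variant that is synonymous in one transcript but introduces a stop codon in another is then still flagged as high impact.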

As it is difficult and not desirable to visualize the SNPs of 360 accessions in individual tracks, I have created a custom interface that displays the genetic variation in a pre-sorted table, which can easily be modified or exported. To get this information, right-click on a SNP:

 

This should be the result:

 

If you have made it this far and have not done so already, please take a look at Jbrowse.deenabio.com!

If you are interested in the features described here, please get in touch (thomas|AT|deenabio.com). I provide bioinformatics consultancy services that include making custom JBrowse installations and related tasks.

 

 

Ready-made GBS adapters for Sale

Pre-annealed, equilibrated and tested GBS adapters will be available soon! Please send an email to sales@deenabio.com