Tutorial
The science behind oRNAment
Protein-RNA interactions play important roles in several biological functions, including RNA replication, repair, splicing, polyadenylation, capping, modification, export, localization, stability and degradation. Recent in vitro technologies, such as RNAcompete [1] and RNA Bind-n-Seq (RBNS) [2], have provided exquisite resources for identifying the binding preferences of RNA-binding proteins (RBPs).
We acquired all motifs from RNAcompete in the form of position weight matrices (PWMs) and executed the RBNS algorithm on the sequencing data made available by ENCODE. As most motifs determined by RNAcompete were of length 7, we concentrated on those and set the output motif PWM length of RBNS to 7 as well. Therefore, all motifs in the database are 7 nucleotides long and are comparable (see motifs tab).
We developed a novel algorithm that allows us to scan for these motifs with previously unachieved efficiency (see algorithm tab). We then executed it for each RBP on the complete coding and non-coding transcriptomes of human (GRCh38) and four major model organisms described by Ensembl release 97 (C. elegans (WBcel235), D. rerio (GRCz11), D. melanogaster (BDGP6), M. musculus (GRCm38)). Because this database may help answer interesting evolutionary questions, we scanned for each RBP in all transcripts of every organism, regardless of whether a given RBP is expressed or not. We have shown that our method can reasonably predict the putative binding sites observed by eCLIP [3] in humans (see validation tab).
[1] Ray, D., et al. (2013) A compendium of RNA-binding motifs for decoding gene regulation. Nature.
[2] Lambert, N., et al. (2014) RNA Bind-n-Seq: Quantitative Assessment of the Sequence and Structural Binding Specificity of RNA Binding Proteins. Mol Cell, 54, 887.
[3] Van Nostrand, E., et al. (2016) Robust transcriptome-wide discovery of RNA-binding protein binding sites with enhanced CLIP (eCLIP). Nat Methods, 13.
The search algorithm scans each transcript and, for each subsequence of length 7 (the length of every motif), calculates its matrix similarity score (MSS). The MSS is defined as MSS = (score_current - score_minimum) / (score_maximum - score_minimum). This yields a value between 0 and 1, where 1 corresponds to the canonical motif. The score_current is the product, over all positions of the PWM, of the probability of observing the subsequence's nucleotide at that position. The score_maximum is the product of the maximum probability value at each position of the PWM. The score_minimum is the product of the minimum probability value at each position of the PWM.
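The MSS definition above can be sketched in a few lines of Python. This is only an illustration, not the oRNAment implementation: the PWM is a made-up toy built by the hypothetical helper `toy_pwm`, and the function names are assumptions introduced here.

```python
from math import prod


def toy_pwm(consensus, p=0.85):
    """Build a made-up PWM (hypothetical helper): one dict of nucleotide
    probabilities per position, with probability p on the consensus base."""
    rest = (1 - p) / 3
    return [{nt: (p if nt == base else rest) for nt in "ACGU"} for base in consensus]


def matrix_similarity_score(pwm, kmer):
    """MSS = (score_current - score_minimum) / (score_maximum - score_minimum)."""
    if len(kmer) != len(pwm):
        raise ValueError("k-mer and PWM must have the same length")
    # Product of the probabilities of the observed nucleotides.
    score_current = prod(column[nt] for column, nt in zip(pwm, kmer))
    # Products of the largest and smallest probability at each position.
    score_maximum = prod(max(column.values()) for column in pwm)
    score_minimum = prod(min(column.values()) for column in pwm)
    return (score_current - score_minimum) / (score_maximum - score_minimum)


pwm = toy_pwm("UGCAUGU")  # illustrative consensus, not a database entry
best = matrix_similarity_score(pwm, "UGCAUGU")   # canonical motif
worst = matrix_similarity_score(pwm, "AAAGCAA")  # mismatch at every position
partial = matrix_similarity_score(pwm, "UGCAUGA")  # one mismatch
```

With this toy PWM, the canonical 7-mer scores 1 and a 7-mer that mismatches at every position scores 0; a single mismatch falls in between.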
Independently, the sum of the MSS scores of all 4^7 possible substrings (k-mers) was calculated. A cumulative score, MSS', is calculated by adding the MSS value of each k-mer sequentially. A score, MSS' %, is obtained by dividing, for each k-mer, its MSS' by the sum of all MSS'. A threshold can thus be calculated by taking the MSS scores representing up to 50% of all possible scores, in steps of 5%.
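One possible reading of the thresholding step is sketched below. It rests on two assumptions that the text does not state explicitly: k-mers are ranked from highest to lowest MSS before the cumulative sum is taken, and each cumulative value is expressed as a percentage of the total MSS over all k-mers. The function name `mss_threshold_table` is hypothetical.

```python
from itertools import accumulate


def mss_threshold_table(mss_by_kmer, max_percent=50, step=5):
    """Map each percentage threshold (5, 10, ..., 50) to the set of k-mers
    whose cumulative MSS share does not exceed it.

    Assumption: k-mers are ranked by decreasing MSS before accumulating.
    """
    ranked = sorted(mss_by_kmer.items(), key=lambda item: item[1], reverse=True)
    total = sum(mss for _, mss in ranked)
    cumulative = accumulate(mss for _, mss in ranked)  # running MSS'
    table = {pct: set() for pct in range(step, max_percent + 1, step)}
    for (kmer, _), mss_prime in zip(ranked, cumulative):
        share = 100 * mss_prime / total  # MSS' expressed as a percentage
        for pct in table:
            if share <= pct:
                table[pct].add(kmer)
    return table


# Toy scores for four 3-mers (made-up values, total = 10).
toy_scores = {"AAA": 4, "AAC": 3, "AAG": 2, "AAU": 1}
toy_table = mss_threshold_table(toy_scores)
```

In the toy example only "AAA" (40% of the total) fits under the 40%, 45% and 50% thresholds. This also illustrates why neighbouring thresholds can return identical sets of k-mers.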
As shown in the figure, when searching for a given RBP motif (A) in a specific transcript (B), we can rapidly scan for each k-mer (C) and directly look it up in a table (D) in constant time, through the use of a dictionary, and populate the database with substrings passing the threshold as putative motif instances (E).
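The scan-and-lookup step described above can be sketched as follows, assuming the k-mers that pass the threshold have already been collected into a Python dictionary mapping each k-mer to its MSS. The function name `scan_transcript` and the example sequence are illustrative only.

```python
def scan_transcript(sequence, passing_kmers, k=7):
    """Slide a window of length k along the transcript and keep every
    window found in the lookup table (average constant-time dict lookup).

    Returns (0-based start, k-mer, MSS) tuples.
    """
    hits = []
    for start in range(len(sequence) - k + 1):
        kmer = sequence[start:start + k]
        score = passing_kmers.get(kmer)
        if score is not None:
            hits.append((start, kmer, score))
    return hits


# Hypothetical lookup table holding a single passing 7-mer.
lookup = {"UGCAUGU": 1.0}
hits = scan_transcript("AAUGCAUGUCCUGCAUGU", lookup)
```

Because the table is built once per motif, the cost of scanning a transcript grows linearly with its length, whatever the number of k-mers that pass the threshold.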
Table showing all RBP motifs scanned by oRNAment as motif logos, separated by origin (RBNS or RNAcompete). This table is very large yet sparse; on the website, scroll right to see more.
RBPs listed in the table (the RBNS and RNAcompete columns hold motif logo images):
A1CF, A2BP1, ANIA04546, ANKHD1, aret, asd1, AT1G76460TAIRG, B52, BOLL, bru3, BRUNOL4, BRUNOL5, BRUNOL6, CADANIAG00004740, CELF1, CG2931, CG2950, CG7804, CG7903, CG17838, CG33714, CNOT4, CPEB1, CPEB2, CPEB4, cpo, Cscaffold4000008, DAZ3, DAZAP1, egw103003671, egw109003261, eIF2alpha, EIF4B, EIF4G2, elav, ELAVL1, ELAVL4, ENOX1, ENSDARG00000058818, ENSGALG00000000814, ENSGALG00000003765, ENSGALG00000014267, ENSXETG00000007102, ENSXETG00000012802, ENSXETG00000018075, ESRP1, ESRP2, estExtfgeneshHS, etr1, EWSR1, exc7, Fmr1, fne, fox1, FUBP1, FUBP3, FUS, Fusip1, FXR2, G3BP2, gw121961CONSTRUC, gw184451, HNRNPA0, HNRNPA1, HNRNPA1L2, HNRNPA2B1, HNRNPAB, HNRNPC, HNRNPCL1, HNRNPD, HNRNPDL, HNRNPF, HNRNPH2, HNRNPK, HNRNPL, hnRNPLL, how, Hrb27C, Hrb87F, Hrb98DE, HRP1, IGF2BP1, IGF2BP2, IGF2BP3, ILF2, Khdrbs1, KHDRBS2, KHDRBS3, KHSRP, lark, LIN28A, LmjF180180, LmjF320750, LmjF352550, LmjF354130, MAL8P140, MAL13P135, MATR3, MBNL1, mec8, mod, msi, MSI1, mubCONSTRUCT, NAB2CONSTRUCT, NCU02404, NOVA1, NUPL2, orb2, pAbp, PABPC1, PABPC3, PABPC4, PABPC5, PABPN1, PABPN1L, papi, PCBP1, PCBP2, PCBP4, PF100214, PF130315, PFF0320c, PFI1435w, PPRC1, PRR3, PTBP1, PTBP3, PUF60, pUf68, pum, PUM1, QKI, qkr58E1, RALY, RALYL, RBFOX2, RBFOX3, RBM3, RBM4, RBM4B, RBM5, RBM6, RBM8A, RBM15B, RBM22, RBM23, RBM24, RBM25, RBM28, Rbm38, RBM41, RBM42, RBM45, RBM46, RBM47, RBMS1, RBMS2, RBMS3, Rbp1, Rbp1like, Rbp9, RC3H1, ref2, Rnp4F, RO3G00049, Rox8, Rsf1, SAMD4A, SART3, SF1, SF2, sf3b4, SFPQ, shep, sm, Smp067420, SNRNP70, SNRPA, Srp54, SRSF1, SRSF2, SRSF4, SRSF5, SRSF8, SRSF9, SRSF10, SRSF11, STARPAP, sup12, sup26CONSTRUCT, Sxl, TAF15, TARDBP, Tbg97234300, Tbg97262300, Tbg97294840, Tbg97295210, Tbg972111000, Tbg9721117950CONS, THAPSDRAFT1841, TIA1, tiar1, tiar3, tra2, TRA2A, TRNAU1AP, TVAG002940, TVAG129710, TVAG267990, TVAG514790, U2AF2, U2af50, unc75, UNK, YB1, YBX2, ZC3H10, ZC3H14CONSTRUCT, ZCRB1, ZFP36, ZNF326, ZNF638
The genomic regions of motif instances identified by oRNAment at a threshold of 50% correspond closely to the binding regions observed by eCLIP, both for all peaks and for reproduced peaks only, at a threshold of 3-fold change and a p-value of 0.001. Furthermore, when the same number of random genomic regions is drawn, these regions seldom match a binding region in eCLIP.
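The kind of comparison described above can be sketched as the fraction of regions that overlap at least one peak. This is a simplified illustration, not the validation pipeline itself: it ignores strand, assumes half-open (chromosome, start, end) intervals, and expects the peaks to be filtered by fold change and p-value beforehand. The function name and coordinates are made up.

```python
def fraction_overlapping(sites, peaks):
    """Fraction of sites that overlap at least one peak.

    Both arguments are lists of (chromosome, start, end) tuples with
    half-open coordinates; strand is ignored in this sketch.
    """
    peaks_by_chrom = {}
    for chrom, start, end in peaks:
        peaks_by_chrom.setdefault(chrom, []).append((start, end))

    def overlaps_a_peak(site):
        chrom, start, end = site
        return any(start < peak_end and peak_start < end
                   for peak_start, peak_end in peaks_by_chrom.get(chrom, []))

    if not sites:
        return 0.0
    return sum(overlaps_a_peak(site) for site in sites) / len(sites)


# Made-up coordinates: two of the four sites overlap the single peak.
toy_peaks = [("chr1", 100, 200)]
toy_sites = [("chr1", 150, 157), ("chr1", 300, 307),
             ("chr2", 150, 157), ("chr1", 195, 202)]
overlap = fraction_overlapping(toy_sites, toy_peaks)
```

Running the same function on an equal number of randomly placed regions gives the background rate against which the motif instances can be compared.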
Please cite:
Louis Philip Benoit Bouvrette, Samantha Bovaird, Mathieu Blanchette, Eric Lécuyer. oRNAment: A database of putative RNA binding protein target sites in the transcriptomes of model species. Nucleic Acids Research. Published 14 November 2019. https://doi.org/10.1093/nar/gkz986
This site has been validated on Chrome, Firefox, Safari, and Vivaldi on both macOS and Windows environments. For the best user experience, we recommend Chrome or Vivaldi. Some functionality may not work when using Internet Explorer on Windows.
This database was developed and is maintained by the Lécuyer Lab. Please email us with any comments, suggestions or bug reports.
The search by... forms
This tool allows you to search the database for a specific gene, transcript, or group of genes or transcripts, for which you would like to know all putative RBP motif instances.
Simply select your organism and the type of IDs you have. The motif status option allows the user to filter the database for only RBPs that have direct or indirect motif evidence as assessed by Ray et al. (Nature, 2013), or to include every RBP motif available in the database. For multiple genes, the input should be a comma-separated list or have one gene per line. A mix of commas and carriage returns is acceptable.
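If you prepare the ID list programmatically, the accepted input format (commas, line breaks, or a mix of both) can be normalised as in the sketch below. This is not the site's own parser; the function name `parse_id_list` and the example IDs are illustrative.

```python
import re


def parse_id_list(text):
    """Split a pasted list of IDs on commas and/or line breaks,
    dropping surrounding whitespace and empty entries."""
    tokens = re.split(r"[,\r\n]+", text)
    return [token.strip() for token in tokens if token.strip()]


pasted = "ENSG00000000003, ENSG00000000005\nTSPAN6,\r\nTNMD"
ids = parse_id_list(pasted)
```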
This tool allows you to search the database for a specific RBP, at a specific similarity threshold between the PWM and the subsequence, for which you would like to know all its putative instances in all the coding and non-coding transcripts of the selected organism.
Simply select your organism and your RBP of interest. The slider allows you to select the MSS' percentage that will be used as the threshold. The database contains all motif instances with matrix similarity scores adding up to an MSS' of 50%, in increments of 5%. As some motifs of a given RBP might not add enough weight to the MSS', some thresholds can return identical results. The number of results (instances) that will be returned at each step is indicated.
This tool allows you to search the database for a specific combination of attributes in a given species for which you would like to know all putative RBP motif instances.
Simply select your organism and your biotype or region of interest. The motif status option allows the user to filter the database for only RBPs that have direct or indirect motif evidence as assessed by Ray et al. (Nature, 2013), or to include every RBP motif available in the database. When an attribute is incompatible with other selections, it is shown as a blocked (unclickable and greyed out) NA. When the protein coding biotype is selected, the region NA corresponds to the Ensembl annotation for unavailable information and is therefore selectable. The D. melanogaster annotation does not have transcript biotypes that differ from their corresponding gene biotype, so the option is removed when this species is selected. The number of results (instances) that will be returned for each combination is indicated.
The results pages
The different search tools provide various top-level graphical analyses and a detailed table of all motif instances in each transcript. Kindly note that, by design, the database uses a "transcript-centric" view. Motif instances from different isoforms of a gene may have the same genomic coordinates. Therefore, a motif instance can be present more than once in the table, but each occurrence will always represent a distinct transcript of the gene.
Although we executed the scan on all transcripts available from the Ensembl database, varying nomenclature and versioning mean that a specific gene name or ID may not be found in oRNAment when you search by transcript. If this happens, a card detailing which IDs were not found is shown to help you refine your search.
The top-level figure summarises various aspects of the table. When an RBP has multiple motifs, they are grouped to simplify the summarization.
You can download the entire table as an Excel or CSV file. Please note that if there are more than 1,048,500 instances (lines), the Excel button will be disabled, as this is close to the maximum number of rows supported by most Excel versions (1,048,576). The CSV is always available, but it takes a moment to generate.
When you click on a gene ID, you are brought to the gene-level detail page. This page shows the putative binding sites, in that gene, of the RBP on the line that was clicked (e.g. if you click on ENSG00000000003 with RBP A1CF, you will see all motif instances of A1CF in all isoforms of the TSPAN6 gene).
When you click on a transcript ID, you are brought to the transcript-level detail page. This page shows the putative binding sites, in that isoform only, of the RBP on the line that was clicked (e.g. if you click on ENST00000373020 with RBP A1CF, you will see all motif instances of A1CF in this specific isoform of the TSPAN6 gene).
For a selected transcript, the 2-D structure, calculated by RNAfold with default arguments, will also be shown. Please note that the structure is only shown for transcripts shorter than 30,000 nucleotides. For transcripts longer than a few thousand nucleotides, depending on your browser and system, the rendering may be slow. You may always click on the DOT-BRACKET button to download the sequence and structure of the transcript.
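Once downloaded, a dot-bracket structure can be turned into a list of base pairs with a simple stack, as sketched below. The sketch assumes standard dot-bracket notation using only ".", "(" and ")"; the exact layout of the downloaded file is not described here, so the structure string is passed in directly, and the function name is hypothetical.

```python
def base_pairs(dot_bracket):
    """Return sorted (i, j) pairs of 0-based paired positions from a
    dot-bracket string made of '.', '(' and ')'."""
    open_positions = []
    pairs = []
    for index, symbol in enumerate(dot_bracket):
        if symbol == "(":
            open_positions.append(index)
        elif symbol == ")":
            if not open_positions:
                raise ValueError(f"unmatched ')' at position {index}")
            pairs.append((open_positions.pop(), index))
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r} at position {index}")
    if open_positions:
        raise ValueError(f"unmatched '(' at position {open_positions[-1]}")
    return sorted(pairs)


toy_pairs = base_pairs("((..)).")  # toy hairpin, not a real transcript
```

The resulting pairs can then be used, for example, to check whether a putative motif instance lies in a paired or an unpaired stretch of the transcript.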
The genome browser (IGV)
oRNAment also lets you browse the genome of a given organism and interactively visualize putative RBP instances in an embedded Integrative Genomics Viewer (IGV) browser. Simply select your organism and up to 3 RBPs. Each track is fully interactive. Click and/or drag for a different view and further information.