==================================================
Benchmark set for bacterial sRNA target predictors
==================================================

-------------------
Supplementary files
-------------------

Adrien Pain, Daniel Gautheret
Universite Paris Sud
July 30, 2014

File description
================

TableS1-trusted-pairs.xlsx (Table S1)
--------------------------------------------------
List of trusted (experimentally validated) sRNA/target pairs used as the
benchmark. Includes the coordinates of the published interactions, the
literature references and the level of evidence for each pair.

NC_000913_CDS_-200nt_100nt.fa
-----------------------------
5' regions of E. coli K-12 genes, obtained by automatically extracting
the -200/+100 fragment around each start codon (4317 regions).
Strain: Escherichia coli str. K-12 substr. MG1655.
Coordinates refer to genome assembly version 2 (NC_000913.2).
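
The extraction rule above can be sketched as follows. This is a
hypothetical helper, not the authors' script, and the exact boundary
convention is an assumption: ``start`` is taken to be the 1-based,
plus-strand NC_000913.2 coordinate of the first base of the start codon
as read on the gene's strand; that base is position +1 (there is no
position 0), so each window is 200 nt upstream plus the first 100 nt of
the CDS (300 nt); windows are clipped at the sequence ends rather than
wrapped around the circular chromosome.

.. code-block:: python

   # Hypothetical sketch of -200/+100 extraction; not the authors' code.
   COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

   def revcomp(seq: str) -> str:
       """Reverse complement of a DNA string."""
       return seq.translate(COMPLEMENT)[::-1]

   def start_codon_window(genome: str, start: int, strand: str,
                          upstream: int = 200, downstream: int = 100) -> str:
       """Return the -upstream/+downstream window around a start codon.

       Assumed convention: `start` is the 1-based plus-strand coordinate
       of the codon's first base on the gene's strand (position +1, no
       position 0). The window is clipped at the genome ends.
       """
       if strand == "+":
           lo = max(start - upstream, 1)                  # 1-based, inclusive
           hi = min(start + downstream - 1, len(genome))
           return genome[lo - 1:hi]
       if strand == "-":
           # Upstream of a minus-strand gene lies at higher coordinates.
           lo = max(start - downstream + 1, 1)
           hi = min(start + upstream, len(genome))
           return revcomp(genome[lo - 1:hi])
       raise ValueError(f"strand must be '+' or '-', got {strand!r}")

With the default arguments, an unclipped window is always 300 nt long on
either strand, and minus-strand windows are returned 5' to 3' on the
gene's strand.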

NC_000913_RNA-seq_TSS.fa
--------------------------
5' regions of E. coli K-12 genes, extracted using transcription start
sites (TSS) mapped by RNA-seq (1). For each TSS listed in Tables S5 and
S9 of Li et al. (1), we extracted the region from the TSS to 100 nt past
the start codon. For genes not expressed in the RNA-seq data, or
preceded by other genes in an operon, we extracted the -200/+100 region
around the start codon, as above (total: 4317 regions).
Strain: Escherichia coli str. K-12 substr. MG1655.
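
The choice between the two region definitions can be sketched as a
function returning genome coordinates. The function ``utr_bounds`` and
its ``tss`` parameter are hypothetical; ``tss=None`` stands for a gene
with no usable RNA-seq TSS (not expressed, or internal to an operon).
Coordinates are assumed to be 1-based, inclusive and plus-strand, with
``start`` the first base of the start codon (position +1, no position
0); reading "100 nt past the start codon" as the first 100 nt of the CDS
is also an assumption.

.. code-block:: python

   # Hypothetical sketch of the TSS-or-default rule; not the authors' code.
   from typing import Optional, Tuple

   def utr_bounds(start: int, strand: str, tss: Optional[int] = None,
                  upstream: int = 200, downstream: int = 100) -> Tuple[int, int]:
       """Return (lo, hi), 1-based inclusive plus-strand bounds of a 5' region.

       With a TSS: from the TSS to `downstream` nt into the CDS.
       Without one (tss=None): the default -upstream/+downstream window.
       No clipping is done at the chromosome ends.
       """
       if strand == "+":
           if tss is not None and tss > start:
               raise ValueError("TSS must lie upstream of the start codon")
           first = start - upstream if tss is None else tss
           return first, start + downstream - 1
       if strand == "-":
           # On the minus strand, upstream means higher coordinates.
           if tss is not None and tss < start:
               raise ValueError("TSS must lie upstream of the start codon")
           first = start + upstream if tss is None else tss
           return start - downstream + 1, first
       raise ValueError(f"strand must be '+' or '-', got {strand!r}")

For example, a plus-strand gene starting at 1000 gets (800, 1099) by
default, and (950, 1099) if a TSS was mapped at 950.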

coli_sRNA_vx.fa
----------------
All E. coli sRNA sequences used as input for target prediction.

coli_targetpairs_Vx.tsv
coli_targetpairs_Vx_direct.tsv 
-------------------------------
Flat-file extracts of the table of trusted target pairs (Table S1), for
programmatic use. The "direct" version includes only those pairs
supported by direct experimental evidence.

coli_pairs_Vx_with_compMut_realUTR.tsv
coli_pairs_Vx_with_compMut_defaultUTR.tsv
-----------------------------------------
Lists of the individual base pairs supported by experimental
compensatory mutations (same references as in coli_targetpairs above).
Relative coordinates are given for each base pair, in the order: sRNA
position, sRNA base, mRNA position, mRNA base.
Because mRNA coordinates differ between real (RNA-seq TSS-based) and
default (-200/+100) UTRs, a separate file is provided for each
(realUTR and defaultUTR).
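
A minimal reader for these files might look as follows. Only the order
of the four fields is documented above; the tab separator, the use of
the last four fields of each line (so that any leading identifier
columns are ignored), the skipping of blank and ``#`` lines, and the
names ``BasePair``, ``parse_base_pairs`` and ``can_pair`` are all
assumptions. The complementarity check accepts Watson-Crick and G-U
wobble pairs, with T read as U.

.. code-block:: python

   # Hypothetical reader; the column layout beyond the four documented
   # fields is assumed, not taken from the files themselves.
   from typing import Iterable, Iterator, NamedTuple

   # Watson-Crick pairs plus G-U wobble (RNA alphabet).
   ALLOWED = {("A", "U"), ("U", "A"), ("G", "C"),
              ("C", "G"), ("G", "U"), ("U", "G")}

   class BasePair(NamedTuple):
       srna_pos: int
       srna_base: str
       mrna_pos: int
       mrna_base: str

   def parse_base_pairs(lines: Iterable[str]) -> Iterator[BasePair]:
       """Yield one BasePair per data line, from its last four tab-separated
       fields: sRNA position, sRNA base, mRNA position, mRNA base."""
       for line in lines:
           line = line.strip()
           if not line or line.startswith("#"):
               continue
           s_pos, s_base, m_pos, m_base = line.split("\t")[-4:]
           yield BasePair(int(s_pos), s_base.upper().replace("T", "U"),
                          int(m_pos), m_base.upper().replace("T", "U"))

   def can_pair(bp: BasePair) -> bool:
       """True if the two bases can form a Watson-Crick or G-U pair."""
       return (bp.srna_base, bp.mrna_base) in ALLOWED

A header line, if present, would make ``int()`` fail and would need to
be skipped before parsing.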

IstR.fa, RseX.fa, RydC.fa
---------------------------
Sets of homologous sequences for the sRNAs not available on the
CopraRNA server.

command-lines.txt
------------------
Text file listing all Unix command lines and options used to run the
predictors, together with program authors and versions.


References
===========
(1) Li S et al. Directional RNA-seq reveals highly complex
condition-dependent transcriptomes in E. coli K12 through accurate
full-length transcripts assembling. BMC Genomics 2013, 14:520.
