EPD – The Eukaryotic Promoter Database is an annotated non-redundant collection of eukaryotic Pol II promoters for which the transcription start site has been determined experimentally. The resource includes two databases: EPD and EPDnew. EPD is a collection of eukaryotic promoters derived from published articles, whereas the EPDnew databases (HT-EPD) are the result of merging EPD promoters with in-house analysis of promoter-specific high-throughput data for selected organisms only. This process gives EPDnew high precision and high coverage.

Dreos, R., Ambrosini, G., Groux, R., Cavin Perier, R., and Bucher, P. Nucleic Acids Res. 2017; 45(D1):D51-D55. DOI: 10.1093/nar/gkw1069

ChIP-Seq – The ChIP-Seq Web Server provides access to a set of tools that perform common ChIP-seq data analysis tasks, including positional correlation analysis, peak detection, feature extraction, and genome partitioning into signal-rich and signal-poor regions. Users can analyse their own data by uploading mapped sequence tags in various formats, including BED and BAM. The server also provides access to hundreds of publicly available data sets such as ChIP-seq data, transcription profiling data (e.g. CAGE), DNA-methylation data, sequence annotations (promoters, polyA-sites, etc.), and sequence-derived features (CpG, phastCons scores).

SSA – The Signal Search Analysis Server is a software package for the analysis of nucleic acid sequence motifs that are positionally correlated with a functional site, such as a transcription initiation site or a transcription factor binding site.

MGA – The Mass Genome Annotation Data Repository is a database that stores published next-generation sequencing data and other genome annotation data (such as gene start sites, SNPs, etc.) that, in conjunction with the ChIP-Seq and SSA Web servers, can be freely accessed by scientists for data analysis.

Dreos, R., Ambrosini, G., Groux, R., Cavin Perier, R., and Bucher, P. Nucleic Acids Res. 2018; 46(Database issue): D175-D180. DOI: 10.1093/nar/gkx995

PWMScan – Genome-wide position weight matrix (PWM) scanner is a Web server for rapid scanning of large genomes for high-scoring matches to a user-supplied or server-resident position weight matrix (PWM).

SNP2TFBS – A Web interface for studying variations (SNPs/indels) that affect transcription factor binding (TFB) in the human genome.

Kumar, S., Ambrosini, G. and Bucher, P. Nucleic Acids Res. 2017; 45(Database issue): D139-D144. DOI: 10.1093/nar/gkw1064

UCNEbase – A database of ultra-conserved non-coding elements and genomic regulatory blocks that provides information on the evolution and genomic organization of ultra-conserved non-coding elements (UCNEs) in multiple vertebrate species.

HTPSELEX – A database containing sets of in vitro selected transcription factor binding site sequences obtained with SELEX and high-throughput SELEX (HT-SELEX) methods.

ZFN-Site – A genome-wide tag scanner for nuclease off-sites, based on the Tagger software, that searches genomes for specific target sites and off-target sites.

Cradick, T.J., Ambrosini, G., Iseli, C., Bucher, P., McCaffrey, A.P. BMC Bioinformatics, 2011; 12:152. DOI: 10.1186/1471-2105-12-152

MADAP – A flexible clustering tool for the interpretation of one-dimensional genome annotation data mapped onto complete or partial genome sequences.

Tromer – The transcriptome analyser (TROMER) project aims at providing tools to determine and document all the transcribed elements of a genome.

CleanEx – Provides access to public gene expression data via unique approved gene symbols. It represents heterogeneous expression data produced by different technologies in a way that facilitates joint analysis and cross-dataset comparisons.

FTP server – A repository of mapped read distributions for a very large number of ChIP-Seq and RNA-Seq profiles, sequence annotations (promoters), and sequence-derived features (CpG, phastCons scores).


The ChIP-Seq software provides a set of applications that perform common genome-wide ChIP-seq analysis tasks, including positional correlation analysis between two genomic features (e.g. histone modification profiles around transcription start sites), peak detection, feature extraction around specific genomic anchor points (e.g. nucleosome profiles around ChIP-seq peak positions), and genome partitioning into signal-rich and signal-poor regions. Auxiliary applications include format conversion and reformatting tools.

ChIP-Seq Tutorials
General Documentation
Ambrosini, G., Dreos, R., Kumar, S., and Bucher, P. BMC Genomics, 2016; 17:938 DOI: 10.1186/s12864-016-3288-8
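
The positional correlation analysis mentioned above can be illustrated with a small Python sketch. It is not the ChIP-Seq package's own implementation: the function name `positional_correlation`, its parameters, and the input layout (anchor positions with strand, target positions on the same chromosome) are assumptions made for illustration, and normalization of the counts is left out.

```python
from bisect import bisect_left, bisect_right
from collections import Counter

def positional_correlation(anchors, targets, window_from, window_to, bin_width=1):
    """Histogram of target positions relative to strand-oriented anchors.

    anchors: list of (position, strand) tuples, strand being '+' or '-'
    targets: list of positions on the same chromosome
    Returns a dict mapping the start of each relative-distance bin to a count.
    (Hypothetical helper, not part of any published toolkit.)
    """
    targets = sorted(targets)
    counts = Counter()
    for pos, strand in anchors:
        # Absolute interval covered by the window; mirrored for '-' anchors.
        if strand == "+":
            lo, hi = pos + window_from, pos + window_to
        else:
            lo, hi = pos - window_to, pos - window_from
        # Binary search restricts the scan to targets inside the window.
        for t in targets[bisect_left(targets, lo):bisect_right(targets, hi)]:
            rel = t - pos if strand == "+" else pos - t
            bin_start = (rel - window_from) // bin_width * bin_width + window_from
            counts[bin_start] += 1
    return dict(counts)

# Toy example: two anchors (e.g. transcription start sites) and five tags.
profile = positional_correlation(
    anchors=[(100, "+"), (200, "-")],
    targets=[95, 100, 105, 195, 210],
    window_from=-10, window_to=10, bin_width=5,
)
```

Each key of `profile` is the left edge of a bin of relative distances, so a peak near 0 would indicate that the target feature tends to sit at the anchor itself.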

PWMScan Toolkit
The PWMScan toolkit includes programs that can be used to scan genomes or, in general, large sets of nucleotide sequences to identify matches to a sequence motif represented by a position weight matrix (PWM). The PWM is the most commonly used mathematical model to describe the DNA binding specificity of a transcription factor (TF). A PWM contains scores for each base at each position of the binding site. The TF binding score for a given k-mer sequence is then obtained by simply adding up the base-specific scores at respective positions of the binding site.

Ambrosini, G., Groux, R., and Bucher, P. Bioinformatics, 2018, bty127; https://doi.org/10.1093/bioinformatics/bty127
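
The additive scoring scheme described above can be sketched in a few lines of Python. This is a minimal illustration, not the PWMScan code: the matrix values are invented, the names `PWM`, `score_kmer` and `scan` are hypothetical, and only the forward strand is scanned.

```python
# Hypothetical 4-position PWM; the integer scores are made up for illustration.
PWM = [
    {"A": 2, "C": -1, "G": -1, "T": -2},
    {"A": -2, "C": 3, "G": -1, "T": -1},
    {"A": -1, "C": -2, "G": 3, "T": -1},
    {"A": -1, "C": -1, "G": -2, "T": 2},
]

def score_kmer(kmer, pwm):
    """Add up the base-specific scores at each position of the k-mer."""
    if len(kmer) != len(pwm):
        raise ValueError("k-mer length must equal the PWM length")
    return sum(column[base] for column, base in zip(pwm, kmer))

def scan(sequence, pwm, cutoff):
    """Return (start, kmer, score) for every window scoring >= cutoff.

    Positions are 0-based; windows containing non-ACGT characters are skipped.
    """
    k = len(pwm)
    hits = []
    for i in range(len(sequence) - k + 1):
        kmer = sequence[i:i + k]
        if any(base not in "ACGT" for base in kmer):
            continue
        score = score_kmer(kmer, pwm)
        if score >= cutoff:
            hits.append((i, kmer, score))
    return hits

hits = scan("TTACGTAA", PWM, cutoff=8)
```

A genome-scale scanner would also need to handle the reverse strand and use a faster search strategy; the sliding window here is only meant to make the scoring rule concrete.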

SSA Tools
The central concept of Signal Search Analysis (SSA) is to find a motif present in all, or in a statistically significant proportion, of the DNA input sequences. The user specifies on the fly the sequence range around the functional site to be considered.

Bucher P., Trifonov EN. Nucleic Acids Res, 14, 1986.
Ambrosini G., Praz V., Jagannathan V., Bucher P. Nucleic Acids Res., 2003.

MADAP identifies groups of data corresponding to one or several genomic sites, and estimates the volume and extension of such groups (clusters). Input data might be obtained from cDNA and tag sequencing protocols to map the 5′ and 3′ ends of mRNA, from ChIP-chip analysis, or from genome-wide SNP-typing.

Schmid C.D., Sengstag T., Bucher P. and Delorenzi M., Nucleic Acids Res., 2007.

Tagger Toolkit
Tagger is a toolkit for searching fixed-size short sequence tags against entire genomes or mRNA reference sequence databases.

Tagger can use two alternative search engines:
– the fetchGWI program, a fast string-matching tool that uses indexed genomes;
– the tagger program, a C program that searches for exact-matched words across FASTA-formatted sequence databases.

Iseli C., Ambrosini G., Bucher P. and Jongeneel C.V. PLoS ONE, 2007.
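
To make the idea of exact-match tag searching across FASTA-formatted sequences concrete, here is a short Python sketch. It is not the tagger or fetchGWI code (tagger is a C program, and fetchGWI relies on indexed genomes); the helper names `read_fasta` and `find_tags` are hypothetical, and only the forward strand is searched.

```python
import io

def read_fasta(handle):
    """Yield (name, sequence) pairs from a FASTA-formatted text handle."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the first word of the header as the sequence name.
            name, chunks = (line[1:].split() or [""])[0], []
        elif line and name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)

def find_tags(tags, fasta_handle):
    """Report exact occurrences of fixed-size tags as (tag, name, 1-based start)."""
    tag_set = {tag.upper() for tag in tags}
    sizes = {len(tag) for tag in tag_set}
    if len(sizes) != 1:
        raise ValueError("all tags must have the same, non-zero length")
    k = sizes.pop()
    hits = []
    for name, seq in read_fasta(fasta_handle):
        seq = seq.upper()
        # Slide a k-wide window and look each word up in the tag set.
        for i in range(len(seq) - k + 1):
            word = seq[i:i + k]
            if word in tag_set:
                hits.append((word, name, i + 1))
    return hits

fasta = io.StringIO(">chr1 test\nACGTAC\nGTTT\n>chr2\nGGACGT\n")
hits = find_tags(["acgt"], fasta)
```

Holding the tags in a set keeps each window lookup at constant expected time, which is why the tags are required to share one fixed size.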

The sibRNAfold project is based on Vienna RNAfold, a program for RNA secondary structure prediction through energy minimization. SibRNAfold implements a modification of the dynamic programming algorithm that significantly speeds up the minimum free energy calculations while producing results identical to those of Vienna RNAfold.

Dimitrieva, S. and Bucher, P. J. Bioinformatics and Computational Biology, 2012