Name Version Comments
ABYSS 1.5.2
1.3.2
1.9.0


URL: http://www.bcgsc.ca/platform/bioinfo/software/abyss

Description:

ABySS is a de novo, parallel, paired-end sequence assembler that is designed for short reads. The single-processor version is useful for assembling genomes up to 100 Mbases in size. The parallel version is implemented using MPI and is capable of assembling larger genomes.
act 11.0.0 URL:

http://www.sanger.ac.uk/resources/software/act/

Description:

ACT is a free tool for displaying pairwise comparisons between two or more DNA sequences. It can be used to identify and analyse regions of similarity and difference between genomes and to explore conservation of synteny, in the context of the entire sequences and their annotation.

Location:

/usr/local/artemis/act
ADMIXTOOLS 3.0 URL:

https://genetics.med.harvard.edu/reich/Reich_Lab/Software.html

Description:

ADMIXTOOLS (Patterson et al. 2012) is a software package that supports formal tests of whether admixture occurred, and makes it possible to infer admixture proportions and dates.

location:

/usr/local/AdmixTools-3.0
ADMIXTURE 1.3.0

URL:

https://www.genetics.ucla.edu/software/admixture/index.html

Description:

ADMIXTURE is a software tool for maximum likelihood estimation of individual ancestries from multilocus SNP genotype datasets. It uses the same statistical model as STRUCTURE but calculates estimates much more rapidly using a fast numerical optimization algorithm.

location:

/usr/local/admixture-1.3.0

 

Usage:

admixture + arguments

alignreads 1.0

URL:

https://github.com/zachary-foster/alignreads

Description:

Alignreads is a wrapper for YASRA (http://www.bx.psu.edu/miller_lab/). The principal function of alignreads is to facilitate easy execution of YASRA and to parse its output.

YASRA is a reference guided assembler that has the ability to extend the edges of alignments de novo interactively. The minimum inputs are a reference sequence and reads to be aligned, but there are many options. Use alignreads -h after installation to see a full list of options.

location:

/usr/local/alignreads-1.0/align

usage:

alignreads reads.fa reference.fa

ALLMAPS 1.0 URL:

https://github.com/tanghaibao/jcvi/wiki/ALLMAPS

Description:

The ordering and orientation of genomic scaffolds to reconstruct chromosomes is an essential step during de novo genome assembly. This process is often assisted by various mapping techniques. Because each map provides a unique line of evidence, a combination of multiple maps can greatly improve the accuracy of the resulting chromosomal assemblies. ALLMAPS computes a scaffold ordering that maximizes the collinearity to a collection of maps (genetic, physical or comparative) and uses it for the final chromosome build.

usage:

python -m jcvi.assembly.allmaps

location:

/usr/local/ALLMAPS-1.0/code
AMOS 3.1.0 Installation date: 02/05/13

URL: http://amos.sourceforge.net/wiki/index.php/AMOS

Description:

The AMOS consortium is committed to the development of open-source whole genome assembly software. The project acronym (AMOS) represents our primary goal -- to produce A Modular, Open-Source whole genome assembler. Open-source so that everyone is welcome to contribute and help build outstanding assembly tools, and modular in nature so that new contributions can be easily inserted into an existing assembly pipeline. This modular design will foster the development of new assembly algorithms and allow the AMOS project to continually grow and improve in hopes of eventually becoming a widely accepted and deployed assembly infrastructure. In this sense, AMOS is both a design philosophy and a software system.
angsd

0.581
0.700
0.913

0.918

0.902

 

Samtools plugin

http://popgen.dk/angsd/index.php/Main_Page#Overview

ANGSD is a software for analyzing next generation sequencing data. The software can handle a number of different input types from mapped reads to imputed genotype probabilities. Most methods take genotype uncertainty into account instead of basing the analysis on called genotypes. This is especially useful for low and medium depth data. The software is written in C++ and has been used on large sample sizes.

Location:

/usr/local/angsd-0.918



Usage: angsd command + arguments

angsd

-> angsd version: 0.700 (htslib: 1.2.1) build(Apr 17 2015 16:08:02)

-> Please use the website ''http://www.popgen.dk/angsd'' as reference

-> Use -nThreads or -P for number of threads allocated to the program

Overview of methods:

-GL Estimate genotype likelihoods

-doCounts Calculate various counts statistics

-doAsso Perform association study

-doMaf Estimate allele frequencies

-doError Estimate the type specific error rates

-doAncError Estimate the errorrate based on perfect fastas

-doHWE Est inbreedning per site

-doGeno Call genotypes

-doFasta Generate a fasta for a BAM file

-doAbbababa Perform an ABBA-BABA test

-sites Analyse specific sites (can force major/minor)

-doSaf Estimate the SFS and/or neutrality tests genotype calling

-doHetPlas Estimate hetplasmy by calculating a pooled haploid frequency

Below are options that can be usefull

-bam Options relating to bam reading

-doMajorMinor Infer the major/minor using different approaches

-ref/-anc Read reference or ancestral genome

many others

For information of specific options type:

./angsd METHODNAME eg

./angsd -GL

./angsd -doMaf

./angsd -doAsso etc

./angsd sites for information about indexing -sites files

Examples:

Estimate MAF for bam files in 'list'

'./angsd -bam list -GL 2 -doMaf 2 -out RES -doMajorMinor 1'

apollo 1.11.7 URL:

http://apollo.berkeleybop.org/current/

Description:

Apollo is a genome annotation viewer and editor.

location:

/usr/local/Apollo-1.11.7

usage:

apollo
art 2016-06-05 URL:

https://www.niehs.nih.gov/research/resources/software/biostatistics/art/index.cfm

Description:

ART is a set of simulation tools to generate synthetic next-generation sequencing reads.

location:

/usr/local/art-2016-06-05

usage:

art_454 + arguments
artemis 12.0 Installation date: 03/01/13

URL: http://www.sanger.ac.uk/resources/software/artemis/

Description: Artemis is written in Java, and is available for UNIX, Macintosh and Windows systems. It can read EMBL and GENBANK database entries or sequence in FASTA, indexed FASTA or raw format. Other sequence features can be in EMBL, GENBANK or GFF format.

location:

/usr/local/artemis-12.0
augustus 3.0.3 URL:

http://bioinf.uni-greifswald.de/augustus/

Description:

AUGUSTUS is a program that predicts genes in eukaryotic genomic sequences. It can be run on the AUGUSTUS web server, on a second web server for larger input files, or downloaded and run locally. It is open source, so you can compile it for your computing platform. AUGUSTUS can also be run on the German MediGRID, which accepts larger sequence files and allows protein homology information to be used in the prediction. The MediGRID requires a quick registration by email for first-time users.

location:

/usr/local/augustus-3.0.3/bin

usage:

augustus -h
b2g4pipe 2.3.5 Installation date: 16/07/13

URL: https://sites.google.com/a/brown.edu/bioinformatics-in-biomed/b2g4pipe-module

Description:

B2G4PIPE - A version for B2G annotation without FrontEnd (GUI) for Pipeline Integration
bam-readcount 0.7.4 URL:

https://github.com/genome/bam-readcount

Description:

The purpose of this program is to generate metrics at single nucleotide positions.

This program reports readcounts for each base at each position requested.

It also reports the average base quality of these bases and mapping qualities of the reads containing each base.

The list of regions should be formatted as chromosome start and end. Each field should be tab separated and coordinates should be 1-based.

The optional region specification on the command line should follow the same format as that used for samtools (chr:start-stop)
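The two region formats described above can be related with a short Python sketch. The helper name `region_to_site_line` is hypothetical and is not part of bam-readcount; it only converts a samtools-style `chr:start-stop` string into the tab-separated, 1-based line that the region list is described as using.

```python
def region_to_site_line(region: str) -> str:
    """Convert 'chr:start-stop' into 'chr<TAB>start<TAB>stop' (1-based, inclusive)."""
    # Split on the last colon so contig names that contain ':' survive.
    chrom, _, span = region.rpartition(":")
    start_text, _, stop_text = span.partition("-")
    start, stop = int(start_text), int(stop_text)  # raises ValueError if not numeric
    if not chrom or start < 1 or stop < start:
        raise ValueError(f"not a valid 1-based region: {region!r}")
    return f"{chrom}\t{start}\t{stop}"


if __name__ == "__main__":
    print(region_to_site_line("chr1:1000-2000"))
```

Writing one such line per region to a file gives a list in the format described for the site list option.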

Usage:

Usage: bam-readcount <bam_file> [region]

Available options:

-h [ --help ] produce this message

-v [ --version ] output the version

-q [ --min-mapping-quality ] arg (=0) minimum mapping quality of reads used for counting.

-b [ --min-base-quality ] arg (=0) minimum base quality at a position to use the read for counting.

-d [ --max-count ] arg (=10000000) max depth to avoid excessive memory usage.

-l [ --site-list ] arg file containing a list of regions to report readcounts within.

-f [ --reference-fasta ] arg reference sequence in the fasta format.

-D [ --print-individual-mapq ] arg report the mapping qualities as a comma separated list.

-p [ --per-library ] report results by library.

-w [ --max-warnings ] arg maximum number of warnings of each type to emit. -1 gives an unlimited number.

-i [ --insertion-centric ] generate indel centric readcounts. Reads containing insertions will not be included in per-base counts

location:

/usr/local/bam-readcount-0.7.4/
bamtools 2.3 URL:

https://github.com/pezmaster31/bamtools

Description:

BamTools is a project that provides both a C++ API and a command-line toolkit for reading, writing, and manipulating BAM (genome alignment) files.

usage: bamtools [--help] COMMAND [ARGS]

Available bamtools commands:

convert Converts between BAM and a number of other formats

count Prints number of alignments in BAM file(s)

coverage Prints coverage statistics from the input BAM file

filter Filters BAM file(s) by user-specified criteria

header Prints BAM header information

index Generates index for BAM file

merge Merge multiple BAM files into single file

random Select random alignments from existing BAM file(s), intended more as a testing tool.

resolve Resolves paired-end reads (marking the IsProperPair flag as needed)

revert Removes duplicate marks and restores original base qualities

sort Sorts the BAM file according to some criteria

split Splits a BAM file on user-specified property, creating a new BAM output file for each value found

stats Prints some basic statistics from input BAM file(s)

See 'bamtools help COMMAND' for more information on a specific command.

location:

/usr/local/bamtools-2.3/
bamUtils 1.0.13 URL:

http://genome.sph.umich.edu/wiki/BamUtil

Description:

bamUtil is a repository that contains several programs that perform operations on SAM/BAM files. All of these programs are built into a single executable, bam.

Location:

/usr/local/bamUtil_1.0.13/bin/
Bali-Phy 3.0-beta2

URL:

http://www.bali-phy.org/index.php#intro

Description:
BAli-Phy is software by Ben Redelings that estimates multiple sequence alignments and evolutionary trees from DNA, amino acid, or codon sequences. It uses likelihood-based evolutionary models of substitutions and insertions and deletions to place gaps. It has been used in published analyses on data sets up to 117 taxa.


Location:

/usr/local/bali-phy-3.0-beta2

 

Usage:

http://www.bali-phy.org/README.html#running

bayenv2 2.0

URL:

https://bitbucket.org/tguenther/bayenv2_public/src 

Description:
Loci involved in local adaptation can potentially be identified by an unusual correlation between allele frequencies and important ecological variables, or by extreme allele frequency differences between geographic regions. However, such comparisons are complicated by differences in sample sizes and the neutral correlation of allele frequencies across populations due to shared history and gene flow. To overcome these difficulties, we have developed a Bayesian method that estimates the empirical pattern of covariance in allele frequencies between populations from a set of markers, and then uses this as a null model for a test at individual SNPs.


Location:

/usr/local/bayen2.0

Usage:

bayenv2 + arguments

BayeScan 2.1

URL:
http://cmpg.unibe.ch/software/BayeScan/download.html 


Description:
BayeScan detects natural selection from population-based genetic data. It is command-line, open-source software released under the GNU General Public License as published by the Free Software Foundation.

Location:
/usr/local/BayeScan-2.1/binaries 

usage:

BayeScan2.1 + arguments 

bcftools 0.1.17-dev
1.1
1.3
URL:

http://samtools.github.io/bcftools/bcftools.html

Description:

BCFtools is a set of utilities that manipulate variant calls in the Variant Call Format (VCF) and its binary counterpart BCF. All commands work transparently with both VCFs and BCFs, both uncompressed and BGZF-compressed.

Most commands accept VCF, bgzipped VCF and BCF with filetype detected automatically even when streaming from a pipe. Indexed VCF and BCF will work in all situations. Un-indexed VCF and BCF and streams will work in most, but not all situations.

BCFtools is designed to work on a stream. It regards an input file "-" as the standard input (stdin) and outputs to the standard output (stdout). Several commands can thus be combined with Unix pipes.

location:

/usr/local/bcftools-1.3

usage:

bcftools + arguments
beagle 2.0 URL: https://code.google.com/p/beagle-lib/

BEAGLE is a high-performance library that can perform the core calculations at the heart of most Bayesian and Maximum Likelihood phylogenetics packages. It can make use of highly parallel processors such as those in 3D graphics boards found in many PCs.
Beagle 4.1
27jul2016
URL:

http://faculty.washington.edu/browning/beagle/beagle.html#download

Description:

Beagle is a software package that performs genotype calling, genotype phasing, imputation of ungenotyped markers, and identity-by-descent segment detection.

Beagle version 4.1 has a more accurate genotype phasing algorithm and a very fast and accurate genotype imputation algorithm. Version 4.1 also has several changes to the command line arguments which are described in the release notes. The "ped" argument has no effect in version 4.1. If your data contains nuclear families and you want to model the parent-offspring relationships when phasing genotypes, please use version 4.0.

Location:

/usr/local/beagle-27Jul16.86a

usage:

java -jar /usr/local/beagle-27Jul16.86a/beagle-27Jul16.86a.jar + arguments
BEAST

1.8.0
1.7.4
2.1.3
2.3

2.4.4

Installation date, version 1.7.4: 03/01/13

Installation date, version 1.8.0: 16/10/13

Installation date, version 2.1.3: 11/09/14

URL: http://beast.bio.ed.ac.uk/Main_Page

Description:

BEAST is a cross-platform program for Bayesian MCMC analysis of molecular sequences. It is entirely orientated towards rooted, time-measured phylogenies inferred using strict or relaxed molecular clock models. It can be used as a method of reconstructing phylogenies but is also a framework for testing evolutionary hypotheses without conditioning on a single tree topology. BEAST uses MCMC to average over tree space, so that each tree is weighted proportional to its posterior probability. It includes a simple-to-use user-interface program for setting up standard analyses and a suite of programs for analysing the results.

location:

/usr/local/BEAST-2.3/bin

usage:

beast + arguments
bedtools 2.17.0
2.25
2.26.0
Installation date, version 2.17.0: 07/03/13

URL: http://bedtools.readthedocs.org/en/latest/

Description:

Collectively, the bedtools utilities are a swiss-army knife of tools for a wide-range of genomics analysis tasks. The most widely-used tools enable genome arithmetic: that is, set theory on the genome. For example, bedtools allows one to intersect, merge, count, complement, and shuffle genomic intervals from multiple files in widely-used genomic file formats such as BAM, BED, GFF/GTF, VCF. While each individual tool is designed to do a relatively simple task (e.g., intersect two interval files), quite sophisticated analyses can be conducted by combining multiple bedtools operations on the UNIX command line.
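The idea of "set theory on the genome" can be illustrated with a minimal Python sketch of interval intersection. This is not how bedtools is implemented; the function name `intersect` and the tuple layout are assumptions made for illustration, and the coordinates are assumed to be 0-based and half-open, as in BED files.

```python
from typing import List, Tuple

Interval = Tuple[str, int, int]  # (chromosome, start, end), 0-based half-open


def intersect(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Return the overlapping portion of every pair of intervals from a and b."""
    overlaps = []
    for chrom_a, start_a, end_a in a:
        for chrom_b, start_b, end_b in b:
            if chrom_a != chrom_b:
                continue
            start, end = max(start_a, start_b), min(end_a, end_b)
            if start < end:  # intervals that only touch share no bases
                overlaps.append((chrom_a, start, end))
    return sorted(overlaps)


if __name__ == "__main__":
    genes = [("chr1", 100, 200), ("chr1", 300, 400)]
    peaks = [("chr1", 150, 350), ("chr2", 100, 200)]
    print(intersect(genes, peaks))
```

The nested loop is quadratic and only suitable for small inputs; it is meant to show what an intersection returns, not how to compute it at genome scale.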
bioperl 1.6.901-3 URL:

http://www.bioperl.org/wiki/Main_Page

Description:

BioPerl is a community effort to produce Perl code that is useful in biology.
biopython 1.64 URL:

http://biopython.org/wiki/Download

Description:

Biopython is a set of freely available tools for biological computation written in Python by an international team of developers.

It is a distributed collaborative effort to develop Python libraries and applications which address the needs of current and future work in bioinformatics. The source code is made available under the Biopython License, which is extremely liberal and compatible with almost every license in the world.

Installed in /usr/local/lib/python3.3/site-packages
blasr 1.0 Installation date: 22/01/14

URL: https://github.com/PacificBiosciences/blasr

Description:

BLASR: The PacBio® long read aligner
blast+ 2.29
2.2.26
2.3.0
2.4.0+
BLAST finds regions of similarity between biological sequences

URL: ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/

location:

/usr/local/ncbi-blast-2.3.0+/bin
blast2go 2.5.0 Installation date: 15/04/13

URL: https://www.blast2go.com/b2ghome

Description:

Blast2GO is an ALL in ONE tool for functional annotation of (novel) sequences and the analysis of annotation data.
blastn2snp   URL:

https://github.com/lindenb/jvarkit/wiki/BlastnToSnp

Description:

parse a BLASTn-XML stream and extract the variations.

location:

/usr/local/blastn2snp/dist-1.128/
blastR 2.2 Installation date: 13/11/13

URL: http://www.tcoffee.org/Projects/blastr/

Description:

BlastR is a method for searching non-coding RNAs in databases. Its strategy relies on the mutual information embedded in di-nucleotides. The authors report that this approach has better sensitivity and specificity than other software with comparable computational cost. The BlastR package is a Perl wrapper for BlastP and is part of the T-Coffee distribution.
BMGE 1.12

URL: http://bmcevolbiol.biomedcentral.com/articles/10.1186/1471-2148-10-210

Description:

BMGE is able to perform biologically relevant trimming on a multiple alignment of DNA, codon or amino acid sequences.

Location:

/usr/local/BMGE-1.12

Usage:

java -jar /usr/local/BMGE-1.12/BMGE.jar + arguments

 

boost 1_53_0
1_56_0
Installation date, version 1.53.0: 07/03/13

Installation date, version 1.56.0: 13/10/14

URL: http://www.boost.org/

Description: Boost provides free peer-reviewed portable C++ source libraries.

We emphasize libraries that work well with the C++ Standard Library. Boost libraries are intended to be widely useful, and usable across a broad spectrum of applications.
bowtie 0.12.9
1.1.1
1.1.2
Installation date: 11/03/13

URL: http://bowtie-bio.sourceforge.net/index.shtml

Description:

Bowtie is an ultrafast, memory-efficient short read aligner. It aligns short DNA sequences (reads) to the human genome at a rate of over 25 million 35-bp reads per hour. Bowtie indexes the genome with a Burrows-Wheeler index to keep its memory footprint small: typically about 2.2 GB for the human genome (2.9 GB for paired-end).
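The transform behind a Burrows-Wheeler index can be shown with a naive Python sketch. This is only the textbook definition via sorted rotations, not Bowtie's implementation, which builds a compressed index and never materializes the rotations; the function name `bwt` and the `$` sentinel are assumptions made for illustration.

```python
def bwt(text: str, sentinel: str = "$") -> str:
    """Return the Burrows-Wheeler transform of text using sorted rotations."""
    if sentinel in text:
        raise ValueError("text must not contain the sentinel character")
    s = text + sentinel  # the sentinel marks the end and sorts before A, C, G, T
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    # The transform is the last column of the sorted rotation matrix.
    return "".join(rotation[-1] for rotation in rotations)


if __name__ == "__main__":
    print(bwt("ACAACG"))
```

Identical characters tend to cluster in the output, which is what makes the transformed text compressible and keeps the memory footprint of such an index small.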
bowtie2 2.2.4
2.2.5
2.2.9
URL: http://bowtie-bio.sourceforge.net/bowtie2/index.shtml

Description:

Bowtie 2 is an ultrafast and memory-efficient tool for aligning sequencing reads to long reference sequences. It is particularly good at aligning reads of about 50 up to 100s or 1,000s of characters, and particularly good at aligning to relatively long (e.g. mammalian) genomes. Bowtie 2 indexes the genome with an FM Index to keep its memory footprint small: for the human genome, its memory footprint is typically around 3.2 GB. Bowtie 2 supports gapped, local, and paired-end alignment modes.

Location:

/usr/local/bowtie2-2.2.9
breakdancer 1.1.2
1.4.5
URL:

https://github.com/genome/breakdancer

Description:

BreakDancerMax predicts five types of structural variants: insertions, deletions, inversions, inter- and intra-chromosomal translocations from next-generation short paired-end sequencing reads using read pairs that are mapped with unexpected separation distances or orientation.

usage:

Perl script: perl /usr/local/breakdancer-1.4.5/bam2cfg.pl

C++ program: breakdancer-max + arguments

location:

/usr/local/breakdancer-1.4.5
BUSCO 1.1b1 URL:

http://busco.ezlab.org/

Description:

BUSCO provides measures for quantitative assessment of genome assembly, gene set, and transcriptome completeness based on evolutionarily informed expectations of gene content from near-universal single-copy orthologs selected from OrthoDB.

location:

/usr/local/BUSCO_v1.1b1

usage:

python3.5 /usr/local/BUSCO_v1.1b1/BUSCO_v1.1b1.py + arguments
bwa 0.6.2
0.7.5a
0.6.2-r126
0.7.12
Installation date, version 0.6.2: 03/01/13

Installation date, version 0.7.5a: 02/12/13

URL: http://sourceforge.net/projects/bio-bwa/

Description:

BWA is a program for aligning sequencing reads against a large reference genome (e.g. the human genome). It has two major components, one for reads shorter than 150 bp and the other for longer reads.
canu 1.6



URL: 

https://github.com/marbl/canu/releases

Description: 

Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation

 

Location:

/usr/local/canu-1.6

 

Usage:

 

canu + arguments

CAP3 1.0 Installation date: 03/01/13

URL: http://seq.cs.iastate.edu/

Description: Sequence Assembly Program
cd-hit 4.5.4 Installation date, version 4.5.4: 05/03/13

URL: http://bioinformatics.org/cd-hit/

Description:

CD-HIT stands for Cluster Database at High Identity with Tolerance. The program (cd-hit) takes a fasta format sequence database as input and produces a set of 'non-redundant' (nr) representative sequences as output. In addition cd-hit outputs a cluster file, documenting the sequence 'groupies' for each nr sequence representative. The idea is to reduce the overall size of the database without removing any sequence information by only removing 'redundant' (or highly similar) sequences. This is why the resulting database is called non-redundant (nr). Essentially, cd-hit produces a set of closely related protein families from a given fasta sequence database.

CD-HIT uses a 'longest sequence first' list removal algorithm to remove sequences above a certain identity threshold. Additionally the algorithm implements a very fast heuristic to find high identity segments between sequences, and so can avoid many costly full alignments.
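The "longest sequence first" idea can be sketched in a few lines of Python. This is a simplified illustration, not cd-hit's algorithm: the `similarity` function is a crude stand-in based on `difflib` rather than cd-hit's identity measure or its fast heuristic, and the names `greedy_cluster` and `threshold` are assumptions.

```python
from difflib import SequenceMatcher
from typing import Dict, List


def similarity(a: str, b: str) -> float:
    """Crude stand-in for sequence identity (NOT cd-hit's measure)."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def greedy_cluster(seqs: List[str], threshold: float = 0.9) -> Dict[str, List[str]]:
    """Process sequences longest first: each one joins the first representative
    it resembles, otherwise it becomes the representative of a new cluster."""
    clusters: Dict[str, List[str]] = {}
    for seq in sorted(seqs, key=len, reverse=True):
        for rep in clusters:
            if similarity(rep, seq) >= threshold:
                clusters[rep].append(seq)
                break
        else:  # no representative was similar enough
            clusters[seq] = [seq]
    return clusters


if __name__ == "__main__":
    result = greedy_cluster(["MKTAYIAKQRQISFVKSHFSRQ", "GGGGGGGGGG", "MKTAYIAKQRQISFVKSHFSR"])
    for representative, members in result.items():
        print(representative, len(members))
```

The keys of the returned dictionary play the role of the non-redundant representative set, and the values correspond to what the cluster file records.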

With recent developments, the cd-hit package offers new programs for DNA sequence clustering and for comparing two databases. It also has many new options for clustering control.
censor 4.2.22 Installation date v4.2.22: 09/12/13

URL:

http://www.girinst.org/censor/

Description:

CENSOR is a software tool which screens query sequences against a reference collection of repeats and "censors" (masks) homologous portions with masking symbols, as well as generating a report classifying all found repeats.
clean_reads 0.2.3 Installation date v0.2.3: 07/03/13

URL:

http://bioinf.comav.upv.es/clean_reads/

Description:

clean_reads cleans NGS (Sanger, 454, Illumina and SOLiD) reads. It can trim:

bad quality regions,

adaptors,

vectors, and

regular expressions.

It also filters out the reads that do not meet minimum quality criteria based on the sequence length and the mean quality.
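The filtering step can be pictured with a minimal Python sketch. It is not clean_reads' own code; the function name `filter_reads`, the default thresholds and the in-memory read representation are all assumptions chosen for illustration.

```python
from statistics import mean
from typing import Iterable, List, Tuple

Read = Tuple[str, str, List[int]]  # (name, sequence, per-base Phred qualities)


def filter_reads(reads: Iterable[Read], min_length: int = 30,
                 min_mean_quality: float = 20.0) -> List[Read]:
    """Keep reads that are long enough and whose mean quality is high enough."""
    kept = []
    for name, seq, quals in reads:
        if len(seq) < min_length:
            continue  # too short
        if not quals or mean(quals) < min_mean_quality:
            continue  # no qualities, or mean quality too low
        kept.append((name, seq, quals))
    return kept


if __name__ == "__main__":
    reads = [
        ("r1", "ACGT" * 10, [30] * 40),
        ("r2", "ACGT", [30] * 4),
        ("r3", "ACGT" * 10, [10] * 40),
    ]
    print([name for name, _, _ in filter_reads(reads)])
```

In a real run the two thresholds would come from the tool's own options rather than from these illustrative defaults.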
clustal_omega 1.1.0

Installation date v1.1.0: 03/01/13

URL:

http://www.clustal.org/#Download

Description:

Clustal Omega is the latest addition to the Clustal family. It offers a significant increase in scalability over previous versions, allowing hundreds of thousands of sequences to be aligned in only a few hours. It will also make use of multiple processors, where present. In addition, the quality of alignments is superior to previous versions, as measured by a range of popular benchmarks.

Location:

/usr/local/clustal-omega-1.2.3

 

Usage:

clustalo + arguments

clustalw 2.1

Installation date v2.1: 03/01/13

URL:

http://www.clustal.org/#Download

Description:

Multiple Sequence Alignments

Location:

/usr/local/clustalw-2.1

Usage:

clustalw2 + arguments

cmake 3.1.1 URL:

http://www.cmake.org/download/

Description:

CMake is a family of tools designed to build, test and package software. CMake is used to control the software compilation process using simple platform and compiler independent configuration files. CMake generates native makefiles and workspaces that can be used in the compiler environment of your choice.

PATH: /usr/local/cmake-3.1.1/bin
control-freec 6.8 URL: http://bioinfo-out.curie.fr/projects/freec/tutorial.html#install

Description: Control-FREEC is a tool for detection of copy-number changes and allelic imbalances (including LOH) using deep-sequencing data developed by the Computational Systems Biology of Cancer group at the Bioinformatics Laboratory of Institut Curie (Paris).

Installed on 14/05/14
crac 1.19.0
2.5
URL:

https://odin.univ-montp2.fr/owncloud/public.php?service=files&t=bc65ef55474a076bc68377f664f81a67

http://crac.gforge.inria.fr/index.php?id=what-is-it

Description:

CRAC: integrated analysis of RNA-Seq reads

Usage:

crac + argument(s)

location:

/usr/local/crac-1.19.0
cufflinks 2.0.2
2.1.1
2.2.0
2.2.1
Installation date v2.0.2: 07/03/13

Installation date v2.1.1: 11/04/13

Installation date v2.2.0: 17/04/13

Installation date v2.2.0: 08/04/15

URL:

http://cufflinks.cbcb.umd.edu/manual.html

Description:

Transcript assembly, differential expression, and differential regulation for RNA-Seq

location:

/usr/local/cufflinks-2.2.1.Linux_x86_64/
cutadapt 1.2.1
1.4.2
1.8
1.10
Installation date v1.2.1: 03/01/13

URL:

https://cutadapt.readthedocs.org/en/stable/

Download:

https://pypi.python.org/pypi/cutadapt

Description:

cutadapt removes adapter sequences from high-throughput sequencing data.
cytoscape 3.4.0
3.3.0
URL: http://www.cytoscape.org/

Description:

Cytoscape is an open source software platform for visualizing complex networks and integrating these with any type of attribute data. Many apps are available for various kinds of problem domains, including bioinformatics, social network analysis, and semantic web.

location:

/usr/local/Cytoscape-3.4.0

Usage:

Cytoscape
delly 0.7.5 URL: https://github.com/tobiasrausch/delly

Description:

Delly2 is an integrated structural variant prediction method that can discover and genotype deletions, tandem duplications, inversions and translocations at single-nucleotide resolution in short-read massively parallel sequencing data. It uses paired-ends and split-reads to sensitively and accurately delineate genomic rearrangements throughout the genome. Structural variants can be visualized using Delly-maze and Delly-suave.

location:

/usr/local/delly-0.7.5/

Usage:

delly + arguments
diamond

0.7.11

0.8.29

URL:

http://ab.inf.uni-tuebingen.de/software/diamond/

Description:

DIAMOND is a high-throughput program for aligning a file of short reads against a protein reference database such as NR, at 20,000 times the speed of BLASTX, with high sensitivity.

On Illumina reads of length 100-150 bp, in fast mode, DIAMOND is about 20,000 times faster than BLASTX while reporting about 80-90% of all matches that BLASTX finds with an e-value of at most 1e-5. In sensitive mode, DIAMOND is about 2,500 times faster than BLASTX and finds more than 94% of all matches.

location:

/usr/local/diamond-0.8.29

usage:

diamond + arguments
dotter

4.22

 

URL:

http://sonnhammer.sbc.su.se/Dotter.html

Description:

Dotter is a graphical dotplot program for detailed comparison of two sequences. Here, every residue in one sequence is compared to every residue in the other sequence. The first sequence runs along the x-axis and the second sequence along the y-axis. In regions where the two sequences are similar to each other, a row of high scores will run diagonally across the dot matrix. If you're comparing a sequence against itself to find internal repeats, you'll notice that the main diagonal scores maximally, since it's the 100% perfect self-match.
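The dot-matrix idea can be reproduced in text form with a small Python sketch. It marks only exact residue identities, whereas the description above refers to scores, so treat it as the simplest possible variant rather than Dotter's method; the function name `dot_matrix` is an assumption.

```python
from typing import List


def dot_matrix(seq_x: str, seq_y: str) -> List[str]:
    """Return one text row per residue of seq_y; '*' marks identity with seq_x."""
    return ["".join("*" if x == y else "." for x in seq_x) for y in seq_y]


if __name__ == "__main__":
    # Comparing a sequence against itself: the main diagonal is all '*'.
    for row in dot_matrix("GATTACA", "GATTACA"):
        print(row)
```

Off-diagonal runs of `*` in a self-comparison correspond to the internal repeats mentioned above.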

 


location:

/usr/local/dotter-4.22

usage:

dotter + arguments
ea-utils 1.1.2 Description:

Command-line tools for processing biological sequencing data. Barcode demultiplexing, adapter trimming, etc.

Primarily written to support an Illumina based pipeline - but should work with any FASTQs.

URL:

https://code.google.com/p/ea-utils/
elai 0.99 URL: http://www.haplotype.org/elai.html

Description:

Efficient Local Ancestry Inference

The software performs local ancestry inference for admixed individuals.

Location:

/usr/local/elai-0.99

usage:

elai + arguments
emboss 6.5.7
6.4.0.0
Installation date v6.5.7: 03/01/13

URL:

http://emboss.sourceforge.net/download/

Description:

EMBOSS is "The European Molecular Biology Open Software Suite". EMBOSS is a free Open Source software analysis package specially developed for the needs of the molecular biology (e.g. EMBnet) user community. The software automatically copes with data in a variety of formats and even allows transparent retrieval of sequence data from the web. EMBOSS also integrates a range of currently available packages and tools for sequence analysis into a seamless whole.
emmax 1.0



URL:

http://genetics.cs.ucla.edu/emmax/news.html

Description:
EMMAX is a statistical test for large scale human or model organism association mapping accounting for the sample structure. In addition to the computational efficiency obtained by the EMMA algorithm, EMMAX takes advantage of the fact that each locus explains only a small fraction of complex traits, which makes it possible to avoid repeating the variance component estimation procedure, resulting in a significant reduction in the computational time of association mapping using mixed models.

location:

/usr/local/emmax-1.0

Usage:

emmax + arguments

emmax-kin + arguments

 

Epidemio 1.0 Installation date: 05/08/14

Description: health programs for Elsa Canard
eugene 4.0a Installation date v4.0a: 03/01/13

URL: http://eugene.toulouse.inra.fr/

Description:

EuGène is an open gene finder for eukaryotic organisms.
exonerate 2.2.0 Installation date v2.2.0: 08/08/14

URL: https://www.ebi.ac.uk/~guy/exonerate/

Description:

exonerate is a generic tool for pairwise sequence comparison.

It allows you to align sequences using many alignment models, using either exhaustive dynamic programming or a variety of heuristics.
FALCON-integrate 1.0

URL:

https://github.com/PacificBiosciences/FALCON

Description:

Falcon: a set of tools for fast aligning long reads for consensus and assembly

The Falcon toolkit is a collection of simple code that its author uses for studying efficient assembly algorithms for haploid and diploid genomes. It has some back-end code implemented in C for speed and a simple front end written in Python for convenience.

location:

/usr/local/FALCON-integrate-1.0/FALCON-integrate

FastME 2.5.1

URL:

/usr/local/FastME-2.5.1

Description:
FastME provides distance algorithms to infer phylogenies. FastME is based on balanced minimum evolution, which is the very principle of NJ. FastME improves over NJ by performing topological moves using fast, sophisticated algorithms. The first version of FastME only included Nearest Neighbor Interchange (NNI). The new 2.0 version also includes Subtree Pruning and Regrafting (SPR), while remaining as fast as NJ and providing a number of facilities: distance estimation for DNA and proteins with various models and options, bootstrapping, and parallel computations.



location:

/usr/local/FastME-2.5.1

 

Usage:

fastme + arguments

fastPHASE 1.4.8

URL:

/usr/local/fastPHASE-1.4.8

Description:
fastPHASE is a program to estimate missing genotypes and unobserved haplotypes. It is an implementation of the model described in Scheet & Stephens (2006). This is a cluster-based model for haplotype variation, and gains its utility from implicitly modeling the genealogy of chromosomes in a random sample from a population as a tree but summarizing all haplotype variation in the "tips" of the trees.

The program also offers additional functionality, including: estimation and correction of genotyping errors based on patterns of linkage disequilibrium (Scheet & Stephens, 2008), haplotype-based association mapping of binary phenotypes, and estimation of missing genotypes from low-coverage sequencing data.

location:

/usr/local/fastPHASE-1.4.8

 

Usage:

fastPHASE + arguments

fastq-tools 0.8 URL:

https://github.com/dcjones/fastq-tools

Description:

This package provides a number of small and efficient programs to perform common tasks with high throughput sequencing data in the FASTQ format. All of the programs work with typical FASTQ files as well as gzipped FASTQ files.

The following programs are provided. See the individual man pages for more information.

fastq-sort : sort fastq entries by various keys

fastq-grep : match sequences against regular expressions

fastq-kmers : count k-mer occurrences

fastq-match : (Smith-Waterman) local sequence alignment

fastq-qual : tabulate quality scores

fastq-sample : randomly sample reads, with or without replacement

fastq-uniq : count duplicate reads

fastq-qualadj : adjust quality scores by a fixed offset
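To make the last item concrete, here is a minimal Python sketch of what adjusting quality scores by a fixed offset means for a single FASTQ record. It is not the fastq-qualadj implementation: the function names, the 4-tuple record layout, and the clamping to the printable ASCII range 33-126 are assumptions made for illustration.

```python
def adjust_quality(qual, offset, lo=33, hi=126):
    """Shift each ASCII-encoded quality character by `offset`.

    Clamping to [lo, hi] is an assumption of this sketch so that the
    result stays within printable ASCII.
    """
    return "".join(chr(min(hi, max(lo, ord(c) + offset))) for c in qual)


def adjust_record(record, offset):
    """Apply the shift to one record given as (header, sequence, separator, quality)."""
    header, seq, sep, qual = record
    return (header, seq, sep, adjust_quality(qual, offset))


# Hypothetical record: four bases, each with quality character 'I' (ASCII 73).
record = ("@read1", "ACGT", "+", "IIII")
print(adjust_record(record, -31))  # ('@read1', 'ACGT', '+', '****')
```

A shift of -31 is the difference between the Phred+64 and Phred+33 encodings, which is a typical reason to apply a fixed offset.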

location:

/usr/local/bin
FastQC 0.10
0.11.2
0.11.5
0.11.3
Installation date, version 0.10: 03/01/13

URL:

http://www.bioinformatics.bbsrc.ac.uk/projects/fastqc/

Description:

FastQC aims to provide a simple way to do some quality control checks on raw sequence data coming from high throughput sequencing pipelines.
fastStructure 1.0 URL:

http://rajanil.github.io/fastStructure/

Description:

A variational framework for inferring population structure from SNP genotype data.

location:

/usr/local/fastStructure-1.0

usage:

python structure.py + arguments
fastTree 2.1.10 URL:

http://www.microbesonline.org/fasttree/

Description:

FastTree infers approximately-maximum-likelihood phylogenetic trees from alignments of nucleotide or protein sequences. FastTree can handle alignments with up to a million sequences in a reasonable amount of time and memory. For large alignments, FastTree is 100-1,000 times faster than PhyML 3.0 or RAxML 7.

location:

/usr/local/FastTree-2.1.10

usage:

FastTree + arguments
FASTX-Toolkit

0.0.13

0.0.14

URL: http://hannonlab.cshl.edu/fastx_toolkit/download.html

The FASTX-Toolkit is a collection of command line tools for Short-Reads FASTA/FASTQ files preprocessing.

Location:

/usr/local/fastx_toolkit-0.0.14

Usage:

binaries contained in /usr/local/fastx_toolkit-0.0.14 + arguments

fftw 3.3 Installation date v3.3: 02/01/13

URL:

http://www.fftw.org/

Description:

FFTW is a C subroutine library for computing the discrete Fourier transform (DFT) in one or more dimensions, of arbitrary input size, and of both real and complex data (as well as of even/odd data, i.e. the discrete cosine/sine transforms or DCT/DST). We believe that FFTW, which is free software, should become the FFT library of choice for most applications.
fgenes 1.0 URL:

http://www.softberry.com/berry.phtml?topic=fdp.htm&no_menu=on

Description:

Pattern based human gene structure prediction (multiple genes, both chains).

location:

/usr/local/fgenes-1.0
freebayes 0.9.21 URL:

https://github.com/ekg/freebayes

Description:

FreeBayes is a Bayesian genetic variant detector designed to find small polymorphisms, specifically SNPs (single-nucleotide polymorphisms), indels (insertions and deletions), MNPs (multi-nucleotide polymorphisms), and complex events (composite insertion and substitution events) smaller than the length of a short-read sequencing alignment.

location:

/usr/local/freebayes-0.9.21

Usage:

freebayes command + arguments

Installation:

git config --global url.https://github.com/.insteadOf git://github.com/

git clone --recursive git://github.com/ekg/freebayes.git

cd freebayes

make

mkdir /usr/local/freebayes-0.9.21
Gassst 1.28 Installation date, version 1.28: 03/01/13

URL:

http://www.irisa.fr/symbiose/projects/gassst/

Description:

GASSST finds global alignments of short DNA sequences against large DNA banks
Gatk 2.3.4
2.4.7
2.4.9
3.3.0
3.4-46
3.6
Installation date v2.3.4: 20/12/12

Installation date v2.4.7: 06/03/13

Installation date v2.4.9: 27/03/13

URL: http://www.broadinstitute.org/gsa/wiki/index.php/Downloading_the_GATK

Description:

The GATK is a structured software library that makes it easy to write efficient analysis tools for next-generation sequencing data.

location:

/usr/local/gatk-3.6/

usage:

java -Xmx12g -jar /usr/local/gatk-3.6/GenomeAnalysisTK.jar --help
Gblocks 0.91b Installation date v0.91b: 03/01/13

URL:

http://molevol.cmima.csic.es/castresana/Gblocks.html

Description:

Gblocks is a computer program written in ANSI C language that eliminates poorly aligned positions and divergent regions of an alignment of DNA or protein sequences.
genome strip 2.00.1611 URL:

http://software.broadinstitute.org/software/genomestrip/

Description:

Genome STRiP (Genome STRucture In Populations) is a suite of tools for discovering and genotyping structural variations using sequencing data. The methods are designed to detect shared variation using data from multiple individuals.

Genome STRiP looks both across and within a set of sequenced genomes to detect variation. The methods are adaptive and support heterogeneous data sets, including variations in sequencing depth, read lengths and mixtures of paired and single-end reads. A minimum of 20 to 30 genomes is required to get acceptable results, but the method gains power across genomes, and processing more genomes provides better results.

location:

/usr/local/genome_strip-2.00.1611
genometools

1.4.2

1.5.9

URL:

http://genometools.org/

Description:

The GenomeTools genome analysis system is a free collection of bioinformatics tools (in the realm of genome informatics) combined into a single binary named gt. It is based on a C library named “libgenometools” which consists of several modules.

location:

/usr/local/genometools-1.5.9
gepard 1.30 Installation date v1.30: 03/01/13

URL:

http://www.softpedia.com/get/Science-CAD/Gepard.shtml

Description:

Gepard is an easy to use, handy piece of software designed to enable the calculation of dotplots even for large sequences like chromosomes or bacterial genomes.

Usage: go to /usr/local/gepard-1.30 and run the command sh gepardcmd.sh
gmaj   URL:

http://globin.bx.psu.edu/dist/gmaj/gmaj_help.html

Description:

Gmaj is a tool designed for viewing and manipulating Generalized Multiple Alignments (GMAs) produced by sequence-symmetric alignment programs such as TBA (though it can also be used with MAF format alignments from other sources). It can display interactive graphical and text representations of the alignments, diagrams showing the locations of exons and repeats, and other annotations -- all with the user's choice of reference sequence.

Installed in /usr/local/gmaj on 17/12/14
gmap v2012-12-20
v2013-02-25
Installation date v2012-12-20: 05/03/13

URL:

http://research-pub.gene.com/gmap/

Description:

GMAP: A Genomic Mapping and Alignment Program for mRNA and EST Sequences, and GSNAP: Genomic Short-read Nucleotide Alignment Program
gnuplot 4.6.1 Installation date v4.6.1: 21/02/13

URL: http://www.gnuplot.info/

Description:

Gnuplot is a portable command-line driven graphing utility for Linux, OS/2, MS Windows, OSX, VMS, and many other platforms. The source code is copyrighted but freely distributed (i.e., you don't have to pay for it). It was originally created to allow scientists and students to visualize mathematical functions and data interactively, but has grown to support many non-interactive uses such as web scripting. It is also used as a plotting engine by third-party applications like Octave. Gnuplot has been supported and under active development since 1986.
HISAT 0.1.6 URL:

https://ccb.jhu.edu/software/hisat/index.shtml

Description:

HISAT is a fast and sensitive spliced alignment program for mapping RNA-seq reads. In addition to one global FM index that represents a whole genome, HISAT uses a large set of small FM indexes that collectively cover the whole genome (each index represents a genomic region of ~64,000 bp and ~48,000 indexes are needed to cover the human genome). These small indexes (called local indexes) combined with several alignment strategies enable effective alignment of RNA-seq reads, in particular, reads spanning multiple exons. The memory footprint of HISAT is relatively low (~4.3GB for the human genome). We have developed HISAT based on the Bowtie2 implementation to handle most of the operations on the FM index.
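The index figures quoted above can be sanity-checked with a little arithmetic. The sketch below assumes non-overlapping regions and a human genome of roughly 3.1 Gbp; both are simplifications made here for illustration, not statements about how HISAT actually lays out its local indexes.

```python
REGION_BP = 64_000        # approximate span of one local index (from the description)
N_LOCAL_INDEXES = 48_000  # approximate number of local indexes (from the description)

# Total sequence covered if the regions did not overlap.
covered_bp = REGION_BP * N_LOCAL_INDEXES
print(covered_bp)  # 3072000000, i.e. about 3.07 Gbp


def local_index_count(genome_bp, region_bp=REGION_BP):
    """Number of non-overlapping regions needed to tile a genome (ceiling division)."""
    return -(-genome_bp // region_bp)


# Assumed genome size of 3.1 Gbp.
print(local_index_count(3_100_000_000))  # 48438
```

Both numbers agree with the "~48,000 indexes" quoted in the description.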

Location:

/usr/local/hisat-0.1.6-beta/

Usage:

hisat + arguments
HISAT2 2.0.0-beta URL:

http://ccb.jhu.edu/software/hisat2/index.shtml

Description:

HISAT2 is a fast and sensitive alignment program for mapping next-generation sequencing reads (both DNA and RNA) against the general human population (as well as against a single reference genome). Based on an extension of BWT for graphs [Sirén et al. 2014], we designed and implemented a graph FM index (GFM), an original approach and its first implementation to the best of our knowledge. In addition to using one global GFM index that represents the general population, HISAT2 uses a large set of small GFM indexes that collectively cover the whole genome (each index representing a genomic region of 56 Kbp, with 55,000 indexes needed to cover the human population). These small indexes (called local indexes), combined with several alignment strategies, enable rapid and accurate alignment of sequencing reads. This new indexing scheme is called a Hierarchical Graph FM index (HGFM).

Location:

/usr/local/hisat2-2.0.0-beta/

Usage:

hisat2 + arguments
hmmer 3.0
3.1b1
Installation date v3.0: 03/01/13

Installation date v3.1b1: 02/12/13

URL:

http://hmmer.janelia.org/

Description:

HMMER is used for searching sequence databases for homologs of protein sequences, and for making protein sequence alignments. It implements methods using probabilistic models called profile hidden Markov models (profile HMMs).
HYPHY 2.1020130114beta URL:

http://octamonkey.ucsd.edu/hyphywiki/index.php/Main_Page

Description:

HyPhy is an open-source software package for the analysis of genetic sequences using techniques in phylogenetics, molecular evolution, and machine learning. It features a complete graphical user interface (GUI) and a rich scripting language for limitless customization of analyses. Additionally, HyPhy features support for parallel computing environments (via message passing interface) and it can be compiled as a shared library and called from other programming environments such as Python or R. HyPhy has over 6500 registered users and has been cited in over 600 peer-reviewed publications (Google Scholar). Continued development of HyPhy is currently supported in part by an NIGMS R01 award 1R01GM093939.

location:

/usr/local/veg-hyphy-d19efea/
IGV 1.4.04
2.3.32
Integrative Genomics Viewer

http://www.broadinstitute.org/software/igv/home
inGAP 3.1.1 URL:

http://ingap.sourceforge.net/

Description:

We developed an integrative next-generation genome analysis pipeline (inGAP), which employs a Bayesian principle to detect single nucleotide polymorphisms (SNPs) and small insertions/deletions (indels). inGAP has been applied to a number of genome projects, including bacteria, yeast, plants and mammals. Here we extend this pipeline to identify and visualize large-size structural variations, including insertions, deletions, inversions and translocations.

location:

/usr/local/inGap-3.1.1

usage:

inGAP
InStruct 1.0 Installation date: 19/07/13

URL:

http://cbsuapps.tc.cornell.edu/InStruct.aspx

Description:

InStruct implements the Markov Chain Monte Carlo algorithm for the generalized Bayesian clustering method to estimate the self-fertilization rates and cluster individuals into subpopulations simultaneously using genotype data consisting of unlinked markers
interproscan 5.3-46.0 Installation date: 15/04/13

Scans a range of protein signatures against your sequence

URL:

http://code.google.com/p/interproscan/
jabba 1.0 URL:

https://github.com/biointec/jabba/wiki

Description:

Jabba is a hybrid error correction tool to correct third generation (PacBio / ONT) sequencing data, using second generation (Illumina) data.

location:

/usr/local/jabba-1.0

usage:

jabba + argument
java jre 7u51
1.8.20
1.6.0_26
7u55
 
jellyfish 2.1.4 URL:

http://www.cbcb.umd.edu/software/jellyfish/

description:

JELLYFISH is a tool for fast, memory-efficient counting of k-mers in DNA. A k-mer is a substring of length k, and counting the occurrences of all such substrings is a central step in many analyses of DNA sequence. JELLYFISH can count k-mers using an order of magnitude less memory and an order of magnitude faster than other k-mer counting packages by using an efficient encoding of a hash table and by exploiting the "compare-and-swap" CPU instruction to increase parallelism.

JELLYFISH is a command-line program that reads FASTA and multi-FASTA files containing DNA sequences. It outputs its k-mer counts in a binary format, which can be translated into a human-readable text format using the "jellyfish dump" command.
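As an illustration of what "counting k-mers" means, the following Python sketch counts every substring of length k in a short sequence. It is a naive dictionary-based counter and not Jellyfish's algorithm (which, as described above, relies on a compact hash table and compare-and-swap for parallelism); the function name `count_kmers` is hypothetical.

```python
from collections import Counter


def count_kmers(seq, k):
    """Count all overlapping substrings of length k in seq (case-insensitive)."""
    seq = seq.upper()
    # A sequence of length n contains n - k + 1 overlapping k-mers.
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


counts = count_kmers("ACGTACGT", 3)
for kmer, n in sorted(counts.items()):
    print(kmer, n)
# ACG 2
# CGT 2
# GTA 1
# TAC 1
```

This approach keeps every distinct k-mer as a Python string in memory, which is workable for small inputs only; that limitation is the kind of cost a dedicated counter is designed to avoid.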

location:

/usr/local/jellyfish-2.1.4

usage:

jellyfish + arguments
JR-assembler 1.04 URL:

http://jr-assembler.iis.sinica.edu.tw/index.htm

Description:

JR-Assembler

An assembler for the de novo assembly of large genomes using short sequence reads via jumping extension and read remapping

Usage :

commandeJR + argument(s)

Location:

/usr/local/JR-Assembler_v1.0.4/
kallisto 0.43.1 URL:

https://pachterlab.github.io/kallisto/download

Description:

kallisto is a program for quantifying abundances of transcripts from RNA-Seq data, or more generally of target sequences using high-throughput sequencing reads.

location:

/usr/local/kallisto-0.43.1

usage:

kallisto + arguments
karect 1.0 URL:

https://github.com/aminallam/karect

Description:

KAUST Assembly Read Error Correction Tool

location:

/usr/local/karect-1.0

usage:

karect + arguments
kissplice 2.4.0 URL:

http://kissplice.prabi.fr/

Description:

KisSplice is dedicated to de-novo calling of alternative splicing events from one or several RNA-seq datasets. In addition to splicing events, KisSplice detects small indels (1, 2, 4 or 5 nucleotides), and provides a list of potential inexact tandem repeats and SNPs. Data from different conditions can be co-assembled with KisSplice.

Location:

/usr/local/kissplice-2.4.0/

Usage:

kissplice + arguments
kissplice2reftranscriptome 1.1.1 URL:

http://kissplice.prabi.fr/tools/kiss2rt/

Description:

With fasta/fastq input from an RNA-seq experiment, SNPs are found by KisSplice without using a reference. KisSplice provides only a local context around the SNP, but a reference transcriptome can be built from the RNA-seq data using a full-length transcriptome assembler like Trinity. SNPs predicted by KisSplice can then be positioned along the transcripts (with BLAT). Some SNPs that do not map to the Trinity transcripts, called orphan SNPs, are harder to study but can still be of interest. We propose a method, KisSplice2RefTranscriptome, to predict a functional impact for the positioned SNPs, and intersect these results with condition-specific SNPs. Overall, starting from RNA-seq data only, we obtain a list of condition-specific SNPs stratified by functional impact.

Location:

/usr/local/python-2.7.10/bin

Usage:

kissplice2reftranscriptome + arguments
lam 7.1.4 URL:

http://www.lam-mpi.org/software/

Description:

http://www.lam-mpi.org/software/
lapack 3.1.1 URL:

http://www.netlib.org/lapack/

Description:

LAPACK is written in Fortran 90 and provides routines for solving systems of simultaneous linear equations, least-squares solutions of linear systems of equations, eigenvalue problems, and singular value problems. The associated matrix factorizations (LU, Cholesky, QR, SVD, Schur, generalized Schur) are also provided, as are related computations such as reordering of the Schur factorizations and estimating condition numbers. Dense and banded matrices are handled, but not general sparse matrices. In all areas, similar functionality is provided for real and complex matrices, in both single and double precision.

The original goal of the LAPACK project was to make the widely used EISPACK and LINPACK libraries run efficiently on shared-memory vector and parallel processors. On these machines, LINPACK and EISPACK are inefficient because their memory access patterns disregard the multi-layered memory hierarchies of the machines, thereby spending too much time moving data instead of doing useful floating-point operations. LAPACK addresses this problem by reorganizing the algorithms to use block matrix operations, such as matrix multiplication, in the innermost loops. These block operations can be optimized for each architecture to account for the memory hierarchy, and so provide a transportable way to achieve high efficiency on diverse modern machines. We use the term "transportable" instead of "portable" because, for fastest possible performance, LAPACK requires that highly optimized block matrix operations be already implemented on each machine.

LAPACK routines are written so that as much as possible of the computation is performed by calls to the Basic Linear Algebra Subprograms (BLAS). LAPACK is designed at the outset to exploit the Level 3 BLAS — a set of specifications for Fortran subprograms that do various types of matrix multiplication and the solution of triangular systems with multiple right-hand sides. Because of the coarse granularity of the Level 3 BLAS operations, their use promotes high efficiency on many high-performance computers, particularly if specially coded implementations are provided by the manufacturer.

Highly efficient machine-specific implementations of the BLAS are available for many modern high-performance computers. For details of known vendor- or ISV-provided BLAS, consult the BLAS FAQ. Alternatively, the user can download ATLAS to automatically generate an optimized BLAS library for the architecture. A Fortran 77 reference implementation of the BLAS is available from netlib; however, its use is discouraged as it will not perform as well as a specifically tuned implementation.
last 2.6.6
756
URL:

http://last.cbrc.jp/

Description:

LAST finds similar regions between sequences.

location:

/usr/local/last-756

usage:

lastal + arguments
lastz 1.02.0 URL:

http://www.bx.psu.edu/miller_lab/dist/lastz-1.02.00.tar.gz

Description:

LASTZ is a program for aligning DNA sequences, a pairwise aligner. Originally designed to handle sequences the size of human chromosomes and from different species, it is also useful for sequences produced by NGS sequencing technologies such as Roche 454.

location:

/usr/local/lastz-1.02

usage:

lastz + arguments
lofreq 0.6.1
2.2.1
URL:

http://sourceforge.net/projects/lofreq/

Description:

LoFreq is a fast and sensitive variant-caller for inferring single-nucleotide variants (SNVs) from high-throughput sequencing data.
LTR_FINDER 1.0.5 URL:

http://code.google.com/p/ltr-finder/

Description:

LTR_Finder is an efficient program for finding full-length LTR retrotransposons in genome sequences.
MACS 1.4.2 URL:

http://liulab.dfci.harvard.edu/MACS/Download.html

Description:

Next generation parallel sequencing technologies made chromatin immunoprecipitation followed by sequencing (ChIP-Seq) a popular strategy to study genome-wide protein-DNA interactions, while creating challenges for analysis algorithms. We present Model-based Analysis of ChIP-Seq (MACS) on short reads sequencers such as Genome Analyzer (Illumina / Solexa). MACS empirically models the length of the sequenced ChIP fragments, which tends to be shorter than sonication or library construction size estimates, and uses it to improve the spatial resolution of predicted binding sites. MACS also uses a dynamic Poisson distribution to effectively capture local biases in the genome sequence, allowing for more sensitive and robust prediction. MACS compares favorably to existing ChIP-Seq peak-finding algorithms, is publicly available open source, and can be used for ChIP-Seq with or without control samples.

Location:

/usr/local/MACS-1.4.2

Usage:

macs14 + arguments
MACS2 2.1.1 URL:

https://github.com/taoliu/MACS

Description:

With the improvement of sequencing techniques, chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-Seq) is becoming a popular way to study genome-wide protein-DNA interactions. To address the lack of powerful ChIP-Seq analysis methods, we present a novel algorithm, named Model-based Analysis of ChIP-Seq (MACS), for identifying transcription factor binding sites. MACS captures the influence of genome complexity to evaluate the significance of enriched ChIP regions, and improves the spatial resolution of binding sites by combining the information of both sequencing tag position and orientation. MACS can easily be used for ChIP-Seq data alone, or with a control sample to increase specificity.

Location:

/usr/local/python-2.7.10/bin/macs2

Usage:

macs2 + arguments
mafft 7.015b
7.305
URL:

http://mafft.cbrc.jp/alignment/software/linux.html

Description:

Multiple sequence alignment and NJ / UPGMA phylogeny

Location:

/usr/local/mafft-7.305

usage:

mafft + arguments
maker 2.31.9 URL: http://www.yandell-lab.org/software/maker.html

Description:

MAKER is a portable and easily configurable genome annotation pipeline. Its purpose is to allow smaller eukaryotic and prokaryotic genome projects to independently annotate their genomes and to create genome databases. MAKER identifies repeats, aligns ESTs and proteins to a genome, produces ab-initio gene predictions and automatically synthesizes these data into gene annotations having evidence-based quality values. MAKER is also easily trainable: outputs of preliminary runs can be used to automatically retrain its gene prediction algorithm, producing higher quality gene-models on subsequent runs. MAKER's inputs are minimal and its outputs can be directly loaded into a GMOD database. They can also be viewed in the Apollo genome browser; this feature of MAKER provides an easy means to annotate, view and edit individual contigs and BACs without the overhead of a database. MAKER should prove especially useful for emerging model organism projects with minimal bioinformatics expertise and computer resources.

location:

/usr/local/maker-2.31.9

usage:

maker + arguments
MALT 0.3.8

URL:

http://ab.inf.uni-tuebingen.de/data/software/malt/download/welcome.html

Description:

MALT is a fast replacement for BLASTX, BLASTP and BLASTN, and provides both local and semi-global alignment capabilities. By default, MALT produces RMA files that can be opened in MEGAN. In addition, MALT can provide alignments in Text, Tab or SAM format.

 

MALT is an extension of MEGAN. 


location:

/usr/local/MALT-0.3.8

usage:

malt-run + arguments

mapDamage 2.0.5 URL:

http://ginolhac.github.io/mapDamage/

Description:

mapDamage2 is a computational framework written in Python and R, which tracks and quantifies DNA damage patterns among ancient DNA sequencing reads generated by Next-Generation Sequencing platforms.

mapDamage is developed at the Centre for GeoGenetics by the Orlando Group.

usage:

mapDamage + arguments

location:

/usr/local/python-2.7.10/bin

maxent 3.3.3k URL:

http://www.cs.princeton.edu/~schapire/maxent/

Description:

Maxent software for species habitat modeling
megan

5.4.0

4.70.4

6.4.19

6.6

6.6.4

6.7.11

6.8.5

6.8.14

URL:

http://ab.inf.uni-tuebingen.de/data/software/megan5/download/welcome.html

Description:

The aim of MEGAN is to provide a tool for studying the taxonomic content of a set of DNA reads, typically collected in a metagenomics project. In a preprocessing step, a sequence comparison of all reads with a suitable database of reference DNA or protein sequences must be performed to produce an input file for the program. MEGAN is suitable for DNA reads (metagenome data), RNA reads (metatranscriptome data), peptide sequences (metaproteomics data) and, using a suitable synonyms file that maps SILVA ids to taxon ids, 16S rRNA data (amplicon sequencing).

Location:

/usr/local/megan-6.8.14

 

Usage:

module load bioinfo/megan/6.8.14

MEGAN

meme 4.11.0 URL:

http://meme-suite.org/doc/download.html?man_type=web

Description:

MEME discovers novel, ungapped motifs (recurring, fixed-length patterns) in your sequences. MEME splits variable-length patterns into two or more separate motifs.

location:

/usr/local/meme-4.11.0

usage:

meme + arguments
metaphyler 0.115 URL:

http://metaphyler.cbcb.umd.edu/#installation

Description:

MetaPhyler is a novel taxonomic classifier for metagenomic shotgun reads, which uses phylogenetic marker genes as a taxonomic reference. Our classifier, based on BLAST, uses different thresholds (automatically learned from the reference database) for each combination of taxonomic rank, reference gene, and sequence length. Our reference database includes marker genes from all complete genomes, several draft genomes and the NCBI nr protein database. Results on simulated metagenomic datasets demonstrate that MetaPhyler outperforms previous tools used in this context (CARMA, MEGAN and PhymmBL).

location:

/usr/local/MetaPhylerSRV0.115

usage:

metaphyler.pl + arguments
MindTheGap 1.0 URL:

https://github.com/GATB/MindTheGap

Description:

MindTheGap performs detection and assembly of DNA insertion variants in NGS read datasets with respect to a reference genome. It takes as input a set of reads and a reference genome. It outputs two sets of FASTA sequences: one is the set of breakpoints of detected insertion sites, the other is the set of assembled insertions for each breakpoint.

Location:

/usr/local/MindTheGap-1.0/build/bin

Usage:

MindTheGap + arguments
mira 4.0
4.0.2
3.4.1.1
URL: http://sourceforge.net/projects/mira-assembler/

Description:

MIRA - Sequence assembler and sequence mapping for whole genome shotgun and EST / RNASeq sequencing data. Can use Sanger, 454, Illumina and IonTorrent data. PacBio: CCS and error corrected data usable, uncorrected not yet.
MITE_Hunter 11-2011 URL:

http://target.iplantcollaborative.org/mite_hunter.html

Description:

a program for discovering miniature inverted-repeat transposable elements from genomic sequences
mongodb 2.7.6 URL:

https://www.mongodb.org/downloads

Description:

MongoDB is an open-source, document-oriented database designed for ease of development and scaling.
MOSAIK 2.1.73 URL:

http://bioinformatics.bc.edu/marthlab/Mosaik

Description:

MOSAIK is a reference-guided assembler comprising four main modular programs: MosaikBuild, MosaikAligner, MosaikSort, MosaikAssembler.

MosaikBuild converts various sequence formats into Mosaik’s native read format. MosaikAligner pairwise aligns each read to a specified series of reference sequences. MosaikSort resolves paired-end reads and sorts the alignments by the reference sequence coordinates. Finally, MosaikAssembler parses the sorted alignment archive and produces a multiple sequence alignment which is then saved into an assembly file format.

location:

/usr/local/MOSAIK-2.1.73

usage:

MosaikAligner + arguments
mothur 1.39.5

URL:

https://github.com/mothur/mothur

Description:

This project seeks to develop a single piece of open-source, expandable software to fill the bioinformatics needs of the microbial ecology community. mothur is available under the GPL license.

location:

/usr/local/mothur-1.39.5

usage:

mothur

mpiblast 1.6.0 URL:

http://www.mpiblast.org/

Description:

mpiBLAST is a freely available, open-source, parallel implementation of NCBI BLAST. By efficiently utilizing distributed computational resources through database fragmentation, query segmentation, intelligent scheduling, and parallel I/O, mpiBLAST improves NCBI BLAST performance by several orders of magnitude while scaling to hundreds of processors. mpiBLAST is also portable across many different platforms and operating systems. Lastly, a renewed focus and consolidation of the many codebases has positioned mpiBLAST to continue to be of high utility to the bioinformatics community

Location:

/usr/local/mpiblast-1.6.0
mpich 1.2.7p1 URL:

http://www.mpich.org/

Description:

MPICH is a high performance and widely portable implementation of the Message Passing Interface (MPI) standard.
mpich2 2-1.4.1p1 URL:

http://phase.hpcc.jp/mirrors/mpi/mpich2/

Description:

MPICH2 is an implementation of the Message-Passing Interface (MPI). The goals of MPICH2 are to provide an MPI implementation for important platforms, including clusters, SMPs, and massively parallel processors. It also provides a vehicle for MPI implementation research and for developing new and better parallel programming environments.
mrbayes 3.2.1 URL:

http://sourceforge.net/projects/mrbayes/files/mrbayes/3.2.0/mrbayes-3.2.0.tar.gz/download

Description:

MrBayes is a program for Bayesian inference and model choice across a wide range of phylogenetic and evolutionary models. MrBayes uses Markov chain Monte Carlo (MCMC) methods to estimate the posterior distribution of model parameters.
mscal 1.0 Installation date: 01/08/14

URL: http://www.abcgwh.sitew.ch/Utensils.J.htm#Utensils.J

Description: A cookbook to study Genome Wide Heterogeneity in introgression rates

Location:

/usr/local/ABC_tools-1.0/mscal

usage:

mscal + arguments
msmc 1.0 URL:

https://github.com/stschiff/msmc

Description:

This software implements MSMC, a method to infer population size and gene flow from multiple genome sequences (Schiffels and Durbin, 2014, Nature Genetics, or Preprint).

Location:

/usr/local/msmc-1.0

Usage:

msmc + arguments
msnsam 1.0 Installation date: 01/08/14

URL: http://www.abcgwh.sitew.ch/Utensils.J.htm#Utensils.J

Description: A cookbook to study Genome Wide Heterogeneity in introgression rates

Location:

/usr/local/ABC_tools-1.0/msnsam

usage:

msnsam + arguments
MUMmer 3.23 URL:

http://mummer.sourceforge.net/

Description:

MUMmer is a system for rapidly aligning entire genomes, whether in complete or draft form.

Location:

/usr/local/MUMmer-3.23

usage:

mummer + arguments
muscle 3.7
3.8.32
URL:

http://www.drive5.com/muscle/downloads.htm

Description:

MUSCLE stands for MUltiple Sequence Comparison by Log-Expectation. MUSCLE is claimed to achieve both better average accuracy and better speed than ClustalW2 or T-Coffee, depending on the chosen options.

Location:

/usr/local/muscle-3.8.32

usage:

muscle3.8.31_i86linux64 + arguments
mysql-server 5.5 MySQL database server
nagios 3.3.1 URL:

http://www.nagios.org/

Description:

With Nagios you can:

Monitor your entire IT infrastructure

Spot problems before they occur

Know immediately when problems arise

Share availability data with stakeholders

Detect security breaches

Plan and budget for IT upgrades

Reduce downtime and business losses
nanopolish 1.0

URL:

https://github.com/jts/nanopolish

Description:
Software package for signal-level analysis of Oxford Nanopore sequencing data. Nanopolish can calculate an improved consensus sequence for a draft genome assembly, detect base modifications, call SNPs and indels with respect to a reference genome, and more.

Location:

/usr/local/nanopolish-1.0

Usage:

nanopolish + arguments

ncbi-tools v20100808 URL:

http://www.ncbi.nlm.nih.gov/IEB/ToolBox/index.cgi

Description:

The NCBI ToolBox consists of three major parts:

Data Model - An explicit data model of biological sequences, structures, bibliographic data, and associated annotations.

Data Encoding - A formal specification and encoding rules. The telecommunications standard, ASN.1, has been used for this. Recently it has been mapped to a similar language, XML.

Programming Libraries - Originally written in a portable dialect of C. Recently a new generation is being written in C++.

Location:

/usr/local/ncbi-tools-20100808
NextGenMap v0.5.1

URL:

https://github.com/Cibiv/NextGenMap/wiki

Description:

NextGenMap is a flexible, highly sensitive short-read mapping tool that handles much higher mismatch rates than comparable algorithms while still outperforming them in terms of runtime. This allows large-scale datasets to be analysed even with increased SNP rates or higher error rates (e.g. caused by specialized experimental protocols) and avoids biases caused by highly variable regions in the genome.

Location:

/usr/local/ngm-0.5.1

Usage:

ngm + arguments

NgsAdmix 32 URL:

http://www.popgen.dk/software/index.php/NgsAdmix

Description:

NGSadmix estimates admixture proportions from NGS data. It is based on genotype likelihoods and is implemented as a multithreaded C/C++ program.

Usage:

NGSadmix

Arguments:

-likes Beagle likelihood filename

-K Number of ancestral populations

Optional:

-fname Ancestral population frequencies

-qname Admixture proportions

-outfiles Prefix for output files

-printInfo print ID and mean maf for the SNPs that were analysed

Setup:

-seed Seed for initial guess in EM

-P Number of threads

-method If 0 no acceleration of EM algorithm

-misTol Tolerance for considering site as missing

Stop criteria:

-tolLike50 Loglikelihood difference in 50 iterations

-tol Tolerance for convergence

-dymBound Use dynamic boundaries (1: yes (default) 0: no)

-maxiter Maximum number of EM iterations

Filtering:

-minMaf Minimum minor allele frequency

-minLrt Minimum likelihood ratio value for maf>0

-minInd Minimum number of informative individuals

Location:

/usr/local/NgsAdmix
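
As a minimal sketch, the options listed above can be assembled into an NGSadmix command line as follows. Only flags from the list are used; the helper name `ngsadmix_command`, the input file name and the output prefix are hypothetical, chosen for illustration.

```python
import shlex

def ngsadmix_command(likes, k, out_prefix, threads=1, seed=None):
    """Assemble an NGSadmix argument list from the options listed above."""
    if k < 1:
        raise ValueError("K must be at least 1")
    args = ["NGSadmix",
            "-likes", likes,          # Beagle likelihood file
            "-K", str(k),             # number of ancestral populations
            "-outfiles", out_prefix,  # prefix for output files
            "-P", str(threads)]       # number of threads
    if seed is not None:
        args += ["-seed", str(seed)]  # seed for the initial guess in EM
    return args

# Hypothetical input file and output prefix, for illustration only.
cmd = ngsadmix_command("input.beagle.gz", k=3, out_prefix="run_K3", threads=4, seed=1)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` on a machine where NGSadmix is on the PATH.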
ngsPopGen 1.0 URL:

https://github.com/mfumagalli/ngsPopGen

Description:

Several tools to perform population genetic analyses from NGS data:

ngsFST - Quantify population genetic differentiation

ngsCovar - Population structure via PCA (principal components analysis)

ngs2dSFS - Estimate 2D-SFS from posterior probabilities of sample allele frequencies

ngsStat - Estimates number of segregating sites, expected average heterozygosity, and number of fixed differences and Dxy (if 2 populations provided).

Usage:

ngsFST

Input:

-postfiles: .sfs files with posterior probabilities of sample allele frequencies for each population [required]

-priorfile: 2D-SFS to be used as a prior; you can use ngs2DSFS with parameter -relative set to 1 [NULL]

-priorfiles: marginal spectra to be used as a prior; you can use optimSFS in ANGSD [NULL]

-outfile: name of the output file [required]

-nind: number of individuals for each population [required]

-nsites: total number of sites; in case you want to analyze a subset of sites this is the upper limit [required]

-verbose: level of verbosity [0]

-block_size: to be memory efficient, set this number as the number of sites you want to analyze at each chunk [0]

-firstbase: in case you want to analyze a subset of your sites this is the lower limit [1]

-isfold: boolean, is your data folded? [0]

-islog: boolean, are postfiles in -log (from -realSFS 1 only, required if 2D-SFS is given)? If you use sfstools then set to 1 [0]

ngs2dSFS

Input:

-postfiles: file with sample allele frequency posterior probabilities for each population

-outfile: name of output file

-nind: number of individuals per population

-nsites: number of sites, or upper limit in case of analyzing a subset

-block_size: memory efficiency, number of sites for each chunk

-offset: lower limit in case of analyzing a subset

-maxlike: if 1 compute the most likely joint allele frequency and sum across sites, if 0 it computes the sum of the products of likelihoods

-relative: boolean; if 1, numbers are relative frequencies from 0 to 1 that sum to 1; if 0, numbers are absolute counts of sites having a specific joint allele frequency

-isfold: is data folded?

-islog: is data in log values?

ngsStat

Input:

-npop: how many pops (1 or 2)

-postfiles: .sfs files with posterior probabilities of sample allele frequencies for each population (with or without running sfstools)

-outfile: name of the output file

-nind: number of individuals for each population

-nsites: total number of sites; in case you want to analyze a subset of sites this is the upper limit

-verbose: level of verbosity, if 0 suppress all messages

-block_size: to be memory efficient, set this number as the number of sites you want to analyze at each chunk

-firstbase: in case you want to analyze a subset of your sites this is the lower limit

-isfold: boolean, is your data folded or not?

-islog: boolean, are postfiles in log (from -realSFS 1 only, required if 2D-SFS is given)? If you use sfstools then set -islog 1

-iswin: if 1 then print the value computed for each non-overlapping window defined by block_size

Location:

/usr/local/ngsPopGen-1.0
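
To make the interplay of -firstbase, -nsites and -block_size more concrete, here is a small sketch of how a range of sites could be cut into chunks. It illustrates the idea only and is not taken from the ngsPopGen source; in particular, treating a block size of 0 as "one single chunk" is an assumption.

```python
def site_blocks(firstbase, nsites, block_size):
    """Yield (start, end) site ranges, 1-based and inclusive, of at most block_size sites."""
    if block_size <= 0:
        # Assumption: a block size of 0 means "process everything in one chunk".
        yield (firstbase, nsites)
        return
    start = firstbase
    while start <= nsites:
        end = min(start + block_size - 1, nsites)
        yield (start, end)
        start = end + 1

# Ten sites analysed four at a time.
blocks = list(site_blocks(firstbase=1, nsites=10, block_size=4))
print(blocks)
```

Smaller blocks keep fewer sites in memory at once, at the cost of more passes over the input.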
ngsutils 0.5.9

URL:

http://ngsutils.org/installation/

Description:

NGSUtils is made up of 50+ programs, mainly written in Python. These are separated into modules based on the type of file that is to be analyzed. There are four modules:

bamutils (BAM/SAM files)

bedutils (BED files)

fastqutils (FASTQ files, base- and color-space)

gtfutils (GTF gene models)
Each of these modules contains many commands for manipulating, filtering, converting, or analyzing these types of files.
Location:

/usr/local/ngsutils-0.5.9

usage:

bamutils + arguments

bedutils + arguments

fastqutils + arguments

gtfutils + arguments

novelseq 1.02 URL:

http://novelseq.sourceforge.net/Home

Description:

The NovelSeq framework is designed to detect novel sequence insertions using high throughput paired-end whole genome sequencing data.

Usage:

novel_cluster + arguments

Location:

/usr/local/novelseq-1.0.2/
nrpe 2.12 URL:

http://exchange.nagios.org/directory/Addons/Monitoring-Agents/NRPE--2D-Nagios-Remote-Plugin-Executor/details

Description:

NRPE allows you to remotely execute Nagios plugins on other Linux/Unix machines. This allows you to monitor remote machine metrics (disk usage, CPU load, etc.). NRPE can also communicate with some of the Windows agent addons, so you can execute scripts and check metrics on remote Windows machines as well.
OmegaPlus 2.3.0 URL:

http://pop-gen.eu/wordpress/software/omegaplus

Description:

A parallel tool for rapid and scalable detection of selective sweeps in whole-genome datasets. OmegaPlus is a scalable implementation of the omega statistic (Kim and Nielsen 2004) that detects selective sweeps in whole-genome data based on linkage disequilibrium patterns.

Location:

/usr/local/OmegaPlus_v2.3.0_Linux

usage:

OmegaPlus + arguments
openmpi 1.6 URL:

http://www.open-mpi.org/

Description:

Features implemented or in short-term development for Open MPI include:

Full MPI-3 standards conformance

Thread safety and concurrency

Dynamic process spawning

Network and process fault tolerance

Support network heterogeneity

Single library supports all networks

Run-time instrumentation

Many job schedulers supported

Many OS's supported (32 and 64 bit)

Production quality software

High performance on all platforms

Portable and maintainable

Tunable by installers and end-users

Component-based design, documented APIs

Active, responsive mailing list

Open source license based on the BSD license
organelle-assembler 2.07 URL:

http://pythonhosted.org/ORG.asm/install.html#downloading-and-installing-orgasm

Description:

The ORGanelle ASeMbler is run using the oa Unix command. It provides a set of sub-commands for the complete assembly of small genomes (organelle genomes) from a genome skimming sequence dataset.

Location:

/usr/local/organelle-assembler-2.04/ORG.asm-0.2.07/bin
orthomcl 2.09 Ortholog groups of protein sequences

http://orthomcl.org/common/downloads/software/v2.0/

Location:

/usr/local/orthomclSoftware-v2.0.9
R packages

URL: http://www.r-project.org/

Installed packages:

acepack, affy, affyio, amap, animation, annotate, AnnotationDbi, ape, aroma.affymetrix, aroma.apd, aroma.core, aroma.light, base, base64enc, Biobase, BiocGenerics, BiocInstaller, biomaRt, Biostrings, biovizBase, bitops, boot, BSgenome, caTools, class, cluster, clusterGeneration, clValid, codetools, colorspace, compiler, cummeRbund, datasets, DBI, DEoptimR, DESeq, DESeq2, deSolve, dichromat, digest, diptest, diversitree, DNAcopy, doMC, edgeR, evaluate, expm, extrafont, extrafontdb, fastcluster, fastmatch, flexmix, foreach, foreign, formatR, Formula, fpc, gdata, genefilter, geneplotter, GenomicFeatures, GenomicRanges, getopt, ggplot2, GO.db, gplots, graph, graphics, grDevices, grid, gsmoothr, gtable, gtools, Gviz, highr, Hmisc, igraph, IRanges, iterators, kernlab, KernSmooth, knitr, labeling, lattice, latticeExtra, limma, lme4, locfit, maps, markdown, MASS, Matrix, matrixStats, mclust, MEMSS, methods, mgcv, mime, minqa, mlmRev, mnormt, modeltools, msm, multicore, munsell, mvtnorm, nlme, nloptr, nnet, NOISeq, numDeriv, nws, optimx, optparse, org.Mm.eg.db, parallel, pasilla, pbkrtest, phangorn, phytools, PKPDmodels, plotrix, plyr, prabclus, preprocessCore, proto, PSCBS, quadprog, R.cache, RColorBrewer, Rcpp, RcppArmadillo, RcppEigen, RCurl, R.devices, readDepth, Repitools, reshape, reshape2, R.filesets, rgl, R.huge, Ringo, rlecuyer, R.methodsS3, robustbase, R.oo, rpart, R.rsp, Rsamtools, Rsolnp, RSQLite, rtracklayer, Rttf2pt1, R.utils, scales, scatterplot3d, snow, SparseM, spatial, splines, stats, stats4, stringr, subplex, survival, tcltk, testthat, tools, topGO, trimcluster, truncnorm, utils, vsn, XML, xtable, XVector, zlibbioc

geoR, sp, geoRglm, maptools, raster, SDMTools, fields, maps, survey, Zelig, classInt, akima, lme4, aod, INLA, IDPmisc, boa, HTSCluster

Installed with biocLite(): VariantAnnotation, IRanges, GenomicFeatures, snpStats

pals 1.0 URL:

http://www.drive5.com/pals/

Description:

PALS is public domain software that finds local alignments of long DNA sequences. PALS stands for Pairwise Aligner for Long Sequences. It was designed for use in our PILER package for genomic repeat analysis, but may also be useful in other applications.
parallel 20160222 URL: http://www.gnu.org/software/parallel/

Description:

GNU parallel is a shell tool for executing jobs in parallel using one or more computers. A job can be a single command or a small script that has to be run for each of the lines in the input. The typical input is a list of files, a list of hosts, a list of users, a list of URLs, or a list of tables. A job can also be a command that reads from a pipe. GNU parallel can then split the input and pipe it into commands in parallel.

Location:

/usr/local/parallel-20160222

Usage:

parallel + arguments
partitionfinder2 2.0

URL: https://github.com/brettc/partitionfinder

Description:


PartitionFinder 2 is a Python program for simultaneously choosing partitioning schemes and models of molecular evolution for phylogenetic analyses of DNA, protein, and morphological data. You can use PartitionFinder 2 before running a phylogenetic analysis to decide how to divide your sequence data into separate blocks, and to simultaneously perform model selection on each of those blocks.


Location:

/usr/local/partitionfinder-2.0/

Usage:

python2.7 /usr/local/partitionfinder-2.0/PartitionFinder.py

or

python2.7 /usr/local/partitionfinder-2.0/PartitionFinderProtein.py

or

python2.7 /usr/local/partitionfinder-2.0/PartitionFinderMorphology.py

PASTEClassifier 1.0

URL: https://urgi.versailles.inra.fr/Tools/PASTEClassifier

Description:

Detect TE features on consensus and classify them. Give some classification statistics.

Can rename headers with classification info and Wicker's code at the beginning.

Can reverse-complement consensus sequences that are detected on the reverse strand.

Location:

/usr/local/PASTEClassifier-1.0/bin

Usage:

PASTEClassifier.py + arguments

pasta 1.0

URL: 

https://github.com/smirarab/pasta

Description:

This is an implementation of the PASTA (Practical Alignment using SATé and TrAnsitivity) algorithm published in RECOMB-2014 and JCB.

Location:

/usr/local/PASTA-1.0

Usage:

python2.7 /usr/local/PASTA-1.0/pasta/run_pasta.py + options

pcadapt 1.6
2.2
URL:

http://membres-timc.imag.fr/Michael.Blum/PCAdapt.html

Description:

PCAdapt implements a genome scan for detecting genes involved in local adaptation. There are two versions of pcadapt. The first version is based on a Bayesian hierarchical factor model. The second, much faster version is a frequentist method based on Principal Component Analysis (PCA). We recommend using the faster PCA-based version.

Location:

/usr/local/PCAdaptPackage-1.6

Usage:

PCAdapt + arguments

ped2pcadapt + arguments

vcf2pcadapt + arguments
perl 5.10.1
5.16.2
5.14.2
5.22.0
URL:

https://www.perl.org/

Description:

Perl 5 is a highly capable, feature-rich programming language with over 23 years of development.

Location:

/opt/perl-5.22.0/bin/
PHASE 2.2.1 URL:

http://stephenslab.uchicago.edu/phase/download.html

Description:

A program for reconstructing haplotypes from population data

Location:

/usr/local/PHASE-2.1.1

Usage:

PHASE + arguments
phpmyadmin 4.1.9 Web utility for managing databases
phppgadmin 5.1.1 Web utility for managing PostgreSQL databases
phrap 0.990329 URL:

http://www.phrap.org/phredphrapconsed.html

Description:

phrap is a program for assembling shotgun DNA sequence data. Among other features, it allows use of the entire read and not just the trimmed high quality part, it uses a combination of user-supplied and internally computed data quality information to improve assembly accuracy in the presence of repeats, it constructs the contig sequence as a mosaic of the highest quality read segments rather than a consensus, it provides extensive assembly information to assist in trouble-shooting assembly problems, and it handles large datasets. See the phrap/cross_match/swat documentation and phrap documentation for additional information.
phredPhrap 1.0 URL:

http://www.phrap.org/phredphrapconsed.html

Description:

The phred software reads DNA sequencing trace files, calls bases, and assigns a quality value to each called base.

phrap is a program for assembling shotgun DNA sequence data. Among other features, it allows use of the entire read and not just the trimmed high quality part, it uses a combination of user-supplied and internally computed data quality information to improve assembly accuracy in the presence of repeats, it constructs the contig sequence as a mosaic of the highest quality read segments rather than a consensus, it provides extensive assembly information to assist in trouble-shooting assembly problems, and it handles large datasets
phred 1.0 URL:

http://www.phrap.org/phredphrapconsed.html

Description:

The phred software reads DNA sequencing trace files, calls bases, and assigns a quality value to each called base.
phylip 3.69 URL:

http://evolution.genetics.washington.edu/phylip.html

Description:

PHYLIP is a free package of programs for inferring phylogenies.
PhyloBayes 4.1c

URL:

http://megasun.bch.umontreal.ca/People/lartillot/www/index.htm

Description:

phylogenetic reconstruction using infinite mixtures

 

PhyloBayes (Lartillot et al, 2009) is a Bayesian Monte Carlo Markov Chain (MCMC) sampler for phylogenetic reconstruction. Compared to other phylogenetic MCMC samplers, the main distinguishing feature of PhyloBayes is the underlying probabilistic model, CAT (Lartillot and Philippe, 2004). CAT is an infinite mixture model accounting for site-specific amino-acid or nucleotide preferences. It is well suited to phylogenomic studies using large multigene alignments.

 

Location:

/usr/local/phylobayes-4.1c/data

PhyloBayes-MPI:

Location:

/usr/local/phylobayes-mpi-1.7b/data

PhyML 3.0 URL:

http://www.atgc-montpellier.fr/phyml/binaries.php

Description:

"New Algorithms and Methods to Estimate Maximum-Likelihood Phylogenies: Assessing the Performance of PhyML 3.0."
phyutility 2.2.6

URL:

https://code.google.com/archive/p/phyutility/

Description:

Phyutility (fyoo-til-i-te) is a command line program that performs simple analyses or modifications on both trees and data matrices

 

Location:

/usr/local/phyutility-2.2.6/

Usage:

Change to the directory /usr/local/phyutility-2.2.6/ and run the command phyutility + options

picard-tools 1.83
1.115
2.5.0
URL:

http://sourceforge.net/projects/picard/files/picard-tools

Description:

Picard comprises Java-based command-line utilities that manipulate SAM files, and a Java API (SAM-JDK) for creating new programs that read and write SAM files. Both SAM text format and SAM binary (BAM) format are supported
piler 1.0 URL:

http://www.drive5.com/piler/

Description:

PILER is public domain software for analyzing repetitive DNA found in genome sequences.

Dependencies:

muscle and pals
pindel 0.2.4 URL:

http://gmt.genome.wustl.edu/packages/pindel/index.html

Description:

Pindel can detect breakpoints of large deletions, medium-sized insertions, inversions, tandem duplications and other structural variants at single-base resolution from next-gen sequence data. It uses a pattern growth approach to identify the breakpoints of these variants from paired-end short reads.

Location:

/usr/local/pindel-0.2.4

usage:

pindel + arguments
plink 1.7

URL:

http://pngu.mgh.harvard.edu/~purcell/plink/

Description:
PLINK is a free, open-source whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner.

 

The focus of PLINK is purely on analysis of genotype/phenotype data, so there is no support for steps prior to this (e.g. study design and planning, generating genotype or CNV calls from raw data). Through integration with gPLINK and Haploview, there is some support for the subsequent visualization, annotation and storage of results.

 

PLINK (one syllable) is being developed by Shaun Purcell at the Center for Human Genetic Research (CHGR), Massachusetts General Hospital (MGH), and the Broad Institute of Harvard & MIT, with the support of others.  

Location:

/usr/local/plink-1.07

Usage:

plink + arguments

Pool-HMM 1.4.2
1.4.3
URL:

https://qgsp.jouy.inra.fr/index.php?option=com_content&view=article&id=56&Itemid=63

Description:

This program aims at estimating allele frequencies and detecting selective sweeps, using NGS data from a sample of pooled individuals from the same population. It implements the derivations of Boitard et al (2012).

The estimation of allele frequencies is based on a probabilistic model, which accounts for differences of coverage and base quality among genomic positions. Using this probabilistic model, the program can estimate the allele frequency spectrum in any genomic region specified by the user. The allele frequency spectrum can also be estimated for any type of annotated feature (e.g. introns), using the script filter-pileup-by-feature.py.

The detection of selective sweeps is based on a Hidden Markov Model (HMM). In this model, each polymorphic site on the genome is assumed to have a hidden state, which can take one of three values: "Neutral", "Intermediate" and "Selection". These hidden states are inferred from the observed data, and the sites whose inferred hidden state is "Selection" are the sweep candidates.

Location:

/usr/local/Pool-HMM-1.4.3
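
To illustrate how hidden states such as "Neutral", "Intermediate" and "Selection" can be inferred along a sequence of sites, here is a generic Viterbi decoding sketch for a three-state HMM. It is not Pool-HMM's actual model: the observation symbols ("flat", "skewed") and every probability below are invented toy values, and the decoding method used by Pool-HMM itself is not stated in this description.

```python
import math

STATES = ["Neutral", "Intermediate", "Selection"]

def viterbi(observations, start_p, trans_p, emit_p):
    """Return the most likely hidden-state path for a sequence of observations."""
    # best[s] = log-probability of the best path ending in state s
    best = {s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]])
            for s in STATES}
    backpointers = []
    for obs in observations[1:]:
        prev_best, best, pointers = best, {}, {}
        for s in STATES:
            # Pick the predecessor state that maximises the path probability.
            prev = max(STATES, key=lambda p: prev_best[p] + math.log(trans_p[p][s]))
            best[s] = (prev_best[prev] + math.log(trans_p[prev][s])
                       + math.log(emit_p[s][obs]))
            pointers[s] = prev
        backpointers.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: best[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return path[::-1]

# Toy parameters (assumptions for illustration only).
start_p = {"Neutral": 0.8, "Intermediate": 0.1, "Selection": 0.1}
trans_p = {s: {t: (0.9 if s == t else 0.05) for t in STATES} for s in STATES}
emit_p = {"Neutral": {"flat": 0.8, "skewed": 0.2},
          "Intermediate": {"flat": 0.5, "skewed": 0.5},
          "Selection": {"flat": 0.1, "skewed": 0.9}}

sites = ["flat"] * 5 + ["skewed"] * 10 + ["flat"] * 5
for site, state in zip(sites, viterbi(sites, start_p, trans_p, emit_p)):
    print(site, state)
```

Working in log space avoids numerical underflow when the number of sites is large.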
postgresql-server 9.3.1 PostgreSQL database server
PRANK 150803 URL:

http://wasabiapp.org/software/prank/

Description:

PRANK is a probabilistic multiple alignment program for DNA, codon and amino-acid sequences. It’s based on a novel algorithm that treats insertions correctly and avoids over-estimation of the number of deletion events. In addition, PRANK borrows ideas from maximum likelihood methods used in phylogenetics and correctly takes into account the evolutionary distances between sequences. Lastly, PRANK allows for defining a potential structure for sequences to be aligned and then, simultaneously with the alignment, predicts the locations of structural units in the sequences.

Usage:

prank + arguments

Location:

/usr/local/prank-150803/
primer3 2.3.6
2.3.7
URL:

http://sourceforge.net/projects/primer3/files/

Description:

Design PCR primers from DNA sequence. Widely used (190k Google hits for "primer3"). From mispriming libraries to sequence quality data to the generation of internal oligos, primer3 does it. C & Perl. Developers/testers/documenters needed.
priorgen 1.0 Installation date: 01/08/14

Description: A cookbook to study Genome Wide Heterogeneity in introgression rates

URL: http://www.abcgwh.sitew.ch/Utensils.J.htm#Utensils.J

Location:

/usr/local/ABC_tools-1.0/priorgen

usage:

priorgen + arguments
psmc 1.0

URL: https://github.com/lh3/psmc

Description: This software package infers population size history from a diploid sequence using the Pairwise Sequentially Markovian Coalescent (PSMC) model. The detailed model is described in file `psmc.tex'.

Location:

/usr/local/psmc-1.0

Usage:

psmc + arguments

 
ps_scan 1.67 URL:

http://ebi.edu.au/ftp/databases/prosite/ps_scan/

Description:

Search multiple protein sequences for functional amino acid patterns
PyQt4 4.6.2.8 URL:

http://pyqt.sourceforge.net/Docs/PyQt4/installation.html

Description:

PyQt4 is a set of tools for integrating Python with the Qt integrated development environment (IDE). The package includes widget sets, improved layout management, GUI (Graphical User Interface) integration for applications, OpenGL and SVG support, integration with and support for the Qt Linguist translation tool, PDF export, etc.
PyQt4-devel 4.6.2-8  
python 2.7.6
2.6.6
3.3.3
 
qiime 1.8.0 QIIME consists of native python code and additionally wraps many external applications. This gives the user flexibility to easily build their own analysis pipelines, making use of popular microbial community analysis tools. QIIME handles the processing of input and output of these applications, so the user can spend time analyzing their data rather than parsing, writing, and converting file formats.
quake 0.3 URL:

http://www.cbcb.umd.edu/software/quake/index.html

Description:

Quake is a package to correct substitution sequencing errors in experiments with deep coverage (e.g. >15X), specifically intended for Illumina sequencing reads. Quake adopts the k-mer error correction framework, first introduced by the EULER genome assembly package. Unlike EULER and similar programs, Quake utilizes a robust mixture model of erroneous and genuine k-mer distributions to determine where errors are located. Then Quake uses read quality values and learns the nucleotide to nucleotide error rates to determine what types of errors are most likely. This leads to more corrections and greater accuracy, especially with respect to avoiding mis-corrections, which create false sequence dissimilar to anything in the original genome sequence from which the read was taken.
qualimap 2.0
2.0.2
2.2
URL: http://qualimap.bioinfo.cipf.es/

Qualimap is a platform-independent application written in Java and R that provides both a Graphical User Interface (GUI) and a command-line interface to facilitate the quality control of alignment sequencing data. In short, Qualimap:

Examines sequencing alignment data according to the features of the mapped reads and their genomic properties

Provides an overall view of the data that helps to detect biases in the sequencing and/or mapping of the data and eases decision-making for further analysis.
R 3.0.2
2.15.1
3.1.2
3.2.1
3.2.2
3.3.1
URL:

http://www.r-project.org/

Description:

R is a free software environment for statistical computing and graphics.
RAxML

7.3.5

8.2.9

URL:

https://github.com/stamatak/standard-RAxML

Description:

Our standard tool for Maximum-likelihood based phylogenetic inference.

Location:

/usr/local/RAxML-8.2.9/

Usage:

PTHREADS version:

raxmlHPC-PTHREADS + arguments

SSE3 version:

raxmlHPC-SSE3 + arguments

Ray 2.3.1
2.3.0
Ray is parallel software that computes de novo genome assemblies with next-generation sequencing data.

Ray is written in C++ and can run in parallel on numerous interconnected computers using the message-passing interface (MPI) standard.

Ray is maintained by Sébastien Boisvert, a PhD student supervised by Jacques Corbeil and François Laviolette at Université Laval, in Québec, Canada.
RDPTools
2.2.0

URL:

https://github.com/rdpstaff/RDPTools

Description:
This project includes the core modules from the RDP (Classifier, Clustering, SequenceMatch, ProbeMatch, InitialProcessing, FrameBot, ReadSeq) and all their dependencies

Location:

/usr/local/RDPTools-2.0.2

Usage:

The jar files are run with the command

java -jar /usr/local/RDPTools-2.0.2/file.jar

Example:

java -jar /usr/local/RDPTools-2.0.2/classifier.jar

reads2snp 1.0 http://kimura.univ-montp2.fr/PopPhyl/index.php?section=tools

Linking molecular evolution to species biology and ecology.

Location:

/usr/local/reads2snp-1.0
ReAS 2.02 URL:

ftp://ftp.genomics.org.cn/pub/ReAS/software/

Description:

ReAS – Software to recover ancestral sequences for transposable elements using unassembled reads from a whole genome shotgun sequencing.

Location:

/usr/local/ReAS-2.02
recon 1.05
1.07
1.08
URL:

http://selab.janelia.org/recon.html

Description:

Proper identification of repetitive sequences is an essential step in genome analysis. The RECON package performs de novo identification and classification of repeat sequence families from genomic sequences. The underlying algorithm is based on extensions to the usual approach of single linkage clustering of local pairwise alignments between genomic sequences. Specifically, our extensions use multiple alignment information to define the boundaries of individual copies of the repeats and to distinguish homologous but distinct repeat element families. RECON should be useful for first-pass automatic classification of repeats in newly sequenced genomes.

Location:

/usr/local/RECON-1.08
REPdenovo 1.0.3 URL:

https://github.com/Reedwarbler/REPdenovo

Description:

REPdenovo is designed for constructing repeats directly from sequence reads. It is based on the idea of frequent k-mer assembly. REPdenovo provides many functionalities, and can generate much longer repeats than existing tools. The overall pipeline is shown in the manual file. REPdenovo supports the following main functionalities.

Assembly. This step performs k-mer counting. Then we find frequent k-mers whose frequencies exceed a certain threshold. We then assemble these frequent k-mers into consensus repeats (in the form of contigs). Then we merge the constructed contigs into more complete ones.

Scaffolding. We use paired-end reads to connect repeat contigs into scaffolds, and also provide the average coverage (which indicates the copy number) for each constructed repeat.
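The k-mer counting and thresholding idea of the assembly step can be sketched in a few lines of Python. This is only an illustration of the concept, not REPdenovo's own code; the function name frequent_kmers and the parameters k and min_count are hypothetical.

```python
from collections import Counter


def frequent_kmers(reads, k, min_count):
    """Count every k-mer in the reads and keep those seen at least min_count times."""
    counts = Counter()
    for read in reads:
        # Slide a window of length k over the read.
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return {kmer: n for kmer, n in counts.items() if n >= min_count}


# Toy reads (made up for the example).
reads = ["ACGTACGT", "CGTACGTA", "TTTTACGT"]
result = frequent_kmers(reads, k=4, min_count=3)
print(result)
```

A real pipeline would then assemble the retained k-mers into contigs; that step is not shown here.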

Location:

/usr/local/REPdenovo-1.0.3

Usage:

Python 2.7 + Python scripts.

ContigsMerger + arguments

TERefiner_1 + arguments
RepeatMasker 6/20/13 URL:

http://www.repeatmasker.org/RMDownload.html

Description:

RepeatMasker is a program that screens DNA sequences for interspersed repeats and low complexity DNA sequences. The output of the program is a detailed annotation of the repeats that are present in the query sequence as well as a modified version of the query sequence in which all the annotated repeats have been masked (default: replaced by Ns). Currently over 56% of human genomic sequence is identified and masked by the program. Sequence comparisons in RepeatMasker are performed by one of several popular search engines including nhmmer, cross_match, ABBlast/WUBlast, RMBlast and Decypher. RepeatMasker makes use of curated libraries of repeats and currently supports Dfam ( profile HMM library derived from Repbase sequences ) and Repbase, a service of the Genetic Information Research Institute.
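To make the masking step concrete, the Python sketch below replaces annotated repeat intervals with Ns. It is not RepeatMasker code; the function name mask_intervals and the 0-based, half-open interval convention are assumptions made for the example.

```python
def mask_intervals(sequence, intervals):
    """Return a copy of sequence with every (start, end) interval replaced by Ns.

    Intervals are assumed to be 0-based and half-open: start is included,
    end is excluded. Positions outside the sequence are ignored.
    """
    chars = list(sequence)
    for start, end in intervals:
        for i in range(max(0, start), min(end, len(chars))):
            chars[i] = "N"
    return "".join(chars)


masked = mask_intervals("ACGTACGTAC", [(2, 5)])
print(masked)
```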
RepeatModeler 1.0.10

URL:

http://www.repeatmasker.org/RepeatModeler/

Description:

RepeatModeler is a de-novo repeat family identification and modeling package. At the heart of RepeatModeler are two de-novo repeat finding programs ( RECON and RepeatScout ) which employ complementary computational methods for identifying repeat element boundaries and family relationships from sequence data. RepeatModeler assists in automating the runs of RECON and RepeatScout given a genomic database and uses the output to build, refine and classify consensus models of putative interspersed repeats.

Location:

/usr/local/RepeatModeler-1.0.10

RepeatScout 1.0
1.0.5
URL:

http://bix.ucsd.edu/repeatscout/

Description:

RepeatScout is a tool to discover repetitive substrings in DNA.
REPET 2.2
2.5
URL:

https://urgi.versailles.inra.fr/Tools/REPET

Description:

The REPET package (Flutre et al., 2011) integrates bioinformatics programs in order to tackle biological issues at the genomic scale.

Location:

/usr/local/REPET-2.5
revbayes 1.0.0 beta
1.0.1
URL:

https://github.com/revbayes/

Description:

RevBayes -- Bayesian phylogenetic inference using probabilistic graphical models and an interpreted language.

Location:

/usr/local/revbayes-1.0.1

Usage:

rb + arguments

or rb-mpi + arguments
rrdtool 1.4.5
1.6.0
URL:

http://oss.oetiker.ch/rrdtool/

Description:

RRDtool is the OpenSource industry standard, high performance data logging and graphing system for time series data. RRDtool can be easily integrated in shell scripts, perl, python, ruby, lua or tcl applications.

Location:

/usr/local/rrdtools-1.6.0
samstat 1.5 Installed on 29/10/14

URL:

http://samstat.sourceforge.net/

Description:

Displaying sequence statistics for next generation sequencing

Works with large fasta, fastq and SAM/BAM files.

Location:

/usr/local/samstat-1.5
samtools 0.1.18
0.1.18
1.1
1.3.1
URL:

http://samtools.sourceforge.net/

description:

SAM (Sequence Alignment/Map) format is a generic format for storing large nucleotide sequence alignments. SAM aims to be a format that:

Is flexible enough to store all the alignment information generated by various alignment programs;

Is simple enough to be easily generated by alignment programs or converted from existing alignment formats;

Is compact in file size;

Allows most operations on the alignment to work on a stream without loading the whole alignment into memory;

Allows the file to be indexed by genomic position to efficiently retrieve all reads aligning to a locus.

SAM Tools provide various utilities for manipulating alignments in the SAM format, including sorting, merging, indexing and generating alignments in a per-position format.
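As a rough illustration of the stream-oriented design described above, the following Python sketch reads SAM text one line at a time and splits each alignment line into its 11 mandatory tab-separated fields. It is not part of SAM Tools; the names SamRecord, parse_sam_line and iter_records are hypothetical, and optional tag fields are simply ignored.

```python
from typing import Iterable, Iterator, NamedTuple


class SamRecord(NamedTuple):
    qname: str
    flag: int
    rname: str
    pos: int
    mapq: int
    cigar: str
    rnext: str
    pnext: int
    tlen: int
    seq: str
    qual: str


def parse_sam_line(line: str) -> SamRecord:
    """Split one alignment line into the 11 mandatory SAM fields."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("a SAM alignment line needs at least 11 fields")
    return SamRecord(
        fields[0], int(fields[1]), fields[2], int(fields[3]), int(fields[4]),
        fields[5], fields[6], int(fields[7]), int(fields[8]), fields[9], fields[10],
    )


def iter_records(lines: Iterable[str]) -> Iterator[SamRecord]:
    """Yield records one by one, skipping header lines that start with '@'."""
    for line in lines:
        if line.startswith("@") or not line.strip():
            continue
        yield parse_sam_line(line)


# A made-up header line and alignment line.
example = ["@HD\tVN:1.0\n", "read1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII\n"]
records = list(iter_records(example))
print(records[0].rname, records[0].pos)
```

Because iter_records is a generator, it can be fed an open file object and never holds more than one line in memory.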
seaview 4.3.2 URL:

http://pbil.univ-lyon1.fr/software/seaview.html

Description:

SeaView is a multiplatform, graphical user interface for multiple sequence alignment and molecular phylogeny.

Location:

/usr/local/seaview-4.3.2
seqclean 1.0 URL:

http://sourceforge.net/projects/seqclean/files/

Description:

SeqClean is a tool for validation and trimming of DNA sequences from a flat file database (FASTA format). SeqClean was designed primarily for "cleaning" of EST databases, when specific vector and splice site data are not available, or when screening for various contaminating sequences is desired. The program works by processing the input sequence file and filtering its content according to a few criteria:

* percentage of undetermined bases

* polyA tail removal

* overall low complexity analysis

* short terminal matches with various sequences used during the sequencing process (vectors, adapters)

* strong matches with other contaminants or unwanted sequences (mitochondrial, ribosomal, bacterial, other species than the target organism etc.)
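Two of the criteria listed above, the percentage of undetermined bases and polyA tail removal, can be sketched in Python as follows. This is not SeqClean's implementation; the function names and the min_tail threshold are hypothetical and chosen only for illustration.

```python
def percent_undetermined(seq):
    """Percentage of bases in seq that are N (undetermined)."""
    if not seq:
        return 0.0
    return 100.0 * seq.upper().count("N") / len(seq)


def trim_polya_tail(seq, min_tail=10):
    """Remove a trailing run of A's when it is at least min_tail bases long."""
    stripped = seq.rstrip("Aa")
    tail_length = len(seq) - len(stripped)
    return stripped if tail_length >= min_tail else seq


print(percent_undetermined("ACNNGT"))
print(trim_polya_tail("ACGT" + "A" * 12))
```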
sequin 12.30 URL:

http://www.ncbi.nlm.nih.gov/Sequin/

Description:

Sequin is a stand-alone software tool developed by the NCBI for submitting and updating entries to the GenBank sequence database. It is capable of handling simple submissions that contain a single short mRNA sequence, and complex submissions containing long sequences, multiple annotations, gapped sequences, or phylogenetic and population studies. A single Sequin file should contain fewer than 10,000 sequences for maximum performance. Larger submissions should be made with tbl2asn.
sff2fastq 1.0 URL:

https://github.com/indraniel/sff2fastq

Description:

The program sff2fastq extracts read information from an SFF file, produced by the 454 genome sequencer, and outputs the sequences and quality scores in FASTQ format.

Location:

/usr/local/sff2fastq-1.0/sff2fastq-master/

usage:

sff2fastq + arguments
ShortStack 3.3 URL:

https://github.com/MikeAxtell/ShortStack/releases/

Description:

Alignment of small RNA-seq data and annotation of small RNA-producing genes.

Location:

/usr/local/ShortStack-3.3

Commands:

ShortStack + arguments
shrimp 2.2.3 URL:

http://compbio.cs.toronto.edu/shrimp/

Description:

SHRiMP is a software package for aligning genomic reads against a target genome. It was primarily developed with the multitudinous short reads of next generation sequencing machines in mind, as well as Applied Biosystems' colourspace genomic representation.

Location:

/usr/local/SHRIMP-2.2.3/bin
SIBsim4 0.20 URL:

http://sibsim4.sourceforge.net/

Description:

The SIBsim4 project is based on sim4, which is a program designed to align an expressed DNA sequence with a genomic sequence, allowing for introns.
silix 1.2.11

URL:
http://lbbe.univ-lyon1.fr/Download.htm


Description:

The software package SiLiX implements an ultra-efficient algorithm for the clustering of homologous sequences, based on single transitive links (single linkage) with alignment coverage constraints.
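Single-linkage clustering with a coverage constraint can be illustrated with a small union-find sketch in Python. This is a generic textbook approach and not SiLiX's actual algorithm; the function name, the (a, b, coverage) link format and the min_coverage threshold are assumptions made for the example.

```python
def single_linkage_clusters(items, links, min_coverage):
    """Group items connected by any chain of links whose coverage passes the threshold."""
    parent = {item: item for item in items}

    def find(x):
        # Follow parent pointers to the cluster root, shortening the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, coverage in links:
        if coverage < min_coverage:
            continue  # The alignment does not cover enough of the sequences.
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    clusters = {}
    for item in items:
        clusters.setdefault(find(item), []).append(item)
    return sorted(sorted(members) for members in clusters.values())


# Made-up sequence identifiers and pairwise alignment coverages.
sequences = ["A", "B", "C", "D", "E"]
pairwise = [("A", "B", 0.90), ("B", "C", 0.85), ("C", "D", 0.50), ("D", "E", 0.95)]
families = single_linkage_clusters(sequences, pairwise, min_coverage=0.80)
print(families)
```

A and C end up in the same family even though they are not directly linked, because single linkage is transitive through B.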

 

Location:

/usr/local/silix-1.2.11

Usage:

silix + arguments

Site_frequency_spectra 1.0
2.5
Linking molecular evolution to species biology and ecology.

Location:

/usr/local/site-fresquency-spectra-1.0
smrtanalysis 2.0.0
2.0.1
URL:

http://www.pacb.com/devnet/

Description:

PacBio's open source software suite for single molecule, real-time sequencing.
snap 2013-11-29 URL:

http://korflab.ucdavis.edu/software.html

Description:

SNAP (Semi-HMM-based Nucleic Acid Parser) is a gene prediction tool.

Location:

/usr/local/snap-2013-11-29/

usage:

snap + arguments
snpEff

3.1
4.2

4.3

URL:

http://snpeff.sourceforge.net/

Description:

Genetic variant annotation and effect prediction toolbox. It annotates and predicts the effects of variants on genes (such as amino acid changes).

Location:

/usr/local/snpEff-4.3/

usage:

java -jar /usr/local/snpEff-4.3 + arguments

soap 2.21 URL:

http://soap.genomics.org.cn/down/

Description:

SOAP has evolved from a single alignment tool into a tool package that provides a full solution for next-generation sequencing data analysis. Currently, it consists of a new alignment tool (SOAPaligner/soap2), a re-sequencing consensus sequence builder (SOAPsnp), an indel finder (SOAPindel), a structural variation scanner (SOAPsv) and a de novo short-read assembler (SOAPdenovo). A GPU-accelerated alignment tool (SOAP3/GPU) is being implemented.

Location:

/usr/local/soap-2.21
SOAPdenovo 2.04

URL:

https://github.com/BGIshenzhen/SOAPdenovo 

Description:
SOAPdenovo is a novel short-read assembly method that can build a de novo draft assembly for human-sized genomes.


Location:

/usr/local/SOAPdenovo-2.04

Usage:

SOAPdenovo-63mer + arguments
or
SOAPdenovo-127mer + arguments 

 

sortmerna 1.9
1-99-beta
2.1
URL: http://bioinfo.lifl.fr/RNA/sortmerna/index.php

description:

SortMeRNA is software designed to rapidly filter ribosomal RNA fragments from metatranscriptomic data produced by next-generation sequencers. It is capable of handling large RNA databases and sorting out all fragments matching the database with high accuracy and specificity.

Location:

/usr/local/sortmerna-2.1-linux-64

usage:

sortmerna + arguments
sowhat 0.22 URL:

https://github.com/josephryan/sowhat

Description:

sowhat automates the SOWH phylogenetic topology test (described by the manuscripts listed in FURTHER READING below). It works on amino acid, nucleotide, and binary character state datasets. Partitions (including codon position partitioning) can be specified.

A manuscript describing sowhat and the SOWH test is available at bioRxiv: http://biorxiv.org/content/early/2014/05/19/005264

sowhat includes several features that provide flexibility and aid in the interpretation and assessment of SOWH test results, including:

The test can be run with the adjustment suggested by Susko 2014 (http://dx.doi.org/10.1093/molbev/msu039), which is the default behavior, or as originally described.

Gaps are propagated from the original dataset to the simulated dataset.

Likelihood searches can be performed with RAxML or GARLI.

Bootstrap replicate datasets can be simulated with Seq-Gen or PhyloBayes.

Different models can be used for simulation and inference.

Confidence intervals are estimated for the p-value, which helps the investigator assess if a sufficient number of bootstrap replicates have been sampled.

An option is available to account for variability in the maximum likelihood searches by estimating the test statistic and parameters for each new alignment.

Usage:

sowhat + argument(s)

Location:

/usr/local/sowhat-0.22/bin/
SPAdes 3.11.1 URL:

http://cab.spbu.ru/software/spades/

Description:

SPAdes – St. Petersburg genome assembler – is an assembly toolkit containing various assembly pipelines.

Usage:

spades.py + argument(s)

Location:

/usr/local/SPAdes-3.11.1
splatche2 2.01 URL:

http://www.splatche.com/

Description:

SPLATCHE2 is a program to simulate the demography of populations and the resulting molecular diversity for a wide range of evolutionary scenarios. The spatially-explicit simulation framework can account for environmental heterogeneity and fluctuations, and it can manage multiple population sources. A coalescent-based approach is used to generate genetic markers mostly used in population genetics studies (DNA sequences, SNPs, STRs, or RFLPs). Various combinations of independent, fully or partially linked genetic markers can be produced under a recombination model based on the ancestral recombination graph. Competition between two populations (or species) can also be simulated with user-defined levels of admixture between the two populations. SPLATCHE2 may be used to generate the expected genetic diversity under complex demographic scenarios and can thus serve to test null hypotheses. For model parameter estimation, SPLATCHE2 can easily be integrated into an Approximate Bayesian Computation (ABC) framework.

Location:

/usr/local/splatche-2.0.1

Usage:

splatche2 + arguments
sprites 0.3.0 URL:

https://github.com/zhangzhen/sprites

Description:

Sprites is an SV caller that specializes in detecting deletions from low-coverage sequencing data. It works by identifying split reads from alignments based on soft-clipping information. By re-aligning a split read to one of its target sequences derived from paired-end reads that span it, a deletion is predicted and breakpoint ends are pinpointed with base-pair resolution. Sprites uses alignments produced by BWA. It can also use those produced by other read aligners that support 5'- or 3'-end soft-clipping, like Bowtie2. It can also be extended to detect other types of SV.

Location:

/usr/local/sprites-0.3.0

usage:

sprites + arguments
ssaha2

2.5.3

 

URL:

http://www.sanger.ac.uk/science/tools/ssaha2-0

Description:
SSAHA2 (Sequence Search and Alignment by Hashing Algorithm) is a pairwise sequence alignment program designed for the efficient mapping of sequencing reads onto genomic reference sequences.


Location:

/usr/local/ssaha-2.5.3/

Usage:

ssaha + arguments

sra-toolkit


2.8.1

2.7.0

URL:

https://github.com/ncbi/sra-tools

Description:

The SRA Toolkit and SDK from NCBI is a collection of tools and libraries for using data in the INSDC Sequence Read Archives.

Location:

/usr/local/sratoolkit-2.8.1

Usage:

Launch one of the binaries contained in /usr/local/sratoolkit-2.8.1/bin + arguments

 

stacks

1.29
1.42

1.43

URL:

http://creskolab.uoregon.edu/stacks/

Description:

Stacks is a software pipeline for building loci from short-read sequences, such as those generated on the Illumina platform. Stacks was developed to work with restriction enzyme-based data, such as RAD-seq, for the purpose of building genetic maps and conducting population genomics and phylogeography.

Location:

/usr/local/stacks-1.43/
stampy 1.0.31

URL:

http://www.well.ox.ac.uk/stampy

Description:

Stampy is a package for the mapping of short reads from Illumina sequencing machines onto a reference genome. It's recommended for most workflows, including those for genomic resequencing, RNA-Seq and ChIP-seq.
Stampy excels in the mapping of reads that contain sequence variation relative to the reference, in particular those containing insertions or deletions. For instance, it can map reads from a highly divergent species to a reference genome. Stampy achieves high sensitivity and speed by using a fast hashing algorithm and a detailed statistical model. Stampy has the following features:

Maps single, paired-end and mate pair Illumina reads to a reference genome

Fast: about 20 Gbase per hour in hybrid mode (using BWA)

Low memory footprint: 2.7 Gb shared memory for a 3Gbase genome

High sensitivity for indels and divergent reads, up to 10-15%

Low mapping bias for reads with SNPs

Well calibrated mapping quality scores

Input: Fastq and Fasta; gzipped or plain

Output: SAM, Maq's map file

Optionally calculates per-base alignment posteriors

Optionally processes part of the input

Handles reads of up to 4500 bases

Location:

/usr/local/stampy-1.0.31

Usage:

 

python2.7 stampy.py + arguments

STAR 2.4.1.c
2.5
URL:

https://github.com/alexdobin/STAR

Description:

STAR is an RNA-seq aligner.

Usage:

The basic options to generate genome indices are as follows:

--runThreadN NumberOfThreads

--runMode genomeGenerate

--genomeDir /path/to/genomeDir

--genomeFastaFiles /path/to/genome/fasta1 /path/to/genome/fasta2 ...

--sjdbGTFfile /path/to/annotations.gtf

--sjdbOverhang ReadLength-1

--runThreadN defines the number of threads to be used for genome generation; it has to be set to the number of available cores on the server node.

--runMode genomeGenerate directs STAR to run the genome indices generation job.

--genomeDir specifies the path to the directory (henceforth called the "genome directory") where the genome indices are stored. This directory has to be created (with mkdir) before the STAR run and needs to have write permissions. The file system needs to have at least 100GB of disk space available for a typical mammalian genome. It is recommended to remove all files from the genome directory before running the genome generation step. This directory path will have to be supplied at the mapping step to identify the reference genome.

--genomeFastaFiles specifies one or more FASTA files with the genome reference sequences. Multiple reference sequences (henceforth called chromosomes) are allowed for each FASTA file. You can rename the chromosome names in chrName.txt, keeping the order of the chromosomes in the file: the names from this file will be used in all output alignment files (such as .sam). Tabs are not allowed in chromosome names, and spaces are not recommended.

--sjdbGTFfile specifies the path to the file with annotated transcripts in the standard GTF format. STAR will extract splice junctions from this file and use them to greatly improve the accuracy of the mapping. While this is optional, and STAR can be run without annotations, using annotations is highly recommended whenever they are available. Starting from 2.4.1a, the annotations can also be included on the fly at the mapping step.

--sjdbOverhang specifies the length of the genomic sequence around the annotated junction to be used in constructing the splice junctions database. Ideally, this length should be equal to ReadLength-1, where ReadLength is the length of the reads. For instance, for Illumina 2x100b paired-end reads, the ideal value is 100-1=99. In case of reads of varying length, the ideal value is max(ReadLength)-1. In most cases, the default value of 100 will work as well as the ideal value.
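The options above can be assembled into a command line programmatically. The Python sketch below only builds and prints the argument list; it does not run STAR. The helper name star_genome_generate_args, the thread count and the read lengths are assumptions for illustration, and the paths are the placeholders used in the option list above.

```python
import shlex


def star_genome_generate_args(genome_dir, fasta_files, gtf_file, read_lengths, threads):
    """Build the argument list for a STAR genome index generation run."""
    # Ideal overhang as described above: max(ReadLength) - 1.
    sjdb_overhang = max(read_lengths) - 1
    return [
        "STAR",
        "--runThreadN", str(threads),
        "--runMode", "genomeGenerate",
        "--genomeDir", genome_dir,
        "--genomeFastaFiles", *fasta_files,
        "--sjdbGTFfile", gtf_file,
        "--sjdbOverhang", str(sjdb_overhang),
    ]


args = star_genome_generate_args(
    genome_dir="/path/to/genomeDir",
    fasta_files=["/path/to/genome/fasta1", "/path/to/genome/fasta2"],
    gtf_file="/path/to/annotations.gtf",
    read_lengths=[100, 100],  # e.g. Illumina 2x100b paired-end reads
    threads=8,                # hypothetical number of available cores
)
command_line = " ".join(shlex.quote(arg) for arg in args)
print(command_line)
```

The resulting list could be passed to subprocess.run once the genome directory has been created.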

Location:

/usr/local/STAR-2.4.1.c/source/
starjava 1.0 URL:

https://github.com/Starlink/starjava/

Description:

Java applications initially developed for the Starlink Project but now developed independently.

Location:

/usr/local/starjava-1.0
stringtie 1.0.4
1.1.2
URL:

http://ccb.jhu.edu/software/stringtie/

Description:

StringTie is a fast and highly efficient assembler of RNA-Seq alignments into potential transcripts. It uses a novel network flow algorithm as well as an optional de novo assembly step to assemble and quantitate full-length transcripts representing multiple splice variants for each gene locus. Its input can include not only the alignments of raw reads used by other transcript assemblers, but also alignments of longer sequences that have been assembled from those reads. To identify differentially expressed genes between experiments, StringTie's output can be processed either by the Cuffdiff or Ballgown programs.

Location:

/usr/local/stringtie-1.1.2/

Usage:

stringtie + arguments
structure 2.3.4 URL:

http://pritch.bsd.uchicago.edu/structure_software/release_versions

Description:

The program structure is a free software package for using multi-locus genotype data to investigate population structure.
suite 454 sequencing 2.9 URL:

http://454.com/contact-us/software-request-thank-you.asp

Description:

The GS Data Analysis Software package includes the tools to investigate complex genomic variation in samples including de novo assembly, reference guided alignment and variant calling, and low abundance variant identification and quantification. The suite of software is provided with the GS Junior and GS FLX System at no additional cost and allows researchers to begin interpreting sequence data immediately, without the need to invest in complex and expensive third-party solutions. Each of the software tools incorporates flow and signal information into the sequence analysis algorithms, leading to higher confidence variant calling. Additionally, researchers can interrogate sequence data down to the flow-by-flow signal intensities used in base calling.

GS de Novo Assembler:

A powerful tool for de novo assembly of genomes up to 3 Gb in size

Microbial genome assembly on commodity workstation hardware

Perform whole genome assembly with shotgun reads alone or in combination with 3, 8, or 20 kb span paired end reads to order and join contigs into scaffolds to accurately reconstruct the structure of a genome

Produce high-quality assemblies for microbial genomes in as little as 15 minutes, and in less than 24 hours for larger genomes

Perform de novo assembly of EST reads from cDNA library sequencing runs to accurately reconstruct the transcriptome and identify novel genes, isoform variants and transcript fusions

Graphical software environment for quick project set-up and assembly viewing down to the flowgram level

Command line operation available for power users and scripting

Perform hybrid assemblies using GS FLX and GS Junior shotgun and paired end reads with additional capillary-sequencing or short read sequencing reads (FASTA or FASTQ)

Data output: Contig sequence and quality files (sequence of contigs and corresponding Phred equivalent quality scores, FASTA format), ace.file (alignment of the reads to contig sequence), optional Consed output and more

GS Reference Mapper:

Rapidly and accurately align reads to any reference genome

Identify differences compared to the reference

Annotate reference features and variations

Explore the full spectrum of genomic variation:

Local variation detection: SNPs, insertions and deletions (blocks up to 50 bases)

Structural variation detection: large inserts and deletions, inversions, duplications, translocations and fusions

Data outputs: fna.file (sequence of contigs, FASTA format), qual.file (corresponding Phred equivalent quality score), ace.file (consensus alignment of the reads against a given reference sequence), and SAM/BAM (industry standard alignment format)

GS Amplicon Variant Analyser:

Aligns PCR amplicon reads against a reference sequence

Accurately detects and quantifies known variants in complex pools

Defines and discovers novel variants

Performs haplotyping: identifies multiple linked variants over the full amplicon length

Detects low-frequency (<1%) variants in complex mixtures, such as somatic mutations and viral quasispecies

Collapses high-depth sequences into consensus sequences to explore the unique members of a mixture

Flexible project set-up: separate samples / results based on MID tags (“barcodes”), associate amplicons, references and samples for simple and complex experimental designs

Data outputs: ace.file (alignment of the reads against a reference sequence), png.file (graphical file format, tab delimited text file), and SAM/BAM (industry standard alignment format)

Location:

/usr/local/suite-sequencing-454-2.9
SweeD 3.2.12
3.3.2
URL: http://pop-gen.eu/wordpress/software/sweed

Description:

We developed SweeD, a parallel and checkpointable tool that implements a composite likelihood ratio test for detecting selective sweeps.

SweeD is based on the SweepFinder algorithm (Nielsen et al. 2005).

SweeD can calculate the theoretical SFS of a given demographic model (stepwise changes or with an exponential growth phase + stepwise changes) by using the method by Živković and Stephan (2011).

SweeD is numerically more stable than SweepFinder (in terms of floating-point arithmetic operations and in particular for folded data), and is faster than SweepFinder when the number of sequences is large.

SweeD has been tested on simulated datasets with up to 10,000 sequences and 1,000,000 SNPs.

The sequential version of SweeD is up to 21 times faster than SweepFinder, depending on the number of SNPs and the number of sequences.

Performance improves over SweepFinder with an increasing number of sequences.

For few sequences, SweeD is as fast as SweepFinder.

SweeD has also been used to analyze chromosome 1 from the 1000 Genomes Project.

The dataset comprises more than 2000 sequences and about 2,896,000 SNPs. The analysis required 8 h 15 min.

PATH: /usr/local/SweeD-3.3.2/
SweepFinder 1.0 URL:

http://people.binf.ku.dk/rasmus/webpage/sf.html

Description:

SweepFinder is a program implementing the method described in Nielsen et al. 2005 (Genomic scans for selective sweeps using SNP data, Genome Research, 1566-1575). It can be used to detect the location of a selective sweep based on SNP data. It will also estimate the frequency spectrum of observed SNP data in the presence of missing data.

Location:

/usr/local/SweepFinder

Usage:

SweepFinder + arguments
tabix 0.2.6 URL:

http://sourceforge.net/projects/samtools/files/tabix/

Description:

Tabix indexes a TAB-delimited genome position file in.tab.bgz and creates an index file in.tab.bgz.tbi when region is absent from the command line. The input data file must be position sorted and compressed by bgzip, which has a gzip(1)-like interface. After indexing, tabix is able to quickly retrieve data lines overlapping regions specified in the format "chr:beginPos-endPos". Fast data retrieval also works over the network if a URI is given as a file name; in this case the index file will be downloaded if it is not present locally.
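The region format can be illustrated with a small Python sketch that parses a "chr:beginPos-endPos" string and tests whether a record overlaps it. This is not tabix code; the function names are hypothetical, and treating both intervals as closed (both ends included) is an assumption made for the example.

```python
def parse_region(region):
    """Split a 'chr:beginPos-endPos' string into (chrom, begin, end)."""
    chrom, colon, span = region.rpartition(":")
    begin, dash, end = span.partition("-")
    if not chrom or not colon or not dash:
        raise ValueError("expected a region of the form chr:beginPos-endPos")
    return chrom, int(begin), int(end)


def overlaps(record_start, record_end, begin, end):
    """True when the closed interval of a record intersects the closed query interval."""
    return record_start <= end and record_end >= begin


chrom, begin, end = parse_region("chr1:1000-2000")
print(chrom, begin, end, overlaps(1500, 2500, begin, end))
```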

Location:

/usr/local/tabix-0.2.6
tablet 1.16.09.06

URL:

http://bioinf.scri.ac.uk/tablet/download.shtml

Description:

Tablet is a lightweight, high-performance graphical viewer for next generation sequence assemblies and alignments.

Location:

/usr/local/tablet-1.16.09.06/

Usage:

Log in with a graphical session and type the command: tablet

       

tassel version 4
5.1.0
5.2.6
5.2.8
Download URL:

http://www.maizegenetics.net/index.php?option=com_content&task=view&id=89&Itemid=119

Description:

TASSEL is a software package to evaluate trait associations, evolutionary patterns, and linkage disequilibrium. Strengths of this software:

1. It provides a number of new and powerful statistical approaches to association mapping such as a General Linear Model (GLM) and Mixed Linear Model (MLM). MLM is an implementation of the technique described in our recently published Nature Genetics paper, Unified Mixed-Model Method for Association Mapping, which reduces Type I error in association mapping with complex pedigrees, families, founding effects and population structure.

2. Ability to handle a wide range of indels (insertions and deletions). Most software packages ignore this type of polymorphism; however, in some species (like maize) this is the most common type of polymorphism.
t_coffee 11.00.8cbe486



URL:
http://www.tcoffee.org/Projects/tcoffee/

Description:

T-Coffee is a multiple sequence alignment package. You can use T-Coffee to align sequences or to combine the output of your favorite alignment methods (Clustal, Mafft, Probcons, Muscle...) into one unique alignment (M-Coffee)

T-Coffee can align Protein, DNA and RNA sequences. It is also able to combine sequence information with protein structural information (3D-Coffee/Expresso), profile information (PSI-Coffee) or RNA secondary structures (R-Coffee).

Location:

/usr/local/tcoffee-11.00.8cbe486/

 

 

Usage:

 

t_coffee + arguments

     
Tedna 1.2.2 URL:

https://urgi.versailles.inra.fr/Tools/Tedna

Description:

Tedna is a lightweight de novo transposable element assembler. It assembles the transposable elements directly from the raw reads.

Location:

/usr/local/tedna_1.2.2

Usage:

tedna + arguments
TESS3 1.0 URL:

https://github.com/cayek/TESS3

Description:

TESS3 is a fast and efficient program for estimating spatial population structure based on geographically constrained non-negative matrix factorization and population genetics.

Location:

/usr/local/TESS3-1.0/build

Usage:

TESS3 + arguments
TEtools 3.0

URL:

https://github.com/l-modolo/TEtools

Description:

TEtools is composed of three tools (TEcount, TEdiff and PingPong)

Location:

/usr/local/TEtools-3.0/

Environment variable:

TEtools_PATH=/usr/local/TEtools-3.0/


Usage:

python3 $TEtools_PATH/TEcount.py + arguments

python3 $TEtools_PATH/PingPong.py + arguments

Rscript $TEtools_PATH/TEdiff.R + arguments

 

tgicl 1.0 URL:

ftp://occams.dfci.harvard.edu/pub/bio/tgi/software/tgicl/

Description:

TGI Clustering tools (TGICL): a software system for fast clustering of large EST datasets

This package automates clustering and assembly of a large EST/mRNA dataset. The clustering is performed by a slightly modified version of NCBI's megablast , and the resulting clusters are then assembled using CAP3 assembly program. TGICL starts with a large multi-FASTA file (and an optional peer quality values file) and outputs the assembly files as produced by CAP3. Both clustering and assembly phases can be parallelized by distributing the searches and the assembly jobs across multiple CPUs, as TGICL can take advantage of either SMP machines or PVM (Parallel Virtual Machine) clusters. The two full precompiled packages below were built on Linux and SunOS, respectively. They include CAP3, mgblast and all the other binaries for this platform (of course, except the base Unix utilities like 'sed', 'sort' etc.). Please note that only the Linux version was thoroughly tested at DFCI.

Dependencies: cap3, megablast
tmhmm 2.0c

URL:


https://github.com/dansondergaard/tmhmm.py/blob/master/README.md


Description:

Prediction of transmembrane helices in proteins

Location:

/usr/local/bin/tmhmm

TOGGLE 0.2 URL:

https://github.com/SouthGreenPlatform/TOGGLE

Description:

TOGGLE (TOolbox for Generic nGs anaLysEs) is a suite of 10 packages and more than 110 modules able to manage a large set of NGS software and utilities to easily design pipelines able to handle hundreds of samples. Moreover, TOGGLE offers an easy way to manipulate the various options of the different software tools through the pipelines by using a single basic configuration file, which can be changed for each assay without having to change the code itself.

We also present the implementation of TOGGLE in a complete analysis pipeline designed for SNP discovery for large sets of NGS data, ready to use in different environments (single machine to HPC clusters).
tomcat 8.0.14
7.0.47
description:

Apache Tomcat is an open source software implementation of the Java Servlet and JavaServer Pages technologies. The Java Servlet and JavaServer Pages specifications are developed under the Java Community Process.
tophat 2.0.6
2.0.13
2.0.14
URL:

http://tophat.cbcb.umd.edu/

Description:

TopHat is a fast splice junction mapper for RNA-Seq reads. It aligns RNA-Seq reads to mammalian-sized genomes using the ultra high-throughput short read aligner Bowtie, and then analyzes the mapping results to identify splice junctions between exons.
TransDecoder 2.1.0
3.0.0
URL:

http://transdecoder.github.io/

Description:

TransDecoder identifies candidate coding regions within transcript sequences, such as those generated by de novo RNA-Seq transcript assembly using Trinity, or constructed based on RNA-Seq alignments to the genome using Tophat and Cufflinks.

TransDecoder identifies likely coding sequences based on the following criteria:

a minimum length open reading frame (ORF) is found in a transcript sequence

a log-likelihood score similar to what is computed by the GeneID software is > 0.

the above coding score is greatest when the ORF is scored in the 1st reading frame as compared to scores in the other 5 reading frames.

if a candidate ORF is found fully encapsulated by the coordinates of another candidate ORF, the longer one is reported. However, a single transcript can report multiple ORFs (allowing for operons, chimeras, etc).

optionally, the putative peptide has a match to a Pfam domain above the noise cutoff score.
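
The first criterion above (a minimum-length ORF found in one of the six reading frames) can be illustrated with a minimal Python sketch. This is not TransDecoder's implementation: the function names, the ATG-to-stop definition of an ORF, and the default length cutoff are assumptions made for illustration, and the log-likelihood scoring and Pfam steps are not modelled.

```python
# Illustrative sketch only; not TransDecoder's actual code.
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def orfs_in_frame(seq, frame, min_codons):
    """Yield (start, end) of ATG..stop ORFs in one forward frame (0, 1 or 2).

    The length in codons includes the stop codon.
    """
    start = None
    for i in range(frame, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if start is None and codon == "ATG":
            start = i
        elif start is not None and codon in STOP_CODONS:
            if (i + 3 - start) // 3 >= min_codons:
                yield start, i + 3
            start = None


def find_orfs(seq, min_codons=100):
    """Return ORFs from all six frames as (strand, start, end, sequence).

    Coordinates are 0-based and end-exclusive, on the strand that was scanned.
    """
    seq = seq.upper()
    found = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            for start, end in orfs_in_frame(s, frame, min_codons):
                found.append((strand, start, end, s[start:end]))
    # Longest candidates first.
    return sorted(found, key=lambda orf: orf[2] - orf[1], reverse=True)
```

For example, find_orfs("CCATGAAATTTTAGCC", min_codons=3) returns a single forward-strand ORF, ATGAAATTTTAG, at coordinates 2 to 14.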

The software is primarily maintained by Brian Haas at the Broad Institute and Alexie Papanicolaou at the Commonwealth Scientific and Industrial Research Organisation (CSIRO). It is integrated into other related software such as Trinity, PASA, EVidenceModeler, and Trinotate.

Location:

/usr/local/TransDecoder-3.0.0/

Usage:

TransDecoder.LongOrfs + arguments
treeview 3.0 URL:

http://taxonomy.zoology.gla.ac.uk/rod/treeview.html

Description:

Treeview is a tool for displaying hierarchical structures, and knows about astronomical file formats amongst others. Operation is very intuitive.

Location:

/usr/local/startjava/bin/treeview
TreeviewX 0.5 URL:

http://code.google.com/p/treeviewx/

Description:

TreeView X is an open source program to display phylogenetic trees on Linux, Unix, Mac OS X, and Windows platforms
trf 4.07b URL:

http://tandem.bu.edu/trf/trf.download.html

Description:

A tandem repeat in DNA is two or more adjacent, approximate copies of a pattern of nucleotides. Tandem Repeats Finder is a program to locate and display tandem repeats in DNA sequences.
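
The definition above can be made concrete with a small Python sketch that reports exact tandem repeats only. Tandem Repeats Finder itself also detects approximate copies using a statistical model, which this sketch does not attempt; the function names and the default parameters are assumptions made for illustration.

```python
def is_primitive(unit):
    """True if the unit is not itself a repetition of a shorter unit."""
    return (unit + unit).find(unit, 1) == len(unit)


def find_exact_tandem_repeats(seq, max_period=6, min_copies=2):
    """Return (start, period, copies, unit) for exact tandem repeats.

    Only perfect, adjacent copies are reported, scanning each period
    from 1 to max_period in turn. Coordinates are 0-based.
    """
    seq = seq.upper()
    n = len(seq)
    repeats = []
    for period in range(1, max_period + 1):
        i = 0
        while i + period * min_copies <= n:
            # Extend while each base equals the base one period later.
            j = i
            while j + period < n and seq[j] == seq[j + period]:
                j += 1
            copies = (j - i + period) // period
            unit = seq[i:i + period]
            if copies >= min_copies and is_primitive(unit):
                repeats.append((i, period, copies, unit))
                i += copies * period
            else:
                i += 1
    return repeats
```

For example, find_exact_tandem_repeats("GGACACACACTT", min_copies=3) reports one repeat: the unit AC, starting at position 2, present in 4 copies.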

Location:

/usr/local/bin
Trimmomatic 0.33 URL:

http://www.usadellab.org/cms/?page=trimmomatic

Description:

Trimmomatic performs a variety of useful trimming tasks for Illumina paired-end and single-ended data. The selection of trimming steps and their associated parameters are supplied on the command line.

The current trimming steps are:

ILLUMINACLIP: Cut adapter and other illumina-specific sequences from the read.

SLIDINGWINDOW: Perform a sliding window trimming, cutting once the average quality within the window falls below a threshold.

LEADING: Cut bases off the start of a read, if below a threshold quality

TRAILING: Cut bases off the end of a read, if below a threshold quality

CROP: Cut the read to a specified length

HEADCROP: Cut the specified number of bases from the start of the read

MINLEN: Drop the read if it is below a specified length

TOPHRED33: Convert quality scores to Phred-33

TOPHRED64: Convert quality scores to Phred-64

It works with FASTQ files (using phred + 33 or phred + 64 quality scores, depending on the Illumina pipeline used), either uncompressed or gzipped. Use of gzip format is determined based on the .gz extension.

For single-ended data, one input and one output file are specified, plus the processing steps. For paired-end data, two input files are specified, and 4 output files, 2 for the 'paired' output where both reads survived the processing, and 2 for corresponding 'unpaired' output where a read survived, but the partner read did not.
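
To illustrate how the LEADING, TRAILING, SLIDINGWINDOW and MINLEN steps listed above act on a single read, here is a simplified Python sketch. It is not Trimmomatic's code: the function names, the fixed order of the steps and the default thresholds are assumptions made for illustration (Trimmomatic applies the steps in the order given on the command line), and Phred+33 encoding is assumed.

```python
def phred33_to_scores(quality_string):
    """Decode a Phred+33 quality string into integer scores."""
    return [ord(ch) - 33 for ch in quality_string]


def sliding_window_cut(scores, window_size, required_quality):
    """Return how many bases to keep from the start of the read.

    Windows are scanned from the 5' end; the read is cut at the start of
    the first window whose average quality is below required_quality.
    """
    n = len(scores)
    if n == 0:
        return 0
    width = min(window_size, n)
    for start in range(0, n - width + 1):
        window = scores[start:start + width]
        if sum(window) / width < required_quality:
            return start
    return n


def trim_read(seq, qual, leading=3, trailing=3, window=(4, 15), minlen=36):
    """Apply LEADING, TRAILING, SLIDINGWINDOW, then MINLEN to one read.

    Returns (sequence, quality) or None if the read is dropped.
    """
    scores = phred33_to_scores(qual)
    # LEADING: remove low-quality bases from the start.
    start = 0
    while start < len(scores) and scores[start] < leading:
        start += 1
    # TRAILING: remove low-quality bases from the end.
    end = len(scores)
    while end > start and scores[end - 1] < trailing:
        end -= 1
    # SLIDINGWINDOW: cut once the windowed average drops too low.
    end = start + sliding_window_cut(scores[start:end], *window)
    # MINLEN: drop reads that became too short.
    if end - start < minlen:
        return None
    return seq[start:end], qual[start:end]
```

For example, trim_read("ACGTACGTAC", "IIIIII####", minlen=5) keeps the first six bases, while the same call with minlen=8 drops the read.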

Location on the cluster:

/usr/local/Trimmomatic-0.33
Trinity RNA-Seq Assembly r20140413
r2012-10-05
2.0.6
2.1.1
2.2.0
Description: The Trinity RNA-Seq Assembly project provides software solutions targeted to the reconstruction of full-length transcripts and alternatively spliced isoforms from Illumina RNA-Seq data.

URL: http://sourceforge.net/projects/trinityrnaseq/

usage:

Trinity --help

Location:

/usr/local/trinityrnaseq-2.2.0
uclust 1.2.22q URL:

http://drive5.com/usearch/manual/uclust_algo.html

Description:

The UCLUST algorithm divides a set of sequences into clusters
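
The entry above does not describe how the clusters are formed. UCLUST is generally described as a greedy, centroid-based method: each sequence joins the first cluster whose centroid it matches at or above an identity threshold, and otherwise founds a new cluster. The Python sketch below illustrates that idea only; the function names are hypothetical and the identity measure is a crude, alignment-free stand-in, not the one UCLUST uses.

```python
def identity(a, b):
    """Crude stand-in: matching positions divided by the longer length.

    No alignment is performed, so this is only meaningful for sequences
    that are already in register.
    """
    if not a and not b:
        return 1.0
    matches = sum(x == y for x, y in zip(a, b))
    return matches / max(len(a), len(b))


def greedy_centroid_clusters(sequences, threshold=0.9):
    """Cluster sequences greedily in input order.

    Returns a list of (centroid, members) tuples, where the centroid is
    the first sequence assigned to the cluster.
    """
    clusters = []
    for seq in sequences:
        for centroid, members in clusters:
            if identity(seq, centroid) >= threshold:
                members.append(seq)
                break
        else:
            # No centroid was similar enough: start a new cluster.
            clusters.append((seq, [seq]))
    return clusters
```

Because assignment is greedy, the result depends on the input order, which is why inputs to this kind of clustering are usually sorted first (for example by length or abundance).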
unison 1.0 URL:

http://sourceforge.net/projects/unison-db/

Description:

Unison is a database and web interface of integrated, precomputed proteomic predictions for rapid feature-based mining, sequence analysis, and hypothesis generation.
UrQt 1.0.18

URL:

https://lbbe.univ-lyon1.fr/Download-5172.html

Description:

UrQt: Unsupervised Quality trimming for NGS data

Location:

/usr/local/UrQt-1.0.18

Usage:

UrQt + arguments

usearch 6.0.307
7.0.1090
8.0.1623
URL:

http://www.drive5.com/usearch/

Description:

USEARCH is a unique high-throughput sequence analysis tool. It is distributed as a single binary program that implements a suite of algorithms comparable to BLASTN, BLASTP, BLASTX, BLASTCLUST, CD-HIT, CD-HIT-EST, CD-HIT-2D, CD-HIT-EST-2D, CD-HIT-OTU, CD-HIT-454, ChimeraSlayer, Perseus, RAPsearch and more.

Location: /usr/local/bin
VariationHunter 0.4 URL:

http://variationhunter.sourceforge.net/Home

Description:

VariationHunter-CommonLaw is a tool for discovery of structural variation in one or more individuals simultaneously using high throughput technologies.

Location:

/usr/local/VariationHunter-0.04/clustering
vcftools 0.1.10
0.1.12b
0.1.13
URL:

http://vcftools.sourceforge.net/

Description:

VCFtools is a program package designed for working with VCF files, such as those generated by the 1000 Genomes Project.

Location:

/usr/local/vcftools-0.1.13/bin

usage:

vcftools + arguments
velvet 1.2.08
1.2.10
URL:

http://www.ebi.ac.uk/~zerbino/velvet/

Description:

Velvet is a de novo genomic assembler specially designed for short read sequencing technologies, such as Solexa or 454.

Location:

/usr/local/velvet_1.2.10

Commands to run:

velveth or velvetg
vsearch   URL:

https://github.com/torognes/vsearch

Description:

We have implemented a tool called VSEARCH which supports de novo and reference based chimera detection, clustering, full-length and prefix dereplication, reverse complementation, masking, all-vs-all pairwise global alignment, exact and global alignment searching, shuffling, subsampling and sorting. It also supports FASTQ file analysis, filtering and conversion.

VSEARCH stands for vectorized search, as the tool takes advantage of parallelism in the form of SIMD vectorization as well as multiple threads to perform accurate alignments at high speed. VSEARCH uses an optimal global aligner (full dynamic programming Needleman-Wunsch), in contrast to USEARCH which by default uses a heuristic seed and extend aligner. This usually results in more accurate alignments and overall improved sensitivity (recall) with VSEARCH, especially for alignments with gaps.
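
As an illustration of what "full dynamic programming Needleman-Wunsch" means, the following Python sketch computes the optimal global alignment score of two sequences. It is not VSEARCH's implementation, which is vectorized: the function name and the scoring values are assumptions chosen for illustration, and a simple linear gap penalty is used.

```python
def needleman_wunsch_score(a, b, match=2, mismatch=-4, gap=-3):
    """Return the optimal global alignment score of sequences a and b.

    Uses a linear gap penalty and keeps only two rows of the dynamic
    programming matrix, so memory use is proportional to len(b).
    """
    # Row 0: aligning an empty prefix of a against prefixes of b.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            substitution = match if a[i - 1] == b[j - 1] else mismatch
            curr.append(max(
                prev[j - 1] + substitution,  # align a[i-1] with b[j-1]
                prev[j] + gap,               # gap in b
                curr[j - 1] + gap,           # gap in a
            ))
        prev = curr
    return prev[-1]
```

With these illustrative scores, needleman_wunsch_score("ACGT", "ACT") is 3: three matches and one gap. Every cell of the matrix is evaluated, which is what makes the result optimal, in contrast to a seed-and-extend heuristic that explores only part of it.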

Location:

/usr/local/vsearch-1.1.3

usage:

vsearch + arguments
wgs 8.1 URL:

http://sourceforge.net/apps/mediawiki/wgs-assembler/index.php?title=Main_Page

Description:

Celera Assembler is a de novo whole-genome shotgun (WGS) DNA sequence assembler. It reconstructs long sequences of genomic DNA from fragmentary data produced by whole-genome shotgun sequencing. Celera Assembler has enabled many advances in genomics, including the first whole genome shotgun sequence of a multi-cellular organism (Myers 2000) and the first diploid sequence of an individual human (Levy 2007). Celera Assembler was developed at Celera Genomics starting in 1999. It was released to SourceForge in 2004 as the wgs-assembler under the GNU General Public License. The pipeline revised for 454 data was named CABOG (Miller 2008).

Celera Assembler can use any combination of reads from:

dideoxy (Sanger) sequencing platforms such as the Applied Biosystems 3730 DNA Analyzer and 3730xl DNA Analyzer

pyrosequencing platforms such as the 454 Life Sciences Genome Sequencer FLX Titanium and GS Junior.

(Reads from the discontinued Genome Sequencer FLX before Titanium reagents and Genome Sequencer 20 are supported as well.)

sequencing by synthesis platforms such as the Illumina HiSeq 2000, Genome Analyzer IIx and Genome Analyzer IIe.

(Reads shorter than 75bp are not supported.)

single-molecule sequencing platforms such as the Pacific Biosciences PacBio RS (after correction using the PBcR pipeline.)

Location:

/usr/local/wgs-8.1/Linux-amd64/bin/
wise2 2.4.1 URL:

http://dendrome.ucdavis.edu/resources/tooldocs/wise2/doc_wise2.html

Description:

Wise2 is a package focused on comparisons of biopolymers, commonly DNA sequence and protein sequence. There are many other packages which do this, probably the best known being the BLAST package (from NCBI) and the Fasta package (from Bill Pearson). There are other packages, such as the HMMER package (Sean Eddy) or the SAM package (UC Santa Cruz), focused on hidden Markov models (HMMs) of biopolymers.

Wise2's particular forte is the comparison of DNA sequence at the level of its protein translation. This comparison allows the simultaneous prediction of, say, gene structure with homology-based alignment. There is currently no other package that I know of that contains this type of algorithm with a full-blown gene prediction model and a hidden Markov model of a protein domain.

Wise2 also contains other algorithms, such as the venerable Smith-Waterman algorithm, or more modern ones such as Stephen Altschul's generalised gap penalties, or even experimental ones developed in house, such as dba (see section 7.1). The development of these algorithms is due to the ease of developing such algorithms in the environment used by Wise2.

Wise2 has also been written with an eye for reuse and maintainability. Although it is a pure C package, you can access its functionality directly in Perl. Parts of the package (or the entire package) can be used by other C or C++ programs without namespace clashes, as all externally linked variables have the unique identifier Wise2 prepended. Java and CORBA ports are being considered (see section 8, the API section).

Finally, Wise2, although implemented in C, makes heavy use of the Dynamite code-generating language.
Wolfram-Mathematica 9 URL:

http://www.wolfram.com/mathematica/

Description:

Mathematica provides a single, integrated, continually expanding system that covers the full breadth of technical computing.
XPCLR 1.0 10-30-2009 Description:

XP-CLR (Chen et al. 2010) uses allele frequency differentiation at linked loci to detect selective sweeps.

URL:

https://genetics.med.harvard.edu/reich/Reich_Lab/Software.html

Location:

/usr/local/XPCLR

usage:

XPCLR + arguments

 

Copyright © 2019 IRD Bioinformatics. All Rights Reserved.