From Wikipedia, the free encyclopedia
T-Coffee
Developer(s): Cédric Notredame, Centro de Regulacio Genomica (CRG) - Barcelona
Stable release: 13.45.0.4846264 / 15 October 2020
Preview release: 13.45.33.7d7e789 / 23 December 2020
Operating system: UNIX, Linux, MS-Windows, Mac OS X
Type: Bioinformatics tool
Licence: GPL
Website: www.tcoffee.org

T-Coffee (Tree-based Consistency Objective Function for Alignment Evaluation) is a multiple sequence alignment program that uses a progressive approach. [1] It generates a library of pairwise alignments to guide the multiple sequence alignment. It can also combine previously computed multiple sequence alignments and, in recent versions, use structural information from PDB files (3D-Coffee). It has advanced features for evaluating alignment quality and some capacity for identifying occurrences of motifs (Mocca). By default it produces alignments in the aln (Clustal) format, but it can also produce PIR, MSF, and FASTA output. The most common input formats (FASTA, PIR) are supported.

Algorithm

The T-Coffee algorithm has two main features. The first is its use of heterogeneous data sources, which provides a simple and flexible means of generating multiple alignments: T-Coffee can compute a multiple alignment from a library generated from a mixture of local and global pairwise alignments. [1]

The second is the optimization method, which finds the multiple alignment that best fits the pairwise alignments in the input library, using a progressive strategy comparable to the one used in ClustalW. This method has the advantage of being fast and robust. Because the library is consulted at every step of the progressive alignment, each step takes into account the alignments between all pairs of sequences, not only the sequences being aligned at that step. [1]

Generating a primary library of alignments

The library contains a set of pairwise alignments between all of the sequences to be aligned; these alignments are not required to be consistent with one another. It holds information on each of the N(N-1)/2 sequence pairs, where N is the number of sequences. Two alignment sources are used for each pair of sequences, one local and one global. [1]
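The pair count can be checked with a short Python sketch. The sequence names are placeholders chosen for illustration, not part of T-Coffee.

```python
from itertools import combinations


def sequence_pairs(names):
    """Return every unordered pair of sequence names."""
    return list(combinations(names, 2))


# Hypothetical sequence names, used only for this example.
names = ["seqA", "seqB", "seqC", "seqD"]
pairs = sequence_pairs(names)
n = len(names)

# The number of pairs matches N(N-1)/2: four sequences give six pairs.
assert len(pairs) == n * (n - 1) // 2
print(len(pairs))  # prints 6
```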

Global alignments are constructed by running ClustalW on the sequences two at a time, which gives one full-length alignment for each pair of sequences. The local alignments are the ten top-scoring non-intersecting local alignments found by the Lalign program of the FASTA package. [1]

Each alignment is represented in the library as a list of pairwise residue matches, and each matched pair acts as a constraint. Some constraints are more important than others: the importance of a constraint depends on how likely it is to be correct. When the multiple alignment is computed, a weighting scheme gives priority to the most reliable residue pairs. [1]
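As a rough illustration, the Python sketch below turns one gapped pairwise alignment into a dictionary of weighted residue-pair constraints. The function name, the data layout, and the choice of weight are assumptions made for this example: every pair simply inherits the percent identity of the alignment it came from, which is one plausible weighting scheme and not necessarily what any given T-Coffee version computes.

```python
def alignment_to_constraints(name_a, row_a, name_b, row_b):
    """Convert one gapped pairwise alignment into weighted constraints.

    Returns a dict mapping ((name_a, i), (name_b, j)) to a weight, where
    i and j are 1-based positions in the ungapped sequences.
    Assumption: the weight is the percent identity over matched columns.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    i = j = 0
    matched = []
    identical = 0
    for a, b in zip(row_a, row_b):
        if a != "-":
            i += 1
        if b != "-":
            j += 1
        if a != "-" and b != "-":
            matched.append((i, j))
            if a == b:
                identical += 1
    weight = round(100 * identical / len(matched)) if matched else 0
    return {((name_a, i), (name_b, j)): weight for i, j in matched}


# Three matched columns, two of them identical, so the weight is 67.
library = alignment_to_constraints("s1", "ACGT", "s2", "A-GA")
for pair, weight in library.items():
    print(pair, weight)
```

Columns containing a gap produce no constraint, so the second residue of s1 does not appear in the result.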

Combination of the libraries

Efficient combination of local and global alignment information is an important feature of T-Coffee. The ClustalW and Lalign primary libraries are combined by simple addition. A residue pair that occurs in both libraries is merged into a single entry whose weight is the sum of the two weights; otherwise, a new entry is created for the pair. Pairs with a weight of zero are not represented. [1] Each pair of aligned residues in the library can then be assigned a weight that reflects the degree to which those residues align consistently with the rest of the library. This is called library extension.
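A minimal Python sketch of both steps follows. The dictionary layout and function names are invented for this example. The merge follows the addition rule described above. The extension step rests on an assumption not stated in this article: support for a pair through a third residue is taken as the smaller of the two connecting weights, and these contributions are added to the pair's own weight. Only pairs already present in the library are re-weighted here.

```python
from collections import defaultdict


def merge_libraries(*libraries):
    """Sum the weights of duplicated residue pairs; drop zero-weight pairs."""
    merged = defaultdict(int)
    for library in libraries:
        for (r1, r2), weight in library.items():
            merged[tuple(sorted((r1, r2)))] += weight
    return {pair: w for pair, w in merged.items() if w > 0}


def extend_library(library):
    """Re-weight each pair using support from intermediate residues.

    Assumption: a triplet r1-mid-r2 contributes min(w(r1, mid), w(mid, r2)).
    """
    neighbours = defaultdict(dict)
    for (r1, r2), weight in library.items():
        neighbours[r1][r2] = weight
        neighbours[r2][r1] = weight
    extended = {}
    for (r1, r2), weight in library.items():
        total = weight
        for mid, w1 in neighbours[r1].items():
            if mid == r2:
                continue
            w2 = neighbours[r2].get(mid)
            if w2 is not None:
                total += min(w1, w2)
        extended[(r1, r2)] = total
    return extended


# Hypothetical residues, written as (sequence name, position).
A1, B1, C1, C2 = ("A", 1), ("B", 1), ("C", 1), ("C", 2)
global_lib = {(A1, B1): 88, (A1, C1): 77, (B1, C1): 50}
local_lib = {(A1, B1): 12, (A1, C2): 0}

merged = merge_libraries(global_lib, local_lib)
print(merged[(A1, B1)])    # prints 100 (88 + 12)
print((A1, C2) in merged)  # prints False (zero weight is dropped)

extended = extend_library(merged)
print(extended[(A1, B1)])  # prints 150 (100 + min(77, 50))
```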

Comparisons with other alignment software

Although the default output is a Clustal-like format, it differs enough from the output of ClustalW/X that many programs supporting the Clustal format cannot read it. ClustalX can import T-Coffee output, so the simplest fix is usually to import T-Coffee's output into ClustalX and then re-export it. Another possibility is to request the strict ClustalW output format with the option "-output=clustalw_aln".
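The second option could be scripted as in the Python sketch below. It only builds and prints the command line. It assumes the executable is installed as t_coffee and takes the input file as its first argument; the file name sequences.fasta is a placeholder.

```python
def tcoffee_command(fasta_path, strict_clustalw=True):
    """Build a T-Coffee command line as a list of arguments.

    Assumptions: the executable is named "t_coffee" and the input file
    is passed as the first argument.
    """
    command = ["t_coffee", fasta_path]
    if strict_clustalw:
        # Option quoted in the text above: strict ClustalW output format.
        command.append("-output=clustalw_aln")
    return command


# "sequences.fasta" is a hypothetical input file.
print(" ".join(tcoffee_command("sequences.fasta")))
# prints: t_coffee sequences.fasta -output=clustalw_aln
```

The resulting list can be passed to subprocess.run on a machine where T-Coffee is installed.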

A distinguishing feature of T-Coffee is its ability to combine different methods and different data types. In its latest version, T-Coffee can combine protein sequences and structures, as well as RNA sequences and structures. It can also run and combine the output of the most common sequence and structure alignment packages.

T-Coffee comes with a sophisticated sequence reformatting utility named seq_reformat. Extensive documentation is available online.

Variations

  • M-Coffee: a special mode of T-Coffee that makes it possible to combine the output of the most common multiple sequence alignment packages (Muscle, ClustalW, Mafft, ProbCons, etc.). The resulting alignments are slightly better than the individual ones, but most importantly the program indicates the alignment regions on which the various packages agree. Regions of high agreement are usually well aligned. [2]
  • Expresso and 3D-Coffee: special modes of T-Coffee that make it possible to combine sequences and structures in an alignment. The structure-based alignments can be carried out using the most common structural aligners, such as TMalign, Mustang, and sap. [3] [4] [5] [6]
  • R-Coffee: a special mode of T-Coffee that makes it possible to align RNA sequences using secondary structure information. [7] [8]
  • PSI-Coffee: aligns distantly related proteins using homology extension (slow and accurate). [9] [10]
  • TM-Coffee: aligns transmembrane proteins using homology extension. [11]
  • Pro-Coffee: aligns homologous promoter regions. [12]
  • Accurate: automatically combines the most accurate modes for DNA, RNA and proteins (experimental). [13]
  • Combine: combines two (or more) multiple sequence alignments into a single one. [1] [9]

Evaluation

TCS (Transitive Consistency Score) is an extended version of the T-Coffee scoring scheme. [14] It uses T-Coffee libraries of pairwise alignments to evaluate any third-party MSA. Pairwise projections can be produced using fast or slow methods, allowing a trade-off between speed and accuracy. TCS has been shown to give significantly better estimates of structural accuracy, and to lead to more accurate phylogenetic trees, than Heads-or-Tails, GUIDANCE, Gblocks, and trimAl. [15]
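The idea of checking a third-party MSA against a library of pairwise alignments can be illustrated with a deliberately simplified Python sketch. This is not the published TCS formula: it only reports, for each column, the fraction of aligned residue pairs that also appear in the library with a positive weight. The function names and data layout are assumptions made for this example, and all rows of the MSA are assumed to have the same length.

```python
from itertools import combinations


def msa_columns(msa):
    """Yield, for each column, the (name, position) residues it contains."""
    names = list(msa)
    length = len(msa[names[0]])
    position = {name: 0 for name in names}
    for col in range(length):
        residues = []
        for name in names:
            if msa[name][col] != "-":
                position[name] += 1  # 1-based position in the ungapped sequence
                residues.append((name, position[name]))
        yield residues


def column_support(msa, library):
    """Fraction of residue pairs per column that the library supports.

    Columns with fewer than two residues get None.
    """
    scores = []
    for residues in msa_columns(msa):
        pairs = [tuple(sorted(p)) for p in combinations(residues, 2)]
        if not pairs:
            scores.append(None)
            continue
        supported = sum(1 for p in pairs if library.get(p, 0) > 0)
        scores.append(supported / len(pairs))
    return scores


# Hypothetical three-sequence MSA and library of weighted residue pairs.
msa = {"A": "AC-", "B": "ACG", "C": "A-G"}
library = {
    (("A", 1), ("B", 1)): 100,
    (("A", 1), ("C", 1)): 77,
    (("B", 1), ("C", 1)): 50,
    (("A", 2), ("B", 2)): 100,
}
print(column_support(msa, library))  # prints [1.0, 1.0, 0.0]
```

The last column scores 0.0 because the library contains no pair linking the third residue of B to the second residue of C.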

References

  1. Notredame C, Higgins DG, Heringa J (2000-09-08). "T-Coffee: A novel method for fast and accurate multiple sequence alignment". J Mol Biol. 302 (1): 205–217. doi:10.1006/jmbi.2000.4042. PMID 10964570. S2CID 10189971.
  2. Wallace, Iain M.; O'Sullivan, Orla; Higgins, Desmond G.; Notredame, Cedric (2006). "M-Coffee: combining multiple sequence alignment methods with T-Coffee". Nucleic Acids Research. 34 (6): 1692–1699. doi:10.1093/nar/gkl091. ISSN 1362-4962. PMC 1410914. PMID 16556910.
  3. Armougom, Fabrice; Moretti, Sébastien; Poirot, Olivier; Audic, Stéphane; Dumas, Pierre; Schaeli, Basile; Keduas, Vladimir; Notredame, Cedric (2006-07-01). "Expresso: automatic incorporation of structural information in multiple sequence alignments using 3D-Coffee". Nucleic Acids Research. 34 (Web Server issue): W604–608. doi:10.1093/nar/gkl092. ISSN 1362-4962. PMC 1538866. PMID 16845081.
  4. Zhang, Yang; Skolnick, Jeffrey (2005). "TM-align: a protein structure alignment algorithm based on the TM-score". Nucleic Acids Research. 33 (7): 2302–2309. doi:10.1093/nar/gki524. ISSN 1362-4962. PMC 1084323. PMID 15849316.
  5. Konagurthu, Arun S.; Whisstock, James C.; Stuckey, Peter J.; Lesk, Arthur M. (2006-08-15). "MUSTANG: a multiple structural alignment algorithm". Proteins. 64 (3): 559–574. doi:10.1002/prot.20921. ISSN 1097-0134. PMID 16736488. S2CID 14074658.
  6. Sun, Zheng; Tian, Weidong (2012). "SAP--a sequence mapping and analyzing program for long sequence reads alignment and accurate variants discovery". PLOS ONE. 7 (8): e42887. Bibcode: 2012PLoSO...742887S. doi:10.1371/journal.pone.0042887. ISSN 1932-6203. PMC 3413671. PMID 22880129.
  7. Wilm, Andreas; Higgins, Desmond G.; Notredame, Cédric (May 2008). "R-Coffee: a method for multiple alignment of non-coding RNA". Nucleic Acids Research. 36 (9): e52. doi:10.1093/nar/gkn174. ISSN 1362-4962. PMC 2396437. PMID 18420654.
  8. Moretti, Sébastien; Wilm, Andreas; Higgins, Desmond G.; Xenarios, Ioannis; Notredame, Cédric (2008-07-01). "R-Coffee: a web server for accurately aligning noncoding RNA sequences". Nucleic Acids Research. 36 (Web Server issue): W10–13. doi:10.1093/nar/gkn278. ISSN 1362-4962. PMC 2447777. PMID 18483080.
  9. Di Tommaso P, Moretti S, Xenarios I, Orobitg M, Montanyola A, Chang JM, Taly JF, Notredame C (Jul 2011). "T-Coffee: a web server for the multiple sequence alignment of protein and RNA sequences using structural information and homology extension". Nucleic Acids Res. 39 (Web Server issue): W13–7. doi:10.1093/nar/gkr245. PMC 3125728. PMID 21558174.
  10. Kemena C, Notredame C (2009-10-01). "Upcoming challenges for multiple sequence alignment methods in the high-throughput era". Bioinformatics. 25 (19): 2455–65. doi:10.1093/bioinformatics/btp452. PMC 2752613. PMID 19648142.
  11. Chang JM, Di Tommaso P, Taly JF, Notredame C (2012-03-28). "Accurate multiple sequence alignment of transmembrane proteins with PSI-Coffee". BMC Bioinformatics. 13: S1. doi:10.1186/1471-2105-13-S4-S1. PMC 3303701. PMID 22536955.
  12. Erb I, González-Vallinas JR, Bussotti G, Blanco E, Eyras E, Notredame C (Apr 2012). "Use of ChIP-Seq data for the design of a multiple promoter-alignment method". Nucleic Acids Res. 40 (7): e52. doi:10.1093/nar/gkr1292. PMC 3326335. PMID 22230796.
  13. "T-Coffee Server". tcoffee.crg.eu. Retrieved 2023-12-26.
  14. Chang, JM; Di Tommaso, P; Lefort, V; Gascuel, O; Notredame, C (1 July 2015). "TCS: a web server for multiple sequence alignment evaluation and phylogenetic reconstruction". Nucleic Acids Research. 43 (W1): W3-6. doi:10.1093/nar/gkv310. PMC 4489230. PMID 25855806.
  15. Chang, JM; Di Tommaso, P; Notredame, C (Jun 2014). "TCS: A New Multiple Sequence Alignment Reliability Measure to Estimate Alignment Accuracy and Improve Phylogenetic Tree Reconstruction". Molecular Biology and Evolution. 31 (6): 1625–37. doi:10.1093/molbev/msu117. PMID 24694831.