Original author(s) | Previous Programmers: Charles P. Kollar, Nandita Mukhopadhyay, Lee Almasy, Mark Schroeder, William P. Mulvihill. |
---|---|
Developer(s) | Daniel E. Weeks, Robert V. Baron, Justin R. Stickel. |
Initial release | 16 January 2000 |
Stable release | 5.0.1 / 13 December 2018 |
Repository | |
Written in | C++ |
Operating system | Linux, Mac OS X, Microsoft Windows |
Type | Applied statistical genetics, Bioinformatics |
License | GNU General Public License version 3 |
Website | watson |
Mega2 is data manipulation software for applied statistical genetics. Mega is an acronym for Manipulation Environment for Genetic Analysis.
The software allows applied statistical geneticists to convert their data from several input formats to a large number of output formats suitable for analysis by commonly used software packages. [1] [2] [3] [4] In a typical human genetics study, the analyst often needs to use several different software programs to analyze the data, and these programs usually require that the data be formatted to their precise input specifications. Converting data into these multiple formats can be tedious, time-consuming, and error-prone. By providing validated conversion pipelines, Mega2 can accelerate analyses while reducing errors.
Mega2 produces a common intermediate data representation using SQLite3, which enables the data to be accessed by other programs and languages. In particular, the Mega2R R package converts the SQLite3 data into R data frames. Several R functions are provided that illustrate how data can be extracted from the data frames for common R analyses, such as SKAT and pedgene. The key capability is efficient extraction of the genotypes for chosen subsets of markers, which facilitates gene-based association testing by automating the loop over genes in the genome. One further function converts the data to VCF format, and another converts it to GenABEL format. More information is available in the Mega2R package documentation.
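Because the intermediate representation is an ordinary SQLite3 file, any language with an SQLite binding can inspect it. The following is a minimal Python sketch that lists the tables and columns of an SQLite database using only the standard `sqlite3` module. It makes no assumption about the schema Mega2 actually writes: the `demo_markers` table is a hypothetical stand-in created in memory so that the example runs on its own.

```python
import sqlite3


def list_tables(conn):
    """Return the names of all tables in an SQLite database, sorted by name."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]


def table_columns(conn, table):
    """Return the column names of one table, in declaration order."""
    # PRAGMA statements do not accept bound parameters, so the identifier
    # is quoted by hand (embedded double quotes are doubled).
    quoted = '"' + table.replace('"', '""') + '"'
    # Each row of table_info is (cid, name, type, notnull, dflt_value, pk).
    return [row[1] for row in conn.execute(f"PRAGMA table_info({quoted})")]


if __name__ == "__main__":
    # Stand-in database with a hypothetical table; this is NOT the Mega2 schema.
    # For a real file, use sqlite3.connect("path/to/database") instead.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE demo_markers (name TEXT, chromosome TEXT, position INTEGER)"
    )
    for table in list_tables(conn):
        print(table, table_columns(conn, table))
    conn.close()
```

Querying `sqlite_master` first is a reasonable way to explore a database whose layout is not known in advance, before writing any table-specific queries.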
Mega2 has been used to facilitate genetic analyses of a wide variety of human traits, including hereditary dystonia, [5] Ehlers-Danlos syndrome, [6] multiple sclerosis, [7] and gliomas. [8] A list of PubMed Central articles citing Mega2 can be seen here.
Mega2, which focuses on data reformatting, should not be confused with MEGA (Molecular Evolutionary Genetics Analysis), a program that focuses on molecular evolution and phylogenetics.
Mega2 accepts input data in a variety of widely used file formats. These contain, at a minimum, data about the phenotypes, the marker genotypes, any family structures, and map positions of the markers.
Input format | Description | Links |
---|---|---|
LINKAGE [9] [10] [11] [12] | pre-Makeped or post-Makeped formats | Linkage User Guide (PDF), LINKAGE format |
Mega2 [1] [2] [3] [4] | simplified/augmented LINKAGE-format | Mega2 format |
PLINK [13] | ped format or binary bed format | PLINK documentation |
VCF or BCF [14] | Variant Call Format or Binary Variant Call Format | Variant Call Format (Wikipedia entry), BCF documentation |
IMPUTE2 [15] [16] | IMPUTE2 GEN and BGEN Formats | IMPUTE2 documentation, GEN format, BGEN format |
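As an illustration of how one of these inputs carries family structure, phenotype, genotypes, and map positions, the sketch below parses one line of a PLINK text `.ped` file and one line of the matching `.map` file. It follows the column layout given in the PLINK documentation (six leading pedigree columns followed by two alleles per marker, and four map columns). It is not Mega2's own parser, and the sample identifiers such as `FAM1` and `rs123` are made up for the example.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class PedRecord:
    family_id: str
    individual_id: str
    father_id: str      # "0" means the father is not in the data set
    mother_id: str      # "0" means the mother is not in the data set
    sex: str            # "1" = male, "2" = female, anything else = unknown
    phenotype: str
    genotypes: List[Tuple[str, str]]  # one (allele1, allele2) pair per marker


def parse_ped_line(line: str) -> PedRecord:
    """Parse one whitespace-delimited line of a PLINK text .ped file."""
    fields = line.split()
    if len(fields) < 6 or (len(fields) - 6) % 2 != 0:
        raise ValueError("expected 6 pedigree columns plus allele pairs")
    alleles = fields[6:]
    # Pair up consecutive alleles: columns 7-8 are marker 1, 9-10 marker 2, ...
    genotypes = list(zip(alleles[0::2], alleles[1::2]))
    return PedRecord(*fields[:6], genotypes)


def parse_map_line(line: str) -> Tuple[str, str, float, int]:
    """Parse one line of a four-column PLINK .map file."""
    chromosome, marker_id, genetic_distance, bp_position = line.split()
    return chromosome, marker_id, float(genetic_distance), int(bp_position)


if __name__ == "__main__":
    # Made-up example lines: a child with two typed markers, one of them missing.
    record = parse_ped_line("FAM1 3 1 2 1 2 A G 0 0")
    print(record)
    print(parse_map_line("1 rs123 0.5 10500"))
```

The order of markers in the `.ped` genotype columns matches the order of lines in the `.map` file, which is how genotypes are tied to their map positions.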
Mega2 supports conversion to the following output formats.
Output format | Links |
---|---|
ASPEX format | ASPEX |
Allegro format [17] | |
Beagle format [18] [19] | BEAGLE |
CRANEFOOT format [20] | CRANEFOOT |
Eigenstrat format [21] [22] | EIGENSOFT |
FBAT format [23] | FBAT |
GeneHunter format [24] | GeneHunter |
GeneHunter-Plus format [25] | GeneHunter-Plus |
IQLS/Idcoefs format [26] [27] | IQLS, Idcoefs |
Linkage format [9] [10] [11] [12] | Linkage User Guide (PDF), LINKAGE format |
Loki format [28] | Loki |
MaCH/minimac3 format [29] [30] | MaCH, minimac3 |
MLBQTL format [31] | MLB-QTL |
Mega2 annotated format [1] [2] [3] [4] | Mega2 format |
Mendel format [32] | Mendel |
Merlin format [33] | Merlin |
Merlin/SimWalk2-NPL format [33] [34] | Merlin SimWalk2 |
PANGAEA MORGAN format [35] [36] | MORGAN |
PAP format [37] | PAP |
PLINK format [13] (bed, lgen, or ped formats) | PLINK |
PREST format [38] [39] | PREST |
PSEQ format | PSEQ |
Pre-makeped LINKAGE format [9] [10] [11] [12] | Linkage User Guide (PDF), LINKAGE format |
ROADTRIPS format [40] | ROADTRIPS |
SAGE format | SAGE, openSAGE |
SHAPEIT format [41] [42] [43] [44] [45] | SHAPEIT |
SIMULATE format [46] | SIMULATE |
SLINK format [47] [48] | FASTSLINK |
SOLAR format [49] [50] | SOLAR |
SPLINK format [51] | SPLINK |
SUP format [48] [52] | SUP |
SimWalk2 format [34] | SimWalk2 |
Structure format [53] [54] [55] | Structure |
VCF format [14] | Variant Call Format (Wikipedia entry) |
Vintage Mendel format [32] [56] | Vintage Mendel |
Vitesse format [57] | Vitesse |
The Mega2 documentation is available in HTML and PDF formats.