From Wikipedia, the free encyclopedia
Nextflow
Original author(s): Paolo Di Tommaso
Developer(s): Seqera Labs, Centre for Genomic Regulation
Initial release: April 9, 2013
Stable release: v23.10.1 / January 12, 2024
Preview release: v24.02.0-edge / March 9, 2024
Repository: https://github.com/nextflow-io/nextflow
Written in: Groovy, Java
Operating system: Linux, macOS, WSL
Type: Scientific workflow system, Dataflow programming, Big data
License: Apache License 2.0
Website: nextflow.io

Nextflow is a scientific workflow system predominantly used for bioinformatic data analyses. It imposes standards on how to programmatically author a sequence of dependent compute steps and enables their execution on various local and cloud resources. [1] [2]

Purpose

Many scientific data analyses require a large number of sequential processing steps. Custom scripts may suffice when developing new methods or running a particular analysis infrequently, but they scale poorly to complex sequences of tasks or to large numbers of samples. [3] [4] [5]

Scientific workflow systems such as Nextflow allow an analysis to be formalized as a data analysis pipeline. Pipelines, also known as workflows, are instructions that specify the order of, and the conditions for, the computing steps to be performed. They are carried out by special-purpose programs, so-called workflow executors, which ensure predictable and reproducible behavior across compute environments. [3] [6] [7] [8]

Workflow systems also provide built-in solutions to common challenges of workflow development, such as applying an analysis to multiple samples, validating inputs and intermediate results, executing steps conditionally, handling errors, and generating reports. Advanced features may also include scheduling capabilities, graphical user interfaces for monitoring workflow executions, and dependency management through containerization of the whole workflow or of its individual components. [9] [10]
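In Nextflow, several of these features are configured rather than programmed. The nextflow.config fragment below is a minimal sketch of two of them, automatic retries of failed tasks and end-of-run reports; the retry count is an arbitrary illustrative value, not a recommended default.

```nextflow
// nextflow.config: illustrative settings; the retry count is chosen arbitrarily
process {
    errorStrategy = 'retry'   // re-submit a task when it fails
    maxRetries    = 2         // stop retrying after two further attempts
}

// Reports produced for the run
report.enabled   = true       // HTML execution report
timeline.enabled = true       // HTML timeline of task execution
trace.enabled    = true       // tab-separated trace file with per-task metrics
dag.enabled      = true       // rendering of the workflow graph
```

The same reports can also be requested for a single run with the command-line options -with-report, -with-timeline, -with-trace, and -with-dag.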

Scientific workflow systems typically present a steep learning curve at first, because their features and complexity come on top of the actual analysis. However, the standards and abstractions they impose ultimately improve the traceability of analysis steps, which is particularly relevant when pipelines are developed collaboratively, as is customary in scientific settings. [11]

Characteristics

Specification of workflows

In Nextflow, pipelines are constructed from individual processes that correspond to computational tasks. Each process declares its input requirements and its outputs. Rather than running in a fixed order, a process starts executing as soon as all of its input requirements are met. Declaring the output of one process as the input of another creates a logical, sequential connection between the two. [12]

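This reactive execution of processes is a characteristic design pattern of Nextflow, known as the functional dataflow model. Processes exchange data through channels, asynchronous queues that connect the outputs of one process to the inputs of another. [13]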

Processes and whole workflows are written in a domain-specific language (DSL) provided by Nextflow and based on Apache Groovy. [14] While Nextflow's DSL declares the workflow logic, developers can use their scripting language of choice within a process and mix multiple languages within one workflow, which makes it possible to port existing scripts and workflows to Nextflow. Supported scripting languages include Bash, csh, ksh, Python, Ruby, and R; in general, any scripting language that can be invoked through a standard Unix shebang line (such as #!/bin/bash) is supported.

An example workflow consisting of a single process is shown below:

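// A process that writes one greeting to a text file
process hello_world {
    input:
    val greeting               // a single value taken from the input channel

    output:
    path "${greeting}.txt"     // the file created by the script below

    script:
    """
    echo "${greeting} World!" > ${greeting}.txt
    """
}

// Each of the four values starts a separate, independent task of hello_world
workflow {
    Channel.of("Hello", "Ciao", "Hola", "Bonjour") | hello_world
}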
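The single-process example above can be extended to show both the dataflow behavior described earlier and the mixing of scripting languages within one workflow. In this sketch, a second process, count_chars (a hypothetical name), receives each file produced by hello_world and runs a short Python script selected through its shebang line. It assumes that python3 is available wherever the tasks execute.

```nextflow
// Sketch only: assumes python3 is installed in the task environment
process hello_world {
    input:
    val greeting

    output:
    path "${greeting}.txt"

    script:
    """
    echo "${greeting} World!" > ${greeting}.txt
    """
}

// Hypothetical second step: reports the number of characters in each file
process count_chars {
    input:
    path greeting_file

    output:
    stdout

    script:
    """
    #!/usr/bin/env python3
    text = open("${greeting_file}").read().strip()
    print("${greeting_file}", len(text))
    """
}

workflow {
    greetings = Channel.of("Hello", "Ciao", "Hola", "Bonjour")
    // Every file emitted by hello_world becomes an input of count_chars,
    // so a count_chars task can start as soon as its own file exists
    hello_world(greetings) | count_chars | view
}
```

Because the two processes are connected only through a channel, there is no explicit ordering statement: the order in which results are printed depends on when individual tasks finish.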

To facilitate collaboration on workflows, Nextflow natively supports source-code management systems and DevOps platforms, including GitHub, GitLab, and others. [15]
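When a pipeline is kept in a Git repository, Nextflow can read project metadata from a manifest scope in the repository's nextflow.config. The fragment below is a sketch with placeholder values; the repository name my-org/my-pipeline, the description, and the version constraint are hypothetical.

```nextflow
// nextflow.config at the root of a hypothetical repository "my-org/my-pipeline"
manifest {
    name            = 'my-org/my-pipeline'   // hypothetical project name
    description     = 'Greets the world in several languages'
    mainScript      = 'main.nf'               // script executed by "nextflow run"
    defaultBranch   = 'main'                  // branch used when no revision is requested
    nextflowVersion = '>=23.04.0'             // minimum Nextflow version required
}
```

A pipeline published this way can be launched directly from its hosting platform, for example with nextflow run my-org/my-pipeline -r v1.0, where -r selects a branch, tag, or commit (v1.0 is a hypothetical tag). The public demo pipeline nextflow-io/hello can be started the same way with nextflow run nextflow-io/hello.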

Execution of workflows

Workflows written in Nextflow's DSL can be deployed and run across diverse computing environments without modifications to the pipeline code.

To enable portability, Nextflow ships with dedicated executors for a variety of platforms [16] including those of major cloud providers. Because Nextflow decouples individual process steps, it can optionally be configured to spread execution across multiple computing platforms. It supports the following environments for pipeline execution:

  • Local – the default executor. Pipelines run on Linux or macOS, and execution takes place on the computer where the pipeline is launched.
  • HPC workload managers – Slurm, SGE, LSF, Moab, PBS Pro, PBS/Torque, HTCondor, NQSII, OAR
  • Kubernetes – local or cloud-based Kubernetes implementations (GKE, EKS, or AKS)
  • Cloud batch services – AWS Batch, [17] Azure Batch [18]
  • Other environments – Apache Ignite, Google Life Sciences [19]
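Switching between these environments is normally done in configuration rather than in the pipeline code. The nextflow.config fragment below sketches this with profiles; the Slurm partition, AWS Batch queue, region, and S3 bucket are placeholders, and a working AWS Batch setup needs further settings (credentials, compute environment) that are not shown. The hybrid profile illustrates spreading one run across platforms by overriding the executor for a single process, here the hello_world process from the example above.

```nextflow
// nextflow.config: hypothetical profiles; queue, region, and bucket names are placeholders
profiles {
    standard {                      // used when no -profile option is given
        process.executor = 'local'
    }
    cluster {
        process.executor = 'slurm'
        process.queue    = 'short'                // hypothetical Slurm partition
    }
    cloud {
        process.executor  = 'awsbatch'
        process.queue     = 'my-batch-queue'      // hypothetical AWS Batch job queue
        process.container = 'ubuntu:22.04'        // AWS Batch tasks must run in a container
        aws.region        = 'eu-west-1'
        workDir           = 's3://my-bucket/work' // hypothetical bucket for intermediate files
    }
    hybrid {
        process {
            executor = 'slurm'                    // default for all processes
            withName: 'hello_world' {
                executor = 'local'                // this process runs on the launch machine
            }
        }
    }
}
```

A profile is selected at launch time, for example with nextflow run main.nf -profile cluster, while the pipeline script itself stays unchanged.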

Containers for portability across computing environments

A fundamental concept of Nextflow is its tight integration with software containers. Whole workflows and, in later versions, individual processes can use containers, which allows them to run across various compute environments without tedious installation and configuration. [3]

This design choice was strongly influenced by Solomon Hykes's 2013 talk at dotScale, [20] which had a significant impact on Nextflow's principal developer, Paolo Di Tommaso. [21]

Container frameworks supported by Nextflow include Docker, Singularity, Charliecloud, Podman, and Shifter. [22] Containers of these types can be used in a workflow and are retrieved automatically from external repositories when the pipeline is executed. At Nextflow Summit 2022, it was announced that future versions of Nextflow would support a dedicated container provisioning service for better integration of custom containers into workflows. [23]
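As a sketch of how containers are attached to a workflow, the nextflow.config fragment below sets a default image for every process, overrides it for one process, and lets the user choose Docker or Singularity at launch time. The image names are only examples, and python_task is a hypothetical process name.

```nextflow
// nextflow.config: example images; 'python_task' is a hypothetical process name
process.container = 'ubuntu:22.04'          // default image for every process

process {
    withName: 'python_task' {
        container = 'python:3.12-slim'      // per-process override
    }
}

profiles {
    docker {
        docker.enabled = true
    }
    singularity {
        singularity.enabled    = true
        singularity.autoMounts = true       // bind host paths into the container automatically
    }
}
```

Running nextflow run main.nf -profile docker would then pull the images from their registry on first use. An image can alternatively be declared inside a process definition with the container directive.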

Developmental history

Nextflow was originally developed at the Centre for Genomic Regulation in Spain and released as an open-source project on GitHub in July 2013. [24] In October 2018, the project license for Nextflow was changed from GPLv3 to Apache 2.0. [25]

In July 2018, Seqera Labs was launched as a spin-off from the Centre for Genomic Regulation. [21] The company employs many of Nextflow's core developers and maintainers and provides commercial services [26] and consulting with a focus on Nextflow.

In July 2020, a major extension and revision of Nextflow's domain-specific language, DSL2, was introduced, adding support for sub-workflows among other improvements. [27] In the same year, Nextflow was downloaded approximately 55,000 times per month. [21]

Adoption and reception

The nf-core community

In addition to the Centre for Genomic Regulation, [28] other sequencing facilities have adopted Nextflow as their preferred scientific workflow system, among them the Quantitative Biology Center in Tübingen, the Francis Crick Institute, the A*STAR Genome Institute of Singapore, and the Swedish National Genomics Infrastructure. [21]

Efforts to share, harmonize, and curate the bioinformatic pipelines used by these facilities [29] [30] [31] [32] eventually grew into the nf-core project. [33] Spearheaded by Phil Ewels of the Swedish National Genomics Infrastructure, [34] [35] the nf-core project aims to ensure that pipelines are reproducible and portable across different hardware, operating systems, and software versions. In July 2020, Nextflow and nf-core were awarded a grant from the Chan Zuckerberg Initiative, recognizing their role as vital open-source software. [36]

As of 2022, the nf-core organization hosted 73 Nextflow pipelines for the biosciences and more than 700 process modules. Uniting more than 500 developers and scientists, it was the largest collaborative effort and community developing bioinformatic data analysis pipelines. [37]

By domain and research subject

Nextflow is used predominantly for sequencing data processing and genomic data analysis. Over the last five years, numerous Nextflow pipelines have been published for a wide range of applications and analyses in genomics.

A notable use case was pathogen surveillance during the COVID-19 pandemic. [38] Monitoring the emergence of new virus variants and tracing their global spread required swift, highly automated, yet accurate processing of raw data, variant analysis, and lineage designation, which was enabled by pipelines written in Nextflow. [39] [40] [41] [42] [43] [44] [45]

Nextflow also plays an important role at the non-profit plasmid repository Addgene, which uses it to verify the integrity of all deposited plasmids. [46]

Apart from genomics, Nextflow is gaining popularity in other domains of biomedical data processing that also require complex workflows to be applied to large amounts of primary data: drug screening, [47] diffusion magnetic resonance imaging (dMRI) in radiology, [48] and mass spectrometry data processing, [49] [50] [51] the latter with a particular focus on proteomics. [52] [53] [54] [55] [56] [57] [58] [59]

References

  1. ^ Strozzi, Francesco; Janssen, Roel; Wurmus, Ricardo; Crusoe, Michael R.; Githinji, George; Di Tommaso, Paolo; Belhachemi, Dominique; Möller, Steffen; Smant, Geert; De Ligt, Joep; Prins, Pjotr (2019). "Scalable Workflows and Reproducible Data Analysis for Genomics". Evolutionary Genomics. Methods in Molecular Biology. Vol. 1910. pp. 723–745. doi: 10.1007/978-1-4939-9074-0_24. ISBN  978-1-4939-9073-3. PMC  7613310. PMID  31278683.
  2. ^ Gao, Mingxuan; Ling, Mingyi; Tang, Xinwei; Wang, Shun; Xiao, Xu; Qiao, Ying; Yang, Wenxian; Yu, Rongshan (2021). "Comparison of high-throughput single-cell RNA sequencing data processing pipelines". Briefings in Bioinformatics. 22 (3). doi: 10.1093/bib/bbaa116. PMID  34020539.
  3. ^ a b c Wratten, Laura; Wilm, Andreas; Göke, Jonathan (October 2021). "Reproducible, scalable, and shareable analysis pipelines with bioinformatics workflow managers". Nature Methods. 18 (10): 1161–1168. doi: 10.1038/s41592-021-01254-9. PMID  34556866. S2CID  237616424.
  4. ^ Terrón-Camero, Laura C.; Gordillo-González, Fernando; Salas-Espejo, Eduardo; Andrés-León, Eduardo (2022). "Comparison of Metagenomics and Metatranscriptomics Tools: A Guide to Making the Right Choice". Genes. 13 (12): 2280. doi: 10.3390/genes13122280. PMC  9777648. PMID  36553546.
  5. ^ Federico, Anthony; Karagiannis, Tanya; Karri, Kritika; Kishore, Dileep; Koga, Yusuke; Campbell, Joshua D.; Monti, Stefano (2019). "Pipeliner: A Nextflow-Based Framework for the Definition of Sequencing Data Processing Pipelines". Frontiers in Genetics. 10: 614. doi: 10.3389/fgene.2019.00614. PMC  6609566. PMID  31316552.
  6. ^ Kolpakov, Fedor; Akberdin, Ilya; Kiselev, Ilya; Kolmykov, Semyon; Kondrakhin, Yury; Kulyashov, Mikhail; Kutumova, Elena; Pintus, Sergey; Ryabova, Anna; Sharipov, Ruslan; Yevshin, Ivan; Zhatchenko, Sergey; Kel, Alexander (2022). "BioUML—towards a universal research platform". Nucleic Acids Research. 50 (W1): W124–W131. doi: 10.1093/nar/gkac286. PMC  9252820. PMID  35536253.
  7. ^ Yukselen, Onur; Turkyilmaz, Osman; Ozturk, Ahmet Rasit; Garber, Manuel; Kucukural, Alper (2020). "DolphinNext: A distributed data processing platform for high throughput genomics". BMC Genomics. 21 (1): 310. doi: 10.1186/s12864-020-6714-x. PMC  7168977. PMID  32306927.
  8. ^ Yuen, Denis; Cabansay, Louise; Duncan, Andrew; Luu, Gary; Hogue, Gregory; Overbeck, Charles; Perez, Natalie; Shands, Walt; Steinberg, David; Reid, Chaz; Olunwa, Nneka; Hansen, Richard; Sheets, Elizabeth; O'Farrell, Ash; Cullion, Kim; O'Connor, Brian D; Paten, Benedict; Stein, Lincoln (2021). "The Dockstore: Enhancing a community platform for sharing reproducible and accessible computational protocols". Nucleic Acids Research. 49 (W1): W624–W632. doi: 10.1093/nar/gkab346. PMC  8218198. PMID  33978761.
  9. ^ Ahmed, Azza E.; Allen, Joshua M.; Bhat, Tajesvi; Burra, Prakruthi; Fliege, Christina E.; Hart, Steven N.; Heldenbrand, Jacob R.; Hudson, Matthew E.; Istanto, Dave Deandre; Kalmbach, Michael T.; Kapraun, Gregory D.; Kendig, Katherine I.; Kendzior, Matthew Charles; Klee, Eric W.; Mattson, Nate; Ross, Christian A.; Sharif, Sami M.; Venkatakrishnan, Ramshankar; Fadlelmola, Faisal M.; Mainzer, Liudmila S. (2021). "Design considerations for workflow management systems use in production genomics research and the clinic". Scientific Reports. 11 (1): 21680. Bibcode: 2021NatSR..1121680A. doi: 10.1038/s41598-021-99288-8. PMC  8569008. PMID  34737383.
  10. ^ Baichoo, Shakuntala; Souilmi, Yassine; Panji, Sumir; Botha, Gerrit; Meintjes, Ayton; Hazelhurst, Scott; Bendou, Hocine; de Beste, Eugene; Mpangase, Phelelani T.; Souiai, Oussema; Alghali, Mustafa; Yi, Long; O'Connor, Brian D.; Crusoe, Michael; Armstrong, Don; Aron, Shaun; Joubert, Fourie; Ahmed, Azza E.; Mbiyavanga, Mamana; van Heusden, Peter; Magosi, Lerato E.; Zermeno, Jennie; Mainzer, Liudmila Sergeevna; Fadlelmola, Faisal M.; Jongeneel, C. Victor; Mulder, Nicola (2018). "Developing reproducible bioinformatics analysis workflows for heterogeneous computing environments to support African genomics". BMC Bioinformatics. 19 (1): 457. doi: 10.1186/s12859-018-2446-1. PMC  6264621. PMID  30486782.
  11. ^ Jackson, Michael; Kavoussanakis, Kostas; Wallace, Edward W. J. (2021). "Using prototyping to choose a bioinformatics workflow management system". PLOS Computational Biology. 17 (2): e1008622. Bibcode: 2021PLSCB..17E8622J. doi: 10.1371/journal.pcbi.1008622. PMC  7906312. PMID  33630841.
  12. ^ Di Tommaso, Paolo; Floden, Evan W.; Magis, Cedrik; Palumbo, Emilio; Notredame, Cedric (2017). "Nextflow : Un outil efficace pour l'amélioration de la stabilité numérique des calculs en analyse génomique" [Nextflow: an efficient tool for improving the numerical stability of computations in genomic analysis]. Biologie Aujourd'hui. 211 (3): 233–237. doi: 10.1051/jbio/2017029. PMID  29412134.
  13. ^ "Nextflow Documentation - Channels". docs.nextflow.io. Retrieved 6 June 2022.
  14. ^ "Nextflow Documentation - Domain Specific Language (DSL) 2". docs.nextflow.io. Retrieved 6 June 2022.
  15. ^ "Nextflow Documentation - Pipeline Sharing". docs.nextflow.io. Retrieved 6 June 2022.
  16. ^ "Nextflow Documentation - Executors". docs.nextflow.io. Retrieved 6 June 2022.
  17. ^ "Nextflow Documentation - Amazon Cloud". docs.nextflow.io. Retrieved 6 June 2022.
  18. ^ "Nextflow Documentation - Azure Cloud". docs.nextflow.io. Retrieved 6 June 2022.
  19. ^ "Nextflow Documentation - Google Cloud". docs.nextflow.io. Retrieved 6 June 2022.
  20. ^ Hykes, Solomon (7 June 2013). "Dot Scale 2013 - Why we built Docker". YouTube. Retrieved 6 June 2022.
  21. ^ a b c d Di Tommaso, Paolo (14 October 2021). "The story of Nextflow: Building a modern pipeline orchestrator". eLifeSciences.org. Retrieved 6 June 2022.
  22. ^ "Nextflow Documentation - Containers". docs.nextflow.io. Retrieved 7 June 2022.
  23. ^ Di Tommaso, Paolo (13 October 2022). "Nextflow and the future of containers". YouTube. Retrieved 17 November 2022.
  24. ^ "Release Version 0.3.0 · nextflow-io/nextflow". GitHub. Retrieved 31 May 2022.
  25. ^ Di Tommaso, Paolo (24 October 2018). "Goodbye zero, Hello Apache!". Nextflow.io/blog. Retrieved 7 June 2022.
  26. ^ Di Tommaso, Paolo (8 October 2019). "Introducing Nextflow Tower - Seamless monitoring of data analysis workflows from anywhere". Seqera.IO. Retrieved 7 June 2022.
  27. ^ Di Tommaso, Paolo (24 July 2020). "Nextflow DSL 2 is here!". Nextflow.IO/blog. Retrieved 7 June 2022.
  28. ^ Di Tommaso, Paolo; Chatzou, Maria; Floden, Evan; Prieto Barja, Pablo; Palumbo, Emilio; Notredame, Cedric (11 April 2017). "Nextflow enables reproducible computational workflows". Nature Biotechnology. 35 (4): 316–319. doi: 10.1038/nbt.3820. PMID  28398311. S2CID  9690740. Retrieved 7 June 2022.
  29. ^ Fellows Yates, James A.; Lamnidis, Thiseas C.; Borry, Maxime; Andrades Valtueña, Aida; Fagernäs, Zandra; Clayton, Stephen; Garcia, Maxime U.; Neukamm, Judith; Peltzer, Alexander (2021). "Reproducible, portable, and efficient ancient genome reconstruction with nf-core/eager". PeerJ. 9: e10947. doi: 10.7717/peerj.10947. PMC  7977378. PMID  33777521.
  30. ^ Krakau, Sabrina; Straub, Daniel; Gourlé, Hadrien; Gabernet, Gisela; Nahnsen, Sven (2022). "nf-core/mag: A best-practice pipeline for metagenome hybrid assembly and binning". NAR Genomics and Bioinformatics. 4: lqac007. doi: 10.1093/nargab/lqac007. PMC  8808542. PMID  35118380.
  31. ^ Garcia, Maxime; Juhos, Szilveszter; Larsson, Malin; Olason, Pall I.; Martin, Marcel; Eisfeldt, Jesper; Dilorenzo, Sebastian; Sandgren, Johanna; Díaz De Ståhl, Teresita; Ewels, Philip; Wirta, Valtteri; Nistér, Monica; Käller, Max; Nystedt, Björn (2020). "Sarek: A portable workflow for whole-genome sequencing analysis of germline and somatic variants". F1000Research. 9: 63. doi: 10.12688/f1000research.16665.2. PMC  7111497. PMID  32269765.
  32. ^ Digby, Barry; Finn, Stephen P.; Ó Broin, Pilib (2023). "nf-core/circrna: A portable workflow for the quantification, miRNA target prediction and differential expression analysis of circular RNAs". BMC Bioinformatics. 24 (1): 27. doi: 10.1186/s12859-022-05125-8. PMC  9875403. PMID  36694127.
  33. ^ Ewels, Philip; Peltzer, Alexander; Fillinger, Sven; Alneberg, Johannes; Patel, Harshil; Wilm, Andreas; Garcia, Maxime Ulysse; Di Tommaso, Paolo; Nahnsen, Sven (April 1, 2019). "nf-core: Community curated bioinformatics pipelines". ResearchGate. Retrieved June 30, 2022.
  34. ^ Zapata Garin, Claire-Alix. "nf-core: a community-driven initiative to standardise Nextflow-based pipelines". Lifebit.ai. Retrieved June 30, 2022.
  35. ^ "The nf-core community provides computational pipelines". SciLifeLab. February 14, 2020. Retrieved June 30, 2022.
  36. ^ "Nextflow and nf-core: Reproducible Workflows for the Scientific Community". Chan Zuckerberg Initiative. 27 July 2020. Retrieved 15 June 2022.
  37. ^ "nf-core Github organization". GitHub. Retrieved 18 November 2022.
  38. ^ Floden, Evan (5 November 2021). "Genetic Sequencing Will Enable Us To Win The Global Battle Against COVID-19".
  39. ^ Afolayan, Ayorinde O.; et al. (2021). "Overcoming Data Bottlenecks in Genomic Pathogen Surveillance". Clinical Infectious Diseases. 73 (Suppl_4): S267–S274. doi: 10.1093/cid/ciab785. PMC  8634317. PMID  34850839.
  40. ^ Tilloy, Valentin; Cuzin, Pierre; Leroi, Laura; Guérin, Emilie; Durand, Patrick; Alain, Sophie (2022). "ASPICov: An automated pipeline for identification of SARS-Cov2 nucleotidic variants". PLOS ONE. 17 (1): e0262953. Bibcode: 2022PLoSO..1762953T. doi: 10.1371/journal.pone.0262953. PMC  8791494. PMID  35081137.
  41. ^ Petit, Robert A.; Read, Timothy D. (2020). "Bactopia: A Flexible Pipeline for Complete Analysis of Bacterial Genomes". mSystems. 5 (4). doi: 10.1128/mSystems.00190-20. PMC  7406220. PMID  32753501.
  42. ^ Pandolfo, Mattia; Telatin, Andrea; Lazzari, Gioele; Adriaenssens, Evelien M.; Vitulo, Nicola (2022). "MetaPhage: An Automated Pipeline for Analyzing, Annotating, and Classifying Bacteriophages in Metagenomics Sequencing Data". mSystems. 7 (5): e0074122. doi: 10.1128/msystems.00741-22. PMC  9599279. PMID  36069454.
  43. ^ Gauthier, Marie-Emilie A.; Lelwala, Ruvini V.; Elliott, Candace E.; Windell, Craig; Fiorito, Sonia; Dinsdale, Adrian; Whattam, Mark; Pattemore, Julie; Barrero, Roberto A. (2022). "Side-by-Side Comparison of Post-Entry Quarantine and High Throughput Sequencing Methods for Virus and Viroid Diagnosis". Biology. 11 (2): 263. doi: 10.3390/biology11020263. PMC  8868628. PMID  35205129.
  44. ^ Brandt, Christian; Krautwurst, Sebastian; Spott, Riccardo; Lohde, Mara; Jundzill, Mateusz; Marquet, Mike; Hölzer, Martin (2021). "poreCov-An Easy to Use, Fast, and Robust Workflow for SARS-CoV-2 Genome Reconstruction via Nanopore Sequencing". Frontiers in Genetics. 12: 711437. doi: 10.3389/fgene.2021.711437. PMC  8355734. PMID  34394197.
  45. ^ Afiahayati; Bernard, Stefanus; Gunadi; Wibawa, Hendra; Hakim, Mohamad Saifudin; Marcellus; Parikesit, Arli Aditya; Dewa, Chandra Kusuma; Sakakibara, Yasubumi (2022). "A Comparison of Bioinformatics Pipelines for Enrichment Illumina Next Generation Sequencing Systems in Detecting SARS-CoV-2 Virus Strains". Genes. 13 (8): 1330. doi: 10.3390/genes13081330. PMC  9394340. PMID  35893066.
  46. ^ Niehaus, Jason (14 July 2022). "Bioinformatics at Addgene". Addgene corporate blog. Retrieved 25 February 2023.
  47. ^ Ssekagiri, Alfred; Jjingo, Daudi; Lujumba, Ibra; Bbosa, Nicholas; Bugembe, Daniel L.; Kateete, David P.; Jordan, I King; Kaleebu, Pontiano; Ssemwanga, Deogratius (2022). "QuasiFlow: A Nextflow pipeline for analysis of NGS-based HIV-1 drug resistance data". Bioinformatics Advances. 2: vbac089. doi: 10.1093/bioadv/vbac089. PMC  9722223. PMID  36699347.
  48. ^ Theaud, Guillaume; Houde, Jean-Christophe; Boré, Arnaud; Rheault, François; Morency, Felix; Descoteaux, Maxime (2020). "TractoFlow: A robust, efficient and reproducible diffusion MRI pipeline leveraging Nextflow & Singularity". NeuroImage. 218: 116889. doi: 10.1016/j.neuroimage.2020.116889. PMID  32447016. S2CID  164318811.
  49. ^ Van Maldegem, Febe; Valand, Karishma; Cole, Megan; Patel, Harshil; Angelova, Mihaela; Rana, Sareena; Colliver, Emma; Enfield, Katey; Bah, Nourdine; Kelly, Gavin; Tsang, Victoria Siu Kwan; Mugarza, Edurne; Moore, Christopher; Hobson, Philip; Levi, Dina; Molina-Arcas, Miriam; Swanton, Charles; Downward, Julian (2021). "Characterisation of tumour microenvironment remodelling following oncogene inhibition in preclinical studies with imaging mass cytometry". Nature Communications. 12 (1): 5906. Bibcode: 2021NatCo..12.5906V. doi: 10.1038/s41467-021-26214-x. PMC  8501076. PMID  34625563.
  50. ^ Li, Chenxin; Gao, Mingxuan; Yang, Wenxian; Zhong, Chuanqi; Yu, Rongshan (2021). "DIAmond: A multi-modal DIA mass spectrometry data processing pipeline". Bioinformatics. 37 (2): 265–267. doi: 10.1093/bioinformatics/btaa1093. PMID  33416868.
  51. ^ Luu, Gordon T.; Freitas, Michael A.; Lizama-Chamu, Itzel; McCaughey, Catherine S.; Sanchez, Laura M.; Wang, Mingxun (2022). "TIMSCONVERT: A workflow to convert trapped ion mobility data to open data formats". Bioinformatics. 38 (16): 4046–4047. doi: 10.1093/bioinformatics/btac419. PMC  9991885. PMID  35758608.
  52. ^ Perez-Riverol, Yasset; Moreno, Pablo (2020). "Scalable Data Analysis in Proteomics and Metabolomics Using BioContainers and Workflows Engines". Proteomics. 20 (9): e1900147. doi: 10.1002/pmic.201900147. PMC  7613303. PMID  31657527.
  53. ^ Vlasova, Anna; Hermoso Pulido, Toni; Camara, Francisco; Ponomarenko, Julia; Guigó, Roderic (2021). "FA-nf: A Functional Annotation Pipeline for Proteins from Non-Model Organisms Implemented in Nextflow". Genes. 12 (10): 1645. doi: 10.3390/genes12101645. PMC  8535801. PMID  34681040.
  54. ^ Miller, Rachel M.; Jordan, Ben T.; Mehlferber, Madison M.; Jeffery, Erin D.; Chatzipantsiou, Christina; Kaur, Simi; Millikin, Robert J.; Dai, Yunxiang; Tiberi, Simone; Castaldi, Peter J.; Shortreed, Michael R.; Luckey, Chance John; Conesa, Ana; Smith, Lloyd M.; Deslattes Mays, Anne; Sheynkman, Gloria M. (2022). "Enhanced protein isoform characterization through long-read proteogenomics". Genome Biology. 23 (1): 69. doi: 10.1186/s13059-022-02624-y. PMC  8892804. PMID  35241129.
  55. ^ Othman, Houcemeddine; Jemimah, Sherlyn; Da Rocha, Jorge Emanuel Batista (2022). "SWAAT Bioinformatics Workflow for Protein Structure-Based Annotation of ADME Gene Variants". Journal of Personalized Medicine. 12 (2): 263. doi: 10.3390/jpm12020263. PMC  8875676. PMID  35207751.
  56. ^ Bichmann, Leon; Gupta, Shubham; Rosenberger, George; Kuchenbecker, Leon; Sachsenberg, Timo; Ewels, Phil; Alka, Oliver; Pfeuffer, Julianus; Kohlbacher, Oliver; Röst, Hannes (2021). "DIAproteomics: A Multifunctional Data Analysis Pipeline for Data-Independent Acquisition Proteomics and Peptidomics". Journal of Proteome Research. 20 (7): 3758–3766. doi: 10.1021/acs.jproteome.1c00123. PMID  34153189. S2CID  235597603.
  57. ^ Walzer, Mathias; García-Seisdedos, David; Prakash, Ananth; Brack, Paul; Crowther, Peter; Graham, Robert L.; George, Nancy; Mohammed, Suhaib; Moreno, Pablo; Papatheodorou, Irene; Hubbard, Simon J.; Vizcaíno, Juan Antonio (2022). "Implementing the reuse of public DIA proteomics datasets: From the PRIDE database to Expression Atlas". Scientific Data. 9 (1): 335. Bibcode: 2022NatSD...9..335W. doi: 10.1038/s41597-022-01380-9. PMC  9197839. PMID  35701420.
  58. ^ Hulstaert, Niels; Shofstahl, Jim; Sachsenberg, Timo; Walzer, Mathias; Barsnes, Harald; Martens, Lennart; Perez-Riverol, Yasset (2020). "ThermoRawFileParser: Modular, Scalable, and Cross-Platform RAW File Conversion". Journal of Proteome Research. 19 (1): 537–542. doi: 10.1021/acs.jproteome.9b00328. PMC  7116465. PMID  31755270.
  59. ^ Li, Kai; Jain, Antrix; Malovannaya, Anna; Wen, Bo; Zhang, Bing (2020). "DeepRescore: Leveraging Deep Learning to Improve Peptide Identification in Immunopeptidomics". Proteomics. 20 (21–22): e1900334. doi: 10.1002/pmic.201900334. PMC  7718998. PMID  32864883.

See also

  • Galaxy
  • Snakemake

From Wikipedia, the free encyclopedia
(Redirected from Draft:Nextflow)
Nextflow
Original author(s)Paolo Di Tommaso
Developer(s)Seqera Labs, Centre for Genomic Regulation
Initial release April 9, 2013; 11 years ago (2013-04-09)
Stable release
v23.10.1 / January 12, 2024; 6 months ago (2024-01-12)
Preview release
v24.02.0-edge / March 9, 2024; 4 months ago (2024-03-09)
Repository https://github.com/nextflow-io/nextflow
Written in Groovy, Java
Operating system Linux, macOS, WSL
Type Scientific workflow system, Dataflow programming, Big data
License Apache License 2.0
Website nextflow.io

Nextflow is a scientific workflow system predominantly used for bioinformatic data analyses. It imposes standards on how to programmatically author a sequence of dependent compute steps and enables their execution on various local and cloud resources. [1] [2]

Purpose

Many scientific data analyses require a significant amount of sequential processing steps. Custom scripts may suffice when developing new methods or infrequently running a particular analysis, but scale poorly to complex task successions or many samples. [3] [4] [5]

Scientific workflow systems like Nextflow allow formalizing an analysis as a data analysis pipeline. Pipelines, also known as workflows, are instructions that specify order and conditions of computing steps to be performed. They are carried out by special purpose programs, so-called workflow executors, which ensure predictable and reproducible behavior in various compute environments. [3] [6] [7] [8]

Workflow systems also provide built-in solutions to common challenges of workflow development, such as the application to multiple samples, the validation of input and intermediate results, conditional execution of steps, error handling, and report generation. Advanced features of workflow systems may also include scheduling capabilities, graphical user interfaces for monitoring workflow executions, and the management of dependencies by containerizing the whole workflow or its components. [9] [10]

Typically, scientific workflow systems initially present a steep learning challenge as all their features and complexities are added on top of and in addition to the actual analysis. However, the standards and abstraction imposed by workflow systems ultimately improve the traceability of analysis steps, which is particularly relevant when collaborating on pipeline development, as is customary in scientific settings. [11]

Characteristics

Specification of workflows

In Nextflow, pipelines are constructed from individual processes that correspond to computational tasks. Each process is set up with input requirements and output declarations. Rather than running in a fixed succession, the execution of a process commences when all its input requirements are met. By specifying the output of one process as the input of another step, a logical and sequential connection between processes is created. [12]

This reactive implementation of processes is a characteristic design pattern of Nextflow and also known as functional dataflow model. [13]

Processes and whole workflows are programmed in a domain-specific language (DSL) that is provided by Nextflow and based on Apache Groovy. [14] While Nextflow's DSL is used to declare the workflow logic, developers can use their scripting language of choice within a process and mix multiple languages in a workflow. Porting existing scripts and workflows to Nextflow is therefore possible. Supported scripting languages include bash, csh, ksh, Python, Ruby, and R. Any scripting language that uses the standard Unix shebang declaration (#!/bin/bash) is supported in Nextflow.

An exemplary workflow consisting of only one process is shown below:

process hello_world {
    input:
    val greeting

    output:
    path "${greeting}.txt"

    script:
    """
    echo "${greeting} World!" > ${greeting}.txt
    """
}

workflow {
    Channel.of("Hello", "Ciao", "Hola", "Bonjour") | hello_world
}

To facilitate straightforward collaboration on workflows, Nextflow has native support for source-code management systems and DevOps-platforms including GitHub, GitLab, and others. [15]

Execution of workflows

Workflows written in Nextflow's DSL can be deployed and run across diverse computing environments without modifications to the pipeline code.

To enable portability, Nextflow ships with dedicated executors for a variety of platforms [16] including those of major cloud providers. Because Nextflow decouples individual process steps, it can optionally be configured to spread execution across multiple computing platforms. It supports the following environments for pipeline execution:

  • Local – the default executor. Nextflow pipelines run on Linux or Mac OS and execution occurs on the computer where the pipeline is launched.
  • HPC workload managers – Slurm, SGE, LSF, Moab, PBS Pro, PBS/Torque, HTCondor, NQSII, OAR
  • Kubernetes – local or cloud-based Kubernetes implementations (GKE, EKS, or AKS)
  • Cloud batch services – AWS Batch, [17] Azure Batch [18]
  • Other environments – Apache Ignite, Google Life Sciences [19]

Containers for portability across computing environments

A fundamental concept of Nextflow is its tight integration with software containers. Whole workflows and, in later versions, also single processes can harness containers to allow their execution across various compute environments without tedious installation and configuration routines. [3]

This design choice was strongly influenced by Solomon Hyke's talk at dotScale in 2013, [20] which had a significant impact on Nextflow's principal developer, Paolo Di Tommaso. [21]

Container frameworks supported by Nextflow include Docker, Singularity, Charliecloud, Podman, and Shifter. [22] Those type of containers can be utilized in a workflow and are automatically retrieved from external repositories when the pipeline is executed. At Nextflow Summit 2022, it was unveiled that future versions of Nextflow will support a dedicated container provisioning service for an improved integration of customized containers into workflows. [23]

Developmental history

Nextflow was originally developed at the Centre for Genomic Regulation in Spain and released as an open-source project on GitHub in July 2013. [24] In October 2018, the project license for Nextflow was changed from GPLv3 to Apache 2.0. [25]

In July 2018, Seqera Labs was launched as a spin-off from the Centre for Genomic Regulation. [21] The company employs many of Nextflow's core developers and maintainers and provides commercial services [26] and consulting with a focus on Nextflow.

In July 2020, a major extension and revision of Nextflow's domain-specific language was introduced to allow for sub-workflows and additional improvements. [27] In the same year, monthly downloads of Nextflow sat at approximately 55,000 per month. [21]

Adoption and reception

The nf-core community

In addition to the Centre for Genomic Regulation, [28] other sequencing facilities have adopted Nextflow as their preferred scientific workflow system, among them the Quantitative Biology Center in Tübingen, the Francis Crick Institute, A*STAR Genome Institute of Singapore, and the Swedish National Genomics Infrastructure. [21]

Efforts to share, harmonize, and curate the bioinformatic pipelines used by those facilities [29] [30] [31] [32] eventually turned into the nf-core project. [33] Spearheaded by Phil Ewels from the Swedish National Genomics Infrastructure, [34] [35] the nf-core project focuses on ensuring that pipelines are reproducible and portable across different hardware, operating systems, and software versions. In July 2020, Nextflow and nf-core were awarded a grant from the Chan Zuckerberg Initiative, recognizing their role as vital open-source software. [36]

As of 2022, the nf-core organization hosts 73 Nextflow pipelines for the biosciences and more than 700 process modules. Uniting more than 500 developers and scientists, it is the largest collaborative effort and community to develop bioinformatic data analysis pipelines. [37]

By domain and research subject

Nextflow is widely used for sequencing data processing and genomic data analysis. Over the last five years, numerous pipelines for a broad range of applications and analyses in genomics have been published.

A notable use case in this regard was pathogen surveillance during the COVID-19 pandemic. [38] Monitoring the emergence of new virus variants and tracing their global spread required swift and highly automated, yet accurate, processing of raw data, variant analysis, and lineage designation, which was enabled by pipelines written in Nextflow. [39] [40] [41] [42] [43] [44] [45]

Nextflow also plays an important role at the non-profit plasmid repository Addgene, which uses it to verify the integrity of all deposited plasmids. [46]

Apart from genomics, Nextflow is gaining popularity in other domains of biomedical data processing that also require the application of complex workflows to large amounts of primary data: drug screening, [47] diffusion magnetic resonance imaging (dMRI) in radiology, [48] and mass spectrometry data processing, [49] [50] [51] the latter with a particular focus on proteomics. [52] [53] [54] [55] [56] [57] [58] [59]

References

  1. ^ Strozzi, Francesco; Janssen, Roel; Wurmus, Ricardo; Crusoe, Michael R.; Githinji, George; Di Tommaso, Paolo; Belhachemi, Dominique; Möller, Steffen; Smant, Geert; De Ligt, Joep; Prins, Pjotr (2019). "Scalable Workflows and Reproducible Data Analysis for Genomics". Evolutionary Genomics. Methods in Molecular Biology. Vol. 1910. pp. 723–745. doi: 10.1007/978-1-4939-9074-0_24. ISBN  978-1-4939-9073-3. PMC  7613310. PMID  31278683.
  2. ^ Gao, Mingxuan; Ling, Mingyi; Tang, Xinwei; Wang, Shun; Xiao, Xu; Qiao, Ying; Yang, Wenxian; Yu, Rongshan (2021). "Comparison of high-throughput single-cell RNA sequencing data processing pipelines". Briefings in Bioinformatics. 22 (3). doi: 10.1093/bib/bbaa116. PMID  34020539.
  3. ^ a b c Wratten, Laura; Wilm, Andreas; Göke, Jonathan (October 2021). "Reproducible, scalable, and shareable analysis pipelines with bioinformatics workflow managers". Nature Methods. 18 (10): 1161–1168. doi: 10.1038/s41592-021-01254-9. PMID  34556866. S2CID  237616424.
  4. ^ Terrón-Camero, Laura C.; Gordillo-González, Fernando; Salas-Espejo, Eduardo; Andrés-León, Eduardo (2022). "Comparison of Metagenomics and Metatranscriptomics Tools: A Guide to Making the Right Choice". Genes. 13 (12): 2280. doi: 10.3390/genes13122280. PMC  9777648. PMID  36553546.
  5. ^ Federico, Anthony; Karagiannis, Tanya; Karri, Kritika; Kishore, Dileep; Koga, Yusuke; Campbell, Joshua D.; Monti, Stefano (2019). "Pipeliner: A Nextflow-Based Framework for the Definition of Sequencing Data Processing Pipelines". Frontiers in Genetics. 10: 614. doi: 10.3389/fgene.2019.00614. PMC  6609566. PMID  31316552.
  6. ^ Kolpakov, Fedor; Akberdin, Ilya; Kiselev, Ilya; Kolmykov, Semyon; Kondrakhin, Yury; Kulyashov, Mikhail; Kutumova, Elena; Pintus, Sergey; Ryabova, Anna; Sharipov, Ruslan; Yevshin, Ivan; Zhatchenko, Sergey; Kel, Alexander (2022). "BioUML—towards a universal research platform". Nucleic Acids Research. 50 (W1): W124–W131. doi: 10.1093/nar/gkac286. PMC  9252820. PMID  35536253.
  7. ^ Yukselen, Onur; Turkyilmaz, Osman; Ozturk, Ahmet Rasit; Garber, Manuel; Kucukural, Alper (2020). "DolphinNext: A distributed data processing platform for high throughput genomics". BMC Genomics. 21 (1): 310. doi: 10.1186/s12864-020-6714-x. PMC  7168977. PMID  32306927.
  8. ^ Yuen, Denis; Cabansay, Louise; Duncan, Andrew; Luu, Gary; Hogue, Gregory; Overbeck, Charles; Perez, Natalie; Shands, Walt; Steinberg, David; Reid, Chaz; Olunwa, Nneka; Hansen, Richard; Sheets, Elizabeth; O'Farrell, Ash; Cullion, Kim; O'Connor, Brian D; Paten, Benedict; Stein, Lincoln (2021). "The Dockstore: Enhancing a community platform for sharing reproducible and accessible computational protocols". Nucleic Acids Research. 49 (W1): W624–W632. doi: 10.1093/nar/gkab346. PMC  8218198. PMID  33978761.
  9. ^ Ahmed, Azza E.; Allen, Joshua M.; Bhat, Tajesvi; Burra, Prakruthi; Fliege, Christina E.; Hart, Steven N.; Heldenbrand, Jacob R.; Hudson, Matthew E.; Istanto, Dave Deandre; Kalmbach, Michael T.; Kapraun, Gregory D.; Kendig, Katherine I.; Kendzior, Matthew Charles; Klee, Eric W.; Mattson, Nate; Ross, Christian A.; Sharif, Sami M.; Venkatakrishnan, Ramshankar; Fadlelmola, Faisal M.; Mainzer, Liudmila S. (2021). "Design considerations for workflow management systems use in production genomics research and the clinic". Scientific Reports. 11 (1): 21680. Bibcode: 2021NatSR..1121680A. doi: 10.1038/s41598-021-99288-8. PMC  8569008. PMID  34737383.
  10. ^ Baichoo, Shakuntala; Souilmi, Yassine; Panji, Sumir; Botha, Gerrit; Meintjes, Ayton; Hazelhurst, Scott; Bendou, Hocine; Beste, Eugene de; Mpangase, Phelelani T.; Souiai, Oussema; Alghali, Mustafa; Yi, Long; O'Connor, Brian D.; Crusoe, Michael; Armstrong, Don; Aron, Shaun; Joubert, Fourie; Ahmed, Azza E.; Mbiyavanga, Mamana; Heusden, Peter van; Magosi, Lerato E.; Zermeno, Jennie; Mainzer, Liudmila Sergeevna; Fadlelmola, Faisal M.; Jongeneel, C. Victor; Mulder, Nicola (2018). "Developing reproducible bioinformatics analysis workflows for heterogeneous computing environments to support African genomics". BMC Bioinformatics. 19 (1): 457. doi: 10.1186/s12859-018-2446-1. PMC  6264621. PMID  30486782.
  11. ^ Jackson, Michael; Kavoussanakis, Kostas; Wallace, Edward W. J. (2021). "Using prototyping to choose a bioinformatics workflow management system". PLOS Computational Biology. 17 (2): e1008622. Bibcode: 2021PLSCB..17E8622J. doi: 10.1371/journal.pcbi.1008622. PMC  7906312. PMID  33630841.
  12. ^ Di Tommaso, Paolo; Floden, Evan W.; Magis, Cedrik; Palumbo, Emilio; Notredame, Cedric (2017). "Nextflow : Un outil efficace pour l'amélioration de la stabilité numérique des calculs en analyse génomique". Biologie Aujourd'hui. 211 (3): 233–237. doi: 10.1051/jbio/2017029. PMID  29412134.
  13. ^ "Nextflow Documentation - Channels". docs.nextflow.io. Retrieved 6 June 2022.
  14. ^ "Nextflow Documentation - Domain Specific Language (DSL) 2". docs.nextflow.io. Retrieved 6 June 2022.
  15. ^ "Nextflow Documentation - Pipeline Sharing". docs.nextflow.io. Retrieved 6 June 2022.
  16. ^ "Nextflow Documentation - Executors". docs.nextflow.io. Retrieved 6 June 2022.
  17. ^ "Nextflow Documentation - Amazon Cloud". docs.nextflow.io. Retrieved 6 June 2022.
  18. ^ "Nextflow Documentation - Azure Cloud". docs.nextflow.io. Retrieved 6 June 2022.
  19. ^ "Nextflow Documentation - Google Cloud". docs.nextflow.io. Retrieved 6 June 2022.
  20. ^ Hykes, Solomon (7 June 2013). "dotScale 2013 - Why we built Docker". YouTube. Retrieved 6 June 2022.
  21. ^ a b c d Di Tommaso, Paolo (14 October 2021). "The story of Nextflow: Building a modern pipeline orchestrator". eLifeSciences.org. Retrieved 6 June 2022.
  22. ^ "Nextflow Documentation - Containers". docs.nextflow.io. Retrieved 7 June 2022.
  23. ^ Di Tommaso, Paolo (13 October 2022). "Nextflow and the future of containers". YouTube. Retrieved 17 November 2022.
  24. ^ "Release Version 0.3.0 · nextflow-io/nextflow". GitHub. Retrieved 31 May 2022.
  25. ^ Di Tommaso, Paolo (24 October 2018). "Goodbye zero, Hello Apache!". Nextflow.io/blog. Retrieved 7 June 2022.
  26. ^ Di Tommaso, Paolo (8 October 2019). "Introducing Nextflow Tower - Seamless monitoring of data analysis workflows from anywhere". Seqera.IO. Retrieved 7 June 2022.
  27. ^ Di Tommaso, Paolo (24 July 2020). "Nextflow DSL 2 is here!". Nextflow.IO/blog. Retrieved 7 June 2022.
  28. ^ Di Tommaso, Paolo; Chatzou, Maria; Floden, Evan; Prieto Barja, Pablo; Palumbo, Emilio; Notredame, Cedric (11 April 2017). "Nextflow enables reproducible computational workflows". Nature Biotechnology. 35 (4): 316–319. doi: 10.1038/nbt.3820. PMID  28398311. S2CID  9690740. Retrieved 7 June 2022.
  29. ^ Fellows Yates, James A.; Lamnidis, Thiseas C.; Borry, Maxime; Andrades Valtueña, Aida; Fagernäs, Zandra; Clayton, Stephen; Garcia, Maxime U.; Neukamm, Judith; Peltzer, Alexander (2021). "Reproducible, portable, and efficient ancient genome reconstruction with nf-core/eager". PeerJ. 9: e10947. doi: 10.7717/peerj.10947. PMC  7977378. PMID  33777521.
  30. ^ Krakau, Sabrina; Straub, Daniel; Gourlé, Hadrien; Gabernet, Gisela; Nahnsen, Sven (2022). "nf-core/mag: A best-practice pipeline for metagenome hybrid assembly and binning". NAR Genomics and Bioinformatics. 4: lqac007. doi: 10.1093/nargab/lqac007. PMC  8808542. PMID  35118380.
  31. ^ Garcia, Maxime; Juhos, Szilveszter; Larsson, Malin; Olason, Pall I.; Martin, Marcel; Eisfeldt, Jesper; Dilorenzo, Sebastian; Sandgren, Johanna; Díaz De Ståhl, Teresita; Ewels, Philip; Wirta, Valtteri; Nistér, Monica; Käller, Max; Nystedt, Björn (2020). "Sarek: A portable workflow for whole-genome sequencing analysis of germline and somatic variants". F1000Research. 9: 63. doi: 10.12688/f1000research.16665.2. PMC  7111497. PMID  32269765.
  32. ^ Digby, Barry; Finn, Stephen P.; ó Broin, Pilib (2023). "nf-core/circrna: A portable workflow for the quantification, miRNA target prediction and differential expression analysis of circular RNAs". BMC Bioinformatics. 24 (1): 27. doi: 10.1186/s12859-022-05125-8. PMC  9875403. PMID  36694127.
  33. ^ Ewels, Philip; Peltzer, Alexander; Fillinger, Sven; Alneberg, Johannes; Patel, Harshil; Wilm, Andreas; Garcia, Maxime Ulysse; Di Tommaso, Paolo; Nahnsen, Sven (1 April 2019). "nf-core: Community curated bioinformatics pipelines". ResearchGate. Retrieved 30 June 2022.
  34. ^ Zapata Garin, Claire-Alix. "nf-core: a community-driven initiative to standardise Nextflow-based pipelines". Lifebit.ai. Retrieved 30 June 2022.
  35. ^ "The nf-core community provides computational pipelines". SciLifeLab. 14 February 2020. Retrieved 30 June 2022.
  36. ^ "Nextflow and nf-core: Reproducible Workflows for the Scientific Community". Chan Zuckerberg Initiative. 27 July 2020. Retrieved 15 June 2022.
  37. ^ "nf-core Github organization". GitHub. Retrieved 18 November 2022.
  38. ^ Floden, Evan (5 November 2021). "Genetic Sequencing Will Enable Us To Win The Global Battle Against COVID-19".
  39. ^ Afolayan, Ayorinde O.; et al. (2021). "Overcoming Data Bottlenecks in Genomic Pathogen Surveillance". Clinical Infectious Diseases. 73 (Suppl_4): S267–S274. doi: 10.1093/cid/ciab785. PMC  8634317. PMID  34850839.
  40. ^ Tilloy, Valentin; Cuzin, Pierre; Leroi, Laura; Guérin, Emilie; Durand, Patrick; Alain, Sophie (2022). "ASPICov: An automated pipeline for identification of SARS-Cov2 nucleotidic variants". PLOS ONE. 17 (1): e0262953. Bibcode: 2022PLoSO..1762953T. doi: 10.1371/journal.pone.0262953. PMC  8791494. PMID  35081137.
  41. ^ Petit, Robert A.; Read, Timothy D. (2020). "Bactopia: A Flexible Pipeline for Complete Analysis of Bacterial Genomes". mSystems. 5 (4). doi: 10.1128/mSystems.00190-20. PMC  7406220. PMID  32753501.
  42. ^ Pandolfo, Mattia; Telatin, Andrea; Lazzari, Gioele; Adriaenssens, Evelien M.; Vitulo, Nicola (2022). "MetaPhage: An Automated Pipeline for Analyzing, Annotating, and Classifying Bacteriophages in Metagenomics Sequencing Data". mSystems. 7 (5): e0074122. doi: 10.1128/msystems.00741-22. PMC  9599279. PMID  36069454.
  43. ^ Gauthier, Marie-Emilie A.; Lelwala, Ruvini V.; Elliott, Candace E.; Windell, Craig; Fiorito, Sonia; Dinsdale, Adrian; Whattam, Mark; Pattemore, Julie; Barrero, Roberto A. (2022). "Side-by-Side Comparison of Post-Entry Quarantine and High Throughput Sequencing Methods for Virus and Viroid Diagnosis". Biology. 11 (2): 263. doi: 10.3390/biology11020263. PMC  8868628. PMID  35205129.
  44. ^ Brandt, Christian; Krautwurst, Sebastian; Spott, Riccardo; Lohde, Mara; Jundzill, Mateusz; Marquet, Mike; Hölzer, Martin (2021). "PoreCov-An Easy to Use, Fast, and Robust Workflow for SARS-CoV-2 Genome Reconstruction via Nanopore Sequencing". Frontiers in Genetics. 12: 711437. doi: 10.3389/fgene.2021.711437. PMC  8355734. PMID  34394197.
  45. ^ Afiahayati; Bernard, Stefanus; Gunadi; Wibawa, Hendra; Hakim, Mohamad Saifudin; Marcellus; Parikesit, Arli Aditya; Dewa, Chandra Kusuma; Sakakibara, Yasubumi (2022). "A Comparison of Bioinformatics Pipelines for Enrichment Illumina Next Generation Sequencing Systems in Detecting SARS-CoV-2 Virus Strains". Genes. 13 (8): 1330. doi: 10.3390/genes13081330. PMC  9394340. PMID  35893066.
  46. ^ Niehaus, Jason (14 July 2022). "Bioinformatics at Addgene". Addgene corporate blog. Retrieved 25 February 2023.
  47. ^ Ssekagiri, Alfred; Jjingo, Daudi; Lujumba, Ibra; Bbosa, Nicholas; Bugembe, Daniel L.; Kateete, David P.; Jordan, I King; Kaleebu, Pontiano; Ssemwanga, Deogratius (2022). "QuasiFlow: A Nextflow pipeline for analysis of NGS-based HIV-1 drug resistance data". Bioinformatics Advances. 2: vbac089. doi: 10.1093/bioadv/vbac089. PMC  9722223. PMID  36699347.
  48. ^ Theaud, Guillaume; Houde, Jean-Christophe; Boré, Arnaud; Rheault, François; Morency, Felix; Descoteaux, Maxime (2020). "TractoFlow: A robust, efficient and reproducible diffusion MRI pipeline leveraging Nextflow & Singularity". NeuroImage. 218: 116889. doi: 10.1016/j.neuroimage.2020.116889. PMID  32447016. S2CID  164318811.
  49. ^ Van Maldegem, Febe; Valand, Karishma; Cole, Megan; Patel, Harshil; Angelova, Mihaela; Rana, Sareena; Colliver, Emma; Enfield, Katey; Bah, Nourdine; Kelly, Gavin; Tsang, Victoria Siu Kwan; Mugarza, Edurne; Moore, Christopher; Hobson, Philip; Levi, Dina; Molina-Arcas, Miriam; Swanton, Charles; Downward, Julian (2021). "Characterisation of tumour microenvironment remodelling following oncogene inhibition in preclinical studies with imaging mass cytometry". Nature Communications. 12 (1): 5906. Bibcode: 2021NatCo..12.5906V. doi: 10.1038/s41467-021-26214-x. PMC  8501076. PMID  34625563.
  50. ^ Li, Chenxin; Gao, Mingxuan; Yang, Wenxian; Zhong, Chuanqi; Yu, Rongshan (2021). "DIAmond: A multi-modal DIA mass spectrometry data processing pipeline". Bioinformatics. 37 (2): 265–267. doi: 10.1093/bioinformatics/btaa1093. PMID  33416868.
  51. ^ Luu, Gordon T.; Freitas, Michael A.; Lizama-Chamu, Itzel; McCaughey, Catherine S.; Sanchez, Laura M.; Wang, Mingxun (2022). "TIMSCONVERT: A workflow to convert trapped ion mobility data to open data formats". Bioinformatics. 38 (16): 4046–4047. doi: 10.1093/bioinformatics/btac419. PMC  9991885. PMID  35758608.
  52. ^ Perez-Riverol, Yasset; Moreno, Pablo (2020). "Scalable Data Analysis in Proteomics and Metabolomics Using BioContainers and Workflows Engines". Proteomics. 20 (9): e1900147. doi: 10.1002/pmic.201900147. PMC  7613303. PMID  31657527.
  53. ^ Vlasova, Anna; Hermoso Pulido, Toni; Camara, Francisco; Ponomarenko, Julia; Guigó, Roderic (2021). "FA-nf: A Functional Annotation Pipeline for Proteins from Non-Model Organisms Implemented in Nextflow". Genes. 12 (10): 1645. doi: 10.3390/genes12101645. PMC  8535801. PMID  34681040.
  54. ^ Miller, Rachel M.; Jordan, Ben T.; Mehlferber, Madison M.; Jeffery, Erin D.; Chatzipantsiou, Christina; Kaur, Simi; Millikin, Robert J.; Dai, Yunxiang; Tiberi, Simone; Castaldi, Peter J.; Shortreed, Michael R.; Luckey, Chance John; Conesa, Ana; Smith, Lloyd M.; Deslattes Mays, Anne; Sheynkman, Gloria M. (2022). "Enhanced protein isoform characterization through long-read proteogenomics". Genome Biology. 23 (1): 69. doi: 10.1186/s13059-022-02624-y. PMC  8892804. PMID  35241129.
  55. ^ Othman, Houcemeddine; Jemimah, Sherlyn; Da Rocha, Jorge Emanuel Batista (2022). "SWAAT Bioinformatics Workflow for Protein Structure-Based Annotation of ADME Gene Variants". Journal of Personalized Medicine. 12 (2): 263. doi: 10.3390/jpm12020263. PMC  8875676. PMID  35207751.
  56. ^ Bichmann, Leon; Gupta, Shubham; Rosenberger, George; Kuchenbecker, Leon; Sachsenberg, Timo; Ewels, Phil; Alka, Oliver; Pfeuffer, Julianus; Kohlbacher, Oliver; Röst, Hannes (2021). "DIAproteomics: A Multifunctional Data Analysis Pipeline for Data-Independent Acquisition Proteomics and Peptidomics". Journal of Proteome Research. 20 (7): 3758–3766. doi: 10.1021/acs.jproteome.1c00123. PMID  34153189. S2CID  235597603.
  57. ^ Walzer, Mathias; García-Seisdedos, David; Prakash, Ananth; Brack, Paul; Crowther, Peter; Graham, Robert L.; George, Nancy; Mohammed, Suhaib; Moreno, Pablo; Papatheodorou, Irene; Hubbard, Simon J.; Vizcaíno, Juan Antonio (2022). "Implementing the reuse of public DIA proteomics datasets: From the PRIDE database to Expression Atlas". Scientific Data. 9 (1): 335. Bibcode: 2022NatSD...9..335W. doi: 10.1038/s41597-022-01380-9. PMC  9197839. PMID  35701420.
  58. ^ Hulstaert, Niels; Shofstahl, Jim; Sachsenberg, Timo; Walzer, Mathias; Barsnes, Harald; Martens, Lennart; Perez-Riverol, Yasset (2020). "ThermoRawFileParser: Modular, Scalable, and Cross-Platform RAW File Conversion". Journal of Proteome Research. 19 (1): 537–542. doi: 10.1021/acs.jproteome.9b00328. PMC  7116465. PMID  31755270.
  59. ^ Li, Kai; Jain, Antrix; Malovannaya, Anna; Wen, Bo; Zhang, Bing (2020). "DeepRescore: Leveraging Deep Learning to Improve Peptide Identification in Immunopeptidomics". Proteomics. 20 (21–22): e1900334. doi: 10.1002/pmic.201900334. PMC  7718998. PMID  32864883.

See also

  • Galaxy
  • Snakemake