Monophylize!

FAQ

How does the program determine what's para- or polyphyletic?
The steps of the algorithm are described in the manuscript (see: 'How to cite') of which this service is a part. In short:
  1. Label all interior nodes with a pre- and a post-order index.
  2. Extract all distinct taxa from the tree.
  3. For each taxon:
    1. Collect all tips that belong to it
    2. Find the MRCA for the collected tips
    3. Collect all tips descended from the MRCA. If this set is identical to the set from step 1, the taxon is monophyletic and the analysis moves on to the next taxon.
    4. Collect all nodes that subtend tips from the focal taxon as well as at least one other taxon and sort these by their post-order index.
    5. Group the collected, sorted nodes into distinct root-to-tip paths. Internal nodes that are nested in each other are identified (and collected in the same group) by checking that the pre-order index of the focal node is larger, and the post-order index of the focal node is smaller than that of the next node in the sorted list. If there is more than one distinct root-to-tip path (i.e., group), the taxon is considered polyphyletic, otherwise paraphyletic.
    6. For each first (i.e. most recent) node in each group, collect all subtended species. The union of these sets across groups forms the set of entangled species.
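The steps above can be sketched in a few dozen lines. This is a minimal illustration, not the service's Perl implementation: the nested-list tree representation, the function names (build, assess) and the toy tip labels are all assumptions made for the example. It also assumes that mixed nodes only need to be searched at or below the MRCA, and that the focal taxon itself is left out of the set of entangled species.

```python
from itertools import count

class Node:
    def __init__(self, label=None, children=()):
        self.label, self.children, self.parent = label, list(children), None
        self.pre = self.post = None
        for child in self.children:
            child.parent = self

def build(nested):
    """Build a tree from nested lists; strings are tips labelled 'taxon|id'."""
    if isinstance(nested, str):
        return Node(label=nested)
    return Node(children=[build(item) for item in nested])

def index(root):
    """Step 1: assign pre- and post-order indices (here to every node)."""
    pre, post = count(), count()
    def walk(node):
        node.pre = next(pre)
        for child in node.children:
            walk(child)
        node.post = next(post)
    walk(root)

def subtree(node):
    yield node
    for child in node.children:
        yield from subtree(child)

def tips(node):
    return [n for n in subtree(node) if not n.children]

def mrca(nodes):
    """Most recent common ancestor: first shared node on the tip-to-root paths."""
    def path(node):
        while node is not None:
            yield node
            node = node.parent
    common = list(path(nodes[0]))
    for other in nodes[1:]:
        seen = set(path(other))
        common = [n for n in common if n in seen]
    return common[0]

def assess(root, sep="|"):
    """Return {taxon: (assessment, entangled taxa)} for every taxon in the tree."""
    index(root)
    taxon_of = lambda tip: tip.label.split(sep)[0]
    all_tips = tips(root)
    report = {}
    for taxon in sorted({taxon_of(t) for t in all_tips}):       # steps 2 and 3
        mine = [t for t in all_tips if taxon_of(t) == taxon]    # step 3.1
        anc = mrca(mine)                                         # step 3.2
        if set(tips(anc)) == set(mine):                          # step 3.3
            report[taxon] = ("monophyletic", set())
            continue
        # Step 3.4: interior nodes subtending the focal taxon plus another one.
        mixed = [n for n in subtree(anc)
                 if taxon in (names := {taxon_of(t) for t in tips(n)})
                 and len(names) > 1]
        mixed.sort(key=lambda n: n.post)
        # Step 3.5: chain nodes that are nested in the next node of the list.
        groups = [[mixed[0]]]
        for cur, nxt in zip(mixed, mixed[1:]):
            if cur.pre > nxt.pre and cur.post < nxt.post:
                groups[-1].append(nxt)
            else:
                groups.append([nxt])
        verdict = "polyphyletic" if len(groups) > 1 else "paraphyletic"
        # Step 3.6: taxa under the most recent node of each group.
        tangled = {taxon_of(t) for g in groups for t in tips(g[0])} - {taxon}
        report[taxon] = (verdict, tangled)
    return report
```

For example, assess(build([["A|1", ["A|2", "B|1"]], "C|1"])) reports taxon A as paraphyletic and entangled with B, while assess(build([["A|1", "B|1"], ["A|2", "C|1"]])) reports A as polyphyletic and entangled with B and C.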
How do I interpret the results?
The output of the algorithm is presented in tabular form. Each row represents one taxon from the tree. The 'assessment' column shows whether that taxon is mono-, para- or polyphyletic. The 'tanglees' column shows with which other taxa, if any, the focal taxon is entangled.
How do I export the results?
You can copy and paste the results from the browser window into a spreadsheet program. Easier still would be to check the 'TSV' box in the 'output' tab. The results will then be written as tab-separated data that can be imported directly into spreadsheet programs, R, etc.
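As an illustration of working with the tab-separated output outside a spreadsheet, the sketch below tallies the assessments with Python's standard csv module. It assumes the TSV starts with a header row that includes an 'assessment' column; the 'taxon' column name, the exact assessment strings and the sample rows are invented for the example and may differ from what the service writes.

```python
import csv
import io
from collections import Counter

# Hypothetical sample; real output may use other column names and values.
SAMPLE = (
    "taxon\tassessment\ttanglees\n"
    "Aus bus\tmonophyletic\t\n"
    "Aus cus\tparaphyletic\tAus dus\n"
    "Aus dus\tmonophyletic\t\n"
)

def tally_assessments(handle):
    """Count how many taxa received each assessment in a TSV file handle."""
    reader = csv.DictReader(handle, delimiter="\t")
    return Counter(row["assessment"] for row in reader)

counts = tally_assessments(io.StringIO(SAMPLE))
```

To read a saved file instead, pass an open file handle, for example open("results.tsv", newline="") where results.tsv is whatever name you saved the output under.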
How is this algorithm implemented?
The algorithm is written in the Perl programming language and uses the Bio::Phylo libraries to read the input data.
Can I use this "off-line"?
Yes. You can run the script locally. Consult the embedded Perl-Doc documentation in the script or run perl monophylizer.pl --help for more info.
Can I use this through an API?
Yes. The data and parameters are uploaded through an HTTP POST request encoded as multipart/form-data, and the results are returned in the response, so this service functions as a RESTful web service. You will need to provide the following parameters:
  • infile, which is a file upload
  • format, input file format, one of: newick, nexus, nexml, phyloxml
  • separator, the character that separates the taxon name from its identifier. Default is |
  • trinomials, which is an optional argument that, when given any value other than '0', indicates that subspecific epithets need to be parsed
  • astsv, which is an optional argument that, when given any value other than '0', indicates that the output must be written as tab-separated data
  • metadata, which is an optional file upload with additional tab-separated data to join with the taxa in the output
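A request with these parameters could be assembled as follows, using only the Python standard library. This is a sketch under assumptions: the URL is a placeholder (substitute the address of the service), the helper names and the upload file name tree.txt are invented, and the optional trinomials and metadata parameters are left out for brevity. The code builds the request but does not send it.

```python
import uuid
import urllib.request

def encode_multipart(fields, files):
    """Encode plain fields and file uploads as a multipart/form-data body."""
    boundary = uuid.uuid4().hex
    lines = []
    for name, value in fields.items():
        lines += [f"--{boundary}",
                  f'Content-Disposition: form-data; name="{name}"',
                  "", str(value)]
    for name, (filename, content) in files.items():
        lines += [f"--{boundary}",
                  f'Content-Disposition: form-data; name="{name}"; '
                  f'filename="{filename}"',
                  "Content-Type: application/octet-stream",
                  "", content]
    lines += [f"--{boundary}--", ""]
    body = "\r\n".join(lines).encode("utf-8")
    return body, f"multipart/form-data; boundary={boundary}"

def build_request(url, tree_text, fmt="newick", separator="|", astsv=True):
    """Build (but do not send) a POST request with the documented parameters."""
    fields = {"format": fmt, "separator": separator}
    if astsv:
        fields["astsv"] = "1"  # any value other than '0' requests TSV output
    body, content_type = encode_multipart(
        fields, {"infile": ("tree.txt", tree_text)})
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST")

# Placeholder URL and a toy Newick tree, both hypothetical.
request = build_request("https://example.org/monophylizer",
                        "((Aus_bus|1,Aus_bus|2),Aus_cus|3);")
# To send it: urllib.request.urlopen(request).read().decode("utf-8")
```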
How is this code licensed?
The analysis script proper is licensed under the Apache License, which is very permissive for most forms of re-use.
Who wrote this code?
The analysis code was written by Rutger Vos.
Where is this hosted?
The web service is hosted at Naturalis Biodiversity Center; the source code is in a GitHub repository.
How to cite?
This service is part of a publication. If you use this service in your research, please cite the publication:
Mutanen, M. et al. 2016. Species-Level Para- and Polyphyly in DNA Barcode Gene Trees: Strong Operational Bias in European Lepidoptera. Systematic Biology.
How to format taxonomic names?
Taxonomic names are read without any intelligence: the expectation is that names consist of the genus, the specific epithet, and, optionally, the subspecific epithet, followed by an identifier. Anything else, such as 'sp.', 'cv.', and so on, will cause problems and must be avoided for this analysis.
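A quick pre-flight check of your tip labels can catch such names before uploading. The function below is a hypothetical helper, not the parser used by the service: it assumes labels of the form name, separator, identifier, treats underscores as spaces, and rejects any word ending in a period (such as 'sp.' or 'cv.').

```python
def parse_label(label, separator="|", trinomials=False):
    """Split a tip label into (taxon name, identifier), or raise ValueError."""
    name, found, identifier = label.partition(separator)
    if not found or not identifier:
        raise ValueError(f"no identifier after {separator!r} in {label!r}")
    words = name.replace("_", " ").split()
    if len(words) not in (2, 3) or any(w.endswith(".") for w in words):
        raise ValueError(f"expected a binomial or trinomial name in {label!r}")
    # Keep the subspecific epithet only when trinomials are requested.
    keep = 3 if trinomials else 2
    return " ".join(words[:keep]), identifier
```

For instance, parse_label("Homo_sapiens|AB123") gives ("Homo sapiens", "AB123"), whereas a label such as "Aus sp.|X1" raises a ValueError.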
I am getting error messages?
This is almost certainly because your input file is syntactically invalid in some way. On the 'upload' tab there is a link to extensive documentation, with example files, that explains which input file formats are accepted. If you can't figure out what's wrong with your file, try opening it in Mesquite and exporting it as NEXUS: its interpretation of the NEXUS standard is readily understood by this service, and Mesquite might give you useful feedback about your file.