Supplementary Files for thesis titled "Visual-analytics-driven bioinformatics methods for the analysis of biomolecular data"

Access & Terms of Use
open access
Copyright: UNSW
Abstract
This data set provides the supplementary files referenced in the thesis titled "Visual-analytics-driven bioinformatics methods for the analysis of biomolecular data". It consists of the following files (details are also provided in the included README.txt file):
1. Supplementary File 4.1. Supplementary File 4.1 - URL and variants schema.pdf. Graphical Backus-Naur schema of the variant syntax recognized by Aquaria.
2. Supplementary File 4.2. Supplementary File 4.2 - Schema.json. Aquaria feature set schema. This schema can be used together with user-specified JSON files for validation in online tools such as https://www.jsonschemavalidator.net/ (see Section 4.5.5).
3. Supplementary File 6.1. Supplementary File 6.1 - Illumina and complete genome IDs.xlsx. NCBI SRA accession identifiers of 673 Illumina (short-read) and 673 PacBio (long-read) sequenced genomes, corresponding to 673 isolates each sequenced using both technologies.
4. Supplementary File 6.2. Supplementary File 6.2 - Distribution of IS in complete genomes.xlsx. Distribution of insertion sequences (ISs) in complete genomes.
5. Supplementary File 6.3. Supplementary File 6.3 - QUAST analysis of assemblies.xlsx. Summary of SPAdes and SKESA assembly quality statistics, generated using QUAST.
6. Supplementary File 6.4. Supplementary File 6.4 - WiIS performance metrics.xlsx. WiIS performance metrics for each genome.
7. Supplementary File 6.5. Supplementary File 6.5 - Correlation of performance metrics and assembly statistics.xlsx. Correlation of WiIS performance metrics with SPAdes and SKESA assembly quality statistics.
8. Supplementary File 6.6. Supplementary File 6.6 - IS insertions found by all tools.xlsx. IS insertions found by all tools for each of the 673 short-read sequenced genomes.
9. Supplementary File 6.7. Supplementary File 6.7 - IS insertions found by all tools (20 base pair distance threshold).xlsx. IS insertions found by all tools, with a buffer length of 20 base pairs, for each of the 673 short-read sequenced genomes.
10. Supplementary File 6.8. Supplementary File 6.8 - WiIS SPAdes IS insertions found with respect.xlsx. Summary of IS insertions found by WiIS (SPAdes) with respect to Tohama I (including the counts of insertions identified by WiIS but not present in Tohama I).
11. Supplementary File 6.9. Wiis.zip. WiIS code.
Publication Year
2022
Resource Type
Dataset
Keyword(s)
Schema
Bordetella pertussis
Performance metrics
Tool comparisons
Insertion sequences