Abstract
This data set provides the supplementary files referenced in the thesis titled "Visual-analytics-driven bioinformatics methods for the analysis of biomolecular data".
The data set consists of the following files (details are also provided in the included README.txt file):
1. Supplementary File 4.1. Supplementary File 4.1 - URL and variants schema.pdf. Graphical Backus-Naur schema of the variant syntax recognized by Aquaria.
2. Supplementary File 4.2. Supplementary File 4.2 - Schema.json. Aquaria feature set schema. This schema can be used to validate user-specified JSON files in online tools such as https://www.jsonschemavalidator.net/ (see Section 4.5.5).
3. Supplementary File 6.1. Supplementary File 6.1 - Illumina and complete genome IDs.xlsx. NCBI SRA accession identifiers for 673 Illumina (short-read) and 673 PacBio (long-read) sequenced genomes, corresponding to 673 isolates that were each sequenced using both technologies.
4. Supplementary File 6.2. Supplementary File 6.2 - Distribution of IS in complete genomes.xlsx. Distribution of insertion sequences (ISs) in the complete genomes.
5. Supplementary File 6.3. Supplementary File 6.3 - QUAST analysis of assemblies.xlsx. Summary of SPAdes and SKESA assembly quality statistics, generated using QUAST.
6. Supplementary File 6.4. Supplementary File 6.4 - WiIS performance metrics.xlsx. WiIS performance metrics for each genome.
7. Supplementary File 6.5. Supplementary File 6.5 - Correlation of performance metrics and assembly statistics.xlsx. Correlation of WiIS performance metrics with SPAdes and SKESA assembly quality statistics.
8. Supplementary File 6.6. Supplementary File 6.6 - IS insertions found by all tools.xlsx. IS insertions found by all tools for each of the 673 short-read sequenced genomes.
9. Supplementary File 6.7. Supplementary File 6.7 - IS insertions found by all tools (20 base pair distance threshold).xlsx. IS insertions found by all tools, with a buffer length of 20 base pairs, for each of the 673 short-read sequenced genomes.
10. Supplementary File 6.8. Supplementary File 6.8 - WiIS SPAdes IS insertions found with respect.xlsx. Summary of IS insertions found by WiIS (SPAdes) with respect to Tohama I, including counts of insertions identified by WiIS but not present in Tohama I.
11. Supplementary File 6.9. Wiis.zip. WiIS code.
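
As an illustration of what a 20 base pair buffer (Supplementary File 6.7) could mean in practice, the following Python sketch treats an insertion as found by all tools when every tool reports a position within the buffer of it. This is only an assumed reading of the threshold, not the procedure used in the thesis or in the WiIS code. The function name shared_insertions, the tool names, and the coordinates are all hypothetical.

```python
from typing import Dict, List


def shared_insertions(calls: Dict[str, List[int]], buffer_bp: int = 20) -> List[int]:
    """Return positions reported by the first tool that every other tool
    also reports within +/- buffer_bp base pairs.

    Assumption: each insertion is summarised by a single genomic coordinate.
    """
    tools = list(calls)
    if not tools:
        return []
    reference, others = tools[0], tools[1:]
    shared = []
    for pos in sorted(calls[reference]):
        # Keep the position only if each remaining tool has a nearby call.
        if all(
            any(abs(pos - other_pos) <= buffer_bp for other_pos in calls[tool])
            for tool in others
        ):
            shared.append(pos)
    return shared


# Hypothetical insertion coordinates for three unnamed tools.
calls = {
    "tool_a": [1000, 5000, 9000],
    "tool_b": [1012, 5030, 9005],
    "tool_c": [995, 5001, 9020],
}

print(shared_insertions(calls))               # [1000, 9000]
print(shared_insertions(calls, buffer_bp=0))  # []
```

With the buffer set to 0 the comparison requires exact coordinate matches, which corresponds loosely to the distinction between Supplementary Files 6.6 and 6.7. Note that the result is expressed in the coordinates of the first tool listed, so the choice of reference tool affects the reported positions.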