Engineering

Publication Search Results

Now showing 1 - 10 of 346
  • (2018) Chapre, Yogita Gunwant
    Thesis
    Indoor localization traditionally uses fingerprinting approaches based on Received Signal Strength (RSS), where RSS plays a crucial role in determining the nature and characteristics of the location fingerprints stored in a radio map. The RSS is a function of the distance between transmitter and receiver, and can vary due to in-path interference. This thesis identifies the factors affecting the RSS in indoor localization; discusses the effect of the identified factors, such as spatial, temporal, environmental, hardware and human-presence factors, on the RSS through extensive measurements in a typical IEEE 802.11 a/g/n network; and demonstrates the reliability of RSS-based location fingerprints for indoor localization using statistical analysis of the measured data. This thesis also presents a novel Wi-Fi fingerprinting system, CSI-MIMO, which uses fine-grained information known as Channel State Information (CSI). CSI-MIMO exploits frequency diversity and spatial diversity in an Orthogonal Frequency Division Multiplexing (OFDM) system with Multiple-Input Multiple-Output (MIMO) antennas. CSI-MIMO uses either the magnitude of the CSI or a complex CSI location signature, depending on the mobility of the indoor environment. The performance of CSI-MIMO is compared to the Fine-grained Indoor Fingerprinting System (FIFS), CSI with Single-Input Single-Output (SISO), and simple CSI with MIMO. The experimental results show significant improvement over existing CSI-based fingerprinting systems, with an accuracy of 0.98 m in a static environment and 0.31 m in a dynamic environment with optimal war-driving.
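
As background, a minimal sketch of the classic radio-map fingerprinting idea the thesis builds on: calibration fingerprints are RSS vectors tagged with positions, and an unknown position is estimated by k-nearest-neighbour matching in signal space. The radio-map values and k below are illustrative, not the thesis's CSI-MIMO system.

```python
import math

def locate(radio_map, observed_rss, k=2):
    """Estimate position by k-nearest-neighbour matching in RSS space.

    radio_map: list of ((x, y), [rss_ap1, rss_ap2, ...]) fingerprints.
    observed_rss: RSS vector measured at the unknown location.
    Returns the centroid of the k closest fingerprints.
    """
    # Euclidean distance in signal space between a fingerprint and the observation
    def dist(fp):
        return math.dist(fp[1], observed_rss)

    nearest = sorted(radio_map, key=dist)[:k]
    xs = [p[0][0] for p in nearest]
    ys = [p[0][1] for p in nearest]
    return (sum(xs) / k, sum(ys) / k)

# Toy radio map: three access points, four calibration points (dBm values).
radio_map = [
    ((0.0, 0.0), [-40, -70, -80]),
    ((0.0, 5.0), [-70, -40, -80]),
    ((5.0, 0.0), [-70, -80, -40]),
    ((5.0, 5.0), [-80, -70, -55]),
]
estimate = locate(radio_map, [-45, -68, -78], k=2)
```

The averaging over the k nearest fingerprints smooths out single-point RSS fluctuation, which is exactly the variability the thesis measures.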

  • (2019) Dong, Manqing
    Thesis
    The web has greatly enhanced the way people perform certain activities, such as shopping online, renting a house, or even making friends. However, the web also offers convenience to malicious users, who may write fake reviews to misguide normal users or spread fake news to mislead people. We call such content false information. Detecting false information is important, but traditional methods are limited in mining the inner relationships among features and are inadequate for multi-domain problems. This dissertation adopts deep-learning-based approaches to solve these problems. Specifically, we consider detecting false information with either hand-crafted features or latent representations. The first work discusses hand-crafted features and uses statistical tests to select the effective features for detection. The selected features are then learned by deep learning methods to inspect the potential improvement from mining complex feature interactions: an autoencoder learns representations of the quality features, and a neural random forest predicts the labels. A follow-up work considers a more general setting that is suitable for one-dimensional or higher-dimensional inputs, where a gradient boost module is designed to improve the stability of the neural random forest. Hand-crafted features are inflexible for multi-domain tasks, so we next consider latent-representation-based approaches and exploit the potential interactions among latent representations for detection. Additionally, an attention mechanism is used to better explain the model and visualize the results. The third work exploits the cross information between textual information and side information for false information detection, and the following work considers a general model for both opinion-based and fact-based false information detection problems. We also consider similarity measures for a special type of false information, namely clickbait. Unlike fake reviews or rumours, clickbait is a link that induces web users to enter an unrelated web page. We therefore compare the similarity between the link title and the linked page and discuss similarity-based measures for false information detection.
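
The title-versus-page comparison in the final step can be sketched with a plain bag-of-words cosine similarity; the example texts, threshold, and function names below are illustrative stand-ins, not the dissertation's actual similarity measures.

```python
import math
from collections import Counter

def cosine_similarity(text_a, text_b):
    """Bag-of-words cosine similarity between two texts."""
    a, b = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    common = set(a) & set(b)
    dot = sum(a[w] * b[w] for w in common)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def is_clickbait(title, page_text, threshold=0.2):
    """Flag a link whose title has little lexical overlap with the landing page."""
    return cosine_similarity(title, page_text) < threshold

honest = is_clickbait("new python release notes",
                      "the new python release notes list every change")
bait = is_clickbait("you will not believe this trick",
                    "buy our discount furniture today")
```

A low title-to-page similarity is the signal being measured: an honest link's title shares vocabulary with its page, while a clickbait link's title does not.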

  • (2019) Yang, Yang
    Thesis
    Many real-world applications model data as record datasets and treat the relationships among data as graphs. There are significant research efforts devoted to efficiently and effectively managing and analysing record datasets and graph datasets. Among them, applying similarity search to massive record and graph datasets is crucially important for a deeper understanding and better management of such data. However, the explosively rising data volume and its rapid evolution pose huge challenges, which make some deterministic methods infeasible in practice and motivate approximate algorithms. In this thesis, we study three important problems in mining similar patterns in massive datasets and design accurate and efficient approximate methods. Firstly, we study the problem of approximate containment similarity search. We propose a novel augmented KMV sketch technique, namely GB-KMV, which is data-dependent and achieves a much better trade-off between sketch size and accuracy. We show that it outperforms the state-of-the-art technique LSH-E in terms of estimation accuracy under practical assumptions. Our experiments on real-life datasets verify that GB-KMV is superior to LSH-E in terms of the space-accuracy trade-off, the time-accuracy trade-off, and sketch construction time. Secondly, we focus on the problem of selectivity estimation for set containment search. We propose an ordered-trie-based sampling approach named OT-Sampling. OT-Sampling partitions records based on element frequency and occurrence patterns, and is significantly more accurate than simple random sampling. To further enhance performance, a divide-and-conquer-based sampling approach, DC-Sampling, is presented with an inclusion/exclusion prefix to explore pruning opportunities. Finally, we study the problem of graphlet statistics estimation. We propose a high-order-Markov-chain-based method, HRWd, which performs a high-order random walk via an adjacency tensor with respect to a specified local structure. By collecting graphlet samples during the high-order random walk, we obtain an unbiased estimator for 3- and 4-vertex graphlet counting. Compared to the state-of-the-art method SRWd, we theoretically and experimentally show that our method is superior in terms of accuracy and efficiency.
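
For background, a minimal sketch of the plain KMV sketch that GB-KMV augments: keep the k smallest normalised hash values of a set, and estimate the number of distinct elements as (k - 1) / v_k, where v_k is the k-th smallest value. This is the textbook baseline, not the GB-KMV technique itself; the hash choice and k are illustrative.

```python
import hashlib

def kmv_sketch(items, k=256):
    """Keep the k minimum normalised hash values of a set of items."""
    def h(x):
        # Map an item to a pseudo-uniform value in [0, 1).
        d = hashlib.sha1(str(x).encode()).digest()
        return int.from_bytes(d[:8], "big") / 2**64
    return sorted({h(x) for x in items})[:k]

def estimate_distinct(sketch, k):
    """Distinct-count estimate (k - 1) / v_k from a full KMV sketch."""
    if len(sketch) < k:          # saw fewer than k distinct items: count is exact
        return len(sketch)
    return (k - 1) / sketch[k - 1]

k = 256
est = estimate_distinct(kmv_sketch(range(10_000), k), k)
```

The standard error of the plain estimator shrinks roughly as 1/sqrt(k), which is the sketch-size-versus-accuracy trade-off that GB-KMV improves on by being data-dependent.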

  • (2019) Ajam, George
    Thesis
    Application Programming Interfaces (APIs) are at the core of modern programming, supporting code reuse for programmers and software developers. The usage of software libraries, software development kits (SDKs) and Web-based APIs is increasing, yet it is accompanied by several challenges. Web-based APIs use HTTP requests as a vehicle for transferring information over the network and for exchanging essential functionalities. More than 20 thousand public Web APIs exist on the Internet; they are the fuel that runs new business models and enables the composition of new applications and Web services. Programmers and developers often face challenges in using APIs: they need to understand several aspects and go through many types of documentation resources. Such resources often have missing information and are ambiguous and confusing to understand. Usage examples are scattered all over the Web, leaving usage based on assumptions or trial and error. This dissertation provides a solution to the discovery and exploration of API documentation, namely threads on the Question and Answer (Q&A) site Stack Overflow. We first provide empirical evidence that the primary types of support provided by Stack Overflow relate to API usage, debugging, API constraints and security. We propose a solution to explore the main types of support on Stack Overflow so that developers and API users can effectively search for posts using API topic-based queries. We suggest a data model for storing API-related information and an approach for near-real-time indexing and retrieval. To allow users to query posts on API topic issues using flexible terminology, we propose an enrichment approach based on a word embedding model. We build an API-Topics Domain Specific Query Language (API-Topics DSL) to retrieve the indexed information easily, and implement on top of it a query bot that can answer API-Topics DSL queries.
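
The word-embedding enrichment step can be sketched as nearest-neighbour expansion of a query term in embedding space. The toy vectors, vocabulary, and similarity threshold below are made up for illustration; the dissertation's model is trained on real API-related text.

```python
import math

# Toy word vectors standing in for a trained embedding model (hypothetical values).
EMBEDDINGS = {
    "auth":    [0.90, 0.10, 0.00],
    "login":   [0.80, 0.20, 0.10],
    "oauth":   [0.85, 0.15, 0.05],
    "timeout": [0.10, 0.90, 0.20],
    "retry":   [0.20, 0.80, 0.30],
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def enrich_query(term, top_n=2, min_sim=0.9):
    """Expand a query term with its nearest neighbours in embedding space."""
    if term not in EMBEDDINGS:
        return [term]
    q = EMBEDDINGS[term]
    scored = [(cosine(q, v), w) for w, v in EMBEDDINGS.items() if w != term]
    scored.sort(reverse=True)
    return [term] + [w for s, w in scored[:top_n] if s >= min_sim]

expanded = enrich_query("auth")
```

A query for "auth" is silently broadened to its semantic neighbours, so posts phrased with different terminology still match the topic-based query.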

  • (2019) Huang, Chaoran
    Thesis
    Online communities have recently become an alternative way for professionals to share expertise. This increasing usage of online communities enables us to find experts via user-generated content and user activities. Traditionally, locating experts on such websites consumes a large amount of time and requires substantial human effort, while recent advances in artificial intelligence and data mining can be a game changer. Hence, in this dissertation we propose a set of algorithms and techniques to find and recommend experts in online communities, especially online community question answering (CQA) websites. We systematically review existing research and techniques for expert recommendation in CQA and compare their advantages and shortcomings. One issue found in CQA websites is low participation in posts. This limits the effectiveness of CQA-based knowledge sharing and, at large, diminishes the performance of expert recommendation algorithms. We therefore take Stack Overflow, a successful programming CQA website, as the subject of study. We propose to recommend experts on the website to help reduce the number of untouched questions and ultimately enrich the contents of the community. Neural-network-based techniques are proposed to produce representations of user-generated contents and topics; based on vector similarities, we then rank the posts by topic. Finally, the ranked posts are used to identify expert content creators who are promising candidates for resolving new problems. It can also be argued that hardly any research focuses on multi-domain recommendation in CQA, while experts with more than one specialisation are often required to solve complicated, multi-discipline problems. Extending our aforementioned work, we look into the StackExchange Network, the parent website of Stack Overflow, and take multiple knowledge domains into consideration. Since more facets of experts are involved, a tensor is a natural container for the data, and tensor decomposition is the natural technique for expert recommendation. Furthermore, due to the hierarchical structure of the data source, a relationship tree is modelled to guide the decomposition; it is proven effective in alleviating the sparsity issue, which helps our decomposition. Discussions of open issues and future research directions are also included in this dissertation.
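
The topic-based ranking step can be sketched as cosine-similarity ranking of user content representations against a topic vector. The user names, 2-d vectors, and scoring details below are hypothetical; the dissertation derives its representations from neural networks.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def norm(u):
    return sum(a * a for a in u) ** 0.5

def rank_experts(topic_vec, user_vecs, top_n=2):
    """Rank users by cosine similarity of their content vector to a topic vector.

    user_vecs: {user_name: representation vector}, e.g. averaged post embeddings.
    """
    scored = sorted(
        user_vecs.items(),
        key=lambda kv: dot(topic_vec, kv[1]) / (norm(topic_vec) * norm(kv[1])),
        reverse=True,
    )
    return [name for name, _ in scored[:top_n]]

# Hypothetical 2-d content representations for three users.
users = {
    "alice": [0.9, 0.1],   # mostly writes about the query topic
    "bob":   [0.5, 0.5],
    "carol": [0.1, 0.9],
}
recommended = rank_experts([1.0, 0.0], users)
```

The top-ranked users are the candidate experts to whom an untouched question on that topic would be routed.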

  • (2019) Behnaz, Ali
    Thesis
    The wide-ranging and complicated nature of data in the analytics domain, coupled with diverse ways of analysing it, requires appropriate technologies, solutions and tasks for processing, integrating and sharing information in organisations. This research focuses on statistical learning techniques, as they include the models popularly used in analytics. Most analytics studies in organisations require the application of complex statistical learning techniques or various algorithms in an iterative way to produce valid, robust and precise results. Different users have their own requirements and distinct perspectives to be implemented on heterogeneous platforms. This thesis tackles this problem, called data and model disparity in organisations, using a combination of semantic technologies, ontologies and software engineering principles. An ontology development process using principles from popular ontology development methodologies is adopted to address the research problem of this thesis. Two domain-specific use cases are conducted, and competency questions for both are designed. These questions are analysed and consolidated to derive the generic concepts and generic competency questions required for an ontology. The generic concepts represent variables of interest, and the generic competency questions construct the interrelationships between variables of interest, datasets, models and other variables. The ontology is called the Statistical Learning Ontology (SLO). An architecture consisting of a Data layer, a Business layer and a User layer has been proposed to implement the SLO. In addition, the queries derived from the generic competency questions are implemented in the machine-readable query language SPARQL. Finally, the SLO is evaluated in two steps. In the first step, the domain-specific competency questions are assessed to check how well the SLO can address them.
In the second step, three case scenarios are designed and analysed against three evaluation criteria, namely applicability across statistical learning techniques, user-driven analysis and implementability.
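
How a competency question maps onto a machine-readable query can be sketched over a toy triple store. The thesis uses SPARQL for this, so the pattern-matching function below, where None plays the role of a SPARQL variable, and the `slo:` predicate names are illustrative stand-ins only.

```python
# Hypothetical triples in the spirit of the SLO: (subject, predicate, object).
TRIPLES = [
    ("model:linear_reg",  "slo:usesVariable", "var:income"),
    ("model:linear_reg",  "slo:trainedOn",    "dataset:census"),
    ("model:rand_forest", "slo:usesVariable", "var:age"),
    ("model:rand_forest", "slo:usesVariable", "var:income"),
]

def query(triples, s=None, p=None, o=None):
    """Match triples against a pattern; None acts like a SPARQL variable."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

# Competency question: which models use the variable of interest 'income'?
models = sorted({s for s, _, _ in
                 query(TRIPLES, p="slo:usesVariable", o="var:income")})
```

The same pattern, with variables bound or free, answers the generic competency questions about variables of interest, datasets, and models.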

  • (2019) Zhang, Haida
    Thesis
    In modern applications, input data are often high-dimensional and complex, represented by matrices or graphs. Due to the large volume of data, the ability to mine these data with scalable algorithms has become a common requirement in many emerging environments. In this thesis, we study three important problems: matrix tracking over distributed sliding windows, graph-based semi-supervised learning on data streams, and seeded graph matching. Firstly, we investigate scalable algorithms for tracking a matrix over distributed sliding windows. In many applications, the input data, represented as matrices, are generated at distributed sites and arrive continuously. It is often infeasible to summarize the data matrices by simply centralizing all the data. Further, queries are often answered solely based on recently observed data points, which makes the problem more challenging. We propose novel communication-efficient algorithms for tracking such distributed matrices: our sampling-based algorithms continuously track a weighted sample of rows according to their squared norms, and our deterministic tracking algorithms require only one-way communication with better error guarantees. All proposed algorithms have provable guarantees. Secondly, we study the problem of graph-based semi-supervised learning for object classification on data streams. In real-world settings, objects are often observed sequentially and must be classified online. We propose a novel streaming object classification framework extending the classical label propagation algorithm. Our algorithm maintains a small sketch of the data stream with strong classification ability. To improve scalability, we propose a local label propagation strategy for efficient sketch updates. Finally, we study the problem of seeded graph matching. Given graphs G1(V1, E1) and G2(V2, E2), and a small set S of pre-matched node pairs, the problem is to identify a matching between V1 and V2 such that each matched pair corresponds to the same underlying entity. A new framework is proposed that employs Personalized PageRank (PPR) to quantify the matching score of each node pair. We then propose a strategy that postpones the selection of pairs that have competitors with similar matching scores, and we theoretically prove the effect of the postponing strategy on matching accuracy. To improve the scalability of matching large graphs, we design efficient approximation techniques based on PPR heavy-hitter computation.
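
The sampling-based tracking idea, keeping rows with probability proportional to their squared norms, can be sketched as follows; the matrix, sample size, and seed are illustrative, and the thesis's algorithms add the distributed, sliding-window machinery described above.

```python
import random

def sample_rows(matrix, m, seed=0):
    """Sample m rows with probability proportional to their squared L2 norms.

    This is the weighting used by norm-based matrix sketches: rows with
    larger squared norms dominate A^T A, so they should be kept more often.
    """
    rng = random.Random(seed)
    weights = [sum(x * x for x in row) for row in matrix]
    return rng.choices(matrix, weights=weights, k=m)

A = [
    [10.0, 0.0],   # dominant row: squared norm 100
    [0.1, 0.1],    # squared norm 0.02
    [0.1, -0.1],   # squared norm 0.02
]
sample = sample_rows(A, m=5)
dominant_share = sum(row == [10.0, 0.0] for row in sample) / 5
```

Because the first row carries almost all of the squared mass, the sample is dominated by it, which is exactly what preserves the spectral information of the matrix.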

  • (2019) Malekpour, Amin
    Thesis
    Multiprocessor System-on-Chip (MPSoC) designs have become necessary due to the billions of transistors available to the designer, the need for fast design turnaround times, and the power wall. Thus, present embedded systems are designed with MPSoCs, and one possible way MPSoCs can be realized is through Pipelined MPSoC (PMPSoC) architectures, which are used in applications from video surveillance to cryptosystems. In the past, the security of a system was considered a software-level problem: attacks came from software, and their countermeasures were also software-based. However, with the increasing number of transistors on a chip and the power wall, a hardware system design chain now involves many untrusted third parties. Malicious modifications, referred to as Hardware Trojans (HTs), can be made to third-party components without the designers' knowledge. HTs are a significant concern due to their stealth and the damage they can cause. An adversary could use HTs to extract secret information (data leakage), to modify functionality or data (functional modification), or to make MPSoCs deny service. Detecting hardware Trojans, or preventing their activation, is a challenging task, and the mitigation techniques proposed in the literature cannot guarantee that ICs are free of them. This thesis presents online monitoring, checking, and testing methodologies to mitigate hardware Trojans with any triggering mechanism and with Data Leakage (DL), Functional/Data Modification (FM), and/or Denial of Service (DoS) payloads. Four different techniques are presented. In the first two chapters, mechanisms are proposed that (1) detect the presence of hardware Trojans in Third-Party Intellectual Property (3PIP) cores of PMPSoCs through continuous monitoring, and (2) recover the system by switching the infected processor core with another one. The first technique mitigates HTs with all three payloads mentioned above, while the second chapter presents two different techniques for mitigating HTs with DoS payloads. We designed, implemented, and tested the PMPSoCs in a commercial cycle-accurate multiprocessor simulation environment and showed that such a system can keep working in the presence of hardware Trojans. In the next two chapters, online monitoring and testing mechanisms are proposed to (1) detect and identify HTs with FM and DoS payloads in general MPSoCs, and (2) recover the MPSoCs so that execution continues in the presence of HTs. These mechanisms use the concept of application-specific testing for runtime detection of HTs. The MPSoCs are designed and implemented on a real hardware platform (an FPGA) to show the practicality and effectiveness of these techniques. Our experimental results show that the HT mitigation techniques proposed in this thesis are effective in detecting HT attacks, identifying and isolating the infected component, and finally recovering the system from the attack. In comparison to state-of-the-art mechanisms, the designs' overheads (area and power consumption) are significantly lower.
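
The monitor-and-recover idea can be sketched, in a heavily simplified software analogy, as re-checking a primary core's output and switching to a spare on mismatch. The "cores" below are plain functions and the trigger value is made up, so this is only a conceptual illustration of FM detection and recovery, not the thesis's hardware design.

```python
def monitored_run(task_input, primary, checker, spare):
    """Run a task on a primary core and re-check it on a monitor core.

    primary / checker / spare are stand-ins for processor cores. If the
    checker disagrees with the primary (a functional-modification symptom),
    the primary is treated as infected and the spare's result is used.
    """
    out = primary(task_input)
    if checker(task_input) == out:
        return out, "primary-ok"
    return spare(task_input), "recovered-on-spare"

def clean_core(x):
    return x * x

def trojan_core(x):
    # Hypothetical HT payload: corrupts the result only for a rare trigger value.
    return x * x + 1 if x == 42 else x * x

ok_result, ok_status = monitored_run(7, trojan_core, clean_core, clean_core)
bad_result, bad_status = monitored_run(42, trojan_core, clean_core, clean_core)
```

The rare-trigger behaviour is what makes HTs hard to catch with offline testing, and why the thesis argues for continuous online monitoring.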

  • (2010) Botros, Andrew
    Thesis
    Effective cochlear implant fitting (or programming) is essential for providing good hearing outcomes, yet it is a subjective and error-prone task. The initial objective of this research was to automate the procedure using the auditory nerve's electrically evoked compound action potential (the ECAP) and machine intelligence. The Nucleus® cochlear implant measures the ECAP via its Neural Response Telemetry (NRT™) system. AutoNRT™, a commercial intelligent system that measures ECAP thresholds with the Nucleus Freedom™ implant, was developed first in this research. AutoNRT uses decision tree expert systems that automatically recognise ECAPs. The algorithm approaches the threshold from lower stimulus levels, ensuring recipient safety during postoperative measurements. Clinical studies have demonstrated success on approximately 95% of electrodes, measured with the same efficacy as a human expert. NRT features other than ECAP threshold, such as the ECAP recovery function, could not be measured with similar success rates, precluding further automation and loudness prediction from data mining results. Despite this outcome, a better application of the ECAP threshold profile towards fitting was established. Since C-level profiles (the contour of maximum acceptable stimulus levels across the implant array) were observed to be flatter than T-level profiles (the contour of minimum audibility), a flattening of the ECAP threshold profile was adopted when applied as a fitting profile at higher stimulus levels. Clinical benefits of this profile scaling technique were demonstrated in a 42-subject study. Data mining results also provided an insight into the ECAP recovery function and refractoriness. It is argued that the ECAP recovery function is heavily influenced by the size of the recruited neural population, with evidence gathered from a computational model of the cat auditory nerve and NRT measurements with 21 human subjects.
Slower ECAP recovery, at equal loudness, is a consequence of greater neural recruitment leading to lower mean spike probabilities. This view can explain the counterintuitive association between slower ECAP recovery and greater temporal responsiveness to increasing stimulation rate. This thesis presents the first attempt at achieving completely automated cochlear implant fitting via machine intelligence; a future generation implant, capable of high fidelity auditory system measurements, may realise the ultimate objective.
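
AutoNRT's safety property, approaching the ECAP threshold from lower stimulus levels, can be sketched as a simple upward search. The detector lambda, current-unit levels, and step size below are hypothetical stand-ins for the decision-tree ECAP recogniser described above.

```python
def find_ecap_threshold(detects_ecap, start_level, step, max_level):
    """Approach the ECAP threshold from below, as a safety measure.

    detects_ecap(level) stands in for the automatic ECAP recogniser
    (a decision-tree classifier in AutoNRT). Stimulation starts low and
    rises in small steps so the recipient is never over-stimulated.
    """
    level = start_level
    while level <= max_level:
        if detects_ecap(level):
            return level        # first level with a recognisable ECAP
        level += step
    return None                 # no ECAP found within the safe range

# Hypothetical recogniser: responses become detectable at 180 current units.
threshold = find_ecap_threshold(lambda lv: lv >= 180,
                                start_level=100, step=10, max_level=255)
```

Searching upward trades some measurement time for the guarantee that the stimulus never overshoots the comfortable level during postoperative fitting.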

  • (2017) Hoseini, Sayed Amir
    Thesis
    The rapidly growing demand for higher networking capacity and data rates is forcing researchers to explore the unused spectrum in higher frequency bands. Two such bands, the millimeter wave (mmWave) band, ranging from 30 GHz to 300 GHz, and the Terahertz (THz) band, ranging from 0.1 THz to 10 THz, are currently being investigated for possible use in future networks. Because many atmospheric molecules have their natural resonant frequencies in these bands, it is important to understand the effects of molecular absorption and re-radiation on wireless networking performance at such high frequencies. Building on recently developed molecular absorption models, this thesis conducts a theoretical study of the effect of molecular absorption and re-radiation on both single-antenna and multi-antenna wireless communications. For single-antenna communication, the study focuses on quantifying the temporal and spatial variation of path loss and noise caused by variation in the molecular composition of the air. In particular, it studies the extent of spatio-temporal variation of mmWave channels in the three largest cities of Australia by investigating hourly air quality and weather data over 12 months. The study finds that mmWave channels experience significant variation in both the space and time domains, which causes undesirable network capacity fluctuation across places and hours. For multi-antenna communication, the study yields a new theoretical finding: Multiple-Input Multiple-Output (MIMO) capacity can be significantly influenced by atmospheric molecules. In more detail, some common atmospheric molecules, such as oxygen and water, can absorb and re-radiate energy at their natural resonant frequencies, such as 60 GHz, 120 GHz and 180 GHz, which belong to the mmWave spectrum. This phenomenon can provide equivalent Non-Line-of-Sight (NLoS) paths in an environment dominated by Line-of-Sight (LoS) transmissions, and thus greatly improve the spatial multiplexing and diversity of a mmWave MIMO system. Finally, the performance of the two main MIMO techniques, beamforming and multiplexing, is studied in the terahertz band. Our results reveal the surprising observation that MIMO multiplexing can be a better choice than MIMO beamforming under certain conditions in multiple THz bands. We believe that our findings will open the door to a new direction of research and development on the feasibility of communication in the mmWave and THz spectrum.
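
The high-frequency channel behaviour discussed above can be sketched with the usual spreading-plus-absorption path-loss form: 20·log10(4πdf/c) for free-space spreading plus an exp(k·d) molecular absorption term converted to dB. The absorption coefficients below are made-up illustrative values, not the molecular absorption data used in the thesis.

```python
import math

C = 299_792_458.0  # speed of light, m/s

def path_loss_db(freq_hz, dist_m, k_abs):
    """Total path loss (dB): free-space spreading plus molecular absorption.

    k_abs (1/m) is the medium's molecular absorption coefficient; the
    power attenuation exp(k_abs * d) is converted to dB via 10*log10(e).
    """
    spreading = 20 * math.log10(4 * math.pi * dist_m * freq_hz / C)
    absorption = 10 * math.log10(math.e) * k_abs * dist_m
    return spreading + absorption

# 60 GHz (near an oxygen resonance) vs 30 GHz over 100 m, illustrative k values.
loss_60 = path_loss_db(60e9, 100.0, k_abs=0.0035)  # near the O2 absorption peak
loss_30 = path_loss_db(30e9, 100.0, k_abs=0.0001)
```

Because k_abs varies with the air's molecular composition, the absorption term is exactly the part of the loss that fluctuates in space and time in the measurements above.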