Recovering container class types in C++ binaries

We present Tiara, a novel approach to recovering container classes in C++ binaries. Given a variable address in a C++ binary, Tiara first applies Tslice, a new type-relevant slicing algorithm equipped with a decay function, to obtain an inter-procedural forward slice of instructions, expressed as a CFG, that summarizes how the variable is used in the binary (our primary contribution). Tiara then makes use of a GCN (Graph Convolutional Network) to learn and predict the container type of the variable (our secondary contribution). According to our evaluation, Tiara advances the state of the art in inferring commonly used container types in a set of eight large real-world COTS C++ binaries, both efficiently (in terms of overall analysis time) and effectively (in terms of precision, recall, and F1 score).


I. INTRODUCTION
Binary type inference aims to recognize typed variables from untyped memory locations in binary executables [1]. This has many applications, such as binary code reuse [2], [3], reverse engineering [4], [5], vulnerability detection [4], [6], and memory forensics [7]. For most COTS binaries, neither source code nor debugging information is available. With binary type inference, it is possible, albeit extremely challenging, to recover semantic information from their binaries.

Problem Statement. We address the problem of statically recovering the container class type of a given variable address in a C++ binary. For C++ programs, the C++ STL provides a set of template classes for implementing standard data structures such as linked lists, vectors, and maps. C++ templates support compile-time polymorphism instead of runtime polymorphism, enabling C++ programs to reuse template classes without the run-time performance overhead (incurred for resolving virtual functions). As a result, templates are widely used in implementing container classes in modern C++ programs [8]. Discovering container classes in binaries leads to a better understanding of C++ executables.

Prior Work. To the best of our knowledge, there is no prior work on statically recovering container classes in C++ binaries. Some past efforts identify certain data structures in C/C++ binaries dynamically [9]-[12]. Some other past efforts infer ordinary classes in C++ binaries statically [13]-[19], but they are inapplicable in our setting, since they rely on virtual function tables (for polymorphic classes), member functions, the "this" pointer, and data sizes.

The last two authors are the corresponding authors of this paper.

Fig. 1. A code snippet declaring std::list<int> l and std::vector<int> v, together with the (inlined and interleaved) x86 instructions generated for l.push_back(10) and v.push_back(20).

OOANALYZER [20], which solves a different type inference problem for C++ binaries, distinguishes different ordinary classes by solving constraints in terms of code usage patterns. In the presence of container classes, which have different type-dependent instantiations, OOANALYZER can only discover that these instantiations represent different but unknown classes, whereas TIARA recognizes, for example, which are std::list and which are std::vector. For C binaries, DEBIN [21] represents the state of the art, but it focuses on identifying primitive types (by applying machine learning techniques). As for compound types, DEBIN can only classify all the different struct types as one non-primitive type.
Challenges. There are several challenges in statically inferring container classes in C++ binaries. First, C++ containers embrace compile-time polymorphism (without resorting to virtual member functions). Therefore, it is not possible to identify container classes by looking for their virtual function tables (VFTs) in binaries, as is done for polymorphic classes [14], [17]. Second, the C++ compiler often applies function inlining to improve performance. This has two consequences: (1) multiple copies of a member function may co-exist in the binary, and (2) the instructions of the functions from different container classes may be interleaved. In Figure 1, l.push_back() and v.push_back() are inlined, with their inlined code sequences mixed together. Therefore, it is difficult to identify container classes based on their member functions and the "this" pointer, as for polymorphic classes [15]. Third, for a variable of a container type T, the data size of T depends on its type parameters. If the type is T* instead, its data size is fixed but reveals no size information about T. Thus, it is also difficult to identify the container type of a variable from its data size. Finally, for large COTS C++ binaries such as clang (57MB stripped) containing more than 100K variable addresses, inferring their container classes efficiently and accurately is non-trivial.

Fig. 2. An example illustrating how TIARA predicts the container type of an address v0, which is actually the address of variable l of type std::list in Figure 1. The instructions in gray are translated from l.push_back(10) and the remaining ones from v.push_back(20) in Figure 1. The instructions marked with a T in (a), i.e., those in (b), are in the slice of v0 found.
Our Solution. We present TIARA, a novel approach to statically recovering the container types of variables in stripped COTS C++ binaries. Our key insight is that variables of different container types exhibit different behaviors in terms of how they are used. For example, std::vector and std::list share an identically named member function, push_back(), but the former performs heap reallocation (via malloc() and free()) internally while the latter performs only allocation (via malloc()). Thus, the variables of these two types can be distinguished by exploiting such use-related features.
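The use-related distinction above can be sketched in a few lines. This is a hypothetical illustration (the function name and trace format are ours, not TIARA's): a std::vector push_back that outgrows its capacity reallocates (malloc plus free of the old buffer), while std::list::push_back only allocates a new node.

```python
def classify_by_alloc_trace(calls):
    """Classify a container by the runtime-library calls seen in its slice.

    calls: a sequence of call names observed for one variable's slice.
    """
    if "malloc" in calls and "free" in calls:
        return "vector-like"   # grow-and-copy reallocation pattern
    if "malloc" in calls:
        return "list-like"     # per-node allocation only
    return "unknown"

# l.push_back(10) on a std::list: one node allocation.
assert classify_by_alloc_trace(["malloc"]) == "list-like"
# v.push_back(20) on a full std::vector: grow, copy, release the old buffer.
assert classify_by_alloc_trace(["malloc", "free"]) == "vector-like"
```

In TIARA these allocation behaviors are not matched by hand as above; they surface as features of the sliced instructions that the learned classifier picks up.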
Therefore, TIARA infers the container type of an address representing a variable in a binary in two stages. In the first stage (our primary contribution), we apply TSLICE, a new type-relevant slicing algorithm equipped with a decay function, to obtain, context-sensitively, an inter-procedural forward slice of instructions, expressed as a CFG, that summarizes how the variable is used. For large binary programs, traditional slicing algorithms are unscalable or "very imprecise, often including essentially the entire program" [22]. Thus, TSLICE is designed to find a small yet relevant slice that captures where and how the variable is used, both efficiently (0.2 seconds per slice, on average) and effectively (with 50 instructions per slice, on average, which are sufficient to identify its container type in most cases). In the second stage (our secondary contribution), we make use of a GCN (Graph Convolutional Network) [23], [24] to predict the container type for the given variable (based on a learned classifier).

Contributions.
TIARA (https://sites.google.com/view/tiara-tool) represents a general-purpose approach to statically recovering the container classes in C++ binaries. TIARA is expected to benefit many of the other binary type inference tasks mentioned earlier.
1) We present TIARA, a novel type inference approach to recovering container types in COTS C++ binaries.
2) We offer a type-relevant slicer, TSLICE, which extracts just enough relevant instructions for a variable in a C++ binary to characterize its container type.
3) We design a GCN-based classifier to learn and predict the container types of variables in C++ binaries, where data labeling is fully automatic.
4) We show that TIARA advances the state of the art in inferring STL container types in eight real-world COTS C++ binaries efficiently (in terms of analysis times) and effectively (in terms of precision, recall, and F1 score).

The rest of the paper is organized as follows. Section II motivates TIARA with an example. Section III introduces TIARA. Section IV evaluates TIARA. Section V discusses the related work. Finally, Section VI concludes the paper.

II. MOTIVATION
We illustrate the basic idea behind TIARA by using the binary program in Figure 2, translated from the code snippet in Figure 1. Given v0 = 074404h, which represents the address of variable l of type std::list in the source code, we describe how TIARA predicts its type at the binary level. TIARA works by first finding a type-relevant slice starting from v0, as shown in Figure 2(a) (Section II-A), and then using a machine-learning-based type classifier to infer its type, as shown in Figure 2(b) (Section II-B). In this work, we focus on inferring container types and will thus treat all the primitive types as one single primitive type (indiscriminately).

A. Type-Relevant Slicing
We explain first how TSLICE works and then how we address the efficiency and precision challenges in slicing binary code to recover the container types in COTS C++ binaries.
In Figure 2, the instructions are translated from Figure 1, where l.push_back(10) and v.push_back(20) are inlined and interleaved. I0-I14 and I16-I19 are from l.push_back(10) and the rest from v.push_back(20).

Given v0 as a slicing criterion, Figure 2(a) illustrates the slicing process. Column "Disassembly" lists the instructions in assembly format. Column "Tracing" shows the dependence-based analysis for each instruction along the control flow, attempting to establish its dependence on v0 context-sensitively. Column "Rules" gives the rules used. Column "Faith" indicates the amount of faith we have in an instruction being v0-dependent (calculated by a decay function along the control flow). Finally, Column "Dep" indicates whether an instruction actually depends on v0 or not.
Starting from v0, TSLICE computes context-sensitively an inter-procedural forward slice of v0, S_v0, by searching for the instructions that depend on v0, i.e., that operate on values derived from v0. TSLICE starts from I0, the first instruction operating on v0 = 074404h, and searches forwards for v0-dependent instructions. As I0 moves *v0 into register esi, we can assert that (1) I0 depends on v0, and (2) any future instruction that uses esi will also depend on v0 (if esi is not killed).
where c is a constant), such as I4, I9, and I14, by a v0-dependent but unknown value, (other, *), and (3) using a decay function to decrease the faith, i.e., the likelihood, of an instruction's dependence on v0 along the control flow.
For C++ container types, our key insight is that the slice S_v0 containing the v0-dependent instructions, captured via the register and stack dependences according to our inference rules, is sufficient to predict its type in most cases (Section IV). This is in sharp contrast with the state-of-the-art (sound) slicing algorithms for binary code, which are either unscalable or imprecise [22] (Section I).
As a result, TSLICE is both efficient (finding type-relevant slices in seconds each) and effective (enabling the C++ container types of variables to be predicted accurately). To the best of our knowledge, this is the first paper on recovering C++ container types in real-world stripped COTS C++ binaries.

B. Type Classification
Once a slice S_v0 (represented as a CFG) has been found for v0, TIARA makes use of a GCN-based classifier (for the first time), pre-trained with automatically labeled data, to predict the type of v0, as illustrated in Figure 2(b).

Fig. 3. Overview of TIARA: a type-relevant slicing stage followed by a type classification stage, with the classifier trained on labeled training binaries.

III. DESIGN OF TIARA
TIARA proceeds in two stages (Figure 3). In the type-relevant slicing stage (Section III-A), an inter-procedural forward slice for a variable address is found context-sensitively. In the type classification stage (Section III-B), a GCN-based classifier is used to predict the type of the given variable.

A. Type-Relevant Slicing
To infer the type of a variable at a given address v0 in a C++ binary program, TIARA constructs a small forward slice that reflects the behavior of v0, using v0 as the slicing criterion. To reason about the data dependences in the binary program, TIARA disassembles it into an intermediate representation (IR). We introduce our slicing algorithm for a small language, where an instruction I takes one of the following forms:

I ::= mov opr1, opr2 | op⊕ opr1, opr2 | use opr1, ..., oprn
opr ::= c | loc | [loc]
loc ::= addr | addr + c
addr ::= r | m

A mov instruction moves a value from opr2 to opr1. An op⊕ instruction represents a binary arithmetic operation such as add or mul. For example, "op+ opr1, opr2" adds opr1 and opr2 and then stores (i.e., moves) the result into opr1. A use instruction represents any instruction (e.g., jmp) that reads the given operands without any side effect. An operand opr can be either a constant c, a reference to a location loc, or an indirect reference to loc (written [loc]). A location loc may be an address addr, or an addr with an offset c. Finally, an address addr may denote either a register r or a memory address m.
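The small language above can be encoded directly as data types. This is a minimal sketch under our own naming (the classes and field names are illustrative, not TIARA's implementation):

```python
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class Loc:
    addr: str         # a register name (e.g. "fp") or a memory address
    offset: int = 0   # loc ::= addr | addr + c

@dataclass(frozen=True)
class Operand:
    kind: str                  # "const" | "ref" | "indirect"
    value: Union[int, Loc]     # opr ::= c | loc | [loc]

@dataclass(frozen=True)
class Instr:
    op: str                    # "mov" | "op+" | "op*" | ... | "use"
    operands: tuple

# mov esi, [074404h] -- load *v0 into register esi (cf. I0 in Figure 2)
i0 = Instr("mov", (Operand("ref", Loc("esi")),
                   Operand("indirect", Loc("074404h"))))
assert i0.operands[1].kind == "indirect"
```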
In our language, function calls are represented implicitly. A call instruction can be modeled simply as a push followed by a use (i.e., jmp), while a return instruction is modeled as a pop followed by a use (i.e., jmp). As a result, we can use one single CFG, G = (I, E), to represent a binary program, where I = {I0, I1, ..., In} is the set of instructions and E ⊆ I × I is the set of edges. If (Ip, Is) ∈ E, Is may be executed right after Ip. Let I0 be the entry of the program. For a variable v0, TSLICE aims to find a forward slice, i.e., a subset S_v0 ⊆ I.
TSLICE builds S_v0, starting from I0 (as any instruction may operate on v0), by finding the v0-dependent instructions along the control flow through reasoning about data dependences. To keep S_v0 small, TSLICE keeps track of only approximately the v0-dependent heap values that are obtained by arithmetic operations on *(v0 + c), where c is a constant. As the majority of template-related values (such as iterators) are already allocated in registers and on the call stack by modern C++ compilers, S_v0 will usually still contain the relevant instructions needed to deduce the type of v0.

Algorithm 1: Finding a forward slice. Starting from I0, CompDependences(pre, i) is invoked for each reachable instruction i; it returns immediately once F(pre) has decayed to 0, and otherwise updates (V(i), S(i), D(i)) by the rules in Figure 4 before recursing into i's successors. The decay used is Decay(i) = 0.01 if i is in an indirect addressing mode, 0.005 if i is a push or pop, and 0.001 otherwise.
We make use of four functions to reason about data dependences. The function V : I → (R → 2^A) records all possible values in a register after an instruction i ∈ I has been executed, where R is the set of registers and A = {ptr, ref, const} × Z ∪ {(other, *)} denotes the set of all possible values that TSLICE may care about. If (ptr, c) ∈ V(i)(r), register r may contain a pointer to (v0 + c) after i has been executed. Similarly, (ref, c) means that r may contain the value stored at v0 + c (i.e., *(v0 + c)), (const, c) represents a constant c, and (other, *) denotes a v0-dependent but unknown value. The function S : I → (Z → 2^A) gives the set of stack values at an offset z ∈ Z from fp ∈ R after an instruction i ∈ I has been executed. In order to track the inter-procedural data flow, TSLICE monitors the frame pointer register fp ∈ R and the stack pointer register sp ∈ R (i.e., ebp and esp on x86) to update S. The function D : I → {true, false} indicates whether an instruction is data-dependent on v0. The function F : I → [0, 1] estimates the faith, i.e., the likelihood of an instruction i's dependence on v0. If F(i) = 1, TSLICE is fully confident that i depends on v0. If F(i) = 0, TSLICE believes that i does not depend on v0 at all. When computing F(i), TSLICE may opt to decay F(i), as it only needs to find a decent number of instructions to capture the behavior of v0.
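The abstract value domain A and the register map V can be illustrated with a small sketch. All names and the two update helpers below are ours, for illustration only; they mirror the weak/strong update distinction described later for fp/sp versus other registers:

```python
# Abstract value tags from A = {ptr, ref, const} x Z ∪ {(other, *)}.
PTR, REF, CONST, OTHER = "ptr", "ref", "const", "other"

def weak_update(env, key, vals):
    """Ordinary registers are weakly updated: merge new values with old."""
    env.setdefault(key, set()).update(vals)

def strong_update(env, key, vals):
    """fp and sp are strongly updated: old values are discarded."""
    env[key] = set(vals)

V = {}                                 # V(i): register -> set of abstract values
weak_update(V, "esi", {(REF, 0)})      # esi may hold *(v0 + 0) after I0
weak_update(V, "esi", {(OTHER, "*")})  # later, a v0-dependent unknown value
strong_update(V, "fp", {(CONST, 0)})   # fp: old contents replaced

assert V["esi"] == {(REF, 0), (OTHER, "*")}
assert V["fp"] == {(CONST, 0)}
```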
As depicted in Algorithm 1, TSLICE starts from I0 (lines 1-5) and invokes CompDependences() recursively to update V, S, D, and F for each reachable instruction (line 6). After all the dependent instructions have been found, S_v0 is obtained as desired (line 7).

Fig. 4. Rules for updating V(i), S(i), and D(i) based on V(pre), S(pre), and D(pre) at instruction i.

When CompDependences() is called, i is an instruction and pre is one of its predecessors. If F(pre) (the faith of pre) has decayed to 0, CompDependences() simply returns (line 8). Otherwise, V(i), S(i), and D(i) are updated according to the rules in Figure 4 (line 9) and F(i) is updated (line 10), as explained below. If V, S, or D has changed after i has been analyzed, CompDependences() is invoked recursively to update the dependences for each successor of i (line 12). Otherwise, CompDependences() returns (line 11).
In TSLICE, we use the faith function F and a decay function, Decay : I → (0, 1], to find a small slice with relevant instructions quickly. Initially, if an instruction i is found to depend on v0, we are fully confident about the dependence, since F(i) = 1 (line 4). Every time we descend to i from one of its predecessor instructions, pre, F(i) is decayed according to line 10 to ensure that F(i) drops monotonically (since Decay(i) > 0) and quickly (due to the use of min). As a result, no successor instruction of pre will ever be visited once F(pre) = 0 (line 8), since we are then fully confident about its independence of v0. To define Decay, we use a linear decay function (line 5). When an instruction i is visited (line 10), we decrease our confidence about its dependence on v0 by 0.001, in general. However, the decrement is 0.005 for each push or pop instruction and 0.01 for each instruction in an indirect addressing mode, as we become less and less confident about the dependence of i on v0. These heuristically tuned parameters work well in practice (Section IV). Of course, other more sophisticated decay functions can also be used.
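The decay mechanism can be sketched as follows. This is one plausible reading of line 10 of Algorithm 1 (the exact update formula is our assumption): F(i) is first capped by min(F(i), F(pre)) and then decremented by Decay(i); the constants are the paper's.

```python
def decay(i):
    """Linear decay per visited instruction (constants from the paper)."""
    if i.get("indirect"):
        return 0.01
    if i.get("op") in ("push", "pop"):
        return 0.005
    return 0.001

def update_faith(F, pre, i):
    """Decay i's faith when descending from predecessor pre (our reading)."""
    cur = F.get(i["id"], 1.0)
    F[i["id"]] = max(0.0, min(cur, F[pre["id"]]) - decay(i))

F = {"I0": 1.0}
push = {"id": "I1", "op": "push"}
update_faith(F, {"id": "I0"}, push)   # faith drops by 0.005 for a push
assert abs(F["I1"] - 0.995) < 1e-9
```

Because Decay(i) > 0, faith drops strictly along every path, so the traversal from any slicing criterion terminates after a bounded number of instructions, which is what keeps each slice small.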
Let us explain the notation used in our inference rules (Figure 4) for updating (V(i), S(i), D(i)) based on a given (V(pre), S(pre), D(pre)). Given an evaluation environment Γ = (V(pre), S(pre), D(pre), V(i), S(i), D(i)) and an instruction i, each rule gives the updated (V(i), S(i), D(i)) in its conclusion under its premises. Given a function F ∈ {V(i), S(i), D(i)}, F[x → n] represents the same function F except that F(x) has been updated to F(x) = n. In addition, we also make use of an auxiliary function to test whether i depends on v0. By convention, sp (fp) is the stack (frame) pointer register.
Let us examine the rules given in Figure 4, which are applied in line 9 of Algorithm 1. Two points are in order.
• We distinguish fp and sp from every other register r ∉ {fp, sp} so that V(i)(fp) and V(i)(sp) are always strongly updated and V(i)(r) is always weakly updated, except in [MOV-RV-KILL], [MOV-RIV-KILL], and [MOV-RC-KILL]. This design decision allows us to keep track of only one stack frame instead of multiple stack frames when analyzing a call, achieving efficiency at a slight loss of precision for the slice obtained. In practice, most COTS C++ programs are compiled without turning on the so-called frame-pointer omission flag, which can be checked easily. Consider the x86 binaries produced by the Microsoft Visual C++ compiler for Windows. If a function's prologue and epilogue are of the form "push fp; mov fp, sp" and "leave; ret", respectively, then its /Oy (frame-pointer omission) flag is off. If we see something like "sub sp, ..." and "add sp, ...; ret" instead, then /Oy is on. When /Oy is on (causing fp to be used as a general register), we can modify each rule by simply changing r ∈ {fp, sp} (if it exists) to r ∈ {sp} in its premises.
• When reaching a call instruction (flagged by IDA Pro [25], as discussed in Section IV), we record the address of its ensuing instruction as a return address. As discussed earlier, we handle a call as a push and a use (i.e., jmp) in our formalism. When reaching a return instruction (modeled as a pop and a use, i.e., jmp) while analyzing the callee, we continue with the instruction marked by the previously recorded return address. Thus, TSLICE finds an inter-procedural slice context-sensitively.
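The prologue/epilogue check for frame-pointer omission described above can be sketched as a simple pattern match, assuming the standard MSVC x86 prologue "push fp; mov fp, sp" and "leave; ret" epilogue (the function name and instruction spellings are ours, for illustration):

```python
def frame_pointer_omitted(prologue, epilogue):
    """Return True if /Oy appears to be on, i.e., no frame pointer is set up.

    prologue, epilogue: lists of normalized instruction strings.
    """
    uses_fp = (prologue[:2] == ["push fp", "mov fp, sp"]
               and epilogue[-2:] == ["leave", "ret"])
    return not uses_fp

# Standard frame setup: /Oy is off, so fp can be tracked as usual.
assert frame_pointer_omitted(["push fp", "mov fp, sp"], ["leave", "ret"]) is False
# Stack adjusted directly: /Oy is on, fp behaves as a general register.
assert frame_pointer_omitted(["sub sp, 20h"], ["add sp, 20h", "ret"]) is True
```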
We now go through our four groups of rules, [MOV-*], [OP-*], [STK-*], and [USE-*], for handling the four types of instructions in our language. In our rules, S(i) is only updated in [MOV-SR], [OP-SR], and the [STK-*] rules; D(i) is updated similarly in the majority of the rules, where i is checked to see whether it has now become dependent on v0 (if it was not before); and V(i) is also updated in the majority of the rules, but differently, depending on the nature of the instruction i being analyzed. Therefore, we naturally focus below on describing how V(i) is updated (i.e., how each register is updated).
• mov opr1, opr2: The weak-update rule for a constant move records that a register r ∉ {fp, sp} may now also contain a constant c. [MOV-RC-1] behaves similarly for r ∈ {fp, sp} except that a strong update to V(i)(r) is performed. [MOV-FP] handles an assignment of sp to fp while [MOV-SP] handles an assignment of fp to sp; in both cases, a strong update is performed. [MOV-DR] says that an instruction that writes into a v0-dependent address depends on v0. Finally, [MOV-RV-KILL], [MOV-RIV-KILL], and [MOV-RC-KILL] perform a strong update to r ∈ {sp, fp} as described.
• op⊕ opr1, opr2: The [OP-*] rules handle arithmetic instructions. There is no rule for mov [fp + c'], c, as it can be modeled as a sequence of two instructions, mov r, c followed by mov [fp + c'], r.

B. Type Classification
Given a slice S_v0, we have designed a GCN-based classifier to infer its type. In Section III-B1, we describe how to encode each node (i.e., instruction) in a slice with a feature vector in order to obtain a feature-vector representation of the entire slice. In Section III-B2, we introduce the GCN-based classifier.
1) Encoding Instructions with Feature Vectors: To apply a GCN [23], [24] to turn a slice into a graph representation characterized by a feature vector, we first encode each node in the slice, which contains an instruction i, as a 42-dimensional feature vector, according to seven features, the first of which involve the instruction's opcode. Since opcodes with similar semantics are assigned close indices (e.g., push/pushaw/pusha are assigned 143/144/145), our opcode representation is preferred.
• F3_i (1st operand) and F4_i (2nd operand): A 13-bit one-hot encoding of the operand type of each such operand in i. There are 13 operand types: nil operand, register, direct memory reference, memory reference with base and index registers, memory reference with base and index registers plus a displacement, immediate value, immediate far address, immediate near address, and five additional processor-specific types provided by IDA Pro [25].
• F5_i: Whether i calls a heap allocation function (e.g., malloc()), directly or indirectly (along a call chain), or not.
• F6_i: Whether i calls a heap free function (e.g., free()) or not.
• F7_i: The number of levels of pointer indirection for using v0 in i, represented by an integer (as i depends on v0).

Figure 5 illustrates how a call instruction (identified by IDA Pro [25], as discussed in Section IV) is encoded.
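The 13-bit one-hot encoding of an operand type (features F3/F4 above) can be sketched as follows. The 13 category names follow IDA Pro's operand types as listed above, but their exact ordering is our assumption:

```python
# Operand categories (order assumed): eight named types plus five
# additional processor-specific types provided by the disassembler.
OPERAND_TYPES = [
    "nil", "register", "mem_direct", "mem_base_index",
    "mem_base_index_disp", "imm", "imm_far", "imm_near",
    "spec0", "spec1", "spec2", "spec3", "spec4",
]

def one_hot_operand(t):
    """Encode one operand type as a 13-bit one-hot vector."""
    vec = [0] * len(OPERAND_TYPES)
    vec[OPERAND_TYPES.index(t)] = 1
    return vec

v = one_hot_operand("register")
assert len(v) == 13 and sum(v) == 1 and v[1] == 1
```

Two such 13-bit vectors (one per operand) account for 26 of the 42 dimensions; the remaining features fill out the rest of the vector.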
2) A GCN-Based Classifier: We first describe how a GCN-based classifier is obtained during training. We then discuss how to use it to predict the type of a variable.
Training. Let T = {t1, t2, ..., tn, t_primitive} be the set of types to recover from C++ binaries, where t1, t2, ..., tn are candidate container types and t_primitive represents all the possible primitive types (which are not distinguished, as discussed in Section I). Let G = {G1, G2, ..., Gm} be a set of slices, with each slice of a variable address labeled by a unique type t ∈ T, implying that the variable is of type t or a pointer to t (with one or more levels of indirection). Our objective is to learn a classifier to predict the type of each G_i.
We have designed a GCN [23] to learn a graph representation h_G of a slice G in a message-passing manner. Each node v ∈ G starts with X_v encoded as per Section III-B1:

h_v^(0) = X_v

We then perform k iterations of aggregation along the edges in G. We update the representation of a node by aggregating its representation with those of its predecessor neighboring nodes, using an element-wise mean pooling mechanism:

h_v^(k) = σ( W_k · MEAN( {h_v^(k-1)} ∪ {h_u^(k-1) : u ∈ N(v)} ) )

where N(v) gives the set of predecessor nodes of v in G.
At the end of the k-th iteration, we use a simple readout function to obtain (with V being the set of nodes in G):

h_G = MEAN( {h_v^(k) : v ∈ V} )

Finally, we connect the output h_G to a linear transformation layer and apply softmax to estimate the probabilities of G having the different types in T. Thus, we can choose the predicted type, ŷ_G, as the one with the largest probability:

ŷ_G = argmax_{t ∈ T} softmax(W_L · h_G)

Note that W_L above and W_k used in the aggregation step are the parameters of the prediction model learned during the training process.
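The mean-pooling message passing and readout described above can be sketched in numpy. This is a minimal illustration only: the weight shapes, the ReLU nonlinearity, and the random inputs are our assumptions, not TIARA's actual DGL/PyTorch implementation:

```python
import numpy as np

def gcn_layer(H, preds, W):
    """One aggregation step: H is a |V| x d node-feature matrix and
    preds[v] lists the predecessor node indices of v in the slice CFG."""
    out = np.zeros_like(H @ W)
    for v in range(H.shape[0]):
        # Mean-pool v's own representation with its predecessors' ones.
        msgs = np.vstack([H[v]] + [H[u] for u in preds[v]])
        out[v] = np.maximum(0.0, msgs.mean(axis=0) @ W)  # ReLU
    return out

def readout_and_predict(H, W_L):
    """Mean readout over all nodes, then a linear layer plus softmax."""
    h_G = H.mean(axis=0)
    logits = h_G @ W_L
    p = np.exp(logits - logits.max())
    p /= p.sum()
    return int(p.argmax())  # index of the predicted type label

rng = np.random.default_rng(0)
H = rng.normal(size=(3, 4))           # a tiny slice: 3 nodes, 4-dim features
preds = {0: [], 1: [0], 2: [0, 1]}    # predecessor sets N(v)
H = gcn_layer(H, preds, rng.normal(size=(4, 4)))
y = readout_and_predict(H, rng.normal(size=(4, 4)))  # 4 type labels in T
assert 0 <= y < 4
```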
Prediction.Given a variable address, the slice found by TSLICE will be fed into our classifier to predict its type.

IV. EVALUATION
TIARA is the first tool to infer container types in COTS C++ binaries (Section I). Due to the lack of earlier tools to compare with, we focus on demonstrating that TIARA is effective in recovering container types by answering four RQs:
• RQ1: Is TIARA effective in identifying container types when training and testing are restricted to the same project?
• RQ2: Is TIARA effective in identifying container types when training and testing are performed on different projects?
• RQ3: Is TIARA more effective than a version of TIARA that uses a simple slicer (given the lack of an open-source slicer for C++ binaries)?
• RQ4: Is TIARA efficient in slicing and training?

COTS C++ Binaries. We consider binaries in the Microsoft Portable Executable (PE) format targeting x86, generated by the Microsoft Visual C++ 15 2017 toolchain (abbreviated to MSVC). For a binary, we use IDA Pro [25] to disassemble it and find the information required by TIARA. In type-relevant slicing, we find its entry point I0 as needed. For type classification, we need to encode each instruction as a feature vector (Section III-B1). We use IDA Pro to find the functions calling malloc() and free() (possibly indirectly). When it fails to provide the information for a particular feature of an instruction, the default value 0 is used.
Benchmarks. We consider eight programs from eight projects (Table I), most of which include more than one executable. We have selected only one program from each project to prevent code duplication due to, e.g., static linking. We have compiled these programs with the "release mode" settings, enabling the most aggressive optimization (/O2), to simulate how COTS programs are compiled when released.
For the eight programs selected, Table I gives their binary sizes and the numbers of variable addresses having type t or a pointer type to t (with one or more levels of indirection), where t is one of the four types considered: std::list, std::vector, std::map, and primitive (representing all possible primitive types). We have selected std::list, std::vector, and std::map since they are, respectively, representatives of non-contiguous sequential, contiguous sequential, and associative containers, the three common STL container categories in C++ programs. The first seven are popular C++ programs from GitHub. However, std::list is not as frequently used as the other two, for the reasons explained by Stroustrup [26]. As the first seven programs contain relatively few variables of type std::list, we have added an eighth program, called list_extension, which contains a few list-related code snippets taken directly from Microsoft's documentation [27], increasing the number of std::list-related variables by 33%.
For a binary, we find the addresses representing variables by using the Microsoft Debug Interface Access SDK [28]. We then apply TIARA to predict their types. For a COTS binary without debugging information, its variable addresses must be detected orthogonally. However, finding such addresses is much less challenging than finding their types [29].
To summarize, we use std::vector, std::list, and std::map to measure the effectiveness of TIARA. T = {t_list, t_vector, t_map, t_primitive} is the set of type labels used.
Training. The GCN used in TIARA is made up of two graph convolutional layers of size 64 each. The GCN is implemented with the Deep Graph Library [30], using PyTorch [31] as the back-end. We train it using the Adam optimizer [32] with the cross-entropy loss function, a learning rate of 0.001, and 300 epochs. In TIARA, data labeling is fully automatic: when labeling the slice constructed for a variable, we use the Microsoft Debug Interface Access SDK [28] to find its type automatically.
When evaluating TIARA in recovering container types within a project (RQ1) and across the projects (RQ2), we will discuss how their training and testing programs are selected.
Metrics. We evaluate TIARA by considering three metrics: precision, recall, and F1 score. Precision is the percentage of variables with a correctly inferred type among all the variables inferred to have that type. Recall is the percentage of variables with a correctly inferred type among all the variables that actually have that type. Finally, the F1 score is the harmonic mean of precision and recall. We evaluate the efficiency of TIARA by considering its slicing and training times.
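The per-type metrics just defined can be computed as follows (a small self-contained sketch; the example labels are illustrative):

```python
def metrics(predicted, actual, label):
    """Precision, recall, and F1 for one type label over paired lists of
    predicted and actual types."""
    tp = sum(p == label and a == label for p, a in zip(predicted, actual))
    fp = sum(p == label and a != label for p, a in zip(predicted, actual))
    fn = sum(p != label and a == label for p, a in zip(predicted, actual))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

pred = ["list", "vector", "list", "map"]
gold = ["list", "list", "list", "map"]
p, r, f1 = metrics(pred, gold, "list")
# 2 of 2 "list" predictions are right; 2 of 3 actual "list"s are found.
assert p == 1.0 and abs(r - 2/3) < 1e-9 and abs(f1 - 0.8) < 1e-9
```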
Computing Platforms. TIARA has two stages (Figure 3). Its type-relevant slicing stage (TSLICE) runs on a Windows 10 desktop with an Intel Core i9-10900X CPU at 3.70 GHz and 64GB of memory. Its type classification stage runs on an Ubuntu server with two Intel Xeon CPUs at 2.6 GHz and 128GB of memory, accelerated by a 16GB Tesla P100 GPU.
Results. Table II gives the results for addressing RQ1-RQ3 (with each row representing one independent experiment, to be explained when we discuss these RQs). In addition, Table III contains some additional results for addressing RQ3 only. Finally, Table IV gives the results for addressing RQ4.

A. RQ1: Intra-Project Type Prediction
To address RQ1, we report and analyze the results of five independent experiments listed in the five rows marked I1a-I5a in Table II. In each experiment, the project(s) considered are given, and the data, given as a set of variable addresses with their associated types in T = {t_list, t_vector, t_map, t_primitive} (Table I), is divided into a training set and a testing set. The ratio of training to testing samples is 4:1 (with both randomly selected).
According to the results reported in rows I1a-I5a of Table II, TIARA is highly effective, achieving a precision ranging from 0.86 to 1.00 (0.94 on average), a recall ranging from 0.79 to 1.00 (0.89 on average), and an F1 score ranging from 0.88 to 0.98 (0.91 on average) across all the variables of the four types in the five experiments. TIARA's effectiveness is also revealed by the average precision, recall, and F1 score both across the four types for an experiment and across the five experiments for a given type.
Several observations are in order. First, TIARA identifies all three container types equally well (as reflected by the average precision, recall, and F1 score for each container type across the five experiments). Second, TIARA achieves a precision or recall of 1.0 in five cases where the corresponding samples are small (Table I). This explains why TIARA is more effective in I4a-I5a than in I1a-I3a for std::list. Third, TIARA recovers primitive types well, due to (1) the relatively large number of samples available for primitives (Table I), (2) the relatively smaller slices found (Table III), and (3) the fact that different primitive types are not distinguished. Finally, TIARA loses some precision because the compiler may make variables of different types whose scopes do not overlap share the same stack slot, i.e., the same binary address.

B. RQ2: Cross-Project Type Prediction
To address RQ2, we report and analyze the results of four independent experiments, listed in the four rows marked C6a-C9a in Table II. In each experiment, we simulate real-world inference scenarios by training on one set of projects and testing on all the remaining ones in our benchmark suite (Table I). For example, in row C7a, clang is the testing project and all-clang indicates that all the remaining projects are used as the training programs.
According to the results in Table II, TIARA is effective, achieving a precision ranging from 0.61 to 0.95 (average 0.84), a recall ranging from 0.59 to 0.97 (average 0.79), and an F1 score ranging from 0.60 to 0.95 (average 0.80) across all the variables of the four types in the four experiments.
When comparing C6a-C9a (cross-project type recovery) with I1a-I5a (intra-project type recovery) in Table II, we find that TIARA is only slightly less effective, with the average precision, recall and F1 score dropping from 0.94, 0.89 and 0.91 to 0.84, 0.78 and 0.80, respectively (calculated again across all the four types in all their respective experiments). This slight degradation is expected since, for example, different coding styles and conventions across projects lead to different program behaviors in their binaries. Given this, TIARA can be considered effective in recovering container types in real-world COTS binaries.

C. RQ3: Comparing with the State of the Art
To the best of our knowledge, TIARA is the first tool for recovering container types in C++ binaries. To evaluate TIARA against the state of the art, we would ideally compare TSLICE with an existing inter-procedural binary slicer. However, we are not aware of any open-source tool for computing inter-procedural slices in COTS binaries. BEST [33], which targets PowerPC binaries, is designed to statically estimate the WCET (worst-case execution time) of a program and is thus limited to single-function programs.
Therefore, we have decided to compare TIARA with a variant, denoted TIARA_SSLICE, in which TSLICE has been replaced by a simple slicer named SSLICE. Given a variable address v_0, SSLICE produces a slice consisting of all the instructions in the function that contains the first access to v_0, together with all the instructions in its directly called functions. This comparison is reasonable, as existing scalable binary slicers are very imprecise, often producing a slice that includes nearly the entire program [22]. Table III compares the slices obtained by TSLICE and SSLICE in terms of their average node and edge counts. For each type in T = {t_list, t_vector, t_map, t_primitive}, the number of slices produced by each slicer equals the number of variable addresses in Table I. TSLICE is lightweight, producing one slice in 0.2 seconds on average. The slices found by TSLICE are one order (two orders) of magnitude smaller than those found by SSLICE for a container (primitive) type, on average. In Table II, we compare TIARA (rows I1a-I5a and C6a-C9a) with TIARA_SSLICE (rows I1b-I5b and C6b-C9b). Except for a few cases highlighted by red boxes, TIARA_SSLICE is substantially less effective than TIARA, resulting in significantly lower average precision, recall and F1 scores (both across the different types for an experiment and across the different experiments for a given type).
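SSLICE, as described in the text, can be sketched as follows. This is a minimal reconstruction from the description above; the function names and data layout are our own illustration, not the tool's actual interface.

```python
def sslice(v0_addr, func_of_addr, instrs_of, callees_of):
    """Naive slicer: return all instructions of the function containing
    the first access to v0, plus those of its directly called functions."""
    f = func_of_addr(v0_addr)        # function holding the first access to v0
    slice_ = list(instrs_of(f))
    for g in callees_of(f):          # one level of direct callees only
        slice_.extend(instrs_of(g))
    return slice_

# Toy program: main accesses v0 and directly calls push_back.
instrs = {"main": ["mov", "lea", "call push_back"], "push_back": ["mov", "ret"]}
callees = {"main": ["push_back"], "push_back": []}
s = sslice(0x71164, lambda a: "main", lambda f: instrs[f], lambda f: callees[f])
```

Because SSLICE takes every instruction in scope regardless of whether it touches v_0, its slices are far larger and noisier than those of TSLICE, which tracks only v_0-dependent instructions.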
These results show that, given a variable address, the slice produced by TSLICE, while substantially smaller than that from SSLICE (in general), still contains enough type-relevant instructions for characterizing its type (Figure 2). Consider the extreme but illuminating case reported in rows C7a and C7b of Table II: inferring std::list for the 89 variables in clang by training with the 100 std::list-related samples in all-clang (Table I). As the ratio of training to testing samples is low, the impact of the imprecision of SSLICE on the effectiveness of TIARA_SSLICE is maximally exposed. TIARA_SSLICE failed to make any correct prediction, classifying the 89 variables in clang as follows: 53 as std::vector, 3 as std::map, and 33 as primitives. In contrast, TIARA achieved a precision, recall and F1 score of 0.85, 0.73 and 0.79, respectively, correctly predicting that 65 of the 89 variables are typed std::list.

D. RQ4: Efficiency
As shown in Table IV, TIARA is highly efficient. The average slicing time per intra-project (cross-project) experiment is 2 hours (10 hours, identical for each cross-project experiment as all the programs are involved). The average training time per intra-project (cross-project) experiment is 5.1 (20.5) minutes. On the other hand, TIARA_SSLICE, which is substantially less effective in recovering container types than TIARA (Table II), is even slightly slower overall. Its average slicing time per intra-project (cross-project) experiment is 1.7 hours (8.6 hours). However, due to the larger slices produced, its average training time per intra-project (cross-project) experiment is much longer, reaching 11.4 (94.5) minutes.

V. RELATED WORK
We review both rule-based and machine-learning-based approaches to binary analysis in the literature. Rule-based Binary Type Inference. While there are many previous efforts [34]-[36] on inferring primitive types, recursive types, polymorphic types, variables, and function prototypes from binaries, we review only a few related ones on class type recovery. In C++ binaries, C++ classes may leave class-specific clues such as the "this" pointer and virtual function tables (VFTs). SmartDec [13] exploits the "this" pointer to construct class hierarchies from binaries. vfGuard [37] and VTint [38] reconstruct VFTs by exploiting the dynamic dispatch mechanism in C++. The run-time type information (RTTI) in C++ has also been leveraged [17] to find class hierarchies and member functions/variables. OBJDIGGER [15] recovers objects by tracking the "this" pointer, combining symbolic execution with inter-procedural data-flow analysis.
C++ templates support compile-time polymorphism instead of run-time polymorphism. Existing efforts [13]-[19] for recovering polymorphic classes in C++ binaries cannot be applied to recover container classes (Section I).
OOANALYZER [20] recovers ordinary classes from C++ binaries by using a Prolog-based reasoning system. It distinguishes class types by distinguishing their related methods. However, as each method of a container type has different type-dependent instantiations in binaries, OOANALYZER can only recognize these instantiations as belonging to different classes, without actually knowing what their types are.
TIARA (as proposed here) relies on a new type-relevant slicing algorithm for finding a small slice from a variable address in C++ binaries in order to predict its type. According to a recent study [22], the slices computed by state-of-the-art techniques for binaries are very imprecise. The only open-source slicer that we are aware of works only for PowerPC binaries and is limited to small single-function programs [33]. Machine-Learning-based Binary Analysis. Machine learning techniques are increasingly being used in reverse engineering and binary analysis. There are efforts on recovering partial source-level information from binaries, including functions [39]-[41], coding style and programmers' names [42], and the toolchains used [43]. EKLAVYA [44] utilizes an RNN to identify function signatures from binaries. These approaches tend to learn properties from blocks of binary code, while TIARA aims to predict the type for a single address.
Katz et al. [18] use a Markov model to predict types from their object tracelets. Katz et al. [19] propose a variable-order Markov model to recover class hierarchies from binaries. These techniques rely on the "this" pointer and VFTs to extract the function calls related to receiver objects.
DEBIN [21] focuses on predicting the primitive types of variables in COTS binaries compiled from C programs. It transforms binary programs into dependence graphs and trains a conditional random field (CRF) model on the graphs thus obtained. During inference, the model is used to assign types to unknown graph nodes so as to maximize the joint probability. DEBIN applies a global strategy to infer a variable's properties from its relationships with other variables. TypeMiner [45] recovers types in C binaries statically by relying on dependence analysis and a classification of execution traces of data objects. Like DEBIN, however, it focuses on recovering primitive types and does not handle template class types. TIARA, which focuses on inferring container types in C++ binaries, utilizes a local strategy to infer the type of a variable from its own behavior. TIARA is capable of achieving high accuracy with a small number of training binaries.

VI. CONCLUSION
In this paper, we introduce an effective approach, TIARA, for identifying container class types in COTS C++ binaries. TIARA consists of a new slicing algorithm for finding a type-relevant slice for a given variable in C++ binaries and a new GCN-based classifier that allows its container type to be predicted. Its type-relevant slicing algorithm can also be used as a stand-alone tool in detecting code clones [46], security vulnerabilities [4], [6], and software bugs [47], [48].
It collects the graphs from the three JSON files and saves them in out.json.
For the intra-program experiments, we split the JSON files into the training and testing parts. For example, to generate the training and test data for I2a, run the combine.py command given below. To evaluate all pre-trained models, simply run the shell script:

# /app/tiara-artifact/eval-all.sh

The DGL library may prompt to ask which backend to use. Type pytorch and hit enter to continue.

E. Evaluation and Expected Results
Evaluating this work by using the pre-trained models should reproduce all the experimental results given in Table II. Similar results are expected to be obtained if new models are trained.

Fig. 1. The disassembled code for the code snippet given.

(a) Type-relevant slicing (other is an unknown v_0-dependent value)

Fig. 5. An example of encoding a call instruction.

# combine.py cmake.json listExtension.json --split --trainout train.json --testout test.json

This will read the graphs from the two given JSON files and split them into the training and test data in two separate files. 5) Training and Testing: The following command trains a model using A.json, tests it with B.json, and generates a file named model.pt:

# train.py -t A.json -v B.json -m model.pt

To evaluate a model, run:

# eval.py -f X.json -m model.pt

Table 1: The process of slicing for the given std::list variable l in Figure ??. Instructions with gray background are identified as dependent on l.
At the end of its analysis, the top three elements in S will be removed (due to its three pop instructions, which match the three push instructions I_2, I_4 and I_5). When I_7 is analyzed, we record ecx → {(ref, 4)}, as I_7 also depends on v_0 (i.e., *(v_0 + 4)). I_9 and I_14 are each found to operate on a v_0-dependent but unknown value, obtained by an arithmetic instruction on *(v_0 + 4). I_17-I_19 serve to adjust the pointers in the underlying std::list to point to the new node created by the call to _Buynode() at I_6. As TSLICE keeps track of the dependences on v_0 only, I_17 is considered to be dependent on v_0 but I_18 and I_19 are not. Finally, S_v0 consists of the instructions given in {I_0, I_4-I_7, I_9-I_10, I_14, I_16, I_17}.
... v_0-dependent values in the heap (obtained by performing arithmetic operations on heap addresses of the form ...).

• mov instructions. All the mov instructions are handled by the 14 [MOV-*] rules. [MOV-RV] is simple: V(i) is updated to reflect the fact that r may now also contain a pointer to v_0 + c, and consequently, instruction i is now known to depend on v_0. [MOV-RIV] is similar except that r contains (ref, c), i.e., *(v_0 + c) (known as a reference to v_0). [MOV-RR] handles a register move instruction i of the form "mov r_1, r_2" by updating V(i)(r_1) with V(i)(r_2) and making i dependent on v_0 (if it is not already) as long as r_2 contains any v_0-dependent value. Consider [MOV-RI] now. If r_2 contains (ptr, c'), i.e., a pointer to v_0, r_1 is added with (ref, c + c'), i.e., a reference to v_0 after the update. If r_2 contains (ref, c'), i.e., a reference to v_0, r_1 is added with (other, *). If r_2 already contains (other, *), it is ignored to reduce the number of irrelevant instructions added to S_v0. [MOV-RS] and [MOV-SR] propagate the value information between a stack slot and a register r in either direction.

• op⊕ instructions. The binary arithmetic instructions are handled by the seven [OP-*] rules. [OP-RC] is similar to [MOV-RC] except that we must now account for the semantics of ⊕. Similarly, [OP-RC-1] is an analogue of [MOV-RC-1]. For an instruction of the form "op⊕ r_1, r_2", [OP-RR] updates V(i)(r_1) only when either r_1 or r_2 contains a constant (in order to keep S_v0 small and relevant); additionally, [OP-RREF] propagates a reference or (other, *) conservatively as (other, *) from r_2 to r_1. [OP-RI] is a (conservative) analogue of [MOV-RI], as it handles an op⊕ instead of a mov instruction. Finally, [OP-RS] ([OP-SR]) is an analogue of [MOV-RS] ([MOV-SR]), except that it is much more conservative. Let us examine [OP-RS] for handling "op⊕ r, [fp + c]". If [fp + c] contains a v_0-dependent value, then r will contain an over-approximation of that value as (other, *). If [fp + c] contains a constant (const, c), r will not be made to also contain (const, c' + c), even if r contains (const, c'), for two reasons. First, (const, c' + c) often results in no more relevant instructions being added to S_v0 than (const, c') already does. Second, tracking only relevant constants keeps TSLICE lightweight.

• push/pop r. These two stack-manipulating instructions are straightforward. [STK-PUSH] stores every pushed value at the top stack location and increases sp by 1. [STK-POP] handles the information flow in the opposite direction.

• use ... opr_k .... All the use instructions are handled by one single rule, [USE-DEP], which updates D(i) only.
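The flavor of these transfer rules can be sketched as follows. This is a simplified pure-Python illustration of [MOV-RR], [STK-PUSH] and [STK-POP] only, using our own data representation (abstract values as (kind, offset) pairs); the actual rules operate on the full abstract domain described above.

```python
def mov_rr(V, dep, i, r1, r2):
    """[MOV-RR]: "mov r1, r2" copies r2's abstract values into r1; the
    instruction i depends on v0 iff r2 holds any v0-dependent value."""
    V[r1] = set(V.get(r2, set()))
    if any(kind in ("ptr", "ref", "other") for kind, _ in V[r1]):
        dep.add(i)

def stk_push(stack, V, r):
    """[STK-PUSH]: store the pushed value at the top of the stack."""
    stack.append(set(V.get(r, set())))

def stk_pop(stack, V, r):
    """[STK-POP]: move the top-of-stack value back into register r."""
    V[r] = stack.pop() if stack else set()

# eax holds a pointer to v0; "mov ecx, eax" makes the mov depend on v0,
# and a push/pop pair routes the value through the abstract stack.
V, dep, stack = {"eax": {("ptr", 0)}}, set(), []
mov_rr(V, dep, "I1", "ecx", "eax")
stk_push(stack, V, "ecx")      # push ecx
stk_pop(stack, V, "edx")       # pop edx
```

After these three steps, ecx and edx both hold the abstract pointer to v_0, the stack is empty again, and the mov instruction has been recorded as v_0-dependent.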

TABLE I: BENCHMARK STATISTICS FOR THE COTS BINARIES (WITH THE NUMBER OF VARIABLE ADDRESSES OF A PARTICULAR TYPE GIVEN).

TABLE II: EXPERIMENTAL RESULTS FOR RQ1-RQ3. FOR ALL METRICS, LARGER IS BETTER.

TABLE III: AVERAGE SLICE SIZES PRODUCED BY TSLICE AND SSLICE.

TABLE IV: THE EFFICIENCY OF TIARA AND TIARA_SSLICE.