Clang-Chimera is a software tool designed to manipulate source code written in C/C++. In particular, it automatically generates software mutants by applying user-defined rules.
Clang-Chimera is part of the Chimera tools, a set of tools for source code mutation.
Clang-Chimera can be employed either as a framework or as a library: it can be embedded within a software project to provide a source mutation engine, or used as stand-alone software. In both cases, clang-Chimera provides a mechanism to apply mutation operators (or just operators) to source code, and it also enables pre-processing actions such as macro expansion or reformatting.
The separation of concerns in clang-Chimera makes it possible to implement new operators by coding only their inner logic, while their application to the source code is managed entirely by clang-Chimera.
In clang-Chimera, a mutation operator implements a specific set of transformations (mutators), while a mutator represents a general and reusable transformation. A mutator expresses global transformation rules, whereas an operator has a narrower scope: it is applied to functions.
The mechanism behind the definition and application of a mutation operator is based on the exploration of the abstract syntax tree (AST) obtained from the source code.
Each mutator is characterized by one or more mutation types: given a specific matching rule, used to identify a pattern over the AST, the mutator mutates the code according to the selected type. Therefore, a single mutator can generate more than one mutant, one for each mutation type applied to a match.
Each mutator can be defined as a First Order Mutator (FOM) or a High Order Mutator (HOM). A FOM produces as many variants as the number of matches of its rules over the AST, while a HOM accumulates mutations over the code before emitting the mutated code.
The same distinction applies to mutation operators. A FOM operator is a simple wrapper of mutators, i.e. it contains FOM or HOM mutators. The HOM mutators of a FOM operator have a scope local to a matched function, so a mutant can accumulate mutations on a single function at most. Conversely, a HOM operator is used when global mutations have to be applied, which implies that it wraps only HOM mutators. It therefore accumulates the transformations on all the target functions before producing the mutated source code.
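To make the FOM/HOM distinction concrete, the following standalone sketch models the source as a plain string and a match as the offset of a single-character operator. The names findMatches, generateFOM and generateHOM are hypothetical and are not part of the clang-Chimera API; the sketch only illustrates how many mutants each strategy yields.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model, not the clang-Chimera API: a "match" is the offset of a
// single-character operator in the source text.
using Match = std::size_t;

// Find every occurrence of the character `op` in `source`.
std::vector<Match> findMatches(const std::string &source, char op) {
  std::vector<Match> matches;
  for (std::size_t i = 0; i < source.size(); ++i)
    if (source[i] == op)
      matches.push_back(i);
  return matches;
}

// First-order behaviour: one mutant per match, each carrying one mutation.
std::vector<std::string> generateFOM(const std::string &source, char op,
                                     char replacement) {
  std::vector<std::string> mutants;
  for (Match m : findMatches(source, op)) {
    std::string mutant = source; // Start again from the original source
    mutant[m] = replacement;
    mutants.push_back(mutant);
  }
  return mutants;
}

// High-order behaviour: all mutations are accumulated into a single mutant.
std::string generateHOM(const std::string &source, char op, char replacement) {
  std::string mutant = source;
  for (Match m : findMatches(source, op))
    mutant[m] = replacement;
  return mutant;
}
```

With the input "a > b && c > d" and the replacement '<', the first-order variant yields two mutants with one change each, while the high-order variant yields a single mutant with both changes.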
Executing clang-chimera without any input argument prints a short help message, which reports the main flags and options.
Clang-Chimera makes use of Clang-Tool and requires a compilation database. It can be provided to clang-Chimera in two different ways: as the last arguments after the double dash (--), or as a JSON file named compilation_database.json. We suggest using CMake to generate the compilation_database.json file automatically.
For instance, to manually pass compilation commands, run clang-chimera as follows:
$ clang-chimera input.cc -- -I/path/for/include -std=c++11
The additional arguments specify where clang-chimera will look for the included files needed to resolve dependencies, and the dialect used to interpret the file input.cc, in this case C++11.
The application of operators to target functions is configured through a configuration file, namely the function/operators (funop) configuration file, formatted as comma-separated values (CSV).
The configuration file has to be provided to clang-chimera by means of the -funop flag. An example of a configuration file is given below:
function1,operatorA,operatorB,operatorC
function2,operatorA,operatorD
The file specifies that the function named function1 has to be mutated by means of the operators operatorA, operatorB and operatorC, while function2 has to be mutated by means of operatorA and operatorD.
The funop configuration file supports two special keywords, namely CHIMERA_ALL_OPERATORS and CHIMERA_ALL_FUNCTIONS, which avoid listing all the operators or all the functions whenever every one of them is involved:
CHIMERA_ALL_FUNCTIONS,operatorA,operatorB
function1,CHIMERA_ALL_OPERATORS
The funop configuration file supports skipping (comment) lines by using \\.
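The layout of the funop file can be illustrated with a small standalone parser. This is only a sketch under stated assumptions: parseFunOp is a hypothetical helper and not the parser used by clang-Chimera, it performs no whitespace trimming, and it treats the special keywords as ordinary strings, leaving their expansion and the handling of skipped lines out.

```cpp
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper, not part of clang-Chimera: parse funop-style CSV text
// into a map from function name to the list of operator names.
std::map<std::string, std::vector<std::string>>
parseFunOp(const std::string &text) {
  std::map<std::string, std::vector<std::string>> config;
  std::istringstream lines(text);
  std::string line;
  while (std::getline(lines, line)) {
    if (line.empty())
      continue; // Ignore blank lines
    std::istringstream fields(line);
    std::string field, function;
    bool first = true;
    while (std::getline(fields, field, ',')) {
      if (first) {
        function = field; // First column: the target function
        first = false;
      } else {
        config[function].push_back(field); // Remaining columns: operators
      }
    }
  }
  return config;
}
```

Applied to the two-line example file shown earlier, the returned map associates function1 with three operators and function2 with two.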
This section reports a short walk-through for developing a new operator within the clang-Chimera tool. It is the basic example distributed with the source code.
We target the Relational Operator Replacement (ROR) mutation, a method/function-level mutation operator which replaces relational operators with others. For the sake of simplicity, let us consider only the greater than relational operator (>): we want to mutate it into less than (<) and less than or equal to (<=), namely two mutation types.
Let us assume a single mutation per mutant, which implies a FOM mutator/operator. Moreover, we want to avoid mutating code that resides in the condition expression of a for statement.
The main steps needed to implement the ROR operator described above are:
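The intended effect of the two mutation types can be illustrated on a small function. The function countGreater and its two mutants are hypothetical and written by hand for this sketch; they are not output produced by clang-Chimera. The types are numbered 0 and 1 here.

```cpp
// Original function: one '>' sits in the for condition, one in the loop body.
int countGreater(const int *values, int n, int threshold) {
  int count = 0;
  for (int i = 0; n > i; ++i) { // '>' in the for condition: never mutated
    if (values[i] > threshold)  // '>' targeted by the mutation
      ++count;
  }
  return count;
}

// Mutant for mutation type 0: the targeted '>' is replaced by '<'.
int countGreaterMutant0(const int *values, int n, int threshold) {
  int count = 0;
  for (int i = 0; n > i; ++i) {
    if (values[i] < threshold)
      ++count;
  }
  return count;
}

// Mutant for mutation type 1: the targeted '>' is replaced by '<='.
int countGreaterMutant1(const int *values, int n, int threshold) {
  int count = 0;
  for (int i = 0; n > i; ++i) {
    if (values[i] <= threshold)
      ++count;
  }
  return count;
}
```

Each mutant carries exactly one change, in line with the first-order assumption, and the comparison in the for condition is identical in all three versions.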
The mutator concept encapsulates the modification of a specific part of the AST, i.e. an AST pattern, which is an ensemble of AST nodes. Each mutator has:
The mutator class of clang-Chimera uses the API provided by Clang's LibTooling to match and modify the AST. It is an abstract class, with pure and non-pure virtual methods that have to be implemented/overridden. In particular, those methods implement the matching and mutation rules.
The matching rules are implemented at two levels, namely coarse-grained and fine-grained. Implementing only the coarse-grained level may turn out to be enough.
The coarse-grained matching is based on ASTMatchers. There are different types of AST matchers, and the type must be specified in order to override the corresponding method. The method has the form getSpecificTypeMatcher and must return that specific type of matcher. The matcher implemented there is used to retrieve the corresponding MatchFinder::MatchResult, which is the type of node handled by the mutator methods.
The getStatementMatcher method returns the statement matcher that matches the greater than (>) operation:
::clang::ast_matchers::StatementMatcher
chimera::examples::MutatorGreaterOpReplacement::getStatementMatcher() {
  // It has to match a binary operation with a specific operator name (>). In
  // order to retrieve the match, it is necessary to bind a string, in this
  // case "greater_op".
  // But we want to avoid matches in for loops, so in this phase the matcher
  // has to gather information about the surroundings, i.e. whether the binary
  // operation is inside a for loop; this is done by checking the ancestor.
  // Such a condition is OPTIONAL: the trick is to use anything() as fallback.
  return stmt(binaryOperator(hasOperatorName(">")).bind("greater_op"),
              anyOf(hasAncestor(forStmt().bind("forStmt")),
                    anything() // It must be the last alternative
                    ));
}
The match method implements the fine grained matching rules.
bool chimera::examples::MutatorGreaterOpReplacement::match(
    const ::chimera::mutator::NodeType &node) {
  // First operation: retrieve the node
  const BinaryOperator *bop =
      node.Nodes.getNodeAs<BinaryOperator>("greater_op");
  assert(bop && "BinaryOperator is nullptr");
  // To see whether the operation is part of the condition expression, it is
  // simply checked whether the operation position falls within the range of
  // that expression
  SourceRange bopRange = bop->getSourceRange();
  // Retrieve the for statement, if one has been matched as an ancestor
  const ForStmt *forStmt = node.Nodes.getNodeAs<ForStmt>("forStmt");
  // Check that there is a for statement and that it has a condition
  // (getCond() returns nullptr for loops such as for (;;))
  if (forStmt != nullptr && forStmt->getCond() != nullptr) {
    SourceRange condRange = forStmt->getCond()->getSourceRange();
    // Check if the operation is inside the condition expression range
    if (bopRange.getBegin().getRawEncoding() >=
            condRange.getBegin().getRawEncoding() &&
        bopRange.getEnd().getRawEncoding() <=
            condRange.getEnd().getRawEncoding()) {
      // The match is invalid, return false
      return false;
    }
  }
  // At this point the match is still valid, return true
  return true;
}
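The intent of the range test, rejecting a match that lies within the condition of a for statement, reduces to an interval containment check. A minimal sketch over plain integer offsets follows; OffsetRange and isInside are hypothetical names that stand in for Clang's source ranges and are not part of clang-Chimera.

```cpp
// Hypothetical stand-in for a source range: raw begin/end offsets, with
// begin <= end.
struct OffsetRange {
  unsigned begin;
  unsigned end;
};

// True when `inner` lies entirely within `outer` (bounds included): it must
// start at or after the start of `outer` and end at or before its end.
bool isInside(const OffsetRange &inner, const OffsetRange &outer) {
  return inner.begin >= outer.begin && inner.end <= outer.end;
}
```

A range that starts inside the condition but extends past its end, or one located entirely after it, is not contained and would therefore remain a valid match.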
The mutation rules are implemented using the methods of a Rewriter object.
::clang::Rewriter &chimera::examples::MutatorGreaterOpReplacement::mutate(
    const ::chimera::mutator::NodeType &node,
    ::chimera::mutator::MutatorType type, ::clang::Rewriter &rw) {
  // As first operation always retrieve the node
  const BinaryOperator *op =
      node.Nodes.getNodeAs<BinaryOperator>("greater_op");
  // Assert a precondition
  assert(op != nullptr && "getNodeAs returned a nullptr");
  // The rewriter object passed can be used to "mutate" the source code.
  // In this case there are two mutation types, so the type parameter will
  // assume the values 0 and 1.
  // Select the correct replacement using the mutation type
  std::string opReplacement = "";
  switch (type) {
  case 0: {
    opReplacement = "<"; // First replacement
  } break;
  case 1: {
    opReplacement = "<="; // Second replacement
  } break;
  default:
    llvm_unreachable("Mutation type NOT SUPPORTED!");
    break;
  }
  // Get the left and right hand side of the operation
  std::string lhs = rw.getRewrittenText(op->getLHS()->getSourceRange());
  std::string rhs = rw.getRewrittenText(op->getRHS()->getSourceRange());
  // Replace all the text of the binary operator, substituting the operation
  // (which binds lhs and rhs) with the replacement
  rw.ReplaceText(op->getSourceRange(), lhs + " " + opReplacement + " " + rhs);
  return rw;
}
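The rewriting step can be reproduced in isolation on a plain string buffer. The sketch below is an assumption-laden model, not the Clang Rewriter: BinaryOpText, replacementFor and applyMutation are hypothetical names, and offsets into a std::string stand in for source ranges.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical model of a binary operation located in a source buffer:
// [begin, end) spans the whole expression, while lhs and rhs hold the text of
// the two operands.
struct BinaryOpText {
  std::size_t begin;
  std::size_t end;
  std::string lhs;
  std::string rhs;
};

// Map a mutation type to its replacement operator (0 -> "<", 1 -> "<=").
std::string replacementFor(unsigned type) {
  switch (type) {
  case 0:
    return "<";
  case 1:
    return "<=";
  default:
    throw std::invalid_argument("Mutation type not supported");
  }
}

// Rebuild the expression with the new operator and splice it into a copy of
// the source, leaving the original text untouched.
std::string applyMutation(const std::string &source, const BinaryOpText &op,
                          unsigned type) {
  std::string mutated = source;
  mutated.replace(op.begin, op.end - op.begin,
                  op.lhs + " " + replacementFor(type) + " " + op.rhs);
  return mutated;
}
```

As in the mutate method, the whole expression is replaced rather than the operator token alone, so the operand text is carried over unchanged.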
Since the approach introduced by clang-Chimera requires writing new source code, the tool ships with a mutator testing framework, built upon Google Test.
Testing a mutator means testing its matching and mutation rules. The mutation rules are difficult to test automatically, since there is no immediate way to provide the testing oracle (i.e. the desired mutated source code), whereas testing the matching rules is straightforward. Its simplicity is due to the following assumption: given a source code, each token has a unique location, which can be identified by a line and a column.
So, while an oracle can be created to check the matching rules automatically, the mutation rules have to be checked manually, although the framework tries to ease this task as well.
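The idea of a location-based oracle for the matching rules can be sketched without any testing library. The code below is not the clang-Chimera testing framework and does not use Google Test; Location, OracleReport and checkMatches are hypothetical names introduced only for illustration.

```cpp
#include <algorithm>
#include <iterator>
#include <set>
#include <utility>
#include <vector>

// A token location as (line, column).
using Location = std::pair<unsigned, unsigned>;

// Locations present in `a` but not in `b`.
std::vector<Location> difference(const std::set<Location> &a,
                                 const std::set<Location> &b) {
  std::vector<Location> out;
  std::set_difference(a.begin(), a.end(), b.begin(), b.end(),
                      std::back_inserter(out));
  return out;
}

// Outcome of comparing the matched locations against the oracle.
struct OracleReport {
  std::vector<Location> missed;     // Expected by the oracle, not matched
  std::vector<Location> unexpected; // Matched, but not expected
  bool passed() const { return missed.empty() && unexpected.empty(); }
};

// The check passes only when the matcher hit exactly the expected locations.
OracleReport checkMatches(const std::set<Location> &expected,
                          const std::set<Location> &matched) {
  return {difference(expected, matched), difference(matched, expected)};
}
```

Because every token has a unique line and column, comparing two sets of locations is enough to detect both missed and spurious matches.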
The following are the steps to follow in order to create a test for a mutator:
A mutation operator is a class which wraps mutators. To make clang-Chimera aware of the new operator, we need to register it using the proper register function. It is registered through a unique pointer (an ::std::unique_ptr). Typically, a function named getOperatorNameOperator, where OperatorName is the name of the operator, returns the pointer. Here is the definition of the ROR operator:
::std::unique_ptr<::chimera::m_operator::MutationOperator>
chimera::examples::getROROperator() {
  ::std::unique_ptr<::chimera::m_operator::MutationOperator> Op(
      new ::chimera::m_operator::MutationOperator(
          "ExampleROROperator",   // Operator identifier
          "Sample Operator: ROR", // Description
          false) // It is NOT a HOM Operator (default: false). It could be
                 // omitted.
      );
  // Add mutators to the current operator
  Op->addMutator(::chimera::m_operator::MutatorPtr(
      new ::chimera::examples::MutatorGreaterOpReplacement()));
  // Return the operator
  return Op;
}
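The ownership transfer implied by registering an operator through a unique pointer can be sketched as follows. Everything in this block is hypothetical: SketchOperator, SketchRegistry and getSketchROROperator are stand-ins invented for the example, and the actual register function of clang-Chimera is not shown here and may differ.

```cpp
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in; this is NOT the clang-Chimera MutationOperator.
struct SketchOperator {
  std::string identifier;
  std::string description;
  bool isHOM;
};

// Hypothetical registry that owns the operators registered with it.
class SketchRegistry {
public:
  // Takes ownership of the operator; returns false if the identifier is
  // already taken, in which case the passed operator is discarded.
  bool registerOperator(std::unique_ptr<SketchOperator> op) {
    const std::string id = op->identifier;
    return operators_.emplace(id, std::move(op)).second;
  }

  // Look up an operator by identifier; returns nullptr when it is unknown.
  const SketchOperator *find(const std::string &id) const {
    auto it = operators_.find(id);
    return it == operators_.end() ? nullptr : it->second.get();
  }

private:
  std::map<std::string, std::unique_ptr<SketchOperator>> operators_;
};

// Factory in the style of a getOperatorNameOperator function.
std::unique_ptr<SketchOperator> getSketchROROperator() {
  return std::unique_ptr<SketchOperator>(new SketchOperator{
      "ExampleROROperator", "Sample Operator: ROR", false});
}
```

Returning a unique pointer from the factory makes the hand-over explicit: after registration the registry is the only owner, and callers access the operator only through its identifier.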