|
Posted over 14 years ago by DanielGrunwald
When ILSpy was only two weeks old, I blogged about the decompiler architecture. The basic idea of the decompiler pipeline (IL -> ILAst -> C#) is still valid, but there were several changes in the details, and tons of additions as ILSpy learned
about more features in the C# language.
The pipeline has grown a lot - there are now 47 separate steps, while in the middle of February (when the previous architecture post was written), there were only 14.
If you want to follow this post, grab the source code of ILSpy and create a debug build, so that you can take a look at the intermediate steps while I am discussing them. Only debug builds will show all the intermediate steps in the language dropdown.
It's impossible to give a short sample where every intermediate step does something (the sample would have to use every possible C# feature), but the following sample should show what is going on in the most important steps:
static IEnumerable<IEnumerable<char>> Test(List<string> list)
{
    foreach (string current in list) {
        yield return (from c in current where char.IsUpper(c) || char.IsDigit(c) select char.ToLower(c));
    }
    yield return new List<char> { 'E', 'N', 'D' };
}
Take this code, compile it, and then decompile it with a debug build of ILSpy, so that you can take a look at the results of the intermediate steps.
Essentially, the decompiler pipeline can be separated into two phases: the first phase works on a tree representation of the IL code - we call this representation the ILAst. The second phase works on C# code, stored in the C# Abstract Syntax Tree provided by the NRefactory library.
ILSpy uses the Mono.Cecil library for reading assembly files. Cecil parses the IL code into a flat list of IL instructions, and also takes care of reading all the metadata. Thus, the decompiler's input is Cecil's object model, giving it approximately the same information as you see when you select 'IL' language in the dropdown.
ILAst
We construct the intermediate representation ILAst. Basically, every IL
instruction becomes one ILAst instruction. The main difference is that
ILAst does not use an implicit evaluation stack, but creates temporary
variables for every write to a stack location. However, the ILAst also supports additional opcodes (called pseudo-opcodes) which are used by various decompiler steps to represent higher-level constructs.
Another difference is that we create a tree structure for try-finally blocks - Cecil just provides us with the exception handler table from the metadata.
Implementation: ILAstBuilder.cs
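To make the stack-to-variable idea concrete, here is a minimal sketch in Python (chosen only for brevity). It is not the code in ILAstBuilder.cs: the instruction tuples, the function name build_ilast and the stackN naming are assumptions made for illustration, and control flow is ignored entirely.

```python
# Toy model: each IL instruction is (opcode, operand, pops, pushes).
# Hypothetical sketch, not ILSpy's actual ILAstBuilder.
def build_ilast(instructions):
    stack = []      # names of the temporaries currently on the evaluation stack
    result = []
    counter = 0
    for opcode, operand, pops, pushes in instructions:
        # Pop the arguments and restore the order in which they were pushed.
        args = [stack.pop() for _ in range(pops)][::-1]
        arg_text = ", ".join(f"ldloc({a})" for a in args)
        if operand is not None:
            expr = f"{opcode}({operand}" + (f", {arg_text}" if arg_text else "") + ")"
        else:
            expr = f"{opcode}({arg_text})"
        if pushes:
            # Every write to a stack location gets its own temporary variable.
            temp = f"stack{counter}"
            counter += 1
            stack.append(temp)
            result.append(f"stloc({temp}, {expr})")
        else:
            result.append(expr)
    return result


# IL for "local1 = local1 + 1"
il = [
    ("ldloc", "local1", 0, 1),
    ("ldc.i4", 1, 0, 1),
    ("add", None, 2, 1),
    ("stloc", "local1", 1, 0),
]
for line in build_ilast(il):
    print(line)
```

The four IL instructions become four ILAst statements that communicate through stack0, stack1 and stack2 instead of through an implicit stack.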
Variable Splitting
Using data flow analysis, we split up variables where possible.
So if you had "x = 1; x = add(x, 1);", that will become "x_1 = 1; x_2 =
add(x_1, 1)". We do not use SSA form for this (although there's an
unused SSA implementation left over in the codebase), we only split
variables up when this is possible without having to introduce
phi-functions. The goal of this operation is to make compiler-generated variables eligible for inlining.
Implementation: ILAstBuilder.cs
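As a rough illustration of the splitting, the following Python sketch handles straight-line code only. That restriction is an assumption of the sketch: the real step uses data flow analysis and leaves a variable unsplit wherever merging control flow would require a phi-function. The tuple format and the name split_variables are invented for this example.

```python
# Hypothetical sketch: statements are (destination, opcode, arguments).
# Only valid for straight-line code without branches.
def split_variables(statements):
    version = {}    # variable name -> number of writes seen so far
    current = {}    # variable name -> name of the version that is live now
    out = []
    for dest, op, args in statements:
        # Reads refer to the version produced by the most recent write.
        new_args = [current.get(a, a) if isinstance(a, str) else a for a in args]
        # Each write starts a new version of the variable.
        version[dest] = version.get(dest, 0) + 1
        new_dest = f"{dest}_{version[dest]}"
        current[dest] = new_dest
        out.append((new_dest, op, new_args))
    return out


code = [("x", "const", [1]), ("x", "add", ["x", 1])]
print(split_variables(code))
```

After the split, x_1 is written once and read once, which is exactly the shape the inlining step looks for.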
ILAst Optimizations
Dead code removal. We remove unreachable code, because it's impossible to infer any information about the stack usage of unreachable code. Also, obfuscators tend to put invalid IL into unreachable code sections. This actually already happens as part of the ILAst construction, before variable splitting.
Remove redundant code
Delete 'nop' instructions
Delete 'br' instructions that jump directly to the next instruction
Delete 'dup' instructions - since ILAst works with variables for stack locations, we can just read a variable twice, eliminating the 'dup'.
Simplify instruction set for branch instructions
Replaces all conditional branches with 'brtrue'. This works by replacing the 'b*' instructions (branch instructions) with 'brtrue(c*)' (branch if the compare instruction returns true). This step makes use of the 'LogicNot' pseudo-opcode. The goal is simply to reduce the number of different cases that the following steps have to handle.
Copy propagation. This is a classical compiler optimization; however, ILSpy uses it only for two specific cases:
Any address-loading instruction is copied to its point of use. This ensures that no decompiler-generated variable has a managed reference as type - "ref int v = someVariable;" wouldn't be valid C# code, so we have to instead use "ref someVariable" in the place where "v" is used.
Copies of parameters of the current function are propagated, as long as the parameter is never written to. This mainly exists in order to propagate the "this" parameter, so that the following patterns can detect it more easily.
Dead store removal. If a variable is stored and nobody is there to read it, then was it really written? Originally we removed all such dead stores; but after some users complained about 'missing code', we restricted this optimization to apply only to stack locations. Dead stores to stack locations occur mainly after the removal of 'pop' instructions.
The optimizations are primarily meant to even out the differences between debug and release builds, by optimizing away the stuff that the C# compiler
adds to debug builds.
Implementation: ILAstOptimizer.cs
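The branch simplification can be sketched as a small lookup table. This Python sketch is an illustration under assumptions, not the code in ILAstOptimizer.cs: expressions are modelled as nested tuples, the table covers only a handful of signed integer branches, and the unsigned and floating-point variants (where NaN handling matters) are left out.

```python
# Hypothetical mapping: branch opcode -> (compare opcode, wrap in logicnot?).
# Only meant for integer operands.
COMPARE = {
    "beq": ("ceq", False),
    "bne.un": ("ceq", True),
    "bgt": ("cgt", False),
    "blt": ("clt", False),
    "bge": ("clt", True),
    "ble": ("cgt", True),
}


def simplify_branch(opcode, label, args):
    """Rewrite any conditional branch as ('brtrue', label, condition)."""
    if opcode == "brtrue":
        return ("brtrue", label, args[0])
    if opcode == "brfalse":
        return ("brtrue", label, ("logicnot", args[0]))
    compare, negate = COMPARE[opcode]
    condition = (compare, *args)
    if negate:
        condition = ("logicnot", condition)
    return ("brtrue", label, condition)


print(simplify_branch("bne.un", "IL_C6", [("ldloc", "i"), ("ldc.i4", 3)]))
```

Later steps then only have to recognize 'brtrue' with a condition expression, instead of one pattern per branch opcode.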
Inlining
We perform 'inlining' on the ILAst. That is, if instruction N
stores a variable, and instruction N+1 reads it, and there's no other
place using that variable, then we move the definition of the variable
into the next expression.
So "stack0 = local1; stack1 = ldc.i4(1); stack2 = add(stack0,
stack1); local1 = stack2" will become "local1 = add(local1, ldc.i4(1))".
Inlining is the main operation that produces trees from the flat IL.
Implementation: ILInlining.cs
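The example above can be reproduced with a short Python sketch. It is a simplification under stated assumptions: statements are nested tuples, only variables whose names start with "stack" are considered, and the check that moving an expression does not reorder side effects (which a real implementation needs) is omitted. All function names are invented for this illustration.

```python
def count_loads(expr, var):
    """Count how often ldloc(var) occurs inside a nested-tuple expression."""
    if not isinstance(expr, tuple):
        return 0
    if expr[0] == "ldloc":
        return 1 if expr[1] == var else 0
    return sum(count_loads(e, var) for e in expr[1:])


def substitute(expr, var, value):
    """Replace ldloc(var) with value."""
    if not isinstance(expr, tuple):
        return expr
    if expr[0] == "ldloc" and expr[1] == var:
        return value
    return (expr[0],) + tuple(substitute(e, var, value) for e in expr[1:])


def inline_all(statements):
    statements = list(statements)
    changed = True
    while changed:
        changed = False
        for i in range(len(statements) - 1):
            stmt = statements[i]
            if stmt[0] != "stloc" or not stmt[1].startswith("stack"):
                continue
            var, value = stmt[1], stmt[2]
            loads = sum(count_loads(s, var) for s in statements)
            stores = sum(1 for s in statements if s[0] == "stloc" and s[1] == var)
            # Inline only if the single read is in the very next statement.
            if loads == 1 and stores == 1 and count_loads(statements[i + 1], var) == 1:
                statements[i + 1] = substitute(statements[i + 1], var, value)
                del statements[i]
                changed = True
                break
    return statements


flat = [
    ("stloc", "stack0", ("ldloc", "local1")),
    ("stloc", "stack1", ("ldc.i4", 1)),
    ("stloc", "stack2", ("add", ("ldloc", "stack0"), ("ldloc", "stack1"))),
    ("stloc", "local1", ("ldloc", "stack2")),
]
print(inline_all(flat))
```

Running the sketch collapses the four flat statements into the single tree stloc(local1, add(ldloc(local1), ldc.i4(1))).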
Yield Return
If the method is an iterator (constructs a [CompilerGenerated] type that implements IEnumerator), then we perform the yield-return-transformation.
Implementation: YieldReturnDecompiler.cs
Analysis of higher-level constructs
After inlining, we tend to have a single C# statement in a single ILAst statement. However, some C# expressions compile to a sequence of statements. We now try to detect those constructs, and replace the statement sequence with a single statement using a pseudo-opcode.
We can detect and replace a construct only if it's represented by consecutive statements, so when one construct is nested in another, we first have to process the nested construct before processing the outer construct. Because constructs can be nested arbitrarily, we run all the analyses in a "do { ... } while(modified);" loop. If you select "ILAst (after step X)" in the language dropdown, decompilation will stop after that step in the first loop iteration.
SimplifyShortCircuit: introduces && and || operators.
SimplifyTernaryOperator: introduces ?: operator
SimplifyNullCoalescing: introduces ?? operator
JoinBasicBlocks: The decompiler tries to use the minimal possible number of basic blocks. Some optimizations might remove branches and therefore it is necessary to check whether two consecutive basic blocks can be joined into one after such optimizations. It is important to do this because other optimizations like inlining might not work if the code is split into two basic blocks.
TransformDecimalCtorToConstant: changes invocations of the "new decimal(int lo, int mid, int hi, bool isNegative, byte scale)" constructor into literals.
SimplifyLdObjAndStObj: replaces "ldobj(ldloca(X))" with "ldloc(X)", and similar for other kinds of address-loading instructions.
TransformArrayInitializers: introduces array initializers
TransformObjectInitializers: introduces object and collection initializers
MakeAssignmentExpression: detects when the result of an assignment is used in another expression, and inlines the stloc-instruction accordingly. This is essential for decompiling loops like "while ((line = r.ReadLine()) != null)", as otherwise the loop condition couldn't be represented as a single expression. This step also introduces the 'CompoundAssignment' opcode for C# code like "this.M().Property *= 10;". The "this.M()" method call can be inlined only because this step de-duplicates the expression on the left-hand side of the assignment.
IntroducePostIncrement: Pre-increments are handled as a special case of compound assignments, but post-increment expressions need to be handled separately.
InlineVariables2: this performs inlining again, since the steps in the loop might have opened up additional inlining possibilities. The next loop iteration depends on the fact that variables are inlined where possible.
Implementation: ILAstOptimizer.cs, PeepholeTransform.cs, InitializerPeepholeTransform.cs
To get more of an idea of what is going on, consider the collection initializer "new List<char> { 'E', 'N', 'D' }". In the ILAst, this is represented as 5 separate instructions:
stloc(g__initLocal0, newobj(List`1<char>::.ctor))
callvirt(List`1<char>::Add, ldloc(g__initLocal0), ldc.i4(69))
callvirt(List`1<char>::Add, ldloc(g__initLocal0), ldc.i4(78))
callvirt(List`1<char>::Add, ldloc(g__initLocal0), ldc.i4(68))
yieldreturn(ldloc(g__initLocal0))
The collection initializer transformation will change this into:
stloc(g__initLocal0, initcollection(newobj(List`1<char>::.ctor), callvirt(List`1<char>::Add, initializedobject(), ldc.i4(69)), callvirt(List`1<char>::Add, initializedobject(), ldc.i4(78)), callvirt(List`1<char>::Add, initializedobject(), ldc.i4(68))))
yieldreturn(ldloc(g__initLocal0))
After this transformation, the variable g__initLocal0 is written to exactly once and read from exactly once. This allows us to inline the 'initcollection' expression into the 'yieldreturn' statement, thus combining all of the 5 original statements into a single one.
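A simplified version of this peephole transformation can be written in a few lines of Python. This is a sketch under assumptions, not the code in InitializerPeepholeTransform.cs: statements are nested tuples, only "Add" calls directly following a "newobj" are recognized, and the check that the added values do not themselves read the variable is left out.

```python
def is_add_call(stmt, var):
    """True for callvirt(X::Add, ldloc(var), ...) statements."""
    return (stmt[0] == "callvirt" and stmt[1].endswith("::Add")
            and len(stmt) > 2 and stmt[2] == ("ldloc", var))


def transform_collection_initializer(statements):
    out = []
    i = 0
    while i < len(statements):
        stmt = statements[i]
        if stmt[0] == "stloc" and isinstance(stmt[2], tuple) and stmt[2][0] == "newobj":
            var = stmt[1]
            adds = []
            j = i + 1
            # Collect the run of Add calls that immediately follows the newobj.
            while j < len(statements) and is_add_call(statements[j], var):
                call = statements[j]
                adds.append(call[:2] + (("initializedobject",),) + call[3:])
                j += 1
            if adds:
                out.append(("stloc", var, ("initcollection", stmt[2], *adds)))
                i = j
                continue
        out.append(stmt)
        i += 1
    return out


ADD = "List`1<char>::Add"
V = "g__initLocal0"
before = [
    ("stloc", V, ("newobj", "List`1<char>::.ctor")),
    ("callvirt", ADD, ("ldloc", V), ("ldc.i4", 69)),
    ("callvirt", ADD, ("ldloc", V), ("ldc.i4", 78)),
    ("callvirt", ADD, ("ldloc", V), ("ldc.i4", 68)),
    ("yieldreturn", ("ldloc", V)),
]
print(len(transform_collection_initializer(before)))  # 5 statements shrink to 2
```

The remaining two statements are then merged by the ordinary inlining step, since the variable now has a single write and a single read.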
Loop Detection and Condition Detection
Using control flow analysis (finding dominators and dominance frontiers), we detect loops in the control flow graph. A heuristic on a control flow graph is used to find the most likely loop body.
We also build 'if' statements from the remaining conditional branch instructions.
Implementation: LoopsAndConditions.cs
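For readers unfamiliar with dominators, the following Python sketch shows the textbook iterative algorithm and how a loop head shows up as the target of a back edge. It is a generic illustration, not the code in LoopsAndConditions.cs: dominance frontiers and the loop-body heuristic are not covered, and the graph format (a dict that lists every node as a key) is an assumption of the sketch.

```python
def compute_dominators(successors, entry):
    """Return, for every node, the set of nodes that dominate it."""
    nodes = set(successors)
    predecessors = {n: set() for n in nodes}
    for n, succs in successors.items():
        for s in succs:
            predecessors[s].add(n)
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            preds = [dom[p] for p in predecessors[n]]
            # A node is dominated by itself and by whatever dominates all predecessors.
            new = set.intersection(*preds) | {n} if preds else {n}
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom


def find_loop_heads(successors, entry):
    """A loop head is the target of an edge n -> h where h dominates n."""
    dom = compute_dominators(successors, entry)
    return {s for n, succs in successors.items() for s in succs if s in dom[n]}


cfg = {
    "entry": ["head"],
    "head": ["body", "exit"],
    "body": ["head"],
    "exit": [],
}
print(find_loop_heads(cfg, "entry"))
```

In this small graph the edge from "body" back to "head" is the only back edge, so "head" is reported as the single loop head.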
Goto Removal
Goto statements are removed when they are made redundant by the control flow structures built up in the previous step. Remaining goto statements are converted into 'break;' or 'continue;' statements where possible.
Implementation: GotoRemoval.cs
Reduce If Nesting
We try to re-arrange the if statements to reduce the nesting level. For example, if the end of the then-block is unreachable (e.g. because the then-block ends with 'return;'), we can move the else block below the if statement.
Remove Delegate Initialization
The C# compiler will use static fields (and in some cases also local variables) to cache the delegate instances associated with lambda expressions. This step will remove such caching, which opens up additional inlining opportunities. In fact, we will have to move this step into the big 'while(modified)' loop so that we can correctly handle lambda expressions within object/collection initializers.
Introduce Fixed Statements
.NET implements fixed statements as special 'pinned' local variables. As there isn't any representation for those in C#, we translate them into 'fixed' statements.
Variable Recombination
The split-up variables were useful for inlining and some other analyses, but now we don't need them anymore. This step simply recombines the variables that we split up earlier.
Type Analysis
Here, finally, comes the semantic analysis. All previous steps
just transformed the IL code. Some were introducing some higher-level
constructs, but those were defined as pseudo-IL-opcodes, which pretty
much just are shorthands for certain IL sequences. Semantic analysis now
figures out whether "ldc.i4(1)" means "1" or "true" or
"StringComparison.CurrentCultureIgnoreCase".
This is formulated as a type inference problem: we determine the
expected type and the actual type for each expression in the ILAst. In case some decompiler-generated variables (for the stack locations) weren't removed by the ILAst transformations, we also need to infer types for those.
Implementation: TypeAnalysis.cs
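The effect of the expected type on a constant can be shown with a tiny Python sketch. It only illustrates the last step, choosing a literal once the expected type is known; the inference that determines the expected type is not modelled. The function name literal_for and the use of Python types as stand-ins for .NET types are assumptions of this example.

```python
from enum import IntEnum


class StringComparison(IntEnum):
    # Mirrors the values of System.StringComparison.
    CurrentCulture = 0
    CurrentCultureIgnoreCase = 1
    InvariantCulture = 2
    InvariantCultureIgnoreCase = 3
    Ordinal = 4
    OrdinalIgnoreCase = 5


def literal_for(value, expected_type):
    """Render the integer constant of an ldc.i4 for a given expected type."""
    if expected_type is bool:
        return "true" if value != 0 else "false"
    if isinstance(expected_type, type) and issubclass(expected_type, IntEnum):
        try:
            return f"{expected_type.__name__}.{expected_type(value).name}"
        except ValueError:
            # No named member: fall back to a cast.
            return f"({expected_type.__name__}){value}"
    return str(value)


for t in (int, bool, StringComparison):
    print(literal_for(1, t))
```

The same "ldc.i4(1)" therefore prints as 1, true or StringComparison.CurrentCultureIgnoreCase, depending purely on what the surrounding expression expects.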
This concludes our discussion of the first phase of the decompiler pipeline. In the next post, I will describe the translation to C# and the remaining transformations.
|
|
Posted over 14 years ago by DanielGrunwald
ILSpy supports LINQ query expressions - we added that feature shortly before the M2 release.
Today, I implemented support for decompiling object initializers and fixed some bugs related to deeply nested lambdas. With these two improvements, query
expression translation becomes possible in several more cases.
This screenshot shows Luke Hoban's famous LINQ ray-tracer.
Why are queries related to object initializers? Simple: LINQ queries allow only the use of expressions. When an object initializer is decompiled into multiple statements, there's no way to fit those into a "let" or "select" clause, so query expression translation has to abort.
Another issue with this sample was the deep nesting of the compiler-generated lambdas. Once closures are nested more than two levels deep, the C# compiler starts copying the parent-pointer from one closure into its subclosure ("localsZ.localsY = localsX.localsY;"). This case was missing from the lambda decompilation, so some references to the closure classes were left in the decompiled code. This bug has now been fixed, so nested lambdas should decompile correctly.
We're now close to supporting all features in C# 3.0; the only major missing item is expression tree support. So LINQ queries currently decompile into query syntax only if they're compiled into delegates (LINQ-to-Objects, Parallel LINQ), not if they're compiled into expression trees (LINQ-to-SQL etc.).
|
|
Posted over 14 years ago by ChristophWille
ILSpy has come a long way since M1 - let me provide you with a quick rundown on new features, starting with the decompilation engine – we now support:
query expressions
yield return (architectural background)
unsafe code
checked/unchecked
That list isn’t exhaustive, and we even shipped a separate debugger preview for ILSpy!
Aside from the improvements to the decompilation engine, you will also find the MEF-based extensibility in this milestone. For documentation and samples please see the source download (or glance over the readme). We are keen on getting feedback before entering the Beta phase for ILSpy. (ILSpy Forum)
You want to remote-control ILSpy? This feature has landed too in M2 – to get a highly technical overview, visit our command line readme.
Now for the bits - the M2 build can be downloaded here:
http://sourceforge.net/projects/sharpdevelop/files/ILSpy/1.0/ILSpy_1.0.0.737_M2_Binaries.zip/download
If you are interested in testing the latest versions, please head over to our build server:
http://build.sharpdevelop.net/BuildArtefacts/#ILSpy
|
|
Posted over 14 years ago by Eusebiu
In this blog post I'd like to present a new feature of ILSpy decompiler: the integrated debugger. You can download it from
http://sourceforge.net/projects/sharpdevelop/files/ILSpy/1.0/ILSpy-Debugger-Preview.zip/download
Please note that the ILSpy
base functionality is similar to the soon-to-be-released M2 build. However, only use it if you want to test the debugger - if you are only interested in decompilation continue using the standard builds. The update check in this preview does not work, please disregard the request to update.
The debugger engine - Debugger.Core.dll - is basically the same as the one in SharpDevelop IDE - some modifications were made in order to use the latest version of NRefactory. The library that handles the UI stuff (like breakpoints, tooltips, attach to process window) - ILSpy.Debugger.dll - is based on SharpDevelop Debugger.Addin library.
When opening ILSpy, a new menu item is available: Debugger. Under this menu, you will find the following menu-items:
Debug an executable - you will be asked to point to a .NET executable that ILSpy will start debugging
Attach to a running application - you will be asked to point to a running .NET executable that ILSpy will debug
Continue debugging (F5) - will continue the execution of the process
Step into (F11) - will step into the code
Step over (F10) - will step over the code
Step out (F11) - will step out of the code
Detach from running application - will detach the debugger
Remove all breakpoints - will remove all breakpoints
Also, when decompiling a whole type, a margin appears on the left side of editor where breakpoints can be set (just like in SharpDevelop IDE). Setting breakpoints and debugging a single method/property will be implemented in future versions of this feature.
To debug a .NET application, one can use two different options:
Debug an executable
Attach to running application
The main difference between these two (from a debugging experience point of view) is that when attaching to a running application that was optimized, the evaluation will not work. Therefore, we recommend using the Debug an executable option. With that option, the evaluation will work for any kind of application.
When the debugger is attached and a breakpoint is hit, the same common operations are available: continue, step into, step over, step out and evaluate something. Also, the status is shown in the status bar: Stand by, Running and Debugging.
Besides debugging in C# code, debugging in IL is also supported, and breakpoints and the current line marker are synchronized.
What is great about the debugger (besides its existence in ILSpy :) ) is the fact that no PDB files are generated. The IL-C# code-mappings are determined on-demand and used to update the user interface.
TODO list:
set breakpoints in single methods/properties - this is top priority
mixed code (C# - IL) debugging - idea of David
drag and drop current line marker
others - community wishes :)
Known issues:
debugging ASP.NET applications and web services is not working (yet)
If you find any issues with this feature, please let us know in our forum or on our GitHub issues page.
Have fun debugging the decompiled code! :)
|
|
Posted over 14 years ago by Eusebiu
Now, the latest build of SharpDevelop v4.1 (starting from build 7383) has a new pad under View->Debug menu: Memory pad.
As the name suggests, you can see the memory associated with the process under debug.
You can:
navigate through memory
search for a specific memory address
refresh the current memory addresses
change the byte display: 1, 2, 4 bytes
You can see here a screencast of the pad.
Basically, it looks like this.
Have fun!
|
|
Posted almost 15 years ago by ChristophWille
Yesterday, Siegfried added support for reading Silverlight resources, namely Expression Blend sample data:
|
|
Posted almost 15 years ago by DanielGrunwald
This weekend, I worked on decompiling 'yield return' statements. The C# compiler performs quite a bit of magic to make 'yield return' work, and the decompiler must be aware of all this magic and be able to revert it.
After two days of hard work, I'm happy to announce that ILSpy (starting with 1.0.0.528) can now decompile enumerators.
Grab the new ILSpy build while it's hot, or just look at the obligatory screenshot:
If you want to understand the code generated by the compiler, you can disable this new feature in the new 'View > Options' dialog. Or you could read Jon Skeet's great article on this topic: Iterator block implementation details: auto-generated state machines.
Here's the generated MoveNext() code for the SelectMany implementation:
private bool MoveNext()
{
    bool flag;
    try {
        int i = this.$1__state;
        if (i == 0) {
            this.$1__state = -1;
            this.$7__wrap17 = this.source.GetEnumerator();
            this.$1__state = 1;
            goto IL_B0;
        }
        if (i != 3) {
            goto IL_C6;
        }
        this.$1__state = 2;
        IL_9D:
        if (this.$7__wrap19.MoveNext()) {
            this.<subElement>5__16 = this.$7__wrap19.Current;
            this.$2__current = this.<subElement>5__16;
            this.$1__state = 3;
            flag = true;
            return flag;
        }
        this.$m__Finally1a();
        IL_B0:
        if (this.$7__wrap17.MoveNext()) {
            this.<element>5__15 = this.$7__wrap17.Current;
            this.$7__wrap19 = this.selector.Invoke(this.<element>5__15).GetEnumerator();
            this.$1__state = 2;
            goto IL_9D;
        }
        this.$m__Finally18();
        IL_C6:
        flag = false;
    } catch { // in IL, this is a try-fault block, but C# doesn't have those...
        this.Dispose();
        throw;
    }
    return flag;
}
Now how can one map the generated code back to the original C#? The general idea is simple (the devil is in the details...):
Every time the code assigns to this.current, this.state and then returns, we transform that into a "yield return" instruction and a "goto" instruction to the label belonging to the new state. Because we run this transformation very early in the decompiler's pipeline (prior to any control flow analysis), the following steps will pick up on the "goto"s and be able to detect loops and simplify the "goto"s away.
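The rewrite described in this paragraph can be sketched in Python as a three-statement pattern match. This is a heavily simplified illustration, not the code in YieldReturnDecompiler.cs: the tuple encoding, the field names "current" and "state", and the label_for_state table (which is assumed to be known already) are all inventions of this sketch.

```python
def transform_yield_returns(statements, label_for_state):
    """Replace 'current = x; state = n; return true' with yieldreturn(x); goto label."""
    out = []
    i = 0
    while i < len(statements):
        window = statements[i:i + 3]
        if (len(window) == 3
                and window[0][:2] == ("stfld", "current")
                and window[1][:2] == ("stfld", "state")
                and window[2] == ("ret", True)):
            out.append(("yieldreturn", window[0][2]))
            # Execution resumes at the label that handles the new state.
            out.append(("goto", label_for_state[window[1][2]]))
            i += 3
        else:
            out.append(statements[i])
            i += 1
    return out


body = [
    ("stfld", "current", ("ldfld", "subElement")),
    ("stfld", "state", 3),
    ("ret", True),
]
print(transform_yield_returns(body, {3: "IL_9D"}))
```

The resulting "goto" is what the later loop detection and goto removal steps turn back into structured control flow.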
However, how do we determine the label that is responsible for (to give an example) state 3? The answer is 'IL_9D', but figuring this out is non-trivial: the C# compiler makes use of if-statements (to be exact: beq and bne.un), switch statements, and mixtures of both. Moreover, switch statements are usually preceded by subtractions, as the IL switch only deals with cases 0 to n-1. The ILAst for the beginning of the above MoveNext() method looks like this:
stloc(var_1_06, ldfld(Enumerable/<SelectManyIterator>d__14`2<TSource, TResult>::<>1__state, ldarg(0)))
brtrue(IL_17, ceq(ldloc(var_1_06), ldc.i4(0)))
brtrue(IL_96, ceq(ldloc(var_1_06), ldc.i4(3)))
br(IL_C6)
IL_17:
stfld(Enumerable/<SelectManyIterator>d__14`2<TSource, TResult>::<>1__state, ldarg(0), ldc.i4(-1)) ...
If you haven't been following the previous posts: the ILAst is an intermediate data structure used in the decompiler. It represents an IL program using nested expressions, thus eliminating the IL evaluation stack. At the point where the "yield return" transformation runs, opcodes have already been simplified, so "beq" now is "brtrue(ceq)".
To determine where MoveNext() will branch to in a given state, ILSpy will simulate the execution of the beginning of the MoveNext() method. It does this symbolically: "this.$1__state" evaluates to (state+0). In general, "values" in this symbolic execution are (x), (state+x), (state==x) and (this), where x is an int32. The execution will go linearly through the ILAst; it works on the assumption that there are no backward branches. Execution stops once it encounters a statement it doesn't understand - usually, this is the assignment "this.$1__state = -1;", which indicates that the enumerator started executing. For each statement in the ILAst, the range of states that can lead to that statement is stored.
So the result of the analysis is the following table:
IL_17: state 0 to 0
IL_96: state 3 to 3
IL_C6: state int.MinValue to -1, 1 to 2, and 4 to int.MaxValue
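The following Python sketch reproduces this table for the ILAst fragment shown earlier. It is a reduced model built on assumptions: the statement encoding ("load_state", "brtrue_eq", "br") is invented, only equality tests against the unmodified state are supported, and the (state+x) offsets needed for switch statements are left out.

```python
INT_MIN, INT_MAX = -2**31, 2**31 - 1


def remove_value(ranges, value):
    """Cut `value` out of a list of inclusive ranges; return (hit, remaining)."""
    hit = False
    remaining = []
    for lo, hi in ranges:
        if lo <= value <= hi:
            hit = True
            if lo < value:
                remaining.append((lo, value - 1))
            if value < hi:
                remaining.append((value + 1, hi))
        else:
            remaining.append((lo, hi))
    return hit, remaining


def analyze_dispatch(statements):
    possible = [(INT_MIN, INT_MAX)]  # states that can reach the current statement
    state_vars = set()               # variables known to hold (state+0)
    table = {}
    for stmt in statements:
        kind = stmt[0]
        if kind == "load_state":
            state_vars.add(stmt[1])
        elif kind == "brtrue_eq" and stmt[2] in state_vars:
            _, label, _, value = stmt
            hit, possible = remove_value(possible, value)
            if hit:
                table.setdefault(label, []).append((value, value))
        elif kind == "br":
            table.setdefault(stmt[1], []).extend(possible)
            break
        else:
            break  # a statement the simulation does not understand
    return table


prologue = [
    ("load_state", "var_1_06"),
    ("brtrue_eq", "IL_17", "var_1_06", 0),
    ("brtrue_eq", "IL_96", "var_1_06", 3),
    ("br", "IL_C6"),
]
for label, ranges in analyze_dispatch(prologue).items():
    print(label, ranges)
```

Each conditional branch removes one state from the set that can still fall through, and the final unconditional branch receives whatever is left.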
This allows us to reconstruct the control flow in the MoveNext() method. However, one piece of the puzzle is still missing: the try-finally blocks. The C# compiler doesn't compile any of those into the MoveNext() method. Instead, it puts each finally block into its own method, and calls them in the MoveNext() method only on the regular exit of the try blocks. In case of an exception, the try-fault handler simply calls Dispose(), which takes care of calling the finally blocks depending on the current state:
void System.IDisposable.Dispose()
{
    switch (this.$1__state) {
        case 1:
        case 2:
        case 3: {
            try {
                switch (this.$1__state)
                {
                    case 2:
                    case 3:
                    {
                        try {
                        } finally {
                            this.$m__Finally1a();
                        }
                        break;
                    }
                }
            } finally {
                this.$m__Finally18();
            }
            return;
        }
    }
}
private void $m__Finally18()
{
    this.$1__state = -1;
    if (this.$7__wrap17 != null) {
        this.$7__wrap17.Dispose();
    }
}
private void $m__Finally1a()
{
    this.$1__state = 1;
    if (this.$7__wrap19 != null) {
        this.$7__wrap19.Dispose();
    }
}
We analyze the Dispose() method using the same symbolic execution that we used for the jump code at the beginning of MoveNext(). This tells us that $m__Finally1a is called in states 2 and 3; and that $m__Finally18 is called in states 1 to 3. Using this information, we can reconstruct the try-finally blocks within MoveNext(). The remaining parts of the ILAst pipeline then take care to replace the "goto"s with loop and if structures. Finally, the C# pattern transformations take care of translating the code back to the foreach pattern, resulting in the highly readable code in the screenshot at the beginning of this post.
|
|
Posted almost 15 years ago by ChristophWille
One more new feature in ILSpy - Analyze. It currently works with fields and methods. Simply right-click on a field and select Analyze from the context menu:
This opens the Analyzer view below the decompilation view (Read By and Assigned By are not
expanded by default):
The same procedure also works with methods, however the analysis is different (Uses and Used By, I did not expand Uses because the Display method does indeed use quite a few other methods):
Basically, this brings you the "Find Usage" feature in ILSpy. To be added: support for properties and events.
|
|
Posted almost 15 years ago by ChristophWille
When you are hovering over an IL instruction, you'll get a tooltip (in case you are wondering: the tooltip content is derived from "C:\Program Files (x86)\Reference Assemblies\Microsoft\Framework\.NETFramework\v4.0\mscorlib.xml").
The hand cursor shows that on clicking the instruction, you will be taken to the MSDN documentation for this instruction (in a separate browser window).
|
|
Posted almost 15 years ago by ChristophWille
A week ago we announced that a preview is ready – to be downloaded right from our build server. Today, we are entering the official testing phase of ILSpy with the availability of ILSpy 1.0 M1:
http://sourceforge.net/projects/sharpdevelop/files/ILSpy/1.0/ILSpy_1.0.0.417_M1_Binaries.zip/download
The most important improvement from the preview(s) to M1 is the quality of the decompiler. A lot more statements and constructs are presented in a nice C# way now, more types decompile – in general, we consider M1 to be a consumer-friendly build that can be used already for your everyday decompilation needs.
In addition to the core functionality, you will find two very useful new features: decompilation of BAML resources into XAML, as well as saving an entire assembly as C# project (so you can easily peruse the source code in your favorite IDE).
Screenshots, screencasts, blog posts as well as how to report bugs / suggest features can be found at
http://ilspy.net/
Development of ILSpy is continuing at a fast pace; if you are interested in testing the latest versions on the way to M2, please head over to our build server:
http://build.sharpdevelop.net/BuildArtefacts/#ILSpy
New stable test versions, features as well as technical background blog posts will be announced on Twitter going forward:
http://twitter.com/ilspy
The SharpDevelop Team
|