compiler Archives - prodSens.live

Pointers are a Double-Edged Sword

Will Brightling — Fri, 17 May 2024 13:20:26 +0000

Pointers are handy. They let us pass heavy objects around with little computational overhead. That convenience, however, sometimes leads developers to design their program’s entire data flow around pointer types, and this is where things can go wrong.

Let’s look at an example of when overusing pointer types to pass data around the program could damage the system’s performance. Consider the copy function in Go. It takes two arguments: the destination slice and the slice that is being copied into the destination slice.

func copy(dst, src []Type) int
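As a quick illustration of that signature, the built-in copy returns the number of elements it copied, which is the smaller of len(dst) and len(src). The helper name copied below is hypothetical and exists only for this sketch:

```go
package main

import "fmt"

// copied reports how many elements the built-in copy transfers
// into a destination slice of the given length.
func copied(dstLen int, src []int) int {
	dst := make([]int, dstLen)
	return copy(dst, src)
}

func main() {
	src := []int{1, 2, 3, 4, 5}
	fmt.Println(copied(5, src)) // destination as long as the source
	fmt.Println(copied(3, src)) // shorter destination truncates the copy
	fmt.Println(copied(0, src)) // empty destination: nothing is copied
}
```

The practical consequence is that the destination has to be sized by the caller before copy is called.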

You might wonder why the copy function doesn’t simply take the source slice and return a copied slice. The reason is that in Go (as in other garbage-collected languages), a function that returns references to its internal variables can actually slow the program down!

But why is returning references slow?

In Go, a slice is a small header (a pointer to an underlying array, plus a length and a capacity). Passing a slice copies only that header, so it behaves much like passing by reference: the underlying array is not copied, and just a pointer to it gets passed around. In theory this should make things faster, since no data is copied, but that’s not always the case.
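A minimal sketch of what sharing the underlying array means in practice. The function names zeroFirst and grow are hypothetical; the point is that writes through the slice are visible to the caller, while changes to the callee’s own copy of the slice header are not:

```go
package main

import "fmt"

// zeroFirst writes through the slice into the shared underlying array.
func zeroFirst(s []int) {
	if len(s) > 0 {
		s[0] = 0
	}
}

// grow appends to its local copy of the slice header only;
// the caller's length and capacity are left unchanged.
func grow(s []int) {
	s = append(s, 99)
}

func main() {
	nums := []int{1, 2, 3}
	zeroFirst(nums)
	grow(nums)
	// The write is visible to the caller, the append is not.
	fmt.Println(nums, len(nums))
}
```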

We wrote a copy function that just uses Go’s built-in copy function to copy a slice, but we did it in two different ways. In one of them, the destination slice is passed down as an argument, but in the other one, it is created inside the function and returned at the end.

func CopyPointerAsParam(dest []int, src []int) {
    copy(dest, src)
}

func CopyPointerAsReturn(src []int) []int {
    dst := make([]int, len(src))
    copy(dst, src)
    return dst
}

We wrote some benchmarks for them as well:

func BenchmarkCopyPointerAsParam(b *testing.B) {
    for i := 0; i < b.N; i++ {
        src := []int{1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15}
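        // copy transfers min(len(dest), len(src)) elements, so dest must be
        // pre-sized; a nil slice would make this benchmark copy nothing.
        dest := make([]int, 15)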
        CopyPointerAsParam(dest, src)
    }
}

func BenchmarkCopyPointerAsReturn(b *testing.B) {
    for i := 0; i < b.N; i++ {
        src := []int{1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15}
        _ = CopyPointerAsReturn(src)
    }
}

Let’s run the benchmarks and see which one is faster:

$ go test -bench=. ./...

goos: linux
goarch: amd64
pkg: github.com/alirostami1/escape-analysis-exp/pointer_as_param
cpu: 12th Gen Intel(R) Core(TM) i7-12650H
BenchmarkCopyPointerAsParam-16          423221650                2.817 ns/op
BenchmarkCopyPointerAsReturn-16         43333108                27.54 ns/op
PASS
ok      github.com/alirostami1/escape-analysis-exp/pointer_as_param     2.706s

The benchmarks show that the version that receives the destination slice as an argument is roughly ten times faster than the one that returns the destination slice to the caller. Odd, isn’t it?

This happens because the returned slice refers to an underlying array created inside the function. The compiler is forced to place that array in heap memory; otherwise, once the function returns and its stack frame is cleaned up, the slice would point to a region that is no longer part of the stack, that is, to garbage data that another function could overwrite at any time!

Enter the Heap

Instead of storing the underlying array in stack memory, the Go compiler detects that it might be referenced after the function returns, so it allocates the array on the heap (the array “escapes” to the heap). This means that any slice referencing the array is still valid even after the function returns.

We can see this escape from stack memory to heap memory by passing the -m flag to the compiler, which prints the Go compiler’s optimization decisions, including escape analysis:

$ go test -gcflags="-m" ./... -bench .
...
pointer_as_param/pointer_as_param.go:4:25: dest does not escape
pointer_as_param/pointer_as_param.go:4:37: src does not escape
pointer_as_param/pointer_as_param.go:9:26: src does not escape
pointer_as_param/pointer_as_param.go:10:13: make([]int, len(src)) escapes to heap
...

As you can see, nothing in func CopyPointerAsParam(dest []int, src []int) escapes, while the make([]int, len(src)) in func CopyPointerAsReturn(src []int) []int escapes to the heap.

But why does this slow the program down? Because allocating on the heap is expensive compared to the stack. With the stack, when a function returns the runtime simply moves the stack pointer back to where the function’s frame started, and everything beyond it is discarded. The heap is different: the runtime has to keep track of each object and its references in order to identify and free unreachable objects. The garbage collector periodically traverses the graph of objects and their references to find unreachable nodes, which adds significant complexity and computational overhead.
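One way to observe the heap cost without reading compiler output is to count allocations. The sketch below uses testing.AllocsPerRun from the standard library. The names copyInto, copyNew, allocCounts, and sink are hypothetical, and the //go:noinline directives are an assumption added here so the compiler does not optimize the calls away. Exact counts can vary with the Go version, so treat the result as indicative only:

```go
package main

import (
	"fmt"
	"testing"
)

// sink keeps returned slices reachable so the allocation cannot be elided.
var sink []int

//go:noinline
func copyInto(dst, src []int) int {
	return copy(dst, src)
}

//go:noinline
func copyNew(src []int) []int {
	dst := make([]int, len(src))
	copy(dst, src)
	return dst
}

// allocCounts reports the average number of heap allocations per call
// for the caller-provided and the returned destination variants.
func allocCounts() (param, ret float64) {
	src := []int{1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15}
	param = testing.AllocsPerRun(1000, func() {
		var buf [15]int // fixed-size array owned by the caller
		copyInto(buf[:], src)
	})
	ret = testing.AllocsPerRun(1000, func() {
		sink = copyNew(src)
	})
	return param, ret
}

func main() {
	param, ret := allocCounts()
	fmt.Printf("caller-provided dst: %.0f allocs/op\n", param)
	fmt.Printf("returned dst:        %.0f allocs/op\n", ret)
}
```

Each call to the returning variant should show at least one heap allocation, which is the work the garbage collector later has to track and reclaim.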

Passing the slice down to the copy function lets the compiler keep the underlying array in the caller’s stack frame, because the array now belongs to the caller and is not discarded when the copy function returns. As a general rule, pointers and pointer-carrying types such as maps and slices should be passed down the call stack, not up.
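The same rule can shape an API: the caller owns the buffer and the callee only fills it, much like Read(p []byte) on io.Reader in the standard library. The helper squaresInto below is a hypothetical example of that style, not code from the benchmark above, and whether the buffer actually stays on the stack still depends on the compiler’s escape analysis:

```go
package main

import "fmt"

// squaresInto fills dst with i*i for each index i and returns the number
// of elements written. The caller decides where dst lives.
func squaresInto(dst []int) int {
	for i := range dst {
		dst[i] = i * i
	}
	return len(dst)
}

func main() {
	var buf [8]int // buffer declared in the caller's frame
	n := squaresInto(buf[:])
	fmt.Println(buf[:n])
}
```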


Analyzing AST in Go with JSON tools

Dominick Sorrentino — Tue, 06 Sep 2022 04:03:02 +0000

There are many specific tasks that could significantly improve and automate the ongoing maintenance of a big project. Some of them require building tools that can analyze or change the source code written by developers.

For example, such tools could be used for:

gathering metadata from comments,
gathering strings that need to be translated,
understanding the structure of the code to calculate complexity metrics or build explanatory diagrams,
or even applying automatic code optimizations and refactoring patterns.

Solving such tasks seems to lead us into the complicated topics of compilers and parsers. But in 2022 every modern programming language comes with batteries included. The structure of the code, in the form of an AST that is ready to be searched and manipulated, is available through a built-in library. Basically, parsing source files and searching them for specific things is not much harder than doing the same for JSON or XML.

In this article, we will cover AST analysis in Go.

Existing approaches

Go ships with the standard package go/ast, which defines the structs for AST nodes, and go/parser, which provides the functions for parsing source files. It is quite easy and straightforward for experienced Go developers to write code for such a tool. There is also the go/printer package, which can convert an AST back into source code.
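A minimal sketch of that workflow using only the standard library: go/parser builds the AST, ast.Inspect walks it, and go/printer turns a modified tree back into source. The helper names funcNames and renameFunc and the file name example.go are hypothetical; note that renameFunc only changes the declaration, not any call sites:

```go
package main

import (
	"bytes"
	"fmt"
	"go/ast"
	"go/parser"
	"go/printer"
	"go/token"
)

const src = `package main

import "fmt"

func main() {
	fmt.Println("hello world")
}
`

// funcNames parses source and returns the names of its function declarations.
func funcNames(source string) ([]string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "example.go", source, 0)
	if err != nil {
		return nil, err
	}
	var names []string
	ast.Inspect(file, func(n ast.Node) bool {
		if fn, ok := n.(*ast.FuncDecl); ok {
			names = append(names, fn.Name.Name)
		}
		return true // keep descending into child nodes
	})
	return names, nil
}

// renameFunc renames the declaration of function from to to and prints
// the modified AST back as Go source.
func renameFunc(source, from, to string) (string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "example.go", source, parser.ParseComments)
	if err != nil {
		return "", err
	}
	for _, decl := range file.Decls {
		if fn, ok := decl.(*ast.FuncDecl); ok && fn.Name.Name == from {
			fn.Name.Name = to
		}
	}
	var buf bytes.Buffer
	if err := printer.Fprint(&buf, fset, file); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	names, err := funcNames(src)
	if err != nil {
		panic(err)
	}
	fmt.Println(names)

	out, err := renameFunc(src, "main", "run")
	if err != nil {
		panic(err)
	}
	fmt.Print(out)
}
```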

Here is a list of articles describing how to manipulate AST in golang:

One caveat: you need to know the structure of the Go AST. When I first dove into the topic, my problem was understanding how nodes are combined and figuring out what exactly I needed to search for in terms of node structure. Of course, you can print the AST using built-in capabilities, but you will get output in a rather strange format:

   0  *ast.File {
   1  .  Package: 1:1
   2  .  Name: *ast.Ident {
   3  .  .  NamePos: 1:9
   4  .  .  Name: "main"
   5  .  }
   6  .  Decls: []ast.Decl (len = 2) {
   7  .  .  0: *ast.GenDecl {
   8  .  .  .  TokPos: 3:1
   9  .  .  .  Tok: import
  10  .  .  .  Lparen: 3:8
....

Also, since the format is so specific, you can’t use any tools to navigate it except text search. Tools like goast-viewer can help with this, but their capabilities are limited.
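For reference, a dump in the format shown above can be produced with ast.Print, or with ast.Fprint when the output should go somewhere other than standard output. This is a small sketch; dumpAST and the file name example.go are hypothetical names:

```go
package main

import (
	"bytes"
	"go/ast"
	"go/parser"
	"go/token"
	"os"
)

// dumpAST parses source and returns the textual dump written by ast.Fprint.
// ast.NotNilFilter skips nil fields, which is also what ast.Print does.
func dumpAST(source string) (string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "example.go", source, 0)
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	if err := ast.Fprint(&buf, fset, file, ast.NotNilFilter); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	out, err := dumpAST("package main\n\nimport \"fmt\"\n\nfunc main() { fmt.Println(\"hello world\") }\n")
	if err != nil {
		panic(err)
	}
	os.Stdout.WriteString(out)
}
```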

Proposed solution

I started thinking of a library that would convert the AST into a conventional format like JSON. JSON is easy to manipulate, and many tools (like jq) and approaches exist for searching and modifying it.

So, what I ended up with is asty.

Asty is a small library written in Go that parses source code and presents it as a JSON structure. Moreover, it can also do the reverse conversion. This means that you can now manipulate Go code with a tool or algorithm written in any programming language.

You can use it as a Go package, as a standalone executable, or even as a Docker container. Try this page to experiment with asty in WebAssembly.

Example Go code:

package main

import "fmt"

func main() {
    fmt.Println("hello world")
}

Example JSON output:

{
  "NodeType": "File",
  "Name": {
    "NodeType": "Ident",
    "Name": "main"
  },
  "Decls": [
    {
      "NodeType": "GenDecl",
      "Tok": "import",
      "Specs": [
        {
          "NodeType": "ImportSpec",
          "Name": null,
          "Path": {
            "NodeType": "BasicLit",
            "Kind": "STRING",
            "Value": "\"fmt\""
          }
        }
      ]
    },
    {
      "NodeType": "FuncDecl",
      "Recv": null,
      "Name": {
        "NodeType": "Ident",
        "Name": "main"
      },
      "Type": {
        "NodeType": "FuncType",
        "TypeParams": null,
        "Params": {
          "NodeType": "FieldList",
          "List": null
        },
        "Results": null
      },
      "Body": {
        "NodeType": "BlockStmt",
        "List": [
          {
            "NodeType": "ExprStmt",
            "X": {
              "NodeType": "CallExpr",
              "Fun": {
                "NodeType": "SelectorExpr",
                "X": {
                  "NodeType": "Ident",
                  "Name": "fmt"
                },
                "Sel": {
                  "NodeType": "Ident",
                  "Name": "Println"
                }
              },
              "Args": [
                {
                  "NodeType": "BasicLit",
                  "Kind": "STRING",
                  "Value": "\"hello world\""
                }
              ]
            }
          }
        ]
      }
    }
  ]
}

asty can also output comments, positions of tokens in the original source, and reference ids. In some places the Go AST is not actually a tree but rather a DAG, so several nodes in the JSON may carry the same id.
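Because the output is plain JSON, it can be searched with generic JSON code that knows nothing about Go’s AST types. The sketch below decodes a hand-trimmed fragment of the JSON above (the CallExpr node) with encoding/json and collects the names of all Ident nodes. It does not use asty itself, and collectIdents and identNames are hypothetical helpers:

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// fragment is a trimmed copy of the CallExpr node from the example output.
const fragment = `{
  "NodeType": "CallExpr",
  "Fun": {
    "NodeType": "SelectorExpr",
    "X": {"NodeType": "Ident", "Name": "fmt"},
    "Sel": {"NodeType": "Ident", "Name": "Println"}
  },
  "Args": [
    {"NodeType": "BasicLit", "Kind": "STRING", "Value": "\"hello world\""}
  ]
}`

// collectIdents walks decoded JSON recursively and appends the Name of
// every object whose NodeType is "Ident".
func collectIdents(v any, out *[]string) {
	switch node := v.(type) {
	case map[string]any:
		if node["NodeType"] == "Ident" {
			if name, ok := node["Name"].(string); ok {
				*out = append(*out, name)
			}
		}
		for _, child := range node {
			collectIdents(child, out)
		}
	case []any:
		for _, child := range node {
			collectIdents(child, out)
		}
	}
}

// identNames decodes doc and returns the identifier names in sorted order
// (sorted because Go map iteration order is not deterministic).
func identNames(doc string) ([]string, error) {
	var root any
	if err := json.Unmarshal([]byte(doc), &root); err != nil {
		return nil, err
	}
	var names []string
	collectIdents(root, &names)
	sort.Strings(names)
	return names, nil
}

func main() {
	names, err := identNames(fragment)
	if err != nil {
		panic(err)
	}
	fmt.Println(names)
}
```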

Development principles and constraints

While developing asty I tried to follow some rules:

Make the JSON output as close to the real Go structures as possible. No additional logic is introduced. No normalization. No reinterpretation. The only things introduced are the names of some enum values. Even field names are preserved exactly as they exist in the go/ast package.
Make it very explicit. No reflection. No listing of fields. This is done to facilitate future maintenance: if something changes in a future version of Go, this code will probably break at compile time. Literally, asty contains 2 copies of each AST node struct to define marshaling and unmarshaling of JSON.
Keep polymorphism in the JSON structure. If a field references an expression, the concrete type is determined from the type name stored in a separate field, NodeType. This is tricky to achieve, so if you want something like this for other tasks I would recommend checking out this example: https://github.com/karaatanassov/go_polymorphic_json
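To illustrate the last rule, here is a minimal sketch of discriminator-based decoding in Go. It is not asty’s actual implementation: the Expr interface, the Ident and BasicLit structs, and decodeExpr are hypothetical stand-ins. The idea is to read NodeType first and then unmarshal the same bytes into the matching concrete type:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Expr is a hypothetical stand-in for an expression node interface.
type Expr interface{ exprNode() }

type Ident struct{ Name string }
type BasicLit struct{ Kind, Value string }

func (Ident) exprNode()    {}
func (BasicLit) exprNode() {}

// decodeExpr peeks at the NodeType field and decodes data into the
// concrete type that the discriminator names.
func decodeExpr(data []byte) (Expr, error) {
	var head struct{ NodeType string }
	if err := json.Unmarshal(data, &head); err != nil {
		return nil, err
	}
	switch head.NodeType {
	case "Ident":
		var n Ident
		if err := json.Unmarshal(data, &n); err != nil {
			return nil, err
		}
		return n, nil
	case "BasicLit":
		var n BasicLit
		if err := json.Unmarshal(data, &n); err != nil {
			return nil, err
		}
		return n, nil
	default:
		return nil, fmt.Errorf("unknown NodeType %q", head.NodeType)
	}
}

func main() {
	expr, err := decodeExpr([]byte(`{"NodeType": "Ident", "Name": "fmt"}`))
	if err != nil {
		panic(err)
	}
	fmt.Printf("%T %+v\n", expr, expr)
}
```

Decoding twice keeps the sketch short; a larger decoder might capture the raw bytes once with json.RawMessage instead.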

Further work

I am looking for cooperation with other developers interested in developing language tools. Meanwhile, you can check out another repository with examples where I experiment with AST JSON in Python.
