Label: performance
Performance in Software: What Actually Matters and Where Time Is Lost
What is performance
In software engineering, performance refers to how efficiently a program uses resources to complete its work.
In practice, it is not just about “being fast.” Performance is a combination of:
- execution time
- latency
- throughput
- resource usage (CPU, memory, I/O)
The key point:
Performance is always relative to a workload and its constraints.
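To make these metrics concrete, here is a minimal sketch of timing a toy workload with std::chrono. The `process_items` function and the input size are placeholders for whatever your program actually does:

```cpp
#include <chrono>
#include <cstdio>
#include <vector>

// Hypothetical workload: sum a large vector of ints.
long long process_items(const std::vector<int>& items) {
    long long sum = 0;
    for (int v : items) sum += v;
    return sum;
}

int main() {
    std::vector<int> items(10'000'000, 1);

    auto start = std::chrono::steady_clock::now();
    long long result = process_items(items);
    auto end = std::chrono::steady_clock::now();

    double seconds = std::chrono::duration<double>(end - start).count();
    // Latency: time for this one operation. Throughput: items per second.
    std::printf("result=%lld time=%.6fs throughput=%.0f items/s\n",
                result, seconds, items.size() / seconds);
    return 0;
}
```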
Why performance is often misunderstood
A common mistake is focusing on the wrong layer.
Developers tend to:
- micro-optimize code paths
- rewrite functions
- change algorithms prematurely
while the real bottleneck is:
- I/O latency
- locking
- memory allocation
- system calls
In many real systems, CPU time is not the limiting factor at all.
Where programs actually spend time
To understand performance, you need to know where time is lost.
CPU execution
This is what most people think about:
- instruction execution
- branching
- arithmetic
But modern CPUs are fast enough that pure computation is rarely the main issue.
Memory access
Memory is often the real bottleneck:
- cache misses cost hundreds of cycles
- poor data locality kills performance
- pointer-heavy structures degrade throughput
This is why:
data layout often matters more than algorithms
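To see locality in action, compare two traversal orders over the same contiguous buffer. This is a sketch (exact timings vary by CPU and cache sizes), but the row-major walk should be noticeably faster:

```cpp
#include <chrono>
#include <cstdio>
#include <vector>

int main() {
    const int N = 4096;
    // One contiguous buffer, indexed as a[row * N + col] (row-major layout).
    std::vector<int> a(static_cast<size_t>(N) * N, 1);

    auto time_sum = [&](bool row_major) {
        auto start = std::chrono::steady_clock::now();
        long long sum = 0;
        for (int i = 0; i < N; ++i)
            for (int j = 0; j < N; ++j)
                // Row-major walks memory sequentially (cache-friendly);
                // column-major jumps N*sizeof(int) bytes per step.
                sum += row_major ? a[i * (size_t)N + j] : a[j * (size_t)N + i];
        auto end = std::chrono::steady_clock::now();
        std::printf("%s: sum=%lld, %.3fs\n",
                    row_major ? "row-major   " : "column-major",
                    sum, std::chrono::duration<double>(end - start).count());
    };

    time_sum(true);
    time_sum(false);
    return 0;
}
```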
Synchronization
Multithreaded code introduces:
- locks
- contention
- cache line bouncing
Even small critical sections can dominate runtime under load.
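A minimal sketch of the problem, with arbitrary thread and iteration counts: every increment below is a tiny critical section, yet all threads serialize on one lock, so adding threads mostly adds contention:

```cpp
#include <chrono>
#include <cstdio>
#include <mutex>
#include <thread>
#include <vector>

int main() {
    const int kThreads = 8;
    const int kItersPerThread = 1'000'000;

    // Shared counter protected by a mutex: the protected work is trivial,
    // but every thread must queue up for the same lock.
    long long counter = 0;
    std::mutex m;

    auto start = std::chrono::steady_clock::now();
    std::vector<std::thread> workers;
    for (int t = 0; t < kThreads; ++t) {
        workers.emplace_back([&] {
            for (int i = 0; i < kItersPerThread; ++i) {
                std::lock_guard<std::mutex> lock(m);
                ++counter;  // the "work" is one add; the lock is the cost
            }
        });
    }
    for (auto& w : workers) w.join();
    auto end = std::chrono::steady_clock::now();

    std::printf("counter=%lld in %.3fs\n", counter,
                std::chrono::duration<double>(end - start).count());
    return 0;
}
```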
I/O operations
Disk, network, and console output are slow compared to the CPU:
- file writes
- socket operations
- logging
These can completely dominate execution time.
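A classic example is flushing a log stream on every line. The file name and line count here are arbitrary; the point is that forcing a flush per write can dwarf the cost of formatting the message:

```cpp
#include <chrono>
#include <fstream>
#include <iostream>

int main() {
    std::ofstream log("demo.log");

    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < 100'000; ++i) {
        // std::endl flushes the stream every line, forcing a write to the
        // OS each time; '\n' alone lets the buffer do its job.
        log << "event " << i << std::endl;  // try '\n' instead and compare
    }
    auto end = std::chrono::steady_clock::now();

    std::cout << "wrote 100000 lines in "
              << std::chrono::duration<double>(end - start).count()
              << "s\n";
    return 0;
}
```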
Latency vs throughput
Two core performance metrics are often confused.
Latency
- time to complete a single operation
- important for user-facing systems
Throughput
- amount of work done per unit time
- important for batch processing and servers
Optimizing one often hurts the other.
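A toy model makes the tension visible. The numbers below are invented (a fixed per-batch overhead plus per-item work), but the shape is real: bigger batches amortize overhead and raise throughput, while each item waits longer:

```cpp
#include <cstdio>

int main() {
    const double overhead_ms = 5.0;  // fixed cost per batch (e.g. one syscall)
    const double per_item_ms = 0.1;  // marginal work per item

    for (int batch : {1, 10, 100}) {
        double batch_time_ms = overhead_ms + batch * per_item_ms;
        double throughput = batch / batch_time_ms * 1000.0;  // items per second
        // An item may wait for the whole batch to finish before completing.
        std::printf("batch=%3d  throughput=%7.0f items/s  worst latency=%6.1f ms\n",
                    batch, throughput, batch_time_ms);
    }
    return 0;
}
```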
The cost of abstraction
Modern languages and frameworks add layers:
- virtual calls
- allocations
- hidden copies
These are not inherently bad, but:
- they hide costs
- they make performance less predictable
Understanding what happens under the hood is critical.
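A small illustration of a hidden copy: the two functions below look identical at the call site, but one silently copies every string and vector it is handed. The `Record` type is hypothetical:

```cpp
#include <string>
#include <vector>

// Hypothetical record type; the vector makes copies expensive.
struct Record {
    std::string name;
    std::vector<double> samples;
};

// Hidden cost: taking Record by value copies the string and the
// vector (allocations + memcpy) on every call.
double total_by_value(Record r) {
    double sum = 0;
    for (double s : r.samples) sum += s;
    return sum;
}

// Same logic, no copy: the abstraction looks the same to the caller.
double total_by_ref(const Record& r) {
    double sum = 0;
    for (double s : r.samples) sum += s;
    return sum;
}

int main() {
    Record r{"sensor", std::vector<double>(1'000'000, 1.0)};
    double a = total_by_value(r);  // copies ~8 MB before doing any work
    double b = total_by_ref(r);    // reads the same data in place
    return (a == b) ? 0 : 1;
}
```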
Measuring performance
You cannot improve what you don’t measure.
Profiling
Use profilers to:
- identify hotspots
- measure call frequency
- detect expensive operations
Guessing is almost always wrong.
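When a real profiler (perf, VTune, Instruments, ...) is not at hand, even crude instrumentation beats guessing. A minimal RAII timer sketch; the labels and workload are placeholders:

```cpp
#include <chrono>
#include <cstdio>

// Minimal RAII timer: prints how long the enclosing scope took.
// A real profiler gives far more detail, but this beats guessing.
struct ScopedTimer {
    const char* label;
    std::chrono::steady_clock::time_point start =
        std::chrono::steady_clock::now();
    explicit ScopedTimer(const char* l) : label(l) {}
    ~ScopedTimer() {
        auto end = std::chrono::steady_clock::now();
        std::printf("%s: %.6fs\n", label,
                    std::chrono::duration<double>(end - start).count());
    }
};

int main() {
    ScopedTimer whole("whole run");
    {
        ScopedTimer hot("suspected hotspot");
        long long sum = 0;
        for (int i = 0; i < 50'000'000; ++i) sum += i;  // stand-in workload
        std::printf("sum=%lld\n", sum);  // keep the work observable
    }
    return 0;
}
```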
Benchmarking
Microbenchmarks help isolate:
- specific functions
- algorithm choices
But they can mislead if:
- they don’t reflect real workloads
- the compiler optimizes away logic
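The second pitfall deserves a concrete example. Using Google Benchmark as one possible framework: without DoNotOptimize, the compiler can see that the result is unused and delete the loop, so the benchmark measures nothing:

```cpp
#include <benchmark/benchmark.h>
#include <vector>

static void BM_SumVector(benchmark::State& state) {
    std::vector<int> v(state.range(0), 1);
    for (auto _ : state) {
        long long sum = 0;
        for (int x : v) sum += x;
        // Without this, the compiler may see that `sum` is unused
        // and remove the loop entirely.
        benchmark::DoNotOptimize(sum);
    }
}
BENCHMARK(BM_SumVector)->Arg(1 << 20);
BENCHMARK_MAIN();
```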
Typical performance traps
Premature optimization
Optimizing before measuring:
- wastes time
- complicates code
Ignoring I/O
Programs often spend more time:
- waiting for disk
- waiting for network
than executing instructions.
Overusing threads
More threads do not mean better performance:
- context switching
- contention
- memory pressure
can make things worse.
Performance in low-level systems
When working close to the system (C/C++, binary patching, etc.), performance becomes more explicit.
You deal with:
- instruction-level behavior
- cache effects
- memory alignment
- system call overhead
Even small changes can:
- improve latency significantly
- or break performance completely
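One concrete example of this explicitness is struct padding: reordering fields changes the size (and cache footprint) of every instance. The exact sizes are implementation-defined, but on a typical 64-bit ABI the difference looks like this:

```cpp
#include <cstdio>

// Same three fields, different order: padding changes the size.
struct Padded {        // typical layout: 1 + 7(pad) + 8 + 1 + 7(pad) = 24
    char a;
    double b;
    char c;
};

struct Packed {        // typical layout: 8 + 1 + 1 + 6(pad) = 16
    double b;
    char a;
    char c;
};

int main() {
    std::printf("sizeof(Padded)=%zu sizeof(Packed)=%zu\n",
                sizeof(Padded), sizeof(Packed));
    return 0;
}
```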
Performance and real systems
In real-world applications:
- performance is constrained by the slowest component
- scaling introduces new bottlenecks
- fixes often shift the problem elsewhere
This is why performance work is iterative:
- measure
- identify bottleneck
- fix
- repeat
Final thoughts
Performance is not about writing “fast code” in isolation.
It is about understanding the entire system:
- CPU
- memory
- I/O
- concurrency
And most importantly:
the biggest wins usually come from fixing the right problem, not writing clever code.