SIMD - Experimental helpers

This page lists the slice operations available in the exp/simd sub-package. These helpers use AVX (128-bit), AVX2 (256-bit), or AVX-512 (512-bit) SIMD instructions when built with Go 1.26+ and GOEXPERIMENT=simd for an amd64 target.

Help improve this documentation

This documentation is still new and evolving. If you spot any mistakes, unclear explanations, or missing details, please open an issue.

Your feedback helps us improve!

Unstable API

SIMD helpers are experimental. The API may break in the future.

Performance

Benchmarks show that the SIMD helpers are slower than the scalar fallback on small datasets:

BenchmarkSumInt8/small/Fallback-lo-4             203616572        5.875 ns/op
BenchmarkSumInt8/small/AVX-x16-4                 100000000        12.04 ns/op
BenchmarkSumInt8/small/AVX2-x32-4                 64041816        17.93 ns/op
BenchmarkSumInt8/small/AVX512-x64-4               26947528        44.75 ns/op

They are much faster on large datasets, by roughly 15x (AVX) to 41x (AVX-512) in this run:

BenchmarkSumInt8/xlarge/Fallback-lo-4               247677       4860 ns/op
BenchmarkSumInt8/xlarge/AVX-x16-4                  3851040      311.4 ns/op
BenchmarkSumInt8/xlarge/AVX2-x32-4                 7100002      169.2 ns/op
BenchmarkSumInt8/xlarge/AVX512-x64-4              10107534      118.1 ns/op
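
One way to act on these numbers is to keep short slices on a plain loop and send only long ones to a SIMD helper. The sketch below uses only the standard library. `wide` is a hypothetical stand-in for a helper with the same shape as SumInt8x32, and `smallCutoff` is an arbitrary illustrative value that would need to come from benchmarks on the target machine.

```go
package main

import "fmt"

// smallCutoff is a hypothetical threshold; derive a real one from benchmarks.
const smallCutoff = 64

// sumScalar is the plain-loop path used for short inputs.
func sumScalar(xs []int8) int8 {
	var total int8
	for _, x := range xs {
		total += x
	}
	return total
}

// sumDispatch keeps short inputs on the scalar loop and hands longer ones
// to wide, a stand-in for a SIMD helper with the same signature.
func sumDispatch(xs []int8, wide func([]int8) int8) int8 {
	if len(xs) < smallCutoff {
		return sumScalar(xs)
	}
	return wide(xs)
}

func main() {
	wideCalls := 0
	// Hypothetical SIMD path, modelled here with the scalar loop.
	wide := func(xs []int8) int8 {
		wideCalls++
		return sumScalar(xs)
	}

	small := sumDispatch([]int8{1, 2, 3}, wide)
	large := sumDispatch(make([]int8, 1024), wide)
	fmt.Println(small, large, wideCalls) // 6 0 1
}
```

The counter only exists to show which path ran; real code would drop it.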

Contains

Checks if a target value is present in a collection using SIMD instructions. The suffix (x2, x4, x8, x16, x32, x64) indicates the number of lanes processed simultaneously.

Note: Choose the variant matching your CPU's capabilities. Higher lane counts are typically faster on large inputs but require newer CPU support.

// Using AVX2 variant (32 lanes at once) - Intel Haswell+ / AMD Excavator+
found := simd.ContainsInt8x32([]int8{1, 2, 3, 4, 5}, 3)
// true

// Using AVX variant (2 lanes at once) - Intel Sandy Bridge+ / AMD Bulldozer+
found := simd.ContainsInt64x2([]int64{1000000, 2000000, 3000000}, 2000000)
// true

// Using AVX-512 variant (64 lanes at once) - Intel Skylake-X+
found := simd.ContainsUint8x64([]uint8{10, 20, 30, 40, 50}, 30)
// true

// Float32 with AVX2 (8 lanes at once)
found := simd.ContainsFloat32x8([]float32{1.1, 2.2, 3.3, 4.4}, 3.3)
// true

// Empty collection returns false
found := simd.ContainsInt16x16([]int16{}, 5)
// false

Prototypes:

func ContainsInt8x16[T ~int8](collection []T, target T) bool
func ContainsInt8x32[T ~int8](collection []T, target T) bool
func ContainsInt8x64[T ~int8](collection []T, target T) bool
func ContainsInt16x8[T ~int16](collection []T, target T) bool
func ContainsInt16x16[T ~int16](collection []T, target T) bool
func ContainsInt16x32[T ~int16](collection []T, target T) bool
func ContainsInt32x4[T ~int32](collection []T, target T) bool
func ContainsInt32x8[T ~int32](collection []T, target T) bool
func ContainsInt32x16[T ~int32](collection []T, target T) bool
func ContainsInt64x2[T ~int64](collection []T, target T) bool
func ContainsInt64x4[T ~int64](collection []T, target T) bool
func ContainsInt64x8[T ~int64](collection []T, target T) bool
func ContainsUint8x16[T ~uint8](collection []T, target T) bool
func ContainsUint8x32[T ~uint8](collection []T, target T) bool
func ContainsUint8x64[T ~uint8](collection []T, target T) bool
func ContainsUint16x8[T ~uint16](collection []T, target T) bool
func ContainsUint16x16[T ~uint16](collection []T, target T) bool
func ContainsUint16x32[T ~uint16](collection []T, target T) bool
func ContainsUint32x4[T ~uint32](collection []T, target T) bool
func ContainsUint32x8[T ~uint32](collection []T, target T) bool
func ContainsUint32x16[T ~uint32](collection []T, target T) bool
func ContainsUint64x2[T ~uint64](collection []T, target T) bool
func ContainsUint64x4[T ~uint64](collection []T, target T) bool
func ContainsUint64x8[T ~uint64](collection []T, target T) bool
func ContainsFloat32x4[T ~float32](collection []T, target T) bool
func ContainsFloat32x8[T ~float32](collection []T, target T) bool
func ContainsFloat32x16[T ~float32](collection []T, target T) bool
func ContainsFloat64x2[T ~float64](collection []T, target T) bool
func ContainsFloat64x4[T ~float64](collection []T, target T) bool
func ContainsFloat64x8[T ~float64](collection []T, target T) bool
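
To make the lane suffix concrete, here is a scalar model of a lane-based search. It illustrates the idea only and is not the library's implementation: `containsChunked` and its `lanes` parameter are hypothetical names, and the inner loop stands in for what a single vector comparison would cover.

```go
package main

import "fmt"

// containsChunked walks the slice in blocks of `lanes` elements, the span one
// vector comparison would cover, then scans the shorter tail element by element.
func containsChunked[T comparable](collection []T, target T, lanes int) bool {
	if lanes < 1 {
		lanes = 1
	}
	i := 0
	for ; i+lanes <= len(collection); i += lanes {
		for _, v := range collection[i : i+lanes] { // one "vector" step
			if v == target {
				return true
			}
		}
	}
	for _, v := range collection[i:] { // leftover elements
		if v == target {
			return true
		}
	}
	return false
}

func main() {
	data := []int32{4, 8, 15, 16, 23, 42}
	fmt.Println(containsChunked(data, 42, 4))      // true (found in the 2-element tail)
	fmt.Println(containsChunked(data, 7, 4))       // false
	fmt.Println(containsChunked([]int32{}, 7, 4))  // false (empty input)
}
```

With 6 elements and 4 lanes, only one full block exists, which is why short inputs gain little from wide variants.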

Sum

Sums the values in a collection using SIMD instructions. The suffix (x2, x4, x8, x16, x32, x64) indicates the number of lanes processed simultaneously.

Note: Choose the variant matching your CPU's capabilities. Higher lane counts are typically faster on large inputs but require newer CPU support.

// Using AVX2 variant (32 lanes at once) - Intel Haswell+ / AMD Excavator+
sum := simd.SumInt8x32([]int8{1, 2, 3, 4, 5})
// 15

// Using AVX-512 variant (16 lanes at once) - Intel Skylake-X+
sum := simd.SumFloat32x16([]float32{1.1, 2.2, 3.3, 4.4})
// 11

// Using AVX variant (4 lanes at once) - Intel Sandy Bridge+ / AMD Bulldozer+
sum := simd.SumInt32x4([]int32{1000000, 2000000, 3000000})
// 6000000

// Empty collection returns 0
sum := simd.SumUint16x16([]uint16{})
// 0

Similar:

SumBy

Prototypes:

func SumInt8x16[T ~int8](collection []T) T
func SumInt8x32[T ~int8](collection []T) T
func SumInt8x64[T ~int8](collection []T) T
func SumInt16x8[T ~int16](collection []T) T
func SumInt16x16[T ~int16](collection []T) T
func SumInt16x32[T ~int16](collection []T) T
func SumInt32x4[T ~int32](collection []T) T
func SumInt32x8[T ~int32](collection []T) T
func SumInt32x16[T ~int32](collection []T) T
func SumInt64x2[T ~int64](collection []T) T
func SumInt64x4[T ~int64](collection []T) T
func SumInt64x8[T ~int64](collection []T) T
func SumUint8x16[T ~uint8](collection []T) T
func SumUint8x32[T ~uint8](collection []T) T
func SumUint8x64[T ~uint8](collection []T) T
func SumUint16x8[T ~uint16](collection []T) T
func SumUint16x16[T ~uint16](collection []T) T
func SumUint16x32[T ~uint16](collection []T) T
func SumUint32x4[T ~uint32](collection []T) T
func SumUint32x8[T ~uint32](collection []T) T
func SumUint32x16[T ~uint32](collection []T) T
func SumUint64x2[T ~uint64](collection []T) T
func SumUint64x4[T ~uint64](collection []T) T
func SumUint64x8[T ~uint64](collection []T) T
func SumFloat32x4[T ~float32](collection []T) T
func SumFloat32x8[T ~float32](collection []T) T
func SumFloat32x16[T ~float32](collection []T) T
func SumFloat64x2[T ~float64](collection []T) T
func SumFloat64x4[T ~float64](collection []T) T
func SumFloat64x8[T ~float64](collection []T) T
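
Because every Sum helper returns the element type T, the total has to fit in that type. In plain Go, fixed-size integer addition wraps on overflow. The scalar sketch below shows that behaviour next to a widening alternative. That the SIMD helpers wrap in exactly the same way is an assumption worth verifying, and both function names are hypothetical.

```go
package main

import "fmt"

// sumInt8 accumulates in int8, so the total wraps around past 127.
func sumInt8(xs []int8) int8 {
	var total int8
	for _, x := range xs {
		total += x
	}
	return total
}

// sumWidened accumulates in int64 to avoid the wraparound.
func sumWidened(xs []int8) int64 {
	var total int64
	for _, x := range xs {
		total += int64(x)
	}
	return total
}

func main() {
	xs := []int8{100, 100, 100}
	fmt.Println(sumInt8(xs))    // 44 (300 wrapped into int8)
	fmt.Println(sumWidened(xs)) // 300
}
```

When totals may exceed the element type, converting to a wider type before summing is the safer choice.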

Mean

Calculates the arithmetic mean of a collection using SIMD instructions. The suffix (x2, x4, x8, x16, x32, x64) indicates the number of lanes processed simultaneously.

Note: Choose the variant matching your CPU's capabilities. Higher lane counts are typically faster on large inputs but require newer CPU support.

// Using AVX2 variant (32 lanes at once) - Intel Haswell+ / AMD Excavator+
mean := simd.MeanInt8x32([]int8{1, 2, 3, 4, 5})
// 3

// Using AVX-512 variant (16 lanes at once) - Intel Skylake-X+
mean := simd.MeanFloat32x16([]float32{1.0, 2.0, 3.0, 4.0})
// 2.5

// Using AVX variant (8 lanes at once) - Intel Sandy Bridge+ / AMD Bulldozer+
mean := simd.MeanInt16x8([]int16{10, 20, 30, 40})
// 25

// Empty collection returns 0
mean := simd.MeanUint32x4([]uint32{})
// 0

Similar:

MeanBy

Prototypes:

func MeanInt8x16[T ~int8](collection []T) T
func MeanInt8x32[T ~int8](collection []T) T
func MeanInt8x64[T ~int8](collection []T) T
func MeanInt16x8[T ~int16](collection []T) T
func MeanInt16x16[T ~int16](collection []T) T
func MeanInt16x32[T ~int16](collection []T) T
func MeanInt32x4[T ~int32](collection []T) T
func MeanInt32x8[T ~int32](collection []T) T
func MeanInt32x16[T ~int32](collection []T) T
func MeanInt64x2[T ~int64](collection []T) T
func MeanInt64x4[T ~int64](collection []T) T
func MeanInt64x8[T ~int64](collection []T) T
func MeanUint8x16[T ~uint8](collection []T) T
func MeanUint8x32[T ~uint8](collection []T) T
func MeanUint8x64[T ~uint8](collection []T) T
func MeanUint16x8[T ~uint16](collection []T) T
func MeanUint16x16[T ~uint16](collection []T) T
func MeanUint16x32[T ~uint16](collection []T) T
func MeanUint32x4[T ~uint32](collection []T) T
func MeanUint32x8[T ~uint32](collection []T) T
func MeanUint32x16[T ~uint32](collection []T) T
func MeanUint64x2[T ~uint64](collection []T) T
func MeanUint64x4[T ~uint64](collection []T) T
func MeanUint64x8[T ~uint64](collection []T) T
func MeanFloat32x4[T ~float32](collection []T) T
func MeanFloat32x8[T ~float32](collection []T) T
func MeanFloat32x16[T ~float32](collection []T) T
func MeanFloat64x2[T ~float64](collection []T) T
func MeanFloat64x4[T ~float64](collection []T) T
func MeanFloat64x8[T ~float64](collection []T) T
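
Mean returns the element type T, so for integer element types the result cannot carry a fractional part. The scalar model below assumes the mean is the sum divided by the length using Go integer division, which truncates toward zero. The library's exact rounding is not stated here, so treat that as an assumption. Both function names are hypothetical.

```go
package main

import "fmt"

// meanTruncated models an integer mean: the sum is divided with Go's
// integer division, which truncates toward zero. Empty input yields 0.
func meanTruncated(xs []int16) int16 {
	if len(xs) == 0 {
		return 0
	}
	var total int64
	for _, x := range xs {
		total += int64(x)
	}
	return int16(total / int64(len(xs)))
}

// meanFloat64 keeps the fractional part by converting before dividing.
func meanFloat64(xs []int16) float64 {
	if len(xs) == 0 {
		return 0
	}
	var total float64
	for _, x := range xs {
		total += float64(x)
	}
	return total / float64(len(xs))
}

func main() {
	xs := []int16{1, 2}
	fmt.Println(meanTruncated(xs)) // 1
	fmt.Println(meanFloat64(xs))   // 1.5
}
```

If the fractional part matters, converting the data to a float type first is one option.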

Min

Finds the minimum value in a collection using SIMD instructions. The suffix (x2, x4, x8, x16, x32, x64) indicates the number of lanes processed simultaneously.

Note: Choose the variant matching your CPU's capabilities. Higher lane counts are typically faster on large inputs but require newer CPU support.

// Using AVX2 variant (32 lanes at once) - Intel Haswell+ / AMD Excavator+
min := simd.MinInt8x32([]int8{5, 2, 8, 1, 9})
// 1

// Using AVX-512 variant (16 lanes at once) - Intel Skylake-X+
min := simd.MinFloat32x16([]float32{3.5, 1.2, 4.8, 2.1})
// 1.2

// Using AVX variant (4 lanes at once) - Intel Sandy Bridge+ / AMD Bulldozer+
min := simd.MinInt32x4([]int32{100, 50, 200, 75})
// 50

// Empty collection returns 0
min := simd.MinUint16x8([]uint16{})
// 0

Prototypes:

func MinInt8x16[T ~int8](collection []T) T
func MinInt8x32[T ~int8](collection []T) T
func MinInt8x64[T ~int8](collection []T) T
func MinInt16x8[T ~int16](collection []T) T
func MinInt16x16[T ~int16](collection []T) T
func MinInt16x32[T ~int16](collection []T) T
func MinInt32x4[T ~int32](collection []T) T
func MinInt32x8[T ~int32](collection []T) T
func MinInt32x16[T ~int32](collection []T) T
func MinInt64x2[T ~int64](collection []T) T
func MinInt64x4[T ~int64](collection []T) T
func MinInt64x8[T ~int64](collection []T) T
func MinUint8x16[T ~uint8](collection []T) T
func MinUint8x32[T ~uint8](collection []T) T
func MinUint8x64[T ~uint8](collection []T) T
func MinUint16x8[T ~uint16](collection []T) T
func MinUint16x16[T ~uint16](collection []T) T
func MinUint16x32[T ~uint16](collection []T) T
func MinUint32x4[T ~uint32](collection []T) T
func MinUint32x8[T ~uint32](collection []T) T
func MinUint32x16[T ~uint32](collection []T) T
func MinUint64x2[T ~uint64](collection []T) T
func MinUint64x4[T ~uint64](collection []T) T
func MinUint64x8[T ~uint64](collection []T) T
func MinFloat32x4[T ~float32](collection []T) T
func MinFloat32x8[T ~float32](collection []T) T
func MinFloat32x16[T ~float32](collection []T) T
func MinFloat64x2[T ~float64](collection []T) T
func MinFloat64x4[T ~float64](collection []T) T
func MinFloat64x8[T ~float64](collection []T) T
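
Since an empty collection returns 0, a result of 0 can mean either "no elements" or "the minimum is 0". A small wrapper can make the difference explicit. In this sketch `minOK` is a hypothetical helper, and `scalarMin` stands in for any Min variant with the same shape.

```go
package main

import (
	"fmt"
	"slices"
)

// scalarMin stands in for a Min helper: it returns 0 for empty input.
func scalarMin(xs []uint16) uint16 {
	if len(xs) == 0 {
		return 0
	}
	return slices.Min(xs)
}

// minOK reports ok=false for empty input so that callers can tell
// "no elements" apart from a genuine minimum of 0.
func minOK[T any](collection []T, minFn func([]T) T) (value T, ok bool) {
	if len(collection) == 0 {
		return value, false
	}
	return minFn(collection), true
}

func main() {
	fmt.Println(minOK([]uint16{7, 0, 3}, scalarMin)) // 0 true
	fmt.Println(minOK([]uint16{}, scalarMin))        // 0 false
}
```

The same wrapper shape works for Max, which also returns 0 on empty input.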

SumBy

Transforms a collection using an iteratee function and sums the results using SIMD instructions.

Note: The automatic dispatch functions (e.g., SumByInt8) select the best available SIMD variant for the current CPU. The specific variants (e.g., SumByInt8x32) use a fixed SIMD instruction set regardless of CPU capabilities, so use them only if you know your target CPU supports that instruction set.

type Person struct {
    Name string
    Age  int8
}

people := []Person{
    {Name: "Alice", Age: 25},
    {Name: "Bob", Age: 30},
    {Name: "Charlie", Age: 35},
}

// Automatic dispatch - uses best available SIMD
sum := simd.SumByInt8(people, func(p Person) int8 {
    return p.Age
})
// 90

type Product struct {
    Name  string
    Price float32
    Stock int32
}

products := []Product{
    {Name: "Widget", Price: 10.50, Stock: 5},
    {Name: "Gadget", Price: 20.00, Stock: 3},
    {Name: "Tool", Price: 15.75, Stock: 2},
}

// Sum stock value using specific AVX2 variant
sum := simd.SumByFloat32x8(products, func(p Product) float32 {
    return p.Price * float32(p.Stock)
})
// 144

type Metric struct {
    Value uint16
}

metrics := []Metric{
    {Value: 100},
    {Value: 200},
    {Value: 300},
    {Value: 400},
}

// Using AVX variant (8 lanes at once) - Intel Sandy Bridge+ / AMD Bulldozer+
sum := simd.SumByUint16x8(metrics, func(m Metric) uint16 {
    return m.Value
})
// 1000

// Empty collection returns 0
type Item struct {
    Count int64
}

sum := simd.SumByInt64([]Item{}, func(i Item) int64 {
    return i.Count
})
// 0

Similar:

MeanBy, Sum

Prototypes:

func SumByInt8[T any, R ~int8](collection []T, iteratee func(item T) R) R
func SumByInt16[T any, R ~int16](collection []T, iteratee func(item T) R) R
func SumByInt32[T any, R ~int32](collection []T, iteratee func(item T) R) R
func SumByInt64[T any, R ~int64](collection []T, iteratee func(item T) R) R
func SumByUint8[T any, R ~uint8](collection []T, iteratee func(item T) R) R
func SumByUint16[T any, R ~uint16](collection []T, iteratee func(item T) R) R
func SumByUint32[T any, R ~uint32](collection []T, iteratee func(item T) R) R
func SumByUint64[T any, R ~uint64](collection []T, iteratee func(item T) R) R
func SumByFloat32[T any, R ~float32](collection []T, iteratee func(item T) R) R
func SumByFloat64[T any, R ~float64](collection []T, iteratee func(item T) R) R
func SumByInt8x16[T any, R ~int8](collection []T, iteratee func(item T) R) R
func SumByInt8x32[T any, R ~int8](collection []T, iteratee func(item T) R) R
func SumByInt8x64[T any, R ~int8](collection []T, iteratee func(item T) R) R
func SumByInt16x8[T any, R ~int16](collection []T, iteratee func(item T) R) R
func SumByInt16x16[T any, R ~int16](collection []T, iteratee func(item T) R) R
func SumByInt16x32[T any, R ~int16](collection []T, iteratee func(item T) R) R
func SumByInt32x4[T any, R ~int32](collection []T, iteratee func(item T) R) R
func SumByInt32x8[T any, R ~int32](collection []T, iteratee func(item T) R) R
func SumByInt32x16[T any, R ~int32](collection []T, iteratee func(item T) R) R
func SumByInt64x2[T any, R ~int64](collection []T, iteratee func(item T) R) R
func SumByInt64x4[T any, R ~int64](collection []T, iteratee func(item T) R) R
func SumByInt64x8[T any, R ~int64](collection []T, iteratee func(item T) R) R
func SumByUint8x16[T any, R ~uint8](collection []T, iteratee func(item T) R) R
func SumByUint8x32[T any, R ~uint8](collection []T, iteratee func(item T) R) R
func SumByUint8x64[T any, R ~uint8](collection []T, iteratee func(item T) R) R
func SumByUint16x8[T any, R ~uint16](collection []T, iteratee func(item T) R) R
func SumByUint16x16[T any, R ~uint16](collection []T, iteratee func(item T) R) R
func SumByUint16x32[T any, R ~uint16](collection []T, iteratee func(item T) R) R
func SumByUint32x4[T any, R ~uint32](collection []T, iteratee func(item T) R) R
func SumByUint32x8[T any, R ~uint32](collection []T, iteratee func(item T) R) R
func SumByUint32x16[T any, R ~uint32](collection []T, iteratee func(item T) R) R
func SumByUint64x2[T any, R ~uint64](collection []T, iteratee func(item T) R) R
func SumByUint64x4[T any, R ~uint64](collection []T, iteratee func(item T) R) R
func SumByUint64x8[T any, R ~uint64](collection []T, iteratee func(item T) R) R
func SumByFloat32x4[T any, R ~float32](collection []T, iteratee func(item T) R) R
func SumByFloat32x8[T any, R ~float32](collection []T, iteratee func(item T) R) R
func SumByFloat32x16[T any, R ~float32](collection []T, iteratee func(item T) R) R
func SumByFloat64x2[T any, R ~float64](collection []T, iteratee func(item T) R) R
func SumByFloat64x4[T any, R ~float64](collection []T, iteratee func(item T) R) R
func SumByFloat64x8[T any, R ~float64](collection []T, iteratee func(item T) R) R
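
As a reference for what SumBy computes, the following scalar sketch applies the iteratee to every item and adds up the results. `sumBy` is a hypothetical standard-library-only equivalent, not the library's code. It reuses the Product data from the example above, where the stock values are 52.5, 60, and 31.5.

```go
package main

import "fmt"

type Product struct {
	Name  string
	Price float32
	Stock int32
}

// sumBy is a scalar reference: apply iteratee to each item and add up the results.
func sumBy[T any, R ~int32 | ~int64 | ~float32 | ~float64](collection []T, iteratee func(item T) R) R {
	var total R
	for _, item := range collection {
		total += iteratee(item)
	}
	return total
}

func main() {
	products := []Product{
		{Name: "Widget", Price: 10.50, Stock: 5},
		{Name: "Gadget", Price: 20.00, Stock: 3},
		{Name: "Tool", Price: 15.75, Stock: 2},
	}

	value := sumBy(products, func(p Product) float32 {
		return p.Price * float32(p.Stock)
	})
	fmt.Println(value) // 144
}
```

A scalar reference like this is also handy in tests, to check a SIMD result against a known-good loop.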

Max

Finds the maximum value in a collection using SIMD instructions. The suffix (x2, x4, x8, x16, x32, x64) indicates the number of lanes processed simultaneously.

Note: Choose the variant matching your CPU's capabilities. Higher lane counts are typically faster on large inputs but require newer CPU support.

// Using AVX2 variant (32 lanes at once) - Intel Haswell+ / AMD Excavator+
max := simd.MaxInt8x32([]int8{5, 2, 8, 1, 9})
// 9

// Using AVX-512 variant (16 lanes at once) - Intel Skylake-X+
max := simd.MaxFloat32x16([]float32{3.5, 1.2, 4.8, 2.1})
// 4.8

// Using AVX variant (4 lanes at once) - Intel Sandy Bridge+ / AMD Bulldozer+
max := simd.MaxInt32x4([]int32{100, 50, 200, 75})
// 200

// Empty collection returns 0
max := simd.MaxUint16x8([]uint16{})
// 0

Prototypes:

func MaxInt8x16[T ~int8](collection []T) T
func MaxInt8x32[T ~int8](collection []T) T
func MaxInt8x64[T ~int8](collection []T) T
func MaxInt16x8[T ~int16](collection []T) T
func MaxInt16x16[T ~int16](collection []T) T
func MaxInt16x32[T ~int16](collection []T) T
func MaxInt32x4[T ~int32](collection []T) T
func MaxInt32x8[T ~int32](collection []T) T
func MaxInt32x16[T ~int32](collection []T) T
func MaxInt64x2[T ~int64](collection []T) T
func MaxInt64x4[T ~int64](collection []T) T
func MaxInt64x8[T ~int64](collection []T) T
func MaxUint8x16[T ~uint8](collection []T) T
func MaxUint8x32[T ~uint8](collection []T) T
func MaxUint8x64[T ~uint8](collection []T) T
func MaxUint16x8[T ~uint16](collection []T) T
func MaxUint16x16[T ~uint16](collection []T) T
func MaxUint16x32[T ~uint16](collection []T) T
func MaxUint32x4[T ~uint32](collection []T) T
func MaxUint32x8[T ~uint32](collection []T) T
func MaxUint32x16[T ~uint32](collection []T) T
func MaxUint64x2[T ~uint64](collection []T) T
func MaxUint64x4[T ~uint64](collection []T) T
func MaxUint64x8[T ~uint64](collection []T) T
func MaxFloat32x4[T ~float32](collection []T) T
func MaxFloat32x8[T ~float32](collection []T) T
func MaxFloat32x16[T ~float32](collection []T) T
func MaxFloat64x2[T ~float64](collection []T) T
func MaxFloat64x4[T ~float64](collection []T) T
func MaxFloat64x8[T ~float64](collection []T) T
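
The `~int8` style constraints in these prototypes mean that named types with a matching underlying type are accepted without conversion. The sketch below shows this with a hypothetical scalar `maxInt8` that has the same type parameter shape as MaxInt8x16. `Celsius` is an illustrative type.

```go
package main

import "fmt"

// Celsius is a named type whose underlying type is int8.
type Celsius int8

// maxInt8 accepts any element type with underlying type int8 and
// returns the zero value for empty input.
func maxInt8[T ~int8](collection []T) T {
	var best T
	for i, v := range collection {
		if i == 0 || v > best {
			best = v
		}
	}
	return best
}

func main() {
	temps := []Celsius{-5, 12, 7}
	hottest := maxInt8(temps) // T is inferred as Celsius
	fmt.Println(hottest)      // 12
}
```

The result keeps the named type, so `hottest` is a Celsius and not a bare int8.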

MeanBy

Transforms a collection using an iteratee function and calculates the arithmetic mean of the results using SIMD instructions.

Note: The automatic dispatch functions (e.g., MeanByInt8) select the best available SIMD variant for the current CPU. The specific variants (e.g., MeanByInt8x32) use a fixed SIMD instruction set regardless of CPU capabilities, so use them only if you know your target CPU supports that instruction set.

type Person struct {
    Name string
    Age  int8
}

people := []Person{
    {Name: "Alice", Age: 20},
    {Name: "Bob", Age: 30},
    {Name: "Charlie", Age: 40},
}

// Automatic dispatch - uses best available SIMD
mean := simd.MeanByInt8(people, func(p Person) int8 {
    return p.Age
})
// 30

type Product struct {
    Name  string
    Price float32
}

products := []Product{
    {Name: "Widget", Price: 10.50},
    {Name: "Gadget", Price: 20.00},
    {Name: "Tool", Price: 15.75},
}

// Mean price using specific AVX2 variant
mean := simd.MeanByFloat32x8(products, func(p Product) float32 {
    return p.Price
})
// 15.4167

type Metric struct {
    Value uint16
}

metrics := []Metric{
    {Value: 100},
    {Value: 200},
    {Value: 300},
    {Value: 400},
}

// Using AVX variant (8 lanes at once) - Intel Sandy Bridge+ / AMD Bulldozer+
mean := simd.MeanByUint16x8(metrics, func(m Metric) uint16 {
    return m.Value
})
// 250

// Empty collection returns 0
type Item struct {
    Count int64
}

mean := simd.MeanByInt64([]Item{}, func(i Item) int64 {
    return i.Count
})
// 0

Similar:

Mean, SumBy

Prototypes:

func MeanByInt8[T any, R ~int8](collection []T, iteratee func(item T) R) R
func MeanByInt16[T any, R ~int16](collection []T, iteratee func(item T) R) R
func MeanByInt32[T any, R ~int32](collection []T, iteratee func(item T) R) R
func MeanByInt64[T any, R ~int64](collection []T, iteratee func(item T) R) R
func MeanByUint8[T any, R ~uint8](collection []T, iteratee func(item T) R) R
func MeanByUint16[T any, R ~uint16](collection []T, iteratee func(item T) R) R
func MeanByUint32[T any, R ~uint32](collection []T, iteratee func(item T) R) R
func MeanByUint64[T any, R ~uint64](collection []T, iteratee func(item T) R) R
func MeanByFloat32[T any, R ~float32](collection []T, iteratee func(item T) R) R
func MeanByFloat64[T any, R ~float64](collection []T, iteratee func(item T) R) R
func MeanByInt8x16[T any, R ~int8](collection []T, iteratee func(item T) R) R
func MeanByInt8x32[T any, R ~int8](collection []T, iteratee func(item T) R) R
func MeanByInt8x64[T any, R ~int8](collection []T, iteratee func(item T) R) R
func MeanByInt16x8[T any, R ~int16](collection []T, iteratee func(item T) R) R
func MeanByInt16x16[T any, R ~int16](collection []T, iteratee func(item T) R) R
func MeanByInt16x32[T any, R ~int16](collection []T, iteratee func(item T) R) R
func MeanByInt32x4[T any, R ~int32](collection []T, iteratee func(item T) R) R
func MeanByInt32x8[T any, R ~int32](collection []T, iteratee func(item T) R) R
func MeanByInt32x16[T any, R ~int32](collection []T, iteratee func(item T) R) R
func MeanByInt64x2[T any, R ~int64](collection []T, iteratee func(item T) R) R
func MeanByInt64x4[T any, R ~int64](collection []T, iteratee func(item T) R) R
func MeanByInt64x8[T any, R ~int64](collection []T, iteratee func(item T) R) R
func MeanByUint8x16[T any, R ~uint8](collection []T, iteratee func(item T) R) R
func MeanByUint8x32[T any, R ~uint8](collection []T, iteratee func(item T) R) R
func MeanByUint8x64[T any, R ~uint8](collection []T, iteratee func(item T) R) R
func MeanByUint16x8[T any, R ~uint16](collection []T, iteratee func(item T) R) R
func MeanByUint16x16[T any, R ~uint16](collection []T, iteratee func(item T) R) R
func MeanByUint16x32[T any, R ~uint16](collection []T, iteratee func(item T) R) R
func MeanByUint32x4[T any, R ~uint32](collection []T, iteratee func(item T) R) R
func MeanByUint32x8[T any, R ~uint32](collection []T, iteratee func(item T) R) R
func MeanByUint32x16[T any, R ~uint32](collection []T, iteratee func(item T) R) R
func MeanByUint64x2[T any, R ~uint64](collection []T, iteratee func(item T) R) R
func MeanByUint64x4[T any, R ~uint64](collection []T, iteratee func(item T) R) R
func MeanByUint64x8[T any, R ~uint64](collection []T, iteratee func(item T) R) R
func MeanByFloat32x4[T any, R ~float32](collection []T, iteratee func(item T) R) R
func MeanByFloat32x8[T any, R ~float32](collection []T, iteratee func(item T) R) R
func MeanByFloat32x16[T any, R ~float32](collection []T, iteratee func(item T) R) R
func MeanByFloat64x2[T any, R ~float64](collection []T, iteratee func(item T) R) R
func MeanByFloat64x4[T any, R ~float64](collection []T, iteratee func(item T) R) R
func MeanByFloat64x8[T any, R ~float64](collection []T, iteratee func(item T) R) R
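
Automatic dispatch can be pictured as a preference-ordered check of CPU capabilities. The sketch below is a guess at the general pattern, not the library's actual logic. `cpuFeatures` and `pickInt8Variant` are hypothetical, and real code would fill the feature flags from a CPU detection package.

```go
package main

import "fmt"

// cpuFeatures is a hypothetical capability record.
type cpuFeatures struct {
	AVX, AVX2, AVX512 bool
}

// pickInt8Variant returns the name of the widest int8 variant the CPU can
// run, or "fallback" when no SIMD level is available.
func pickInt8Variant(f cpuFeatures) string {
	switch {
	case f.AVX512:
		return "MeanByInt8x64"
	case f.AVX2:
		return "MeanByInt8x32"
	case f.AVX:
		return "MeanByInt8x16"
	default:
		return "fallback"
	}
}

func main() {
	fmt.Println(pickInt8Variant(cpuFeatures{AVX: true, AVX2: true})) // MeanByInt8x32
	fmt.Println(pickInt8Variant(cpuFeatures{}))                      // fallback
}
```

Calling a fixed variant directly skips this check, which is why it is only safe on hardware known to support that instruction set.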

Clamp

Clamps each element in a collection between min and max values using SIMD instructions. The suffix (x2, x4, x8, x16, x32, x64) indicates the number of lanes processed simultaneously.

Note: Choose the variant matching your CPU's capabilities. Higher lane counts are typically faster on large inputs but require newer CPU support.

// Using AVX2 variant (32 lanes at once) - Intel Haswell+ / AMD Excavator+
result := simd.ClampInt8x32([]int8{1, 5, 10, 15, 20}, 5, 15)
// []int8{5, 5, 10, 15, 15}

// Using AVX-512 variant (16 lanes at once) - Intel Skylake-X+
result := simd.ClampFloat32x16([]float32{0.5, 1.5, 2.5, 3.5}, 1.0, 3.0)
// []float32{1.0, 1.5, 2.5, 3.0}

// Using AVX variant (8 lanes at once) - Intel Sandy Bridge+ / AMD Bulldozer+
result := simd.ClampInt16x8([]int16{100, 150, 200, 250}, 120, 220)
// []int16{120, 150, 200, 220}

// Empty collection returns empty collection
result := simd.ClampUint32x4([]uint32{}, 10, 100)
// []uint32{}

Prototypes:

func ClampInt8x16[T ~int8, Slice ~[]T](collection Slice, min, max T) Slice
func ClampInt8x32[T ~int8, Slice ~[]T](collection Slice, min, max T) Slice
func ClampInt8x64[T ~int8, Slice ~[]T](collection Slice, min, max T) Slice
func ClampInt16x8[T ~int16, Slice ~[]T](collection Slice, min, max T) Slice
func ClampInt16x16[T ~int16, Slice ~[]T](collection Slice, min, max T) Slice
func ClampInt16x32[T ~int16, Slice ~[]T](collection Slice, min, max T) Slice
func ClampInt32x4[T ~int32, Slice ~[]T](collection Slice, min, max T) Slice
func ClampInt32x8[T ~int32, Slice ~[]T](collection Slice, min, max T) Slice
func ClampInt32x16[T ~int32, Slice ~[]T](collection Slice, min, max T) Slice
func ClampInt64x2[T ~int64, Slice ~[]T](collection Slice, min, max T) Slice
func ClampInt64x4[T ~int64, Slice ~[]T](collection Slice, min, max T) Slice
func ClampInt64x8[T ~int64, Slice ~[]T](collection Slice, min, max T) Slice
func ClampUint8x16[T ~uint8, Slice ~[]T](collection Slice, min, max T) Slice
func ClampUint8x32[T ~uint8, Slice ~[]T](collection Slice, min, max T) Slice
func ClampUint8x64[T ~uint8, Slice ~[]T](collection Slice, min, max T) Slice
func ClampUint16x8[T ~uint16, Slice ~[]T](collection Slice, min, max T) Slice
func ClampUint16x16[T ~uint16, Slice ~[]T](collection Slice, min, max T) Slice
func ClampUint16x32[T ~uint16, Slice ~[]T](collection Slice, min, max T) Slice
func ClampUint32x4[T ~uint32, Slice ~[]T](collection Slice, min, max T) Slice
func ClampUint32x8[T ~uint32, Slice ~[]T](collection Slice, min, max T) Slice
func ClampUint32x16[T ~uint32, Slice ~[]T](collection Slice, min, max T) Slice
func ClampUint64x2[T ~uint64, Slice ~[]T](collection Slice, min, max T) Slice
func ClampUint64x4[T ~uint64, Slice ~[]T](collection Slice, min, max T) Slice
func ClampUint64x8[T ~uint64, Slice ~[]T](collection Slice, min, max T) Slice
func ClampFloat32x4[T ~float32, Slice ~[]T](collection Slice, min, max T) Slice
func ClampFloat32x8[T ~float32, Slice ~[]T](collection Slice, min, max T) Slice
func ClampFloat32x16[T ~float32, Slice ~[]T](collection Slice, min, max T) Slice
func ClampFloat64x2[T ~float64, Slice ~[]T](collection Slice, min, max T) Slice
func ClampFloat64x4[T ~float64, Slice ~[]T](collection Slice, min, max T) Slice
func ClampFloat64x8[T ~float64, Slice ~[]T](collection Slice, min, max T) Slice
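
The `Slice ~[]T` type parameter in these prototypes lets a named slice type pass through unchanged. The scalar sketch below mirrors that signature shape with a hypothetical `clamp`. It allocates a new slice and assumes lo <= hi. Whether the library allocates or reuses the input is not stated here.

```go
package main

import (
	"cmp"
	"fmt"
)

// Readings is a named slice type; the Slice type parameter preserves it.
type Readings []int16

// clamp returns a new slice with every element limited to [lo, hi].
// It assumes lo <= hi.
func clamp[T cmp.Ordered, Slice ~[]T](collection Slice, lo, hi T) Slice {
	out := make(Slice, len(collection))
	for i, v := range collection {
		out[i] = min(max(v, lo), hi)
	}
	return out
}

func main() {
	fmt.Println(clamp([]int8{1, 5, 10, 15, 20}, 5, 15)) // [5 5 10 15 15]

	var r Readings = clamp(Readings{100, 150, 200, 250}, 120, 220)
	fmt.Println(r) // [120 150 200 220]
}
```

The assignment to a `Readings` variable compiles without a conversion because the return type is the caller's slice type.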
