SIMD helpers

This page lists the slice operations available in the exp/simd sub-package. These helpers use AVX (128-bit), AVX2 (256-bit), or AVX-512 (512-bit) SIMD instructions when built with Go 1.26+ and the GOEXPERIMENT=simd flag on amd64.

Unstable API

SIMD helpers are experimental. The API may break in the future.

Performance

Benchmarks show that SIMD operators are slower than the non-SIMD fallback on small datasets:

BenchmarkSumInt8/small/Fallback-lo-4     203616572     5.875 ns/op
BenchmarkSumInt8/small/AVX-x16-4         100000000    12.04 ns/op
BenchmarkSumInt8/small/AVX2-x32-4         64041816    17.93 ns/op
BenchmarkSumInt8/small/AVX512-x64-4       26947528    44.75 ns/op

But they are much faster on large datasets, by roughly 15x (AVX) to 41x (AVX-512) in this run:

BenchmarkSumInt8/xlarge/Fallback-lo-4       247677     4860 ns/op
BenchmarkSumInt8/xlarge/AVX-x16-4          3851040      311.4 ns/op
BenchmarkSumInt8/xlarge/AVX2-x32-4         7100002      169.2 ns/op
BenchmarkSumInt8/xlarge/AVX512-x64-4      10107534      118.1 ns/op
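One way to act on these numbers is to route short slices to a plain loop and only hand longer ones to a vectorized routine. The sketch below is a minimal illustration under assumptions: smallCutoff is a hypothetical threshold (the benchmarks above do not state the "small" or "xlarge" sizes), and wide stands in for whichever SIMD helper you would call, for example one of the Sum variants listed on this page.

```go
package main

import "fmt"

// smallCutoff is a hypothetical threshold; measure on your own hardware.
const smallCutoff = 64

// sumScalar is a plain loop, standing in for the non-SIMD fallback.
func sumScalar(xs []int8) int8 {
	var s int8
	for _, x := range xs {
		s += x
	}
	return s
}

// sumAdaptive uses the scalar loop for short slices and defers to wide
// (any vectorized implementation) for everything else.
func sumAdaptive(xs []int8, wide func([]int8) int8) int8 {
	if len(xs) < smallCutoff {
		return sumScalar(xs)
	}
	return wide(xs)
}

func main() {
	calls := 0
	// wide is a placeholder that only counts how often it is chosen.
	wide := func(xs []int8) int8 { calls++; return sumScalar(xs) }

	fmt.Println(sumAdaptive([]int8{1, 2, 3}, wide), calls)
	fmt.Println(sumAdaptive(make([]int8, 1024), wide), calls)
}
```

The right cutoff depends on the element type, the instruction set, and the CPU, so it is best derived from a benchmark on the target machine.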
  • Checks if a target value is present in a collection using SIMD instructions. The suffix (x2, x4, x8, x16, x32, x64) indicates the number of lanes processed simultaneously.

    Note: Choose the variant matching your CPU's capabilities. Higher lane counts provide better performance but require newer CPU support.

    // Using AVX2 variant (32 lanes at once) - Intel Haswell+ / AMD Excavator+
    found := simd.ContainsInt8x32([]int8{1, 2, 3, 4, 5}, 3)
    // true
    // Using AVX variant (2 lanes at once) - works on all amd64
    found := simd.ContainsInt64x2([]int64{1000000, 2000000, 3000000}, 2000000)
    // true
    // Using AVX-512 variant (64 lanes at once) - Intel Skylake-X+
    found := simd.ContainsUint8x64([]uint8{10, 20, 30, 40, 50}, 30)
    // true
    // Float32 with AVX2 (8 lanes at once)
    found := simd.ContainsFloat32x8([]float32{1.1, 2.2, 3.3, 4.4}, 3.3)
    // true
    // Empty collection returns false
    found := simd.ContainsInt16x16([]int16{}, 5)
    // false
    Prototypes:
    func ContainsInt8x16[T ~int8](collection []T, target T) bool
    func ContainsInt8x32[T ~int8](collection []T, target T) bool
    func ContainsInt8x64[T ~int8](collection []T, target T) bool
    func ContainsInt16x8[T ~int16](collection []T, target T) bool
    func ContainsInt16x16[T ~int16](collection []T, target T) bool
    func ContainsInt16x32[T ~int16](collection []T, target T) bool
    func ContainsInt32x4[T ~int32](collection []T, target T) bool
    func ContainsInt32x8[T ~int32](collection []T, target T) bool
    func ContainsInt32x16[T ~int32](collection []T, target T) bool
    func ContainsInt64x2[T ~int64](collection []T, target T) bool
    func ContainsInt64x4[T ~int64](collection []T, target T) bool
    func ContainsInt64x8[T ~int64](collection []T, target T) bool
    func ContainsUint8x16[T ~uint8](collection []T, target T) bool
    func ContainsUint8x32[T ~uint8](collection []T, target T) bool
    func ContainsUint8x64[T ~uint8](collection []T, target T) bool
    func ContainsUint16x8[T ~uint16](collection []T, target T) bool
    func ContainsUint16x16[T ~uint16](collection []T, target T) bool
    func ContainsUint16x32[T ~uint16](collection []T, target T) bool
    func ContainsUint32x4[T ~uint32](collection []T, target T) bool
    func ContainsUint32x8[T ~uint32](collection []T, target T) bool
    func ContainsUint32x16[T ~uint32](collection []T, target T) bool
    func ContainsUint64x2[T ~uint64](collection []T, target T) bool
    func ContainsUint64x4[T ~uint64](collection []T, target T) bool
    func ContainsUint64x8[T ~uint64](collection []T, target T) bool
    func ContainsFloat32x4[T ~float32](collection []T, target T) bool
    func ContainsFloat32x8[T ~float32](collection []T, target T) bool
    func ContainsFloat32x16[T ~float32](collection []T, target T) bool
    func ContainsFloat64x2[T ~float64](collection []T, target T) bool
    func ContainsFloat64x4[T ~float64](collection []T, target T) bool
    func ContainsFloat64x8[T ~float64](collection []T, target T) bool
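To make the lane suffix concrete, here is a scalar model of a lane-wise search. It is only a sketch of the idea, not the package's implementation: containsChunked and its lanes parameter are hypothetical names, and a real SIMD version would compare a whole chunk with a single instruction instead of the inner loop.

```go
package main

import "fmt"

// containsChunked is a scalar model of a lane-wise search: it inspects
// `lanes` elements per step, then finishes the leftover tail one by one.
func containsChunked[T comparable](collection []T, target T, lanes int) bool {
	if lanes < 1 {
		lanes = 1
	}
	i := 0
	for ; i+lanes <= len(collection); i += lanes {
		hit := false
		for _, v := range collection[i : i+lanes] { // one "vector" worth of elements
			hit = hit || v == target
		}
		if hit {
			return true
		}
	}
	for _, v := range collection[i:] { // tail shorter than one vector
		if v == target {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(containsChunked([]int8{1, 2, 3, 4, 5}, 3, 32))
	fmt.Println(containsChunked([]int64{1000000, 2000000, 3000000}, 2000000, 2))
	fmt.Println(containsChunked([]int16{}, 5, 16))
}
```

With 5 elements and 32 lanes the chunk loop never runs and the tail loop does all the work, which is consistent with the small-dataset numbers in the Performance section.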
  • Sums the values in a collection using SIMD instructions. The suffix (x2, x4, x8, x16, x32, x64) indicates the number of lanes processed simultaneously.

    Note: Choose the variant matching your CPU's capabilities. Higher lane counts provide better performance but require newer CPU support.

    // Using AVX2 variant (32 lanes at once) - Intel Haswell+ / AMD Excavator+
    sum := simd.SumInt8x32([]int8{1, 2, 3, 4, 5})
    // 15
    // Using AVX-512 variant (16 lanes at once) - Intel Skylake-X+
    sum := simd.SumFloat32x16([]float32{1.1, 2.2, 3.3, 4.4})
    // 11
    // Using AVX variant (4 lanes at once) - works on all amd64
    sum := simd.SumInt32x4([]int32{1000000, 2000000, 3000000})
    // 6000000
    // Empty collection returns 0
    sum := simd.SumUint16x16([]uint16{})
    // 0
    Prototypes:
    func SumInt8x16[T ~int8](collection []T) T
    func SumInt8x32[T ~int8](collection []T) T
    func SumInt8x64[T ~int8](collection []T) T
    func SumInt16x8[T ~int16](collection []T) T
    func SumInt16x16[T ~int16](collection []T) T
    func SumInt16x32[T ~int16](collection []T) T
    func SumInt32x4[T ~int32](collection []T) T
    func SumInt32x8[T ~int32](collection []T) T
    func SumInt32x16[T ~int32](collection []T) T
    func SumInt64x2[T ~int64](collection []T) T
    func SumInt64x4[T ~int64](collection []T) T
    func SumInt64x8[T ~int64](collection []T) T
    func SumUint8x16[T ~uint8](collection []T) T
    func SumUint8x32[T ~uint8](collection []T) T
    func SumUint8x64[T ~uint8](collection []T) T
    func SumUint16x8[T ~uint16](collection []T) T
    func SumUint16x16[T ~uint16](collection []T) T
    func SumUint16x32[T ~uint16](collection []T) T
    func SumUint32x4[T ~uint32](collection []T) T
    func SumUint32x8[T ~uint32](collection []T) T
    func SumUint32x16[T ~uint32](collection []T) T
    func SumUint64x2[T ~uint64](collection []T) T
    func SumUint64x4[T ~uint64](collection []T) T
    func SumUint64x8[T ~uint64](collection []T) T
    func SumFloat32x4[T ~float32](collection []T) T
    func SumFloat32x8[T ~float32](collection []T) T
    func SumFloat32x16[T ~float32](collection []T) T
    func SumFloat64x2[T ~float64](collection []T) T
    func SumFloat64x4[T ~float64](collection []T) T
    func SumFloat64x8[T ~float64](collection []T) T
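The Sum prototypes return T, so a total outside T's range cannot be represented in the result. The scalar sketch below shows what that means if the helpers behave like ordinary Go integer arithmetic, which is an assumption here, not something this page states. sumScalar and sumWide are hypothetical helper names.

```go
package main

import "fmt"

// sumScalar adds in the element type T, as ordinary Go arithmetic does.
func sumScalar[T ~int8 | ~int16 | ~int32 | ~int64](collection []T) T {
	var total T
	for _, v := range collection {
		total += v
	}
	return total
}

// sumWide widens each element to int64 before adding.
func sumWide[T ~int8 | ~int16 | ~int32 | ~int64](collection []T) int64 {
	var total int64
	for _, v := range collection {
		total += int64(v)
	}
	return total
}

func main() {
	xs := []int8{100, 100}
	fmt.Println(sumScalar(xs)) // -56: 200 does not fit in int8 and wraps around
	fmt.Println(sumWide(xs))   // 200
}
```

If the expected total may exceed the element type, converting the data to a wider type before summing avoids the problem.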
  • Calculates the arithmetic mean of a collection using SIMD instructions. The suffix (x2, x4, x8, x16, x32, x64) indicates the number of lanes processed simultaneously.

    Note: Choose the variant matching your CPU's capabilities. Higher lane counts provide better performance but require newer CPU support.

    // Using AVX2 variant (32 lanes at once) - Intel Haswell+ / AMD Excavator+
    mean := simd.MeanInt8x32([]int8{1, 2, 3, 4, 5})
    // 3
    // Using AVX-512 variant (16 lanes at once) - Intel Skylake-X+
    mean := simd.MeanFloat32x16([]float32{1.0, 2.0, 3.0, 4.0})
    // 2.5
    // Using AVX variant (8 lanes at once) - works on all amd64
    mean := simd.MeanInt16x8([]int16{10, 20, 30, 40})
    // 25
    // Empty collection returns 0
    mean := simd.MeanUint32x4([]uint32{})
    // 0
    Prototypes:
    func MeanInt8x16[T ~int8](collection []T) T
    func MeanInt8x32[T ~int8](collection []T) T
    func MeanInt8x64[T ~int8](collection []T) T
    func MeanInt16x8[T ~int16](collection []T) T
    func MeanInt16x16[T ~int16](collection []T) T
    func MeanInt16x32[T ~int16](collection []T) T
    func MeanInt32x4[T ~int32](collection []T) T
    func MeanInt32x8[T ~int32](collection []T) T
    func MeanInt32x16[T ~int32](collection []T) T
    func MeanInt64x2[T ~int64](collection []T) T
    func MeanInt64x4[T ~int64](collection []T) T
    func MeanInt64x8[T ~int64](collection []T) T
    func MeanUint8x16[T ~uint8](collection []T) T
    func MeanUint8x32[T ~uint8](collection []T) T
    func MeanUint8x64[T ~uint8](collection []T) T
    func MeanUint16x8[T ~uint16](collection []T) T
    func MeanUint16x16[T ~uint16](collection []T) T
    func MeanUint16x32[T ~uint16](collection []T) T
    func MeanUint32x4[T ~uint32](collection []T) T
    func MeanUint32x8[T ~uint32](collection []T) T
    func MeanUint32x16[T ~uint32](collection []T) T
    func MeanUint64x2[T ~uint64](collection []T) T
    func MeanUint64x4[T ~uint64](collection []T) T
    func MeanUint64x8[T ~uint64](collection []T) T
    func MeanFloat32x4[T ~float32](collection []T) T
    func MeanFloat32x8[T ~float32](collection []T) T
    func MeanFloat32x16[T ~float32](collection []T) T
    func MeanFloat64x2[T ~float64](collection []T) T
    func MeanFloat64x4[T ~float64](collection []T) T
    func MeanFloat64x8[T ~float64](collection []T) T
  • Finds the minimum value in a collection using SIMD instructions. The suffix (x2, x4, x8, x16, x32, x64) indicates the number of lanes processed simultaneously.

    Note: Choose the variant matching your CPU's capabilities. Higher lane counts provide better performance but require newer CPU support.

    // Using AVX2 variant (32 lanes at once) - Intel Haswell+ / AMD Excavator+
    min := simd.MinInt8x32([]int8{5, 2, 8, 1, 9})
    // 1
    // Using AVX-512 variant (16 lanes at once) - Intel Skylake-X+
    min := simd.MinFloat32x16([]float32{3.5, 1.2, 4.8, 2.1})
    // 1.2
    // Using AVX variant (4 lanes at once) - works on all amd64
    min := simd.MinInt32x4([]int32{100, 50, 200, 75})
    // 50
    // Empty collection returns 0
    min := simd.MinUint16x8([]uint16{})
    // 0
    Prototypes:
    func MinInt8x16[T ~int8](collection []T) T
    func MinInt8x32[T ~int8](collection []T) T
    func MinInt8x64[T ~int8](collection []T) T
    func MinInt16x8[T ~int16](collection []T) T
    func MinInt16x16[T ~int16](collection []T) T
    func MinInt16x32[T ~int16](collection []T) T
    func MinInt32x4[T ~int32](collection []T) T
    func MinInt32x8[T ~int32](collection []T) T
    func MinInt32x16[T ~int32](collection []T) T
    func MinInt64x2[T ~int64](collection []T) T
    func MinInt64x4[T ~int64](collection []T) T
    func MinInt64x8[T ~int64](collection []T) T
    func MinUint8x16[T ~uint8](collection []T) T
    func MinUint8x32[T ~uint8](collection []T) T
    func MinUint8x64[T ~uint8](collection []T) T
    func MinUint16x8[T ~uint16](collection []T) T
    func MinUint16x16[T ~uint16](collection []T) T
    func MinUint16x32[T ~uint16](collection []T) T
    func MinUint32x4[T ~uint32](collection []T) T
    func MinUint32x8[T ~uint32](collection []T) T
    func MinUint32x16[T ~uint32](collection []T) T
    func MinUint64x2[T ~uint64](collection []T) T
    func MinUint64x4[T ~uint64](collection []T) T
    func MinUint64x8[T ~uint64](collection []T) T
    func MinFloat32x4[T ~float32](collection []T) T
    func MinFloat32x8[T ~float32](collection []T) T
    func MinFloat32x16[T ~float32](collection []T) T
    func MinFloat64x2[T ~float64](collection []T) T
    func MinFloat64x4[T ~float64](collection []T) T
    func MinFloat64x8[T ~float64](collection []T) T
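Since an empty collection returns 0, a result of 0 is ambiguous when 0 is also a valid minimum. One possible wrapper reports emptiness explicitly. In this sketch minOK is a hypothetical name, and the standard library's slices.Min stands in for whichever SIMD Min variant you would call.

```go
package main

import (
	"cmp"
	"fmt"
	"slices"
)

// minOK returns the minimum and true, or the zero value and false
// when the collection is empty.
func minOK[T cmp.Ordered](collection []T) (T, bool) {
	if len(collection) == 0 {
		var zero T
		return zero, false
	}
	// slices.Min is a stand-in here; a SIMD Min variant could be used instead.
	return slices.Min(collection), true
}

func main() {
	fmt.Println(minOK([]int32{100, 50, 200, 75}))
	fmt.Println(minOK([]uint16{}))
	fmt.Println(minOK([]int8{0, 3}))
}
```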
  • SumBy transforms a collection using an iteratee function and sums the result using SIMD instructions. The automatic dispatch functions (e.g., SumByInt8) will select the best SIMD variant based on CPU capabilities. The specific variants (e.g., SumByInt8x32) use a fixed SIMD instruction set regardless of CPU capabilities.

    Note: The automatic dispatch functions (e.g., SumByInt8) will use the best available SIMD variant for the current CPU. Use specific variants (e.g., SumByInt8x32) only if you know your target CPU supports that instruction set.

    type Person struct {
        Name string
        Age  int8
    }

    people := []Person{
        {Name: "Alice", Age: 25},
        {Name: "Bob", Age: 30},
        {Name: "Charlie", Age: 35},
    }

    // Automatic dispatch - uses best available SIMD
    sum := simd.SumByInt8(people, func(p Person) int8 {
        return p.Age
    })
    // 90
    type Product struct {
        Name  string
        Price float32
        Stock int32
    }

    products := []Product{
        {Name: "Widget", Price: 10.50, Stock: 5},
        {Name: "Gadget", Price: 20.00, Stock: 3},
        {Name: "Tool", Price: 15.75, Stock: 2},
    }

    // Sum stock value using specific AVX2 variant
    sum := simd.SumByFloat32x8(products, func(p Product) float32 {
        return p.Price * float32(p.Stock)
    })
    // 144
    type Metric struct {
        Value uint16
    }

    metrics := []Metric{
        {Value: 100},
        {Value: 200},
        {Value: 300},
        {Value: 400},
    }

    // Using AVX variant - works on all amd64
    sum := simd.SumByUint16x8(metrics, func(m Metric) uint16 {
        return m.Value
    })
    // 1000
    // Empty collection returns 0
    type Item struct {
        Count int64
    }

    sum := simd.SumByInt64([]Item{}, func(i Item) int64 {
        return i.Count
    })
    // 0
    Prototypes:
    func SumByInt8[T any, R ~int8](collection []T, iteratee func(item T) R) R
    func SumByInt16[T any, R ~int16](collection []T, iteratee func(item T) R) R
    func SumByInt32[T any, R ~int32](collection []T, iteratee func(item T) R) R
    func SumByInt64[T any, R ~int64](collection []T, iteratee func(item T) R) R
    func SumByUint8[T any, R ~uint8](collection []T, iteratee func(item T) R) R
    func SumByUint16[T any, R ~uint16](collection []T, iteratee func(item T) R) R
    func SumByUint32[T any, R ~uint32](collection []T, iteratee func(item T) R) R
    func SumByUint64[T any, R ~uint64](collection []T, iteratee func(item T) R) R
    func SumByFloat32[T any, R ~float32](collection []T, iteratee func(item T) R) R
    func SumByFloat64[T any, R ~float64](collection []T, iteratee func(item T) R) R
    func SumByInt8x16[T any, R ~int8](collection []T, iteratee func(item T) R) R
    func SumByInt8x32[T any, R ~int8](collection []T, iteratee func(item T) R) R
    func SumByInt8x64[T any, R ~int8](collection []T, iteratee func(item T) R) R
    func SumByInt16x8[T any, R ~int16](collection []T, iteratee func(item T) R) R
    func SumByInt16x16[T any, R ~int16](collection []T, iteratee func(item T) R) R
    func SumByInt16x32[T any, R ~int16](collection []T, iteratee func(item T) R) R
    func SumByInt32x4[T any, R ~int32](collection []T, iteratee func(item T) R) R
    func SumByInt32x8[T any, R ~int32](collection []T, iteratee func(item T) R) R
    func SumByInt32x16[T any, R ~int32](collection []T, iteratee func(item T) R) R
    func SumByInt64x2[T any, R ~int64](collection []T, iteratee func(item T) R) R
    func SumByInt64x4[T any, R ~int64](collection []T, iteratee func(item T) R) R
    func SumByInt64x8[T any, R ~int64](collection []T, iteratee func(item T) R) R
    func SumByUint8x16[T any, R ~uint8](collection []T, iteratee func(item T) R) R
    func SumByUint8x32[T any, R ~uint8](collection []T, iteratee func(item T) R) R
    func SumByUint8x64[T any, R ~uint8](collection []T, iteratee func(item T) R) R
    func SumByUint16x8[T any, R ~uint16](collection []T, iteratee func(item T) R) R
    func SumByUint16x16[T any, R ~uint16](collection []T, iteratee func(item T) R) R
    func SumByUint16x32[T any, R ~uint16](collection []T, iteratee func(item T) R) R
    func SumByUint32x4[T any, R ~uint32](collection []T, iteratee func(item T) R) R
    func SumByUint32x8[T any, R ~uint32](collection []T, iteratee func(item T) R) R
    func SumByUint32x16[T any, R ~uint32](collection []T, iteratee func(item T) R) R
    func SumByUint64x2[T any, R ~uint64](collection []T, iteratee func(item T) R) R
    func SumByUint64x4[T any, R ~uint64](collection []T, iteratee func(item T) R) R
    func SumByUint64x8[T any, R ~uint64](collection []T, iteratee func(item T) R) R
    func SumByFloat32x4[T any, R ~float32](collection []T, iteratee func(item T) R) R
    func SumByFloat32x8[T any, R ~float32](collection []T, iteratee func(item T) R) R
    func SumByFloat32x16[T any, R ~float32](collection []T, iteratee func(item T) R) R
    func SumByFloat64x2[T any, R ~float64](collection []T, iteratee func(item T) R) R
    func SumByFloat64x4[T any, R ~float64](collection []T, iteratee func(item T) R) R
    func SumByFloat64x8[T any, R ~float64](collection []T, iteratee func(item T) R) R
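Conceptually, SumBy is a projection followed by a sum. The scalar model below makes the two steps explicit. It is a sketch under assumptions: sumBy is a hypothetical name, and staging the projected values in a contiguous buffer is one plausible strategy, not a description of how the package works internally.

```go
package main

import "fmt"

type Product struct {
	Name  string
	Price float32
	Stock int32
}

// sumBy projects every item with iteratee, then adds up the results.
func sumBy[T any, R ~float32 | ~int32](collection []T, iteratee func(item T) R) R {
	// Step 1: project into a contiguous buffer a vector loop could consume.
	projected := make([]R, len(collection))
	for i, item := range collection {
		projected[i] = iteratee(item)
	}
	// Step 2: sum the buffer.
	var total R
	for _, v := range projected {
		total += v
	}
	return total
}

func main() {
	products := []Product{
		{Name: "Widget", Price: 10.50, Stock: 5}, // 52.5
		{Name: "Gadget", Price: 20.00, Stock: 3}, // 60
		{Name: "Tool", Price: 15.75, Stock: 2},   // 31.5
	}
	total := sumBy(products, func(p Product) float32 {
		return p.Price * float32(p.Stock)
	})
	fmt.Println(total) // 144
}
```

The iteratee runs once per element in ordinary Go code, so any speedup can only come from the summation step.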
  • Finds the maximum value in a collection using SIMD instructions. The suffix (x2, x4, x8, x16, x32, x64) indicates the number of lanes processed simultaneously.

    Note: Choose the variant matching your CPU's capabilities. Higher lane counts provide better performance but require newer CPU support.

    // Using AVX2 variant (32 lanes at once) - Intel Haswell+ / AMD Excavator+
    max := simd.MaxInt8x32([]int8{5, 2, 8, 1, 9})
    // 9
    // Using AVX-512 variant (16 lanes at once) - Intel Skylake-X+
    max := simd.MaxFloat32x16([]float32{3.5, 1.2, 4.8, 2.1})
    // 4.8
    // Using AVX variant (4 lanes at once) - works on all amd64
    max := simd.MaxInt32x4([]int32{100, 50, 200, 75})
    // 200
    // Empty collection returns 0
    max := simd.MaxUint16x8([]uint16{})
    // 0
    Prototypes:
    func MaxInt8x16[T ~int8](collection []T) T
    func MaxInt8x32[T ~int8](collection []T) T
    func MaxInt8x64[T ~int8](collection []T) T
    func MaxInt16x8[T ~int16](collection []T) T
    func MaxInt16x16[T ~int16](collection []T) T
    func MaxInt16x32[T ~int16](collection []T) T
    func MaxInt32x4[T ~int32](collection []T) T
    func MaxInt32x8[T ~int32](collection []T) T
    func MaxInt32x16[T ~int32](collection []T) T
    func MaxInt64x2[T ~int64](collection []T) T
    func MaxInt64x4[T ~int64](collection []T) T
    func MaxInt64x8[T ~int64](collection []T) T
    func MaxUint8x16[T ~uint8](collection []T) T
    func MaxUint8x32[T ~uint8](collection []T) T
    func MaxUint8x64[T ~uint8](collection []T) T
    func MaxUint16x8[T ~uint16](collection []T) T
    func MaxUint16x16[T ~uint16](collection []T) T
    func MaxUint16x32[T ~uint16](collection []T) T
    func MaxUint32x4[T ~uint32](collection []T) T
    func MaxUint32x8[T ~uint32](collection []T) T
    func MaxUint32x16[T ~uint32](collection []T) T
    func MaxUint64x2[T ~uint64](collection []T) T
    func MaxUint64x4[T ~uint64](collection []T) T
    func MaxUint64x8[T ~uint64](collection []T) T
    func MaxFloat32x4[T ~float32](collection []T) T
    func MaxFloat32x8[T ~float32](collection []T) T
    func MaxFloat32x16[T ~float32](collection []T) T
    func MaxFloat64x2[T ~float64](collection []T) T
    func MaxFloat64x4[T ~float64](collection []T) T
    func MaxFloat64x8[T ~float64](collection []T) T
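The ~ in constraints such as T ~int32 means that named types with that underlying type are accepted and that the result keeps the named type. The scalar sketch below illustrates this with a hypothetical Celsius type and a hypothetical maxScalar function that mirrors the shape of the prototypes above, including the 0 result for an empty slice.

```go
package main

import "fmt"

// Celsius is a named type whose underlying type is int32.
type Celsius int32

// maxScalar accepts any slice whose element type has underlying type int32.
func maxScalar[T ~int32](collection []T) T {
	var best T // stays 0 for an empty collection
	for i, v := range collection {
		if i == 0 || v > best {
			best = v
		}
	}
	return best
}

func main() {
	readings := []Celsius{-12, -3, -40}
	hottest := maxScalar(readings) // hottest has type Celsius, not int32
	fmt.Println(hottest)           // -3
	fmt.Println(maxScalar([]Celsius{}))
}
```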
  • MeanBy transforms a collection using an iteratee function and calculates the arithmetic mean of the result using SIMD instructions. The automatic dispatch functions (e.g., MeanByInt8) will select the best SIMD variant based on CPU capabilities. The specific variants (e.g., MeanByInt8x32) use a fixed SIMD instruction set regardless of CPU capabilities.

    Note: The automatic dispatch functions (e.g., MeanByInt8) will use the best available SIMD variant for the current CPU. Use specific variants (e.g., MeanByInt8x32) only if you know your target CPU supports that instruction set.

    type Person struct {
        Name string
        Age  int8
    }

    people := []Person{
        {Name: "Alice", Age: 20},
        {Name: "Bob", Age: 30},
        {Name: "Charlie", Age: 40},
    }

    // Automatic dispatch - uses best available SIMD
    mean := simd.MeanByInt8(people, func(p Person) int8 {
        return p.Age
    })
    // 30
    type Product struct {
        Name  string
        Price float32
    }

    products := []Product{
        {Name: "Widget", Price: 10.50},
        {Name: "Gadget", Price: 20.00},
        {Name: "Tool", Price: 15.75},
    }

    // Mean price using specific AVX2 variant
    mean := simd.MeanByFloat32x8(products, func(p Product) float32 {
        return p.Price
    })
    // 15.4167
    type Metric struct {
        Value uint16
    }

    metrics := []Metric{
        {Value: 100},
        {Value: 200},
        {Value: 300},
        {Value: 400},
    }

    // Using AVX variant - works on all amd64
    mean := simd.MeanByUint16x8(metrics, func(m Metric) uint16 {
        return m.Value
    })
    // 250
    // Empty collection returns 0
    type Item struct {
        Count int64
    }

    mean := simd.MeanByInt64([]Item{}, func(i Item) int64 {
        return i.Count
    })
    // 0
    Prototypes:
    func MeanByInt8[T any, R ~int8](collection []T, iteratee func(item T) R) R
    func MeanByInt16[T any, R ~int16](collection []T, iteratee func(item T) R) R
    func MeanByInt32[T any, R ~int32](collection []T, iteratee func(item T) R) R
    func MeanByInt64[T any, R ~int64](collection []T, iteratee func(item T) R) R
    func MeanByUint8[T any, R ~uint8](collection []T, iteratee func(item T) R) R
    func MeanByUint16[T any, R ~uint16](collection []T, iteratee func(item T) R) R
    func MeanByUint32[T any, R ~uint32](collection []T, iteratee func(item T) R) R
    func MeanByUint64[T any, R ~uint64](collection []T, iteratee func(item T) R) R
    func MeanByFloat32[T any, R ~float32](collection []T, iteratee func(item T) R) R
    func MeanByFloat64[T any, R ~float64](collection []T, iteratee func(item T) R) R
    func MeanByInt8x16[T any, R ~int8](collection []T, iteratee func(item T) R) R
    func MeanByInt8x32[T any, R ~int8](collection []T, iteratee func(item T) R) R
    func MeanByInt8x64[T any, R ~int8](collection []T, iteratee func(item T) R) R
    func MeanByInt16x8[T any, R ~int16](collection []T, iteratee func(item T) R) R
    func MeanByInt16x16[T any, R ~int16](collection []T, iteratee func(item T) R) R
    func MeanByInt16x32[T any, R ~int16](collection []T, iteratee func(item T) R) R
    func MeanByInt32x4[T any, R ~int32](collection []T, iteratee func(item T) R) R
    func MeanByInt32x8[T any, R ~int32](collection []T, iteratee func(item T) R) R
    func MeanByInt32x16[T any, R ~int32](collection []T, iteratee func(item T) R) R
    func MeanByInt64x2[T any, R ~int64](collection []T, iteratee func(item T) R) R
    func MeanByInt64x4[T any, R ~int64](collection []T, iteratee func(item T) R) R
    func MeanByInt64x8[T any, R ~int64](collection []T, iteratee func(item T) R) R
    func MeanByUint8x16[T any, R ~uint8](collection []T, iteratee func(item T) R) R
    func MeanByUint8x32[T any, R ~uint8](collection []T, iteratee func(item T) R) R
    func MeanByUint8x64[T any, R ~uint8](collection []T, iteratee func(item T) R) R
    func MeanByUint16x8[T any, R ~uint16](collection []T, iteratee func(item T) R) R
    func MeanByUint16x16[T any, R ~uint16](collection []T, iteratee func(item T) R) R
    func MeanByUint16x32[T any, R ~uint16](collection []T, iteratee func(item T) R) R
    func MeanByUint32x4[T any, R ~uint32](collection []T, iteratee func(item T) R) R
    func MeanByUint32x8[T any, R ~uint32](collection []T, iteratee func(item T) R) R
    func MeanByUint32x16[T any, R ~uint32](collection []T, iteratee func(item T) R) R
    func MeanByUint64x2[T any, R ~uint64](collection []T, iteratee func(item T) R) R
    func MeanByUint64x4[T any, R ~uint64](collection []T, iteratee func(item T) R) R
    func MeanByUint64x8[T any, R ~uint64](collection []T, iteratee func(item T) R) R
    func MeanByFloat32x4[T any, R ~float32](collection []T, iteratee func(item T) R) R
    func MeanByFloat32x8[T any, R ~float32](collection []T, iteratee func(item T) R) R
    func MeanByFloat32x16[T any, R ~float32](collection []T, iteratee func(item T) R) R
    func MeanByFloat64x2[T any, R ~float64](collection []T, iteratee func(item T) R) R
    func MeanByFloat64x4[T any, R ~float64](collection []T, iteratee func(item T) R) R
    func MeanByFloat64x8[T any, R ~float64](collection []T, iteratee func(item T) R) R
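The iteratee's return type R decides the precision of the mean. The scalar model below contrasts projecting an integer field as int8 with projecting it as float64. It assumes the mean is computed as sum divided by length in R, which truncates for integer types. meanBy is a hypothetical name, not the package's function.

```go
package main

import "fmt"

type Person struct {
	Name string
	Age  int8
}

// meanBy projects each item, sums in R, and divides by the length in R.
func meanBy[T any, R ~int8 | ~float64](collection []T, iteratee func(item T) R) R {
	if len(collection) == 0 {
		return 0
	}
	var total R
	for _, item := range collection {
		total += iteratee(item)
	}
	return total / R(len(collection))
}

func main() {
	people := []Person{
		{Name: "Alice", Age: 25},
		{Name: "Bob", Age: 30},
		{Name: "Charlie", Age: 40},
	}
	asInt := meanBy(people, func(p Person) int8 { return p.Age })
	asFloat := meanBy(people, func(p Person) float64 { return float64(p.Age) })
	fmt.Println(asInt)   // 31: 95 / 3 truncated
	fmt.Println(asFloat) // about 31.67
}
```

Projecting to a float type costs a conversion per element but keeps the fractional part and avoids narrow integer overflow in the running sum.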
  • Clamps each element in a collection between min and max values using SIMD instructions. The suffix (x2, x4, x8, x16, x32, x64) indicates the number of lanes processed simultaneously.

    Note: Choose the variant matching your CPU's capabilities. Higher lane counts provide better performance but require newer CPU support.

    // Using AVX2 variant (32 lanes at once) - Intel Haswell+ / AMD Excavator+
    result := simd.ClampInt8x32([]int8{1, 5, 10, 15, 20}, 5, 15)
    // []int8{5, 5, 10, 15, 15}
    // Using AVX-512 variant (16 lanes at once) - Intel Skylake-X+
    result := simd.ClampFloat32x16([]float32{0.5, 1.5, 2.5, 3.5}, 1.0, 3.0)
    // []float32{1.0, 1.5, 2.5, 3.0}
    // Using AVX variant (8 lanes at once) - works on all amd64
    result := simd.ClampInt16x8([]int16{100, 150, 200, 250}, 120, 220)
    // []int16{120, 150, 200, 220}
    // Empty collection returns empty collection
    result := simd.ClampUint32x4([]uint32{}, 10, 100)
    // []uint32{}
    Prototypes:
    func ClampInt8x16[T ~int8, Slice ~[]T](collection Slice, min, max T) Slice
    func ClampInt8x32[T ~int8, Slice ~[]T](collection Slice, min, max T) Slice
    func ClampInt8x64[T ~int8, Slice ~[]T](collection Slice, min, max T) Slice
    func ClampInt16x8[T ~int16, Slice ~[]T](collection Slice, min, max T) Slice
    func ClampInt16x16[T ~int16, Slice ~[]T](collection Slice, min, max T) Slice
    func ClampInt16x32[T ~int16, Slice ~[]T](collection Slice, min, max T) Slice
    func ClampInt32x4[T ~int32, Slice ~[]T](collection Slice, min, max T) Slice
    func ClampInt32x8[T ~int32, Slice ~[]T](collection Slice, min, max T) Slice
    func ClampInt32x16[T ~int32, Slice ~[]T](collection Slice, min, max T) Slice
    func ClampInt64x2[T ~int64, Slice ~[]T](collection Slice, min, max T) Slice
    func ClampInt64x4[T ~int64, Slice ~[]T](collection Slice, min, max T) Slice
    func ClampInt64x8[T ~int64, Slice ~[]T](collection Slice, min, max T) Slice
    func ClampUint8x16[T ~uint8, Slice ~[]T](collection Slice, min, max T) Slice
    func ClampUint8x32[T ~uint8, Slice ~[]T](collection Slice, min, max T) Slice
    func ClampUint8x64[T ~uint8, Slice ~[]T](collection Slice, min, max T) Slice
    func ClampUint16x8[T ~uint16, Slice ~[]T](collection Slice, min, max T) Slice
    func ClampUint16x16[T ~uint16, Slice ~[]T](collection Slice, min, max T) Slice
    func ClampUint16x32[T ~uint16, Slice ~[]T](collection Slice, min, max T) Slice
    func ClampUint32x4[T ~uint32, Slice ~[]T](collection Slice, min, max T) Slice
    func ClampUint32x8[T ~uint32, Slice ~[]T](collection Slice, min, max T) Slice
    func ClampUint32x16[T ~uint32, Slice ~[]T](collection Slice, min, max T) Slice
    func ClampUint64x2[T ~uint64, Slice ~[]T](collection Slice, min, max T) Slice
    func ClampUint64x4[T ~uint64, Slice ~[]T](collection Slice, min, max T) Slice
    func ClampUint64x8[T ~uint64, Slice ~[]T](collection Slice, min, max T) Slice
    func ClampFloat32x4[T ~float32, Slice ~[]T](collection Slice, min, max T) Slice
    func ClampFloat32x8[T ~float32, Slice ~[]T](collection Slice, min, max T) Slice
    func ClampFloat32x16[T ~float32, Slice ~[]T](collection Slice, min, max T) Slice
    func ClampFloat64x2[T ~float64, Slice ~[]T](collection Slice, min, max T) Slice
    func ClampFloat64x4[T ~float64, Slice ~[]T](collection Slice, min, max T) Slice
    func ClampFloat64x8[T ~float64, Slice ~[]T](collection Slice, min, max T) Slice
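The Slice ~[]T type parameter in the Clamp prototypes lets a named slice type pass through unchanged. The scalar reference below shows the expected element-wise result and the preserved type. It is a sketch: clampScalar and Levels are hypothetical names, and allocating a new output slice is an assumption, since this page does not say whether the helpers modify the input in place.

```go
package main

import "fmt"

// Levels is a named slice type with element type int16.
type Levels []int16

// clampScalar limits every element to the range [min, max] and returns
// a slice of the same named type as the input.
func clampScalar[T ~int16, Slice ~[]T](collection Slice, min, max T) Slice {
	out := make(Slice, len(collection)) // assumption: input is left untouched
	for i, v := range collection {
		switch {
		case v < min:
			out[i] = min
		case v > max:
			out[i] = max
		default:
			out[i] = v
		}
	}
	return out
}

func main() {
	in := Levels{100, 150, 200, 250}
	out := clampScalar(in, 120, 220) // out has type Levels
	fmt.Println(out)                 // [120 150 200 220]
	fmt.Println(in)                  // [100 150 200 250]
}
```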