Paired single

From WiiBrew

Latest revision as of 22:07, 18 March 2021

Paired singles are a unique part of the Gekko/Broadway processors used in the GameCube and Wii. They provide fast vector math by keeping two single-precision floating-point numbers, called ps0 and ps1, in a single floating-point register and operating on both at once. This page describes how these instructions work.

Quantization and Dequantization

Values are dequantized as they are loaded into paired singles and quantized as they are stored back to memory. To make conversion from non-float formats (8-bit and 16-bit integers) more flexible, a scaling factor can be applied. All of this is controlled by the GQRs (Graphics Quantization Registers): 32-bit registers that hold the conversion type and scaling factor used for loading and for storing.

GQR
  Bits    Access  Field
  31-30   U       (unused)
  29-24   R/W     L_Scale
  23-19   U       (unused)
  18-16   R/W     L_Type
  15-14   U       (unused)
  13-8    R/W     S_Scale
  7-3     U       (unused)
  2-0     R/W     S_Type

Field   Description
L_*     Values for dequantization (used when loading).
S_*     Values for quantization (used when storing).
Scale   Signed. During dequantization, the number is divided by 2^scale. During quantization, it is multiplied by 2^scale.
Type    0: float (no scaling is applied during de/quantization), 4: unsigned 8-bit, 5: unsigned 16-bit, 6: signed 8-bit, 7: signed 16-bit.
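As a sketch of the layout above, the following C helpers pack and unpack a GQR value. They assume L_Scale in bits 29-24, L_Type in bits 18-16, S_Scale in bits 13-8 and S_Type in bits 2-0 (bit 0 being the least significant), with both scale fields read as 6-bit two's-complement numbers. The names gqr_fields, decode_gqr and encode_gqr are hypothetical and not part of any SDK.

```c
#include <stdint.h>

/* Hypothetical decoded view of one GQR. */
typedef struct {
    int ld_type;   /* L_Type:  bits 18-16 */
    int ld_scale;  /* L_Scale: bits 29-24, signed */
    int st_type;   /* S_Type:  bits 2-0 */
    int st_scale;  /* S_Scale: bits 13-8, signed */
} gqr_fields;

/* Sign-extend a 6-bit two's-complement field. */
static int sign_extend6(uint32_t v) {
    v &= 0x3Fu;
    return (v & 0x20u) ? (int)v - 64 : (int)v;
}

static gqr_fields decode_gqr(uint32_t gqr) {
    gqr_fields f;
    f.ld_type  = (int)((gqr >> 16) & 0x7u);
    f.ld_scale = sign_extend6(gqr >> 24);
    f.st_type  = (int)(gqr & 0x7u);
    f.st_scale = sign_extend6(gqr >> 8);
    return f;
}

/* Build a GQR value from its fields (the inverse of decode_gqr). */
static uint32_t encode_gqr(int ld_type, int ld_scale, int st_type, int st_scale) {
    return ((uint32_t)(ld_scale & 0x3F) << 24) |
           ((uint32_t)(ld_type & 0x7) << 16) |
           ((uint32_t)(st_scale & 0x3F) << 8) |
           (uint32_t)(st_type & 0x7);
}
```

For example, encode_gqr(5, 8, 7, 8) yields 0x08050807: loads read u16 values and divide them by 256, stores multiply by 256 and write s16 values.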


Loading and Storing

To load and store paired singles, use the psq_l and psq_st instructions respectively, or one of their variants.

psq_l

psq_l      frD, d(rA), W, I

This instruction dequantizes two values from the memory address d+(rA|0) and places them in ps0 and ps1 of frD, where (rA|0) means the contents of rA, or 0 if rA is r0. If W is 1, it dequantizes only one value, places it in ps0, and always sets ps1 to 1.0. I selects the GQR that supplies the dequantization parameters. The two values are read from consecutive memory locations regardless of their size (for example, if the GQR specifies a u16 load, d+(rA|0) should point to a two-element array of u16s).
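The load path can be modeled in C as below. This is a sketch, not a description of the hardware internals: it assumes big-endian memory (as on Broadway), treats ea as d+(rA|0) already translated to a host pointer, decodes GQR[I] with L_Type in bits 18-16 and L_Scale in bits 29-24, and does not model the reserved type values 1-3. The names ps_pair, load_elem and model_psq_l are hypothetical.

```c
#include <stdint.h>
#include <string.h>

typedef struct { float ps0, ps1; } ps_pair;   /* the two slots of one FPR */

/* 2^e for small integer e; exact for the 6-bit scale range. */
static float pow2f(int e) {
    float r = 1.0f;
    for (; e > 0; e--) r *= 2.0f;
    for (; e < 0; e++) r *= 0.5f;
    return r;
}

/* Size in bytes of one element: float 4, u8/s8 1, u16/s16 2. */
static int elem_size(int type) {
    if (type == 0) return 4;
    return (type == 4 || type == 6) ? 1 : 2;
}

/* Read one big-endian element and dequantize it (integers are divided by 2^scale). */
static float load_elem(const uint8_t *p, int type, int scale) {
    int32_t v;
    switch (type) {
    case 0: {                                   /* float: no scaling */
        uint32_t bits = ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
                        ((uint32_t)p[2] << 8) | (uint32_t)p[3];
        float f;
        memcpy(&f, &bits, sizeof f);
        return f;
    }
    case 4: v = p[0]; break;                              /* u8  */
    case 5: v = (p[0] << 8) | p[1]; break;                /* u16 */
    case 6: v = (int8_t)p[0]; break;                      /* s8  */
    case 7: v = (int16_t)((p[0] << 8) | p[1]); break;     /* s16 */
    default: return 0.0f;                       /* reserved types: not modeled */
    }
    return (float)v * pow2f(-scale);
}

/* Model of psq_l frD, d(rA), W, I: ea = d+(rA|0), gqr = value of GQR[I]. */
static ps_pair model_psq_l(const uint8_t *ea, int w, uint32_t gqr) {
    int type = (int)((gqr >> 16) & 0x7u);
    int raw = (int)((gqr >> 24) & 0x3Fu);
    int scale = (raw & 0x20) ? raw - 64 : raw;  /* L_Scale is signed */
    ps_pair r;
    r.ps0 = load_elem(ea, type, scale);
    /* W = 1: ps1 is forced to 1.0; otherwise the second element follows the first. */
    r.ps1 = w ? 1.0f : load_elem(ea + elem_size(type), type, scale);
    return r;
}
```

With a GQR selecting u16 and an L_Scale of 8, the bytes 01 00 00 80 load as ps0 = 1.0 and ps1 = 0.5.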

psq_lx
psq_lx     frD, rA, rB, W, I

This instruction acts exactly like psq_l, except that the address is (rA|0)+(rB) instead of d+(rA|0).

psq_lu
psq_lu     frD, d(rA), W, I

This instruction acts exactly like psq_l, except that rA must not be r0, and the address d+(rA) is written back into rA.

psq_lux
psq_lux    frD, rA, rB, W, I

This instruction acts exactly like psq_lx, except that rA must not be r0, and the address (rA)+(rB) is written back into rA.

psq_st

psq_st     frD, d(rA), W, I

This instruction quantizes the paired singles in frD and stores them at the memory address d+(rA|0). If W is 1, it quantizes and stores only ps0. I selects the GQR that supplies the quantization parameters. The two values are written to consecutive memory locations regardless of their size (for example, if the GQR specifies a u16 store, d+(rA|0) is treated as a two-element array of u16s).
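A matching sketch of the store path follows. Besides big-endian memory and the field positions S_Type (bits 2-0) and S_Scale (bits 13-8), it assumes that out-of-range values saturate to the limits of the target type, that fractional parts are truncated toward zero, and that NaN is stored as 0. These rounding and saturation details are assumptions to check against hardware, not documented behavior. The names ps_pair, store_elem and model_psq_st are hypothetical.

```c
#include <stdint.h>
#include <string.h>

typedef struct { float ps0, ps1; } ps_pair;   /* the two slots of one FPR */

/* 2^e for small integer e; exact for the 6-bit scale range. */
static float pow2f(int e) {
    float r = 1.0f;
    for (; e > 0; e--) r *= 2.0f;
    for (; e < 0; e++) r *= 0.5f;
    return r;
}

/* Quantize one value and write it big-endian at p; returns bytes written.
   Assumed behavior: saturate to the type's range, truncate toward zero, NaN -> 0. */
static int store_elem(uint8_t *p, float x, int type, int scale) {
    if (type == 0) {                            /* float: stored unscaled */
        uint32_t bits;
        memcpy(&bits, &x, sizeof bits);
        p[0] = (uint8_t)(bits >> 24);
        p[1] = (uint8_t)(bits >> 16);
        p[2] = (uint8_t)(bits >> 8);
        p[3] = (uint8_t)bits;
        return 4;
    }
    float lo, hi;
    int size;
    switch (type) {
    case 4: lo = 0.0f;      hi = 255.0f;   size = 1; break;   /* u8  */
    case 5: lo = 0.0f;      hi = 65535.0f; size = 2; break;   /* u16 */
    case 6: lo = -128.0f;   hi = 127.0f;   size = 1; break;   /* s8  */
    case 7: lo = -32768.0f; hi = 32767.0f; size = 2; break;   /* s16 */
    default: return 0;                          /* reserved types: not modeled */
    }
    float v = x * pow2f(scale);                 /* multiply by 2^scale */
    if (v != v) v = 0.0f;                       /* NaN */
    if (v < lo) v = lo;
    if (v > hi) v = hi;
    int32_t q = (int32_t)v;                     /* truncates toward zero */
    if (size == 1) {
        p[0] = (uint8_t)q;
    } else {
        p[0] = (uint8_t)((uint32_t)q >> 8);
        p[1] = (uint8_t)q;
    }
    return size;
}

/* Model of psq_st frD, d(rA), W, I: ea = d+(rA|0), gqr = value of GQR[I]. */
static void model_psq_st(uint8_t *ea, ps_pair s, int w, uint32_t gqr) {
    int type = (int)(gqr & 0x7u);
    int raw = (int)((gqr >> 8) & 0x3Fu);
    int scale = (raw & 0x20) ? raw - 64 : raw;  /* S_Scale is signed */
    int n = store_elem(ea, s.ps0, type, scale);
    if (!w && n > 0)
        store_elem(ea + n, s.ps1, type, scale); /* ps1 follows ps0 directly */
}
```

With S_Type = s16 and S_Scale = 8, the pair (1.5, -2.0) is written as the bytes 01 80 FE 00.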

psq_stx
psq_stx    frD, rA, rB, W, I

This instruction acts exactly like psq_st, except that the address is (rA|0)+(rB) instead of d+(rA|0).

psq_stu
psq_stu    frD, d(rA), W, I

This instruction acts exactly like psq_st, except that rA must not be r0, and the address d+(rA) is written back into rA.

psq_stux
psq_stux   frD, rA, rB, W, I

This instruction acts exactly like psq_stx, except that rA must not be r0, and the address (rA)+(rB) is written back into rA.
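The four addressing forms shared by the psq_l and psq_st families can be summarized with a few C helpers. gpr[] is a hypothetical stand-in for r0-r31, and the helper names are illustrative only. The update helpers simply return 0 for the invalid rA = r0 form instead of modeling what the hardware does in that case.

```c
#include <stdint.h>

/* (rA|0): the contents of rA, or 0 when rA is r0. */
static uint32_t ra_or_zero(const uint32_t gpr[32], int ra) {
    return ra == 0 ? 0u : gpr[ra];
}

/* psq_l / psq_st: EA = d + (rA|0) */
static uint32_t ea_d(const uint32_t gpr[32], int ra, int32_t d) {
    return ra_or_zero(gpr, ra) + (uint32_t)d;
}

/* psq_lx / psq_stx: EA = (rA|0) + (rB) */
static uint32_t ea_x(const uint32_t gpr[32], int ra, int rb) {
    return ra_or_zero(gpr, ra) + gpr[rb];
}

/* psq_lu / psq_stu: rA must not be r0; EA = d + (rA), written back into rA.
   Returns 0 for the invalid form, 1 otherwise. */
static int ea_du(uint32_t gpr[32], int ra, int32_t d, uint32_t *ea) {
    if (ra == 0) return 0;
    *ea = gpr[ra] + (uint32_t)d;
    gpr[ra] = *ea;
    return 1;
}

/* psq_lux / psq_stux: rA must not be r0; EA = (rA) + (rB), written back into rA. */
static int ea_xu(uint32_t gpr[32], int ra, int rb, uint32_t *ea) {
    if (ra == 0) return 0;
    *ea = gpr[ra] + gpr[rb];
    gpr[ra] = *ea;
    return 1;
}
```

The update forms are convenient for walking an array: each psq_lu with d equal to the element-pair size leaves rA pointing at the pair it just loaded.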

Single Parameter Operations

These instructions operate on a single source FPR.

ps_abs

Single floating-point absolute value on both ps0 and ps1.

ps_abs     frD, frB
frD(ps0) = abs(frB(ps0))
frD(ps1) = abs(frB(ps1))

ps_mr

Copy both ps0 and ps1 from one FPR to another.

ps_mr      frD, frB
frD(ps0) = frB(ps0)
frD(ps1) = frB(ps1)

ps_nabs

Single floating-point negative absolute value on both ps0 and ps1.

ps_nabs    frD, frB
frD(ps0) = -abs(frB(ps0))
frD(ps1) = -abs(frB(ps1))

ps_neg

Single floating-point negate on both ps0 and ps1.

ps_neg     frD, frB
frD(ps0) = -frB(ps0)
frD(ps1) = -frB(ps1)
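The four instructions above can be modeled in C by manipulating the sign bit directly. This assumes, as for the scalar PowerPC fabs, fnabs and fneg instructions, that only the sign bit of each single changes, so -0.0 and NaN inputs need no special casing. ps_pair and the model_ps_* functions are hypothetical names.

```c
#include <stdint.h>
#include <string.h>

typedef struct { float ps0, ps1; } ps_pair;   /* the two slots of one FPR */

#define SIGN_BIT 0x80000000u

static uint32_t bits_of(float f)    { uint32_t u; memcpy(&u, &f, sizeof u); return u; }
static float    float_of(uint32_t u) { float f; memcpy(&f, &u, sizeof f); return f; }

/* ps_mr: copy both slots unchanged */
static ps_pair model_ps_mr(ps_pair b) { return b; }

/* ps_abs: clear the sign bit of each slot */
static ps_pair model_ps_abs(ps_pair b) {
    ps_pair d = { float_of(bits_of(b.ps0) & ~SIGN_BIT),
                  float_of(bits_of(b.ps1) & ~SIGN_BIT) };
    return d;
}

/* ps_nabs: set the sign bit of each slot */
static ps_pair model_ps_nabs(ps_pair b) {
    ps_pair d = { float_of(bits_of(b.ps0) | SIGN_BIT),
                  float_of(bits_of(b.ps1) | SIGN_BIT) };
    return d;
}

/* ps_neg: flip the sign bit of each slot */
static ps_pair model_ps_neg(ps_pair b) {
    ps_pair d = { float_of(bits_of(b.ps0) ^ SIGN_BIT),
                  float_of(bits_of(b.ps1) ^ SIGN_BIT) };
    return d;
}
```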

ps_res

Reciprocal estimate of ps0 and ps1.

ps_res     frD, frB
frD(ps0) = 1/frB(ps0)
frD(ps1) = 1/frB(ps1)

The result is an estimate, accurate to within 1/4096.
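Because ps_res only returns an estimate, code that needs more precision can refine it with a Newton-Raphson step, y' = y(2 - xy), which roughly doubles the number of correct bits. The sketch below models that step on both slots in plain C. On hardware, 2 - xy corresponds to ps_nmsub with frA = x, frC = y and frB holding 2.0, followed by ps_mul. The refinement is a standard numerical technique layered on top of the instruction, not something ps_res does by itself; ps_pair and refine_reciprocal are hypothetical names.

```c
typedef struct { float ps0, ps1; } ps_pair;   /* the two slots of one FPR */

/* One Newton-Raphson step for 1/x on both slots: y' = y * (2 - x*y).
   x is the original operand (frB of ps_res), y is the estimate it returned. */
static ps_pair refine_reciprocal(ps_pair x, ps_pair y) {
    ps_pair r;
    r.ps0 = y.ps0 * (2.0f - x.ps0 * y.ps0);
    r.ps1 = y.ps1 * (2.0f - x.ps1 * y.ps1);
    return r;
}
```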

ps_rsqrte

Single floating-point reciprocal square root estimate on both ps0 and ps1.

ps_rsqrte  frD, frB
frD(ps0) = 1/sqrt(frB(ps0))
frD(ps1) = 1/sqrt(frB(ps1))

The result is an estimate, accurate to within 1/4096.
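Likewise, a ps_rsqrte estimate can be refined with the Newton-Raphson step y' = y(1.5 - 0.5xy^2), for example when normalizing vectors. The sketch below models one step on both slots in plain C; it is an illustration of the technique, and ps_pair and refine_rsqrt are hypothetical names.

```c
typedef struct { float ps0, ps1; } ps_pair;   /* the two slots of one FPR */

/* One Newton-Raphson step for 1/sqrt(x) on both slots:
   y' = y * (1.5 - 0.5 * x * y * y), where y is the ps_rsqrte estimate. */
static ps_pair refine_rsqrt(ps_pair x, ps_pair y) {
    ps_pair r;
    r.ps0 = y.ps0 * (1.5f - 0.5f * x.ps0 * y.ps0 * y.ps0);
    r.ps1 = y.ps1 * (1.5f - 0.5f * x.ps1 * y.ps1 * y.ps1);
    return r;
}
```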

Basic Math

Simple everyday math.

ps_add

Single floating-point add on both ps0 and ps1.

ps_add     frD, frA, frB
frD(ps0) = frA(ps0) + frB(ps0)
frD(ps1) = frA(ps1) + frB(ps1)

ps_sub

Single floating-point subtract on both ps0 and ps1.

ps_sub     frD, frA, frB
frD(ps0) = frA(ps0) - frB(ps0)
frD(ps1) = frA(ps1) - frB(ps1)

ps_mul

Single floating-point multiply on both ps0 and ps1.

ps_mul     frD, frA, frC
frD(ps0) = frA(ps0) * frC(ps0)
frD(ps1) = frA(ps1) * frC(ps1)

ps_div

Single floating-point divide on both ps0 and ps1.

ps_div     frD, frA, frB
frD(ps0) = frA(ps0) / frB(ps0)
frD(ps1) = frA(ps1) / frB(ps1)
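A plain C model of the four element-wise operations, with a small usage example: linear interpolation a + (b - a)t on both slots, built from ps_sub, ps_mul and ps_add. ps_pair, the model_ps_* functions and lerp2 are hypothetical names, and the model uses ordinary C float arithmetic.

```c
typedef struct { float ps0, ps1; } ps_pair;   /* the two slots of one FPR */

static ps_pair model_ps_add(ps_pair a, ps_pair b) {
    ps_pair d = { a.ps0 + b.ps0, a.ps1 + b.ps1 };
    return d;
}
static ps_pair model_ps_sub(ps_pair a, ps_pair b) {
    ps_pair d = { a.ps0 - b.ps0, a.ps1 - b.ps1 };
    return d;
}
/* ps_mul takes its second operand in frC */
static ps_pair model_ps_mul(ps_pair a, ps_pair c) {
    ps_pair d = { a.ps0 * c.ps0, a.ps1 * c.ps1 };
    return d;
}
static ps_pair model_ps_div(ps_pair a, ps_pair b) {
    ps_pair d = { a.ps0 / b.ps0, a.ps1 / b.ps1 };
    return d;
}

/* Hypothetical usage: interpolate both slots of a toward b by t. */
static ps_pair lerp2(ps_pair a, ps_pair b, ps_pair t) {
    ps_pair diff = model_ps_sub(b, a);     /* b - a           */
    ps_pair step = model_ps_mul(diff, t);  /* (b - a) * t     */
    return model_ps_add(a, step);          /* a + (b - a) * t */
}
```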

Comparison

ps_cmpo0

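Compare the ps0 values of frA and frB and place the result in CR field crfD. ps_cmpo0 performs an ordered comparison and ps_cmpu0 an unordered one; the two differ only in the exceptions they signal for NaN operands.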

ps_cmpo0   crfD, frA, frB
ps_cmpu0   crfD, frA, frB
crfD = frA(ps0) compare frB(ps0)

ps_cmpo1

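Compare the ps1 values of frA and frB and place the result in CR field crfD. ps_cmpo1 performs an ordered comparison and ps_cmpu1 an unordered one; the two differ only in the exceptions they signal for NaN operands.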

ps_cmpo1   crfD, frA, frB
ps_cmpu1   crfD, frA, frB
crfD = frA(ps1) compare frB(ps1)
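The condition-register result of these comparisons can be modeled as below. The sketch returns the 4-bit crfD value using the standard PowerPC floating-point compare encoding (less than = 8, greater than = 4, equal = 2, unordered = 1). It does not model FPSCR updates or exceptions, which is where the ordered and unordered forms differ. The names are hypothetical.

```c
typedef struct { float ps0, ps1; } ps_pair;   /* the two slots of one FPR */

/* Bit values within one 4-bit CR field. */
enum { CR_LT = 8, CR_GT = 4, CR_EQ = 2, CR_FU = 1 };

static unsigned compare_floats(float a, float b) {
    if (a != a || b != b) return CR_FU;   /* at least one NaN: unordered */
    if (a < b) return CR_LT;
    if (a > b) return CR_GT;
    return CR_EQ;
}

/* ps_cmpo0 / ps_cmpu0: compare the ps0 slots */
static unsigned model_ps_cmp0(ps_pair a, ps_pair b) { return compare_floats(a.ps0, b.ps0); }

/* ps_cmpo1 / ps_cmpu1: compare the ps1 slots */
static unsigned model_ps_cmp1(ps_pair a, ps_pair b) { return compare_floats(a.ps1, b.ps1); }
```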

Complex Multiply

These instructions combine a multiply with an add or subtract, or multiply a vector by a scalar taken from ps0 or ps1.

ps_madd

Single floating-point multiply-add on both ps0 and ps1.

ps_madd    frD, frA, frC, frB
frD(ps0) = frA(ps0) * frC(ps0) + frB(ps0)
frD(ps1) = frA(ps1) * frC(ps1) + frB(ps1)

ps_madds0

Scalar-vector multiply-add using ps0 for scalar.

ps_madds0  frD, frA, frC, frB
frD(ps0) = frA(ps0) * frC(ps0) + frB(ps0)
frD(ps1) = frA(ps1) * frC(ps0) + frB(ps1)

ps_madds1

Scalar-vector multiply-add using ps1 for scalar.

ps_madds1  frD, frA, frC, frB
frD(ps0) = frA(ps0) * frC(ps1) + frB(ps0)
frD(ps1) = frA(ps1) * frC(ps1) + frB(ps1)

ps_msub

Single floating-point multiply-subtract on both ps0 and ps1.

ps_msub    frD, frA, frC, frB
frD(ps0) = frA(ps0) * frC(ps0) - frB(ps0)
frD(ps1) = frA(ps1) * frC(ps1) - frB(ps1)

ps_muls0

Scalar-vector multiply using ps0 for scalar.

ps_muls0   frD, frA, frC
frD(ps0) = frA(ps0) * frC(ps0)
frD(ps1) = frA(ps1) * frC(ps0)

ps_muls1

Scalar-vector multiply using ps1 for scalar.

ps_muls1   frD, frA, frC
frD(ps0) = frA(ps0) * frC(ps1)
frD(ps1) = frA(ps1) * frC(ps1)
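The scalar-vector forms (ps_muls0, ps_muls1, ps_madds0, ps_madds1) can be modeled in C as below. As a usage sketch, a 2x2 matrix stored as two column pairs is multiplied by a vector with one ps_muls0 followed by one ps_madds1. The column-major layout and all names here are hypothetical choices for illustration.

```c
typedef struct { float ps0, ps1; } ps_pair;   /* the two slots of one FPR */

/* ps_muls0: frA times the scalar frC(ps0) */
static ps_pair model_ps_muls0(ps_pair a, ps_pair c) {
    ps_pair d = { a.ps0 * c.ps0, a.ps1 * c.ps0 };
    return d;
}
/* ps_muls1: frA times the scalar frC(ps1) */
static ps_pair model_ps_muls1(ps_pair a, ps_pair c) {
    ps_pair d = { a.ps0 * c.ps1, a.ps1 * c.ps1 };
    return d;
}
/* ps_madds0: frA * frC(ps0) + frB */
static ps_pair model_ps_madds0(ps_pair a, ps_pair c, ps_pair b) {
    ps_pair d = { a.ps0 * c.ps0 + b.ps0, a.ps1 * c.ps0 + b.ps1 };
    return d;
}
/* ps_madds1: frA * frC(ps1) + frB */
static ps_pair model_ps_madds1(ps_pair a, ps_pair c, ps_pair b) {
    ps_pair d = { a.ps0 * c.ps1 + b.ps0, a.ps1 * c.ps1 + b.ps1 };
    return d;
}

/* Hypothetical usage: 2x2 matrix (two columns) times vector v. */
static ps_pair mat2_mul_vec(ps_pair col0, ps_pair col1, ps_pair v) {
    ps_pair d = model_ps_muls0(col0, v);     /* col0 * v.ps0               */
    return model_ps_madds1(col1, v, d);      /* col1 * v.ps1 + col0 * v.ps0 */
}
```

For the matrix with rows (1, 2) and (3, 4), stored as col0 = (1, 3) and col1 = (2, 4), and v = (5, 6), the result is (17, 39).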

ps_nmadd

Single floating-point negative multiply-add on both ps0 and ps1.

ps_nmadd   frD, frA, frC, frB
frD(ps0) = -(frA(ps0) * frC(ps0) + frB(ps0))
frD(ps1) = -(frA(ps1) * frC(ps1) + frB(ps1))

ps_nmsub

Single floating-point negative multiply-subtract on both ps0 and ps1.

ps_nmsub   frD, frA, frC, frB
frD(ps0) = -(frA(ps0) * frC(ps0) - frB(ps0))
frD(ps1) = -(frA(ps1) * frC(ps1) - frB(ps1))
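The element-wise multiply-add family can be modeled as below, with a usage sketch that evaluates a quadratic at two points at once using Horner's method (two ps_madd steps). The model computes the product and the sum as separate C operations, so for operands that are not exactly representable its last bit may differ from the hardware result. All names are hypothetical.

```c
typedef struct { float ps0, ps1; } ps_pair;   /* the two slots of one FPR */

/* ps_madd: frA * frC + frB */
static ps_pair model_ps_madd(ps_pair a, ps_pair c, ps_pair b) {
    ps_pair d = { a.ps0 * c.ps0 + b.ps0, a.ps1 * c.ps1 + b.ps1 };
    return d;
}
/* ps_msub: frA * frC - frB */
static ps_pair model_ps_msub(ps_pair a, ps_pair c, ps_pair b) {
    ps_pair d = { a.ps0 * c.ps0 - b.ps0, a.ps1 * c.ps1 - b.ps1 };
    return d;
}
/* ps_nmadd: -(frA * frC + frB) */
static ps_pair model_ps_nmadd(ps_pair a, ps_pair c, ps_pair b) {
    ps_pair d = { -(a.ps0 * c.ps0 + b.ps0), -(a.ps1 * c.ps1 + b.ps1) };
    return d;
}
/* ps_nmsub: -(frA * frC - frB) */
static ps_pair model_ps_nmsub(ps_pair a, ps_pair c, ps_pair b) {
    ps_pair d = { -(a.ps0 * c.ps0 - b.ps0), -(a.ps1 * c.ps1 - b.ps1) };
    return d;
}

/* Hypothetical usage: k2*x^2 + k1*x + k0 for two x values via Horner's method. */
static ps_pair eval_quadratic(ps_pair x, float k2, float k1, float k0) {
    ps_pair c2 = { k2, k2 }, c1 = { k1, k1 }, c0 = { k0, k0 };
    ps_pair t = model_ps_madd(c2, x, c1);    /* k2*x + k1          */
    return model_ps_madd(t, x, c0);          /* (k2*x + k1)*x + k0 */
}
```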

Miscellaneous

Instructions that don't fit into the other categories.

ps_merge00

Register move allowing swap/merge of ps0 values.

ps_merge00 frD, frA, frB
frD(ps0) = frA(ps0)
frD(ps1) = frB(ps0)

ps_merge01

Register move allowing swap/merge of ps0 and ps1 values.

ps_merge01 frD, frA, frB
frD(ps0) = frA(ps0)
frD(ps1) = frB(ps1)

ps_merge10

Register move allowing swap/merge of ps1 and ps0 values.

ps_merge10 frD, frA, frB
frD(ps0) = frA(ps1)
frD(ps1) = frB(ps0)

ps_merge11

Register move allowing swap/merge of ps1 values.

ps_merge11 frD, frA, frB
frD(ps0) = frA(ps1)
frD(ps1) = frB(ps1)
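The merge instructions are plain slot selections. The C model below also shows two idioms: swapping the halves of a register (ps_merge10 with the same register as frA and frB) and broadcasting ps0 into both slots (ps_merge00 with the same register as frA and frB). ps_pair and the model_ps_merge* functions are hypothetical names.

```c
typedef struct { float ps0, ps1; } ps_pair;   /* the two slots of one FPR */

static ps_pair model_ps_merge00(ps_pair a, ps_pair b) { ps_pair d = { a.ps0, b.ps0 }; return d; }
static ps_pair model_ps_merge01(ps_pair a, ps_pair b) { ps_pair d = { a.ps0, b.ps1 }; return d; }
static ps_pair model_ps_merge10(ps_pair a, ps_pair b) { ps_pair d = { a.ps1, b.ps0 }; return d; }
static ps_pair model_ps_merge11(ps_pair a, ps_pair b) { ps_pair d = { a.ps1, b.ps1 }; return d; }

/* Swap the two slots: ps_merge10 frD, frA, frA */
static ps_pair swap_slots(ps_pair a) { return model_ps_merge10(a, a); }

/* Copy ps0 into both slots: ps_merge00 frD, frA, frA */
static ps_pair broadcast_ps0(ps_pair a) { return model_ps_merge00(a, a); }
```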

ps_sel

Single floating-point select on both ps0 and ps1.

ps_sel     frD, frA, frC, frB
if(frA(ps0) >= 0)
        frD(ps0) = frC(ps0)
else
        frD(ps0) = frB(ps0)
if(frA(ps1) >= 0)
        frD(ps1) = frC(ps1)
else
        frD(ps1) = frB(ps1)
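A C model of ps_sel, with a branch-free usage example that clamps negative values to zero by passing the same register as frA and frC and a zero register as frB. In this model a NaN in frA fails the >= 0 test and therefore selects frB, and -0.0 passes it; this mirrors the scalar fsel rule but should be treated as an assumption for ps_sel. The names are hypothetical.

```c
typedef struct { float ps0, ps1; } ps_pair;   /* the two slots of one FPR */

/* ps_sel: per slot, frC if frA >= 0, otherwise frB (NaN compares false). */
static ps_pair model_ps_sel(ps_pair a, ps_pair c, ps_pair b) {
    ps_pair d;
    d.ps0 = (a.ps0 >= 0.0f) ? c.ps0 : b.ps0;
    d.ps1 = (a.ps1 >= 0.0f) ? c.ps1 : b.ps1;
    return d;
}

/* Hypothetical usage: max(x, 0) on both slots without branches. */
static ps_pair clamp_negative_to_zero(ps_pair x) {
    ps_pair zero = { 0.0f, 0.0f };
    return model_ps_sel(x, x, zero);
}
```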

ps_sum0

Add a ps0 value to a ps1 value, result in ps0.

ps_sum0    frD, frA, frC, frB
frD(ps0) = frA(ps0) + frB(ps1)
frD(ps1) = frC(ps1)

ps_sum1

Add a ps0 value to a ps1 value, result in ps1.

ps_sum1    frD, frA, frC, frB
frD(ps0) = frC(ps0)
frD(ps1) = frA(ps0) + frB(ps1)
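The sum instructions are what make horizontal adds possible. The sketch below models ps_sum0 and ps_sum1 in C and uses them for a 2D dot product: an element-wise multiply (ps_mul) followed by ps_sum0 with the product in frA, frC and frB, which leaves p.ps0 + p.ps1 in ps0. ps_pair, the model_ps_sum* functions and dot2 are hypothetical names.

```c
typedef struct { float ps0, ps1; } ps_pair;   /* the two slots of one FPR */

/* ps_sum0: frD = (frA(ps0) + frB(ps1), frC(ps1)) */
static ps_pair model_ps_sum0(ps_pair a, ps_pair c, ps_pair b) {
    ps_pair d = { a.ps0 + b.ps1, c.ps1 };
    return d;
}

/* ps_sum1: frD = (frC(ps0), frA(ps0) + frB(ps1)) */
static ps_pair model_ps_sum1(ps_pair a, ps_pair c, ps_pair b) {
    ps_pair d = { c.ps0, a.ps0 + b.ps1 };
    return d;
}

/* Hypothetical usage: dot product of two 2D vectors. */
static float dot2(ps_pair u, ps_pair v) {
    ps_pair p = { u.ps0 * v.ps0, u.ps1 * v.ps1 };   /* ps_mul           */
    return model_ps_sum0(p, p, p).ps0;              /* p.ps0 + p.ps1    */
}
```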