Difference between revisions of "Paired single"

From WiiBrew
m (more typo :[)
(Added complex multiplication)
Revision as of 22:42, 10 July 2010

Paired singles are a feature unique to the Gekko and Broadway processors used in the GameCube and Wii. They provide fast vector math by keeping two single-precision floating-point numbers in a single floating-point register and operating on both at once. This page demonstrates how these instructions are used.

Quantization and Dequantization

Values are dequantized when they are loaded from memory into paired singles and quantized when they are stored back. To allow more flexibility when converting from non-float types, a form of scaling is implemented. All quantization is controlled by the GQRs (Graphics Quantization Registers). The GQRs are 32-bit registers containing the conversion types and scaling factors for loading and storing. (Loading dequantizes; storing quantizes.)

GQR
Bits   Access  Field
31-30  U
29-24  R/W     L_Scale
23-19  U
18-16  R/W     L_Type
15-14  U
13-8   R/W     S_Scale
7-3    U
2-0    R/W     S_Type

Field  Description
L_*    Values for dequantization.
S_*    Values for quantization.
Scale  Signed. During dequantization, divide the number by (2^scale). During quantization, multiply the number by (2^scale).
Type   0: Float (no scaling during de/quantization), 4: Unsigned 8-bit, 5: Unsigned 16-bit, 6: Signed 8-bit, 7: Signed 16-bit.
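
As a rough illustration of the layout above, the following C sketch splits a 32-bit GQR value into its four fields. It assumes bit 0 is the least significant bit, that L_Scale and S_Scale are 6-bit two's-complement values in bits 29-24 and 13-8, and that L_Type and S_Type sit in bits 18-16 and 2-0. The names gqr_fields, sext6 and gqr_decode are made up for this example.

```c
#include <stdint.h>

/* Decoded view of one GQR. Struct and function names are illustrative only. */
typedef struct {
    int      l_scale;  /* signed scale used when loading (dequantizing) */
    unsigned l_type;   /* type used when loading */
    int      s_scale;  /* signed scale used when storing (quantizing) */
    unsigned s_type;   /* type used when storing */
} gqr_fields;

/* Sign-extend a 6-bit two's-complement value to int. */
static int sext6(uint32_t v)
{
    v &= 0x3F;
    return (v & 0x20) ? (int)v - 64 : (int)v;
}

/* Split a raw GQR value into its fields (assumed bit positions, LSB = bit 0). */
static gqr_fields gqr_decode(uint32_t gqr)
{
    gqr_fields f;
    f.l_scale = sext6(gqr >> 24);   /* bits 29-24 */
    f.l_type  = (gqr >> 16) & 0x7;  /* bits 18-16 */
    f.s_scale = sext6(gqr >> 8);    /* bits 13-8  */
    f.s_type  = gqr & 0x7;          /* bits 2-0   */
    return f;
}
```

For instance, under these assumptions the value 0x08070007 would describe signed 16-bit loads divided by 2^8 and unscaled signed 16-bit stores.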


Loading and Storing

To load and store Paired-singles, one must use the psq_l and psq_st instructions respectively, or one of their variants.

psq_l

psq_l     frD, d(rA), W, I

This instruction dequantizes values from the memory address d+(rA|0) and puts them into PS0 and PS1 of frD. If W is 1, however, it dequantizes only one number and places it into PS0; PS1 is then always loaded with 1.0. I specifies the GQR to use for dequantization parameters. The two numbers are read from consecutive memory locations, regardless of size (for example, if the GQR specifies loading as u16, d+(rA|0) should point to a two-element array of u16s).
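
The load behaviour described above can be modelled on a host machine roughly as follows. This is only a sketch: it assumes big-endian memory, takes the type and scale as plain arguments instead of reading a GQR, and all names (ps_pair, dequant_one, psq_l_model and so on) are invented for the example.

```c
#include <stdint.h>
#include <string.h>

typedef struct { float ps0, ps1; } ps_pair;  /* illustrative model of one FPR */

enum { GQR_FLOAT = 0, GQR_U8 = 4, GQR_U16 = 5, GQR_S8 = 6, GQR_S16 = 7 };

/* 2^e as a float, computed without library calls. */
static float scale_factor(int e)
{
    float r = 1.0f;
    for (; e > 0; --e) r *= 2.0f;
    for (; e < 0; ++e) r *= 0.5f;
    return r;
}

/* Size in bytes of one element of the given type. */
static unsigned elem_size(unsigned type)
{
    if (type == GQR_U8 || type == GQR_S8) return 1;
    if (type == GQR_U16 || type == GQR_S16) return 2;
    return 4;  /* float */
}

/* Read one big-endian element at p and dequantize it (divide by 2^scale). */
static float dequant_one(const uint8_t *p, unsigned type, int scale)
{
    float v;
    switch (type) {
    case GQR_U8:  v = (float)p[0]; break;
    case GQR_S8:  v = (float)(int8_t)p[0]; break;
    case GQR_U16: v = (float)(uint16_t)((p[0] << 8) | p[1]); break;
    case GQR_S16: v = (float)(int16_t)(uint16_t)((p[0] << 8) | p[1]); break;
    default: {    /* float: copy the raw bits, no scaling */
        uint32_t bits = ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
                        ((uint32_t)p[2] << 8) | (uint32_t)p[3];
        memcpy(&v, &bits, sizeof v);
        return v;
    }
    }
    return v * scale_factor(-scale);
}

/* Model of psq_l: with w set, only PS0 is read and PS1 becomes 1.0. */
static ps_pair psq_l_model(const uint8_t *mem, unsigned type, int scale, int w)
{
    ps_pair r;
    r.ps0 = dequant_one(mem, type, scale);
    r.ps1 = w ? 1.0f : dequant_one(mem + elem_size(type), type, scale);
    return r;
}
```

Loading the bytes 01 00 FF 00 as u16 with a scale of 8 would give PS0 = 1.0 and PS1 = 255.0 in this model.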

psq_lx
psq_lx    frD, rA, rB, W, I

This instruction acts exactly like psq_l, except instead of (rA) being offset by d, it is offset by (rB).

psq_lu
psq_lu    frD, d(rA), W, I

This instruction acts exactly like psq_l, except rA cannot be 0, and d+(rA) is placed back into rA.

psq_lux
psq_lux   frD, rA, rB, W, I

This instruction acts exactly like psq_lx, except rA cannot be 0, and (rA)+(rB) is placed back into rA.
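
The four addressing forms differ only in how the effective address is built and whether it is written back. A minimal C sketch of that address arithmetic follows, assuming the general-purpose registers are modelled as a 32-entry array; the function names are hypothetical, and the same arithmetic would apply to the store variants.

```c
#include <stdint.h>

/* (rA|0): register index 0 means the literal value 0, not the contents of r0. */
static uint32_t ra_or_zero(const uint32_t gpr[32], unsigned ra)
{
    return ra == 0 ? 0 : gpr[ra];
}

/* psq_l style: displacement d added to (rA|0). */
static uint32_t ea_d_form(const uint32_t gpr[32], unsigned ra, int32_t d)
{
    return ra_or_zero(gpr, ra) + (uint32_t)d;
}

/* psq_lx style: (rB) added to (rA|0). */
static uint32_t ea_x_form(const uint32_t gpr[32], unsigned ra, unsigned rb)
{
    return ra_or_zero(gpr, ra) + gpr[rb];
}

/* psq_lu style: rA must not be 0; the address is written back to rA. */
static uint32_t ea_update_d(uint32_t gpr[32], unsigned ra, int32_t d)
{
    uint32_t ea = gpr[ra] + (uint32_t)d;
    gpr[ra] = ea;
    return ea;
}

/* psq_lux style: rA must not be 0; (rA)+(rB) is written back to rA. */
static uint32_t ea_update_x(uint32_t gpr[32], unsigned ra, unsigned rb)
{
    uint32_t ea = gpr[ra] + gpr[rb];
    gpr[ra] = ea;
    return ea;
}
```

The update forms are convenient for walking an array, since the pointer register advances as a side effect of each access.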

psq_st

psq_st    frD, d(rA), W, I

This instruction quantizes the values of the paired singles in frD and stores them at the memory address d+(rA|0). If W is 1, however, it quantizes and stores only PS0. I specifies the GQR to use for quantization parameters. The two numbers are written to consecutive memory locations, regardless of size (for example, if the GQR specifies storing as u16, d+(rA|0) is treated as a two-element array of u16s).
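
The store direction can be sketched in the same spirit. The C model below multiplies by 2^scale and writes big-endian bytes. Clamping out-of-range values to the limits of the target type, truncating toward zero, and storing 0 for NaN are assumptions of this sketch, not statements taken from this page. All names are invented for the example.

```c
#include <stdint.h>
#include <string.h>

enum { GQR_FLOAT = 0, GQR_U8 = 4, GQR_U16 = 5, GQR_S8 = 6, GQR_S16 = 7 };

/* 2^e as a float, computed without library calls. */
static float scale_factor(int e)
{
    float r = 1.0f;
    for (; e > 0; --e) r *= 2.0f;
    for (; e < 0; ++e) r *= 0.5f;
    return r;
}

/* Quantize one float: multiply by 2^scale, clamp, truncate (assumed behaviour). */
static int32_t quant_one(float v, unsigned type, int scale)
{
    float lo, hi, s;
    switch (type) {
    case GQR_U8:  lo = 0.0f;      hi = 255.0f;   break;
    case GQR_U16: lo = 0.0f;      hi = 65535.0f; break;
    case GQR_S8:  lo = -128.0f;   hi = 127.0f;   break;
    default:      lo = -32768.0f; hi = 32767.0f; break;  /* GQR_S16 */
    }
    if (v != v) return 0;  /* NaN: value chosen arbitrarily for this sketch */
    s = v * scale_factor(scale);
    if (s < lo) s = lo;
    if (s > hi) s = hi;
    return (int32_t)s;
}

/* Write one element big-endian at p; returns the number of bytes written. */
static unsigned store_one(uint8_t *p, float v, unsigned type, int scale)
{
    uint32_t q;
    if (type == GQR_FLOAT) {  /* float: raw bits, no scaling */
        memcpy(&q, &v, sizeof q);
        p[0] = (uint8_t)(q >> 24); p[1] = (uint8_t)(q >> 16);
        p[2] = (uint8_t)(q >> 8);  p[3] = (uint8_t)q;
        return 4;
    }
    q = (uint32_t)quant_one(v, type, scale);
    if (type == GQR_U8 || type == GQR_S8) { p[0] = (uint8_t)q; return 1; }
    p[0] = (uint8_t)(q >> 8); p[1] = (uint8_t)q;
    return 2;
}

/* Model of psq_st: with w set, only PS0 is written. Returns bytes written. */
static unsigned psq_st_model(uint8_t *mem, float ps0, float ps1,
                             unsigned type, int scale, int w)
{
    unsigned n = store_one(mem, ps0, type, scale);
    if (!w) n += store_one(mem + n, ps1, type, scale);
    return n;
}
```

Storing the pair (1.5, -3.0) as s16 with a scale of 8 would write the bytes 01 80 FD 00 in this model.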

psq_stx
psq_stx   frD, rA, rB, W, I

This instruction acts exactly like psq_st, except instead of (rA) being offset by d, it is offset by (rB).

psq_stu
psq_stu   frD, d(rA), W, I

This instruction acts exactly like psq_st, except rA cannot be 0, and d+(rA) is placed back into rA.

psq_stux
psq_stux  frD, rA, rB, W, I

This instruction acts exactly like psq_stx, except rA cannot be 0, and (rA)+(rB) is placed back into rA.

Single Parameter Operations

These instructions operate on one source FPR.

ps_abs

ps_abs    frD, frB
frD(ps0) = abs(frB(ps0))
frD(ps1) = abs(frB(ps1))

ps_mr

ps_mr     frD, frB
frD(ps0) = frB(ps0)
frD(ps1) = frB(ps1)

ps_nabs

ps_nabs   frD, frB
frD(ps0) = -abs(frB(ps0))
frD(ps1) = -abs(frB(ps1))

ps_neg

ps_neg    frD, frB
frD(ps0) = -frB(ps0)
frD(ps1) = -frB(ps1)
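
The four instructions above only touch the sign of each half, so they can be modelled in C by manipulating the sign bit of each float directly. This is a host-side sketch; ps_pair and the *_model function names are made up for illustration.

```c
#include <stdint.h>
#include <string.h>

typedef struct { float ps0, ps1; } ps_pair;  /* illustrative model of one FPR */

/* AND the bit pattern with and_mask, then XOR it with xor_mask. */
static float sign_op(float x, uint32_t and_mask, uint32_t xor_mask)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    bits = (bits & and_mask) ^ xor_mask;
    memcpy(&x, &bits, sizeof x);
    return x;
}

/* ps_abs: clear the sign bit of both halves. */
static ps_pair ps_abs_model(ps_pair b)
{
    ps_pair d = { sign_op(b.ps0, 0x7FFFFFFFu, 0), sign_op(b.ps1, 0x7FFFFFFFu, 0) };
    return d;
}

/* ps_nabs: force the sign bit of both halves to 1. */
static ps_pair ps_nabs_model(ps_pair b)
{
    ps_pair d = { sign_op(b.ps0, 0x7FFFFFFFu, 0x80000000u),
                  sign_op(b.ps1, 0x7FFFFFFFu, 0x80000000u) };
    return d;
}

/* ps_neg: flip the sign bit of both halves. */
static ps_pair ps_neg_model(ps_pair b)
{
    ps_pair d = { sign_op(b.ps0, 0xFFFFFFFFu, 0x80000000u),
                  sign_op(b.ps1, 0xFFFFFFFFu, 0x80000000u) };
    return d;
}

/* ps_mr: plain copy of both halves. */
static ps_pair ps_mr_model(ps_pair b)
{
    return b;
}
```

Working on the bit pattern keeps the treatment of -0.0 consistent, which a comparison-based absolute value would not.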

ps_res

ps_res    frD, frB
frD(ps0) = 1/frB(ps0)
frD(ps1) = 1/frB(ps1)

Accurate to a precision of 1/4096.

ps_rsqrte

ps_rsqrte frD, frB
frD(ps0) = 1/sqrt(frB(ps0))
frD(ps1) = 1/sqrt(frB(ps1))

Accurate to a precision of 1/4096.
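
Because these two instructions return estimates, code that needs more accuracy often refines the result. The C sketch below shows one Newton-Raphson step for a reciprocal. This is a general numerical technique and is not described on this page. The function name is hypothetical, and on hardware the same arithmetic would be carried out on both halves at once.

```c
/* One Newton-Raphson step toward 1/b, starting from the estimate x0.
 * If x0 has relative error e, the result has relative error of about e*e. */
static float refine_reciprocal(float b, float x0)
{
    return x0 * (2.0f - b * x0);
}
```

Starting from an estimate that is off by 1/4096, a single step would be expected to bring the error close to the limit of single precision.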

Basic Math

Element-wise arithmetic on two source FPRs.

ps_add

ps_add    frD, frA, frB
frD(ps0) = frA(ps0) + frB(ps0)
frD(ps1) = frA(ps1) + frB(ps1)

ps_div

ps_div    frD, frA, frB
frD(ps0) = frA(ps0) / frB(ps0)
frD(ps1) = frA(ps1) / frB(ps1)

ps_mul

ps_mul    frD, frA, frC
frD(ps0) = frA(ps0) * frC(ps0)
frD(ps1) = frA(ps1) * frC(ps1)

ps_sub

ps_sub    frD, frA, frB
frD(ps0) = frA(ps0) - frB(ps0)
frD(ps1) = frA(ps1) - frB(ps1)
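
In each of these instructions the two halves are computed independently, PS0 with PS0 and PS1 with PS1. A small C model makes that explicit; ps_pair and the *_model names are invented for this sketch.

```c
typedef struct { float ps0, ps1; } ps_pair;  /* illustrative model of one FPR */

/* ps_add: frD = frA + frB, per half. */
static ps_pair ps_add_model(ps_pair a, ps_pair b)
{
    ps_pair d = { a.ps0 + b.ps0, a.ps1 + b.ps1 };
    return d;
}

/* ps_sub: frD = frA - frB, per half. */
static ps_pair ps_sub_model(ps_pair a, ps_pair b)
{
    ps_pair d = { a.ps0 - b.ps0, a.ps1 - b.ps1 };
    return d;
}

/* ps_mul: frD = frA * frC, per half. */
static ps_pair ps_mul_model(ps_pair a, ps_pair c)
{
    ps_pair d = { a.ps0 * c.ps0, a.ps1 * c.ps1 };
    return d;
}

/* ps_div: frD = frA / frB, per half. */
static ps_pair ps_div_model(ps_pair a, ps_pair b)
{
    ps_pair d = { a.ps0 / b.ps0, a.ps1 / b.ps1 };
    return d;
}
```

Treating a register as a 2D vector, ps_add and ps_sub act as vector addition and subtraction, while ps_mul and ps_div act component by component.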

Comparison

ps_cmpo0

ps_cmpo0  crfD, frA, frB
ps_cmpu0  crfD, frA, frB
crfD = frA(ps0) compare frB(ps0)

ps_cmpo1

ps_cmpo1  crfD, frA, frB
ps_cmpu1  crfD, frA, frB
crfD = frA(ps1) compare frB(ps1)
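
The result of a compare lands in a 4-bit condition register field. The C sketch below assumes the usual PowerPC floating-point compare convention of one bit each for less than, greater than, equal, and unordered (at least one operand is NaN). It does not model the exception-flag differences between the ordered and unordered forms, and the names are illustrative only.

```c
typedef struct { float ps0, ps1; } ps_pair;  /* illustrative model of one FPR */

/* Assumed CR field encoding, most significant bit first: LT, GT, EQ, UN. */
enum { CR_LT = 8, CR_GT = 4, CR_EQ = 2, CR_UN = 1 };

/* Compare one half of each register and return the 4-bit CR field value. */
static unsigned compare_lane(float a, float b)
{
    if (a != a || b != b) return CR_UN;  /* NaN never compares equal to itself */
    if (a < b) return CR_LT;
    if (a > b) return CR_GT;
    return CR_EQ;
}

/* ps_cmpo0 / ps_cmpu0: compare the PS0 halves. */
static unsigned ps_cmp0_model(ps_pair a, ps_pair b)
{
    return compare_lane(a.ps0, b.ps0);
}

/* ps_cmpo1 / ps_cmpu1: compare the PS1 halves. */
static unsigned ps_cmp1_model(ps_pair a, ps_pair b)
{
    return compare_lane(a.ps1, b.ps1);
}
```

Each compare looks at only one half, so testing both halves takes two instructions and two CR fields.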

Complex Multiply

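These instructions combine a multiplication with an addition or subtraction, or multiply both halves of one register by a single half of another.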

ps_madd

ps_madd   frD, frA, frC, frB
frD(ps0) = frA(ps0) * frC(ps0) + frB(ps0)
frD(ps1) = frA(ps1) * frC(ps1) + frB(ps1)

ps_madds0

ps_madds0 frD, frA, frC, frB
frD(ps0) = frA(ps0) * frC(ps0) + frB(ps0)
frD(ps1) = frA(ps1) * frC(ps0) + frB(ps1)

ps_madds1

ps_madds1 frD, frA, frC, frB
frD(ps0) = frA(ps0) * frC(ps1) + frB(ps0)
frD(ps1) = frA(ps1) * frC(ps1) + frB(ps1)

ps_msub

ps_msub   frD, frA, frC, frB
frD(ps0) = frA(ps0) * frC(ps0) - frB(ps0)
frD(ps1) = frA(ps1) * frC(ps1) - frB(ps1)

ps_muls0

ps_muls0  frD, frA, frC
frD(ps0) = frA(ps0) * frC(ps0)
frD(ps1) = frA(ps1) * frC(ps0)

ps_muls1

ps_muls1  frD, frA, frC
frD(ps0) = frA(ps0) * frC(ps1)
frD(ps1) = frA(ps1) * frC(ps1)

ps_nmadd

ps_nmadd  frD, frA, frC, frB
frD(ps0) = -(frA(ps0) * frC(ps0) + frB(ps0))
frD(ps1) = -(frA(ps1) * frC(ps1) + frB(ps1))

ps_nmsub

ps_nmsub  frD, frA, frC, frB
frD(ps0) = -(frA(ps0) * frC(ps0) - frB(ps0))
frD(ps1) = -(frA(ps1) * frC(ps1) - frB(ps1))
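
One use these instructions lend themselves to is multiplying a 2x2 matrix by a 2D vector with one ps_muls0 followed by one ps_madds1. The C sketch below models that sequence, assuming the matrix is kept as two column registers. The names are invented for the example, and the model may round twice where a fused hardware multiply-add would round once.

```c
typedef struct { float ps0, ps1; } ps_pair;  /* illustrative model of one FPR */

/* ps_muls0: both halves of frA multiplied by frC(ps0). */
static ps_pair ps_muls0_model(ps_pair a, ps_pair c)
{
    ps_pair d = { a.ps0 * c.ps0, a.ps1 * c.ps0 };
    return d;
}

/* ps_madds1: both halves of frA multiplied by frC(ps1), then frB added. */
static ps_pair ps_madds1_model(ps_pair a, ps_pair c, ps_pair b)
{
    ps_pair d = { a.ps0 * c.ps1 + b.ps0, a.ps1 * c.ps1 + b.ps1 };
    return d;
}

/* M * v, where col0 = (m00, m10), col1 = (m01, m11) and v = (x, y). */
static ps_pair mat2_mul_vec2(ps_pair col0, ps_pair col1, ps_pair v)
{
    ps_pair t = ps_muls0_model(col0, v);  /* (m00*x, m10*x) */
    return ps_madds1_model(col1, v, t);   /* (m01*y + m00*x, m11*y + m10*x) */
}
```

With the matrix rows (1, 2) and (3, 4) and the vector (5, 6), this sequence would produce (17, 39).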