PMADDWD--Packed Multiply and Add

Opcode: 0F F5 /r
Instruction: PMADDWD mm, mm/m64
Description: Multiply the packed words in mm by the packed words in mm/m64. Add the adjacent 32-bit results and store them in mm as doublewords.

Opcode: 66 0F F5 /r
Instruction: PMADDWD xmm1, xmm2/m128
Description: Multiply the packed word integers in xmm1 by the packed word integers in xmm2/m128, add the adjacent doubleword results, and store them in xmm1.

Description

Multiplies the individual signed words of the destination operand (first operand) by the corresponding signed words of the source operand (second operand), producing temporary signed doubleword results. Adjacent doubleword results are then summed and stored in the destination operand. For example, the words in bits 15-0 and bits 31-16 of the destination operand are multiplied by the words in the same bit positions of the source operand, and the two doubleword products are added together and stored in the low doubleword of the destination register (bits 31-0). The same operation is performed on the other pairs of adjacent words. (Figure 3-6 shows this operation when using 64-bit operands.)

The source operand can be an MMX™ technology register or a 64-bit memory location, or it can be an XMM register or a 128-bit memory location. The destination operand can be an MMX technology register or an XMM register.
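As an illustration of the description above, the following C sketch computes a single doubleword of the result from the two word pairs that feed it. The names pmaddwd_lane and pmaddwd_lane_example are hypothetical and are not part of any Intel API; the sketch also assumes a compiler where converting an out-of-range unsigned value to int32_t wraps, which is the behavior of mainstream compilers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not an Intel API): computes one doubleword of a
 * PMADDWD result from the two adjacent word pairs that feed it. */
int32_t pmaddwd_lane(int16_t dest_lo, int16_t dest_hi,
                     int16_t src_lo, int16_t src_hi)
{
    /* Signed word products; each one fits in a signed doubleword. */
    int32_t p_lo = (int32_t)dest_lo * (int32_t)src_lo;
    int32_t p_hi = (int32_t)dest_hi * (int32_t)src_hi;
    /* Add as unsigned so that an overflowing sum wraps instead of
     * triggering undefined behavior in C. */
    return (int32_t)((uint32_t)p_lo + (uint32_t)p_hi);
}

/* Example: destination words are 1 (bits 15-0) and 2 (bits 31-16),
 * source words are 3 and 4 in the same positions. */
void pmaddwd_lane_example(void)
{
    assert(pmaddwd_lane(1, 2, 3, 4) == 11);   /* 1*3 + 2*4   */
    assert(pmaddwd_lane(-3, 4, 5, 6) == 9);   /* -3*5 + 4*6  */
}
```

The helper takes the four words as separate arguments only to keep the arithmetic visible; the instruction itself reads them from packed registers or memory.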

The PMADDWD instruction wraps around in only one situation: when all four words that feed one doubleword result (both words of an adjacent pair, in both the source and the destination operand) are 8000H. In this case, the result wraps around to 80000000H.
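The arithmetic behind this case can be checked with a short C sketch. Each product of two 8000H words is 2^30, so the sum is 2^31, one more than the largest signed doubleword. The function names below are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical check: returns 1 if the exact sum of the two word products
 * does not fit in a signed doubleword, 0 otherwise. */
int pmaddwd_sum_overflows(int16_t dest_lo, int16_t dest_hi,
                          int16_t src_lo, int16_t src_hi)
{
    int64_t exact = (int64_t)dest_lo * src_lo + (int64_t)dest_hi * src_hi;
    return exact > INT32_MAX || exact < INT32_MIN;
}

/* Bit pattern stored for one pair: the exact sum truncated to 32 bits. */
uint32_t pmaddwd_stored_bits(int16_t dest_lo, int16_t dest_hi,
                             int16_t src_lo, int16_t src_hi)
{
    int64_t exact = (int64_t)dest_lo * src_lo + (int64_t)dest_hi * src_hi;
    return (uint32_t)exact;
}

void pmaddwd_wrap_example(void)
{
    const int16_t m = INT16_MIN;                  /* bit pattern 8000H */
    assert(pmaddwd_sum_overflows(m, m, m, m));    /* 2^30 + 2^30 = 2^31 */
    assert(pmaddwd_stored_bits(m, m, m, m) == 0x80000000u);
    /* Changing any one word to a different value keeps the sum in range. */
    assert(!pmaddwd_sum_overflows(m, m, m, -32767));
}
```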

Figure 3-6. PMADDWD Execution Model

Operation

PMADDWD instruction with 64-bit operands:
DEST[31..0] ← (DEST[15..0] * SRC[15..0]) + (DEST[31..16] * SRC[31..16]);
DEST[63..32] ← (DEST[47..32] * SRC[47..32]) + (DEST[63..48] * SRC[63..48]);

PMADDWD instruction with 128-bit operands:
DEST[31..0] ← (DEST[15..0] * SRC[15..0]) + (DEST[31..16] * SRC[31..16]);
DEST[63..32] ← (DEST[47..32] * SRC[47..32]) + (DEST[63..48] * SRC[63..48]);
DEST[95..64] ← (DEST[79..64] * SRC[79..64]) + (DEST[95..80] * SRC[95..80]);
DEST[127..96] ← (DEST[111..96] * SRC[111..96]) + (DEST[127..112] * SRC[127..112]);
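The pseudocode above can be sketched as a scalar C model that covers both operand sizes. The function pmaddwd_model is a hypothetical name for this illustration. Unlike the instruction, which overwrites the destination operand, the model writes to a separate result array because the input and output element widths differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of the PMADDWD pseudocode.
 * nwords is 4 for the 64-bit form and 8 for the 128-bit form.
 * Index 0 is the least significant word (input) or doubleword (output). */
void pmaddwd_model(const int16_t *dest, const int16_t *src,
                   int32_t *result, size_t nwords)
{
    assert(nwords == 4 || nwords == 8);
    for (size_t i = 0; i < nwords / 2; i++) {
        /* Products of the two adjacent word pairs. */
        uint32_t lo = (uint32_t)((int32_t)dest[2 * i] * src[2 * i]);
        uint32_t hi = (uint32_t)((int32_t)dest[2 * i + 1] * src[2 * i + 1]);
        /* Unsigned addition wraps the same way the instruction does. */
        result[i] = (int32_t)(lo + hi);
    }
}
```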

Intel(R) C++ Compiler Intrinsic Equivalent

PMADDWD __m64 _mm_madd_pi16(__m64 m1, __m64 m2)

PMADDWD __m128i _mm_madd_epi16(__m128i a, __m128i b)
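A common use of the 128-bit intrinsic is a dot product of 16-bit integers. The sketch below is one possible way to write it, not a reference implementation. The helper dot_i16 and the macro DOT_USE_SSE2 are hypothetical names; the sketch assumes the element count is a multiple of 8 and that the true result fits in 32 bits, and it falls back to a plain loop when the compiler does not target SSE2.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>
#define DOT_USE_SSE2 1
#else
#define DOT_USE_SSE2 0
#endif

/* Hypothetical helper: dot product of two int16_t arrays.
 * Assumes n is a multiple of 8 and the result fits in an int32_t. */
int32_t dot_i16(const int16_t *a, const int16_t *b, size_t n)
{
    assert(n % 8 == 0);
#if DOT_USE_SSE2
    __m128i acc = _mm_setzero_si128();
    for (size_t i = 0; i < n; i += 8) {
        /* Unaligned loads, so a and b need no particular alignment. */
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        /* PMADDWD: four doubleword sums of adjacent word products. */
        acc = _mm_add_epi32(acc, _mm_madd_epi16(va, vb));
    }
    /* Horizontal sum of the four doubleword accumulators. */
    int32_t lanes[4];
    _mm_storeu_si128((__m128i *)lanes, acc);
    return lanes[0] + lanes[1] + lanes[2] + lanes[3];
#else
    /* Portable fallback for targets without SSE2. */
    int32_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += (int32_t)a[i] * b[i];
    return sum;
#endif
}
```

For example, with a = {1, 2, 3, 4, 5, 6, 7, 8} and b = {8, 7, 6, 5, 4, 3, 2, 1}, dot_i16(a, b, 8) returns 120.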

Flags Affected

None.

Protected Mode Exceptions

#GP(0) - If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit. (128-bit operations only.) If a memory operand is not aligned on a 16-byte boundary, regardless of segment.

#SS(0) - If a memory operand effective address is outside the SS segment limit.

#UD - If EM in CR0 is set. (128-bit operations only.) If OSFXSR in CR4 is 0. (128-bit operations only.) If CPUID feature flag SSE2 is 0.

#NM - If TS in CR0 is set.

#MF (64-bit operations only.) - If there is a pending x87 FPU exception.

#PF(fault-code) - If a page fault occurs.

#AC(0) (64-bit operations only.) - If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.
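The 16-byte alignment condition listed under #GP(0) for 128-bit memory operands can be checked in software before a pointer is handed to code that uses aligned accesses. The following is a minimal sketch with hypothetical names (is_aligned_16, alignment_example). It assumes a C11 compiler and a flat address model in which converting a pointer to uintptr_t preserves its low address bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: nonzero if p lies on a 16-byte boundary, as a
 * 128-bit PMADDWD memory operand requires. */
int is_aligned_16(const void *p)
{
    return ((uintptr_t)p & 15u) == 0;
}

void alignment_example(void)
{
    /* C11 _Alignas forces the array to start on a 16-byte boundary. */
    _Alignas(16) int16_t words[16];
    assert(is_aligned_16(&words[0]));
    assert(!is_aligned_16(&words[1]));   /* 2 bytes past the boundary */
    assert(is_aligned_16(&words[8]));    /* 16 bytes past the boundary */
}
```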

Real-Address Mode Exceptions

#GP(0) - (128-bit operations only.) If a memory operand is not aligned on a 16-byte boundary, regardless of segment. If any part of the operand lies outside of the effective address space from 0 to FFFFH.

#UD - If EM in CR0 is set. (128-bit operations only.) If OSFXSR in CR4 is 0. (128-bit operations only.) If CPUID feature flag SSE2 is 0.

#NM - If TS in CR0 is set.

#MF (64-bit operations only.) - If there is a pending x87 FPU exception.

Virtual-8086 Mode Exceptions

Same exceptions as in Real-Address Mode, plus the following:

#PF(fault-code) - For a page fault.

#AC(0) (64-bit operations only.) - If alignment checking is enabled and an unaligned memory reference is made.

Numeric Exceptions

None.

For details, see Volume 2A and Volume 2B of the Intel(R) 64 and IA-32 Architectures Software Developer's Manual. For the latest updates on the instruction set information, go to the web site.