RCOpArg::ExtractWithByteOffset is only used in one place: a special case
of rlwinmx. ExtractWithByteOffset first stores the value of the
specified register into m_ppc_state (unless it's already there), and
then returns an offset into m_ppc_state. Our use of this function has
two undesirable properties (except in the trivial case `offset == 0`):
1. ExtractWithByteOffset calls StoreFromRegister without going through
any of the usual functions. This violated an assumption I made when
working on my constant propagation PR and led to a hard-to-find bug.
2. If the specified register is in a host register and is dirty,
ExtractWithByteOffset will store its value to m_ppc_state even when
it's unnecessary. In particular, this always happens when rlwinmx
uses the same register as input and output, since rlwinmx always
allocates a host register for the output and marks it as dirty.
Since ExtractWithByteOffset is only used in one place, I figure we might
as well inline it there. This commit does that, and also alters
rlwinmx's logic so the special case code is only triggered when the
input is already in m_ppc_state.
Input in `m_ppc_state`, before (11 bytes):
mov esi, dword ptr [rbp-104]
mov dword ptr [rbp-104], esi
movzx esi, byte ptr [rbp-101]
Input in `m_ppc_state`, after (5 bytes):
movzx esi, byte ptr [rbp-101]
Input in host register, before (8 bytes):
mov dword ptr [rbp-104], esi
movzx esi, byte ptr [rbp-101]
Input in host register, after (3 bytes):
shr edi, 0x18