Generating unaligned vector load instructions using gcc

+1 vote

I wonder how to get the compiler to generate the "movdqu" instruction, since the vector extensions always seem to assume that everything is aligned to 16 bytes.
I tried using a packed struct, but this didn't help much. Of course one can always resort to inline assembly, but that should not be necessary.

Compile with:

gcc -O2 -S -msse2 testvecs.c

Using built-in specs.

Target: i486-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Debian 4.7.2-5' 
--enable-languages=c,c++,go,fortran,objc,obj-c++ --prefix=/usr 
--program-suffix=-4.7 --enable-shared --enable-linker-build-id 
--with-system-zlib --libexecdir=/usr/lib --without-included-gettext 
--enable-threads=posix --with-gxx-include-dir=/usr/include/c++/4.7 
--libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu 
--enable-libstdcxx-debug --enable-libstdcxx-time=yes 
--enable-gnu-unique-object --enable-plugin --enable-objc-gc 
--enable-targets=all --with-arch-32=i586 --with-tune=generic 
--enable-checking=release --build=i486-linux-gnu --host=i486-linux-gnu 
Thread model: posix
gcc version 4.7.2 (Debian 4.7.2-5)
posted Sep 18, 2013 by Jagan Mishra

1 Answer

+1 vote

I do see a movdqu with a range of 64-bit gcc versions from 4.4.6 to 4.9. Some of the compilers complain about mixed data type arithmetic on lines 29 and 42 of your source.
I don't know whether it applies here, but splitting an unaligned memory move into two halves is likely to be the right thing on CPUs up through Intel Westmere, so the compiler may choose to do that under generic tuning. To optimize for a newer CPU, you would want to specify -march=native.
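
As an illustration, here is a minimal sketch of two ways to request an unaligned 128-bit load without inline assembly. The first uses the SSE2 intrinsics _mm_loadu_si128 and _mm_storeu_si128 from <emmintrin.h>. The second stays with the gcc vector extensions and copies the bytes in and out with memcpy, which makes no alignment assumption. The helper names add4_intrin and add4_vecext and the v4si typedef are made up for this example and are not taken from the question's testvecs.c.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2: _mm_loadu_si128, _mm_add_epi32, _mm_storeu_si128 */
#include <string.h>     /* memcpy */

/* gcc vector-extension type (hypothetical name): four 32-bit ints, 16 bytes. */
typedef int v4si __attribute__((vector_size(16)));

/* Hypothetical helper: add four 32-bit lanes using SSE2 intrinsics.
   The pointers may have any alignment. */
static void add4_intrin(void *dst, const void *a, const void *b)
{
    __m128i va, vb;
    assert(dst && a && b);
    va = _mm_loadu_si128((const __m128i *)a);   /* unaligned load */
    vb = _mm_loadu_si128((const __m128i *)b);   /* unaligned load */
    _mm_storeu_si128((__m128i *)dst, _mm_add_epi32(va, vb)); /* unaligned store */
}

/* Hypothetical helper: same operation with the vector extensions.
   memcpy moves the 16 bytes without promising any alignment. */
static void add4_vecext(void *dst, const void *a, const void *b)
{
    v4si va, vb, vr;
    assert(dst && a && b);
    memcpy(&va, a, sizeof va);
    memcpy(&vb, b, sizeof vb);
    vr = va + vb;               /* element-wise add on the vector type */
    memcpy(dst, &vr, sizeof vr);
}
```

Compiling this with the same gcc -O2 -S -msse2 command and inspecting the assembly should show how each helper is lowered. Whether a single movdqu or a split load appears can depend on the gcc version and on the -march/-mtune settings.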

answer Sep 18, 2013 by Ahmed Patel
