
Commits : Listings

Analyzed about 2 hours ago, based on code collected about 2 hours ago.
Jan 25, 2025 — Jan 25, 2026
Commit Message | Date
Merge pull request #14 from GPU-correlators/feature/dp4a | over 7 years ago
Updates to README | over 7 years ago
Updated README for dp4a and added table of GPU architectures supported | over 7 years ago
DP4A-enabled builds require NTIME_PIPE to be a multiple of 16: set default values to satisfy this and catch bad dimensions | over 7 years ago
Fix bug - initial swizzling only being done over first chunk of time data. Make error output from xgpuCheckResult more readable | over 7 years ago
Merge branch 'master' of github.com:GPU-correlators/xGPU into feature/dp4a | over 7 years ago
DP4A enablement now done through Makefile command line option DP4A=yes. Set default compile option to sm_35 and remove abi=no to give correct compilation out of the box with CUDA 9.x | over 7 years ago
Merge pull request #11 from david-macmahon/master | over 8 years ago
Merge pull request #12 from GPU-correlators/hotfix/round | over 8 years ago
Swapped round to rintf | over 8 years ago
When doing 8-bit texture loaded data (FIXED_POINT), always round the intermediate result before accumulating to the long term accumulator. This prevents eventual propagation of trailing floating point ULP errors, and ensures all numbers are represented as integers. | over 9 years ago
Merge remote-tracking branch 'GPU-correlators/master' | over 9 years ago
Optimized dp4a path: keep complexity second fastest running index to allow for single load of int2 instead of 2x int loads per thread to reduce load and indexing instruction overhead | over 9 years ago
Added support for 2-d textures to dp4a path. Fix bugs with checking texture dimension limits. | over 9 years ago
Replace use of char4 with int to guarantee alignment. Use separate accumulators for positive and negative imaginary components since dp4a instruction can only add and not subtract. | over 9 years ago
Use signed char to remove potential ambiguity | over 9 years ago
Added initial support for dp4a. On CUDA_ARCH < sm_61 the computation is emulated. Enabled with the macro DP4A in xgpu.h. | over 9 years ago
Added support for sm_52 target and flush denormals to zero for small performance boost. | over 11 years ago
Fixed comparison between CPU and GPU results. | almost 12 years ago
Patched block indexing for NSTATION >= 512 using double-precision sqrt. | almost 12 years ago
Added alternative 8-byte shared-memory ordering enabled with -DSTRUCT_OF_ARRAY. | almost 12 years ago
Added sm_50 target in Makefile. | almost 12 years ago
GPU id info is displayed when library is initialized. | almost 12 years ago
Fixes for compilation on OSX Mavericks - must specify OSTYPE=osx. | about 12 years ago
Make uninstall target more thorough | over 12 years ago
Add xgpu.m4 for use by autotools-based clients | over 12 years ago
Merge remote-tracking branch 'origin/master' | almost 13 years ago
Disabled cpu test when POWER_LOOP is defined | almost 13 years ago
Reordering of multiply-adds in shared_transfer_8.cuh to improve performance on GK110. | almost 13 years ago
Reordering with TWO_BY_TWO_COMPUTE for slight performance increase on GK110. | almost 13 years ago
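
Two of the dp4a commits above mention that the computation is emulated on older architectures and that separate accumulators are kept for the positive and negative imaginary components, because dp4a can only add. The following is a rough host-side C sketch of that idea, not the xGPU code itself. The names pack4, dp4a_emulated, corr_acc, corr_accumulate and corr_imag are hypothetical, and both the data layout (four time samples per 32-bit word, real and imaginary parts in separate words) and the conjugation convention x * conj(y) are assumptions made for illustration.

```c
#include <stdint.h>

/* Pack four signed 8-bit samples into one 32-bit word, lowest byte first.
   (Assumed layout; the real kernel's packing may differ.) */
uint32_t pack4(const int8_t s[4]) {
    uint32_t w = 0;
    for (int i = 0; i < 4; i++)
        w |= (uint32_t)(uint8_t)s[i] << (8 * i);
    return w;
}

/* Software stand-in for a signed dp4a: returns acc + dot(a, b),
   treating each word as four signed 8-bit lanes. */
int32_t dp4a_emulated(uint32_t a, uint32_t b, int32_t acc) {
    for (int i = 0; i < 4; i++) {
        int ai = (int)((a >> (8 * i)) & 0xFF);
        int bi = (int)((b >> (8 * i)) & 0xFF);
        if (ai >= 128) ai -= 256;   /* sign-extend the lane by hand */
        if (bi >= 128) bi -= 256;
        acc += ai * bi;
    }
    return acc;
}

/* Hypothetical accumulator: the imaginary part is split in two because
   the dot-product-accumulate can only add, never subtract. */
typedef struct {
    int32_t re;
    int32_t im_pos;
    int32_t im_neg;
} corr_acc;

/* Accumulate sum over four samples of x * conj(y), where
   x = xr + i*xi and y = yr + i*yi (assumed convention):
   re += xr*yr + xi*yi, im_pos += xi*yr, im_neg += xr*yi. */
void corr_accumulate(corr_acc *acc, uint32_t xr, uint32_t xi,
                     uint32_t yr, uint32_t yi) {
    acc->re     = dp4a_emulated(xr, yr, acc->re);
    acc->re     = dp4a_emulated(xi, yi, acc->re);
    acc->im_pos = dp4a_emulated(xi, yr, acc->im_pos);
    acc->im_neg = dp4a_emulated(xr, yi, acc->im_neg);
}

/* The single subtraction happens once, when the result is read out. */
int32_t corr_imag(const corr_acc *acc) {
    return acc->im_pos - acc->im_neg;
}
```

In this sketch the subtraction is deferred to read-out, so every step in the inner loop is an add-only dot product, which is the constraint the commit message describes.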