xGPU

Inactive

Commits : Listings

Analyzed 1 day ago, based on code collected 1 day ago.

Commit Message	Contributor	Files Modified	Lines Added	Lines Removed	Code Location	Date
Showing page 1 of 7
Merge pull request #14 from GPU-correlators/feature/dp4a	david-macmahon					over 7 years ago
Updates to README	maddyscientist					over 7 years ago
Updated README for dp4a and added table of GPU architectures supported	maddyscientist					over 7 years ago
DP4A-enabled requires NTIME_PIPE is a multiple of 16: set default values to satisfy this and catch bad dimensions	maddyscientist					over 7 years ago
Fix bug - initial swizzling only being done over first chunk of data time data. Make error output from xgpuCheckResult more readable	maddyscientist					over 7 years ago
Merge branch 'master' of github.com:GPU-correlators/xGPU into feature/dp4a	maddyscientist					over 7 years ago
DP4A enablment now done through Makefile command line option DP4A=yes. Set default default compile option to sm_35 and remove abi=no to give correct compilation out of the box with CUDA 9.x	maddyscientist					over 7 years ago
Merge pull request #11 from david-macmahon/master	maddyscientist					over 8 years ago
Merge pull request #12 from GPU-correlators/hotfix/round	maddyscientist					over 8 years ago
Swapped round to rintf	maddyscientist					over 8 years ago
When doing 8-bit texture loaded data (FIXED_POINT), always round the intermediate result before accumulating to the long term accumulator. This prevents eventual propagation of trailing floating point ULP errors, and ensures all numbers are represented as integers.	maddyscientist					over 9 years ago
Merge remote-tracking branch 'GPU-correlators/master'	David MacMahon					over 9 years ago
Optimized dp4a path: keep complexity second fastest running index to allow for single load of int2 instead 2x int loads per thread to reduce load and indexing instruction overhead	M Clark					over 9 years ago
Added support 2-d textures to dp4a path. Fix bugs with checking texture dimension limits.	M Clark					over 9 years ago
Replace use of char4 with int to guarantee alignment. Use separate accumulators for positive and negative imaginary components since dp4a instruction can only add and not subtract.	M Clark					over 9 years ago
Use signed char to remove potential ambiguity	M Clark					over 9 years ago
Added initial support for dp4a. On CUDA_ARCH < sm_61 the computation is emulated. Enabled with the macro DP4A in xgpu.h.	maddyscientist					over 9 years ago
Added support for sm_52 target and flush denormals to zero for small performance boost.	M Clark					over 11 years ago
Fixed comparison between CPU and GPU results.	M Clark					almost 12 years ago
Patched block indexing for NSTATION >= 512 using double-precision sqrt.	M Clark					almost 12 years ago
Added alternative 8-byte shared-memory ordering enabled with -DSTRUCT_OF_ARRAY.	M Clark					almost 12 years ago
Added sm_50 target in Makefile.	M Clark					almost 12 years ago
GPU id info is displayed when library is initialized.	M Clark					almost 12 years ago
Fixes for compilation on OSX Mavericks - must specify OSTYPE=osx.	M Clark					about 12 years ago
Make uninstall target more thorough	David MacMahon					over 12 years ago
Add xgpu.m4 for use by autotools-based clients	David MacMahon					over 12 years ago
Merge remote-tracking branch 'origin/master'	David MacMahon					almost 13 years ago
Disabled cpu test when POWER_LOOP is defined	M Clark					almost 13 years ago
Reordering of multiply-adds in shared_transfer_8.cuh to improve performance on GK110.	M Clark					almost 13 years ago
Reordering with TWO_BY_TWO_COMPUTE for slight performance increase on GK110.	M Clark					almost 13 years ago