GPU: Simplify empty-A SpMV path; add setZero(cudaStream_t) overload

The empty-A branch of spmv_device_exec was open-coding the
waitReady/cudaMemsetAsync/recordReady triple. Pull it into a new
setZero(cudaStream_t) overload on DeviceMatrix and have the existing
setZero(Context&) delegate, so the same idiom isn't duplicated across
DeviceDispatch.h and GpuSparseContext.h.
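
A minimal sketch of the delegation shape, not the actual code in this
change. FakeStream, g_trace, Context::stream(), and the host-side
std::fill are hypothetical stand-ins so the sketch builds without the
CUDA toolkit; the real overload takes a cudaStream_t and calls
cudaMemsetAsync.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for cudaStream_t so this builds without CUDA.
struct FakeStream { int id = 0; };

// Records call order so the wait/memset/record sequence is observable.
static std::vector<std::string> g_trace;

// Hypothetical context; only the stream accessor matters for the sketch.
struct Context {
  FakeStream stream_;
  FakeStream stream() const { return stream_; }
};

class DeviceMatrix {
 public:
  explicit DeviceMatrix(std::size_t n) : data_(n, 1.0) {}

  // Stream overload: the wait/memset/record triple lives in one place.
  void setZero(FakeStream s) {
    waitReady(s);
    // The real code would issue cudaMemsetAsync on stream s here.
    std::fill(data_.begin(), data_.end(), 0.0);
    g_trace.push_back("memset");
    recordReady(s);
  }

  // Context overload delegates instead of repeating the triple.
  void setZero(Context& ctx) { setZero(ctx.stream()); }

  const std::vector<double>& host() const { return data_; }

 private:
  void waitReady(FakeStream) { g_trace.push_back("waitReady"); }
  void recordReady(FakeStream) { g_trace.push_back("recordReady"); }

  std::vector<double> data_;
};
```

A caller that holds only a stream, such as the empty-A branch, can then
call setZero(stream) directly without constructing a Context.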

Also trim the multi-line comment on the empty-A branch down to the
reason it exists: SparseContext owns no cuBLAS handle, so the
beta != 0 case with empty A is delegated to the caller.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
3 files changed