
Native profiler output is limited to around 100 columns when printing to a file #2130

Closed
johnbcoughlin opened this issue Oct 26, 2023 · 1 comment · Fixed by #2131
Labels
bug Something isn't working

Comments

@johnbcoughlin

Describe the bug

The native profiler's table output is truncated to roughly 100 columns when writing to a file, even though a file has no terminal width limit.

To reproduce

open("example.txt", "w") do filehandle
    CUDA.@profile io=filehandle CUDA.rand(10, 10)
end
$ cat example.txt
Profiler ran for 443.47 ms, capturing 536 events.

Host-side activity: calling CUDA APIs took 88.12 ms (19.87% of the trace)
┌──────────┬───────────┬───────┬───────────┬───────────┬───────────┬────────────
│ Time (%) │      Time │ Calls │  Avg time │  Min time │  Max time │ Name      ⋯
├──────────┼───────────┼───────┼───────────┼───────────┼───────────┼────────────
│   19.64% │  87.08 ms │     2 │  43.54 ms │   2.23 ms │  84.85 ms │ cudaLaunc ⋯
│    0.09% │ 380.99 µs │     1 │ 380.99 µs │ 380.99 µs │ 380.99 µs │ cudaFree  ⋯
│    0.02% │  80.11 µs │     1 │  80.11 µs │  80.11 µs │  80.11 µs │ cudaDevic ⋯
│    0.01% │  37.67 µs │     1 │  37.67 µs │  37.67 µs │  37.67 µs │ cudaMallo ⋯
│    0.00% │  19.79 µs │     1 │  19.79 µs │  19.79 µs │  19.79 µs │ cuDeviceG ⋯
│    0.00% │  10.73 µs │     1 │  10.73 µs │  10.73 µs │  10.73 µs │ cuMemAllo ⋯
│    0.00% │  10.49 µs │     1 │  10.49 µs │  10.49 µs │  10.49 µs │ cudaGetDe ⋯
│    0.00% │   5.01 µs │     3 │   1.67 µs │ 238.42 ns │   4.29 µs │ cudaDevic ⋯
│    0.00% │    3.1 µs │     4 │ 774.86 ns │ 238.42 ns │   1.43 µs │ cudaGetLa ⋯
│    0.00% │   2.38 µs │     1 │   2.38 µs │   2.38 µs │   2.38 µs │ cuInit    ⋯
│    0.00% │ 715.26 ns │     1 │ 715.26 ns │ 715.26 ns │ 715.26 ns │ cuDeviceG ⋯
│    0.00% │ 476.84 ns │     1 │ 476.84 ns │ 476.84 ns │ 476.84 ns │ cuDriverG ⋯
│    0.00% │ 476.84 ns │     1 │ 476.84 ns │ 476.84 ns │ 476.84 ns │ cuDeviceT ⋯
│    0.00% │ 238.42 ns │     1 │ 238.42 ns │ 238.42 ns │ 238.42 ns │ cuModuleG ⋯
│    0.00% │ 238.42 ns │     1 │ 238.42 ns │ 238.42 ns │ 238.42 ns │ cuDeviceG ⋯
│    0.00% │    0.0 ns │     1 │    0.0 ns │    0.0 ns │    0.0 ns │ cuDeviceG ⋯
└──────────┴───────────┴───────┴───────────┴───────────┴───────────┴────────────
                                                                1 column omitted

Device-side activity: GPU was busy for 94.41 µs (0.02% of the trace)
┌──────────┬──────────┬───────┬──────────┬──────────┬──────────┬────────────────
│ Time (%) │     Time │ Calls │ Avg time │ Min time │ Max time │ Name          ⋯
├──────────┼──────────┼───────┼──────────┼──────────┼──────────┼────────────────
│    0.02% │ 92.51 µs │     1 │ 92.51 µs │ 92.51 µs │ 92.51 µs │ _Z20generate_ ⋯
│    0.00% │  1.91 µs │     1 │  1.91 µs │  1.91 µs │  1.91 µs │ _Z13gen_seque ⋯
└──────────┴──────────┴───────┴──────────┴──────────┴──────────┴────────────────
                                                                1 column omitted

Expected behavior

The full, untruncated profiler table should be written to the file, including the columns that are currently reported as omitted.

Version info

Details on Julia:

julia> versioninfo()
Julia Version 1.9.0
Commit 8e630552924 (2023-05-07 11:25 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 80 × Intel(R) Xeon(R) Gold 6230 CPU @ 2.10GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-14.0.6 (ORCJIT, cascadelake)
  Threads: 2 on 80 virtual cores
Environment:
  LD_LIBRARY_PATH = /sw/contrib/stf-src/petsc/mpich/3.15.1/lib:/sw/contrib/stf-src/hdf5/mpich/1.12.0/lib:/sw/contrib/stf-src/mpich/4.0a2/lib:/sw/contrib/stf-src/mpich/4.0a2/include:/sw/ucx/1.10.0/lib:/sw/contrib/stf-src/gdb/10.2/lib:/sw/gcc/10.2.0/lib64:/sw/gcc/10.2.0/lib
  JULIA_REVISE_POLL = 1
  JULIA_DEPOT_PATH = /gscratch/aaplasma/johnbc/.julia
  JULIA_PROJECT = .

Details on CUDA:

julia> CUDA.versioninfo()
CUDA runtime 12.2, artifact installation
CUDA driver 12.2
NVIDIA driver 535.104.12

CUDA libraries:
- CUBLAS: 12.2.5
- CURAND: 10.3.3
- CUFFT: 11.0.8
- CUSOLVER: 11.5.2
- CUSPARSE: 12.1.2
- CUPTI: 20.0.0
- NVML: 12.0.0+535.104.12

Julia packages:
- CUDA: 5.0.0
- CUDA_Driver_jll: 0.6.0+4
- CUDA_Runtime_jll: 0.9.2+4

Toolchain:
- Julia: 1.9.0
- LLVM: 14.0.6
- PTX ISA support: 3.2, 4.0, 4.1, 4.2, 4.3, 5.0, 6.0, 6.1, 6.3, 6.4, 6.5, 7.0, 7.1, 7.2, 7.3, 7.4, 7.5
- Device capability support: sm_37, sm_50, sm_52, sm_53, sm_60, sm_61, sm_62, sm_70, sm_72, sm_75, sm_80, sm_86

1 device:
  0: Quadro RTX 6000 (sm_75, 19.855 GiB / 24.000 GiB available)

@johnbcoughlin johnbcoughlin added the bug Something isn't working label Oct 26, 2023
@maleadt
Member

maleadt commented Oct 26, 2023

I guess we shouldn't limit the nocrop case to IOBuffer:

CUDA.jl/src/profile.jl

Lines 573 to 579 in 50136fb

crop = if io isa IOBuffer
    # when emitting to a string, render all content (e.g., for the tests)
    :none
else
    :horizontal
end
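
One way to relax that check (a hypothetical sketch only, not necessarily what #2131 ended up doing) would be to crop only when the output actually is an interactive terminal:

```julia
# Sketch: crop the table horizontally only for interactive terminals;
# render everything when writing to an IOBuffer, a file, a pipe, etc.
crop = if io isa Base.TTY
    :horizontal
else
    :none
end
```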

As a workaround, render it to an IOBuffer and write that to your file.
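
Concretely, that workaround could look like the following (a sketch based on the reproduction above; the `io` keyword usage is the same):

```julia
using CUDA

# Workaround: render the profiler report into an in-memory IOBuffer
# (which gets the uncropped :none rendering), then dump the buffer
# contents to the file.
open("example.txt", "w") do filehandle
    buf = IOBuffer()
    CUDA.@profile io=buf CUDA.rand(10, 10)
    write(filehandle, take!(buf))
end
```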
