Fuse more kernels in SSPKnoth #327

Open
charleskawczynski opened this issue Nov 13, 2024 · 2 comments
@charleskawczynski

Now that we default to SSPKnoth in ClimaAtmos, I realized that one reason fieldvector kernels take up so much time (see CliMA/ClimaCore.jl#2067 (comment)) is that we are not fusing kernels sufficiently in SSPKnoth.

@charleskawczynski

That is, in step_u!(int, cache::RosenbrockCache{Nstages}) where {Nstages}.
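
To illustrate what "fusing" means here, the following is a minimal sketch, not the actual ClimaTimeSteppers code. It assumes a stage update of the form U = u + a1*K1 + a2*K2, uses plain Arrays in place of FieldVectors, and the function names and coefficients are hypothetical. Written as three separate broadcast statements, the update makes three passes over memory (and on a GPU array each statement would typically launch its own kernel); written as a single dotted expression, Julia fuses it into one broadcast.

```julia
# Hypothetical stand-ins for the state and two stage increments.
# Plain Arrays are used here, not ClimaCore FieldVectors.
u  = collect(1.0:1000.0)
K1 = collect(2.0:2.0:2000.0)
K2 = collect(0.5:0.5:500.0)
a1, a2 = 0.5, 0.25   # made-up coefficients, for illustration only

# Unfused: each statement is its own broadcast, so each one
# traverses the arrays separately.
function stage_unfused!(U, u, K1, K2, a1, a2)
    U .= u
    U .+= a1 .* K1
    U .+= a2 .* K2
    return U
end

# Fused: a single dotted expression becomes one broadcast,
# i.e. one traversal of the arrays.
function stage_fused!(U, u, K1, K2, a1, a2)
    @. U = u + a1 * K1 + a2 * K2
    return U
end

U_unfused = stage_unfused!(similar(u), u, K1, K2, a1, a2)
U_fused   = stage_fused!(similar(u), u, K1, K2, a1, a2)

# Both variants compute the same stage value.
@assert U_unfused ≈ U_fused
```

Both versions give the same result; the difference is only in how many separate broadcasts are executed, which is where the per-kernel overhead in the timestepper would come from.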

@charleskawczynski

Quantitatively, this build shows that the 4-GPU dry baroclinic wave spends 27% of its kernel time in fieldvector operations (the CuKernelContext entry below), most of which are performed in the timestepper:

[ Info: Statistics across 299016 total kernels
                                  Kernel duration percentage
                          ┌                                        ┐
                     fill ┤ 0.0794944                               
       dss_load_from_recv ┤ 0.263896                                
     dss_fill_send_buffer ┤ 0.274443                                
                dss_ghost ┤ 0.369122                                
          dss_local_ghost ┤ 0.453303                                
              CUDA memcpy ┤■ 1.05338                                
              CUDA memset ┤■■ 1.5149                                
                dss_local ┤■■ 1.76417                               
          dss_untransform ┤■■■ 2.66246                              
            dss_transform ┤■■■ 2.71636                              
                   copyto ┤■■■■■■ 6.05631                           
single_field_solve_kernel ┤■■■■■■■■ 8.18527                         
                 spectral ┤■■■■■■■■■■■■■■■■■■ 17.5483               
          CuKernelContext ┤■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 27.1507     
                  stencil ┤■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 29.9078  
                          └                                        ┘
