Comments (9)
nothing fancy. Most interesting thing to me is difference in performance between compilers and compiler options for do concurrent, select, a basic assign, ADE integers versus CHARACTER, .... some compilers that are often the fastest for numeric calculations to very poorly with I/O, is one observation. So what might seem like a great algorithm developed with one compiler can be very bad with another. For whatever reasons the one you used does not seem to be horrible with anyone I tested with, and is either at the top or near it.
wget http://urbanjost.altervista.org/REMOVE/time_case.tgz
PS: not all the compilers actually do the DO CONCURRENT in parallel at all, or require special compiler options to do so.
from stdlib.
Just looking at reasonable contenders using ifort, ifx, and gfortran with optimization on Linux upper7 (your case with int8 kind) was a clear winner for consistency and performance. Although several others outperformed it on compiler or in some cases (very long strings, string needing no change, string needing all characters changed) it is a clear winner with the ones tested.
The only one substantially beating it is if the DO CONCURRENT one is run in parallel with strings much longer than typical text lines. Since upper7 is no worse than 60 percent of the benchmark speed ( a simple assign), and is the most consistent assuming something is desired that is portable across compilers it looks good.
The rather interesting information is that what is a top performer on one compiler is many times slower with another with very short simple procedures, and that using ADE representations is usually much faster even all these years after CHARACTER variables were introduced, and some bugs encountered with arrays of function pointers across compilers that ended up making it just easier to do brute-force tests (a more sophisticated test exists but it encounters a lot of unrelated compiler issues).
It would be of particular interest to compare the stdlib version (upper10) and your proposed replacement (upper7) and see how it goes with other compilers as where the stdlib version performed very significantly worse was with ifort, which is being supplanted by ifx.
from stdlib.
Got my hands on nvfortran23 in wsl following the install from https://developer.nvidia.com/hpc-sdk-downloads and reran the benchmark. Didn't check for any special compiler directive, just used -03 as for the others. The results are in the same google spreadsheet.
Have to check what will happen if I offload the computation to the GPU... And I think the do concurrent implementation can be modified.
Results change a bit from one run to another but indeed upper7 comes as quite a robust approach. With nvfortran added upper7 with do concurrent, results are slight better but close (I'm betting more on some randomness than an actual improvement) not sure this actually gives any advantage for such short test.
from stdlib.
so nvfortran and gfortran give consistent speeds but the ifort/ifx can be a factor of 50x. That is bad enough for me to look at the generated code and bring it up on the Intel Fortran forum. I saw that in the past when SELECT CASE was a new feature for a lot of compilers but especially for such a simple select that is horrible. Thanks for the tests. The upper1 being the one you proposed it holds up once again as a good general choice. If anyone can the stdlib procedure versus the one proposed here it would be very informative for nagfor, crayftn, ... for stdlib in general it would be nice to have a cd/ci interface to macos, windows, openbsd, linux, ... and various compilers so if you just set up and pushed something set up to do a "fpm test --compiler $NAME --profile release it would run on a lot of platforms. I have a .github directory on https://github.com/urbanjost/easy that was my attempt at that that works with ifort on linux, gfortran on macos, gfortran on linux, and gfortran on several MSWindows configurations that can easily run different gfortran versions as well but I am not a CD/CI wiz on github; that should be able to do nvfortran too.
It seems like there could be an empty github repository that if you pushed a mini fpm project at it it could run a bunch of environments just using "fpm test" so the scripts would not have to change. A simple case like this shows how huge the variation can be in the performance of the same code
from stdlib.
An FYI that I compared some case conversions and got unexpected speed
variations between compilers and sound some TRANSFER() functions really
slow, disappointing speeds using actual CHARACTER expressions instead of
reverting to ADEs, and all kinds of surprises but your method was the most
consistent. Yours is the same as UPPER3. UPPER-1 is just for a baseline
and assigns output to input. All are functions, although subroutines
can potentially be faster. So yours is the fastest with -O3 with ifort,
but several methods are faster with other compilers, as the gfortran
shows. But if you use INT8 integers (upper7) it ties with the fastest
ones I had with gfortran and still comes in a strong second with ifort. I do
not have the results from the other compilers and do not have access to them
at the moment, but I found upper7 either best or near best so at least on
some platforms using 1-byte integers was a significant gain; albeit it was
slower on others.
The scores for each compiler are relative to the assign. The times are in seconds.
Surprising the difference in speed for just the plain assign between gfortran and
ifort, isn't it?
$ fpm run .. -flag -O3 --compiler gfortran
Hello, time_case!
upper-1 : score: 1000 time: 0.47156000000000003E-1 AbCdEfGhIjKlMnOpQrStUvWxYz
upper5 : score: 0655 time: 0.71913000000000338E-1 ABCDEFGHIJKLMNOPQRSTUVWXYZ
upper8 : score: 0655 time: 0.71990999999999694E-1 ABCDEFGHIJKLMNOPQRSTUVWXYZ
upper7 : score: 0650 time: 0.72481000000000684E-1 ABCDEFGHIJKLMNOPQRSTUVWXYZ
upper3 : score: 0499 time: 0.94342000000000148E-1 ABCDEFGHIJKLMNOPQRSTUVWXYZ
upper4 : score: 0499 time: 0.94362000000000279E-1 ABCDEFGHIJKLMNOPQRSTUVWXYZ
upper2 : score: 0210 time: 0.22387399999999946 ABCDEFGHIJKLMNOPQRSTUVWXYZ
upper1 : score: 0186 time: 0.25304299999999991 ABCDEFGHIJKLMNOPQRSTUVWXYZ
upper6 : score: 0175 time: 0.26923099999999955 ABCDEFGHIJKLMNOPQRSTUVWXYZ
upper0 : score: 0009 time: 4.7819500000000001 ABCDEFGHIJKLMNOPQRSTUVWXYZ
$ fpm run .. -flag -O3 --compiler ifort
[100%] Project compiled successfully.
Hello, time_case!
upper-1 : score: 1000 time: .6184600000000001E-01 AbCdEfGhIjKlMnOpQrStUvWxYz
upper3 : score: 0779 time: .7930900000000030E-01 ABCDEFGHIJKLMNOPQRSTUVWXYZ
upper7 : score: 0702 time: .8806700000000056E-01 ABCDEFGHIJKLMNOPQRSTUVWXYZ
upper2 : score: 0321 time: .1925020000000002 ABCDEFGHIJKLMNOPQRSTUVWXYZ
upper4 : score: 0262 time: .2352109999999996 ABCDEFGHIJKLMNOPQRSTUVWXYZ
upper6 : score: 0236 time: .2616720000000008 ABCDEFGHIJKLMNOPQRSTUVWXYZ
upper1 : score: 0027 time: 2.277227000000000 ABCDEFGHIJKLMNOPQRSTUVWXYZ
upper5 : score: 0027 time: 2.247931000000000 ABCDEFGHIJKLMNOPQRSTUVWXYZ
upper8 : score: 0027 time: 2.246852000000001 ABCDEFGHIJKLMNOPQRSTUVWXYZ
upper0 : score: 0014 time: 4.220628000000000 ABCDEFGHIJKLMNOPQRSTUVWXYZ
So this is your method except with some compilers I found wp=>int8 sped things up
pure function upper7(str) Result(string)
use,intrinsic :: iso_fortran_env, only : wp=>int8
Character(*), Intent(In) :: str
Character(LEN(str)) :: string
integer(kind=wp), parameter :: ADE_A = iachar('a'), ADE_Z = iachar('z')
integer(kind=wp), parameter :: CASE_DIFF = iachar('a')-iachar('A')
Integer(kind=wp) :: ADE_char
Integer :: i
do i = 1, len(str)
ADE_char = iachar(str(i:i),wp) ! ASCII Decimal Equivalent
if (ADE_char >= ADE_A .and. ADE_char <= ADE_Z) ADE_char = ADE_char - CASE_DIFF
string(i:i) = achar(ADE_char)
enddo
end function upper7
The biggest surprise to me with variants of yours was that initially just doing string=str
and then only changing the lower-case characters was much slower than just doing
it for all characters.
Another surprise was Number 8, which uses a select, was a top performer with gfortran, and pretty dismal with ifort.
It would be interesting to see timing differences with crayfort, nagfor, nvfortran, ...
from stdlib.
This is very interesting! When I was working on fast string to double conversion I also found that using 1-byte integers gave me a boost in performance. Would you mind sharing your benchmark fpm project? I can test on windows with ifort19, ifort23, ifx23 and on Linux with ifort19 and gfortran. I'm trying to get nvfortran, not there yet.
from stdlib.
Here a full comparison using your benchmark with the compilers I have at hand (increased the loop to 10000000, otherwise some would come out with elapsed time = 0)
https://docs.google.com/spreadsheets/d/1C4HUHNXtSj-qZ2JnQw2B_pG6JE9TCYdIN5fhuXQlTsU/edit?usp=sharing
It was also interesting to see that with ifort upper7 came on top, but there is quite a difference between ifort19 on windows and ifort19 on linux(wsl) (being the same machine).
Upper10 (same as stdlib) came on top only with ifx23.
from stdlib.
The upper6-style tests and code itself indicate it is not particularly sensitive to input being
all uppercase or all lowercase, being unusually long, or input strings
being all "a" versus all "z" characters so it does appear to be robust.
One of the oddest things found in exercising the variants is finding how
SELECT is so much slower than other conditionals with one compiler. The
three very similiar routines in the attached sample program produce almost
identical timing results with several compilers, but vary by a factor of
50x with one.
For instance using "gfortran-11 -O3 xx.f90" generates nearly identical speeds
upper1 0.12178899999999999 GDSTBHWGEJXIFQZCHODHJHGPPULKHYIUOZKSQKTDFIEXGULFKISDYOGLDYECVFFVIEBXIQ
upper1 0.11874299999999999 EMRWHTBCIFFGFIIFCYFFIHXETAHVSHJWWBDDXGYMIQUZNRYZZLHGLTIHOIXWOKOUIEITXS
upper1 0.11874199999999999 FIITOWKRARTRPZRBHKKBXJUYNETBFVAEFAGGEAUWMULVISLYGBOWIDBQCADEWVVXKTXLKY
upper2 0.11873000000000000 JKTFUTMWELFECCATZWYVMBWQNIMYIUJZTPAYBDZXWRERVRFLQQPMEUJUINBCXIWUZIQKEF
upper2 0.11889000000000000 NEDVFZXVQOHILVNWTDWOYIKYCWABLWVOFQVXUOBNYERZLFEPMYKRCVNCCMUBEAJRJFVJVW
upper2 0.11880399999999991 HPWHTMEYFJCHMUDCTUEHWDLSXFRDUFIMVDUKRZFTJEWQSCSVRSDVVIEDPNNNCRWSECAXNB
upper3 0.11881100000000000 FZVYQAKJANYFWUSKOMRRBVBMGODZYXRCEMVTFEWHUHVPSCKSTUMXOFPHASOQSXHTBMEAEQ
upper3 0.11873899999999993 UGOHQOXEDONSRNTGSHWASEFNDSIUAQCHQWCBPYVVWFHRVMEUKKSJXWHNSLUBODDCUBXTDP
upper3 0.11875000000000013 DCQXXWFCEBJNLVWJGBDMOZEUTXEWTHDIDQRXIRKYVOTLSMSSQNOSDINMCQRWVPXRTCQOTU
But with one compiler testing with more complex cases shows a continuing
lack of optimization in SELECT CASE I so far have not seen with any
other compiler. Curious if that is the case with any other compilers.
upper1 .1277940000000000 IBOYRMGCBPRTKMNIHWYAWQFJVBBDTZKFHYMIYTOYHQMOWLICNJRSXFGKFKNPTISXKIKXFM
upper1 .1221970000000000 FDGRGVREJPKUKOJTFTNLFTHFQITLCRYUEFLWUXBGMIKOWWSHNOMQPQJOTRALYCSICVKBXI
upper1 .1221530000000000 YKFXWOKTDYEKMJVGYNSQBEYFVGPVORGQWXCVVMROURMUUGDSOJJFGWGFGJSIRKLJVTRWVZ
upper2 4.660762000000000 JQISSDTHYABHYPTLNNRKJRICWQVXCLXCLJDLGSOZQFSQUUUXOMWSBUHIEIEUDBDYNRMXQN
upper2 4.387915000000000 DMJORPNRNPGOZRUYMOVJNNIVKJIRVSSYBJSVPGDFHWUKYLZPRDYJWVJCCXGBBBCLNOCQOT
upper2 4.302734000000001 IIMFYEFONONPFYHCGRUFHZRMNRPCKBWHOUYEVNBJJUHRXYXUDKEXJVTXJKTTDMRLYCIGTH
upper3 .5937310000000000 NROPNTNQAPRATPGPBTBALEGKQORSLAGFTDIRWMOLFDADTIUNUIWDHGHMIBUJBJSBQCMDNU
upper3 .4668369999999999 RIHVWCUQKFPYVNWJSPSXDJZYTHXXSEMPZWPEWIWYEPWEQMPLWTHYAFCJCMPTFSVKUHMYEZ
upper3 .5521539999999998 EYNGKUDPMBPHHBIITVWQTUSYEPIWQVBVDTMLTJBRIMKNDLVKDOEMNHXVHCDJBSFOIBQNBT
[details="Example reproducer code"]
module M_upper
use, intrinsic :: iso_fortran_env, only: stdout => output_unit, stderr => error_unit, stdin => input_unit
use, intrinsic :: iso_fortran_env, only: int8, int16, int32, int64, real32, real64, real128
use, intrinsic :: iso_fortran_env, only: compiler_version, compiler_options
implicit none
private
public :: say_hello, random_string, upper1, upper2, upper3
contains
subroutine say_hello
print *, "Hello, time_case!"
print '(4a)', &
'This file was compiled by ', &
compiler_version(), &
' using the options ', &
compiler_options()
end subroutine say_hello
pure function upper1(str) result(string)
character(*), intent(in) :: str
character(len(str)) :: string
integer :: i
integer(kind=int8), parameter :: ade_a = iachar('a'), ade_z = iachar('z')
integer(kind=int8), parameter :: diff = iachar('A', kind=int8) - iachar('a', kind=int8)
integer(kind=int8) :: ade_char
do concurrent(i=1:len(str))
ade_char = iachar(str(i:i), int8)
if (ade_char >= ade_a .and. ade_char <= ade_z) ade_char = ade_char + diff
string(i:i) = achar(ade_char)
end do
if (len(str) .eq. 0) string = str
end function upper1
pure function upper2(str) result(string)
character(*), intent(in) :: str
character(len(str)) :: string
integer :: i
integer(kind=int8), parameter :: diff = iachar('A', kind=int8) - iachar('a', kind=int8)
do concurrent(i=1:len(str))
select case (str(i:i))
case ('a':'z'); string(i:i) = achar(iachar(str(i:i), kind=int8) + diff)
case default; string(i:i) = str(i:i)
end select
end do
if (len(str) .eq. 0) string = str
end function upper2
pure function upper3(str) result(string)
character(*), intent(in) :: str
character(len(str)) :: string
integer :: i
integer(kind=int8) :: ch
integer(kind=int8), parameter :: diff = iachar('A', kind=int8) - iachar('a', kind=int8)
integer(kind=int8), parameter :: ade_a = iachar('a'), ade_z = iachar('z')
do concurrent(i=1:len(str))
ch = iachar(str(i:i), kind=int8)
select case (ch)
case (ade_a:ade_z); string(i:i) = achar(ch + diff)
case default; string(i:i) = str(i:i)
end select
end do
if (len(str) .eq. 0) string = str
end function upper3
function random_string(chars, length) result(out)
character(len=*), intent(in) :: chars
integer, intent(in) :: length
character(len=:), allocatable :: out
real :: x
integer :: ilen
integer :: which
integer :: i
ilen = len(chars)
out = ''
if (ilen .gt. 0) then
do i = 1, length
call random_number(x)
which = nint(real(ilen - 1)*x) + 1
out = out//chars(which:which)
end do
end if
end function random_string
end module M_upper
program main
use, intrinsic :: iso_fortran_env, only: real64
use M_upper, only: upper1, upper2, upper3, random_string,say_hello
implicit none
!! define an abstract template defining the procedures
abstract interface
function func (str)
character(len=*), intent (in) :: str
character(len=len(str)) :: func
end function func
end interface
!! define a pointer of the abstract type
procedure (func), pointer :: f_ptr => null()
call say_hello()
f_ptr => upper1; call timeit('upper1')
f_ptr => upper2; call timeit('upper2')
f_ptr => upper3; call timeit('upper3')
contains
subroutine timeit(name)
character(len=*),intent(in) :: name
character(len=:),allocatable :: str, in
integer,parameter :: calls=1000000
integer :: i, j
real(kind=real64) :: time_start, time_finish
character(len=*),parameter :: gen='(*(g0,1x))'
! use some random calls to help prevent compiler from optimzing away loops
do j=1,3
call setin(j,in)
call cpu_time(time_start)
do i = 1, calls
str = f_ptr(in)
if(i.eq.calls/2) call setin(j,in)
end do
call cpu_time(time_finish)
print gen, name, time_finish - time_start, ' ', str
enddo
end subroutine timeit
subroutine setin(j,dat)
character(len=:),intent(out),allocatable :: dat
integer,intent(in) :: j
select case(j)
case(1) ;dat = random_string('abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ', 70)
case(2) ;dat = random_string('abcdefghijklmnopqrstuvwxyz', 70)
case(3) ;dat = random_string('ABCDEFGHIJKLMNOPQRSTUVWXYZ', 70)
case default;dat = random_string('abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ', 70)
end select
end subroutine setin
end program main
[/details]
from stdlib.
Yes, this is not the first time I have been disappointed with select case. I do like it for high level branching, in some cases the code is easier to read. but for low level computations I avoid it at all costs. With a colleague we tested it once to replace a long list of IFs statements in an intensive computation and the results were just disappointing. So yes, I just use it for non critical code.
I took your last test and tried out:
<style> </style>gfortran9.4.0 (linux) | gfortran12.2.0 (linux) | ifort19 (linux) | nvfortran23 (linux) | ifort19 (windows) | ifort23 (windows) | ifx23 (windows) | |
---|---|---|---|---|---|---|---|
upper1 | 2.556E-02 | 2.544E-02 | 2.851E-02 | 1.006E-01 | 3.125E-02 | 3.125E-02 | 1.563E-02 |
upper1 | 2.499E-02 | 2.547E-02 | 2.662E-02 | 9.864E-02 | 3.125E-02 | 3.125E-02 | 1.563E-02 |
upper1 | 2.477E-02 | 2.598E-02 | 2.814E-02 | 1.006E-01 | 3.125E-02 | 3.125E-02 | 1.563E-02 |
upper2 | 2.592E-02 | 2.620E-02 | 9.370E-01 | 1.091E-01 | 1.000E+00 | 9.219E-01 | 8.438E-01 |
upper2 | 2.466E-02 | 2.685E-02 | 7.957E-01 | 1.304E-01 | 9.063E-01 | 7.813E-01 | 7.813E-01 |
upper2 | 2.469E-02 | 2.598E-02 | 7.466E-01 | 9.055E-02 | 8.438E-01 | 7.344E-01 | 7.344E-01 |
upper3 | 2.472E-02 | 2.580E-02 | 1.064E-01 | 1.017E-01 | 1.094E-01 | 1.094E-01 | 7.813E-02 |
upper3 | 2.488E-02 | 2.585E-02 | 8.743E-02 | 9.393E-02 | 9.375E-02 | 7.813E-02 | 6.250E-02 |
upper3 | 2.497E-02 | 2.574E-02 | 1.023E-01 | 8.955E-02 | 1.094E-01 | 1.094E-01 | 6.250E-02 |
from stdlib.
Related Issues (20)
- hash_functions test fails on i386: `Segmentation fault - invalid memory reference`
- Request to upgrade Intel-classic compiler in macOS CI
- Add `library` configuration to `stdlib-fpm`
- Massive slow down in docs generation HOT 4
- Unexpected performance of hash maps HOT 8
- python preprocessor HOT 11
- add topic tags `lapack`, `blas`, `linear-algebra` HOT 1
- Improve descriptions of rotm, rotmg, stdlib_srotm, stdlib_srotmg
- Don't repeat names of procedures in descriptions
- stdlib_io_npy, FPM and Rank > 4 HOT 5
- Missing documentation for `LAPACK`-related functions HOT 6
- Adding the logarithmic derivative of the gamma function (digamma) to stdlib_specialfunctions_gamma HOT 1
- example_starts_with prints logical results as binary
- Nonstandard forward reference to 'lk' is not allowed in the same specification part causes compilation errors HOT 4
- Issue/Question about the output of `lstsq` HOT 3
- Building and testing stdlib on WSL1 with two fails HOT 5
- Adding matrix norms to the `stdlib_linalg` module. HOT 8
- Extend `sort_index` interface to allow `int32` index argument HOT 1
- Bug in the complex least-squares solver HOT 6
- Conflict between the pure function `stdlib_iparam2stage` and `openmp` HOT 1
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from stdlib.