Code Monkey home page Code Monkey logo

Comments (14)

louthy avatar louthy commented on June 11, 2024 1

It's also funny to me that someone with your level of expertise can still be baffled by the workings of the code 😆. Shows that the more we learn, the less we know. It seems that even with SequenceSerial, there is some concurrency going on, because it seemed to consistently take 6.5s on my machine.

The main reason I'm baffled is because the implementation is so basic:

        public static EitherAsync<L, Seq<B>> TraverseSerial<L, A, B>(this Seq<EitherAsync<L, A>> ma, Func<A, B> f)
        {
            return new EitherAsync<L, Seq<B>>(Go(ma, f));
            async Task<EitherData<L, Seq<B>>> Go(Seq<EitherAsync<L, A>> ma, Func<A, B> f)
            {
                var rb = new B[ma.Count];
                var ix = 0;
                foreach (var a in ma)
                {
                    var mb = await a;
                    if (mb.IsBottom) return EitherData.Bottom<L, Seq<B>>();
                    if (mb.IsLeft) return EitherData.Left<L, Seq<B>>(mb.LeftValue);
                    rb[ix] = f(mb.RightValue);
                    ix++;
                }

                return EitherData.Right<L, Seq<B>>(Seq.FromArray<B>(rb));
            };
        }

You can see from the inner for loop that it awaits each item in the Seq:

foreach (var a in ma)
{
    var mb = await a;
    if (mb.IsBottom) return EitherData.Bottom<L, Seq<B>>();
    if (mb.IsLeft) return EitherData.Left<L, Seq<B>>(mb.LeftValue);
    rb[ix] = f(mb.RightValue);
    ix++;
}

My understanding of await is the next line of code won't run until the thing it's awaiting has completed. And therefore the next item in the Seq can't be evaluated. Concurrency, of course, can happen once you await, which is the whole point, but not of your current task; the task scheduler should only allow other tasks queued up for your current-thread to run whilst awaiting. So, unless there's some crazy compiler inference going on here (that changes the semantics of the code), I am quite simply, baffled!

Unfortunately, right now I need to focus on my v5 branch, because it's at: ~123,000 lines of code added, ~174,000 lines deleted, in ~1,200 changed files! I need to get this project out of my head before I go digging too deep into anything else! :D

from language-ext.

louthy avatar louthy commented on June 11, 2024

I've been trying to avoid making any other changes before I do the big v5 release, but this seems like something that should be improved sooner, so I have: v4.4.8 is on nu-get now.

It will:

  • Keep the sequence in order
  • Do the parallel part that it promises

The reason the 'parallel' version took longer is because your tasks weren't launched on new threads - and so it was running n tasks concurrently. To run things in parallel you needed to use Task.Run (which queues your task on the thread-pool).

Seeing as the method gives the impression it's doing the 'parallel bit' for you, I've updated the internals of SequenceParallel to do this automatically. Internally it will launch a number of tasks on the thread-pool to process the sequence in parallel.

Note, by default it uses Environment.ProcessorCount / 2 tasks. So, given a chance it will thrash half of your machine's resource. You can set the windowSize as an optional parameter to SequenceParallel to limit the amount of parallelism. This could end up with the processing taking longer, but will obviously thrash the machine less. With your example it might take longer than the ~6 seconds if you have fewer than 42 processors available (21 items in the list, means using defaults you'd only get 21 parallel processes if you have 42 or more processors available). On my 128 logical core machine it finishes in ~6 seconds as expected.

If you don't want to pass a windowSize to SequenceParallel, but you're also not happy with the default of using half the processors in the machine, you can set the SysInfo.DefaultAsyncSequenceParallelism property (min: 4). That's what SequenceParallel uses as its default).

I am very surprised that SequenceSerial finished so quickly, that is quite odd because it will literally await each item in the sequence - so it should take the sum-total number-of-seconds to complete. I have confirmed this myself and it's very odd. The only cause I can think of is that I know Task.Delay to be highly inaccurate, but even then, it's wildly out. Very odd.

from language-ext.

JosVroegindeweijSoluso avatar JosVroegindeweijSoluso commented on June 11, 2024

Thanks for the quick response!

Amazing that the issue got resolved so soon. I also love the extra explanation, I'll make sure to remember that!

It's also funny to me that someone with your level of expertise can still be baffled by the workings of the code 😆. Shows that the more we learn, the less we know. It seems that even with SequenceSerial, there is some concurrency going on, because it seemed to consistently take 6.5s on my machine. If it was highly inaccurate (to the point of half a minute) then it should vary per run at least.

Maybe it's also a good idea to include some async unittests like these (but maybe improved 😄) in the Sequence and Traverse unittests?

from language-ext.

JosVroegindeweijSoluso avatar JosVroegindeweijSoluso commented on June 11, 2024

That is indeed quite strange, that implementation looks like it should await each item before moving on to the next one. The only reason that I could think of would be that the call to Go is not awaited. Maybe that makes the entire function forget that it's async? That would still be really weird though!

from language-ext.

louthy avatar louthy commented on June 11, 2024

Ok, this is the itch that won't scratch itself, so I created a standalone test app that records the start time for each iteration as well as the duration.

using System.Diagnostics;
using LanguageExt;
using static LanguageExt.Prelude;

var seq = Seq(1, 2, 3, 2, 5, 1, 1, 2, 3, 2, 1, 2, 4, 2, 1, 5, 6, 1, 3, 6, 2);

await Test.Run(seq);

public class Test
{
    public static Stopwatch All;
    static AtomSeq<int> Started = AtomSeq<int>();
    static AtomSeq<int> Duration = AtomSeq<int>();

    public static async Task Run(Seq<int> input)
    {
        Started = AtomSeq<int>();
        Duration = AtomSeq<int>();
        
        All = Stopwatch.StartNew();
        var ma = input.Select(DoDelay).SequenceSerial();
        var right = (Seq<int>)await ma;
        Debug.Assert(right.SequenceEqual(input));
        All.Stop();
        
        ShowResults(input);
    }

    static EitherAsync<string, int> DoDelay(int seconds)
    {
        return F(seconds).ToAsync();

        static async Task<Either<string, int>> F(int seconds)
        {
            Started.Add((int)All.ElapsedMilliseconds);
            var sw = Stopwatch.StartNew();
            await Task.Delay(seconds * 1000);
            sw.Stop();
            Duration.Add((int)sw.ElapsedMilliseconds);
            Console.Write("X ");
            return seconds;
        }
    }

    public static void ShowResults(Seq<int> input)
    {
        Console.WriteLine();
        Console.WriteLine($"Total elapsed time: {All.Elapsed}");
        var expectedStart = 0;
        
        Console.WriteLine($"Inputs: {input.Length}");
        Console.WriteLine($"Starts: {Started.Length}");
        Console.WriteLine($"Durtns: {Duration.Length}");
        
        foreach (var result in input.Zip(Started.Zip(Duration)).Map(t => (t.Item1, t.Item2.Item1, t.Item2.Item2)))
        {
            var expectedDuration = result.Item1 * 1000;
            var actualDuration = result.Item3;
            var actualStart = result.Item2;
            Console.WriteLine($"Expected start: {expectedStart}, Actual start: {actualStart} | Expected duration: {expectedDuration}, Actual duration: {actualDuration} [{result.Item1}]");
            expectedStart += expectedDuration;
        }
    }
}

The results are utterly bizarre:

Expected start: 0, Actual start: 17 | Expected duration: 1000, Actual duration: 1034 
Expected start: 1000, Actual start: 29 | Expected duration: 2000, Actual duration: 1034 
Expected start: 3000, Actual start: 30 | Expected duration: 3000, Actual duration: 1034 
Expected start: 6000, Actual start: 30 | Expected duration: 2000, Actual duration: 1034 
Expected start: 8000, Actual start: 30 | Expected duration: 5000, Actual duration: 1033 
Expected start: 13000, Actual start: 31 | Expected duration: 1000, Actual duration: 1997 
Expected start: 14000, Actual start: 31 | Expected duration: 1000, Actual duration: 2001 
Expected start: 15000, Actual start: 31 | Expected duration: 2000, Actual duration: 2000 
Expected start: 17000, Actual start: 31 | Expected duration: 3000, Actual duration: 2000 
Expected start: 20000, Actual start: 31 | Expected duration: 2000, Actual duration: 2000 
Expected start: 22000, Actual start: 31 | Expected duration: 1000, Actual duration: 2000 
Expected start: 23000, Actual start: 31 | Expected duration: 2000, Actual duration: 1999 
Expected start: 25000, Actual start: 31 | Expected duration: 4000, Actual duration: 2998 
Expected start: 29000, Actual start: 31 | Expected duration: 2000, Actual duration: 3001 
Expected start: 31000, Actual start: 31 | Expected duration: 1000, Actual duration: 3000 
Expected start: 32000, Actual start: 32 | Expected duration: 5000, Actual duration: 3999 
Expected start: 37000, Actual start: 32 | Expected duration: 6000, Actual duration: 4998 
Expected start: 43000, Actual start: 32 | Expected duration: 1000, Actual duration: 5000 
Expected start: 44000, Actual start: 32 | Expected duration: 3000, Actual duration: 5998 
Expected start: 47000, Actual start: 32 | Expected duration: 6000, Actual duration: 5998 

Note how each iteration is started almost immediately. The other thing to note is that, as I mentioned in an earlier post, the actual durations are all over the place compared to the expected. Confirming that Task.Delay is not to be trusted.

I have a meeting now, so I'll pick this up later. Investigating the IL might give some answers.

from language-ext.

JosVroegindeweij avatar JosVroegindeweij commented on June 11, 2024

Interesting! Your result section is not completely matching the input Seq, which does confuse me a bit. However, as far as I can see, the Actual duration list does line up quite nicely with the individual ints in the input Seq, just in a different order. I do count 2 6's in both, 2 5's , 1 4, 3 3's. Only the ones and twos seem to be one off, but again, the result section doesn't exactly line up with the input. So the weird thing to me seems not to be the actual timings of Task.Delay, but rather that it seems like it's executing a Task.Delay(3000) when it's handling seconds == 2 and vice versa. I don't believe that it's magically by basically exactly a second off on all of these occasions.

Really interested to see what comes out of your next tests! :)

from language-ext.

louthy avatar louthy commented on June 11, 2024

Your result section is not completely matching the input Seq

The use of Seq wasn't thread-safe, so I've changed it to AtomSeq

Expected start: 0, Actual start: 25 | Expected duration: 1000, Actual duration: 1022 [1]
Expected start: 1000, Actual start: 37 | Expected duration: 2000, Actual duration: 1023 [2]
Expected start: 3000, Actual start: 37 | Expected duration: 3000, Actual duration: 1023 [3]
Expected start: 6000, Actual start: 37 | Expected duration: 2000, Actual duration: 1023 [2]
Expected start: 8000, Actual start: 37 | Expected duration: 5000, Actual duration: 1030 [5]
Expected start: 13000, Actual start: 38 | Expected duration: 1000, Actual duration: 1048 [1]
Expected start: 14000, Actual start: 38 | Expected duration: 1000, Actual duration: 2000 [1]
Expected start: 15000, Actual start: 38 | Expected duration: 2000, Actual duration: 2001 [2]
Expected start: 17000, Actual start: 38 | Expected duration: 3000, Actual duration: 2001 [3]
Expected start: 20000, Actual start: 38 | Expected duration: 2000, Actual duration: 2001 [2]
Expected start: 22000, Actual start: 38 | Expected duration: 1000, Actual duration: 2002 [1]
Expected start: 23000, Actual start: 38 | Expected duration: 2000, Actual duration: 2003 [2]
Expected start: 25000, Actual start: 38 | Expected duration: 4000, Actual duration: 2002 [4]
Expected start: 29000, Actual start: 38 | Expected duration: 2000, Actual duration: 2997 [2]
Expected start: 31000, Actual start: 38 | Expected duration: 1000, Actual duration: 2998 [1]
Expected start: 32000, Actual start: 38 | Expected duration: 5000, Actual duration: 2999 [5]
Expected start: 37000, Actual start: 39 | Expected duration: 6000, Actual duration: 3997 [6]
Expected start: 43000, Actual start: 39 | Expected duration: 1000, Actual duration: 4997 [1]
Expected start: 44000, Actual start: 39 | Expected duration: 3000, Actual duration: 4998 [3]
Expected start: 47000, Actual start: 39 | Expected duration: 6000, Actual duration: 5997 [6]
Expected start: 53000, Actual start: 39 | Expected duration: 2000, Actual duration: 5997 [2]

from language-ext.

JosVroegindeweij avatar JosVroegindeweij commented on June 11, 2024

This shows basically the same results, but now accurate to the point where there are exactly as many ones, twos, etc. in the input Seq as in the list of actual durations, just ordered differently. From there, wouldn't it make more sense to assume that the Actual duration in these logs does not match the Expected duration (does not match as in they are from different Task.Delay executions), for whatever reason, rather than assuming that a 5000ms delay suddenly takes only 1030ms? It seems weirdly coincidental to me that the list of Actual duration is ordered perfectly.

The weirdness about not waiting for the previous iteration to finish still remains of course.

from language-ext.

louthy avatar louthy commented on June 11, 2024

Yeah, the numbers are wrong because there's a degree of concurrency that shouldn't be happening!

from language-ext.

harrhp avatar harrhp commented on June 11, 2024

The results are utterly bizarre:

Expected start: 0, Actual start: 17 | Expected duration: 1000, Actual duration: 1034 
Expected start: 1000, Actual start: 29 | Expected duration: 2000, Actual duration: 1034 
Expected start: 3000, Actual start: 30 | Expected duration: 3000, Actual duration: 1034 
Expected start: 6000, Actual start: 30 | Expected duration: 2000, Actual duration: 1034 
Expected start: 8000, Actual start: 30 | Expected duration: 5000, Actual duration: 1033 
Expected start: 13000, Actual start: 31 | Expected duration: 1000, Actual duration: 1997 
Expected start: 14000, Actual start: 31 | Expected duration: 1000, Actual duration: 2001 
Expected start: 15000, Actual start: 31 | Expected duration: 2000, Actual duration: 2000 
Expected start: 17000, Actual start: 31 | Expected duration: 3000, Actual duration: 2000 
Expected start: 20000, Actual start: 31 | Expected duration: 2000, Actual duration: 2000 
Expected start: 22000, Actual start: 31 | Expected duration: 1000, Actual duration: 2000 
Expected start: 23000, Actual start: 31 | Expected duration: 2000, Actual duration: 1999 
Expected start: 25000, Actual start: 31 | Expected duration: 4000, Actual duration: 2998 
Expected start: 29000, Actual start: 31 | Expected duration: 2000, Actual duration: 3001 
Expected start: 31000, Actual start: 31 | Expected duration: 1000, Actual duration: 3000 
Expected start: 32000, Actual start: 32 | Expected duration: 5000, Actual duration: 3999 
Expected start: 37000, Actual start: 32 | Expected duration: 6000, Actual duration: 4998 
Expected start: 43000, Actual start: 32 | Expected duration: 1000, Actual duration: 5000 
Expected start: 44000, Actual start: 32 | Expected duration: 3000, Actual duration: 5998 
Expected start: 47000, Actual start: 32 | Expected duration: 6000, Actual duration: 5998 

This is because Started and Duration do not correlate with input. I added index to them and sorted by index before printing results and got expected results

using System.Diagnostics;
using LanguageExt;
using static LanguageExt.Prelude;

var seq = Seq(1, 2, 3, 2, 5, 1, 1 , 2, 3, 2, 1, 2, 4, 2, 1, 5, 6, 1, 3, 6, 2);

await Test.Run(seq);

public class Test
{
    public static Stopwatch All;
    static AtomSeq<(int, int)> Started = AtomSeq<(int, int)>();
    static AtomSeq<(int, int)> Duration = AtomSeq<(int, int)>();

    public static async Task Run(Seq<int> input)
    {
        All = Stopwatch.StartNew();
        var ma = Seq(input.Select(DoDelay)).SequenceSerial();
        var right = (await ma).RightToArray()[0];
        Debug.Assert(right.SequenceEqual(input));
        All.Stop();

        ShowResults(input);
    }

    static EitherAsync<string, int> DoDelay(int seconds, int index)
    {
        return F(seconds, index).ToAsync();

        static async Task<Either<string, int>> F(int seconds, int index)
        {
            Started.Add(((int)All.ElapsedMilliseconds, index));
            var sw = Stopwatch.StartNew();
            await Task.Delay(seconds * 1000);
            sw.Stop();
            Duration.Add(((int)sw.ElapsedMilliseconds, index));
            Console.Write("X ");
            return seconds;
        }
    }

    public static void ShowResults(Seq<int> input)
    {
        Console.WriteLine();
        Console.WriteLine($"Total elapsed time: {All.Elapsed}");
        var expectedStart = 0;

        Console.WriteLine($"Inputs: {input.Length}");
        Console.WriteLine($"Starts: {Started.Length}");
        Console.WriteLine($"Durtns: {Duration.Length}");

        foreach (var result in input.Zip(Started.OrderBy(x => x.Item2).Zip(Duration.OrderBy(x => x.Item2)))
                     .Map(t => (t.Item1, t.Item2.Item1, t.Item2.Item2)))
        {
            var expectedDuration = result.Item1 * 1000;
            var actualDuration = result.Item3;
            var actualStart = result.Item2;
            Console.WriteLine(
                $"Expected start: {expectedStart}, Actual start: {actualStart} | Expected duration: {expectedDuration}, Actual duration: {actualDuration} [{result.Item1}]");
            expectedStart += expectedDuration;
        }
    }
}
X X X X X X X X X X X X X X X X X X X X X 
Total elapsed time: 00:00:06.0156019
Inputs: 21
Starts: 21
Durtns: 21
Expected start: 0, Actual start: (2, 0) | Expected duration: 1000, Actual duration: (1003, 0) [1]
Expected start: 1000, Actual start: (4, 1) | Expected duration: 2000, Actual duration: (2008, 1) [2]
Expected start: 3000, Actual start: (4, 2) | Expected duration: 3000, Actual duration: (3009, 2) [3]
Expected start: 6000, Actual start: (4, 3) | Expected duration: 2000, Actual duration: (2008, 3) [2]
Expected start: 8000, Actual start: (4, 4) | Expected duration: 5000, Actual duration: (5004, 4) [5]
Expected start: 13000, Actual start: (4, 5) | Expected duration: 1000, Actual duration: (1001, 5) [1]
Expected start: 14000, Actual start: (4, 6) | Expected duration: 1000, Actual duration: (1001, 6) [1]
Expected start: 15000, Actual start: (4, 7) | Expected duration: 2000, Actual duration: (2008, 7) [2]
Expected start: 17000, Actual start: (4, 8) | Expected duration: 3000, Actual duration: (3009, 8) [3]
Expected start: 20000, Actual start: (4, 9) | Expected duration: 2000, Actual duration: (2008, 9) [2]
Expected start: 22000, Actual start: (4, 10) | Expected duration: 1000, Actual duration: (1001, 10) [1]
Expected start: 23000, Actual start: (4, 11) | Expected duration: 2000, Actual duration: (2008, 11) [2]
Expected start: 25000, Actual start: (4, 12) | Expected duration: 4000, Actual duration: (4013, 12) [4]
Expected start: 29000, Actual start: (4, 13) | Expected duration: 2000, Actual duration: (2008, 13) [2]
Expected start: 31000, Actual start: (4, 14) | Expected duration: 1000, Actual duration: (1001, 14) [1]
Expected start: 32000, Actual start: (4, 15) | Expected duration: 5000, Actual duration: (5004, 15) [5]
Expected start: 37000, Actual start: (4, 16) | Expected duration: 6000, Actual duration: (6005, 16) [6]
Expected start: 43000, Actual start: (4, 17) | Expected duration: 1000, Actual duration: (1001, 17) [1]
Expected start: 44000, Actual start: (4, 18) | Expected duration: 3000, Actual duration: (3009, 18) [3]
Expected start: 47000, Actual start: (4, 19) | Expected duration: 6000, Actual duration: (6005, 19) [6]
Expected start: 53000, Actual start: (4, 20) | Expected duration: 2000, Actual duration: (2008, 20) [2]

also, if you use var ma = Enumerable.Select(input, DoDelay).SequenceSerial(); you'll get expected one by one execution, so Seq.Select/Map are not really lazy like Enumerable extensions

from language-ext.

louthy avatar louthy commented on June 11, 2024

so Seq.Select/Map are not really lazy like Enumerable extensions

Yes they are lazy in exactly the same way as enumerables. Seq supports both lazy and strict sequences and Map (Select) and other operators all work in a lazy way, like enumerables.

Here's the map function which shows exactly how they're like lazy enumerables ... because internally SeqLazy iterates an IEnumerable!

    public Seq<B> Map<B>(Func<A, B> f)
    {
        return new Seq<B>(new SeqLazy<B>(Yield(this)));
        IEnumerable<B> Yield(Seq<A> items)
        {
            foreach (var item in items)
            {
                yield return f(item);
            }
        }
    }

and got expected results

No, you didn't. The expected result is that the process takes 55 seconds - yours takes 6 seconds just like all the previous attempts. That is what I meant by "the results are utterly bizarre" - the process should take 55 seconds, if each iteration was truly done in sequence, but there seems to be some concurrency coming in that isn't requested (potentially a compiler optimisation, but who knows - I'll dig more once I've finished my v5 mega work).

from language-ext.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.