Should the BCL add an IEnumerableWithCount<T>?

NB: I’ve written and thrown away this blog post twice before, as I would convince myself it was in the category of micro-optimization, but I’m finally going to post it this time, if only to get feedback to confirm that it’s a bad idea. 🙂

There have been a few times when it would be nice to have something be IEnumerable<T> but keep the data on how many items will be in the ‘stream’.  Certainly there are many cases where that’s not known ahead of time, and for those our existing IEnumerable<T> works perfectly. 🙂

One thing in particular about IEnumerableWithCount<T> is that it could still be covariant like IEnumerable<T> is since we would only need to add a property with a getter.  That is a big difference from ICollection<T>, which is the current ‘lowest-level’ option for a collection interface that includes count.  ICollection<T> means you have to support lots of ‘write’ operations (Add/Clear/Remove) so it’s not what we want if we just want to have a ‘stream’ that has the additional ‘metadata’ of the count of items.

Even more importantly, IEnumerableWithCount<T> would still support deferred execution, since it inherits from IEnumerable<T>.

One key bit is that this interface will be able to ‘go between’ IEnumerable<T> and ICollection<T> – ICollection<T> (and therefore IList<T> and so forth) could also implement IEnumerableWithCount<T> (would just need to declare it, since it already has a Count property that matches).  This means most (if not all) of the collections we’re used to will get this interface ‘for free’.
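A minimal sketch of the shape described above, under stated assumptions: IEnumerableWithCount<T> is the hypothetical interface from this post (it is not in the BCL), and CountedSequence<T> is a made-up adapter that pairs a lazy sequence with a count the caller already knows. It only illustrates that the interface can stay covariant and deferred.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

namespace Sample
{
    // Hypothetical interface from this post; it is not part of the BCL.
    // 'out T' is allowed because T only appears in output positions.
    public interface IEnumerableWithCount<out T> : IEnumerable<T>
    {
        int Count { get; }
    }

    // Hypothetical adapter: wraps a lazy sequence plus a count known up front.
    // It trusts the caller to supply the correct count.
    public sealed class CountedSequence<T> : IEnumerableWithCount<T>
    {
        private readonly IEnumerable<T> source;
        private readonly int count;

        public CountedSequence(IEnumerable<T> source, int count)
        {
            this.source = source;
            this.count = count;
        }

        public int Count { get { return count; } }

        public IEnumerator<T> GetEnumerator() { return source.GetEnumerator(); }

        IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }
    }

    public static class Program
    {
        // Deferred: the body does not run until someone enumerates.
        private static IEnumerable<string> Names()
        {
            yield return "a";
            yield return "b";
            yield return "c";
        }

        public static void Main()
        {
            IEnumerableWithCount<string> strings = new CountedSequence<string>(Names(), 3);

            // Covariance: the same conversion IEnumerable<string> allows.
            IEnumerableWithCount<object> objects = strings;

            // The count is available before any element is produced.
            Console.WriteLine(objects.Count);
            foreach (object o in objects)
            {
                Console.WriteLine(o);
            }
        }
    }
}
```

The assignment to IEnumerableWithCount<object> would not compile against ICollection<T>, since its Add/Remove members take T as input.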

Somewhat unfortunately, the canonical example of where an IEnumerableWithCount would have benefit is in the Select method of LINQ.

Much of LINQ-to-Objects includes interface checks to optimize performance if the input object supports a higher-level interface than the method requires.

For instance, let’s look at ToArray.

The real work is done in the ctor of Buffer, which is the first place that would use our new IEnumerableWithCount<T>.

Currently the perf optimization is based on checking for ICollection<TElement> – if that interface is implemented by the source, then we get to skip all the reallocations (since we don’t know the final size) and create the final array directly.

The code also uses ICollection<T>’s CopyTo method to copy the actual elements, and since that seems out of scope for an IEnumerableWithCount<T>, we’d probably just have two paths, with the IEnumerableWithCount<T> path using foreach. That path would be a little less CPU-efficient than CopyTo, but still better than today’s plain IEnumerable<T> path, since saving the reallocations (and the garbage they generate) is still a big potential perf win.
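The two-paths idea can be sketched roughly like this. This is a simplified illustration, not the actual framework source: CountAwareBuffer and Squares are hypothetical names, IEnumerableWithCount<T> is the proposed interface, and the growth policy in the last path (start at 4, double when full) is only an approximation of the doubling described above.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

namespace Sample
{
    // Hypothetical interface from this post; it is not part of the BCL.
    public interface IEnumerableWithCount<out T> : IEnumerable<T>
    {
        int Count { get; }
    }

    // Hypothetical helper showing where a count-aware path would slot in.
    public static class CountAwareBuffer
    {
        public static T[] ToArray<T>(IEnumerable<T> source)
        {
            if (source == null) throw new ArgumentNullException("source");

            // Path 1: ICollection<T> knows its count and can bulk-copy.
            ICollection<T> collection = source as ICollection<T>;
            if (collection != null)
            {
                T[] result = new T[collection.Count];
                collection.CopyTo(result, 0);
                return result;
            }

            // Path 2 (proposed): count is known, so allocate once and foreach.
            // This trusts Count to match the number of items actually yielded.
            IEnumerableWithCount<T> counted = source as IEnumerableWithCount<T>;
            if (counted != null)
            {
                T[] result = new T[counted.Count];
                int i = 0;
                foreach (T item in counted) result[i++] = item;
                return result;
            }

            // Path 3: unknown size, so grow by doubling and trim at the end.
            T[] items = new T[4];
            int count = 0;
            foreach (T item in source)
            {
                if (count == items.Length) Array.Resize(ref items, checked(count * 2));
                items[count++] = item;
            }
            Array.Resize(ref items, count);
            return items;
        }
    }

    // Hypothetical counted source, used only to exercise path 2.
    public sealed class Squares : IEnumerableWithCount<int>
    {
        private readonly int count;
        public Squares(int count) { this.count = count; }
        public int Count { get { return count; } }

        public IEnumerator<int> GetEnumerator()
        {
            for (int i = 1; i <= count; i++) yield return i * i;
        }

        IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }
    }

    public static class Program
    {
        public static void Main()
        {
            int[] squares = CountAwareBuffer.ToArray(new Squares(5));
            Console.WriteLine(squares.Length);
            foreach (int s in squares) Console.WriteLine(s);
        }
    }
}
```

Path 2 does one allocation and no intermediate copies; path 3 allocates a new array each time the buffer fills, plus a final trim.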

For the LINQ benefit in particular, we’d likely want a static System.Linq.EnumerableWithCount (similar to Enumerable and Queryable) that implemented the particular methods that can be done ‘better’ with count information.

What methods would our EnumerableWithCount static class support?  These are the methods being used with LINQ-to-Objects that would benefit from having the compiler using an implementation by our EnumerableWithCount class instead of the current implementation in Enumerable.  If it’s not in this list, then falling back to the Enumerable version is fine (much like some methods aren’t implemented by Queryable), which is what the compiler will do for us.

  • Any method that doesn’t affect the number of items when it acts on the enumerable, of course
    • Select is the method that sticks out, certainly
    • OrderBy / OrderByDescending
    • ThenBy / ThenByDescending
    • Reverse
    • Cast
  • Concat can also benefit, as we’d be able to know the target item count in such a situation without enumerating (so concat’ing multiple inputs and then ToArray would also save all those reallocations)
  • Similarly, Skip and Take would benefit, since we’d know the resulting count due to both – in paging scenarios, for instance, this is very common
  • Count/LongCount is similar in its benefit, as it would be able to just return the Count property without actually enumerating.
  • DefaultIfEmpty would know if it’s already empty or not
  • Zip would know the final count (since it’s Min of the input counts)
  • SequenceEqual could check count mismatch first
  • Various methods that could either throw or return default(T) based on the count instead of having to iterate first, like:
    • ElementAt/ElementAtOrDefault (an out-of-range index could throw, or return default, immediately)
    • First/FirstOrDefault
    • Last/LastOrDefault
    • Single/SingleOrDefault (which would be able to throw for count > 1 without needing to advance to a second element)
  • ToDictionary would know the final count and could preallocate buckets better
  • Range/Repeat/Empty would all be able to create this interface since the final count is known
  • Average would know the count to divide by without incrementing a counter 🙂
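A few of the list entries above (Range, Select, Take) could look something like the sketch below. All names here are assumptions for illustration: EnumerableWithCount, CountedSequence<T> and IEnumerableWithCount<T> are hypothetical and not part of System.Linq. Each operator just delegates to the existing Enumerable method and carries the count along, so execution stays deferred.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

namespace Sample
{
    // Hypothetical interface from this post; it is not part of the BCL.
    public interface IEnumerableWithCount<out T> : IEnumerable<T>
    {
        int Count { get; }
    }

    // Hypothetical adapter: a lazy sequence plus a count known up front.
    public sealed class CountedSequence<T> : IEnumerableWithCount<T>
    {
        private readonly IEnumerable<T> source;
        private readonly int count;

        public CountedSequence(IEnumerable<T> source, int count)
        {
            this.source = source;
            this.count = count;
        }

        public int Count { get { return count; } }
        public IEnumerator<T> GetEnumerator() { return source.GetEnumerator(); }
        IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }
    }

    // Hypothetical static class, in the spirit of Enumerable and Queryable.
    public static class EnumerableWithCount
    {
        public static IEnumerableWithCount<int> Range(int start, int count)
        {
            return new CountedSequence<int>(Enumerable.Range(start, count), count);
        }

        // Select never changes the number of items, so the count carries over.
        public static IEnumerableWithCount<TResult> Select<TSource, TResult>(
            this IEnumerableWithCount<TSource> source, Func<TSource, TResult> selector)
        {
            return new CountedSequence<TResult>(Enumerable.Select(source, selector), source.Count);
        }

        // Take yields min(count, source.Count) items, and none for count <= 0.
        public static IEnumerableWithCount<T> Take<T>(
            this IEnumerableWithCount<T> source, int count)
        {
            int resultCount = Math.Max(0, Math.Min(count, source.Count));
            return new CountedSequence<T>(Enumerable.Take(source, count), resultCount);
        }
    }

    public static class Program
    {
        public static void Main()
        {
            // Select and Take bind to the count-preserving versions because
            // the receiver's static type is IEnumerableWithCount<int>.
            IEnumerableWithCount<int> page =
                EnumerableWithCount.Range(1, 10).Select(x => x * 2).Take(3);
            Console.WriteLine(page.Count); // known without enumerating

            // Where is not implemented above, so this falls back to
            // Enumerable.Where and the result is a plain IEnumerable<int>.
            IEnumerable<int> filtered = page.Where(x => x > 2);
            foreach (int value in filtered) Console.WriteLine(value);
        }
    }
}
```

The fallback for Where is the same mechanism that lets an IQueryable<T> use Enumerable methods that Queryable lacks: extension method lookup picks the most specific applicable overload and otherwise keeps searching.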

Some remaining questions:

  • If the compiler supported IEnumerableWithCount<T> (using the same ‘yield return’ syntax), how would it work?  Would the method declaration need an out param for count?
  • Should we create a matching IEnumeratorWithCount<T> as well?  I don’t think it’s necessary, but not really sure at this point.

Python gets right what Java already has, and .NET still hasn't

In the Java world, you have the base type of java.lang.Throwable, and the subclasses of java.lang.Error (you basically never catch this, at least not in application code) and java.lang.Exception (this you can use to your heart’s content).

In .NET, for various reasons (and yes, I’ve had threads internally about this, including with Chris), there’s no such split of the hierarchy.  There’s the base type System.Exception.  There’s also a subclass called ApplicationException, where the idea was that we’d do our exception split there with a SystemException and ApplicationException split.  However, even we couldn’t keep that straight, so now even we consider it useless.

Why bother bringing this all up now?

Because it looks like Python 3.0 is going to get it right.  They’re doing what I think .NET should do at this point – give Exception a new superclass.  Is it trivial?  No.  Will it break existing source?  Yes.  Is it the right long-term answer?  I think so.

http://www.onlamp.com/pub/a/python/2006/10/26/python-25.html

In addition to the deprecation of raising strings as exceptions, Python 2.5 also rearranged the exception class hierarchy. The new BaseException class is the base class for Exception. You should now derive all your custom exception classes from Exception, or from a class derived from Exception.

Two other exception classes, KeyboardInterrupt and SystemExit, also extend BaseException, but not Exception. Thus, you can trap your own exceptions, and then trap all other exceptions derived from Exception. This will trap every exception except KeyboardInterrupt and SystemExit. Why would you not want to trap those two exceptions? That way your program will end when the user presses Ctrl+C without you having to manually catch and respond to Ctrl+C, or without you having to manually skip catching Ctrl+C to let the system handle it. The same is true with the SystemExit exception, which should force the program to stop.

This particular new feature, then, is less of a feature and more of a requirement: Recode your exception classes to extend Exception. Then you’ll be fine when Python 3.0 arrives.

.net reflection of privates

On IRC, there was someone who had come to the (incorrect) conclusion that, unlike Java, good ol’ .NET reflection could only see public members.  Never mind that reflection would arguably be useless in such situations (outside of dynamic language-type needs); the (simple) answer is to pass a BindingFlags argument.  The overloads that take no flags only search public members (instance and static), so to find a private instance member you specify BindingFlags.Instance together with BindingFlags.NonPublic.

23:40 [ Zero] I think I’ve found the problem.  Reflection can only see public members.

Not sure how this will copy-paste, but hopefully you’ll get the point:

using System;
using System.Reflection;

namespace Sample
{
    class Program
    {
        static void Main(string[] args)
        {
            Type targetType = typeof(HasPrivateMethodFoo);
            // NonPublic | Instance is what lets reflection see the private constructor.
            ConstructorInfo ctor = targetType.GetConstructor(BindingFlags.NonPublic | BindingFlags.Instance, null, new Type[0], null);
            // The same flags find the private instance method Foo.
            MethodInfo method = targetType.GetMethod("Foo", BindingFlags.Instance | BindingFlags.NonPublic);
            object instance = ctor.Invoke(new object[0]);
            method.Invoke(instance, new object[] { "string to print" });
        }
    }

    class HasPrivateMethodFoo
    {
        private HasPrivateMethodFoo()
        {
            // do nothing
        }
        private void Foo(string bar)
        {
            Console.WriteLine(bar);
        }
    }
}

explaining why ClickOnce isn't working in FireFox

Great (full) explanation of exactly what’s going on.  Unfortunately the lack of ETA makes this a bit painful, but at least people know they’re working on it 🙂

ClickOnce and FireFox

In the V2.0 release of the Framework, ClickOnce does not have support for FireFox.
[…]
When a user clicks on a .application in FireFox the FireFox equivalent of the Open/Save dialog comes up. Once the .application file is downloaded to the local machine (to the FireFox cache on Open and to a user specified location on Save) it is run from there, firing up ClickOnce. ClickOnce now parses the locally downloaded .application and tries to download the actual application manifest it refers to. If the .application contains a relative path to the application manifest, ClickOnce will try to find it relative to the .application in the FireFox temp folder and fail. If it is a full Url to the application manifest, ClickOnce fails anyway, this time due to a security check we have that does not allow the .application and the corresponding application bits to be in different security zones.


software as a service – the coming wave

While the debate continues on what percentage of the overall market SaaS will be, it’s clearly something to keep an eye on 🙂

SaaS is a journey, walk with us

For now, I want to highlight what Microsoft is bringing to the table in terms of SaaS architecture guidance.

If I can try to summarize the key areas where architects should spend their time, it would be the following:

  • Scale the application
  • Enable multi-tenant data
  • Facilitate customization

ASP.NET parameters – User.Identity.Name?

I’m not sure what the actual answer is (yet), but I wonder whether I can bind User.Identity.Name as a parameter like I can profile/session/form/etc. parameters in ASP.NET 2.0.

Currently I just read it into a session var called UserName and bind that.  Simple enough, just makes me wonder 🙂


ClickOnce and permission elevation

I’ve been coding up some simple apps that I’m likely to share as ClickOnce apps, and I was very glad to hear that the RTM bits don’t require Authenticode!

ClickOnce and permission elevation prompting in the internet zone

The Decision – 
With the .Net Framework V2.0 release of ClickOnce, any ClickOnce App deployed from the internet zone can prompt the user for permission elevation.
[…]
Let’s consider the scenario below …
Jen is a .Net entusiast and a golf fanatic. She writes a .Net Golf Handicap calculator that unfortuantely needs Intranet (Not Internet) zone permissions to run. Jen wants to share this App on her homepage with her golfing friends and would also like them to get updates as she adds new functionality to her program; ClickOnce is the ideal choice of deployment technology for her.