Monday, 21 April 2008

Compressing UpdatePanel output

Update: I've posted the source and binaries for this control on Github. Free to use and change. Please comment on it.

This is the story of a control that shrinks the content sent from an UpdatePanel to as little as 2% of its original size without using compression algorithms. I am willing to share the code with whoever wants it and I only ask in return that you tell me if and where it went wrong so I can find a solution. Even if you are not interested in the control, the article describes a little about the inner workings of the ASP.Net Ajax mechanism.

You know the UpdatePanel, the control that updates its content asynchronously in ASP.Net, allowing you to easily transform a normal postback based application into a fully fledged Ajax app. Well, the only problem with that control is that you have two choices. You can put a lot of them on the page in order to update only what changes, which makes the site as fast as you would expect from an Ajax application, but hard to maintain afterwards. Or you can put one big UpdatePanel around the entire page (maybe in the MasterPage) and accept that large chunks of data get passed back and forth, along with other clear disadvantages, some detailed in this blog entry.

Not anymore! I have made a small class, in the form of a Response.Filter, that caches the previously rendered content and instructs the browser to do the same, then sends only a small fraction of the data from the server to the browser, mainly what has changed. There is still the issue of the time it takes the browser to render the content, which is the same no matter what I do, like when rendering a huge table. It doesn't matter that I send only the changes in one cell, the browser must still render the huge grid. Also, if, for some reason, the update fails, I catch it and tell the server that the UpdatePanel must be updated again, the old way.

Enough; let's talk code. I first had to tap into the content that was sent to the browser. That can only be done at Page render level (or PageAdapter, or Response.Filter and other things that can access the rendered content). So I caught the rendered content in a filter, recognized it as Ajax by its distinctive format, and processed only the updatePanel type of token.

Here I had a few problems. First I replaced the updatePanel token with a scriptBlock token that changed the innerHTML of the update panel div element. It all seemed to work until I tested it a little. I discovered that the _updatePanel javascript method of the PageRequestManager object used by the normal ajax rendering on the browser was doing a few extra things, so I used that one instead of just replacing the innerHTML, resulting in a lower speed. But that didn't help either, because it failed when using validators. Even if I did replace the updatePanel token with a correct javascript block, it still got executed a bit later than it should have.

The only solution I had was to replace the _updatePanel method with my own. The original consists of a small block of code that disposes some script blocks and some other stuff, followed by a plain innerHTML replace. I could not simply 'override' it and call the original, since it would set the innerHTML to some meaningless stuff (the thing I would send from the server), and then I would have to parse that and change the innerHTML again, resulting in bad performance, flickering, nonsense on screen, etc. So I just copy pasted the first part and put my own ApplyPatch function in place of the html replace code line.

Now, here I met another issue. The innerHTML property of an html element is not a simple string. It gets parsed immediately when set and it is recreated when read, as explained in this previous article of mine. The only solution for that was to create my own property on the update panel div element that remembers the string that was set. This solved a lot more problems, because it meant I could identify the content to be replaced by simple position markers rather than through regular expressions (as was my initial idea). That property would not get changed by custom local javascript either, so I was safe to use it.

About the regular expression engine in Javascript: it has no Singleline option. That means the dot never matches a line break, so a pattern cannot easily match content that spans several lines. I could have used Steve Levithan's beautiful work, but with the solution found above, I did not need regular expressions at all.

The only other issue was with UpdatePanels inside UpdatePanels. I found out that in this case, only the parent UpdatePanels are rendered. That meant that the custom property I added to the child panel would disappear and break my code. Therefore I had to keep a tree of the UpdatePanels in the page and clear the cached content of all the children whenever a parent was updated. So I did that, too.
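
To make the bookkeeping concrete, here is a minimal sketch of such a tree, written in Python purely for illustration. It is not the code of the control: the class name PanelCache and its methods are made up for this example, and the real filter may well organize things differently. It only shows the idea that storing new content for a parent throws away whatever was remembered for its descendants.

```python
class PanelCache:
    """Hypothetical cache: last rendered content per panel, plus the panel nesting."""

    def __init__(self):
        self.content = {}   # panel id -> last rendered string
        self.children = {}  # panel id -> ids of directly nested panels

    def register(self, panel_id, parent_id=None):
        """Record a panel and, optionally, the panel it is nested in."""
        self.children.setdefault(panel_id, [])
        if parent_id is not None:
            self.children.setdefault(parent_id, []).append(panel_id)

    def invalidate(self, panel_id):
        """Forget the cached content of a panel and of everything nested in it."""
        self.content.pop(panel_id, None)
        for child in self.children.get(panel_id, []):
            self.invalidate(child)

    def store(self, panel_id, rendered):
        """Remember new content for a panel; its descendants become stale."""
        for child in self.children.get(panel_id, []):
            self.invalidate(child)
        self.content[panel_id] = rendered


cache = PanelCache()
cache.register("outer")
cache.register("inner", parent_id="outer")
cache.store("inner", "<span>old inner</span>")
cache.store("outer", "<div>new outer</div>")  # "inner" is no longer cached
```

After the last call only "outer" has cached content, so the next update of "inner" would have to be sent in full.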

What else? What if somehow the property would get deleted, changed, or something else happened, like someone decided to recreate the update panel div object or something like that? For that I made a little HttpHandler that would receive an UpdatePanel id and clear its cached content. Then, on return from the asynchronous call, the javascript would just push another update panel refresh using __doPostBack(updatePanelId,""). I don't particularly like this approach, since it could backfire with multiple UpdatePanels (as you know, only one postback at a time is supported), but I haven't found a better solution yet. Besides, this event should normally not happen.

So, the mechanism was all in place. All I had to do was make the actual patching mechanism, the one that would find the difference between the previously rendered content and the current content, then send only the changed part. The first thing I did was remove the start and end of the strings that were identical. As you can imagine, that's the most common scenario: a change in the UpdatePanel means all the content up to the change remains unchanged and the same goes for the content after the change. But I was testing the filter with a large grid that would randomly change one cell to a random string. That meant two changes between consecutive renderings: the position changed the previous time and the position changed now. Assuming the first change was in one of the starting cells and the last was in one of the cells at the end, the compression would be meaningless. So I googled for an algorithm that would give me the difference between two files/strings and I found Diff! Well, I knew about it so I actually googled for Diff :) It was in the Longest Common Substring algorithm category.
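
The first step, trimming the identical start and end, can be sketched in a few lines. This is an illustration in Python, not the code of the filter; the function names make_patch and apply_patch and the (prefix, suffix, middle) tuple are assumptions made for the example.

```python
def make_patch(old, new):
    """Return (prefix_len, suffix_len, middle): what must be sent to turn old into new."""
    limit = min(len(old), len(new))
    prefix = 0
    while prefix < limit and old[prefix] == new[prefix]:
        prefix += 1
    suffix = 0
    # The common suffix must not overlap the common prefix in either string.
    while suffix < limit - prefix and old[-1 - suffix] == new[-1 - suffix]:
        suffix += 1
    return prefix, suffix, new[prefix:len(new) - suffix]


def apply_patch(old, patch):
    """Rebuild the new content from the remembered old content and the patch."""
    prefix, suffix, middle = patch
    return old[:prefix] + middle + old[len(old) - suffix:]


old = "<td>1</td><td>2</td><td>3</td>"
one_change = "<td>1</td><td>X</td><td>3</td>"
two_changes = "<td>A</td><td>2</td><td>B</td>"

small = make_patch(old, one_change)   # only "X" travels over the wire
large = make_patch(old, two_changes)  # almost the whole string travels
```

With a single change in the middle the patch carries one character. With one change near the start and one near the end, the middle part is nearly the whole string, which is exactly the weakness described above.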

Anyway, the algorithm was nice, clear, explained, with code, perfect for what I wanted and completely useless, since it needed an m*n array to do its work. It was slow, a complete memory hog and I couldn't possibly use an array of 500000x500000. I bet there were optimizations that covered this problem, but I was miserable, so I just patched up my own messy algorithm. What would it do? It would randomly select a 100 character long string from the current content and search for it in the previous content. If it found it, it would expand the selection, consider it a Reasonably Long Common Substring and work from there recursively. If it didn't find it, it would try a few more randomly chosen strings, then give up. Well, actually it is the same algorithm, made messy and with no extra memory requirements.
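
Here is a rough sketch of that idea, again in Python and again only an illustration under my own assumptions, not the code from the control. The names find_common, make_patch and apply_patch, the fragment format and the number of attempts (5) are invented for the example; only the 100 character probe comes from the description above. For brevity the sketch slices the strings on every recursive call, which a careful implementation would avoid by passing positions instead.

```python
import random

PROBE_LEN = 100  # probe size from the description above
ATTEMPTS = 5     # "a few times"; the exact count is an assumption


def find_common(old, new, rng, probe_len=PROBE_LEN, attempts=ATTEMPTS):
    """Return (old_pos, new_pos, length) of a reasonably long common substring, or None."""
    if len(old) < probe_len or len(new) < probe_len:
        return None
    for _ in range(attempts):
        new_pos = rng.randrange(len(new) - probe_len + 1)
        old_pos = old.find(new[new_pos:new_pos + probe_len])
        if old_pos < 0:
            continue  # this probe touches changed content, try another one
        length = probe_len
        # Grow the match to the left, then to the right.
        while old_pos > 0 and new_pos > 0 and old[old_pos - 1] == new[new_pos - 1]:
            old_pos -= 1
            new_pos -= 1
            length += 1
        while (old_pos + length < len(old) and new_pos + length < len(new)
               and old[old_pos + length] == new[new_pos + length]):
            length += 1
        return old_pos, new_pos, length
    return None


def make_patch(old, new, rng, old_offset=0):
    """Build a list of ("copy", start, length) and ("insert", text) fragments."""
    if not new:
        return []
    match = find_common(old, new, rng)
    if match is None:
        return [("insert", new)]  # give up: send this piece as it is
    old_pos, new_pos, length = match
    left = make_patch(old[:old_pos], new[:new_pos], rng, old_offset)
    right = make_patch(old[old_pos + length:], new[new_pos + length:], rng,
                       old_offset + old_pos + length)
    return left + [("copy", old_offset + old_pos, length)] + right


def apply_patch(old, patch):
    """Reassemble the new content from the old content and the fragments."""
    parts = []
    for fragment in patch:
        if fragment[0] == "copy":
            _, start, length = fragment
            parts.append(old[start:start + length])
        else:
            parts.append(fragment[1])
    return "".join(parts)
```

Whatever the random probes hit or miss, applying the patch to the old content gives back the new content; luck only decides how small the patch is. A "copy" fragment costs two numbers on the wire, an "insert" fragment costs its whole text.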

It worked far better than I had expected, even if it clearly could have used some optimizations. The code favored clarity to the detriment of speed, but it still worked acceptably fast, with no noticeable overhead. After a few tweaks and fixes, the class was ready. I've tested it on a project we are working on, a big project with some complex pages, and it worked like a charm.

One particular page used a control I have made that allows for grid rows and columns to have children that can be shown/hidden at will. When collapsing a column (that means that every row gets some cells removed) the compressed size was still above 50% of the original, split into up to 100 patch fragments. When collapsing a row, meaning some content from the middle of the grid would just vanish, the size went down to 2%! Of course, putting the ViewState into the session also helped. Gzip compression on the server would complement this nicely, shrinking the output even more.

So, I have demonstrated incredible compression of UpdatePanel content sent through the network with something as small and reusable as a response filter that can be added once in the master page. You could use it for customers that have network bandwidth issues or for sites that pay for sent out content. It would work with sites made with one big UpdatePanel placed in the MasterPage as well :).

If you want to use it in your sites, please let me know how it performs and what problems you've encountered.

12 comments:

  1. Why not just share your discovery with everyone? If your intent is to sell it, then you could state that clearly here; that you're looking for beta testers, rather than offer it in goodwill but in some closed fashion.

  2. It is not my intent to sell it and I want to know the people that are interested in my code, if any. And since it is work in progress and I am too lazy to get a sourceforge account or whatever, I offer it on demand. Happy?

  3. I'd love to take a look at this and see how it stands up!

  4. Well, I've finally moved off the couch and created an archive that is not dependent on any of my libraries and has some extra comments in it and the code doesn't completely suck.

    Check out the updated post content, at the end there is a link to the project with sources and everything.

  5. Pretty please tell me if you have found any improvements, problems, success stories, etc.

  6. This looks fantastic! I've not had an in-depth look as of yet, but I am quite impressed with the amount of work you have obviously put into this.

  7. Interesting concept. I wonder where you got the idea from?!? What problem necessitated this solution?

    I've studied the .NET code and your ApplyPatch method and I have a couple comments.

    First, you're right in saying that this would be extremely useful for those who pay by the MB for data transfer. If my previous content is "abc" and my new content is "aec", all you send down is the "e" with information about the lengths of the sections. Then, using the information maintained on the client, you rebuild the full innerHTML of the UpdatePanel (a div tag) and assign it to the innerHTML property. Your 98% savings doesn't sound impossible.

    The downside for this is server processing time and memory and client processing time and memory.

    On the client, each div element has to maintain a string for its previous content. That increases the browser's memory footprint (how much depends on the string size). Also on the client, I have to recombine the different pieces to recreate the div's innerHTML. That increases client processing time (how much depends on how many things changed) and I don't save rendering time because I'm still replacing all of the innerHTML. (Use Sys.StringBuilder to minimize the reallocation of the string's memory in that method.)

    How about trying to find the parent node of the content you're going to replace rather than recombining the entire div's innerHTML and updating its innerHTML? It might be possible some of the time if there was a DOM element id you could look for to help you find the correct place to grab the node or something.

    So if I have x.innerHTML == "abc" and the new content will read x.innerHTML == "aec", then rather than do x.innerHTML = "aec" I do a.innerHTML = "e". That would be seriously awesome, because it would then be useful even for those of us who, while concerned with data transfer size, hold overall performance as the most important aspect: the rendering time would be dramatically reduced.

    As for the server, memory is cheap and so is processing power. I'd be curious how much a hit on requests/sec the response filter causes, though.

  8. Oh, I was so used to getting emails from the comments I got that I completely missed this one.
    Sorry, Joel! And thanks for the comment. I will try to reply as best as I can.

    The idea came from a site that said putting the entire page inside a giant updatePanel was a bad practice. I wondered, why? Why can't the UpdatePanel know what it sent, then send only updates to the client?

    So my first attempt was simply to update innerHTML. However, that property is actually recomputed after the html is parsed. Not to mention that it is not a DOM specification property. So the innerHTML was not usable. Since I had already done so much work, I applied a quick and dirty fix by sending the initial content as a div property.

    The second problem stems directly from this, because I can't control the div property from a normal page load. So this shrink would only work from the second asynchronous postback on. I could not find a good solution for this.

    The memory footprint of the client... I thought about it. But what are the chances that someone would do frequent ajax updates on something that is so big it can't fit twice in memory?

    The speed of the client recombining is not a big issue. I have tested it with 500 chunks and it worked ok. Besides, that was before the big "Javascript revolution" that promises speeds of up to 3 times faster and graphic rendering in Javascript :)

    Good idea for the Sys.StringBuilder thing. I will look into it when I have a little spare time.

    I also thought of looking for the parent node of the change I want to send, but that, besides meaning a lot of HTML parsing on both client and server (right now it's all fast string operations), would mean trying to figure out which controls keep their state and which don't. Recreating the control tree of an UpdatePanel means that the elements that were present before are destroyed, along with their object values, then recreated. So it can't be done. More than that, the problem is thorny with certain types of html elements, like tables. One cannot add elements or innerHtml to a TR.

    I never tried the solution on a production site. It might slow things down a little, but I don't believe it would do so by much. Even the diff algorithm I've used is something I cooked up to work partially and fast rather than completely and slow.

    Hope to hear from you again.

  9. Hi, could someone help me with how to use the UpdatePanelShrinkerFilter class? I used it in Page_Load but I am not seeing anything cached. I would appreciate it if anyone could help me use this.

  10. Sure, I can help. What exactly did you do? Did you run the site in debug mode and watch the Output window in Visual Studio?

  11. Hi Siderite, I am tapploo, who just asked for your help in my last post. I have a page where all the controls are in an UpdatePanel. We have written postback code (the OnSelectedIndexChanged event) in the code behind that runs when a user selects a value from a dropdown and populates some area on the screen. When the user changes the value in the dropdown again, it repopulates the values on the screen without giving the user a postback experience. Now I have implemented UpdatePanelShrinkerFilter in OnInit() in my page. I can debug the code and see that it executes fine. Now how can I see the difference between using the filter and not using it? I am using the HttpWatch tool to monitor Request-Response. I appreciate your help.

  12. The only output that UpdatePanelShrinker is giving is in the Debug Output window of Visual Studio, meaning that it writes using Debug.WriteLine the memory footprint in size and number of items (cached pages), patch fragment count (how many pieces of information were sent to be reassembled) and shrinkage (percentage of data sent to the client from the size that was normally sent).

    The shrinker uses a command handler installed in Web.Config. If you didn't add it, it will try to add it itself, given it has write rights to web.config. That also means the site will get rebuilt, so it's not the most elegant solution. Best is to add it based on the instructions on CodePlex.

    Please let me know if there are any other problems you have and if not, tell me if you are satisfied or need something changed in the tool.
