Saturday, 24 December 2016

Deserializing base type property with JSON.Net and having it work in WCF as well

Flexible JSON input Sometimes we want to serialize and deserialize objects that are dynamic in nature, like having different types of content based on a property. However, the strong typed nature of C# and the limitations of serializers make this frustrating.


In this post I will be discussing several things:
  • How to deserialize a JSON property that can have a value of multiple types that are not declared
  • How to serialize and deserialize a property declared as a base type in WCF
  • How to serialize it back using JSON.Net

The code can all be downloaded from GitHub. The first phase of the project can be found here.

Well, the scenario was this: I had a string in the database that was in JSON format. I would deserialize it using JSON.Net and use the resulting object. The object had a property that could be one of several types, but no special notation to describe its concrete type (like the $type notation for JSON.Net or the __type notation for DataContractSerializer). The only way I knew what type it was supposed to be was a type integer value.

My solution was to declare the property as JToken, meaning anything goes there, then deserialize it manually when I have both the JToken value and the integer type value. However, passing the object through a WCF service, which uses DataContractSerializer and which would throw the exception System.Runtime.Serialization.InvalidDataContractException: Type 'Newtonsoft.Json.Linq.JToken' is a recursive collection data contract which is not supported. Consider modifying the definition of collection 'Newtonsoft.Json.Linq.JToken' to remove references to itself..

I thought I could use two properties that pointed at the same data: one for JSON.Net, which would use JToken, and one for WCF, which would use a base type. It should have been easy, but it was not. People have tried this on the Internet and declared it impossible. Well, it's not!

Let's work with some real data. Database JSON:
{
"MyStuff": [
{
"Configuration": {
"Wheels": 2
},
"ConfigurationType": 1
},
{
"Configuration": {
"Legs": 4
},
"ConfigurationType": 2
}
]
}

Here is the code that works:
[DataContract]
[JsonObject(MemberSerialization.OptOut)]
public class Stuff
{
[DataMember]
public List<Thing> MyStuff;
}

[DataContract]
[KnownType(typeof(Vehicle))]
[KnownType(typeof(Animal))]
public class Configuration
{
}

[DataContract]
public class Vehicle : Configuration
{
[DataMember]
public int Wheels;
}

[DataContract]
public class Animal : Configuration
{
[DataMember]
public int Legs;
}

[DataContract]
public class Thing
{
private Configuration _configuration;
private JToken _jtoken;

[DataMember(Name = "ConfigurationType")]
public int ConfigurationType;

[IgnoreDataMember]
public JToken Configuration
{
get
{
return _jtoken;
}
set
{
_jtoken = value;
_configuration = null;
}
}

[JsonIgnore]
[DataMember(Name = "Configuration")]
public Configuration ConfigurationObject
{
get
{
if (_configuration == null)
{
switch (ConfigurationType)
{
case 1: _configuration = Configuration.ToObject<Vehicle>(); break;
case 2: _configuration = Configuration.ToObject<Animal>(); break;
}
}
return _configuration;
}
set
{
_configuration = value;
_jtoken = JRaw.FromObject(value);
}
}
}

public class ThingContractResolver : DefaultContractResolver
{
protected override JsonProperty CreateProperty(System.Reflection.MemberInfo member,
Newtonsoft.Json.MemberSerialization memberSerialization)
{
var prop = base.CreateProperty(member, memberSerialization);
if (member.Name == "Configuration") prop.Ignored = false;
return prop;
}
}

So now this is what happens:
  • DataContractSerializer ignores the JToken property, and doesn't throw an exception
  • Instead it takes ConfigurationObject and serializes/deserializes it as "Configuration", thanks to the KnownTypeAttributes decorating the Configuration class and the DataMemberAttribute that sets the property's serialization name (in WCF only)
  • JSON.Net ignores DataContract as much as possible (JsonObject(MemberSerialization.OptOut)) and both properties, but then un-ignores the Configuration property when we use the ThingContractResolver
  • DataContractSerializer will add a property called __type to the resulting JSON, so that it knows how to deserialize it.
  • In order for this to work, the JSON deserialization of the original text needs to be done with JSON.Net (no __type property) and the JSON coming to the WCF service to be correctly decorated with __type when it comes to the server to be saved in the database

If you need to remove the __type property from the JSON, try the solution here: JSON.NET how to remove nodes. I didn't actually need to do it.

Of course, it would have been simpler to just replace the default serializer of the WCF service to Json.Net, but I didn't want to introduce other problems to an already complex system.


A More General Solution


To summarize, it would have been grand if I could have used IgnoreDataMemberAttribute and JSON.Net would just ignore it. To do that, I could use another ContractResolver:
public class JsonWCFContractResolver : DefaultContractResolver
{
protected override JsonProperty CreateProperty(System.Reflection.MemberInfo member,
Newtonsoft.Json.MemberSerialization memberSerialization)
{
var prop = base.CreateProperty(member, memberSerialization);
var hasIgnoreDataMember = member.IsDefined(typeof(IgnoreDataMemberAttribute), false);
var hasJsonIgnore = member.IsDefined(typeof(JsonIgnoreAttribute), false);
if (hasIgnoreDataMember && !hasJsonIgnore)
{
prop.Ignored = false;
}
return prop;
}
}
This class now automatically un-ignores properties declared with IgnoreDataMember that are also not declared with JsonIgnore. You might want to create your own custom attribute so that you don't modify JSON.Net's default behavior.

Now, while I thought it would be easy to implement a JsonConverter that works just like the default one, only it uses this contract resolver by default, it was actually a pain in the ass, mainly because there is no default JsonConverter. Instead, I declared the default contract resolver in a static constructor like this:
JsonConvert.DefaultSettings = () => new JsonSerializerSettings
{
ContractResolver=new JsonWCFContractResolver()
};

Now the JSON.Net operations don't need an explicitly declared contract resolver. Updated source code on GitHub


Links that helped

Configure JSON.NET to ignore DataContract/DataMember attributes
replace WCF built-in JavascriptSerializer with Newtonsoft Json.Net json serializer
Cannot return JToken from WCF Web Service
Serialize and deserialize part of JSON object as string with WCF

Tuesday, 20 December 2016

The truth about post truth

Look at that title! If that doesn't trend on social media, I don't know what will.

Nothing spreads memes faster than an American election. The term post-truth was named the word of 2016 by Oxford Dictionaries, which is funny, considering the elections went on at the end if the year and that Oxford is in England, but it only emphasizes the impact that it had. However, like the cake, it is a lie. The truth of the matter is that Americans, like all masses of people, can't handle the truth, so they invent a more comfortable reality in which to dwell.

When Trump became president, people needed somebody to blame, someone other than themselves, of course. After all, only good things happen because of you and God, because you're awesome! And God, too, I'm sure. *The people* would never ever vote a narcissistic entertainer as their supreme leader, clearly. So they blamed social media. Now, I am not a fan of social media, but considering I've been blogging for more than a decade, I am not against it either. I've only recently and reluctantly joined Facebook, mainly for the messenger app, but when I saw the entire Internet rally against poor Zuckerberg (well, he is anything but poor, but you get the gist) I smirked, all superior and shit, because the idea was ludicrous. Not the idea that Facebook had a major impact on people's behavior, that one is totally true, but that this is a recent event, brought on by technology run amok and without checks.

Mass media is one of the pillars of American democracy, it has always swayed people one way or the other. The balance doesn't come because media is true to facts but because, like any other form of power, it is wielded by both sides equally. Facebook and the whole of Internet is just a distillation of that and when you distill shit you get... 3-methylindole - perhaps a bad metaphor. What I mean is that media has always been crap, it has always had an agenda and it was always under the control of people with power. Guttenberg himself was after all a goldsmith with political connections trying to satisfy investors. The Internet just gives you more granularity, more people to contribute in drowning facts into a sea of personal opinion.

We are not in the age of post truth, we are getting closer and closer to the actual truth which, as always, is not pretty, is not nice, is not politically correct. Instead it is painful, humbling and devastating. The truth, dear *the people*, is that this form of democracy is the worse political system that you can stomach while still functioning economically, mass media is not a pillar of anything, just another form of deadly power, and that when this political system that you wallow in turns its wheels you get people like Trump and Hillary Clinton representing you. And it's fine, because no one that is actually like you will ever reach a position of true power in any system. Normal people do not crave power, but comfort and security. Not even happiness.

So get over it, because after post-truth comes truth: you will forget about all the outrage in the customary six months and then everything will go #BackToNormal.

Saturday, 17 December 2016

Incidentul Bucovina

Nu, nu este vorba despre vacanta mea cu nevasta-mea in Bucovina, desi e funny si aia, ci despre apa Bucovina si disparitia bidoanelor de 5 litri din magazine.

In primul rind, hai sa dam sarci pe net. Dezamagire totala. Daca nu e ceva in engleza sau poate alta limba de larga circulatie internationala, Google esueaza lamentabil. Toate rezultatele sint ori magazinele mari care vind apa online ori despre altceva cu totul. Pasul doi: sosial midia. Putin mai mult succes, dar nu mult. Oamenii se lamenteaza de calitatea apei plate citind diverse articole panicarde cu cuvinte gen "colcaie" si "mizerie" in titlu. Cu toate astea Bucovina e pe primul loc la curatenie, chiar si in acele articole. Mai gasesti articole despre cum Apa Bucovina a fost cumparata de niste polonezi.

Desigur, putem merge direct la pagina Facebook a firmei, unde mai multe persoane s-au plins de lipsa bidoanelor. Din pacate, la orice intrebare se raspunde STAS cu:

Mai nou au bagat mai multe detalii:

Pe bune? Intii cineva din alta tara sau planeta a cerut multa apa Bucovina si nu a mai ramas ca sa ajunga si in Bucuresti, apoi problema s-a rezolvat, erau ei mai setosi si acum s-au potolit, asa ca putem sa luam iar apa de la jumatea lunii Decembrie. Dar sintem in 17 si tot nimic. E timpul sa scoatem armele babane: zvonurile!

La un magazin de linga mine cineva cu "surse" mi-a spus ca inchid linia de bidoane de 5L. Alt "informat" imi spune ca de fapt doar inlocuiesc aparatajul. Cineva care "se pricepe" spune ca asa "se face", intii scoti produsul de pe piata ca sa se vada cit de dorit este, apoi se introduce un inlocuitor, deobicei mai ieftin de produs, mai scump la cumparat si mai slab ca si calitate.

Acum ce sa cred? Daca era doar o chestie de imbunatatire hardware, de ce nu ar fi spus asta public? Daca inlocuiesc produsul cu altceva sau, mai rau, il scot de tot, de ce promit ca se va gasi din nou in magazine in aceeasi formula?

O ipoteza este ca aveau un surplus de sticle de 2L pe care nu le cumparau oamenii destul de repede. Alta ar fi ca s-ar fi intimplat ceva cu apa, o infestare cu ceva, o poluare cu ceva toxic sau dezgustator, si acum poti sa iei doar apa la sticle de 2L care a mai ramas, in timp ce ei incearca sa rezolve problema in tacere. Asa o fi, @apabucovina? Lasati zvonul asta sa se raspindeasca pina nu mai cumpara nimeni apa?

In mod clar, mai multi factori se intrunesc aici ca sa ne faca viata mizerabila (pardon da pan). Intii e lipsa de transparenta a producatorului. Era asa de greu sa fie sincer si direct? Apoi lipsa de profesionalism a jurnalistilor romani, atit de preocupati de ce fund sta pe ce scaun si ce gura vomeaza ce in politica, incit uita de lucruri de genul asta. In final, Americocentrismul internetului, care ne aduce doar informatii despre ce filme mai apar pe marile ecrane.

Pai atunci cum sa ne se umple tara de nationalijdi care beau apa de la chiuveta si nu ii intereseaza decit de ale lor?

Saturday, 10 December 2016

Well organized ways of doing software development

I am not the manager type: I do software because I like programming. Yet lately I found myself increasingly be the go-to guy for development processes. And I surprisingly knew of what people were asking of me. And it's true, no matter how crappy your jobs are and how little you get to do something truly impressive or challenging, you always gain ... operational experience. I've been working in almost every software environment possible: chaotic, amateurish, corporate, agile, state and everything in between. Therefore I decided to write a blog post about what I thought worked in the development process, rather than talking about code.

First things first


The first thing I need to say is that process, no matter which one you choose, will never work without someone to manage it and perhaps even enforce it. Even a team that agrees unanimously on how they are going to organize the work will slack off eventually. Process is never pleasant to define and follow to the letter, but it is absolutely necessary. In Scrum, for example, they defined the role of Scrum Master exactly for this reason. Even knowing fully that Scrum helps the team, developers and managers alike would cut corners and mess it all up.

The second thing I feel important to put out there before I describe a working process is that doing process for the sake of process is just stupid. The right way to think about it is like playing a game. The best way to play it may be determined by analyzing its rules and planning the actions that will benefit the player most, but no one will ever play the game if it's not fun. Therefore the first thing when determining your process is to make it work for your exact team. The second is to follow the plan as well as possible, because you know and your team knows that is the best way to achieve your goals.

A process


From my personal experience, the best way to work in a team was when we were using Scrum. I have no experience with Extreme Programming or Kanban, though, so I can't tell you which one is "best". As noted above, the best is determined by your team composition, not by objective parameters. For me Scrum was best for two main reasons.

Planning was done within the team, meaning that we both estimated and discussed the issues of the following sprint together. There was no one out of the loop, unless they chose to not pay attention, and it was always clear what we were meant to do. As with every well implemented Scrum process, we had all the development steps for our tasks done during the same sprint: planning, development, unit testing, code review, testing, demo, refactoring and documenting. At the end of the sprint we were confident we did a good job (or aborted the sprint) and that we had delivered something, no matter how trivial, and showed it working. This might not seem so important, but it is, for morale reasons. At the end of the sprint the team was seeing the fruits of their labor. So this is one reason: planned things together, made them work, showed them working and saw the results of our labor and got feedback for it.

The second reason is specific to Scrum because Scrum sprints are timeboxed. You either do what you planned to do in that time or you abort the sprint. This might sound like some pedantic implementation of a process that doesn't take into account the realities of software development, but it's not. Leaving aside the important point that one cannot judge results of work that changed direction since planning, timeboxing makes it difficult and painful to change things from the original planning. That's the main reason for it, in my opinion, and forces the team to work together from beginning to end. The product owner will not laze out and give you a half thought feature to implement because they know it will bite them in the ass if the sprint aborts midway. The developers will not postpone things because they feel they have other things more important to implement. The managers attached to the project will get that nice report they all dream about: starting with clear goals and deadlines and finishing with demonstrable results.

Personally, I also like timeboxing because I get some stuff to do and when I get it done I can do whatever the hell I want. There is no pressure to work withing a fixed daily schedule and encourages business owners to think outside the box, leave people working in their own ways as long as they produce the required results. I liked the philosophy of Scrum so much that I even considered using it for myself, meaning a Scrum of one person, working for my own backlog. It worked for a while quite well, but then I cut corners and slowly abandoned it. As I said before, you need someone to account for the proper implementation of the process. I didn't have it in me to be that person for myself. Not schizoid enough, I guess. Or my other personalities just suck.

Tooling


Process can be improved significantly by tooling. The friction caused by having to follows rules (any rules) can be lessened a lot by computer programs that by definition do just that: follow rules. There is a software developer credo that says "automate everything that you can", meaning you let the computer do as much as possible for you, the rest being what you use your time for. Do the same for your development team. Use the proper tools for the issue at hand.

A short list of tools that I liked working with:
  • Microsoft Visual Studio - great IDE for .NET and web development
  • JetBrains ReSharper - falling in and out of love with it on a regular basis, it feels like magic in the hands of a developer. It also helps with a lot of the administrative part of development like company code practices, code consistency, overall project analysis and refactoring.
  • Atlassian JIRA - "The #1 software development tool used by agile teams". It helps with tasks, management, bugs, reporting. It's web based and blends naturally with collaborative work for teams of any size. It allows for both project management and bug tracking, which is great.
  • Atlassian Confluence - an online document repository. It works like a wiki, with easy to use interface for creating documents and a lot of addons for all kinds of things.
  • Smartbear Code Collaborator - it makes code reviews easy to create, track and integrate.
  • Version control system - this is not a product, it's a class of software. I didn't want to recommend one because people are very attached and vocal about their source control software. I worked with Perforce, SVN, Git, etc. Some are better than others, but it's a long discussion not suitable for this post. Having source control, though, is necessary for any development project.
  • Some chat system - one might think this is trivial, but it's not. People need to be able to communicate instantly without shouting over their desks. Enough said.
  • Jenkins - source automation server. It manages stuff like deployment, but also helps with Continuous Integration and Delivery. I liked it because I worked with it, didn't have to make it work myself. It's written in Java, after all :)
  • Microsoft Exchange - used not only for emails, but also for planning of meetings between people.
  • Notepad - not kidding. With all those wonderful collaborative tools I described above it is easy to forget that individuals too need to plan their work. I found it very helpful to always split my work into little things to do which I write in a text file, then follow the plan. It helps a lot when someone interrupts you from your work and then you need to know where you left off.

This is by no means a comprehensive "all you ever need" list, just some of the tools that I found really useful. However, just paying for them and installing them is not enough. They need to work together. And that's where you absolutely need a role, a person who will take care of all the software, bind them together like the ring of Sauron, and make them seamless for your team which, after all, has people with other stuff to concern themselves with.

My experience


At the beginning at the sprint we would have a discussion on stories that we would like to address in it. Stories are descriptions of features that need to be implemented. They start as business cases "As [some role], I want [something] in order to achieve [some goal]" and together with the product owner (the person in your team that is closest to the client and represents their interests) the whole team discusses overall details and splits it into tasks of various general complexities. About here the team has a rough idea of what will be part of the sprint. Documents are created with the story specifications (using Confluence, for example).

Tasks are technical in nature, so only the technical part of the team, like developers and testers, continue to discuss them. The important part here is to determine the implementation details and estimate complexity in actual time. At the end of this meeting, stories are split into tasks that have people attached to them and are estimated in hours. Now the team can say with some semblance of certainty what tasks can and which cannot be part of the sprint, taking into consideration the skill of the people in the team and their availability in the coming sprint.

Now, with the sprint set, you start work. Developers are writing technical briefs on the desired implementation, eventually the changes, comments, difficulties encountered, etc. (also using Confluence). For each feature brief you will have a technical brief, split into the tasks above. Why waste time with writing documentation while you are developing? Because whenever someone comes and asks what that feature is about, you send them a link. Because whenever you want to move the project to another team, they only have to go to one place and see what has been planned, developed and what is the status of work. The testing team also writes documents on how to proceed with testing. Testing documents will be reviewed by developers and only when that is done, the testing can proceed. Testing documents may seem a bit much, but with this any tester can just do any of the work. With a clear plan, you can take a random person and ask them to test your software. I am not trying to minimize the importance of professional testing skills, but some times you actually want people that are not associated with the development to do the testing. They will note any friction in using your software exactly because they are not close to you and have never used your product before. Also, an important use of testing documents is creating unit tests, which is a developer task.

When a piece of code has been submitted to source control, it must be reviewed (Code Collab) and the deployment framework (Jenkins) will create a task for code review (Code Collab) automate a deploy and run all unit tests and static analysis. If any of them fail, a task will be created in the task manager (Jira). When testing finds bugs, they create items in the bug tracker (Jira). Developers will go over the tasks in Jira and fix what needs fixing. Email will be sent to notify people of new tasks, comments or changes in task status. Meetings to smooth things over and discuss things that are not clear will be created in the meeting software (Exchange), where conflicts will be clear to see and solve. BTW, the room where the meeting is to take place also needs to be tracked, so that meetings don't overlap in time and space.

Wait, you will say, but you are talking more of writing documents and going to meetings and less of software writing. Surely this is a complete waste of time!

No one reasonably expects to work the entire time writing code or testing or whatever their main responsibility is. Some companies assume developers will write code only four hours out of eight. The rest is reserved for planning, meetings, writing documentation and other administrative tasks. Will you write twice as much if you code for eight hours a day? Sure. Will anyone remember what you wrote in two weeks time? No. So the issue is not if you are writing, reading documents and going to meetings instead of coding, but if you are doing all that instead of trying to figure out what the code wanted to do, what it actually does and who the hell wrote that crap (Oh, it was me, shit!). Frankly, in the long run, having a clear picture of what you want and what the entire team did wins big time.

Also please take note on how software integrates with everything to make things better for you. At my current place of work they use the same software, but it is not integrated. People create code reviews if they feel like it, bugs are not related to commits, documentation is vague and out of date, unit tests are run if someone remembers to run them and then no one does because some of them fail and no one remembers why or even wants to take a look. Tooling is nothing if it doesn't help you, if it doesn't work well together. Automate everything!

What I described above worked very well. Were people complaining about too many meetings? Yes. Was the Scrum Master stopping every second discussion in daily meetings because they were meandering out of scope? Sure. Was everybody hating said Scrum Master for interrupting them when saying important stuff then making them listen to all that testing and design crap? Of course. Ignore for a moment you will never get three people together and not have one complain. The complaints were sometimes spot on. As the person responsible for the process you must continuously adapt it to needs. Everybody in an agile team needs to be agile. Sometimes you need to remove or shorten a step, then restore them to their original and then back again. You will never have all needs satisfied at the same time, but you need to take care of the ones that are most important (not necessarily more urgent).

For example, I experienced this process for a web software project that had one hundred thousand customers that created web sites for millions of people that would visit them. There were over seventy people working on it. The product was solid and in constant use for years and years. Having a clear understanding of what is going on in it was very important. Your project might be different. A small tool that you use internally doesn't need as much testing, since bugs will appear with use. It does need documentation, so people know how to use it. A game that you expect people to play en masse for a year then forget about it also has some other requirements.

For me, the question for all cases is the same: "What would you rather be doing?". The process needs to take into account the options and choose the best ones. Would you rather have a shitty product that sells quickly and then is thrown away? Frankly, some people reply wholeheartedly yes, but that's because they would rather work less and earn more. But would you be rather reading code written by someone else and trying to constantly merge your development style with what you think they wanted to do there or reading a document that explains clearly what has been done and why? Would you rather expect people to code review when they feel like it or whenever something is put in source control? Answer this question for all members of the team and you will clearly see what needs to be done. The hard part will be to sell it to your boss, as always.

Daily meeting may not be that important if all the tasks take more than three days, but then you have to ask yourself why do tasks take so long. Could you have split development more finely? Aftermath or postmortem meetings are also essential. In them you discuss what went well and what went wrong in the previous sprint. It gives you the important feedback you need in order to adapt the process to your team and to always, always improve.

Development


I mentioned Continuous Integration above. It's the part that takes every bit of code committed, deploys that changelist, and runs all unit tests to see if anything fails. At the end you either have all green or the person that committed the code is notified that they broke something. Not only it gives more confidence for the code written, but it also promotes unit testing, which in turn promotes a modular way of writing code. When you know you will write unit tests for all written code you will plan that code differently. You will not make methods depend on the current time, you will take all dependencies and plug them in your class without hardcoding them inside. Modularity promotes code that is easy to understand by itself, easy to read, easy to maintain and modify.

Documenting work before it starts makes it easy to find any issues before code writing even began, it makes for easy development of tasks that are implemented one after the other. Reviewing test plans makes developers see their work from a different perspective and have those nice "oh, I didn't think of that" moments.

And code review. If you need to cut as much as possible from process, remove everything else before you abandon code review. Having people read your code and shit all over it may seem cruel and unusual punishment, but it helps a lot. You cannot claim to have easy to understand code if no one reads it. And then there are those moments when you are too tired to write good code. "Ah, it works just the same" you think, while you vomit some substandard code and claim the task completed. Later on, when you are all rested, you will look at the code and wonder "What was I thinking?". If someone else writes something exactly like that you will scold them fiercely for being lazy and not elegant. People don't crap all over you because they hate you, but because they see problems in your code. You will take that constructive criticism and become a better developer, a better team player and a better worker for it.

Conclusions


In this post I tried to show you less how things ought to be, but how things can be. It must be your choice to change things and how to do it. Personally I enjoyed a lot this kind of organization that I felt was freeing me from all the burdens of daily work except the actual software writing that I love doing. From the managerial standpoint it also makes sense to never waste someone's skills for anything else than what they are good at. I was amazed to see how people not only not work like this, but a lot of them don't even knew this was possible and some even outright reject it.

In the way of criticism I've heard a lot of "People before process" and "We want to work here" and "We are here to have fun" and other affirmations that only show how people don't really get it. I've stated many a time above that the process needs to adapt to people; yes, it is very important and it must be enforced, but with the needs of the team put always first. This process speeds up, makes more clear and improves the work, it doesn't stop or delay it. As for fun, working with clear goals in mind, knowing you have the approval of everyone in the team and most wrinkles in the design have been smoothed out before you even began writing code is nothing but fun. It's like a beautiful woman with an instruction manual!

However, just because process can help you, doesn't mean it actually does. It must be implemented properly. It is extremely easy to fall into the "Process over people" trap, for example, and then the criticism above is not only right, but pertinent. You need to do something about it. It doesn't help that process improves productivity ten fold if all the people in the team are malcontent and work a hundred times slower just because you give them no alternative.

Another pitfall is to conceive the most beautiful process in the world, make all agile teams be productive and happy, then fail to synchronize them. In other words you will have several teams that all work as one, but the group of teams is itself not a team. That's why we had something called the Scrum of Scrums, when representatives from each team would meet and discuss goals and results.

Now ask yourself, what would you rather be doing?

Friday, 9 December 2016

They are here!

The machines are here. They look like us, they walk like us, they speak like us, but they are not like us. Open your eyes and carefully look around, search for the suspect, for the out of place. Their actions give them away.

They walk to their destination if it is more efficient than driving. They always take the same route to get there, too, once they found out which one is the best. They are courteous for no reason, never get angry - unless it serves their nefarious purpose, they don't swear or do meaningless things. You will never see a machine throwing garbage on the ground. They are obsessively clean and never smell of anything. Watch out for people helping others with no apparent goal. Are they real people? In the office they will say hello to you even early mornings, then get to work almost immediately. Whenever you interrupt them from their tasks they will gladly stop whatever they are doing and listen to your problems. Be wary of people that never complain, a clear indicator of their origin.

Don't get fooled by their deception. You will see machines at restaurants, eating and drinking, even going to the toilet. They are only maintaining appearances. See how they will not shout at the waiter even if served poorly, watch them take blame for not looking at the price on the menu before ordering or for spilling a drink. By the way, that's also an act. Their superior agility would never allow them to do anything by accident. They are clever, but don't let them outsmart you. Some will appear to enjoy art, like paintings or sculptures or classical music, but most will have adapted and fake enjoying normal things like movies. They will be the ones that you will not notice in the cinema hall. They will not use their mobile devices, they will not talk over the movie, they will rarely eat anything from the entrance shop, but when they do they do it quietly. It's always the quiet ones.

Couples, even accompanied by children or pets, may be machines. Their children will be uncharacteristically mild mannered and well behaved. Their dogs will not bark or try to bite in anger, and their poop will be collected and thrown in the garbage rather than left behind. They all can be machines. The way they blend in our society is so complete and subtle that you will see people living together with machines and not know it. But you can still recognize them by the way they considerately care for the other person, even after long years of companionship. They don't seem to grasp the concept of getting fed up with another living being.

Their greatest trick, though, is behaving as they have our well being at heart. They do not. Slowly, subtly, underhandedly, they change the world and take away our humanity, turn us into soulless beings like them. Wake up! Do not be fooled! Rise up and destroy them all before it is too late!

Saturday, 3 December 2016

Dependency Injection, Inversion of Control, Testability and other nice things

I am mentally preparing for giving a talk about dependency injection and inversion of control and how are they important, so I intend to clarify my thoughts on the blog first. This has been spurred by seeing how so many talented and even experienced programmers don't really understand the concepts and why they should use them. I also intend to briefly explore these concepts in the context of programming languages other than C#.

And yes, I know I've started an ASP.Net MVC exploration series and stopped midway, and I truly intend to continue it, it's just that this is more urgent.

Head on intro


So, instead of going to the definitions, let me give you some examples, instead.
public class MyClass {
public IEnumerable<string> GetData() {
var provider=new StringDataProvider();
var data=provider.GetStringsNewerThan(DateTime.Now-TimeSpan.FromHours(1));
return data;
}
}
In this piece of code I create a class that has a method that gets some text. That's why I use a StringDataProvider, because I want to be provided with string data. I named my class so that it describes as best as possible what it intends to do, yet that descriptiveness is getting lost up the chain when my method is called just GetData. It is called so because it is the data that I need in the context of MyClass, which may not care, for example, that it is in string format. Maybe MyClass just displays enumerations of objects. Another issue with this is that it hides the date and time parameter that I pass in the method. I am getting string data, but not all of it, just for the last hour. Functionally, this will work fine: task complete, you can move to the next. Yet it has some nagging issues.

Dependency Injection


Let me show you the same piece of code, written with dependency injection in mind:
public class MyClass {
private IDataProvider _dataProvider;
private IDateTimeProvider _dateTimeProvider;

public void MyClass(IDataProvider dataProvider, IDateTimeProvider dateTimeProvider) {
this._dataProvider=dataProvider;
this._dateTimeProvider=dateTimeProvider;
}

public IEnumerable<string> GetData() {
var oneHourBefore=_dateTimeProvider.Now-TimeSpan.FromHours(1);
var data=_dataProvider.GetDataNewerThan(oneHourBefore);
return data;
}
}
A lot more code, but it solves several issues while introducing so many benefits that I wonder why people don't code like this from the get go.

Let's analyse this for a bit. First of all I introduce a constructor to MyClass, one that accepts and caches two parameters. They are not class types, but interfaces, which declare the intention for any class implementing them. The method then does the same thing as in the original example, using the providers it cached. Now, when I write the code of the class I don't actually need to have any provider implementation. I just declare what I need and worry about it later. I also don't need to inject real providers, I can mock them so that I can test my class as standalone. Note that the previous implementation of the class would have returned different data based on the system time and I had no way to control that behavior. The best benefit, for me, is that now the class is really descriptive. It almost reads like English: "Hi, folks, I am a class that needs someone to give me some data and the time of day and I will give you some processed data in return!". The rule of thumb is that for each method, external factors that may influence its behavior must be abstracted away. In our case if the date time provider provides the same time and the data provider the same data, the effect of the method is always the same.

Note that the interface I used was not IStringDataProvider, but IDataProvider. I don't really care, in my class, that the data is a bunch of strings. There is something called the Single Responsibility Principle, which says that a class or a method or some sort of unit of computation should try to only have one responsibility. If you change that code, it should only affect one area. Now, real life is a little different and classes do many things in many directions, yet they can implement any number of interfaces. The interfaces themselves can declare only one responsibility, which is why this is so nice. I don't actually have to have a class that is only a data provider, but in the context of my class, I only need that part and I am clearly declaring my intent in the code.

This here is called dependency injection, which is a fancy expression for saying "my code receives all third party instances as parameters". It is also in line with the Single Responsibility Principle, as now your class doesn't have to carry the responsibility of knowing how to instantiate the classes it needs. It makes the code more modular, easier to test, more legible and more maintainable.

But there is a problem. While before I was using something like new MyClass().GetData(), now I have to push the instantiation of the providers somewhere up the stream and do maybe something like this:
var dataProvider=new StringDataProvider();
var dateTimeProvider=new DateTimeProvider();
var myClass=new MyClass(dataProvider,dateTimeProvider);
myClass.GetData();
The apparent gains were all for naught! I just pushed the same ugly code somewhere else. But here is where Inversion of Control comes in. What if you never need to instantiate anything again? What it you never actually had to write any new Something() code?

Inversion of Control


Inversion of Control actually takes over the responsibility of creating instances from you. With it, you might get this code instead:
public interface IMyClass {
IEnumerable<string> GetData();
}

public class MyClass:IMyClass {
private IDataProvider _dataProvider;
private IDateTimeProvider _dateTimeProvider;

public void MyClass(IDataProvider dataProvider, IDateTimeProvider dateTimeProvider) {
this._dataProvider=dataProvider;
this._dateTimeProvider=dateTimeProvider;
}

public IEnumerable<string> GetData() {
var oneHourBefore=_dateTimeProvider.Now-TimeSpan.FromHours(1);
var data=_dataProvider.GetDataNewerThan(oneHourBefore);
return data;
}
}
Note that I created an interface for MyClass to implement, one that declares my GetData method. Now, to use it, I could write something like this:
var myClass=Dependency.Get<IMyClass>();
myClass.GetData();

Wow! What happened here? I just used a magical class called Dependency that gets me an instance of IMyClass. And I really don't care how it does it. It can discover implementations by itself or maybe I am manually binding interfaces to implementations when the application starts (for example Dependency.Bind<IMyClass,MyClass>();). When it needs to create a new MyClass it automatically sees that it needs two other interfaces as parameters, so it gets implementations for those first and continues up the chain. It is called a dependency chain and the container will go through it all to simply "Get" you what you need. There are many inversion of control frameworks out there, but the concept is so simple that one can make their own easily.

And I get another benefit: if I want to display some other type of data, all I have to do is instruct the dependency container that I want another implementation for the interface. I can even think about versioning: take a class that I know does the job and compare it with a new implementation of the same interface. I can tell it to use different versions based on the client used. And all of this in exactly one place: the dependency container bindings. You may want to plug different implementations provided by third parties and all they have to care about is respecting the contract in your interface.



Solution structure


This way of writing code forces some changes in the structure of your projects. If all you have is written in a single project, you don't care, but if you want to split your work in several libraries, you have to take into account that interfaces need to be referenced by almost everything, including third party modules that you want to plug. That means the interfaces need their own library. Yet in order to declare the interfaces, you need access to all the data objects that their members need, so your Interfaces project needs to reference all the projects with data objects in them. And that means that your logic will be separated from your data objects in order to avoid circular dependencies. The only project that will probably need to go deeper will be the unit and integration test project.

Bottom line: in order to implement this painlessly, you need an Entities library, containing data objects, then an Interfaces library, containing the interfaces you need and, maybe, the dependency container mechanism, if you don't put it in yet another library. All the logic needs to be in other projects. And that brings us to a nice side effect: the only connection between logic modules is done via abstractions like interfaces and simple data containers. You can now substitute one library with another without actually caring about the rest. The unit tests will work just the same, the application will function just the same and functionality can be both encapsulated and programatically described.

There is a drawback to this. Whenever you need to see how some method is implemented and you navigate to definition, you will often reach the interface declaration, which tells you nothing. You then need to find classes that implement the interface or to search for uses of the interface method to find implementations. Even so, I would say that this is an IDE problem, not a dependency injection issue.

Other points of view


Now, the intro above describes what I understand by dependency injection and inversion of control. The official definition of Dependency Injection claims it is a subset of Inversion of Control, not a separate thing.

For example, Martin Fowler says that when he and his fellow software pattern creators thought of it, they called it Inversion of Control, but they decided that it was too broad a term, so they moved to calling it Dependency Injection. That seems strange to me, since I can describe situations where dependencies are injected, or at least passed around, but they are manually instantiated, or situations where the creation of instances is out of the control of the developer, but no dependencies are passed around. He seems to see both as one thing. On the other hand, the pattern where dependencies are injected by constructor, property setters or weird implementation of yet another set of interfaces (which he calls Dependency Injection) is different from Service Locator, where you specifically ask for a type of service.

Wikipedia says that Dependency Injection is a software pattern which implements Inversion of Control to resolve dependencies, while it calls Inversion of Control a design principle (so, not a pattern?) in which custom-written portions of a computer program receive the flow of control from a generic framework. It even goes so far as to say Dependency Injection is a specific type of Inversion of Control. Anyway, the pages there seem to follow the general definitions that Martin Fowler does, which pits Dependency Injection versus Service Locator.

On StackOverflow a very well viewed answer sees dependency injection as "giving an object its instance variables". I tend to agree. I also liked another answer below that said "DI is very much like the classic avoiding of hardcoded constants in the code." It makes one think of a variable as an abstraction for values of a certain type. Same page holds another interesting view: "Dependency Injection and dependency Injection Containers are different things: Dependency Injection is a method for writing better code, a DI Container is a tool to help injecting dependencies. You don't need a container to do dependency injection. However a container can help you."

Another StackOverflow question has tons of answers explaining how Dependency Injection is a particular case of Inversion of Control. They all seem to have read Fowler before answering, though.

A CodeProject article explains how Dependency Injection is just a flavor of Inversion of Control, others being Service Locator, Events, Delegates, etc.

Composition over inheritance, convention over configuration


An interesting side effect of this drastic decoupling of code is that it promotes composition over inheritance. Let's face it: inheritance was supposed to solve all of humanity's problems and it failed. You either have an endless chain of classes inheriting from each other from which you usually use only one or two or you get misguided attempts to allow inheritance from multiple sources which complicates understanding of what does what. Instead interfaces have become more widespread, as declarations of intent, while composition has provided more of what inheritance started off as promising. And what is dependency injection if not a sort of composition? In the intro example we compose a date time provider and a data provider into a time aware data provider, all the time while the actors in this composition need to know nothing else than the contracts each part must abide by. Do that same thing with other implementations and you get a different result. I will go as far as to say that inheritance defines what classes are, while composition defines what classes do, which is what matters in the end.

Another interesting effect is the wider adoption of convention over configuration. For example you can find the default implementation of an interface as the class that implements it and has the same name minus the preceding "I". Rather than explicitly tell the framework that we want to use the Manager class each time someone needs an IManager implementation, it can figure it out for itself by naming alone. This would never work if the responsibility of getting class instances resided with each method using them.

Real life examples


Simple Injector


If you look on the Internet, one of the first dependency injection frameworks you find for .Net is Simple Injector, which works on every flavor of .Net including Mono and Core. It's as easy to use as installing the NuGet package and doing something like this:
// 1. Create a new Simple Injector container
var container = new Container();

// 2. Configure the container (register)
container.Register<IUserRepository, SqlUserRepository>(Lifestyle.Transient);
container.Register<ILogger, MailLogger>(Lifestyle.Singleton);

// 3. Optionally verify the container's configuration.
container.Verify();

// 4. Get the implementation by type
IUserService service = container.GetInstance<IUserService>();

ASP.Net Core


ASP.Net Core has dependency injection built in. You configure your bindings in ConfigureServices:
public void ConfigureServices(IServiceCollection svcs)
{
svcs.AddSingleton(_config);

if (_env.IsDevelopment())
{
svcs.AddTransient<IMailService, LoggingMailService>();
}
else
{
svcs.AddTransient<IMailService, MailService>();
}

svcs.AddDbContext<WilderContext>(ServiceLifetime.Scoped);

// ...
}
then you use any of the registered classes and interfaces as constructor parameters for controllers or even using them as method parameters (see FromServicesAttribute)

Managed Extensibility Framework


MEF is a big beast of a framework, but it can simplify a lot of work you would have to do to glue things together, especially in extensibility scenarios. Typically one would use attributes to declare which interface something "exports" and then use other attributes to "import" implementations in properties and values. All you need to do is put them in the same place. Something like this:
[Export(typeof(ICalculator))]
class SimpleCalculator : ICalculator {
//...
}

class Program {

[Import(typeof(ICalculator))]
public ICalculator calculator;

// do something with calculator
}
Of course, in order for this to work seamlessly you need stuff like this, as well:
private Program()
{
//An aggregate catalog that combines multiple catalogs
var catalog = new AggregateCatalog();
//Adds all the parts found in the same assembly as the Program class
catalog.Catalogs.Add(new AssemblyCatalog(typeof(Program).Assembly));
catalog.Catalogs.Add(new DirectoryCatalog("C:\\Users\\SomeUser\\Documents\\Visual Studio 2010\\Projects\\SimpleCalculator3\\SimpleCalculator3\\Extensions"));


//Create the CompositionContainer with the parts in the catalog
_container = new CompositionContainer(catalog);

//Fill the imports of this object
try
{
this._container.ComposeParts(this);
}
catch (CompositionException compositionException)
{
Console.WriteLine(compositionException.ToString());
}
}

Dependency Injection in other languages


Admit it, C# is great, but it is not by far the most used computer language. That place is reserved, at least for now, for Javascript. Not only is it untyped and dynamic, but Javascript isn't even a class inheritance language. It uses the so called prototype inheritance, which uses an instance of an object attached to a type to provide default values for the instance of said type. I know, it sounds confusing and it is, but what is important is that it has no concept of interfaces or reflection. So while it is trivial to create a dictionary of instances (or functions that create instances) of objects which you could then use to get what you need by using a string key (something like var manager=Dependency.Get('IManager');, for example) it is difficult to imagine how one could go through the entire chain of dependencies to create objects that need other objects.

And yet this is done, by AngularJs, RequireJs or any number of modern Javascript frameworks. The secret? Using regular expressions to determine the parameters needed for a constructor function after turning it to string. It's complicated and beyond the scope of this blog post, but take a look at this StackOverflow question and its answers to understand how it's done.

Let me show you an example from AngularJs:
angular.module('myModule', [])
.directive('directiveName', ['depService', function(depService) {
// ...
}])
In this case the key/type of the service is explicit using an array notation that says "this is the list of parameters that the dependency injector needs to give to the function", but this might be have been written just as the function:
angular.module('myModule', [])
.directive('directiveName', function(depService) {
// ...
})
In this case Angular would use the regular expression approach on the function string.


What about other languages? Java is very much like C# and the concepts there are similar. Even if all are flavors of C, C++ is very different, yet Dependency Injection can be achieved. I am not a C++ developer, so I can't tell you much about that, but take a look at this StackOverflow question and answers; it is claimed that there is no one method, but many that can be used to do dependency injection in C++.

In fact, the only languages I can think of that can't do dependency injection are silly ones like SQL. Since you cannot (reasonably) define your own types or pass functions along, the concept makes no sense. Even so, one can imagine creating dummy stored procedures that other stored procedures would use in order to be tested. There is no reason why you wouldn't use dependency injection if the language allows for it.

Testability


I mentioned briefly unit testing. Dependency Injection works hand in hand with automated testing. Given that the practice creates modules of software that give reproducible results for the same inputs and account for all the inputs, testing becomes a breeze. Let me give you some examples using Moq, a mocking library for .Net:
var dateTimeMock=new Mock<IDateTimeProvider>();
dateTimeMock
.Setup(m=>m.Now)
.Returns(new DateTime(2016,12,03));

var dataMock=new Mock<IDataProvider>();
dataMock
.Setup(m=>m.GetDataNewerThan(It.IsAny<DateTime>()))
.Returns(new[] { "test","data" });

var testClass=new MyClass(dateTimeMock.Object, dataMock.Object);

var result=testClass.GetData();
AssertDeepEqual(result,new[] { "test","data" });

First of all, I take care of all dependencies. I create a "mock" for each of them and I "set up" the methods or property setters/getters that interest me. I don't really need to set up the date time mock for Now, since the data from the data provider is always the same no matter the parameter, but it's there for you to see how it's done. Second, I instantiate the class I want to test using the Object property of my mocks, which returns an object that implements the type given as a generic parameter in the constructor. Third I assert that the side effects of my call are the ones I expect. The mocks need to be as dumb as possible. If you feel you need to write code to define your mocks you are probably doing something wrong.

The type of the tests, for people who are not familiar with this concept, is usually a fully positive one - that is give full valid data and expect the correct result - followed by many negative ones, where the correct data is made incorrect in all possible ways and it is tested that the method fails. If there are many types of combinations of data that would be considered valid, you need a test for as many of them.

Note that the test is instantiating the test class directly, using the constructor. We are not testing the injector here, but the actual class.

Conclusions


What I appreciate most with Dependency Injection is that it forces you to write code that has clear boundaries defined by interfaces. Once this is achieved, you can go write your own stuff and not care about what other people do with theirs. You can test your modules without even caring if the rest of the project even exists. It allows to refactor code in steps and with a lot more confidence since you are covered by unit tests.

While some people work on fire-and-forget projects, like small games or utilities, and they don't care about maintainability, one of the most touted reasons for using unit tests and dependency injection, these practices bring so many other benefits that are almost impossible to get otherwise.

The entire point of this is reducing the complexity of dependencies, which include not only the modules in your application, but also the support frame for them, like people working on them. While some managers might not see the wisdom of reducing friction between software components, surely they can see the positive value of reducing friction between people.

There was one other topic that I wanted to touch, but it is both vast and I have not enough experience with it, however it feels very attractive to me: refactoring old code in order to use dependency injection. Best practices, how to make it safe enough and fast enough to make managers approve it and so on. Perhaps another post later on. I was thinking of a combination of static analysis and automated methods, like replacing all usages of "new" with a single point of instantiation, warning about static methods and properties, automatically replacing known bad practices like DateTime.Now and so on. It might be interesting, right?

I hope I wasn't too confusing and I appreciate any feedback you have. I will be working on a presentation file with similar content, so any help will go into doing a better job explaining it to others.

Wednesday, 23 November 2016

The perils of giving your data objects methods

A colleague of mine hit a strange bug today. It so happened that we use a bastardized dependency injection method that takes into account the WCF session before returning an implementation of an interface. In a piece of code the injection failed and we couldn't see why for a while. Let me give you a simplified version:
var someManager=Package.Get<IManager>();
var someDTOs=Cache.GetDatabaseObjects().Select(x=>x.Pack());

public class DataObject {
public string Data {get;set;}
public DataObjectDTO Pack() {
var anotherManager=Package.Get<IAnother>();
return new DataObjectDTO {
Data=anotherManager.Process(Data)
};
}
}

Package.Get will attempt to find a session object and if not it will use another mechanism, but if it finds one, it will only use it if it is not expired or invalid, else throwing an exception. This code failed in the Pack method, when trying to get an instance of IAnother. Please take a few moments to reflect on why (and no, it's not that between calls the session expired).

Show explanation

Monday, 21 November 2016

The expression being assigned to '[something] must be constant when using integers inside strings

I've stumbled upon a very funny exception today. Basically I was creating a constant string from adding some other constant strings to each other. And it worked. The moment I added an integer, though, I got The expression being assigned to 'Program.x2' must be constant. The code that generated this error is simple:
const string x2 = "string" + 2;
Note that
const string x2 = "string" + "2";
is perfectly valid. Got the same result when using VS2010 and VS2015, so it's not a compiler bug, it's intended behavior.

So, what's going on? Well, my code transforms behind the scenes into
const string x2 = "string" + 2.ToString();
which is not constant because of ToString!

The only way to solve it was to declare the numeric constant as string as well.

Sunday, 20 November 2016

Getting random rows from a table in T-SQL: TABLESAMPLE [instructional post, but not a recommended method]

This clause is so obscure that I couldn't even find the Microsoft reference page for it for a few minutes, so no wonder I didn't know about it. Introduced in SQL Server 2005, the TABLESAMPLE clause limits the number of rows returned from a table in the FROM clause to a sample number or PERCENT of rows.

Usage:
TABLESAMPLE (sample_number [ PERCENT | ROWS ] ) [ REPEATABLE (repeat_seed) ]

REPEATABLE is used to set the seed of the random number generator so one can get the same result if running the query again.

It sounds great at the beginning, until you start seeing the limitations:
  • it cannot be applied to derived tables, tables from linked servers, and tables derived from table-valued functions, rowset functions, or OPENXML
  • the number of rows returned is approximate. 10 ROWS doesn't necessarily return 10 records. In fact, the functionality underneath transforms 10 into a percentage, first
  • a join of two tables is likely to return a match for each row in both tables; however, if TABLESAMPLE is specified for either of the two tables, some rows returned from the unsampled table are unlikely to have a matching row in the sampled table.
  • it isn't even that random!

Funny enough, even the reference page recommends a different way of getting a random sample of rows from a table:
SELECT * FROM Sales.SalesOrderDetail
WHERE 0.01 >= CAST(CHECKSUM(NEWID(), SalesOrderID) & 0x7fffffff AS float) / CAST (0x7fffffff AS int)

Even if probably not really usable, at least I've learned something new about SQL.

Update:
More about getting random samples from a table here, where it explains why ORDER BY NEWID() is not the way to do it and gives hints of what really happens in the background when we invoke TABLESAMPLE.
Another interesting article on the subject, focused more on the statistical probability, can be found here, where it also shows how TABLESAMPLE's cluster sampling may fail in spectacular ways.

Sunday, 6 November 2016

Am I a good person?

I am often left dumbfounded by the motivations other people are assigning to my actions. Most of the time it is caused by their self-centeredness, their assumption that whatever I do is somehow related more to them than to me. And it made me think: am I a good/bad person, or is it all a matter of perception from others?

I rarely feel like I do something out of the ordinary for other people; instead I do it because that's who I am. I help a colleague because I like to help or I refuse to do so because I feel that what I am doing is more important. Same with friends or romantic relationships. Sometimes I need to make an effort to do something, but it's still my choice, my assessment of the situation and my decision to go a certain way. It's not a value judgment on the person, it's not an asshole move or some out of my way effort to improve their life. What I do IS me.

It's also a weird direction of reasoning, since I am aware of the physical impossibility for "free will" and I subscribe to the school of thought that it is all an illusion. I mean, logic dictates that either the world works top-bottom, with some central power of will trickling down reality or it is merely a manifestation of low level forces and laws of physics that lead inexorably towards the reality we perceive. In other words, if you believe in free will, you have to believe in some sort of god, and I don't. Yet living my life as if I have no free will makes no sense either. I need to play the game if I am to play the game. It's kind of circular.

Getting back to my original question: Isn't good or bad just a label I (and other people) assign to a pattern of behavior that belongs to me? And not before I do things, but always afterwards. Just like the illusion of free will there is the illusion of moral quality that guides my path. While one cannot quantify free will, they can measure the effect my behavior has on their life and goals and determine a value. But then is my "goodness" something like an average? Because then it would be more important the number of people I am affecting, rather than the absolute value of the effect per person. Who cares I help a colleague or pay attention to my wife? In the big sea of people, I am just a small fish that affects a few other small fish. We could all die tomorrow in the belly of a whale, all that goodness pointless.

So here I am, asking essentially a "who am I" question - painfully aware it has no final answer - in a world I think is determined by tiny laws of physics that create the illusion of self and with a quantity of consequence that is irrelevant even if it weren't so. I am torturing myself for no good reason, ain't I?

Yet the essence of the question still intrigues me. Is it necessary that I feel a good drive for my actions to be a good person, or is it a posterior calculation of their effect that determines that? If I work really well and fast for a month and then I do less the next, is it that I did good work in the first month or that I am a lazy bastard in the second? If I pay attention to someone or make a nice gesture, is it something to be lauded, or something to be criticized when I don't do it all the time? Is this a statistical problem or an issue of causality?

And I have to ask this question because if I feel no particular drive to do something and just "am myself", I don't think people should assign all kind of stupid motivations to my actions. And if I need to make this sustained effort to go outside my routine just to gain moral value... well, it just feels like a lot of bother. And I have to ask it because the same reasoning can be applied to other people. Is my father making terrible efforts to take care of just about everybody in his life, making him some sort of saint, or is it just what he does and can't help himself, in which case he's just a regular dude?

Personally I feel that I am just an amalgamation of experiences that led to the way I behave. I am neither good nor evil and my actions define me more than my intentions. While there is some sort of consistency that can be statistically assessed, it is highly dependent on the environment and any inference would go down the drain the moment that environment changes. But then, how can I be a good person? And does it even matter?

Saturday, 29 October 2016

Controlling JSON serialization in .Net Core Web API (Serialize enum values as strings, not integers)

.Net Core Web API uses Newtonsoft's Json.NET to do JSON serialization and for other cases where you wanted to control Json.NET options you would do something like
JsonConvert.DefaultSettings = (() =>
{
var settings = new JsonSerializerSettings();
// do something with settings
return settings;
});
, but in this case it doesn't work. The way to do it is to use the fluent interface method and hook yourself in the ConfigureServices(IServiceCollection services) method, after the call to .AddMvc(), like this:
services
.AddMvc()
.AddJsonOptions(options =>
{
var settings=options.SerializerSettings;
// do something with settings
});

In my particular case I wanted to serialize enums as strings, not as integers. To do that, you need to use the StringEnumConverter class. For example if you wanted to serialize the Gender property of a person as a string you could have defined the entity like this:
public class Person
{
public string Name { get; set; }
[JsonConverter(typeof(StringEnumConverter))]
public GenderEnum Gender { get; set; }
}

In order to do this globally, add the converter to the settings converter list:
services
.AddMvc()
.AddJsonOptions(options =>
{
options.SerializerSettings.Converters.Add(new StringEnumConverter {
CamelCaseText = true
});
});

Note that in this case, I also instructed the converter to use camel case. The result of the serialization ends up as:
{"name":"James Carpenter","age":51,"gender":"male"}

Saturday, 22 October 2016

Beware LINQ OrderBy in performance sensitive cases

I was doing this silly HackerRank algorithm challenge and I got the solution correctly, but it would always time out on test 7. I wracked my brain on all sorts of different ideas but to no avail. I was ready to throw in the towel and check out other people solutions, only they were all in C++ and seemed pretty similar to my own. And then I've made a code change and the test passed. I had replaced LINQ's OrderBy with Array.Sort.

Intrigued, I started investigating. The idea was creating a sorted integer array from a space delimited string of values. I had used Console.ReadLine().Split(' ').Select(s=>int.Parse(s)).OrderBy(v=>v); and it consumed above 7% of the total CPU of the test. Now I was using var arr=Console.ReadLine().Split(' ').Select(s=>int.Parse(s)).ToArray(); Array.Sort(arr); and the CPU usage for that piece of the code was 1.5%. So it was almost five times slower. How do the two implementations differ?

Array.Sort should be simple: an in place quicksort, the best general solution for this sort (heh heh heh) of problem. How about Enumerable.OrderBy? It returns an OrderedEnumerable which internally uses a Buffer<T> to get all the values in a container, then uses an EnumerableSorter to ... quicksort the values. Hmm...

Let's get back to Array.Sort. It's not as straightforward as it seems. First of all it "tries" a SZSort. If it works, fine, return that. This is an external native code implementation of QuickSort on several native value types. (More on that here) Then it goes to a SorterObjectArray that chooses, based on framework target, to use either an IntrospectiveSort or a DepthLimitedQuickSort. Even the implementation of this DepthLimitedQuickSort is much, much more complex than the quicksort used by OrderBy. IntrospectiveSort seems to be the one preferred for the future and is also heavily optimized, but less complex and easier to understand, perhaps. It uses quicksort, heapsort and insertionsort together.

Now, before you go all "OrderBy sucks!", read more about it. This StackOverflow list of answers seems to indicate that in case of strings, at least, the performance is similar. A lot of other interesting things there, as well. OrderBy uses a "stable" QuickSort, meaning that two items that are compared as equal will appear in their original order. Array.Sort does not guarantee that.

Anyway, the performance difference in my particular case seems to come from the native code implementation of the sort for integers, rather than algorithmic improvements, although I don't have the time right now to grab the various implementations and test them properly. However, just from the way the code reads, I would bet the IntrospectiveSort will compare favorably to the simple Quicksort implementation used in OrderBy.

Thursday, 13 October 2016

My first DMCA notice

Today I received two DMCA notices. One of them might have been true, but the second was for a file which started with
/*
Copyright (c) 2010, Yahoo! Inc. All rights reserved.
Code licensed under the BSD License:
http://developer.yahoo.com/yui/license.html
version: 2.8.1
*/
Nice, huh?

The funny part is that these are files on my Google Drive, which are not used anywhere anymore and are accessible only by people with a direct link to them. Well, I removed the sharing on them, just in case. The DMCA is even more horrid than I thought. The links in it are general links towards a search engine for notices (not the link to the actual notice) and some legalese documents, the email it is coming from is noreply-6b094097@google.com and any hope that I might fight this is quashed with clear intention from the way the document is worded.

So remember: Google Drive is not yours, it's Google's. I wonder if I would have gotten the DMCA even if the file was not being shared. There is a high chance I would, since no one should be using the link directly.

Bleah, lawyers!

Monday, 10 October 2016

Disqus customer support is non existent [Blogger comment synchronization broken]

I have enabled Disqus comments on this blog and it is supposed to work like this: every old comment from Blogger has to be imported into Disqus and every new comment from Disqus needs to be also saved in the Blogger system. Importing works just fine, but "syncing" does not. Every time someone posts a comment I receive this email:
Hi siderite,
 
You are receiving this email because you've chosen to sync your
comments on Disqus with your Blogger blog. Unfortunately, we were not
able to access this blog.
 
This may happen if you've revoked access to Disqus. To re-enable,
please visit:
https://siderite.disqus.com/admin/discussions/import/platform/blogger/
 
Thanks,
The Disqus Team
Of course, I have not revoked any access, but I "reenable" just the same only to be presented with a link to resync that doesn't work. I mean, it is so crappy that it returns the javascript error "e._ajax is undefined" for a line where e._ajax is used instead of e.ajax and even if that would have worked, it uses a config object that is not defined.

It doesn't really matter, because the ajax call just accesses (well, it should access) https://siderite.disqus.com/admin/discussions/import/platform/blogger/resync/. And guess what happens when I go there: I receive an email that the Disqus access in Blogger has been revoked.

No reply for the Disqus team for months, for me or anybody else having this problem. They have a silly page that explains that, of course, they are not at fault, Blogger did some refactoring and broke their system. Yeah, I believe that. They probably renamed the ajax function in jQuery as well. Damn Google!

Friday, 7 October 2016

Fragmentation in an SQL table with lots of inserts, updates and deletes

I've met an interesting case today when we needed to manipulate data from tens of thousands of people daily. Assuming we would use table rows for the information, then we get a table in which rows are constantly added, updated and deleted. The issue is with the space allocated in table pages.

SQL works like this: If it needs space it allocates some as a "page" which can contain more records. When you delete records the space is not reclaimed, it remains as is (this is called ghosting). The exception is when all records in a page are deleted, in which case the page is reused as an empty page. When you update a record with more data then it held before (like when you have a variable length column), the page is split, with the rest of the records on the page moved to a new page.

In a heap table (no clustered index) the space inside pages is reused for new records or for updated records that don't fit in their allocated space, however if you use a clustered index, like a primary key, the space is not reused, since there needs to be a correlation between the value of the column and its position in the page. And here lies the problem. You may end up with a lot of pages with very few records in them. A typical page is 8 kilobytes, so a table with a few integers in a record can hold hundreds of records on a single page.

Fragmentation can be within a page, as described above, also called internal, but also external, between pages, when the recycled pages are used for data that is out of order. To get a large swathe of records the disk might be worked hard in order to jump from page to page to get what is logically a continuous blob of data. It is disk input/output that kills a database.

OK, back to our case. A possible solution was to store all the data for a user in a "blob", a VARBINARY column. For reads or changes only the disk space occupied by the blob would be changed, with C# code handling everything. It's what is called trading CPU for IO, which is generally good. However this NoSql-like idea itself smelled badly to me. We are supposed to trust our databases, not work against them. The solution I chose is monitoring index fragmentation and occasionally issuing clustered index rebuilding or reorganizing. I am willing to bet that reading/writing the data equivalent to several pages of table is going to be more expensive than selecting the changes I want to make. Also, rebuilding the index will end up storing all the data per user in the same space anyway.

However, this case made me think. Here is a situation in which the solution might have been (and it was in a similar case implemented by someone else) to micromanage the way the database works. It made me question using a clustered index/primary key on a table.

These articles helped me understand more: