XD blog


hadoop, jython, pig, python


2014-11-27 Some really annoying things with Hadoop

When you look for a bug, it could be anywhere, so you make assumptions about what is working and what could be wrong. I lost a couple of hours because I made the wrong one. I was preparing my teachings and I had stored my data using Python serialization instead of json. That way, I could just use eval( string_serialized ). It already worked locally and on a first Hadoop cluster, where this instruction was embedded in a Python script that a PIG script called through streaming. I then tried the same instruction and the job failed many times before I checked that this line was involved. And why? There was no error message to tell me anything about what happened. The script was just crashing. I suspect the line was just too long, so I'll do that another way. My first assumption was that the problem came from the schema I used in my Jython script. I finally chose to solve that issue by considering only strings. I then commented out line after line until commenting this one out finally made my job work.

That's the part of computer science I don't like: long, full of guesses, impossible to accomplish without an internet connection to dig into the vast amount of recent and outdated examples. And maybe in a couple of months, this issue will be solved by a simple update.
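For what it is worth, here is a minimal sketch of how a streaming script could at least say which line broke, instead of crashing silently. It is not the script from that night: the names parse_record and run are hypothetical, it assumes each input line holds one Python literal (the repr of a dict or a list), and it swaps eval for ast.literal_eval, which only accepts literals. Whether what is written to stderr ends up somewhere readable depends on how the cluster collects task logs, so treat that part as an assumption too.

```python
import ast
import json
import sys


def parse_record(line):
    """Parse one line holding a Python literal (hypothetical helper)."""
    # ast.literal_eval refuses anything that is not a plain literal,
    # which makes it a safer stand-in for eval on serialized data.
    return ast.literal_eval(line.strip())


def run(stdin=sys.stdin, stdout=sys.stdout, stderr=sys.stderr):
    """Read literals from stdin, write JSON lines to stdout.

    Lines that cannot be parsed are reported on stderr with their
    number and length, then skipped. Returns the number of failures.
    """
    failures = 0
    for number, line in enumerate(stdin, start=1):
        if not line.strip():
            continue  # ignore empty lines
        try:
            record = parse_record(line)
        except Exception as exc:
            # Report the failure instead of letting the script die silently.
            failures += 1
            stderr.write("line %d (%d chars) failed: %r\n"
                         % (number, len(line), exc))
            continue
        stdout.write(json.dumps(record) + "\n")
    return failures


if __name__ == "__main__":
    run()
```

Logging the length of the failing line is a cheap way to test the "line too long" guess. Writing json on the output side means the next step does not need eval at all.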

This night never happened. That's what I'll keep in mind. This night never happened.



Xavier Dupré