Manipulate large files on S3 with ease

Working with large files (> 1 GB) on Windows is not easy, and many of the Linux shell commands are missing.

So if you have large files in S3 and you want to query, get or list them and then search their contents, you get stuck pretty fast.

I found these tools to be helpful:

S3 command line (http://s3.codeplex.com/), which lets you work from cmd or PowerShell with easy commands for manipulating the S3 file system (like get, put, auth, list and many more).

LTF, a large file viewer (http://www.swiftgear.com/ltfviewer/screenshots.html), which is best for CSV files because it shows a tooltip with the title of a field when you hover over it.

Go ahead and give them a try

Remote debugging Hadoop using Cloudera VM instead of Amazon’s EMR

Abstract

This setup lets you debug the map-reduce process on your local machine by running Hadoop in a VM, instead of debugging through logs in Amazon’s EMR.

Instructions

  1. Download the Cloudera QuickStart VM; follow the instructions here for the prerequisites and the download link. You can choose the VM type (VMware, VirtualBox, etc.)
  2. Extract the file and start the VM
    1. username: cloudera
    2. password: cloudera
  3. The VM should start with a Firefox browser open
    1. on the home page (or in the bookmarks) click Cloudera Manager and wait for it to initialize
    2. once you have it running you can issue hadoop commands in the terminal
    3. open a terminal and issue “hadoop fs -ls”, which should work but return nothing
  4. Make sure you have a connection between the host machine and the Cloudera guest VM
    1. issue an ifconfig command in the VM terminal to find the VM's IP
    2. try to ping that IP from the host
    3. if that does not work you will have to configure the VMware Player/VirtualBox network settings
      1. in VirtualBox, go to Network and select
        1. Attached to: Bridged Adapter
        2. Promiscuous Mode: Allow VMs
      2. Apply the new settings and wait for the network to work again
      3. Issue another ifconfig command to find your new IP
      4. Try to ping it again from the host
  5. On Windows only: do NOT try to use “Shared Folders” or restart the VM; this will cause the Cloudera services to stop functioning because of IP conflicts!
  6. Transfer your hadoop jar to the VM along with the following files to the same folder
    1. Run.sh
    2. Input file to process by hadoop
  7. Issue the following commands in the terminal to move the input file into Hadoop
    1. hadoop fs -put <the location of your data file> /user/cloudera/input
    2. Check that the file is there: hadoop fs -ls /user/cloudera/input
  8. Make sure your Run.sh file has the right data
    #start from an empty HADOOP_OPTS
    export HADOOP_OPTS=
    echo $HADOOP_OPTS
    
    #remote debug option enabled (the JVM waits for a debugger on port 5005)
    export HADOOP_OPTS="-Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=y,address=5005 -Dlocal=true"
    
    #remote debug option disabled
    #export HADOOP_OPTS="-Dlocal=true"
    
    #your jar file
    export JARP=<the hadoop jar name you want to execute>.jar
    
    #hadoop output
    export OUTPUT=/user/cloudera/out/
    
    #delete the output dir if exists
    hadoop fs -rm -r $OUTPUT
    
    ##########################regular run#############################
    hadoop jar $JARP <jar parameters go here, including the output folder we set above>
    ##################################################################
    
    
  9. Execute Hadoop: “./Run.sh”
  10. Now you will see in the terminal that the debugger is waiting for a session, so you need to connect from an IntelliJ session on the host that has your Hadoop jar code open
    1. Open IntelliJ on your host machine
    2. Open “Run/Debug Configurations”
    3. Create a new Remote configuration that points at the VM's IP and port 5005 (the address set in HADOOP_OPTS)
    4. Start the debug session
      i. Note that you may need to start the session twice if the command that runs before the job execution in Run.sh fails (the terminal will state that it is waiting for the debugger to attach)

Installing pyodbc on ubuntu > 12

I had many problems with it until I got it running. PyCharm (“intelligence”) won’t help you here, because you need to compile the lib first for it to work.

$ sudo apt-get install unixodbc-dev

$ sudo apt-get install g++

$ sudo apt-get install python-dev

$ sudo apt-get install tdsodbc

Download and extract the pyodbc source code (http://code.google.com/p/pyodbc/downloads/detail?name=pyodbc-3.0.7.zip&can=2&q=)

Run the setup file in the extracted directory:
$ python setup.py build
$ sudo python setup.py install

10 hours = desktop and mobile app for YouTube remote controller (hackathon)

The stack:

  • Angular JS
  • SignalR
  • MVC 5
  • VS 2013
  • MongoDB
  • Zurb Foundation 5.0
  • Android Client

The Mission:

  • Create a remote YouTube player (running on an Android device such as a TV or tablet) that is controlled from a web site

Scope

  • 1 day (10 hours)
  • 2 teams: .NET (my team) and Android

Repository

NuGet the good parts

Since we already had fair experience with SignalR, Mongo and of course MVC, the wiring was pretty easy. Installing the NuGet packages for SignalR, Mongo and Unity gave us a quick head start on setting up a repository pattern for Mongo and a simple hub for SignalR.

NuGet the bad parts

Too many flavors for each package: finding the right package is hell and feels like reaching into a bucket full of beers to find a single Coca-Cola bottle. Many packages come in 3-5 flavors (on a good day), which causes a lot of confusion and web searching; it seems that most vendors publish more than one flavor of the same package. This is just wrong: instead of a simple description and a single package, the confusion is vast. In most cases you find yourself creating a side project just to see what a package gives you and how it differs from the other flavors. For instance, there is “Unity bootstrapper for ASP.NET Web API” and “Unity bootstrapper for ASP.NET MVC”. Now, what if you are using both in your MVC project? This is just wrong.

No option to set package location: there is no option to set where each package's files should be installed; they simply land wherever the package creator thought they should be. Again, this is just wrong, since there is a very simple solution for this.

Angular JS the good parts

Organizing the project: getting started with Angular was quite easy, although you need to read at least the entire tutorial before you can get going. I liked that it organizes your project pretty well and leaves you no room to wonder how and where to place every file.

You can choose between feature-based and component-based organization, and since I’m coming from the WPF world, feature-based it is.

Scopes: scopes are pretty cool, since they act as your view model. Although Angular is more of an MVC framework and lacks the good parts of a true MVVM framework, it’s still pretty easy to get along with.

Angular JS the bad parts

Bindings: I wish Angular was more like Knockout, but I get it: the Google guys wanted to keep their fields clean, without having to declare each field as an observable and turn it into a function. Still, the binding and digest model can become very heavy, and the need for $apply is just cumbersome. Especially when working with the YouTube API (or any external API for that matter) you get to call $apply a lot, and that’s just wrong (again).
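To make the $apply complaint concrete, here is a toy model of dirty checking. It is a sketch only and not Angular's real implementation: ToyScope, its methods and onExternalEvent are hypothetical names written for this post. It shows a model change made by an external callback going unnoticed until something triggers a digest.

```javascript
// Toy model of dirty checking; NOT Angular's real implementation.
function ToyScope() {
  this.watchers = [];
}

// Register a getter to watch and a listener to call when its value changes.
ToyScope.prototype.$watch = function (getter, listener) {
  this.watchers.push({ getter: getter, listener: listener, last: undefined });
};

// Re-run every watcher until nothing changes any more (capped at 10 rounds).
ToyScope.prototype.$digest = function () {
  var self = this;
  var dirty;
  var rounds = 0;
  do {
    dirty = false;
    this.watchers.forEach(function (w) {
      var value = w.getter(self);
      if (value !== w.last) {
        var old = w.last;
        w.last = value;
        w.listener(value, old);
        dirty = true;
      }
    });
    rounds++;
  } while (dirty && rounds < 10);
};

// Run a function that changes the model, then digest so the watchers notice.
ToyScope.prototype.$apply = function (fn) {
  try {
    fn(this);
  } finally {
    this.$digest();
  }
};

var scope = new ToyScope();
var rendered = []; // stands in for the DOM updates a real binding would make
scope.state = 'stopped';
scope.$watch(
  function (s) { return s.state; },
  function (value) { rendered.push(value); }
);
scope.$digest(); // initial render

// A callback from an external API changes the model outside the framework...
function onExternalEvent(newState) {
  scope.state = newState;
}
onExternalEvent('playing'); // ...and no watcher runs, so nothing is rendered

// Wrapping the same kind of change in $apply triggers the digest.
scope.$apply(function () {
  onExternalEvent('paused');
});
```

In this toy, 'playing' is never rendered because nobody ran a digest after it was set. That is the situation you hit with callbacks from an external API, and why they end up wrapped in $apply.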

Service, Factory, and Provider: the difference between a provider and a factory is pretty clear. The difference between a factory and a service is not clear at all. If one of Tikal’s gurus hadn’t explained that a service provides a constructor while a factory just returns a singleton instance, I wouldn’t have understood it to this day. Still, it’s not that clear to me.

Moreover, I think the names (service, provider, factory) mean much the same thing in the plain way most programmers understand them. Really: what is the difference between a provider and a service? Doesn’t a service provide data?
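The way I understand the distinction now, it can be modeled in a few lines of plain JavaScript. This is a simplified stand-in and not Angular's injector: ToyInjector, playlistFactory and playlistService are hypothetical names. The only difference is how the instance gets created: a factory function's return value is used as is, while a service function is called with `new`. In both cases the result is created once and cached.

```javascript
// Simplified stand-in for an injector; hypothetical, not Angular's API.
function ToyInjector() {
  this.recipes = {};
  this.cache = {};
}

// factory: the function's RETURN VALUE becomes the shared instance.
ToyInjector.prototype.factory = function (name, fn) {
  this.recipes[name] = function () { return fn(); };
};

// service: the function is treated as a CONSTRUCTOR and called with `new`.
ToyInjector.prototype.service = function (name, Ctor) {
  this.recipes[name] = function () { return new Ctor(); };
};

// Either way, the instance is created on first use and then cached.
ToyInjector.prototype.get = function (name) {
  if (!(name in this.cache)) {
    this.cache[name] = this.recipes[name]();
  }
  return this.cache[name];
};

var injector = new ToyInjector();

// Factory style: build an object and return it.
injector.factory('playlistFactory', function () {
  var videos = [];
  return {
    add: function (id) { videos.push(id); return videos.length; }
  };
});

// Service style: attach members to `this`; the injector calls `new` for you.
injector.service('playlistService', function PlaylistService() {
  var videos = [];
  this.add = function (id) { videos.push(id); return videos.length; };
});

var fromFactory = injector.get('playlistFactory');
var fromService = injector.get('playlistService');
```

Seen this way, both end up as singletons, and the choice is mostly about whether you prefer writing a constructor or returning an object literal.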

Zurb Foundation the good parts

Mobile first: although Bootstrap 3 is mobile first by nature, it seems to me that Bootstrap is not 100% that way, and I’m still not sure its EM vs. pixels handling is well implemented. Foundation was very easy to get started with, and the documentation is a breeze.

Zurb Foundation the bad parts

Templates: there are no templates for Foundation, nothing even close to what Bootstrap has to offer.

Android Clients integration

Now, this is the interesting part. I thought the integration was going to be really hard, but it was rather simple. The Android app implemented a native player, since playing through HTML5 in the browser is not that recommended. Even Google passes you from the browser to the YouTube app when you want to play a movie.

So the deal was to build a hybrid app, one that displays a native player surrounded by HTML content. This seems like a good future for the app development ecosystem, where you can focus on HTML for compatibility between OSs and leave the hard parts for the native OS to do.

The basic idea was that the app would consume a SignalR service and implement the SignalR proxy for every method. We weren’t sure that the Android app could implement JS methods at runtime, and we finally came up with a different approach:

    ytRemotePlayerApp.controller('PlayerCtrl', ['$scope', '$http', '$rootScope', 'remotePlayerService',
      function ($scope, $http, $rootScope, remotePlayerService) {

        // SignalR client proxy; the hub calls the receive* methods below
        var client = remotePlayerService.getClientProxy();
        var playingVideoId;

        client.receivePlayCommand = function (videoId) {
            console.log('receivePlayCommand ' + videoId);

            // JBridge exists only when the page runs inside the Android app
            if (window.JBridge) {
                if (videoId === playingVideoId) {
                    // same video: resume playback
                    window.JBridge.play();
                } else {
                    // different video: remember it and start from scratch
                    playingVideoId = videoId;
                    window.JBridge.start(videoId);
                }
            }
        };

        client.receivePauseCommand = function (videoId) {
            console.log('receivePauseCommand ' + videoId);

            if (window.JBridge) {
                window.JBridge.pause();
            }
        };

      }]);

Notice that we look for window.JBridge, which is pushed into the page by the Android app. This way the desktop version is not prone to errors, and once the bridge is alive we can call the methods that were implemented natively via the bridge.

So combining SignalR and Android was easy. See the repo for more details. The mastermind behind this is Michael, who came up with this setup.
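The guard around window.JBridge could also be pulled into a small helper. This is only a sketch: callBridge and the fake window objects are hypothetical and not part of the app, and it is exercised here against plain JavaScript objects rather than the object injected by the Android WebView.

```javascript
// Forward a call to the native bridge if the page is hosted inside the Android app.
// `host` stands in for `window`; returns true when the call was forwarded.
function callBridge(host, method, args) {
  var bridge = host && host.JBridge;
  if (!bridge || typeof bridge[method] !== 'function') {
    return false; // desktop browser (or unknown method): do nothing
  }
  bridge[method].apply(bridge, args || []);
  return true;
}

// Desktop: there is no JBridge on the host object, so the call is a no-op.
var desktopResult = callBridge({}, 'pause');

// Android: a fake bridge that records calls, standing in for the injected object.
var calls = [];
var fakeAndroidWindow = {
  JBridge: {
    start: function (videoId) { calls.push('start:' + videoId); },
    pause: function () { calls.push('pause'); }
  }
};
var androidResult = callBridge(fakeAndroidWindow, 'start', ['abc123']);
```

With something like this, the controller methods would shrink to one call each, and the desktop version would still run without errors.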

Mockups


Enterprise Library Is Now True Open Source Project

Although EntLib has long shared its source code (Patterns & Practices were among the first to do so), they are now accepting community contributions, so it can be declared true OSS:

    1. Starting Thursday, Nov 21, 2013, we will be accepting community contributions to the codebase (both new features and bug fixes), subject to the contribution guidelines.
    2. Microsoft patterns & practices continues to staff the project to curate as well as engage in active development and sustained engineering together with the community.
    3. In the spirit of true open source, the p&p project team will use the same process for making updates to the application blocks as any community member. No secret repositories, hidden issue trackers, or internal-only processes.
    4. Our quality bars are not lowered in any way.

taken from: http://blogs.msdn.com/b/agile/archive/2013/11/21/microsoft-enterprise-library-open-development-model.aspx

If you are not familiar with EntLib and you consider yourself a .NET programmer, then you should really get to know it.

Also, do check out the newly released Enterprise Library 6 Developer’s Guide.

Debug Android WebView (JS + HTML)

The long-awaited ability to debug HTML on your Android device from your desktop is here, but only if you are using KitKat 4.4. Under the hood, KitKat uses Chrome 30 instead of the unknown “Android Browser”, and we also get all the new HTML5 features like web workers and web sockets.

You get the ability to remotely debug Android native WebViews (including PhoneGap apps) and the Android Browser, and it works smoothly both from real devices and from the emulator.

Hopefully jsHybugger is no longer a necessity.