Being Productive with HDInsight

This post is a holding place for miscellaneous tools and tips for working with HDInsight.

Build Tools

1. Apache Ant

Extract the archive to c:\ant\, then set ANT_HOME and add Ant's bin folder to the PATH:

set ANT_HOME=c:\ant

set PATH=%PATH%;%ANT_HOME%\bin
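Note that set only changes the current Command Prompt session. A minimal sketch for persisting ANT_HOME and checking the install, assuming Ant was extracted to c:\ant as above and that setx is available on your version of Windows:

```batch
rem Persist ANT_HOME for future sessions (only newly opened prompts see it)
setx ANT_HOME c:\ant

rem In the current session (where the set commands above were run),
rem confirm the variable and check that Ant resolves from the PATH
echo %ANT_HOME%
ant -version
```

If ant -version prints a version string, the PATH change worked for this session.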

2. Apache Ivy

  • Copy the Ivy JAR to the Ant lib folder (%ANT_HOME%\lib)
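As a rough sketch of that step, assuming the Ivy JAR was downloaded to the current folder. The file name ivy.jar is a placeholder for whichever versioned JAR you actually downloaded:

```batch
rem ivy.jar is a placeholder name; substitute the JAR file you downloaded
copy ivy.jar "%ANT_HOME%\lib\"

rem Ant's diagnostics output includes a listing of the JARs in ANT_HOME\lib
ant -diagnostics
```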

3. Git Client

Data Preparation/Research Tools

1. cURL


3. Enthought Data Platform (EDP)

4. GNU Parallel


PiggyBank: community-contributed user-defined functions (UDFs) for Pig

  • Retrieve source from Git:
    git clone
    ls Pig
    git checkout -b branch-0.9 remotes/origin/branch-0
  • Build Pig and then PiggyBank using Ant
  • Pig Script:
    -- myscript.pig
    REGISTER C:\Users\Administrator\pig\contrib\piggybank\java\piggybank.jar;
    A = LOAD 'student_data' AS (name: chararray, age: int, gpa: float);
    B = FOREACH A GENERATE org.apache.pig.piggybank.evaluation.string.UPPER(name);
    DUMP B;
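
The build and run steps above might look like the following at a Command Prompt. This is a sketch under assumptions: the repository is assumed to be cloned to C:\Users\Administrator\pig (inferred from the REGISTER path), Ant and Ivy are set up as described earlier, the pig launcher is on the PATH, and myscript.pig and the student_data file sit in the folder the script is run from.

```batch
rem Build Pig from the root of the clone (assumed location)
cd /d C:\Users\Administrator\pig
call ant

rem Build PiggyBank; this should produce piggybank.jar in this folder
cd contrib\piggybank\java
call ant

rem Run the script in local mode (assumes myscript.pig is in the current folder)
pig -x local myscript.pig
```

The call prefix matters only inside a batch file, where invoking ant directly would end the calling script; at an interactive prompt it is harmless.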
