2010

December 08, 2010

Reading Java properties file in Clojure

A simple and effective way to read properties files in Clojure, since they transform into Clojure maps!


(into {} (doto (java.util.Properties.)
           (.load (-> (Thread/currentThread)
                      (.getContextClassLoader)
                      (.getResourceAsStream "log4j.properties")))))
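
For comparison outside the JVM, here is a minimal Python sketch of the same idea: turning properties-style text into a plain map. It is only an illustration under stated assumptions. load_properties is a hypothetical helper, and it handles a simplified subset of the format (comments, and key=value or key:value pairs), not line continuations or escape sequences.

```python
import io


def load_properties(stream):
    """Parse a simplified subset of the Java .properties format into a dict."""
    props = {}
    for raw in stream:
        line = raw.strip()
        # Skip blank lines and comment lines (which start with '#' or '!').
        if not line or line[0] in "#!":
            continue
        # Split on the first '=' or ':' separator, whichever comes first.
        positions = [p for p in (line.find("="), line.find(":")) if p != -1]
        if positions:
            cut = min(positions)
            key, value = line[:cut], line[cut + 1:]
        else:
            key, value = line, ""
        props[key.strip()] = value.strip()
    return props


# Sample content is made up for illustration only.
sample = io.StringIO(
    "# logging setup\n"
    "log4j.rootLogger=DEBUG, A1\n"
    "log4j.appender.A1 = org.apache.log4j.ConsoleAppender\n"
)
config = load_properties(sample)
print(config["log4j.rootLogger"])
```

As with the Clojure version, the result is an ordinary map, so the usual lookup and merge operations apply to it directly.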

Next, to actually read values in, using an atom and swapping in the new values like this seems to work,


(def *args*
  (atom {:a 10, :b 20}))

(defn -main [& args]
  (swap! *args* assoc :a (read) :b (read)))

If you have any questions or thoughts, don't hesitate to reach out. You can find me as @viksit on Twitter.

July 08, 2010

Decoding the US Military's Cyber Command Logo code

According to this Wired article (http://www.fastsum.com/support/md5-checksum-utility-faq/md5-hash.php), there’s a number that is part of the Cyber Command’s logo: 9ec4c12949a4f31474f299058ce2b22a. It’s 32 characters long, and looks like a hash. Sure enough, a quick Python check of the organization’s mission statement with MD5 results in,


>>> import hashlib
>>> hashlib.md5(b"USCYBERCOM plans, coordinates, integrates, synchronizes and conducts activities to: direct the operations and defense of specified Department of Defense information networks and; prepare to, and when directed, conduct full spectrum military cyberspace operations in order to enable actions in all domains, ensure US/Allied freedom of action in cyberspace and deny the same to our adversaries.").hexdigest()
'9ec4c12949a4f31474f299058ce2b22a'

Voila!
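
The same check can be wrapped in a small reusable function. This is a minimal sketch: matches_md5 is a hypothetical helper name, and it assumes the text is encoded as UTF-8 before hashing.

```python
import hashlib


def matches_md5(text, expected_hex):
    """Return True if the MD5 digest of text equals the given hex string."""
    digest = hashlib.md5(text.encode("utf-8")).hexdigest()
    return digest == expected_hex.lower()


# A well-known test vector: the MD5 of "hello".
print(matches_md5("hello", "5d41402abc4b2a76b9719d911017c592"))
```

Passing the mission statement and the number from the logo to this function repeats the check shown above.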


June 20, 2010

Mutable vs Immutable datastructures - Serialization vs Performance

In my last post, I was playing around with methods to serialize Clojure data structures, especially a complex record that contains a number of other records and refs. Chas Emerick and others mentioned in the comments there, that putting a ref inside a record is probably a bad idea – and I agree in principle. But this brings me to a dilemma.

Let's assume I have a complex record that contains a number of "sub" records that need to be modified during a program's execution time. One scenario this could happen in is a record called "Table", that contains a "Row" which is updated (think database tables and rows). Now this can be implemented in two ways,

Mutable data structures – In this case, I would put each row inside a table as a ref, and when the need to update happens, just find the row ID and use a dosync – alter to do any modifications needed.
- The advantage is that all data is being written to in place, and would be rather efficient.
- The disadvantage however, is that when serializing such a record full of refs, I would have to build a function that would traverse the entire data structure and then serialize each ref by dereferencing it and then writing to a file. Similarly, I'd have to reconstruct the data structure when de-serializing from a file.

 
{:filename "tab1name",
 :tuples
 #<Ref {{:recordid nil,
         :tupdesc {:x #<Ref ...>},
         :tup #<Ref ...>}
        {:recordid nil,
         :tupdesc {:x #<Ref ...>},
         :tup #<Ref ...>}}>,
 :tupledesc
 {:x #<Ref ...>}}

Immutable data structures – This case involves putting a ref around the entire table data structure, implying that all data within the table would remain immutable. In order to update any row within the table, any function would return a new copy of the table data structure with the only change being the modification. This could then overwrite the existing in-memory data structure, and then be propagated to the disk as and when changes are committed.
- The advantage here is that having just one ref makes it very simple to serialize – simply de-ref the table, and then write the entire thing to a binary file.
- The disadvantage here is that each row change would make it necessary to return a new "table", and writing just the "diff" of the data to disk would be hard to do.
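
The single-reference option can be sketched outside Clojure too. The following is a minimal Python illustration, not the code from this project: Row, Table, update_row, serialize and deserialize are hypothetical names, and JSON stands in for whatever on-disk format is actually used.

```python
import json
from collections import namedtuple

# Immutable value types: a table is a filename plus a tuple of rows.
Row = namedtuple("Row", ["record_id", "value"])
Table = namedtuple("Table", ["filename", "rows"])


def update_row(table, record_id, value):
    """Return a new Table where the matching row carries the new value."""
    rows = tuple(
        row._replace(value=value) if row.record_id == record_id else row
        for row in table.rows
    )
    return table._replace(rows=rows)


def serialize(table):
    """Serialize the whole table in one go; there are no inner refs to chase."""
    return json.dumps(
        {"filename": table.filename, "rows": [row._asdict() for row in table.rows]}
    )


def deserialize(text):
    """Rebuild a Table from the JSON text produced by serialize."""
    data = json.loads(text)
    return Table(data["filename"], tuple(Row(**row) for row in data["rows"]))


t1 = Table("tab1name", (Row(1, "a"), Row(2, "b")))
t2 = update_row(t1, 2, "z")          # t1 is left untouched
restored = deserialize(serialize(t2))
```

Rows that did not change are shared between t1 and t2 rather than copied, which softens the cost of returning a "new" table on every update. Writing only the diff to disk is still not addressed by this sketch.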

 
#<Ref {:filename ..., :tuples ..., :tupledesc ...}>

So at this point, which method would you recommend?


June 19, 2010

Serializing Clojure Datastructures

I’ve been trying to figure out how best to serialize data structures in Clojure, and discovered a couple of methods to do so. (Main reference thanks to a thread on the Clojure Google Group (http://groups.google.com/group/clojure/browse_thread/thread/29a94dd74b8beaaa/a05b126b192195e9) )


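(def box {:a 1 :b 2})

;; Write any java.io.Serializable object o to the file at filename
(defn serialize [o filename]
  (with-open [outp (-> (java.io.File. filename)
                       java.io.FileOutputStream.
                       java.io.ObjectOutputStream.)]
    (.writeObject outp o)))

;; Read a previously serialized object back from the file at filename
(defn deserialize [filename]
  (with-open [inp (-> (java.io.File. filename)
                      java.io.FileInputStream.
                      java.io.ObjectInputStream.)]
    (.readObject inp)))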

(serialize box "/tmp/ob1.dat")
(deserialize "/tmp/ob1.dat")

This works well for any Clojure data structure that is serializable. However, my objective is slightly more intricate – I’d like to serialize records that are actually refs. I see a few options for this,

– Either use a method that puts a record into a ref, rather than a ref into a record and then use the serializable, top level map
– Write my own serializer to print this to a file using clojure+read
– Use Java serialization functions directly.

Thoughts?
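
As a point of comparison, here is the same file round trip sketched in Python with the standard pickle module. It is an illustration only: the function names mirror the Clojure ones above, and the temporary directory is an assumption made so the example does not depend on /tmp existing.

```python
import os
import pickle
import tempfile


def serialize(obj, filename):
    """Write obj to filename; the file is closed even if pickling fails."""
    with open(filename, "wb") as outp:
        pickle.dump(obj, outp)


def deserialize(filename):
    """Read back an object previously written by serialize."""
    with open(filename, "rb") as inp:
        return pickle.load(inp)


box = {"a": 1, "b": 2}
path = os.path.join(tempfile.mkdtemp(), "ob1.dat")
serialize(box, path)
restored = deserialize(path)
```

The with blocks play the role of with-open: the stream is closed on exit from the block. As in the Clojure case, this only works for values that are themselves serializable, which is why a data structure holding references needs one of the strategies listed above.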


June 11, 2010

Ode to an Orange

A whiff of citrus – vibrant,
shiny, dimpled and thick,
your fingers move, probing
textural ecstacy,
as your tastes await
the sweet tartness within.
Peel away the layers
softly, envelop a piece,
let your tongue steep
in a myriad of flavors,
with the lingering scent
of summer under a blue sky,
look around,
and all is well again.


June 04, 2010

Stack implementation in Clojure II - A functional approach

My last post on the topic was creating a stack implementation using Clojure protocols and records – except, it used atoms internally and wasn’t inherently “functional”.

Here’s my take on a new implementation that builds on the existing protocol and internally, always returns a new stack keeping the original one unmodified. Comments welcome!


(ns viksit-stack
  (:refer-clojure :exclude [pop]))

(defprotocol PStack
  "A stack protocol"
  (push [this val] "Push element in")
  (pop [this] "Pop element from stack")
  (top [this] "Get top element from stack"))

; A functional stack record that uses immutable semantics
; It returns a copy of the datastructure while ensuring the original
; is not affected.
(defrecord FStack [coll]
  PStack
  (push [_ val]
    ; Return the stack with the new element inserted
    (FStack. (conj coll val)))
  (pop [_]
    ; Return the stack without the top element
    (FStack. (rest coll)))
  (top [_]
    ; Return the top value of the stack
    (first coll)))

; The functional stack can be used in conjunction with a ref or atom

viksit-stack> (def s2 (atom (FStack. '())))
#'viksit-stack/s2
viksit-stack> s2
#<Atom@...: #:viksit-stack.FStack{:coll ()}>
viksit-stack> (swap! s2 push 10)
#:viksit-stack.FStack{:coll (10)}
viksit-stack> (swap! s2 push 20)
#:viksit-stack.FStack{:coll (20 10)}
viksit-stack> (swap! s2 pop)
#:viksit-stack.FStack{:coll (10)}
viksit-stack> (top @s2)
10
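
The same "every operation returns a new stack" idea can be sketched in Python. This is an illustration rather than a port of the record above: the FStack class here is a hypothetical namedtuple subclass, with an immutable tuple standing in for the Clojure list.

```python
from collections import namedtuple


class FStack(namedtuple("FStack", ["coll"])):
    """An immutable stack; every operation returns a new FStack."""

    __slots__ = ()

    def push(self, val):
        # Build a new tuple with val at the front; self.coll is not modified.
        return FStack((val,) + self.coll)

    def pop(self):
        # Drop the top element; popping an empty stack yields an empty stack.
        return FStack(self.coll[1:])

    def top(self):
        # Return the top element, or None when the stack is empty.
        return self.coll[0] if self.coll else None


s0 = FStack(())
s1 = s0.push(10)
s2 = s1.push(20)
s3 = s2.pop()
```

Because s0, s1 and s2 are never modified, any of them can be kept around and reused after later operations, which is exactly what lets the Clojure version sit safely inside an atom or ref.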


June 03, 2010

Resolving Chrome's SSL Error

I recently started getting a number of SSL related errors on accessing https links with Google Chrome on Ubuntu. One looks like,

107 (net::ERR_SSL_PROTOCOL_ERROR)

The top link on Google’s search results is pretty fuzzy, so here’s the solution that works for me.

Go to Settings -> Options -> Under the hood, and enable both SSL 2.0 and SSL 3.0. This should allow Chrome to talk to the server with either protocol.

There’s also a DEFLATE bug that got fixed to solve this issue in release 340 something. http://codereview.chromium.org/1585041


June 02, 2010

Stack implementation in Clojure using Protocols and Records

I was trying to experiment with Clojure Protocols and Records recently, and came up with a toy example to clarify my understanding of their usage in the context of developing a simple Stack Abstract Data Type.

For an excellent tutorial on utilizing protocols and records in Clojure, by the way, check out this post (http://kotka.de/blog/2010/03/memoize_done_right.html#protocols).


;; Stack example abstract data type using Clojure protocols and records
;; viksit at gmail dot com
;; 2010

(ns viksit.stack
  (:refer-clojure :exclude [pop]))

(defprotocol PStack
  "A stack protocol"
  (push [this val] "Push element into the stack")
  (pop [this] "Pop element from stack")
  (top [this] "Get top element from stack"))

;; coll is an atom holding the list of elements
(defrecord Stack [coll]
  PStack
  (push [_ val]
    (swap! coll conj val))
  (pop [_]
    (let [ret (first @coll)]
      (swap! coll rest)
      ret))
  (top [_]
    (first @coll)))

;; Testing
stack> (def s (Stack. (atom '())))
#'stack/s
stack> (push s 10)
(10)
stack> (push s 20)
(20 10)
stack> (top s)
20
stack> s
#:stack.Stack{:coll #<Atom@...: (20 10)>}
stack> (pop s)
20

PyCassa vs Lazyboy (updated)

Update

As Hans points out in the comment below, it appears pycassa natively supports authentication with org.apache.cassandra.auth.SimpleAuthenticator. Lazyboy on the other hand doesn’t by default.

It’s not too hard to do it though. Intuitively, we could do something like this.

NB: Untested code!! I might create a patch for this when I get the time, so this is just an outline.


# Add this to lazyboy's connection package
from cassandra.ttypes import AuthenticationRequest

And in lazyboy’s _connect() function, add another parameter called logins, that is a dict of keyspaces and credentials which looks like the following.


# logins format
{'Keyspace1' : {'username':'myuser', 'password':'mypass'}}


def _connect(self, logins=None):
    """Connect to Cassandra if not connected."""

    client = self._get_server()
    if client.transport.isOpen() and self._recycle:
        if (client.connect_time + self._recycle) > time.time():
            return client
        else:
            client.transport.close()

    elif client.transport.isOpen():
        return client

    try:
        client.transport.open()
        # Login code
        # Remember that client is an instance of Cassandra.Client(protocol)
        if logins is not None:
            # Log in once per keyspace, inside the loop
            for keyspace, credentials in logins.items():
                request = AuthenticationRequest(credentials=credentials)
                client.login(keyspace, request)

        client.connect_time = time.time()
    except thrift.transport.TTransport.TTransportException as ex:
        client.transport.close()
        raise exc.ErrorThriftMessage(
            ex.message, self._servers)

    return client

Original Post
I’ve been looking to answer which Python library is currently more fully featured to use to communicate with Cassandra.

From Reddit,

API-wise, both look like they are pretty much basic wrappers around the Cassandra Thrift bindings. I’d prefer lazyboy over pycassa though, given that firstly, it’s being used in production right now at Digg, and because it looks like lazyboy’s connection code is more featured than pycassa.

and

The connection code (Lazyboy) seems to be much more suited for use in production (use of auto pooling, auto load balancing, integrated failover/retry, etc.) (than PyCassa)

Thanks to GitHub, I was able to do some analysis of their traffic and commits,

Traffic Data

[Chart: GitHub traffic for LazyBoy, 2009-12-14 to 2010-03-14]

[Chart: GitHub traffic for Pycassa, 2009-12-14 to 2010-03-14]

Commit Data

[Charts: commits by day of week and hour of day for LazyBoy and Pycassa]

A larger number of people know about LazyBoy, but code commits on it are currently at a standstill. Pycassa, on the other hand, seems to be growing at a pretty fast rate.

It looks like LazyBoy is probably a better library to start with, for now. I’ll talk about my experiences with both in another post.


May 08, 2010

Thrush Operators in Clojure (->, ->>)

I was experimenting with some sequences today, and ran into a stumbling block: Using immutable data structures, how do you execute multiple transformations in series on an object, and return the final value?

For instance, consider a sequence of numbers,


user> (range 90 100)
(90 91 92 93 94 95 96 97 98 99)

How do you transform them such that each number is incremented by 1, and then get their text representation,


"[\\]^_`abcd"

Imperatively speaking, you would loop over each number, build up the result step by step, and the last operation would yield the desired string. Something like,


>>> s = ""
>>> a = list(range(90, 100))
>>> a
[90, 91, 92, 93, 94, 95, 96, 97, 98, 99]
>>> for i in range(0, len(a)):
...   s += chr(a[i] + 1)
... 
>>> s
'[\\]^_`abcd'

If you knew about maps in python, this could be achieved with something like,


>>> ''.join(map(lambda i: chr(i + 1), a))
'[\\]^_`abcd'

The easiest way to do this in Clojure is using the excellently named thrush operators (http://debasishg.blogspot.com/2010/04/thrush-in-clojure.html), -> and ->>. According to the doc for ->,

Threads the expr through the forms. Inserts x as the
second item in the first form, making a list of it if it is not a
list already. If there are more forms, inserts the first form as the
second item in second form, etc.

It is used like this,


user> (->> (range 90 100) (map inc) (map char) (apply str))
"[\\]^_`abcd"

Basically, the line, (-> 7 (- 3) (- 6)) implies that 7 be substituted as the first argument to -, to become (- 7 3). This result is then substituted as the first argument to the second -, to get (- 4 6), which returns -2.


user> (-> 7 (- 3) (- 6))
-2

Voila!
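
To make the threading rule concrete, here is a minimal Python sketch of both operators. thread_first and thread_last are hypothetical helpers written for this illustration, and each form is assumed to be either a bare callable or a tuple of a callable followed by its extra arguments.

```python
import operator
from functools import reduce


def thread_first(value, *forms):
    """Like ->: pass the running value as the FIRST argument of each form."""
    def step(acc, form):
        fn, *args = form if isinstance(form, tuple) else (form,)
        return fn(acc, *args)
    return reduce(step, forms, value)


def thread_last(value, *forms):
    """Like ->>: pass the running value as the LAST argument of each form."""
    def step(acc, form):
        fn, *args = form if isinstance(form, tuple) else (form,)
        return fn(*args, acc)
    return reduce(step, forms, value)


def inc(n):
    return n + 1


# (->> (range 90 100) (map inc) (map char) (apply str))
text = thread_last(range(90, 100), (map, inc), (map, chr), "".join)

# (-> 7 (- 3) (- 6))
difference = thread_first(7, (operator.sub, 3), (operator.sub, 6))
```

The only difference between the two helpers is where the running value is placed, which is why ->> suits sequence functions such as map (the collection comes last) while -> suits calls such as (- x 3) (the subject comes first).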


May 06, 2010

Stock Crash

(Image: http://nikonizer.yfrog.com/Himg265/scaled.php?tn=0&server=265&filename=po7.png&xsize=640&ysize=640)

This is what the stock market looked like at 2pm today.

From the Reuters article (http://www.reuters.com/article/idUSTRE6341EA20100506),

The Dow suffered its biggest ever intraday point drop, which may have been caused by an erroneous trade entered by a person at a big Wall Street bank, multiple market sources said.

and the suspected cause? A UI Glitch!

In one of the most dizzying half-hours in stock market history, the Dow plunged nearly 1,000 points before paring those losses—all apparently due to a trader error.

According to multiple sources, a trader entered a “b” for billion instead of an “m” for million in a trade possibly involving Procter & Gamble, a component in the Dow. (CNBC’s Jim Cramer noted suspicious price movement in P&G stock on air during the height of the market selloff.)

Sources tell CNBC the erroneous trade may have been made at Citigroup.

“We, along with the rest of the financial industry, are investigating to find the source of today’s market volatility,” Citigroup said in a statement. “At this point we have no evidence that Citi was involved in any erroneous transaction.”

According to a person familiar with the probe, one focus is on futures contracts tied to the Standard & Poor’s 500 stock index, known as E-mini S&P 500 futures, and in particular a two-minute window in which 16 billion of the futures were sold.

Citigroup’s total E-mini volume for the entire day was only 9 billion, suggesting that the origin of the trades was elsewhere, according to someone close to Citigroup’s own probe of the situation. The E-minis trade on the CME.


May 01, 2010

C++ Modulus Operator weirdness

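It’s surprising that the modulus (%) operator in C++ wraps around as expected for non-negative operands, but not for negative ones (since C++11, the result takes the sign of the dividend). When working on some code, I expected,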

-1 % 3 = 2
0 % 3 = 0
1 % 3 = 1
2 % 3 = 2

but ended up with,

-1 % 3 = -1
0 % 3 = 0
1 % 3 = 1
2 % 3 = 2

As a result, you’d need to either check whether the result is negative and correct it,

result = n % 3;
if (result < 0) result += 3;

Or, a better solution might be to change the expression such that the negative case never arises,

{ int n = 0; int inc = -1; cout << (n + inc + 3) % 3 << endl; }
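
The two behaviours can be compared side by side in a short Python sketch. Python's own % already floors, so the C++ behaviour is emulated here with math.fmod. c_style_mod and wrapped_mod are hypothetical helper names, and the sketch is meant for small integers only, since math.fmod works on floats.

```python
import math


def c_style_mod(n, m):
    """Remainder that takes the sign of the dividend, as C++11 % does for ints."""
    return int(math.fmod(n, m))


def wrapped_mod(n, m):
    """Shift by m before taking the remainder again, so the result is in [0, m)."""
    return c_style_mod(c_style_mod(n, m) + m, m)


for n in (-1, 0, 1, 2):
    print(n, c_style_mod(n, 3), wrapped_mod(n, 3))
```

The second function is the general form of the fix: take the remainder, add the modulus, and take the remainder once more, so a negative intermediate value never reaches the caller.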

Hope this helps someone out there!


April 20, 2010

Implementing Binary Search with Clojure

I was trying to implement a simple binary search using a purely functional approach, and after much hacking, googling and wikibooking, came up with this in Clojure.

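(defn binarysearch
  ([lst n]
     (binarysearch lst 0 (dec (count lst)) n))
  ([lst lb ub n]
     (if (> lb ub)
       -1 ; this is the case where no element is found
       (let [mid (quot (+ lb ub) 2)
             mth (nth lst mid)]
         (cond
           ; mid > n, so search lower
           (> mth n) (recur lst lb (dec mid) n)
           ; mid < n, so search higher
           (< mth n) (recur lst (inc mid) ub n)
           ; otherwise the element has been found
           :else mid)))))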
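
For readers more comfortable with Python, the same recursive shape can be sketched as follows. This is an illustrative translation, not the original code: binary_search is a hypothetical name, the input list is assumed to be sorted in ascending order, and -1 signals "not found" as in the Clojure version.

```python
def binary_search(lst, n, lb=0, ub=None):
    """Return the index of n in the sorted list lst, or -1 if it is absent."""
    if ub is None:
        ub = len(lst) - 1
    if lb > ub:
        return -1  # the search window is empty, so n is not present
    mid = (lb + ub) // 2
    if lst[mid] > n:
        return binary_search(lst, n, lb, mid - 1)  # search the lower half
    if lst[mid] < n:
        return binary_search(lst, n, mid + 1, ub)  # search the upper half
    return mid


index = binary_search([1, 3, 5, 7, 9], 7)
missing = binary_search([1, 3, 5, 7, 9], 4)
```

Each call halves the window between lb and ub, so the recursion depth grows only logarithmically with the length of the list.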


April 15, 2010

Clojure Application with Command Line Arguments

I was recently looking for a method to create an application with Clojure that would allow specification of command line arguments.

I came across an excellent answer (http://stackoverflow.com/questions/1341154/building-a-clojure-app-with-a-command-line-interface) on Stack Overflow by alanlcode (http://stackoverflow.com/users/128927/alanlcode) that provides a spectacular example. I’ve saved it as a gist (http://gist.github.com/367681) for reference.


April 04, 2010

20 Days of Clojure

Came across an excellent series (http://loufranco.com/blog/files/category-20-days-of-clojure.html) of blog posts by Lou Franco, where he uses the SICP videos as input to learn more about Clojure.

His explanation of HashMap implementations in Clojure, using multimethods, as well as pointers on parallelizing functional programs are very well written. I’m currently on his day 10 post.


April 03, 2010

Thinking in C++ by Bruce Eckel is an excellent book

I just finished skimming through Bruce Eckel’s Thinking in C++ book – available for free from his website.

Volume 1 covers the basics pretty well and I didn’t really do much more than glance at it, but volume 2 is highly recommended for its marvelous treatment of the C++ STL containers and algorithms.


March 30, 2010

Programming Clojure with Clojure 1.2.0 snapshot

This post contains a list of changes between Clojure 1.1.0 and 1.2.0 that would affect you if you’re reading Stuart Halloway’s “Programming Clojure”.

It looks like you’d have to replace,

(use '[clojure.contrib.str-utils :only (re-split)])
(filter indexable-word? (re-split #"\W+" "A fine day it is"))

with

(use '[clojure.contrib.string :only (split)])
(filter indexable-word? (split #"\W+" "A fine day it is"))

-> ("fine" "day")

since the str-utils module got renamed to string, and the re-split function to split.


March 29, 2010

BeautifulSoup on Clojure? Enlive

I was looking for a suitable library for Clojure that would work like Python’s BeautifulSoup or lxml – and found enlive.

There’s an excellent tutorial here: http://github.com/swannodette/enlive-tutorial.


March 28, 2010

Clojure AOT compilation tutorial

I was trying to figure out how to AOT compile a Clojure program, in order to really see some fast execution times. The simplest way to describe AOT compilation would be how it’s done in Java,

javac file.java
java file

The invocation of the Java compiler (javac) is the pre-compilation of the source file, which is then loaded by the JVM in the next step. In the case of Clojure, when a program is run using,

clj myfile.clj

the code is first compiled and then executed, so even simple programs take a long time to display their output.

The AOT compile process turned out to be trickier to set up than I expected, so I thought I’d put it out there for all those who get stuck after reading the original documentation at http://clojure.org/compilation.

Firstly, the directory structure for my experiment looks like this,

Directory structure

I created dir1, and its subdirectories as below. The code is in clojure/examples/, and the classes/ directory is the default *compile-path*, something that the documentation neglected to mention. Without this directory, compilation WILL fail.

/dir1/.clojure
/dir1/clojure/
/dir1/classes
/dir1/clojure/examples
/dir1/clojure/examples/hello.clj

The .clojure file

This file is used with my clj script, that can be obtained from (http://mark.reid.name/sap/setting-up-clojure.html).
It contains a list of directories that are to be specified to the Clojure compiler at compile time. The file looks like,

/dir1:/dir1/classes

hello.clj

This code of course is the default from the Clojure website, as a test.

(ns clojure.examples.hello
  (:gen-class))

(defn -main
  [greetee]
  (println (str "Hello " greetee "!")))

The next step is to invoke the Clojure REPL using the clj script, from the dir1 directory.

Type in the following to compile the program in the clojure.examples namespace,

Clojure 1.2.0-master-SNAPSHOT
user=> (compile 'clojure.examples.hello)
clojure.examples.hello
user=>

Success! And the resulting output of the classes directory is,

$ ls /dir1/classes/clojure/examples

hello$_main__5.class
hello$loading__4946__auto____3.class
hello.class
hello__init.class

And lastly, to run this program as any other Java program, you can use,

java -cp ./classes:/opt/jars/clojure.jar:/opt/jars/clojure-contrib.jar clojure.examples.hello Viksit
Hello Viksit!

Also, I highly recommend Stuart Halloway’s book “Programming Clojure”. It’s turning out to be an excellent read.


February 27, 2010

viksit-code

It’s been a while since I’ve uploaded code back to the community. This page will track my code on the web, and will contain links to useful snippets of code in various open source projects that I’ve worked with.

Below is the Google code repository for my code.

http://code.google.com/p/viksit-code/

I’m just getting into using git for everything which totally rocks,

(http://github.com/viksit)


February 25, 2010

Moving from MySQL to Cassandra - Pros and Cons

Moving on from the question of which NoSQL database you should choose, after reading these excellent posts from the Digg blog (http://about.digg.com/blog/looking-future-cassandra%0A) and an interview with Twitter's Ryan King (http://nosql.mypopescu.com/post/407159447/cassandra-twitter-an-interview-with-ryan-king), I recently asked a question on Stack Overflow (http://www.stackoverflow.com) regarding the pros and cons of moving from MySQL to Cassandra.

The Stack Overflow question is here (http://stackoverflow.com/questions/2332113/switching-from-mysql-to-cassandra-pros-cons).

I got some excellent insight and feedback, primarily from Jonathan Ellis (http://spyced.blogspot.com), one of the maintainers of Cassandra, and a systems architect at Rackspace.

He’s also written a post (http://www.rackspacecloud.com/blog/2010/02/25/should-you-switch-to-nosql-too/) on the Rackspace blog today as a follow-up to the question.

I wanted to highlight a great tip he mentions, given at the recent PyCon ’10 by Ian Eure of Digg (also the creator of a Python Cassandra library called LazyBoy),

Ian Eure from Digg (also switching to Cassandra) gave a great rule of thumb last week at PyCon: “if you’re deploying memcache on top of your database, you’re inventing your own ad-hoc, difficult to maintain NoSQL database,” and you should seriously consider using something explicitly designed for that instead.

Also mentioned are a couple of general caveats in using NoSQL vs relational databases,

The price of scaling is that Cassandra provides poor support for ad-hoc queries, emphasizing denormalization instead. For analytics, the upcoming 0.6 release (in beta now) offers Hadoop map/reduce integration, but for high volume, low-latency queries you will still need to design your app around denormalization.

Looks like the Cassandra 0.6 beta is coming out tomorrow, and can already be built from repositories in case anyone’s interested in doing so (and telling me about their experiences!).


February 23, 2010

How 'Aardvark was the 6th idea we tried'

Just came across an excellent post (http://ventilla.posterous.com/hello-world-2603) by Max Ventilla, the co-founder of Aardvark (a company that Google bought a few weeks ago). His description of how Aardvark was the 6th idea that he and his co-founders tried is pretty uplifting to anyone even remotely interested in entrepreneurship.

What I found interesting was that they would build a prototype and launch it to potential users, and see what the uptake was. If it didn’t work, they’d brainstorm some more and launch a new idea to see whether that worked. It definitely lends credence to the entire build, launch and iterate idea that most people proselytize about, but may not necessarily follow.

Some of these ideas are probably obvious to anyone who’s ever brainstormed about web services – I think the main problem with most people out there is that they get stuck on the execution, trying to make things perfect for launch – which negatively impacts the use and adoption of their product.

Another lesson to take home is to have the ability to take an idea from concept to execution REALLY quickly – which means having an established base of people, code, platforms and frameworks ready to start deploying an idea on. If you were to start from scratch each time, I’m not sure if you’re going to go too far!

Twang,
For posterity’s sake, here’s a list of the early ideas we rejected before committing to Aardvark:

Rekkit – A service to collect your ratings from across the web and give better recommendations to you. The system would also provide APIs to 3rd party websites so they could have richer profile data and better algorithms to do collaborative filtering.

Ninjapa – A way that you could open accounts in various applications through a single website and manage your data across multiple sites. You could also benefit from a single sign-on across the web and streamlined account creation, management, and cancellation.

The Webb – A central phone number that you could call and talk to a person who could do anything for you that you could do on the web. Your account information could be accessed centrally and sequences of simple web tasks could be done easily without the use of a computer.

Web Macros – A way to record sequences of steps on websites so that you could repeat common actions, even across sites, and share “recipes” for how you accomplished certain tasks on the web.

Internet Button Company – A way to package steps taken on a website and smart form-fill functionality. From a button, a user could accomplish tasks, even across multiple sites, quickly without having to leave the site or application where the button was embedded. People could encode buttons and share buttons a la social bookmarking.

Each of these ideas turned out to be interesting but not compelling. My cofounders and I would conceive of an idea, build it in very early prototype form, and get it in the hands of users. People might express enthusiasm for one idea or another but they wouldn’t actually use the product that, in admittedly raw form, offered the particular value proposition. In contrast, Aardvark (a chat buddy that could accept questions and have them answered by people in your network in real-time), got pretty immediate uptake.

As an aside, most of these ideas resemble products that venture funded startups have since brought to market. Even as I see much more impressive implementations of what we prototyped, I’m skeptical of their mass appeal.


February 11, 2010

Misleading Youtube-Visa ad

(Screenshot: http://hphotos-snc3.fbcdn.net/hs122.snc3/16961_601990693664_312080_35014750_2352233_n.jpg)

Misleading Youtube-Visa ad on youtube.com. Try clicking on the “Close this ad” button or the “sound off” button. It just takes you to the visa-youtube homepage! Misdirection? I think so! (see bottom left – shows the link to the ad page)


February 11, 2010

Why I quit Google Buzz

Google Buzz seems to have mashed up a number of positive features from Twitter and Friendfeed into itself, and I quite like the idea – or rather, the vision it is supposed to espouse. Unfortunately, it is at a stage where too much of my private data is available to people I would much rather not allow access to.

So I quit it, removed all my buzzes, made my Google profile even more private than it was before, and thought – whew, I’m done. And then, all of a sudden, I’m bombarded with a ton of questions about why. This blog post is to coherently recount my thoughts (and have people correct me if I’m wrong).

First off – the question I hear most is – “Why not post privately to a group of people?”.

Easier said than done, unfortunately. Like many people, my Gmail contacts list is a weird amalgamation of everyone from Craigslist car-ad replies, to close friends, and even some colleagues. The thought of conversations with friends suddenly becoming visible to them is a bit unsettling.

Now how is this different from being on Twitter, you ask?

Well, in a number of ways. Anyone with a Google account can follow you, and you’ve got to proactively block them from doing so – in my opinion, a slightly flawed strategy. I’d much prefer the Twitter/FB model of having *you* control who can or can’t follow you at the very beginning. I also noticed some lag issues with Buzz wherein you may have followed or unfollowed or blocked someone, but 10 hours later, that operation seems never to have gone through. (Yes, it’s a fledgling service, but still.)

Coming back to private posts – Buzz does not offer a list of followers to post to. As a result, I don’t have the option of posting only to those active users on Buzz who follow me. Instead, I’m expected to create a list of people on my contact list who I can post to. And who do I see here? A list of contacts on gmail who I’ve interacted with most frequently – and who may not even be on Buzz! Is it really that hard implementing a “Post to Followers” feature?

Next – any time you mention a user with @, it gets autocompleted to @username@gmail.com – and this data is visible on your public google profile. While a twitter username is something that can’t very easily be mapped to a person, their email address is a whole new ball game. And I wouldn’t like my contact list being exposed to the world.

And it’s not just me. If I comment on a friend’s Buzz, and they haven’t bothered to make it private – this information is as easily obtained.

So till the time Buzz becomes a bit more private – I’m going to only follow Buzzes from a distance and not participate till I feel my issues with its privacy controls have been addressed.

Update:

Major privacy flak Google’s getting! (http://fugitivus.wordpress.com/2010/02/11/fuck-you-google/)


Viksit

Gaur
