
Spatial Data Processing Library for Stratosphere



Stratosphere is a big data analysis platform built on parallel programming concepts.

Stratosphere's programming model provides six parallelizable operator functions:
  • Map
  • Reduce
  • Join 
  • Cross
  • CoGroup
  • Union

JTS (Java Topology Suite) is a standard Java library for geographical analysis with special data types such as points, paths, and polygons.

In this spatial data processing library, we are going to implement various geographical analyses, such as intersections and joins, on big map data in standard formats like OSM (OpenStreetMap).

Part of the effort goes into creating new data types in Stratosphere that are compatible with the JTS library.



Here is an introduction to the Stratosphere Java API using the WordCount example. You can download stratosphere-examples and try it on your own.
If you need to get familiar with Stratosphere, there are exercises on the bigdataclass website.
Implementations of those exercises are available in Java.

Word Count example in Stratosphere

Put simply, it takes an input file and counts the number of occurrences of each word in that file.

Basically it uses a mapper and a reducer.

The mapper extends MapFunction and the reducer extends ReduceFunction.
These classes (MapFunction, ReduceFunction) belong to the API (eu.stratosphere.api).
We do not need to worry about these classes too much because they make our life easier; we only need to implement the predefined functions they declare.

Mapper
It first gets the first field of the record as a String value (java.lang.String).
Then it normalizes the string. After normalization, the string contains only
  • alphanumeric words
  • whole string in lower case
e.g.:
1,Big Hello to Stratosphere! :-) -> 1 big hello to stratosphere
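
The normalization step can be sketched in plain Java as follows. This is only an illustration under assumptions: the class name NormalizeSketch, the method name normalize, and the regular expression are hypothetical and are not taken from the Stratosphere example code, which may normalize differently.

```java
import java.util.Locale;

// Hypothetical helper, not part of the Stratosphere API.
class NormalizeSketch {

    // Replaces every run of non-alphanumeric characters with a single space,
    // trims leading/trailing spaces, and converts the result to lower case.
    static String normalize(String line) {
        return line.replaceAll("[^a-zA-Z0-9]+", " ")
                   .trim()
                   .toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // prints: 1 big hello to stratosphere
        System.out.println(normalize("1,Big Hello to Stratosphere! :-)"));
    }
}
```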

Using a tokenizer, it splits the string into individual words.
Then the collector collects each split word as a new Record.

collector.collect(new Record(new StringValue(word), new IntValue(1)));

The first field represents the word and the second field represents the number of occurrences, so the mapper always emits 1.
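
To make the mapper's output concrete, here is a minimal sketch that uses only JDK classes. It assumes a plain List of (word, 1) pairs as a stand-in for the Stratosphere collector and Record; the names TokenizeSketch and emit are hypothetical.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;

// Hypothetical stand-in for the mapper; it does not use the Stratosphere API.
class TokenizeSketch {

    // Splits an already normalized line on whitespace and emits one
    // (word, 1) pair per token, in the order the words appear.
    static List<Map.Entry<String, Integer>> emit(String normalized) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(normalized);
        while (tokenizer.hasMoreTokens()) {
            out.add(new SimpleEntry<String, Integer>(tokenizer.nextToken(), 1));
        }
        return out;
    }

    public static void main(String[] args) {
        // prints: [1=1, big=1, hello=1, to=1, stratosphere=1]
        System.out.println(emit("1 big hello to stratosphere"));
    }
}
```

Every pair carries the count 1, even when a word repeats; adding the counts up is left to the reducer.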

Reducer

The reducer then adds up the occurrences of each word. It receives all the records emitted by the mapper for a given word and sums their counts.

After computing the sum, it updates the second field of the Record with the sum.
element.setField(1, new IntValue(sum));

Example : 1 big hello to big stratosphere
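
For this example input, the grouping and summing can be sketched in plain Java. The sketch rests on assumptions: a LinkedHashMap stands in for the grouping by word that the framework performs before calling the reducer, and the names ReduceSketch, groupByWord and sum are hypothetical rather than part of the Stratosphere API.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;

// Hypothetical stand-in for the reduce phase; it does not use the Stratosphere API.
class ReduceSketch {

    // Simulates the mapper output grouped by word: each word maps to
    // the list of 1s that were emitted for it.
    static Map<String, List<Integer>> groupByWord(String normalized) {
        Map<String, List<Integer>> groups = new LinkedHashMap<>();
        StringTokenizer tokenizer = new StringTokenizer(normalized);
        while (tokenizer.hasMoreTokens()) {
            groups.computeIfAbsent(tokenizer.nextToken(), k -> new ArrayList<>()).add(1);
        }
        return groups;
    }

    // The reducer's job: add up all counts that belong to one word.
    static int sum(Iterator<Integer> counts) {
        int sum = 0;
        while (counts.hasNext()) {
            sum += counts.next();
        }
        return sum;
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> groups = groupByWord("1 big hello to big stratosphere");
        for (Map.Entry<String, List<Integer>> group : groups.entrySet()) {
            // prints one line per word, e.g. "big 2"
            System.out.println(group.getKey() + " " + sum(group.getValue().iterator()));
        }
    }
}
```

Under these assumptions the word big ends up with a count of 2 and every other word with a count of 1.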



