
Spatial Data Processing Library for Stratosphere



Stratosphere is a big data analysis platform built on parallel programming concepts.

Stratosphere's programming model provides six parallelizable operator functions:
  • Map
  • Reduce
  • Join 
  • Cross
  • CoGroup
  • Union

JTS (the JTS Topology Suite) is a standard Java library for geographical analysis with special data types such as points, paths and polygons.

In this spatial data processing library, we are going to implement various geographical analyses, such as intersections and joins, on big map data in standard formats like OSM (OpenStreetMap).

There is also some effort here to create new data types in Stratosphere that are compatible with the JTS library.



Here is an introduction to the Stratosphere Java API using the WordCount example. You can simply download stratosphere-examples and try it yourself.
If you need to get familiar with Stratosphere, there are exercises on the bigdataclass website.
You can also find those exercises implemented in Java.

Word Count example in Stratosphere

Simply put, it takes an input file and counts the number of occurrences of each word in that file.

Basically it uses a mapper and a reducer.

The mapper extends MapFunction and the reducer extends ReduceFunction.
Those classes (MapFunction, ReduceFunction) belong to the API (eu.stratosphere.api).
We do not need to worry about those classes too much because they make our life easier. We only need to implement the pre-defined functions in these classes.

Mapper
It first reads the first field of the record as a String value (java.lang.String).
Then it normalizes the string. After normalizing, the string contains only
  • alphanumeric words
  • whole string in lower case
e.g.:
1,Big Hello to Stratosphere! :-) -> 1 big hello to stratosphere
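
The normalization step above can be sketched in plain Java. This is only an illustration, not the code from stratosphere-examples: the class name NormalizeSketch and the regular expression are assumptions, and note that `\W` also keeps underscores, so it is slightly looser than "alphanumeric".

```java
final class NormalizeSketch {
    // Assumed approach: replace every run of non-word characters with a
    // single space, lower-case the result and trim the ends.
    static String normalize(String line) {
        return line.replaceAll("\\W+", " ").toLowerCase().trim();
    }

    public static void main(String[] args) {
        // prints "1 big hello to stratosphere"
        System.out.println(normalize("1,Big Hello to Stratosphere! :-)"));
    }
}
```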

Using a tokenizer, it splits the string into single words.
Then the collector collects each split word as a new Record.

collector.collect(new Record(new StringValue(word), new IntValue(1)));

The first field represents the word and the second field represents its number of occurrences. Since the mapper emits one record per word, the second field is always 1.
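
To see what the mapper emits without setting up Stratosphere, here is a small stand-in in plain Java. It is a sketch under assumptions: MapStepSketch is a made-up name, and a list of (word, 1) map entries plays the role of the collector and its Records.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;

final class MapStepSketch {
    // Stand-in for the mapper: one (word, 1) pair per token of an
    // already normalized line.
    static List<Map.Entry<String, Integer>> map(String normalizedLine) {
        List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(normalizedLine);
        while (tokenizer.hasMoreTokens()) {
            // Plays the role of collector.collect(new Record(word, 1)).
            emitted.add(new SimpleEntry<>(tokenizer.nextToken(), 1));
        }
        return emitted;
    }

    public static void main(String[] args) {
        // prints [1=1, big=1, hello=1, to=1, stratosphere=1]
        System.out.println(map("1 big hello to stratosphere"));
    }
}
```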

Reducer

The reducer then adds up the number of occurrences of each word. It receives the records emitted by the mapper, grouped by word, and sums the counts for each word.

After computing the sum, it updates the second field of the Record with the sum.
element.setField(1, new IntValue(sum));

Example: 1 big hello to big stratosphere
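
The effect of the reduce step on this example line can be imitated in plain Java. Again this is only a sketch: ReduceStepSketch is a hypothetical name, and a LinkedHashMap stands in for the grouping by word that Stratosphere does before calling the reducer.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringTokenizer;

final class ReduceStepSketch {
    // Groups the tokens by word and sums a count of 1 per occurrence,
    // which is what mapper plus reducer achieve together.
    static Map<String, Integer> countWords(String normalizedLine) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        StringTokenizer tokenizer = new StringTokenizer(normalizedLine);
        while (tokenizer.hasMoreTokens()) {
            // merge() adds 1 to the running sum, or starts it at 1.
            counts.merge(tokenizer.nextToken(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // prints {1=1, big=2, hello=1, to=1, stratosphere=1}
        System.out.println(countWords("1 big hello to big stratosphere"));
    }
}
```

Only "big" appears twice in the line, so it is the only word whose count ends up greater than 1.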



