Spatial Data Processing Library for Stratosphere

Stratosphere is a big data analysis platform built on parallel programming concepts.

Stratosphere's programming model provides six parallelizable operator functions:

Map
Reduce
Join
Cross
CoGroup
Union

JTS (Java Topology Suite) is a standard Java library for geographical analysis with spatial data types such as points, paths, and polygons.

In this spatial data processing library, we are going to implement various geographical analyses, such as intersections and joins, on big map data in standard formats like OSM (OpenStreetMap).

Part of this effort is creating new data types in Stratosphere that are compatible with the JTS library.

Here is an introduction to the Stratosphere Java API using the WordCount example. You can simply download stratosphere-examples and try it on your own.

If you need to get familiar with Stratosphere, there are exercises on the bigdataclass website.

You can find those exercises implemented in Java.

Word Count example in Stratosphere

Simply put, it takes an input file and counts the number of occurrences of each word in that file.

Basically it uses a mapper and a reducer.

The mapper extends MapFunction and the reducer extends ReduceFunction.

Those classes (MapFunction, ReduceFunction) belong to the API (eu.stratosphere.api).

We do not need to worry about those classes too much because they make our life easier. We only need to implement the predefined functions in these classes.

Mapper

It first reads the first field of the record as a String value (java.lang.String).

Then it normalizes the string. After normalization:

the string contains only alphanumeric words
the whole string is in lower case

e.g.:

1,Big Hello to Stratosphere! :-) -> 1 big hello to stratosphere

Using a tokenizer, it splits the string into individual words.

Then the collector collects each word as a new Record.

collector.collect(new Record(new StringValue(word), new IntValue(1)));

The first field represents the word and the second field represents its occurrence count, so the count is always 1 at this stage.
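The mapper steps above can be sketched in plain Java without the Stratosphere classes. This is only an illustration: the class name WordCountMapperSketch, the map method, and the use of Map.Entry as a stand-in for Record are assumptions made for this sketch, and the regular expression is one possible way to do the normalization described above.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;

// Hypothetical stand-alone class; it is not part of the Stratosphere API.
public class WordCountMapperSketch {

    // Normalizes a line, splits it into words, and emits a (word, 1) pair per word.
    // Map.Entry<String, Integer> stands in for a Record with two fields.
    static List<Map.Entry<String, Integer>> map(String line) {
        // Assumed normalization: replace every run of non-word characters
        // with a single space, then lower-case the whole string.
        // Note that \W treats the underscore as a word character.
        String normalized = line.replaceAll("\\W+", " ").toLowerCase();

        List<Map.Entry<String, Integer>> collected = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(normalized);
        while (tokenizer.hasMoreTokens()) {
            // Each word is emitted with an occurrence count of 1.
            collected.add(new SimpleEntry<>(tokenizer.nextToken(), 1));
        }
        return collected;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, Integer> pair : map("1,Big Hello to Stratosphere! :-)")) {
            System.out.println(pair.getKey() + " " + pair.getValue());
        }
    }
}
```

Running main on the example line prints the five words 1, big, hello, to, and stratosphere, each followed by the count 1.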

Reducer

The reducer then adds up the number of occurrences of each word. It receives the records emitted by the mapper, grouped by word, and sums the counts for each word.

After computing the sum, it updates the second field of the Record with the sum.

element.setField(1, new IntValue(sum));

Example: for the input 1 big hello to big stratosphere, the word big ends up with a count of 2 and every other word with a count of 1.
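To make the reducer step concrete, here is a minimal plain-Java sketch of summing the counts per word. It does not use the Stratosphere classes: WordCountReducerSketch, reduce, and countWords are hypothetical names, and the grouping by word, which the framework would normally perform before calling the reducer, is simulated here with a map.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-alone class; it is not part of the Stratosphere API.
public class WordCountReducerSketch {

    // Sums the counts of all records that share the same word.
    static int reduce(Iterator<Integer> counts) {
        int sum = 0;
        while (counts.hasNext()) {
            sum += counts.next();
        }
        return sum;
    }

    // Simulates the grouping step: collects a count of 1 per occurrence,
    // grouped by word, then calls reduce once per group.
    static Map<String, Integer> countWords(List<String> words) {
        Map<String, List<Integer>> groups = new LinkedHashMap<>();
        for (String word : words) {
            groups.computeIfAbsent(word, k -> new ArrayList<>()).add(1);
        }
        Map<String, Integer> result = new LinkedHashMap<>();
        for (Map.Entry<String, List<Integer>> group : groups.entrySet()) {
            result.put(group.getKey(), reduce(group.getValue().iterator()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("1 big hello to big stratosphere".split(" "));
        // prints {1=1, big=2, hello=1, to=1, stratosphere=1}
        System.out.println(countWords(words));
    }
}
```

A LinkedHashMap is used only so that the output keeps the order in which words first appear; a real job gives no such ordering guarantee.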

