Featured Posts (330)

  • Top 10 Hadoop Interview Questions & Answers

    Q1. What exactly is Hadoop?
    A1. Hadoop is a Big Data framework for processing huge amounts of many different types of data in parallel to achieve performance benefits.

    Q2. What are the 5 Vs of Big Data?
    A2. Volume – the size of the data
    Velocity – the speed at which the data arrives and changes
    Variety – the different types of data: structured, semi-structured, and unstructured
    Veracity – the trustworthiness and quality of the data
    Value – the business value that can be extracted from the data

    Q3. Give me examples of unstructured data.
    A3. Images, videos, audio files, etc.

    Q4. Tell me about the Hadoop file system and processing framework.
    A4. The Hadoop file system is called HDFS – the Hadoop Distributed File System. It consists of a NameNode, DataNodes, and a Secondary NameNode.
    Hadoop's processing framework is known as MapReduce. It schedules Map and Reduce tasks in parallel to achieve efficiency.

    Q5. What is the High Availability feature in Hadoop 2?
    A5. Hadoop 2 introduced a standby (passive) NameNode to keep the NameNode from becoming a single point of failure. This results in…
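    The Map and Reduce phases mentioned in A4 can be sketched in plain Python. This is a single-machine toy for intuition only, not the Hadoop API; it shows the classic word-count example, where map tasks emit (word, 1) pairs and a reduce step sums them.

```python
from collections import defaultdict

def map_phase(documents):
    """Map task: emit a (word, 1) pair for every word in every document."""
    for doc in documents:
        for word in doc.split():
            yield (word.lower(), 1)

def reduce_phase(pairs):
    """Reduce task: sum the counts for each distinct word."""
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

docs = ["Hadoop stores data in HDFS", "MapReduce processes data in parallel"]
word_counts = reduce_phase(map_phase(docs))
print(word_counts["data"])  # "data" appears once in each document -> 2
```

    In real Hadoop, the map and reduce tasks run on many nodes at once and the framework shuffles the pairs between them; the shape of the computation is the same.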

    Read more…
  • When coming across the term “text search”, one usually thinks of a large body of text, which is indexed in a way that makes it possible to quickly look up one or more search terms when they are entered by a user. This is a classic problem for computer scientists, to which many solutions exist.

    But how about a reverse scenario? What if what’s available for indexing beforehand is a group of search phrases, and only at runtime is a large body of text presented for searching? These questions are what this trie data structure tutorial seeks to address.

    text search algorithm tutorial using tries

    Applications

    A real world application for this scenario is matching a number of medical theses against a list of medical conditions and finding out…
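    The reverse scenario described above, indexing the phrases first and scanning arbitrary text at runtime, can be sketched with a minimal trie. This is a toy illustration (the phrase list and sample text are invented), not the tutorial's actual code:

```python
def build_trie(phrases):
    """Build a nested-dict trie; the '$' key marks the end of a phrase."""
    root = {}
    for phrase in phrases:
        node = root
        for ch in phrase.lower():
            node = node.setdefault(ch, {})
        node["$"] = phrase
    return root

def find_phrases(trie, text):
    """Walk the trie from every starting position in the text."""
    text = text.lower()
    found = set()
    for i in range(len(text)):
        node = trie
        j = i
        while j < len(text) and text[j] in node:
            node = node[text[j]]
            j += 1
            if "$" in node:
                found.add(node["$"])
    return found

conditions = ["diabetes", "heart disease", "asthma"]
trie = build_trie(conditions)
matches = find_phrases(trie, "Patients with diabetes or heart disease were excluded.")
print(sorted(matches))  # ['diabetes', 'heart disease']
```

    Because the phrases share a single trie, each position in the text is checked against all of them at once, rather than scanning the text once per phrase.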

    Read more…
  • 7 familiar myths regarding Big Data analytics

    Big Data analytics has been a buzzword for a while, but people still have various misconceptions about what it is and how it can help transform your business. Irrespective of the industry you are in, your company processes a huge amount of raw data that can be refined into a more organized form.

    Let’s have a look at the common myths about Big Data:

    1. Big Data means lots of data

    When you hear “Big Data”, an image of loads of data instinctively floats into your mind. But Big Data is not about having a huge bank of information that is hardly of any use; it means having quality data that is useful for your business. A huge data bank is prone to redundant and duplicate entries. Big Data analytics helps you streamline the right data, irrespective of the quantity.

    2. Big Data is essential in itself

    Having raw and unprocessed data is practically of no value for an organization, unless it is…

    Read more…
  • Introduction

    By now, you have probably heard of the Hadoop Distributed File System (HDFS), especially if you are a data analyst or someone who is responsible for moving data from one system to another. But what benefits does HDFS have over relational databases?

    HDFS is a scalable, open source solution for storing and processing large volumes of data. HDFS has been proven to be reliable and efficient across many modern data centers.

    HDFS utilizes commodity hardware along with open source software to reduce the overall cost per byte of storage.

    With its built-in replication and resilience to disk failures, HDFS is an ideal system for…

    Read more…
  • 25 Predictions About The Future Of Big Data

    Guest blog post by Robert J. Abate.

    In the past, I have published on the value of information, big data, advanced analytics and the Abate Information Triangle and have recently been asked to give my humble opinion on the future of Big Data.

    I have been fortunate to have been on three panels recently at industry conferences which discussed this very question with such industry thought leaders as: Bill Franks (CTO, Teradata), Louis DiModugno (CDAO, AXA US), Zhongcai Zhang (CAO, NY Community Bank), Dewey Murdick (CAO, Department of Homeland Security), Dr. Pamela Bonifay Peele (CAO, UPMC Insurance Services), Dr. Len Usvyat (VP Integrated Care Analytics, FMCNA), Jeffrey Bohn (Chief Science Officer, State Street), Kenneth Viciana (Business Analytics Leader, Equifax) and others.

    Each brought their unique perspective to the challenges of Big Data and their insights into their…

    Read more…
  • Originally posted here by Bernard Marr.

    When you learn about Big Data you will sooner or later come across this odd-sounding word: Hadoop. But what exactly is it?

    Put simply, Hadoop can be thought of as a set of open source programs and procedures (meaning essentially they are free for anyone to use or modify, with a few exceptions) which anyone can use as the "backbone" of their big data operations.

    I'll try to keep things simple as I know a lot of people reading this aren't software engineers, so I hope I don't over-simplify anything - think of this as a brief guide for someone who wants to know a bit more about the nuts and bolts…

    Read more…
  • Guide To Budget Friendly Data Mining

    Unlike traditional application programming, where API functions change every day, database programming has basically remained the same. The first version of Microsoft Visual Studio .NET was released in February 2002, with a new version released about every two years, not including Service Pack releases. This rapid pace of change forces IT personnel to re-evaluate their corporation’s applications every couple of years: the functionality of an application stays intact, but its source code must be completely rewritten to stay current with the latest techniques and technology.

    The same cannot be said about your database source code. A standard query of SELECT/FROM/WHERE/GROUP BY,…
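    The stability of that standard query shape is easy to demonstrate. The sketch below (table and data invented for illustration) runs the same SELECT/FROM/WHERE/GROUP BY pattern against Python's built-in SQLite driver; the SQL would look essentially identical on a database from decades ago.

```python
import sqlite3

# In-memory database with a small invented sales table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("east", 100.0), ("east", 50.0), ("west", 75.0)])

# The classic SELECT/FROM/WHERE/GROUP BY query shape, unchanged for decades.
rows = conn.execute("""
    SELECT region, SUM(amount)
    FROM sales
    WHERE amount > 0
    GROUP BY region
    ORDER BY region
""").fetchall()
print(rows)  # [('east', 150.0), ('west', 75.0)]
```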

    Read more…
  • Top 10 Commercial Hadoop Platforms

    Guest blog post by Bernard Marr

    Hadoop – the software framework which provides the necessary tools to carry out Big Data analysis – is widely used in industry and commerce for many Big Data related tasks.

    It is open source, essentially meaning that it is free for anyone to use for any purpose, and can be modified for any use. While designed to be user-friendly, in its “raw” state it still needs considerable specialist knowledge to set up and run.

    Because of this, a large number of commercial versions have come onto the market in recent years, as vendors have created their own versions designed to be more easily used, or supplied alongside consultancy services to get you crunching through your data in no time…

    Read more…
  • 8 Hadoop articles that you should read

    Read more…
  • Originally posted on Data Science Central

    In a recent post, we reviewed a path to leverage legacy Excel data and import CSV files through MySQL into Spark 2.0.1. This may apply frequently in businesses where data retention did not always take the database route… However, we demonstrate here that the same result can be achieved…
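    Before such legacy data ever reaches Spark, the CSV itself can be parsed with nothing but Python's standard library. The file contents and column names below are invented for illustration:

```python
import csv
import io

# Stand-in for a legacy Excel export; in practice you would open a real .csv file.
legacy_csv = io.StringIO("id,name,year\n1,Spitfire,1978\n2,Mustang,1981\n")

# DictReader uses the header row as keys, giving one dict per record.
rows = list(csv.DictReader(legacy_csv))
print(len(rows), rows[0]["name"])  # 2 Spitfire
```

    In Spark 2.x itself, loading such a file is roughly a one-liner along the lines of `spark.read.csv(path, header=True)`.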

    Read more…
  • Where & Why Do You Keep Big Data & Hadoop?

    Guest blog post by Manish Bhoge

    I am back! Yes, I am back on my learning track. Sometimes it is really necessary to take a break and introspect on why we learn, before learning. Ah! It was a nine-month safe refuge to learn how Big Data & Analytics can contribute to a Data Product.

    Data Lake

    A data strategy has always been expected to generate revenue. As Big Data and Hadoop enter the enterprise data strategy, the big data infrastructure is likewise expected to add revenue. This is a tough expectation for a new entrant (Hadoop) when the established candidate (the Data Warehouse & BI) itself mostly struggles for its existence. So it is very pertinent for solution architects to ask WHERE and WHY to bring Big Data (obviously Hadoop) into the data strategy. And, the safe…

    Read more…
    The Phoenix framework has been growing in popularity at a quick pace, offering the productivity of frameworks like Ruby on Rails while also being one of the fastest frameworks available. It breaks the myth that you have to sacrifice performance in order to increase productivity.

    So what exactly is Phoenix?

    Phoenix is a web framework built with the Elixir programming language. Elixir, built on the Erlang VM, is used for building low-latency, fault-tolerant, distributed systems, which are increasingly necessary qualities of modern web applications. You can learn more about Elixir from this blog post or their official guide.

    If you are a Ruby on Rails developer, you should definitely take an interest in Phoenix because of the performance gains it promises. Developers of other frameworks can also follow along to see how Phoenix approaches web development.

    Meet Phoenix on Elixir: A Rails-like Framework for Modern Web Apps

    In this article we will learn some of the things in Phoenix you should…

    Read more…
  • A Guide to Managing Webpack Dependencies

    The concept of modularization is an inherent part of most modern programming languages. JavaScript, though, lacked any formal approach to modularization until the arrival of ES6, the latest version of ECMAScript.

    Module bundlers make it possible to load NPM modules (written for Node.js, one of today’s most popular JavaScript runtimes) in web browsers, and component-oriented libraries like React encourage and facilitate modularization of JavaScript code.

    Webpack is one of the…

    Read more…
  • Top 30 people in Big Data and Analytics

    Originally posted on Data Science Central

    Innovation Enterprise has compiled a top 30 list for individuals in big data that have had a large impact on the development or popularity of the industry. …

    Read more…
    Ember Data (a.k.a. ember-data or ember.data) is a library for robustly managing model data in Ember.js applications. The developers of Ember Data state that it is designed to be agnostic to the underlying persistence mechanism, so it works just as well with JSON APIs over HTTP as it does with streaming WebSockets or local IndexedDB storage. It provides many of the facilities you’d find in server-side object relational mappings (ORMs) like ActiveRecord, but is designed specifically for the unique environment of JavaScript in the browser.

    While Ember Data may take some time to…

    Read more…
  • Google formally announced Android 7.0 a few weeks ago, but as usual, you’ll have to wait for it. Thanks to the Android update model, most users won’t get their Android 7.0 over-the-air (OTA) updates for months. However, this does not mean developers can afford to ignore Android Nougat. In this article, Toptal Technical Editor Nermin Hajdarbegovic takes a closer look at Android 7.0, outlining new features and changes. While Android 7.0 is by no means revolutionary, the introduction of a new graphics API, a new JIT compiler, and a range of UI and performance tweaks will undoubtedly unlock more potential and generate a few new possibilities.
    Read more…
  • I first heard of Spark in late 2013 when I became interested in Scala, the language in which Spark is written. Some time later, I did a fun data science project trying to predict survival on the Titanic. This turned out to be a great way to get further introduced to Spark concepts and programming. I highly recommend it for any aspiring Spark developers looking for a place to get started.

    Today, Spark is being adopted by major players like Amazon, eBay, and Yahoo! Many organizations run Spark on clusters with thousands of nodes. According to the Spark FAQ, the largest known cluster has over 8000 nodes. Indeed, Spark is a technology well worth taking note of and learning about.

    apache spark tutorial

    This article provides an introduction to Spark including use cases and examples. It contains…
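    Spark's core abstraction, the RDD, chains lazy transformations such as map and filter that only execute when an action like collect is called. As a taste of that style, here is a toy, single-machine imitation of the idea (not the PySpark API itself):

```python
class MiniRDD:
    """A toy imitation of Spark's RDD: transformations (map, filter) are
    recorded lazily; only the collect() action actually runs the work."""

    def __init__(self, data, ops=()):
        self.data, self.ops = data, ops

    def map(self, f):
        return MiniRDD(self.data, self.ops + (("map", f),))

    def filter(self, f):
        return MiniRDD(self.data, self.ops + (("filter", f),))

    def collect(self):
        items = self.data
        for kind, f in self.ops:
            if kind == "map":
                items = [f(x) for x in items]
            else:
                items = [x for x in items if f(x)]
        return items

# Square the numbers 1..10, then keep only the even squares.
rdd = MiniRDD(range(1, 11)).map(lambda x: x * x).filter(lambda x: x % 2 == 0)
print(rdd.collect())  # [4, 16, 36, 64, 100]
```

    In real Spark the same chain would be distributed across a cluster, with the laziness letting the engine plan and optimize the whole pipeline before touching any data.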

    Read more…
  • Guest blog post by Alessandro Piva

    The proliferation of data, and the huge potential for companies to turn data into valuable insights, is driving more and more demand for Data Scientists.

    But what skills and educational background must a Data Scientist have? What is their role within the organization? What tools and programming languages do they mostly use? These are some of the questions that the Observatory for Big Data Analytics of Politecnico di Milano is investigating through an international survey of Data Scientists: if you work with data in your company, please support us in our…

    Read more…
    Associative Data Modeling Demystified - Part 1

    Guest blog post by Athanassios Hatzis

    Relation, Relationship and Association

    While most players in the IT sector adopted graph or document databases and Hadoop-based solutions (Hadoop being an enabler of the HBase column store), it went almost unnoticed that several new DBMSs based on associative technology appeared on the scene: AtomicDB (the previous database engine of X10SYS) and Sentences. We have introduced and discussed the…

    Read more…
  • Guest blog post by Marc Borowczak

    Moving legacy data to a modern big data platform can be daunting at times. It doesn’t have to be. In this short tutorial, we’ll briefly review an approach and demonstrate it on my preferred data set. This isn’t an ML repository or a Kaggle competition data set, simply the data I accumulated over decades to keep track of my plastic model collection, and as such it definitely meets the legacy standard!

    We’ll describe steps followed on a laptop VirtualBox machine…

    Read more…
