
All Posts (350)

  • A Guide to Managing Webpack Dependencies

  • Top 10 Commercial Hadoop Platforms

    Guest blog post by Bernard Marr

    Hadoop – the software framework which provides the necessary tools to carry out Big Data analysis – is widely used in industry and commerce for many Big Data related tasks.

    It is open source, essentially meaning that it is free for anyone to use for any purpose, and can be modified for any use. While designed to be user-friendly, in its “raw” state it still needs considerable specialist knowledge to set up and run.

    Because of this, a large number of commercial versions have come onto the market in recent years, as vendors have created their own versions designed to be easier to use, or supplied them alongside consultancy services to get you crunching through your data in no time.…

  • Where & Why Do You Keep Big Data & Hadoop?

    Guest blog post by Manish Bhoge

    I am back! Yes, I am back on my learning track. Sometimes it is really necessary to take a break and reflect on why we learn before learning. Ah! It was nine months of safe refuge to learn how Big Data & Analytics can contribute to a data product.

    DataLake

    Data strategy has always been expected to generate revenue. As Big Data and Hadoop enter the enterprise data strategy, big data infrastructure is likewise expected to add revenue. This is a tough expectation for a new entrant (Hadoop) when the established candidate (Data Warehouse & BI) is itself mostly struggling to justify its existence. So it is very pertinent for solution architects to ask WHERE and WHY to bring Big Data (obviously Hadoop) into the data strategy. And, the safe…

  • Top 30 people in Big Data and Analytics

    Originally posted on Data Science Central

    Innovation Enterprise has compiled a top 30 list of individuals in big data who have had a large impact on the development or popularity of the industry. …

  • Associative Data Modeling Demystified - Part 2

    Guest blog post by Athanassios Hatzis

    Association in Topic Map Data Model

    Introduction

    In the previous article of this series we examined the association construct from the perspective of the Entity-Relationship data model. In this post we demonstrate how the Topic Map data model represents associations. To link the two, we continue with another SQL query from our relational database:

    ```sql
    SELECT suppliers.sid,
           suppliers.sname,
           suppliers.scountry,…
    ```

  • Guest blog post by Alessandro Piva

    The proliferation of data and the huge potential for companies to turn data into valuable insights are steadily increasing the demand for Data Scientists.

    But what skills and educational background must a Data Scientist have? What is his/her role within the organization? What tools and programming languages does he/she use most? These are some of the questions that the Observatory for Big Data Analytics of Politecnico di Milano is investigating through an international survey of Data Scientists. If you work with data in your company, please support us in our…

  • Associative Data Modeling Demystified - Part 1

    Guest blog post by Athanassios Hatzis

    Relation, Relationship and Association

    While most players in the IT sector adopted Graph or Document databases and Hadoop-based solutions (Hadoop is an enabler of the HBase column store), it went almost unnoticed that several new DBMSs based on associative technology, such as AtomicDB (the previous database engine of X10SYS) and Sentences, appeared on the scene. We have introduced and discussed the…

  • Originally posted on Data Science Central

    In a recent post, we reviewed a path to leverage legacy Excel data by importing CSV files through MySQL into Spark 2.0.1. This may apply frequently in businesses where data retention did not always take the database route… However, we demonstrate here that the same result can be achieved…

  • 25 Predictions About The Future Of Big Data

    Guest blog post by Robert J. Abate.

    In the past, I have published on the value of information, big data, advanced analytics and the Abate Information Triangle, and I have recently been asked to give my humble opinion on the future of Big Data.

    I have been fortunate to sit on three panels at recent industry conferences that discussed this very question with industry thought leaders such as Bill Franks (CTO, Teradata), Louis DiModugno (CDAO, AXA US), Zhongcai Zhang (CAO, NY Community Bank), Dewey Murdick (CAO, Department of Homeland Security), Dr. Pamela Bonifay Peele (CAO, UPMC Insurance Services), Dr. Len Usvyat (VP Integrated Care Analytics, FMCNA), Jeffrey Bohn (Chief Science Officer, State Street), Kenneth Viciana (Business Analytics Leader, Equifax) and others.

    Each brought their unique perspective to the challenges of Big Data and their insights into their…

  • Guest blog post by Marc Borowczak

    Moving legacy data to a modern big data platform can be daunting at times, but it doesn’t have to be. In this short tutorial, we’ll briefly review an approach and demonstrate it on my preferred data set. This isn’t an ML repository or a Kaggle competition data set, simply the data I have accumulated over decades to keep track of my plastic model collection, and as such it definitely meets the legacy standard!

    We’ll describe steps followed on a laptop VirtualBox machine…

  • I first heard of Spark in late 2013 when I became interested in Scala, the language in which Spark is written. Some time later, I did a fun data science project trying to predict survival on the Titanic. This turned out to be a great way to get further introduced to Spark concepts and programming. I highly recommend it for any aspiring Spark developers looking for a place to get started.

    Today, Spark is being adopted by major players like Amazon, eBay, and Yahoo! Many organizations run Spark on clusters with thousands of nodes. According to the Spark FAQ, the largest known cluster has over 8000 nodes. Indeed, Spark is a technology well worth taking note of and learning about.


    This article provides an introduction to Spark including use cases and examples. It contains…

  • Ember Data (a.k.a. ember-data or ember.data) is a library for robustly managing model data in Ember.js applications. The developers of Ember Data state that it is designed to be agnostic to the underlying persistence mechanism, so it works just as well with JSON APIs over HTTP as it does with streaming WebSockets or local IndexedDB storage. It provides many of the facilities you’d find in server-side object relational mappings (ORMs) like ActiveRecord, but is designed specifically for the unique environment of JavaScript in the browser.

    While Ember Data may take some time to…

  • Fast Forward transformation with SPARK

    Fast forward transformation process in data science with Apache Spark

    Data Curation:

    Curation is a critical process in data science that helps prepare data for the feature extraction that feeds machine learning algorithms. Curation generally involves extracting, organising, and integrating data from different sources. It can be a difficult and time-consuming process, depending on the complexity and volume of the data involved.
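    As a rough illustration of the three curation steps named above (extracting, organising, integrating), here is a minimal sketch in plain Python rather than Spark. The two sources, their field names and the normalisation rules are all hypothetical, and this is not the article's own pipeline.

```python
import csv
import io

# Hypothetical source 1: a CSV export with messy header names and cell whitespace.
CUSTOMERS_CSV = """Customer_ID, Name ,Country
1, Alice ,uk
2,Bob, US
"""

# Hypothetical source 2: order records from another system, keyed differently.
ORDERS = [
    {"custId": "1", "amount": "19.99"},
    {"custId": "2", "amount": "5.00"},
    {"custId": "1", "amount": "3.01"},
]


def extract_customers(text):
    """Extract: parse the CSV and normalise header names and cell whitespace."""
    reader = csv.reader(io.StringIO(text))
    header = [h.strip().lower() for h in next(reader)]
    return [dict(zip(header, (cell.strip() for cell in row))) for row in reader if row]


def organise(customers):
    """Organise: index customers by integer id and upper-case the country codes."""
    return {
        int(c["customer_id"]): {"name": c["name"], "country": c["country"].upper()}
        for c in customers
    }


def integrate(customers_by_id, orders):
    """Integrate: total the order amounts per customer and attach them to each record."""
    totals = {}
    for order in orders:
        cid = int(order["custId"])
        totals[cid] = totals.get(cid, 0.0) + float(order["amount"])
    return {
        cid: {**info, "total_spent": round(totals.get(cid, 0.0), 2)}
        for cid, info in customers_by_id.items()
    }


curated = integrate(organise(extract_customers(CUSTOMERS_CSV)), ORDERS)
print(curated)
```

    On a real volume of data, each step would typically become a distributed read, transformation or join, which is where a framework such as Spark earns its place.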

    Most of the time, data won't be readily available for the feature extraction process: it may be hidden in unstructured and complex data sources and has to undergo multiple transformation steps before feature extraction.

    Also, when the volume of data is huge, this becomes a very time-consuming process and can be a bottleneck for the…

  • 11 Great Hadoop, Spark and Map-Reduce Articles

    This reference is part of a new series of DSC articles, offering selected tutorials, references/resources, and interesting articles on subjects such as deep learning, machine learning, data science, deep data science, artificial intelligence, Internet of Things, algorithms, and related topics. It is designed for the busy reader who does not have a lot of time to dig into long lists of advanced publications.


  • Google formally announced Android 7.0 a few weeks ago, but as usual, you’ll have to wait for it. Thanks to the Android update model, most users won’t get their Android 7.0 over-the-air (OTA) updates for months. However, this does not mean developers can afford to ignore Android Nougat. In this article, Toptal Technical Editor Nermin Hajdarbegovic takes a closer look at Android 7.0, outlining new features and changes. While Android 7.0 is by no means revolutionary, the introduction of a new graphics API, a new JIT compiler, and a range of UI and performance tweaks will undoubtedly unlock more potential and generate a few new possibilities.
  • Java versus Python

    Originally posted on Data Science Central

    An interesting picture went viral on Facebook. We've had plenty of discussions about Python versus R on DSC. This picture tries to convince us that Python is superior to Java by comparing a tiny piece of code that draws a pyramid.
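    The code from the picture is not reproduced here. As a hypothetical illustration of the kind of snippet being compared, a minimal Python version that prints a centred pyramid of asterisks could look like this (the `pyramid` helper and its row layout are assumptions):

```python
def pyramid(height):
    """Return the rows of a centred pyramid of '*' characters, `height` rows tall."""
    # Row i (1-based) has 2*i - 1 stars, left-padded with (height - i) spaces.
    return [" " * (height - i) + "*" * (2 * i - 1) for i in range(1, height + 1)]


for row in pyramid(4):
    print(row)
# Prints:
#    *
#   ***
#  *****
# *******
```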

    This raises several questions:

    • Is Java faster than Python? If yes, under what circumstances? And by how…
  • Originally posted on Data Science Central

    These are the findings from a CrowdFlower survey. Data preparation accounts for about 80% of the work of data scientists. Cleaning data is the least enjoyable and most time-consuming data science task, according to the survey. Interestingly, when we asked our data scientist the same question, his answer was:

    Automating the task of cleaning data is the most time-consuming aspect of data science, though once done, it applies to most data sets; it is also the most enjoyable because as you automate more and more, it frees a lot of time to focus on other things.
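    To make the idea of reusable cleaning concrete, here is a minimal sketch of a generic routine that can be applied unchanged to many tabular data sets: it normalises column names, trims values, maps common missing-value markers to `None`, and drops exact duplicates. The marker list and the function names are hypothetical choices, not the data scientist's actual tooling.

```python
# Hypothetical set of strings treated as "missing"; adjust per data set.
MISSING_MARKERS = {"", "na", "n/a", "null", "none", "-"}


def clean_value(value):
    """Trim surrounding whitespace and map common missing-value markers to None."""
    if value is None:
        return None
    text = str(value).strip()
    return None if text.lower() in MISSING_MARKERS else text


def clean_rows(rows):
    """Normalise column names, clean every value, and drop exact duplicate rows (order kept)."""
    seen = set()
    cleaned = []
    for row in rows:
        record = {str(key).strip().lower(): clean_value(val) for key, val in row.items()}
        # Keys are unique strings, so sorting never has to compare the values.
        fingerprint = tuple(sorted(record.items()))
        if fingerprint not in seen:
            seen.add(fingerprint)
            cleaned.append(record)
    return cleaned


raw = [
    {"Name ": " Ada ", "City": "London"},
    {"name": "Ada", "city": "London "},
    {"Name": "Grace", "City": "N/A"},
]
print(clean_rows(raw))
# [{'name': 'Ada', 'city': 'London'}, {'name': 'Grace', 'city': None}]
```

    Once rules like these are written, they run on the next data set at no extra cost, which is the payoff the answer above describes.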

    Below are the three charts…

  • Why Not So Hadoop?

    Guest blog post by Kashif Saiyed

    Does Big Data mean Hadoop? Not really; however, when one thinks of the term Big Data, the first thing that comes to mind is Hadoop, along with heaps of unstructured data. It is an exceptional lure for data scientists, who get the opportunity to work with large amounts of data to train their models, and for businesses, which gain knowledge previously never imagined. But has it lived up to the hype? In this article, we will look at a brief history of Hadoop and see how it stands today.

    2015 Hype Cycle – Gartner

    Some key takeaways from the Hype cycle of 2015:

    1. ‘Big Data’ was at the Trough of Disillusionment stage in 2014, but does not appear in the 2015 Hype Cycle.
    2. Another interesting point is that ‘Internet of Things’, which denotes a network of interconnected devices around us, has been at the peak for two consecutive years…
  • Originally posted on Data Science Central

    Summary

    Introducing Data Science teaches you how to accomplish the fundamental tasks that occupy data scientists. Using the Python language and common Python libraries, you'll experience firsthand the challenges of dealing with data at scale and gain a solid foundation in data science.

    About the Technology

    Many companies need developers with data science skills to work on projects ranging from social media marketing to machine learning. Discovering what you need to learn to begin a career as a data scientist can seem bewildering. This book is designed to help you get started.

    About the Book

    Introducing Data Science explains vital data science concepts and teaches you how to…

  • Originally posted on Data Science Central

    Summary: This is the first in a series of articles aimed at providing a complete foundation and broad understanding of the technical issues surrounding an IoT or streaming system so that the reader can make intelligent decisions and ask informed questions when planning their IoT system.

    In This Article

    In Lesson 2

    In Lesson 3

    Is it IoT or…
