Separating Data Engineering and Data Science in Analytics

“Data is the new oil – the source for corporate energy and differentiation in the 21st century.” – EMC Study, Data Science Revealed: A Data-Driven Glimpse into the Burgeoning New Field.

Data science and data engineering, the not-so-odd couple of the data analytics world, both work with big data: large, diverse, complex, longitudinal, and/or distributed data sets generated from instruments, sensors, Internet transactions, email, video, click streams, and any other digital sources available today and in the future. Beyond that shared ground, the two differ when it comes to big data analytics. Data engineering manages the environments, data, integration, and warehouse functions, among other responsibilities. Data science is more business-oriented: company analysts create and manage the flow of information.

More specifically, businesses can analyze a constant stream of data with data science techniques, provided that data is integrated and its quality is managed according to best practices. But it’s the engineering side of the data equation that isolates and industrializes data, mostly through data lakes, business rule environments, and data flow, i.e., the movement of data through various stages, processes, and integration areas. In that way, data science and data engineering can work in tandem and continually build on data insights, producing better information that can lead to higher profits.

Getting On the Same Page

To get on that path to better data and stronger revenues, it’s helpful for analysts to understand explicitly how data science and data engineering differ, so they can leverage the power of both in their search for premium business information. For starters, knowing the difference between data scientists and data engineers helps clarify both disciplines.

Data scientists – It’s the job of data scientists to ask, and answer, the right data questions. Hence, data scientists must be highly skilled in complex fields such as statistics, machine learning, and data mining. It also helps for data scientists to understand computer programming, especially database technologies. Data scientists should be well versed in creating and updating charts and graphs and in using sophisticated data visualization tools, and they should be able to communicate their findings in clear, concise, and compelling writing. Becoming a data scientist often takes a Ph.D. and about five years of education and training (including graduate school).

No doubt, developing good data scientists can mean the difference between success and failure in any big data campaign. As EMC puts it in a recent survey, “big data scientists touch data in more ways. They are twice as likely as those working with normal data to work across the data life cycle, everywhere from acquiring new data to business decision making, and around half spend a lot of time on each of these activities.”

Data engineers – By and large, data engineers gather, store, and process data, and make it available to data scientists, primarily via APIs. To excel at all of these tasks, data engineers must be thoroughly familiar with software engineering, have a deep knowledge of databases and solid engineering best practices, and know data administration and data cleansing equally well, along with how to build and maintain the data pipelines that deliver information to data scientists. A degree in software engineering is a must to become a data engineer.
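To make the data engineer’s role more concrete, here is a minimal sketch of a pipeline that gathers raw records, cleanses them, stores them, and hands the result to a data scientist through a simple query function. It uses only Python’s standard library and rests on several assumptions: the record fields (user_id, clicks), the table name, and the function names are all hypothetical; an in-memory SQLite database stands in for a warehouse or data lake; and a plain function stands in for the API a real team would expose.

```python
import sqlite3

# Hypothetical raw click-stream records, as they might arrive from a web log.
RAW_RECORDS = [
    {"user_id": " u1 ", "clicks": "3"},
    {"user_id": "u2", "clicks": None},            # missing value: dropped
    {"user_id": "U1", "clicks": "2"},             # same user, different casing
    {"user_id": "u3", "clicks": "not-a-number"},  # malformed value: dropped
    {"user_id": "u4", "clicks": "7"},
]


def cleanse(records):
    """Normalize user IDs and convert clicks to int, skipping bad rows."""
    clean = []
    for rec in records:
        user_id = (rec.get("user_id") or "").strip().lower()
        try:
            clicks = int(rec.get("clicks"))
        except (TypeError, ValueError):
            continue  # incomplete or malformed row
        if user_id:
            clean.append((user_id, clicks))
    return clean


def store(conn, rows):
    """Load cleansed rows into a table (a stand-in for a warehouse)."""
    conn.execute("CREATE TABLE IF NOT EXISTS clicks (user_id TEXT, clicks INTEGER)")
    conn.executemany("INSERT INTO clicks VALUES (?, ?)", rows)
    conn.commit()


def clicks_per_user(conn):
    """Hypothetical 'API' a data scientist would call: total clicks per user."""
    cur = conn.execute(
        "SELECT user_id, SUM(clicks) FROM clicks GROUP BY user_id ORDER BY user_id"
    )
    return dict(cur.fetchall())


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    store(conn, cleanse(RAW_RECORDS))
    print(clicks_per_user(conn))  # {'u1': 5, 'u4': 7}
```

In this sketch the data scientist only calls clicks_per_user and never touches the raw, messy records; the cleansing and storage steps stay on the engineering side, which is the division of labor argued for here.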

Managing Two Disparate Disciplines

For companies looking to build solid, sophisticated data analysis programs, it’s imperative during the team-building phase not to “overlap” data science and data engineering efforts, including staffing.

Companies can easily squander resources and time by merging the two disciplines, mostly by having data scientists handle the creation, organization, cleaning, integration, and movement of data. Those tasks should be handled by data engineers, and if you don’t have any, you’ll have to hire them (the average annual salary for a data scientist is $92,000, while a data engineer earns about $80,000 annually, according to PayScale.com). The key to finding and developing good data science and data engineering talent is to weigh each candidate’s qualifications against the need to create, manage, and understand the movement of data.

Here’s how EMC sees the issue:

“In order to remain competitive in the world of data science, companies need to create organizational cultures that are conducive to data-driven decision making. First, they need to expand their view on the possibilities when hiring data scientists and engineers, and look outside business degrees, and even computer science, to find practitioners with the intellectual curiosity and technical depth to solve big data problems, with academic concentrations in the hard sciences, statistics, and mathematics. Rather than hiring for experience with a certain toolkit, companies should invest in on-the-job training with their chosen set of emerging technologies.”

“Once companies have brought in the right talent, they need to create an environment conducive to effective data management. That means building high-performing, cross-functional teams that include a variety of roles, including programmers, statisticians, and graphic designers, and aligning them to directly support interested business decision makers.

“They should also loosen restrictions on data in the enterprise, allowing employees to more freely run data-driven experiments. Finally, successful data scientists and engineers should be given free access to run experiments on data, without bureaucratic obstacles, so that they can rapidly translate their own intellectual curiosity into business results.”

Working In Tandem to Maximize Science and Engineering

Make no mistake: while there are unique and important differences between data scientists and data engineers, companies need both disciplines, and both roles, working in harmony. After all, any big data campaign that digs up raw information and turns it into a valuable commodity needs to meld data science and data engineering expertly, while also recognizing how they differ.
