The Dynamics of Data Roles & Teams


I’m continually surprised by the responsibilities and titles of new roles emerging within the ‘data profession’. Admittedly, this is a fairly nebulous concept, and I suspect practitioners hold a variety of opinions on what the composition of this space looks like. However, there are certain trends that most practitioners would agree on. Data is being taken more seriously by organisations than ever before, with comparable growth in dedicated ‘data people’, investment and technology.

For the sake of convenience & readability, I would like to go over data roles briefly, categorised by the tech revolutions that brought substantial change, and especially the ones that will keep evolving in future. In addition, I recently wrote a piece on the Evolution of Analytics with Data, which gives useful context for this article.

I’m an amateur blogger, so this is clearly just one perspective, and it could be a long read for drowsy eyes. A word of advice: grab a cup of coffee.

Business Intelligence Roles

Quite rightly so, ‘BI’ doesn’t qualify to compete with the trendy buzzonyms around the tech-ecosystem in 2018 and isn’t pleasing to the ears of our data-savvy generation. Are ETL tools & strategies no longer in use? Is the scope of BI overshadowed by the vast application of big-data & data science methodologies? Hell NO!


How traditional BI roles were structured in accordance with the business model of the organisation. Source: Microsoft Technet Wiki

Business Intelligence has seen a considerable decline in the last year or two. However, I wouldn’t go so far as to call BI dead, as its application is critical to major businesses. Roles like BI Analyst, Data Architect, ETL Developer, DW Engineer and BIDW Admin will only become more crucial, with extra emphasis on market-leading tools & technology over the jack-of-all-trades roles in present domains.


Scope of Business Intelligence techniques employed in 2018.
Source: infographics & vector designs on DepositPhotos

According to a recent Wisdom of Crowds® Business Intelligence Market Study, BI will continue to provide competitive salaries and dominate certain areas of the market. Here are some of its key numeric takeaways for 2018:

  • Executive Management, Operations, & Sales: 3 areas driving BI adoption.
  • Dashboards, reporting, end-user self-service, advanced visualisation, and data warehousing: 5 technologies and initiatives strategic to BI.
  • Small organisations of up to 100 employees have the highest rate of BI penetration.
  • 50% of vendors offer perpetual on-premises licensing and cloud subscriptions.
  • Fewer than 15% of respondent organisations have a Chief Data Officer.

In case you still have a difference of opinion, I recommend you read the full post: The State of Business Intelligence, 2018

Big Data & Data Science Roles

Before we take a deep-dive into the current roles, let’s take a step back to understand how and where it all started. My idea is to present these roles through a storytelling narrative rather than the traditional plain-text definitions, which are easily accessible around the Internet. Additionally, every new wave in the industry gives birth to confusing buzzwords, false renditions & surrealistic stipulations (which is a mouthful, to say the least).

The Change

‘Big data’ was coined to distinguish it from small data, as it was not generated purely by the firm’s transaction systems. Predictive analytics also promised a view of future trends, in contrast to the fact-based, backward-looking comprehension that BI gave decision-makers. If dimensions & analytics weren’t justification enough, this phase also welcomed community-driven “Open Source” tools over highly priced licences.

I usually refrain from citing names of tools in my posts, but it’s fairly impossible to describe this revolution without mentioning Apache Hadoop. The technology stack & extensible projects, the functional programming paradigms (scalable, concurrent & distributed systems), the rise of NoSQL DB systems, job scheduling & cluster resource management, the changing face of drag-n-drop ETL and better data modelling techniques were all brought together by Hadoop. Ultimately, though, it emphasised one lesson above all: code is the best abstraction for software. It also introduced, in a broad sense, the idea of having a custom architecture ready for future integrations with Data Science & Machine Learning.

From the developers’ perspective, this meant you didn’t necessarily have to be working for the tech big guns to develop new disruptive projects. You had the backing of a community at your disposal and emerging collaboration platforms like GitHub to showcase your work.


Hierarchy of roles in Big Data & Analytics-driven companies.

From an organisational view, Software Engineers (Java developers), DW Engineers (BI/ETL developers, Data Architects) and Infra Admins (DBAs, Linux SAs) explored fancier titles, as Big-Data Engineers, Hadoop Developers, Hadoop Architects and Big-Data Support Engineers began to flourish in the job market. BI roles fell down the pecking order, and the years in which line-of-business users and data personnel used the same tools were well and truly over.


BI roles gradually moving out of the circle of Big Data teams. 
Source: DataFlair

At an industry level, it had the most impact: it was no longer just tech firms and online companies that could create products and services from big-data analytics. It was practically every firm in the industry.

The Fusion

The tech industry suddenly became divided by the rising demand for combining Big Data with Data Science strategies. As such, the field roles were classified into three buckets: Software Engineering (strong programming: front & back-end engineers, web developers, infra admins, middleware specialists, iOS/Android developers), Data Engineering (strong data background: ETL developers, DWH architects, BI analysts, Hadoop engineers, DBAs) and a third set of individuals deemed the next-generation quantitative analysts (possessing both computational & analytical skills), who specialised in a growing field of study: Data Science.


Venn Diagram showing tools & techniques under SE vs DE vs DS domains.
Source: Ryan Swanstrom, Data Science 101

In my view, this classification produced a significant transition, with the positives best leveraged by small-scale firms (< 50 employees) such as emerging startups and research facilities, as well as large-scale enterprises (> 1000 employees) in telecom, e-commerce, social media etc. Startups had the liberty of combining multiple roles into one and encouraging multi-disciplinary growth, while the mainstream giants had no trouble employing distinct roles across different departments, thereby adding new areas for generating business.

Entrepreneurs running what were now medium-sized companies (SMBs), striving for commercial recognition while competing with the big players in their respective markets, were arguably affected the most. Initial success, through funding rounds or venture capital backing, allowed them to grow in numbers (50-300+ employees). They rushed into indefinite hires, redundant roles and poor decision-making strategies. Eventually, the constant pressure to stay in the market under quarterly timelines forced unprecedented lay-offs and stock-distribution losses, and even resulted in liquidation at an early stage. Some tech-savvy investors (whom I’d like to call guardian angels) offered M&A assistance, but the industry saw the downside of absorbing roles for the first time.

The Overlap

Meanwhile, it wasn’t just companies having a hard time with evolving data roles. This era saw a rising number of data science enthusiasts (both academic & experienced) coming out of their comfort-caves & expanding their skill-sets. And why not? Each of these applicants (mathematicians, PhDs, analysts) had every right to apply for one of the finest-paid jobs of the 21st century. Along came esteemed university professors & philanthropists with their versions of the ideal candidate, but that didn’t stop the mob.

Titles with ‘Data’ prefixes helped make early distinctions between roles with similar lines of work. The intent was to identify skill coverage and harness the right potential. Data Analysts shied away from business and turned their eyes to statistics & engineering, while Data Architects kept their focus on publishing models (not to be confused with ML), database design and governance, with their trademark politically-neutral attitude.


Radar chart explaining overlap of skills between Data-driven roles.
Ignore "Mad Skillz" as it implies "Natural Abilities". Source: edX

Businesses started to gather more understanding by nurturing capabilities in Prescriptive Analytics with Machine Learning on their premises. They began competing on analytics not only in the traditional sense, by improving internal business decisions, but also by creating more valuable products and services. The sheer need (or greed) to attain concrete goals, such as better results than last quarter, brought a proportional overhead of roles and responsibilities. As such, a promising yet challenging position like Data Scientist also became the central figure across teams: the daily go-to person for anything related to data. Not a lot has been said about the stress and fatigue of many such burdened individuals. Even when people of that calibre invested the majority of their time in analysis, they still found time to pursue better opportunities for themselves. Here’s a satirical treat on KDnuggets supporting my claim.

The Trade-Off

Two big questions came to light. First: is Data Science the next bubble? My answer: NO, but the “Data Scientist” title was arguably becoming one. It was a textbook demand-and-supply problem, where every aspirant wanted a fair share of the goods but only a few proved worthy of claiming it. A bit confusing? Consider how you deal with a fresh graduate applying for this role, or what you do when your data scientist is likely to leave and you’re left with a pack of “self-proclaimed” ones knocking on your door.

Secondly, with data accessed directly from sources like websites, APIs, social media or the internet, the need for programming skills, and the prowess to apply them efficiently, couldn’t be compromised. “Not all data scientists have strong software foundations” and “Why were software engineering concepts ignored amidst all the buzz for Data Science?” became common refrains. Companies soon realised that only a reallocation of roles could correct this imbalance, so they looked to broader engineers to heavily support their data scientists and find an equilibrium amongst the different roles.

Software engineers who appeared to have a knack for data science & machine learning stepped up to help with this dilemma and strengthened the data engineer club, while those practising core web programming with stack-driven ambitions moved on to bigger challenges: Full-Stack Engineer.


Full-Stack by past roles (left) & by tech-stack areas (right).

A win-win situation: data scientists got a reliable sidekick and a sigh of relief (the inflated hype around their ‘crown’ lowered), along with an equally competent role on the horizon to challenge them. The collaboration not only sent the craving enthusiasts spinning but also opened another door, making data engineering one of the most sophisticated disciplines today. The modern-day Data Engineer complements every other role, is a must-have handyman in every firm, and is practically the first hire in startups these days.


An Infographic-take on Data Engineers and Data Scientists.
Source: Read Full Post on DE vs DS, by Karlijn Willems

The gamble (a workaround play that clicked) of balancing mutually distinct roles paid off perfectly, but the tech industry knew it couldn’t afford another setback and had to be prepared for the increasing acceptance of Artificial Intelligence looming around the corner.

The Resolution

Inevitably, companies identified the flaws in their organisational structure (positions, priorities and capabilities) and established Data-Driven Teams. The prime focus was on role distinctions, division of labour, avoiding task conflicts and proper rules of collaboration. An example of role-based leaders heading respective units inside such a team would be a Principal Data Scientist & an Engineering Lead.


An early look at a well-structured Data Science team under the same roof. Source: DataCamp Blog Community

Today, a perfect data-science team is a myth, or otherwise an engaging subject of heated debate. What companies expect from their teams is to assemble as a group of superheroes (The Avengers). What they fail at miserably on occasion is appointing a person who provides such teams with context (Nick Fury). This is where Chief Data Officers come into powerful existence. With data becoming integral to business strategy, the CDO is becoming a more critical role in an organisation. According to a Forbes survey, more than 50% of CDOs will likely report directly to the CEO in 2018. They’re bound to take on more active roles in shaping their businesses’ initiatives.

I often get disappointed upon seeing job descriptions containing “Advanced English Skills” or “Native candidates only”. So, I proactively question (or troll) such job posters every single time (I do enjoy their apparent pause). Language shouldn’t be deemed a barrier; rather, it should be used as a formidable means of unifying teams. The best example in 2018 to make my stance clear is indeed a language in itself: Python. Founders (CEOs & CDOs) must let this message trickle down within their teams and, most importantly, to their first focal point: the Talent Requisition team.


How Python brings a team of diversified role-types together.
Source: ActiveWizards

These days HR coordinators, recruiters and outsourcing head-hunters all have access to ample data resources (Medium, DataCamp) & data-friendly platforms (LinkedIn Recruiter, Glassdoor) to refine their search and improve hiring, thereby making their own roles data-driven too.

Machine Learning & AI-driven Roles

Perhaps the most compelling aspect of Machine Learning is its seemingly limitless applicability. There are already so many fields being impacted by ML and now AI, including Education, Finance, and more. Machine Learning techniques are already being applied to critical areas within the Healthcare sphere, impacting everything from care variation reduction efforts to medical scan analysis.

There are a number of companies for whom their data (or their data analysis platform) is their product. In this case, the data analysis or machine learning going on can be pretty intense. This is probably the ideal situation for someone who has a formal mathematics, statistics, or physics background and is hoping to continue down a more academic path.

“Machine Learning Engineers often focus more on producing great data-driven products than they do answering operational questions for a company.”


New addition to the Data Science team working on ML. Source: Udacity

Companies have become more encouraging and are constantly on the lookout for Machine Learning Engineers: open-minded candidates from all age groups and levels (academic interns to research scientists). The social media generation also shows far more appreciation than before, as seen on LinkedIn, Medium and GitHub.


Bird's-eye view of multiple ML roles in AI firms. Source: Udacity

AI-driven companies successfully implementing intelligent machines (like chatbots) are already a step ahead of others. Roles organised by software, applied & core are a clear indication that they’re serious about their product development & service offerings. Since there isn’t any standardisation of profile & seniority today, they’re at full liberty to improvise AI titles in the future.

Encompassing Roles

There are many roles that complement data-driven teams on a day-to-day basis. They are a must-have in an organisation irrespective of the teams they belong to. You’d probably wonder why I didn’t mention them earlier. Honestly, I was sceptical for the reasons below:

  • I have limited expertise on these profiles and their scope.
  • They are not primarily seen under the category of data-driven roles.
  • Their domain versatility allows them to operate across different teams.

Let me try to explain before the knife-wielding mob gets here.

  • Graphic Designers: The Creative Heads in every sense. A complete package of art, science, programming, ideas and imagination with endless capabilities. They add value with their vocal presence & fearless attitude. My personal favourites.
  • Decision-Makers: A role often misconstrued and overlooked, especially in domain-specific startups. Before hiring that PhD-trained data scientist, make sure you have a decision-maker who understands the art & science of decision-making.
  • DevOps & Site-Reliability Engineers: Broadly fall into two categories: “business capabilities teams” and “agile operations teams”. Data Architects & Engineers can coordinate with them, learn and implement tasks like cloud-based (IaaS, PaaS, SaaS) configs, containers, micro-services deployment & virtualisation. DataOps, meanwhile, is an emerging approach aimed at continuous data flow within the enterprise.
  • Cloud Architects: Technology specialists who usually take up consulting roles (and charge by the hour, like their cloud services). Again, if your Data Engineer is familiar with cloud concepts or is a certified associate/professional, you may not need to hire one.
  • Project & Delivery Managers: Some data science & analytics firms still have to bend to the old norms of Agile & Scrum methodologies. Before they start consulting clients to orchestrate sales of their products & services, they need experienced managers to ensure PoC (proof-of-concept) timelines & resources are well allocated.
  • Network & Cyber Security Engineers: Often seen as internal teams, but of all the roles mentioned above, they will soon be an integral part of data-driven teams. With data security already raising menacing concerns in 2018, these roles have come to be seen as “critical”, since most companies operate daily with an online presence.

Parting Thoughts

Certainly on the tool front, the technology is becoming more accessible and intuitive than ever before. Most cleansing, modelling, reporting & visualisation tools, for instance, ship with an array of adaptors, meaning that loading data is itself no longer a hugely significant hurdle. However, this has also encouraged a somewhat ubiquitous view of data: it should just work with minimal effort. There is an ominous risk that less and less time will be dedicated to getting the fundamentals right.

Tech & Industries to watch out for in 2018-19:

  • Progressive Web Apps (PWAs) – A mixture of mobile and web apps.
  • Blockchain & Fintech – Metamodel building, reliable trading & credit scoring.
  • Healthcare Technology – Diagnosis by Medical Imaging (Computer vision & ML).
  • AR/VR – Sport Analysis, Business Cards (Image Tracking), Techno eSports (Hado).
  • AI Speech Assistants, smarter Chat-bot integrations.
  • Smart Supply Chain – Digital twins (IoT Sensors).
  • 5G – Big data, Mobile cloud computing, scalable IoT & network virtualisation (NFV).
  • 3D Printing – Prefabrication efficiency, defect detection, ML-based predictive maintenance.
  • Dark Data – Information that is yet to become available in digital format.
  • Quantum Computing – Cutting data processing times into fractions.

Finally, on the job front, it’s evident that roles won’t be able to keep up with the dynamics of technology. Landing that next opportunity will be difficult. According to many job advisors, there are two ways to keep your job security intact: be an expert in one domain and hold your ground within a stable company, or seek challenging roles by identifying newer domains aligned with tech trends. As a Data Engineer, I follow a hybrid approach, maintaining a learning discipline between professional career & personal ambitions, which practically allows me to work in any tech-driven industry. If there’s any consolation, I surely know that I’m responsible for my successes & failures in the future.

Don’t ever let someone tell you that you can’t do something. You got a dream, you gotta protect it. People can’t do something themselves, they wanna tell you that you can’t do it. You want something, go get it. Period.

—  The Pursuit of Happyness





The Evolution of Analytics with Data


We have made tremendous progress in the field of Information & Technology in recent times. Some of the revolutionary feats achieved in the tech-ecosystem are truly commendable. Data and Analytics have been among the most commonly used words of the last decade or two. As such, it’s important to know why they are inter-related, which roles in the market are currently evolving and how they are reshaping businesses.

Technology, often regarded as a boon by those already aware of its potential, can also be a curse to audiences who can’t keep up with its rapid growth. Each era has had its moments of breakthrough and an equal share of victims (or, as I’d like to call them, collateral damage). As of today, every money-driven industry relies on Data and Analytics for its survival.

This blog is an attempt to look over these different stages: simplifying the various buzzwords, narrating the scenarios which were never explained and keeping an eye on the road that lies ahead. So, without further ado, grab your “cheat-day” meal & let’s take a walk down memory lane.

Analytics 1.0 → Need for Business Intelligence: This was the rise of the data warehouse, where customer (Business) and production processes (Transactions) were centralised into one huge repository like an eCDW (Enterprise Consolidated Data Warehouse). Real progress was made in gaining an objective, deep understanding of important business phenomena, thereby giving managers the fact-based comprehension to go beyond intuition when making decisions.

The data surrounding the eCDW was captured, transformed and queried using ETL & BI tools. The types of analytics exploited during this phase were mainly classified as Descriptive (what happened) and Diagnostic (why something happened).
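To make the capture, transform and query cycle concrete, here is a minimal sketch in Python using only the standard library. Everything in it is an assumption for illustration: the `RAW_ORDERS` records, the `orders` table and the function names are made up, and an in-memory SQLite database stands in for a real warehouse. It is not how any particular ETL or BI tool works internally.

```python
import sqlite3

# Hypothetical raw transaction records, as they might arrive from a source system.
RAW_ORDERS = [
    {"order_id": 1, "region": " north ", "amount": "120.50"},
    {"order_id": 2, "region": "South", "amount": "80.00"},
    {"order_id": 3, "region": "NORTH", "amount": "99.50"},
]


def extract():
    """Capture: read the raw records (here, simply copy the in-memory list)."""
    return list(RAW_ORDERS)


def transform(rows):
    """Transform: normalise region names and convert amounts to numbers."""
    return [
        (row["order_id"], row["region"].strip().lower(), float(row["amount"]))
        for row in rows
    ]


def load(conn, rows):
    """Load: write the cleaned rows into a warehouse-style table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


def revenue_by_region(conn):
    """Query: a descriptive 'what happened' report, revenue per region."""
    cursor = conn.execute(
        "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
    )
    return dict(cursor.fetchall())


conn = sqlite3.connect(":memory:")  # in-memory stand-in for a data warehouse
load(conn, transform(extract()))
summary = revenue_by_region(conn)
print(summary)
```

The final query is descriptive analytics in its simplest form: it only reports what has already happened, which is exactly the limitation of this era.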

However, the main limitation observed during this era was that the potential of data was only utilised within organisations, i.e., business intelligence activities addressed only what had happened in the past and offered no predictions about future trends.

Analytics 2.0 → Big Data: The drawbacks of the previous era became more prominent by the day as companies stepped out of their comfort zone and began their pursuit of a wider (if not better) approach to attaining a sophisticated form of analytics. Customers reacted surprisingly well to this new strategy and demanded information from external sources (clickstreams, social media, internet, public initiatives etc.). The need for powerful new tools, and the opportunity to profit by providing them, quickly became apparent. Inevitably, the term ‘Big data’ was coined to distinguish it from small data, as it was not generated purely by a firm’s internal transaction systems.

What companies expected from their employees was help in engineering platforms that could handle large volumes of data with a fast processing engine. What they didn’t expect was a huge response from an emerging group of individuals, today better known as the “Open Source Community”. This was the hallmark of Analytics 2.0.

With the unprecedented backing of the community, roles like Big-Data Engineer and Hadoop Administrator grew in the job sector and were now critical to every IT organisation. Tech firms rushed to build new frameworks that were capable not only of ingesting, transforming and processing big data around eCDWs/Data Lakes but also of integrating Predictive (what is likely to happen) analytics on top of it. Predictive analytics uses the findings of descriptive and diagnostic analytics to detect tendencies, clusters and exceptions, and to predict future trends, which makes it a valuable tool for forecasting.

In today’s tech-ecosystem, I personally think the term big-data has been used, misused & abused on many occasions. So technically, ‘big data’ now really means ‘all data’, or just Data.

Analytics 3.0 → Data Enriched Offerings: The pioneering big data firms began investing in analytics to support customer-facing products, services, and features. They attracted viewers to their websites through better search algorithms, recommendations, suggestions for products to buy, and highly targeted ads, all driven by analytics rooted in enormous amounts of data. The outbreak of the Big-Data phenomenon spread like a virus, so now it’s not just tech firms and online companies that can create products and services from analysis of data. It’s practically every firm in every industry.

On the other hand, the wide acceptance of big-data technologies had a mixed impact. While the tech-savvy giants forged ahead by making more money, a majority of other enterprises & non-tech firms suffered miserably for not understanding their data. As a result, a field of study called Data Science was introduced, which used scientific methods, exploratory processes, algorithms and systems to extract knowledge and insights from data in various forms.

It is an interdisciplinary field, defined as a “concept to unify statistics, data analysis, machine learning and their related methods” in order to “understand and analyse actual phenomena” with data. In other words, well-refined data complemented by good training models would yield better prediction results. The next generation of quantitative analysts were called data scientists, who possessed both computational and analytical skills.

The tech industry exploded with the benefits of implementing Data Science techniques and leveraged the full power of predictive & prescriptive (what action to take) analytics, i.e., eliminating a future problem or taking full advantage of a promising trend. Companies began competing on analytics not only in the traditional sense, by improving internal business decisions, but also by creating more valuable products and services. This is the essence of Analytics 3.0.

There has been a paradigm shift in how analytics are used today. Companies are scaling at a speed beyond imagination, identifying disruptive services and encouraging more R&D divisions, many of which are strategic in nature. This requires a new organisational structure: positions, priorities and capabilities. A closely-knit team of data-driven roles (Data Scientists, Data Engineers, Solution Architects, Chief Analysts), when brought under the same roof, is a guaranteed recipe for achieving success.

Analytics 4.0 → Automated Capabilities:

There have always been four types of analytics: descriptive, which reports on the past; diagnostic, which uses the data of the past to study the present; predictive, which uses insights based on past data to predict the future; and prescriptive, which uses models to specify optimal behaviours and actions. Although Analytics 3.0 includes all of the above types in a broad sense, it emphasises the last. It also introduces, typically on a small scale, the idea of automated analytics.
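The four types are easier to tell apart side by side. Below is a toy sketch in plain Python, assuming made-up monthly figures; the `sales` and `spend` numbers, the function names and the reorder rule are all hypothetical, and each function uses the simplest technique that fits the label (a summary, a correlation, a linear trend, a threshold rule) rather than anything a production system would rely on.

```python
from statistics import mean

# Hypothetical monthly unit sales and marketing spend (made-up numbers).
sales = [100, 110, 120, 130, 140, 150]
spend = [10, 11, 12, 13, 14, 15]


def descriptive(values):
    """What happened: summarise the past."""
    return {"total": sum(values), "average": mean(values)}


def diagnostic(xs, ys):
    """Why it happened: Pearson correlation between a driver and the outcome."""
    mean_x, mean_y = mean(xs), mean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def predictive(values):
    """What is likely to happen: least-squares linear trend, one step ahead."""
    n = len(values)
    steps = list(range(n))
    mean_t, mean_v = mean(steps), mean(values)
    slope = sum((t - mean_t) * (v - mean_v) for t, v in zip(steps, values)) / sum(
        (t - mean_t) ** 2 for t in steps
    )
    intercept = mean_v - slope * mean_t
    return intercept + slope * n


def prescriptive(forecast, stock_on_hand):
    """What action to take: a toy reorder rule driven by the forecast."""
    shortfall = forecast - stock_on_hand
    return f"reorder {shortfall:.0f} units" if shortfall > 0 else "no reorder needed"


forecast = predictive(sales)
print(descriptive(sales))
print(round(diagnostic(spend, sales), 3))
print(forecast)
print(prescriptive(forecast, stock_on_hand=120))
```

Note how each step feeds the next: the forecast builds on the history that the descriptive step summarises, and the prescriptive rule acts on the forecast. Automating that last hand-off is the small-scale "automated analytics" idea mentioned above.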

Analytics 3.0 provides an opportunity to scale decision-making processes to industrial strength. Creating many more models through machine learning can let an organisation become much more granular and precise in its predictions. Having said that, the cost & time of deploying such customised models weren’t entirely affordable and called for a cheaper or faster approach. The need for automation through intelligent systems finally arrived, and this idea, once deemed beyond reach and looming on the horizon, is where Analytics 4.0 came into existence.

There is no doubt that the use of artificial intelligence, machine learning and deep learning is going to profoundly change knowledge work. We have already seen their innovative capabilities in the form of Neural Machine Translation, Smart Reply, chatbots, meeting assistants etc., which will be used extensively over the next couple of years. The data involved here originates from vast, heterogeneous sources and comes in many distinct types: data that requires complex training methods, and especially systems that can sustain themselves (make recommendations, improve decision-making, take appropriate actions). Employing data-mining techniques and machine learning algorithms along with the existing descriptive, predictive and prescriptive analytics comes to full fruition in this era. This is one reason why Automated Analytics is seen as the next stage in analytic maturity.

Analytics 5.0 → Future of Analytics and What’s Next?

Analytics 4.0 is filled with the promise of a utopian society run by machines and managed by peace-loving managers and technologists. We could reframe the threat of automation as an opportunity for augmentation: combining smart humans and smart machines to achieve an overall better result.

Now, instead of pondering “What tasks currently performed by humans will soon be replaced by machines?”, I’d rather optimistically ask: “What new feats could companies achieve if they had better-thinking machines to assist them?”, “How can we reduce death tolls in calamity-prone areas with improved AI evacuation routines?” or “Why can’t AI-driven e-schools be implemented in poverty-ridden zones?”

Most organisations that are exploring “cognitive” technologies (smart machines that automate aspects of decision-making processes) are just putting a toe in the water, running a pilot to explore the technology. Others are working on the concept of a Consumer-AI-Controlled platform: personal AI agents that can communicate with other AI services, or so-called bots, to get the job done. No more manual intervention, with an AI-powered framework steering your personal day-to-day activities.

I wouldn’t be surprised to see either of these technologies making giant leaps in the future. Surely, there’s an element of uncertainty tied to them but unlike many, I’m rather very optimistic about the intent. There’s always something waiting at the end of the road. If you’re not willing to see what it is, you probably shouldn’t be out there in the first place.

“Everything should be made as simple as possible, but not simpler”

Albert Einstein