I’m continually surprised by the responsibilities and titles of new roles emerging within the ‘data profession’. Admittedly, this is a fairly nebulous concept, and I suspect practitioners hold a variety of opinions as to what the composition of this space looks like. However, there are certain trends within this area that most practitioners would agree on. Data is being taken more seriously by organisations than ever before, with comparable growth in dedicated ‘data people’, investment and technology.
For the sake of convenience & readability, I would like to go over data roles briefly, categorised by the tech revolutions that brought substantial change, and especially the ones that will keep evolving in future. In addition, I recently wrote a piece on the Evolution of Analytics with Data, which provides better context for this article.
I’m an amateur blogger, so this is clearly just one perspective, and it could be a long read for drowsy eyes. A word of advice: grab a cup of coffee.
Business Intelligence Roles
Quite rightly, ‘BI’ doesn’t qualify to compete with the trendy buzzonyms around the tech ecosystem in 2018, and it isn’t pleasing to the ears of our data-savvy generation. Are ETL tools & strategies no longer in use? Is the scope of BI overshadowed by the vast application of big-data & data science methodologies? Hell, NO!
How traditional BI roles were structured in accordance with the business model of the organisation. Source: Microsoft TechNet Wiki
Business Intelligence has seen a considerable decline in the last year or two. However, I wouldn’t go so far as to call BI dead, as its application is critical to major businesses. Roles like BI Analyst, Data Architect, ETL Developer, DW Engineer and BIDW Admin will only become more crucial, with an extra eye on market-leading tools & technology rather than the jack-of-all-trades roles seen in present domains.
Scope of Business Intelligence techniques employed in 2018. Source: DepositPhotos
According to a recent Wisdom of Crowds® Business Intelligence Market Study, BI will continue to offer competitive salaries and dominate certain areas of the market. Here are some of its key numeric takeaways for 2018:
- Executive Management, Operations, & Sales: 3 areas driving BI adoption.
- Dashboards, reporting, end-user self-service, advanced visualisation, and data warehousing: 5 technologies and initiatives strategic to BI.
- Small organisations with up to 100 employees have the highest rate of BI penetration.
- 50% of vendors offer perpetual on-premises licensing and cloud subscriptions.
- Fewer than 15% of respondent organisations have a Chief Data Officer.
In case you still have a difference of opinion, I recommend you read the full post: The State of Business Intelligence, 2018
Big Data & Data Science Roles
Before we take a deep dive into the current roles, let’s take a step back to understand how and where it all started. My idea is to present these roles through a storytelling narrative rather than the traditional plain-text definitions, which are easily accessible around the Internet. Additionally, every new wave in the industry gives birth to confusing buzzwords, false renditions & surrealistic stipulations (which is a mouthful, to say the least).
The term ‘big data’ was coined to distinguish it from small data, as it was not generated purely by a firm’s transaction systems. The movement also held that predictive analytics revealed better trends than purely fact-based comprehension, helping firms go beyond intuition when making decisions. If dimensions & analytics weren’t justification enough, this phase also welcomed community-driven open-source tools over highly priced licences.
I usually refrain from citing names of tools in my posts, but it’s fairly impossible to describe this revolution without mentioning Apache Hadoop. The technology stack & extensible projects, the functional programming paradigms (scalable, concurrent & distributed systems), the rise of NoSQL DB systems, job scheduling & cluster resource management, the changing face of drag-n-drop ETL and better data modelling techniques: Hadoop brought all of these together, but it ultimately emphasised one lasting idea, that code is the best abstraction for software. It also introduced, in a broad sense, the idea of having a custom architecture ready for future integrations with Data Science & Machine Learning.
From the developers’ perspective, this meant you didn’t necessarily have to work for the tech big guns to develop new disruptive projects. You had the backing of a community at your disposal and emerging collaboration platforms like GitHub to showcase your work.
Hierarchy of roles in Big Data & Analytics-driven companies.
From an organisational view, software engineers (Java developers), DW engineers (BI/ETL developers, data architects) and infra admins (DBAs, Linux SAs) explored fancier titles as Big Data Engineers, Hadoop Developers, Hadoop Architects and Big Data Support Engineers began to flourish in the job market. BI roles fell down the pecking order, and the years in which line-of-business users and data personnel used the same tools were well and truly over.
BI roles gradually moving out of the circle of Big Data teams. Source: DataFlair
At an industrial level, it had the most impact, as it’s not just tech firms and online companies that can create products and services from big-data analytics; it’s practically every firm in the industry.
The tech industry suddenly became divided by the rising demand for combining Big Data with Data Science strategies. As such, field roles were classified into three buckets: Software Engineering (strong programming: front- & back-end engineers, web developers, infra admins, middleware specialists, iOS/Android developers), Data Engineering (strong data background: ETL developers, DWH architects, BI analysts, Hadoop engineers, DBAs) and a third set of individuals deemed the next generation of quantitative analysts (possessing both computational & analytical skills), who specialised in a growing field of study: Data Science.
Venn Diagram showing tools & techniques under SE vs DE vs DS domains. Source: Ryan Swanstrom, Data Science 101
In my view, this classification led to a significant transition, with the positives best leveraged by small-scale firms (< 50 employees) such as emerging startups and research facilities, as well as large-scale enterprises (> 1000 employees) in telecom, e-commerce, social media etc. Startups had the liberty of combining multiple roles into one and encouraging multi-disciplinary growth, while the mainstream giants had no trouble employing distinct roles across different departments, thereby opening up more areas for generating business.
Entrepreneurs running what were by now medium-sized companies (SMBs), striving for commercial recognition while competing with the big players in their respective markets, were arguably affected the most. Their initial success, whether through funding rounds or venture capital investments, allowed them to grow in numbers (50-300+ employees). They rushed into indefinite hires, redundant roles and poor decision-making strategies. Eventually, the constant pressure to stay in the market under quarterly timelines forced unprecedented lay-offs and stock-distribution losses, and even resulted in liquidation at an early stage. Some tech-savvy investors (whom I’d like to refer to as guardian angels) offered M&A assistance, but the industry saw the downside of absorbing roles for the first time.
Meanwhile, it wasn’t just companies having a hard time with evolving data roles. This era saw a rising number of data science enthusiasts (both academic & experienced) coming out of their comfort caves & expanding their skill sets. And why not? Each of these applicants (mathematicians, PhDs, analysts) had every right to apply for one of the finest-paid jobs of the 21st century. Along came esteemed university professors & philanthropists with their versions of the ideal candidate, but that didn’t stop the mob.
Titles with ‘Data’ prefixes helped make early distinctions between roles with similar lines of tasks. The intent was to identify skill coverage and harness the right potential. Data Analysts shied away from business and turned their eyes to statistics & engineering, while Data Architects kept their focus on publishing models (not to be confused with ML), database design and governance, with their trademark politically neutral attitude.
Radar chart explaining the overlap of skills between data-driven roles. Ignore "Mad Skillz", as it implies "Natural Abilities". Source: edX
Businesses started to gain more understanding by nurturing capabilities in Prescriptive Analytics with Machine Learning on their premises. They began competing on analytics not only in the traditional sense, by improving internal business decisions, but also by creating more valuable products and services. The sheer need (or greed) to attain concrete goals, namely better results than last quarter, brought a proportional overhead of roles and responsibilities. As such, a promising yet challenging position like that of a Data Scientist also came to serve as a central figure across teams: the daily go-to person for anything related to data. Not a lot has been said about the stress and fatigue of many such burdened individuals. If a person of such calibre spent the majority of their time analysing, they also managed to find time to pursue better opportunities for themselves. Here’s a satirical treat on KDnuggets supporting my claim.
Two big questions came to light. First: is Data Science the next bubble? My answer: NO, but the “Data Scientist” title was arguably becoming one. It was a textbook demand-and-supply problem, where every aspirant wanted a fair share of goods & commodities but only a few proved worthy of claiming it. Hmm, a bit confusing? Think of it this way: how do you deal with a fresh graduate applying for this role, and what do you do when your data scientist is likely to leave and you’re left with a pack of “self-proclaimed” ones knocking on your door?
Secondly, with data accessed directly from sources like websites, APIs and social media, the need for programming skills, and the prowess to apply them efficiently, couldn’t be compromised. Not all data scientists had strong software foundations, which raised the question: why were software engineering concepts ignored amidst all the buzz for Data Science? Companies soon realised that only a reallocation of roles could restore the balance, and they looked to broader engineers to heavily support their data scientists and find an equilibrium amongst the different roles.
Software engineers who appeared to have a knack for data science & machine learning stepped up to help with this dilemma and strengthened the data engineer club, while those practising core web programming & stack-driven ambitions moved on to bigger challenges: the Full-Stack Engineer.
Full-Stack by past roles (left) & by tech-stack areas (right).
A win-win situation: data scientists got a reliable sidekick and a sigh of relief (the inflated hype around their ‘crown’ lowered), along with an equally competent role on the horizon to challenge them. The collaboration not only sent those craving enthusiasts spinning but also opened another door, making data engineering one of the most sophisticated disciplines today. The modern-day Data Engineer complements every other role, is a must-have handyman in every firm and is practically the first hire in startups these days.
An infographic take on Data Engineers and Data Scientists. Source: DE vs DS, by Karlijn Willems
The gamble of balancing mutually distinct roles (a workaround play that clicked) paid off perfectly, but the tech industry knew it couldn’t afford another setback and had to be prepared for the increasing acceptance of Artificial Intelligence looming around the corner.
Inevitably, companies identified the flaws in their organisational structure (positions, priorities and capabilities) and set up Data-Driven Teams. The prime focus was on role distinctions, division of labour, avoiding task conflicts and proper rules of collaboration. An example of role-based leaders heading their respective units inside such a team would be a Principal Data Scientist & an Engineering Lead.
An early look at a well-structured Data Science team under the same roof. Source: DataCamp Blog Community
Today, a perfect data science team is a myth, or at least an engaging subject of heated debate. Companies expect their teams to assemble like a group of superheroes (The Avengers); what they fail at miserably on occasion is appointing a person who provides such teams with context (Nick Fury). This is where Chief Data Officers come into powerful existence. With data becoming integral to business strategy, the CDO is becoming a more critical role in an organisation. According to a Forbes survey, more than 50% of CDOs will likely report directly to the CEO in 2018. They’re bound to take on more active roles in shaping their businesses’ initiatives.
I often get disappointed upon seeing job descriptions containing “Advanced English Skills” or “Native candidates only”. So I proactively question (or troll) such job posters every single time (I do enjoy their apparent pause). Language shouldn’t be deemed a barrier; rather, it should be utilised as a formidable means of unifying teams. The best example in 2018 to make my stance clear is indeed a language in itself: Python. Founders (CEOs & CDOs) must pass this message down within their teams and, most importantly, to their first focal point: the Talent Requisition team.
How Python brings a team of diversified role-types together. Source: ActiveWizards
These days HR coordinators, recruiters and outsourcing head-hunters all have access to ample data resources (Medium, DataCamp) & data-friendly platforms (LinkedIn Recruiter, Glassdoor) to refine their search and improve hiring, thereby making even their roles data-driven.
Machine Learning & AI-driven Roles
Perhaps the most compelling aspect of Machine Learning is its seemingly limitless applicability. There are already so many fields being impacted by ML and now AI, including Education, Finance, and more. Machine Learning techniques are already being applied to critical areas within the Healthcare sphere, impacting everything from care variation reduction efforts to medical scan analysis.
There are a number of companies for whom their data (or their data analysis platform) is their product. In this case, the data analysis or machine learning going on can be pretty intense. This is probably the ideal situation for someone who has a formal mathematics, statistics, or physics background and is hoping to continue down a more academic path.
“Machine Learning Engineers often focus more on producing great data-driven products than they do answering operational questions for a company.”
New addition to the Data Science team working on ML. Source: Udacity
Companies have become more encouraging and are constantly on the lookout for Machine Learning Engineers: open-minded candidates ranging across all age groups (from academic interns to research scientists). The social media generation also shows far more appreciation than before, as seen on LinkedIn, Medium and GitHub.
Bird's-eye view of multiple ML roles in AI firms. Source: Udacity
AI-driven companies successfully implementing intelligent machines (like chatbots) are already a step ahead of others. Organising roles into software, applied & core is a clear indication that they’re serious about their product development & service offerings. Since there isn’t any standardisation of profile & seniority today, they’re at full liberty to improvise AI titles in the future.
There are many roles that complement data-driven teams on a day-to-day basis. They are a must-have in an organisation irrespective of the teams they belong to. You’d probably wonder why I didn’t mention them earlier. Honestly, I was skeptical for the reasons below:
- I have limited expertise on these profiles and their scope.
- They are not primarily seen under the category of data-driven roles.
- Their domain versatility allows them to operate across different teams.
Let me try to explain before the knife-wielding mob gets here.
- Graphic Designers : The Creative Heads in every sense. A complete package of art, science, programming, ideas and imagination with endless capabilities. They add value with their vocal-presence & fearless attitude. My personal favourites.
- Decision-Makers : A role often misconstrued and overlooked, especially in domain-specific startups. Before hiring that PhD-trained data scientist, make sure you have a decision-maker who understands the art & science of decision-making.
- DevOps & Site-Reliability Engineers : Broadly fall into two categories: “business capabilities teams” and “agile operations teams”. Data Architects & Engineers can coordinate with them, learn and implement tasks like cloud-based (IaaS, PaaS, SaaS) configs, containers, micro-services deployment & virtualisation. DataOps, however, is a newer approach aimed at enabling continuous data flow within the enterprise.
- Cloud Architects : Technology specialists who usually take up consulting roles (and charge by the hour, like their cloud services). Again, if your Data Engineer is familiar with cloud concepts or is a certified associate/professional, you may not need to hire one.
- Project & Delivery Managers : Some data science & analytics firms still have to bend to the old norms of Agile & Scrum methodologies. Before they start consulting clients to orchestrate sales of their products & services, they need experienced managers to ensure PoC (proof-of-concept) timelines & resources are well allocated.
- Network & Cyber Security Engineers : Often seen as internal teams but, amongst all the roles mentioned above, they will soon be an integral part of data-driven teams. With data security already raising menacing concerns in 2018, these roles have come to be recognised as “critical”, as most companies operate daily with an online presence.
Certainly on the tool front, the technology is becoming more accessible and intuitive than ever before. There is an array of adaptors, for instance, in most cleansing, modelling, reporting & visualisation tools, meaning that loading data is itself no longer a hugely significant challenge. However, this has also encouraged a somewhat ubiquitous view of data: it should just work with minimal effort. There is an ominous risk that less and less time will be dedicated to getting the fundamentals right.
Tech & Industries to watch out for in 2018-19:
- Progressive Web Apps (PWAs) – A mixture of mobile and web apps.
- Blockchain & Fintech – Metamodel building, reliable trading & credit scoring.
- Healthcare Technology – Diagnosis by Medical Imaging (Computer vision & ML).
- AR/VR – Sport Analysis, Business Cards (Image Tracking), Techno eSports (Hado).
- AI Speech Assistants, smarter Chat-bot integrations.
- Smart Supply Chain – Digital twins (IoT Sensors).
- 5G – Big data, Mobile cloud computing, scalable IoT & network virtualisation (NFV).
- 3D Printing – Prefabrication efficiency, defect detection, ML-based predictive maintenance.
- Dark Data – Information that is yet to become available in digital format.
- Quantum Computing – Cutting data processing times into fractions.
Finally, on the job front, it’s evident that roles won’t be able to keep up with the dynamics of technology. Landing that next opportunity will be difficult. According to many job advisors, there are two ways to keep your job security intact: be an expert in one domain and hold your position within a stable company, or seek challenging roles by identifying newer domains aligned with tech trends. As a Data Engineer, I follow a hybrid approach, maintaining a learning discipline between professional career & personal ambitions, which practically allows me to work in any tech-driven industry. If there’s any consolation, I surely know that I’m responsible for my successes & failures in the future.
Don’t ever let someone tell you that you can’t do something. You got a dream, you gotta protect it. People can’t do something themselves, they wanna tell you that you can’t do it. You want something, go get it. Period.
— The Pursuit of Happyness