Creating a Data Engineering Culture | Big Data Institute

So let's talk about data engineering culture. One of the things I want you to understand, as you've sat through some of these talks, is that their data engineering culture is oftentimes just implicit. It's assumed. They don't really say, "Hey, my data engineer did this for me." They just say, "Yes, this is what data engineers do." And so when I talk to new companies, companies who are just starting out in their big data journey, they don't realize that there's this implicit assumption of data engineers doing all this. So I've set out to say, here's how we create this data engineering culture, so that we start doing that and working with that.

What we're going to do is talk about what data engineering cultures are. I'll tell you a few stories so you can understand what's happening there. Then we're going to talk about how you should create your own data engineering culture, and finally, what are some common reasons for failure.

This is actually important, because there's a reason why we want to talk about failure with big data, and that is because 85% of big data projects fail. I'm going to take a wild guess from looking at your faces that you didn't know that. But here's the issue: it's not talked about. Quite frankly, the normal reason people give when a big data project fails is, "Well, the tech didn't work." Here's the thing: the tech actually works really well. Sometimes there are known limitations, there are known issues with the tech, but when I talk to the teams, they didn't hit any of them, and yet they blame the technology. So what is the issue? Why are these projects failing? That was something I set out on personally, this is my personal quixotic adventure, to go through and figure out: all right, why are these teams
failing so often? And let me tell you the opposite side. You've sat through this conference and you've seen a lot of people do some really cool things, and I completely agree: those 15 percent of projects that actually get through this do some really, really cool things. They can create some incredible value. But there's this whole issue of 85% of these projects failing, so that should really give you some pause. I can't remember if gambling is legal here, but where I'm from gambling is legal. You have better odds of putting it all on black, doing some blackjack, doing roulette wheel rolls, and getting better ROI in some cases. So what I'm going to try to do, in a pretty brief period, is share the things that you need to know so that you don't fail.

So let's talk about what that data engineering culture is. This is a culture where the value and importance (this is key: value and importance) of data engineering is recognized at all levels. "All levels" is another important part. All levels means from the executive level all the way down to the individual contributor level. If this is not recognized, for example if you have a VP of engineering, or a CxO, a CTO, who doesn't recognize that value, then when the axe comes, when they start to fire people, they'll say, "Data engineering team? Don't need it. Gone." This is part of that recognition of value. So it has to be an organization-wide realization that data science and big data require data engineering.

I'm going to create this visualization. Have you ever heard of Atlas holding up the world, or the sky, depending on which myth you read? So Atlas was there holding up the sky, or the world. What I think is that data engineering is that Atlas, holding up the data science that is the world. Most teams, most executives, are focused on that world. They're focused on: how do I get that data
science, how do I get this output? The reality is they should be focused on Atlas and saying, "How do I get Atlas?" Then he'll hold up the world, where those data scientists will be able to do this.

One of the manifestations of whether this is correct or not is the ratio of data scientists to data engineers. One wrong manifestation is basically a zero or a one-to-one ratio. One-to-one is usually an incorrect ratio, because that means that your data scientists are doing way more data engineering than they should be. Or if it's a 1-to-0 ratio, in other words there is one data scientist and zero data engineers, there's a big problem, and we'll talk about some of those problems. Generally you want to be in the 2-to-5 range: for every data scientist, you would have 2 to 5 data engineers.

Let's take a look at another visualization of this. This is what I like to share with management. And I forgot to ask: how many of you are managers in this room, out of curiosity? Got it. Oh wow, a decent number of you. How many of you are team leads? A decent number of you as well. Okay, good.

So here is a breakdown of where I see failures happen. Going from left to right, you see team creation. The vast majority of failure is going to happen before that first release, there in the middle. That's when we've said, hey, this team is probably done right, this team is probably correctly assembled, correctly staffed, etc. But up to that point, failure means that management failed to create the right team. I can't stress this enough: in early production, those 85% of teams failing, of projects failing, that was specifically due to management failures of not getting the right team. We'll talk about what the right team is in a second, but this is one I share to help people understand. It's only after you've released your first code, your second release, your third release, your nth release, that the management issues go away
and now you have a team where you back off and say, "Okay, I've created the right team. Let me back off and let them be successful." Up to that point, it is up to management to make sure that they have the right team and the right people.

So why would you want to go through that effort of creating a data engineering culture? Because teams may not actually fail; they may simply vastly underperform. In fact, some of the clients I've worked with were vastly underperforming, where in their minds they were thinking, "Hey, we're doing really cool things with data science." The reality was that they were riding their trike, as it were, their tricycle. Three wheels. They were on that little trike, going very slowly, but they were moving, when they could have been in that Ferrari. They were limited by their ability to create the data engineering needed to create the data science. What was happening at some of those companies is they would hit the wall, this wall of "this is a difficult data engineering problem," and they would just stop. That's what underperforming will mean: they'll get to the difficult part of data engineering and stop. They can get somewhat there.

So, conferences. If you've been to conferences, they don't really talk about their data engineering team. They don't have credits at the end of their conference talk saying, "And here's my data engineer, and here's my data engineering manager." They don't call it out. But sometimes they'll mention it as an aside and say, "I was helped out by the little people in the data engineering community." So it's definitely something that we need to have. It's implicit, it's assumed.

So let's talk about some brief stories here. One of my clients was actually hitting an issue where they
were stopping. So I went through, I interviewed a bunch of their data scientists, and started to ask questions. So what happens? Tell me more about what you're trying to do. And one of those was very interesting. The business thought that they were really doing well, and they weren't talking to the data scientists. When I talked to the data scientists, they said, "Oh yeah, we'd get this really cool thing that we were trying to do, but we would hit this wall of what we couldn't do. We couldn't do the data engineering anymore. The system just became too complex."

So let's unpack that a little bit: too complex for a data scientist. Quite honestly, most data scientists are beginner programmers. Don't take that personally; most of you are, frankly, beginner programmers. You want to compare that with a data engineer, who should be a senior-level programmer. Maybe junior, but mostly a senior-level software engineer. This is right down the line: this is exactly what they do on a daily basis. It is well within their ability to do those things, and that's what's really key and important here. Data scientists are not programmers. They program, they've learned to program after a fashion, they can create these data engineering things after a fashion. They are not data engineers. Conversely, you're not going to expect the data engineer to create a machine learning model. They're two different people, and this is one of the big thrusts of this talk that I want you to understand.

So when data scientists do data engineering work, it isn't just, "Hey, it's going to take me longer." It's going to be more difficult and take longer, and what's worse, they're going to quit. If you didn't know this already, they will quit. If you're losing data scientists right now, you may want to go back and see: are they leaving because they're tired of doing data engineering work? So, kind of what we were talking about with the
personas, what people are doing: a data engineer loves to do this sort of thing. This is what they've been doing throughout their career. If you start forcing data scientists to do this, they will quit. They will quit after one to six months. Maybe it's a little bit different for Europe, but in the U.S. they actually talk, and I have these conversations with data scientists. They say, "I'm tired of doing this. I give it one or two more months and I'm gone." This may be different for Europe, but this is exactly what may be happening. You just may have a lead time.

So why should you do this? It's because once you have your data science and your data engineering right, you're going to really accelerate things. This is what I've helped my clients do. Now they don't really think, "Can my data engineering do this? Can I really do this?" It's accelerated to the point where they're really working in unison, and their data engineering team, their data infrastructure, is no longer the bottleneck. Here is what a bottleneck means. Maybe those of you who are data scientists in the room are sitting there thinking, "I'd love to do X," but then you stop in your mind, thinking, "But that will be difficult." And it's not the machine learning, it's not the neural network, that's making it difficult. It's thinking, "Well, how do I even do that in Spark? How do I get that data right? How do I do this?" That's what we as data engineers should be removing. This is the symbiotic relationship between data scientists and data engineers.

So how would we go about creating this data engineering culture? So, data engineer. I'm not sure how many of you read O'Reilly's data blog. I write for my own blog, and I also write a lot for O'Reilly's data blog, and in there I've set out to create definitions. These definitions aren't to put people in boxes;
it's much more to give a definition for enterprises, for larger organizations, to say: here is what a data engineer is, here is what a data scientist is. So they can understand: this is a data scientist, I'm not going to put them in front of the keyboard to start programming.

My one-sentence definition of a data engineer is that a data engineer is someone who has specialized their skills in creating software around big data. They're a programmer; basically they've focused in on big data for their software skills. Data scientists are doing applied mathematics. And then there's another one: DBA. I group a lot of titles that use SQL specifically into the term DBA. Those could be ETL developers, those could be SQL developers. You may have heard several different titles; I group them all together. And there's an importance to this, because sometimes managers will think that the data warehousing team is your data engineering team. That is wrong. They are two very different skill sets.

Let's unpack that a little bit. A DBA does not know how to program; they can only write SQL. That's important because you cannot, in my opinion, create data pipelines solely with SQL. Count yourself lucky if you can. I have not hit very many people doing that; it is very, very rare. You heard them talking about SQL on Flink yesterday, and you know that there are other ones. The key here is that a data engineer will be able to switch between them. They will know that something is better with SQL versus something is better written in code, and they can choose between the two. If they only know SQL, they will use the worse tool for the job when something else is better. Very, very important there.

So what sorts of skills are needed? This is just a laundry list. I've written a book about this, and you'll see a link in a second, but on
every single data engineering team there should be a person who understands distributed systems, and a person who understands programming. Analysis: this is not the level that a data scientist needs. This is analysis at the level of perhaps doing counts, doing some pretty rudimentary analysis. Usually it's so they can keep track of what's happening, keep some understanding of the data. There's visual communication: they need to be able to visually communicate what the data is saying. They also need verbal communication. This one's important because I need to work with you as a data scientist. I see a lot of data scientists in the room. Remember that symbiotic relationship I was talking about? If the two can't become symbiotic because they can't communicate, that's a problem. So your data engineers and your data scientists will need to have very good verbal communication skills between the two of them. Until you do that, you will have that disconnect, and that disconnect won't ever go away. You'll wonder why; it's because you don't have good verbal communication between them.

There are a few other ones here that are actually kind of weird. A project veteran: you heard me talk about how data scientists are new programmers. Well, a lot of people are new within data engineering. They don't have any specific production experience in getting these systems out. So what's core and key and important there is that you need somebody on your team to call out and say, "Hey, you are going to hit a wall at this," and "Hey, you are going to do this a year from now." Only a veteran on your team can tell you that. Some of the worst designs I've seen are from people very new to big data, and they will cause problems. And these problems aren't just, "Okay team, look, we're going to have to spend a week and pay down some technical debt." I've worked with teams where it's been a year. They had dug such a hole for themselves
that they had to spend an entire year to get themselves out of it. So it's significantly better to avoid that sort of issue. Finally, there's schema. Schema is very, very important. You need to lay your data out correctly, in a way that supports schema evolution; we'll talk about this in a second. And we also need domain knowledge. For some of your teams, some of your companies, it may not be as important, but if you're in something like finance, let's say, it's very key and important to understand that domain. Until your data engineering team understands the domain, they won't be able to create the data infrastructure correctly.

So now let's talk about some common reasons for failure. I'll have to go through these relatively quickly. One of the biggest reasons for failure, and I apologize if any of you, here or watching this, are a DBA yourself or identify as a DBA: if you have a team that is all DBAs, and you're calling yourself a data engineering team and doing big data, you're unfortunately incorrect, I should say. The issue there is that you more than likely will fail with big data, and I have the data and experience to back this up. If you only have SQL behind you, only a SQL ability, you will fail in your task of trying to do this.

So, a few things. There are minified links there. One of those is on an ability gap. The issue is that, in my work and research with teams, it is not simply a matter of time or effort for a DBA to learn big data and create these data engineering systems. It is a nigh-on impossible sort of thing. It is an ability gap; it's not a skills gap. So please do think of that. The data warehouse team is not your big data team; they need to be software engineers.

Another common one is that they're set up for failure. What happens there is that the
company will be circling the drain. What that means is they're about to go under, they're about to go bankrupt, and they'll say, "Hey, I heard about that big data thing. It's going to save the company." I've hit that before. I've had ones where the VP, the CxO, the CTO says, "Oh, big data, that's just magically going to make us more revenue." Well, maybe. But having these unrealistic expectations and desires sets a team up for failure. Unrealistic time frames are also incredibly terrible. One thing to know is that it takes up to, and possibly even more than, six months for a team to just feel comfortable with big data. This is key and important. If you are embarking on a big data project and you're saying, "We're going to be a hundred percent proficient from day one," I'm sorry, you're not going to be. I've taught way too many teams. And you may think, "But I'm smart." I've taught a lot of smart people. This isn't simply an issue of being smart; it is the sheer complexity. On that link up there about complexity: I wrote a post for O'Reilly called "On complexity and big data." In it I argue that big data is ten times more complex than small data. Please do understand and internalize that.

The next one is that no one understands schema. You remember how I was talking about all DBAs being a recipe for failure? Conversely, having no DBAs, or no one who understands schema, is another type of failure. And it's a failure you're not going to hit from day one; it's one you actually hit later on in that cycle. The person who understands schema on a team is usually not one of the software engineers. When I teach and work with a team, I have a very consistent question that I ask them, and generally it is a question that is only answered by DBAs. The thing about that is that the DBAs have spent their entire careers being the person who is there for schema. They know the schema, they know
what things should look like, and they know how things should be laid out. So in a very clear sense, it's important to have somebody who understands schema on the team, because they are going to fight for certain things that a software engineer won't, and for that matter a data scientist won't. It's kind of a weird thing, but you need, in my opinion, at least one person who understands schema. That may be a software engineer, but in my experience it's mostly a DBA.

One thing I want to be clear on is that a data engineering team is actually multidisciplinary. It is not just a group of people with the title of data engineer. It may be a group of mostly data engineers, but also maybe a DBA or two, and maybe a front-end or visualization developer.

So, veterans. This is super important: you need to have a veteran. When I've taught at a lot of companies, sometimes those companies will be trying to leverage a bunch of junior engineers. They couldn't find senior people, so they'll get a gaggle of junior engineers and put them all on that team. The issue with junior engineers is the naivete. They are going to do something stupid, and something stupid in big data doesn't mean it performs poorly; it means that it performs terribly, and you're going to have to spend months digging yourselves out. I've worked with teams where one of the junior or mid-level engineers will come up to me and say, "Hey, what do you think of this design?" and within 10 minutes I can save them a month. It's that important; the designs can be that bad. So it's very key and very important that you get some kind of veteran skill on your team.

And finally, too ambitious. How many of you have a project where you're going from zero to big data? The issue there is that you can't really do that. I've talked with teams where they say, "Here's my proposed
architecture," and what they did is they took some conference talk from somebody who said, "Here's our architecture," and they said, "All right everybody, this is our architecture. This is what we're going to do." The issue is that they're not telling you some things from the stage. You didn't raise your hand and ask, "How many iterations did it take you to get to that architecture? How many years did it take you to get to what you're showing up on the screen?" Conference talks do not show that. They don't talk about the grueling path that they took to get there. And so what will happen is middle managers, sometimes architects, will take that and say, "Boom, I can go from zero to this. They did it, right? It is possible." So the real key issue for you, as you look at these talks, is that the architecture they showed you wasn't version 1. It's version 3, it's version 5. Depending on how old the company is, they have actually iterated on that architecture several different times. So you can get to that, but you can't go from nothing to that.

This is how I see teams really fail: they say, "Here's our architecture," and they go through the company and show it to everybody, the VPs, and everybody buys in. And then a month later, two months later, six months later, the VP calls and says, "Hey, I want a demo." And they say, "I can't demo this. The demo is another two years out. Look, you signed off on the Gantt chart." He says, "No, I want to see what you're doing." So when I work with teams, I recommend that they do this in much smaller chunks. I believe that there's a velocity to a data engineering team. I believe that a team has to gain velocity, and they do that in smaller chunks. So you don't do "Here's an entire data platform, all ready to go." It's piecing off and creating that data platform, kind of like what those conference talks are telling you. So, very key. If this sounds like your team, you will
need to take an honest look at your team. This is important. Honest looks are actually really difficult to do. This is something I can do as an outsider: I can look at a team and say, "Boom, boom, boom, you need to do this, you need to do that." Internally, sometimes that's difficult, because it may be that they're your friends, you've known them, or maybe there's some other reason, some kind of political reason. When I come in, I'm able to say, "Hey, you need to make these changes, and if you make these changes, you'll be so much better off." You need to check and make sure your teams don't have a skill gap, or worse yet, an ability gap. The issue with ability gaps is that they're usually not known; they're lurking underneath the covers. But you do need to know, and you need to be watching for and actually dealing with these ability gaps.

So this is the link to my book. I wrote a book about data engineering teams; that's the minified URL to get there. It'll explain some of the things I talked about in this talk: what should a data engineering team look like, and how should the data engineering team actually work with a data science team? These are the sorts of things that you can do, and it has to be multidisciplinary.

Do make sure that you get help. This is an important thing, and I think there's a big difference here between Europe and the U.S. I deal a lot with both; I deal with companies around the world. What I've found is that in the U.S.
showing any sort of need for help means that you're weak and that you don't know what you're doing. I've found that Europe is a bit more open to reaching out, that they do realize their limits and they will ask for help. So what I would highly suggest is that if any of this rings true, if any of this does sound like your team, you get help early. The reason is the ROI, the return on investment. If you do this wrong, the wrongness doesn't cost you a week or two like it does with small data. Doing this wrong will cost you six months, a year. And when you have that sort of cost, sometimes that will cost people their jobs, and that's really something I want people to avoid. That's why I set out personally on this journey of educating people on this: because I saw these failures and I'm tired of seeing failures. I am sharing with you all the research I've done because I really want to see you avoid this. Please do avail yourself of these things that I've created so that you can avoid it. I would love for every single one of you to be in that 15% of successes. Let's push up that number of successes, because that's pretty bad. I know all of us in this room don't want to see 85% failures, because that means that our industry isn't going to grow, that those companies are going to can it, or they're going to fire that team, or they're not going to invest in this because they're not seeing value. That's a key issue that I've been dealing with personally.

So when should you fix these problems? If this sounds like your team, if you're a train heading toward that brick wall: you know what the cheapest time to figure out that you're doing something wrong is? It's up on a whiteboard, before you've done anything. But some of you may have already whiteboarded. Some of you
may have already put months into the code. If you are headed toward that brick wall, it's never too late to fix it. It just costs more and more the further you go down that track, because then you're going to have to bring back even more and fix even more. There's even higher technical debt there.

So with those happy thoughts, let's open it up for questions. I know you have some questions; I saw some shaking heads. Okay, go ahead. Yeah, he's got the mic right there. All right, you got the microphone too? Oh, poor fella.

So at some point you mentioned that there were some skills that teams need. Could we consider having those skills in different people, or is there a subset of skills that everyone should have as an engineer?

That's a really good question. In my book, I go through what's called a gap analysis. Those of you who are data scientists are shaking your heads: "Oh yeah, gap analysis." Those of us on the data engineering side are like, "Gap analysis? What's that?" And it's basically his question. No single person will have all of these skills. Let me backtrack to that slide, in case you don't remember that long list. No one person will probably have all those skills. You've heard of unicorns; in the Bay Area and Silicon Valley we call them flying unicorns, where they're even more difficult to find, the ones who have all those skills. So the direct answer to his question is that all of these have to appear in a team, not necessarily in a single individual. What you'll do, and what I walk you through in the book, is a skills gap analysis: listing out your people and seeing what skills are missing. For certain skills, it's "Hey, we can get by on that," and there may be other skills where it's "Stop. We need to stop right now. We need to find that skill." Two of those on a data engineering team are the top two there. If you don't have distributed systems and you don't have
programming, that is the time to really evaluate and stop. That is a hard stop, end of story. I have the research on this. Other skills are going to hit you later on. For example, that lack of a veteran is really going to hit you not on that first release, but on that second, that third release, because now you're going to be iterating on something that's already in production, and that veteran has to be there to stop you from doing something stupid. Did that answer your question? Very good. A very good question, thank you.

I have a question here.

Yep.

So I'm the CTO of a startup that recently moved from small data into big data, and I'm currently suffering the consequences of not having data engineers, exactly as you said. I really liked the speech; I really identified with it. My question is about the DBA and the schema, because I think it's kind of a blind spot. Currently in my team there are basically engineers, we already hired a couple, and they act as DBAs.

And so the people that you've had so far, are they DBAs or not?

No, they're engineers acting as DBAs so far.

Okay.

And my question is: could you expand on what exactly you mean by a schema? I mean, I know what a database schema is, but I don't know if I exactly understood what you mean by schema. And could you also talk about the workflow around that schema: who should define the schema, who should maintain it, what happens when it changes, etc.?

So, another good question. I just want to unpack one thing that he said, and thank you for mentioning it. He was saying, "I'm moving from small data to big data, and I'm hitting these issues that you talked about." This would have saved you a decent amount of time, I'm sure. So thank you for calling that out.

So let's talk about schema. Maybe you heard the question I asked Wes. That question I asked Wes about Arrow was about schema evolution, and that is a question that I would expect a person who
is handling the schema part of your team to be asking. So as you self-evaluate: were you asking that question, or weren't you? It's not something bad about you; it just tells you whether you have that schema skill or not. Because you're thinking: okay, you've laid that data down; is that data going to support schema evolution? That's a really key, important thing. Then you heard the back-and-forth between Wes and me, where he was saying this is all intermediate data, it's temporary data, this isn't long-term storage. And so that's what that schema person is doing. They're thinking, "I'm going to lay down a petabyte of data, and I can't go back through and rewrite that petabyte of data every single time we make a schema change. We need something like Apache Avro to handle that."

Then there's the second part of your question: how is this sort of schema evolution handled? It's oftentimes handled as a business process. Oftentimes it's that one person being the schema Nazi, as it were. They are there to make sure that the developers, the software engineers (I am a software engineer, and we do stupid stuff sometimes) don't do something stupid with the schema. That you don't take a floating point and make it into an integer, for example. That's something that schema evolution can't handle. Or better yet, they would have prevented you from doing that in the first place. They would have said, "Okay, this is the sort of data that you're going to be laying down. I'm not going to allow you to do that." It's kind of what your DBA is doing now. You know how you go to them now and say, "I want to make this schema change to this table," and they may fight you, and they may say, "No, you shouldn't do that," and there's that back and forth? That's the back and forth that you should be getting at your
startup, or frankly any company: that kind of negotiation, that pushback to say, are you using the right types? Are you doing this for the right reasons? Are you doing this, this, and this?

The other thing that schema person needs to know is the actual byte-level representation of this. That's because as we create more and more complicated data pipelines, those pipelines aren't just going to be a bunch of files on disk. They're going to be real time: here's the real-time representation of this data in Kafka, for example, moving in real time, and then we go into HDFS for long-term storage, or we go into S3 for long-term storage. We need to have that same schema throughout.

Another common thing here is the unit tests. You'll want to make sure that you have unit test coverage of the full integration of your schema, starting from the very first supported version all the way to the current version: can we do schema evolution backwards and forwards? So hopefully that's enough to give you a good handle.

I saw another hand over here somewhere. The microphone's behind you. [inaudible] Okay, good. All right, looks like we've got a question over here, stage left.

Hi. So you've covered a large portion of building pipelines. I'm wondering about running workflows and the role of DevOps in all this. Do you envision a DevOps part of the platform team to help them run their pipelines, or is there a separate organization that takes care of that part? Can you say a bit about it?

Another really good question. To repeat his question: is data engineering DevOps? This is another very common question. In my opinion, data engineering is separate from your DevOps team. That actually goes to another question I asked this morning, where the person was talking about how they were spinning up pipelines, and I asked them, are you a DevOps team? It was to clarify some of
the research I'm doing there. I just wrote another post for O'Reilly talking about this. I don't want to put my software engineers in charge of maintaining these production systems, because software engineers are like a bull in a china shop: they will break things. We're used to, here, I'm going to play in my local database and I'm just going to blow it away, and it's like, no, no, no, stop, because that's the production database. This is not what software engineers are usually good at. So yes, you might call it data DevOps, you might call it DataOps. I think the one big difference between DataOps and DevOps is an understanding of data. The operations team needs to know about the processes, they need to know about the issues and how to stand up a cluster, that sort of thing. But the reason I think DataOps should be separate is because now there are issues of data. Is the problem due to a process failing? Is the problem due to a disk failing? Or is the problem due to bad data? The DataOps team ideally will be able to identify that, because that's key. Otherwise your data engineering team will constantly be getting paged for things that aren't really a problem, and it will just drag their productivity down too much. So we need to have this DataOps team that says, oh yes, this is a data issue, let me handle this. And the post is up on O'Reilly; it was a sponsored post, but it's up there now.

He's going for round two. Yeah?

Hi. So everybody's data-driven now... I'm here.

Yeah, okay, I see you.

Everybody is data-driven now, and we have data teams everywhere doing everything, and sometimes the team is a bit far from the main company business, and developers feel a bit frustrated by what they are doing and why they're doing it. So what would you suggest doing in this case, how to boost motivation or something like that?

So let me restate your question; you tell me if this is the
right question. You're asking me, should a data engineering team be located with the business unit or be a separate IT org? Is that about your question?

Yeah, I mean, we apply data techniques in every part of the company, for, I don't know, everything.

Depending on the size of the company, I would say there are a couple of different routes I've worked with teams on. One is to have a centralized data engineering team that's more consultative. I talk about that in my book: whenever a team wants to deal with something data related, they'll come in and say, hey, I have this project, and the data engineering team will act kind of like a consulting arm and help them create that. There's another part to that, which you're mentioning, of the teams being too isolated, too all over the place. There it becomes a hub-and-spoke model, where there's a data engineer or data engineers located in the business unit to understand that domain, and so we have that domain knowledge. Maybe what you're seeing is a manifestation of a lack of domain knowledge, or perhaps even an uninteresting domain. Let's just put it out there: sometimes these jobs are boring. Hey, let me ETL that for you. I'd rather have something more interesting than that. So perhaps locating them within the business unit, where they then have a strong relationship back to that hub of the data engineering team; that's another route that I've worked with teams on. If you want to follow up on that, talk to me at office hours.

Time for one more? Yeah, one more, if I can ask it, and the rest we can take to office hours. No, we can't let you ask that. What do you think, should he ask? Go for it.

So with the proliferation of SaaS tools and hosted data infrastructure, right, I think of something like Google Cloud Dataflow, which simplifies a lot of things and abstracts away a lot of the previously necessary infrastructure, do you
think that's changing the skills and the culture that are needed inside certain teams, where they don't have to have such a depth of data engineering experience? Do you see a trend there?

I don't see this going down, and this is actually an interesting question. I'll answer it with my general theory and then I'll talk a little bit about it. I don't believe that a general-purpose big data system can be made simple or easy. It can be made easier, but it cannot be made simple. I think that only specific use cases in very specific industries can have a purpose-built system that can be easier. But when you're dealing with these levels of complexity, I don't think it's bringing it down. I think it has, maybe just a little bit, but it's not an appreciable amount where we can say, hey, I can hire Johnny off the street, front-end developer, and let's get him up on this big data stuff. I guess the ideal I'd like to see is an eventuality where we don't differentiate between big data and small data, kind of what Wes was saying: we have pandas for small data, and we have this other thing for big data, we have PySpark. I'd like to see that, but I don't think that the bar will be lowered. To put it a different way, more of a business way: that's part of the reason why I started my business specifically in big data. I see the barrier to entry being pretty high; it's pretty difficult to get to this level, and when people try to get to that level and aren't at that level, it's very, very apparent, and you see it all over the place.
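
The unit-test advice from the schema answer earlier in this Q&A (cover every schema version from the first supported one to the current one, reading both backwards and forwards) can be sketched in Python. This is a minimal illustration under stated assumptions, not the speaker's code and not the Apache Avro library API: the dictionary schema format, the function names, and the `V1`/`V2`/`BAD_V3` schemas are all hypothetical. Only the type-promotion table is taken from the Avro specification's schema resolution rules, which is why turning a float into an int is rejected.

```python
# Simplified stand-in for Avro-style schema resolution (NOT the Avro library).
# Writer type -> reader types it may be promoted to, per the Avro specification.
PROMOTIONS = {
    "int": {"long", "float", "double"},
    "long": {"float", "double"},
    "float": {"double"},
    "string": {"bytes"},
    "bytes": {"string"},
}


def type_readable(writer_type, reader_type):
    """True if a value written as writer_type can be read as reader_type."""
    return writer_type == reader_type or reader_type in PROMOTIONS.get(writer_type, set())


def can_read(writer, reader):
    """True if data written with `writer` can be read using `reader`.

    Schemas here are hypothetical dicts: field name -> {"type": ..., "default": ...}.
    A reader field missing from the writer is only acceptable if it has a default.
    """
    for name, spec in reader.items():
        if name in writer:
            if not type_readable(writer[name]["type"], spec["type"]):
                return False
        elif "default" not in spec:
            return False
    return True


def check_all_versions(versions):
    """Check every writer/reader pair; return the (writer, reader) indexes that fail."""
    failures = []
    for w, writer in enumerate(versions):
        for r, reader in enumerate(versions):
            if not can_read(writer, reader):
                failures.append((w, r))
    return failures


# Hypothetical schema history, oldest first.
V1 = {"user_id": {"type": "long"}, "amount": {"type": "float"}}
V2 = {**V1, "currency": {"type": "string", "default": "USD"}}
# The change the speaker warns about: a floating point turned into an integer.
BAD_V3 = {**V2, "amount": {"type": "int"}}

print(check_all_versions([V1, V2]))          # [] : compatible in both directions
print(check_all_versions([V1, V2, BAD_V3]))  # [(0, 2), (1, 2)]
```

In this sketch the bad version can still be read by the older schemas, because an int promotes to a float, but it cannot read the data the older versions already laid down. That is the petabyte-rewrite situation described above, and a test like this would flag it before the change ships.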
Info
Channel: Data Council
Views: 6,636
Rating: 4.9006209 out of 5
Keywords: data engineering, data engineering culture, jesse anderson big data, jesse anderson bdi, big data project management, big data culture
Id: VkeleGIUSM8
Length: 44min 36sec (2676 seconds)
Published: Mon Oct 15 2018