What is Data Integration? Core Concepts, Best Practices, Common Terminology

Hi, I'm Brendan Peterson, one of Scribe's technical evangelists. Today we're going to talk about integration, but not in specifically Scribe terms. What we're going to do is provide a basic overview of integration: some of the core concepts you have to understand and some of the pitfalls that might exist out there. This is going to be a baseline introductory course that hopefully others will build off of.

So why are we here? What are we doing here today? This really is providing that baseline education around integration. These are concepts you basically need to get before you can start taking on integration projects. This is going to be things like common terms, since people have ten names for the same thing. We're going to try to clear some of that up and lay the groundwork for how you build these integration projects. Again, this is a precursor to some other training we're going to be doing down the line, but it should also give you some footing if this is your first time approaching integration. It's a very different beast than just a development project, than developing your own products. It's got a lot of different components, so understanding some of this at the basic level is going to be really important for you to be successful in your first, second, and subsequent projects.

The topics we're going to cover here: I'm not going to read them off the screen, but basically we're going to start with an overview of what integration is, then go down into your databases, fields, and objects, all the aspects of data you need to understand. Then we're going to move into more of the concepts you have to understand around integration: how do I share data back and forth, when do I want to do that, what is the right time and the right place? We're going to try to cover all that at a basic level, and then again we're going to be building off of these concepts in future video trainings.

So, integration
overview. Really what you need to think about is the phases of integration. This is your core component: integration is not one concept. You're going to go through planning, you're going to do your development work, you're going to do all your stitching together, and you're going to make timetables, so you have to hit deliverables for each one of those. A lot goes into this.

The planning is probably the most important. What do I need to integrate? I know I have to integrate a CRM and an ERP application, but how exactly do I do that? When do I have to do that? What objects have to talk to one another? You start to plan those out into the different phases: when do you architect it, when do you implement it. Installing the software and creating the mappings is one part of it, but then you have to go through testing and maintainability. This is where you need to make sure this thing is not so unwieldy that other people can't come in and manage it once you're gone or once you pass it off to a customer. It has to be something that doesn't take a lot of care and feeding. If an integration breaks down day to day, it's not worth anything.

The deliverables you're going to have for integration start with your initial synchronization. If you have a customer, or you yourselves are buying a new system, you've got to load data in there to get started. That initial sync is very critical towards (a) getting it right, (b) understanding how to do it, and (c) driving adoption of those new systems. By front-loading the system with data, you don't have people manually putting it in time after time; it's going to be there and ready for them when they get started.

Testing is critical. I can't tell you how many times in my experience at Scribe I've seen issues where, if you had simply tested with some more real-life scenarios, it would have made your life much easier. If you have 10 million records and you've got a process moving a thousand back and
forth a couple of times, that's not cutting it. You've really got to start doing some load testing, performance testing, and testing of the data integrity as it comes across.

Rerunning and go-live: being able to rerun this integration without having to start from scratch every time is something we're going to talk about in some of our migration techniques. Basically, if you have a problem midway through, or you're a little ways away from the go-live, you want to be able to front-load the system with some data, work with it, and then keep running the integration to bring in the deltas every time. You want to have that rerun tolerance, because when you finally go live it doesn't always make sense to move the entire bucket of data over overnight, or over the day, or the one-day weekend, whatever it might be. If you can do a lot of that over time leading up to it, it makes that final go-live much easier. It's more of a cutover than a go-live.

All right, so the first thing we've got to understand is databases. There are two big types that we see all the time: normalized and denormalized. Normalized databases are broken into smaller, bite-sized chunks. If you think about a contact who has an address, those in a normalized database are going to be two different objects. You're going to have some contact data over here and some address data over there. They're really fast to create data in because the transactions are small: you've got a couple of different attributes around the data you're adding, and you're not putting a gigantic slew of data into one object. So a normalized database is compartmentalized. We'll talk about keys and some of the relationships, but each of these tables has different relationships to the others, so it's really easy to add small bites of data here, there, everywhere.

A denormalized database is going to be more like a big text file. This is
where you've got a lot of data on one particular row. My first section might be my account data, my second section contact information, and the third section of that same row address information. It can get to the point where it's a pretty large record. It's one record for the person, but it's got a ton of information in there, and it's not all compartmentalized like in a normalized database. These are usually slower on writes, just because it's a big chunk of data we've got to pass in, but much faster on reads. You see a lot of this in marketing systems and some BI tools especially, where you want to pull data out really quickly and get a lot of it in one snapshot, while loading is more of an ongoing process; you're not trying to move in billions of transactions immediately.

Common terms: "database" is really the most common term, but CSV files are in themselves a database; a file is a system of record for a bunch of rows of data. Then there are schemas, which is more of a MySQL term. You can also think about that with XML: XML files have a schema, which is a definition of what you can put in there, so that's kind of like a database as well. And APIs can kind of be databases. If you have an API that you want to integrate with, that is how you get access to data and how you feed data in. A database is going to be what holds the data in the background, but the API might be your interface to that database.

Objects. When we're talking about some of these integration topics, I want to boil this down to the really basic: an object is the thing that contains data. You can see on the slide I've got some icons. It's cutesy, but they're not tables. "Table" is going to be another word for this, but tables, objects, views, methods: it's something that describes a particular object. So I talked
about contacts: that contact is an object. It might be a table in a database, it might be an object in an API, or it might be a method I'm calling to pull data out of somewhere, but it gives me contact data. It's a container for that type of record. Again, in a denormalized database you might have one big object that holds a lot of different types of data. In a normalized database you're going to see a lot more of these objects, because you're going to have accounts, contacts, addresses, leads, and activities, and they're all going to be different objects that contain that data. Whatever they call it, it's going to be that low-level container of data and the schema around what it looks like.

With the applications you're integrating with, most will have some set of standard tables and objects that come out of the box. They're always going to have the same names, same look, same feel, same setup. We call those standard objects, and they're pretty regular. A lot of applications also give you the ability to have a custom object, which is something the user or the designer of that application defines. If I go into something like Salesforce.com or Dynamics CRM, I can say: I've got contacts, leads, and accounts, but what I really want is something for my guests, because I own a hotel. I can create an object that has the attributes of the guests that I care about and have it as a custom object.

For those of you who are customers, you can create these and integrate with them, and it's no big deal. This matters more when you're developing integrations as a partner or as an ISV, where you're going to be distributing these solutions out over the marketplace. You need to think about what objects you're integrating with, because if you start building custom objects into your standard design, you have to have some mechanism to get the objects in there up front. Otherwise you run into two
problems. One, they're just not there, so when you try to use an object that doesn't exist you'll get all sorts of feedback from the integration tools. The other is that if you're relying on a user, giving them a sheet and saying, "Here's the object name, and it looks like this," that adds a point of failure. I could go in and fat-finger some things, or put spaces where you put underscores, and that can break the integration tools you're going to be using. So it's important to understand that the standard objects come with the applications: they're always there, they always look the same, and you can use them over and over again. Custom objects are the wild, wild west of objects. They could look like anything and be used for anything, so you've got to be willing to maneuver in there, and that's where the customizability of some of these solutions comes in really handy.

Again, the common terms. A table, whether a SQL table or a MySQL table, is going to be a database table. Views are just another way to abstract tables inside a database, but a view is really going to give you just a set of rows and columns of data. Methods and objects are where you get into dealing with APIs. If you think about an Excel sheet, you get a bunch of rows and columns: that's a table. No matter how many rows and how many columns, that is going to be your table, your object that you're dealing with, something that has a definition and holds data.

Next is fields. Again, kind of cute images, but you've got a couple of different definitions there. It's not a baseball field, it's not a soccer field. Fields are going to be attributes of that object. If I have a contact object, the fields might be first name, last name, email, city, state, zip, date of birth, whatever. A field is an attribute describing the record that you're
holding within the object, the table, the view. Just like objects, there are standard fields and custom fields. A lot of applications give you the ability to customize them: again Salesforce, Marketo, HubSpot. You can build in your own custom fields to have additional attributes. At Scribe, we're a partner-driven organization, so we've got a lot of fields in our CRM system that are geared right to that: what is their area of expertise, who is the partner of record for a customer. We've added those to our CRM system so that we can better track the attributes around those individuals.

So again, if you're designing integrations, standard fields are always going to be there. If I'm dealing with GP 10, it's got the same set of fields no matter what. Every time I go in there, same set. I can integrate to those and be confident that I could deploy that to a new customer and it would just work, no big deal. With custom fields you have to be more careful. You can have people add them, but you run the risk of fat-fingering. If you can have them imported automatically, it's even better, but you may not always have that capability. So if you're designing solutions to be distributed, be wary of custom objects and custom fields.

Fields have data types. If I've got my date of birth, I want to cast that as a date-time; it's a date value. If I've got the annual revenue of a company, I want that to be a float or an integer. I want to do this not so much because it changes how the data is stored. If I define everything as just text, text holds everything, so I can put whatever I want in there. But for reporting, for dashboards, and for some task and workflow work, I might have to have that data type defined. So it's important to know that these things have a lot of different data types. There are SQL data types, and APIs are a little more generous with
how they do that. They'll give you a data type, but it's usually pretty specific: string, string of length x, string of length y. So be aware that those fields might have different values that they can accept and work with.

Common terms: "fields" is again really one of the most common, along with columns, header values, parameters of an API call, parameters of a SOAP API, web services, whatever. Basically it boils down to this: a field is some kind of attribute of the data you're describing, holding, and working with.

A record. A record is a row of data. It's not a vinyl, it's not the record button. This is "record" the noun, not "record" the verb. Records are really what you're dealing with at the baseline. A table or an object has fields, or attributes, and that combination of fields is a record, which defines a person, a place, a thing, whatever that data is.

Different types: normalized and denormalized. As you can see in the image here, this is a straight account, contact, or lead record, with a couple of attributes around it. If I'm dealing with a normalized database, my account record might be pulling in records from other tables and objects to put together one bigger image of that record. To me it's one particular record, one transactional record that I care about, but it might be pulling pieces and parts from other locations. So it's very important to know which type of database you're dealing with. A denormalized database has all that stuff in one big record, exactly what this image shows with the box around the record: it would have everything I care about in that one line. When I deal with normalized databases, though, I have to do joins and pull data together. It still is one atomic transaction: it's one account, it's one contact. When I'm thinking about my integration, I don't really care that it comes from lots of different places. When you're at this level, you don't care.
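The point about joins can be illustrated with a minimal sketch. It assumes a hypothetical three-table schema (account, contact, address) in an in-memory SQLite database; the table names, column names, function names, and sample values are illustrative assumptions, not the layout of any particular CRM or ERP. The join pulls the normalized pieces together into the one logical contact record the integration cares about.

```python
import sqlite3


def build_normalized_db():
    """Create a tiny normalized schema (hypothetical) with one contact."""
    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row  # rows can be read by column name
    conn.executescript(
        """
        CREATE TABLE account (
            account_id INTEGER PRIMARY KEY,
            name       TEXT
        );
        CREATE TABLE contact (
            contact_id INTEGER PRIMARY KEY,
            account_id INTEGER REFERENCES account(account_id),  -- foreign key
            first_name TEXT,
            last_name  TEXT,
            email      TEXT
        );
        CREATE TABLE address (
            address_id INTEGER PRIMARY KEY,
            contact_id INTEGER REFERENCES contact(contact_id),  -- foreign key
            street     TEXT,
            city       TEXT
        );
        INSERT INTO account VALUES (1, 'Example Software');
        INSERT INTO contact VALUES (10, 1, 'Brendan', 'Peterson', 'brendan@example.com');
        INSERT INTO address VALUES (100, 10, '1715 Elm Street', 'Manchester');
        """
    )
    return conn


def fetch_contact_record(conn, contact_id):
    """Join the normalized tables into one logical contact record."""
    row = conn.execute(
        """
        SELECT c.first_name, c.last_name, c.email,
               a.name AS account_name,
               ad.street, ad.city
        FROM contact AS c
        JOIN account AS a ON a.account_id = c.account_id
        LEFT JOIN address AS ad ON ad.contact_id = c.contact_id
        WHERE c.contact_id = ?
        """,
        (contact_id,),
    ).fetchone()
    return dict(row) if row is not None else None


if __name__ == "__main__":
    conn = build_normalized_db()
    print(fetch_contact_record(conn, 10))
```

In a denormalized store the same information would already sit on a single row, so no join would be needed; the trade-off is larger writes in exchange for faster reads.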
Don't think about it, don't worry about it; think about it later on. You want to know exactly what type of data you want to move first, and then worry about the tactical side a little later in the scoping process.

Common terms: records, rows. Again, it's an account, it's a thing. "Record" and "row" are the most common that we run across, but basically this is the thing you're working with. This is the contact record for Brendan, this is the account for Scribe, this is the address of 1715 Elm Street in Manchester. Whatever it is, it is that atomic transaction, that thing you're dealing with.

Lastly, among the database terms you've got to know are keys. Keys are key: they are crucial for integration. We mainly see two types of keys, primary and foreign. A primary key is how I uniquely identify the record that I'm looking up or working with inside that object or table. This might be a true primary key, a globally unique identifier. It might be some number value: I'm contact number 672. It also might be what we'll call more of a natural key: first name, last name, and email, where the combination of those three fields identifies Brendan. Keys are really critical because they give you the ability to go back to that record. If I need to bring in my addresses, I've got to look me up. I've got to look up Brendan and find some way to locate me. If everyone looks exactly the same, there's no uniqueness, so you can't tie things off to one distinct transactional record.

Foreign keys are how we create those relationships. Again, let's do account and contact. The way the account and contacts are linked is that on the contact record I've got some field that shows me the account ID, so there's some part of the account that lives down on the contact. That's one of the most common ones. It's a one-to-many, where an account might have many contacts. Usually a contact wouldn't
have many accounts, so you wouldn't have a bunch of fields there for the different accounts it holds onto. There are different ways, and in other courses we'll talk more about some of those different sharing models: many-to-many, one-to-many, et cetera. But basically a foreign key is how those two records know that they're linked together, that they're related. Without that, you'd find yourself doing natural key matching, but at a database level and at an application level it's a lot easier to manage these normalized databases with this primary and foreign key type of relationship.

Compound or combination keys are really just another way to identify a record. A true primary key is usually going to be a GUID or some kind of value that is very unique to that transaction; a lot of times you see alphanumeric key values. A compound or combination key, though, could be an email address plus first name and last name. Or first name, last name, date of birth, and address, where the address part is the first line of my address, city, state, and the first three digits of my zip. I can get really creative with how those values line up, but that combination of fields is going to uniquely identify this one record. This is how I know I'm dealing with the one Scribe Software, not the Scribe out of California that does medical software. It gets me down to the exact one that I want to deal with.

Key storage patterns. The first is storing keys in the integrated systems. This is where I used that objects explanation earlier with accounts and contacts: I've got an account ID field stored on my contact record. It's pretty straightforward, it's easy, and I can do that with an external system too. If I want to bring my ERP key up into my CRM system, I can store it in there. When I want to bring an update across, I know it's there, so I can just use that and update the record, and I'm good to go. That works a lot of the time, but sometimes you don't want to
do that. There can be speed and performance considerations, so you might want to use keys in external systems. Between my ERP and CRM systems, I might have a database where all it does is hold a key from my ERP system and the corresponding key from my CRM system. Now, whichever way I go, I can check that table first and say, "OK, I've got key A, it's going to translate to key 1," and go back over there. That's a really good method, and it's a lot speedier. On the design side, you've got to actually build that design, so it's not always the best, because it takes a little more overhead. But downstream it can be a lot more beneficial than just storing keys inside those native systems.

The last one is natural key matching. You don't store keys anywhere; you match on account name, or again on first name, last name, email, that type of data you can use in natural keys and combination keys. It's going to find the record based on some values that I know, because I know my data, are going to uniquely identify that record for me.

Common terms: primary key, PK, foreign keys, keys. A GUID is a globally unique identifier, so you'll hear those used interchangeably. SQL has its own auto-generated primary keys, where you put a record in and it gives you a 1, put another record in and it gives you a 2, and so on. There are a lot of ways to use it. You'll generally find that "primary key" and "foreign key" are pretty standard industry terms, but they do have some different connotations to them.

The sharing model. This is going to be really important. Once you've understood how this data is laid out and what it looks like, you've got to figure out what you want to do with it. What data do I want to sync? Just saying, "I want to hook up my ERP and CRM systems," and wiping my hands of it isn't good enough. You've got to say: I want accounts, I want accounts and contacts, I want addresses shared from CRM down to
ERP, but not the other way around. So I have to start designing around which data I want to pass across and how exactly that's going to happen.

When do I want to sync it? If I'm doing these transactions and I have normalized databases, it becomes really critical to think about the timing. Say I have an account and a contact, or even better an account and a sales order, and I'm trying to push that sales order down to my ERP system, which is a pretty common integration scenario. If my sales order gets across before my account has, that order's got nowhere to go. I'm going to get errors and many useless transactions. It's going to cause me more headaches as an integration designer, because I either build that situation into my design or make sure I tell the customer, "Yeah, you're going to get this every once in a while; hit this button and it will retrigger it." That's not something you want to have to do all the time. So understanding when to move these pieces of data is going to be really important.

The systems of record are critical. When I have these systems integrated, I now have data moving one or two ways all the time, so I have to have a clear definition. I might say: when an account gets created in my CRM system, it's going to go down to my ERP system, and all that data is going to overwrite whatever I've got in my ERP system, because my salesperson is much closer to my customer. They'll find out when the customer moves offices and when changes happen on that account record, much more so than my finance folks on the ERP side. By the same token, though, I may have a field in my ERP system, my account ID or account number, that's going to go over to my CRM system and only be accessible via the integration from the ERP side. I don't want a salesperson going in and saying, "It's not account 123, it's account 321," making a change, and now I've broken the link between my ERP and CRM applications. So sometimes you're going to have
specific fields that are going to be owned by one application or the other, or whole objects. Another good example with ERP and CRM is your product catalog. You want to share that from your ERP system up to your CRM system only. By sharing it up, you share quantity on hand, pricing, taxes, all the information you want to share with your sales team in the CRM app. You do not want people to go into your CRM system and add a new product, because why not, let's start selling widgets on the fly for a grand apiece. That should never go back down, because again your ERP system is going to control your financials, your products, your inventory, your catalog. Your CRM system is just how you sell those things.

So there are a lot of different ways to do this. You can have one-way integrations to establish a system of record. In two-way integrations you have conflict resolution, where you say, "CRM always wins." Whatever's coming in, I don't care if it's older data: if you've updated your ERP account to say they moved to Wichita, and CRM says they're not in Wichita, CRM is going to win. There are a lot of those considerations to think about.

The diagram we'll bring up here is there because an easy sync between two systems is simple: data goes in, data comes out. When you start adding in other applications that you're going to be working with, this model gets infinitely more complex with every system you add. With any integration project, you're going to have a specific solution you're trying to accomplish, ERP to CRM, but the minute that's up and running you're going to decide, "Hey, I've got some text files, let's bring those in. I've got an old legacy system that has contract data, and I might want to get that across as well. You know what, our support system would be great to tie into CRM to bring in support cases and escalation history." You can see how that will get much more complex. Having a good strategy up front is going to allow you to be more proactive and know how to integrate this much more
easily than being reactive to every new system and having to bring down the integrations across the board and redesign from the ground up. So it's critical to think through your sharing model: exactly what has to get shared, when, and even how. Do I care about sharing all of it to one system or the other, or do I want to bring just pieces of it over? These are all questions you're going to have to think about.

Net change patterns. This is getting more into the nitty-gritty of integration and migration. I have three listed here. There are obviously others, but these are the three most common that we see in our integration solutions.

The first and most common is a modified stamp: give me everything that's been modified since the last time I checked. Super simple. A lot of tools, ours included, have this baked in, where you can track the last time you ran, do a comparison, and bring back the delta. This is handy because most systems have a modified stamp. Having that makes things easy because you don't have to pull back every record every time, and you can imagine that as your integration grows, that could become much more daunting. You start off in 2007 with an integration and a couple thousand records; just moving everything every day doesn't matter. Fast forward to 2014 and you now have three and a half million records in those systems. If you're moving that whole bucket back and forth every day, it's going to become a crazy amount of overhead. Most things have timestamps, so it's a really simple pattern.

Where you don't have a timestamp, or you want a little more control over exactly what's happening, we have this bit filter or semaphore integration. You might have a table with a trigger, something that's going to say, "Brendan's account record just changed," and set a flag saying that it has just been changed. Then you'll have some kind of query looking out and saying, "Give me anything where that flag says it just got changed." It brings it
back, moves it across to your other system, and goes back and updates the flag with success or failure. What this means is that it's more of an event-driven process. You've got something going in and making a change to say, "Go pick Brendan up, he's ready to go." You have to have the ability to go back in and flip that bit when you're done; otherwise you're going to pick up the same record over and over again, and that's not net change, that's just data. So it's a good process. You're going to have a couple of other pieces in play to make it work, but it is handy and a lot more event-based. You can have it look for records a lot more frequently, running every minute instead of every 15 minutes for the polling process. There are a lot of ways you can make that work.

If you want to get away from a polling process completely, you use an event-based integration. The requirement here is that you have a system that can give you notification about a record. For example, Salesforce.com has an outbound message capability. I tie it to the end of a workflow and it throws a piece of XML out. Scribe is there to capture it, and it doesn't have to be Scribe; it can be any integration tool. You capture that incoming data, do something with it, and move it across to another system. It can then feed back responses saying, "Yep, I got it, and this is the new account ID," or whatever it might be.

Another way you can do it is with more of a request-response type of interface. Say you have a REST endpoint sitting out there and you want to call out to it from this system and pass off some small transaction. It might not be XML; you're probably dealing with a web service call coming in and out. That system can trigger a webhook and call out to something. Marketo, HubSpot, and Eloqua all have the ability to issue webhook
calls out. It just tickles some endpoint somewhere else and pushes data to some other place without your having to build a fully baked integration. It gives you the ability to take little pieces of data and send them. Those are great because there is no polling process; it's just always there. You set a rule inside that application and say, "When this happens, push that data outbound." It'll do that and follow through forever, and you don't have to go in and build a lot more capability. You're using the built-in capability of those applications. But again, if the application doesn't have it, if you've got a system from '93 that's been running forever on an AS/400, most likely you're not going to have event-driven integration capability. You can always add it in with some custom coding.

Some of our basic design patterns. Everything starts with the migration, so we're going to start there.

Fault tolerance and rerun tolerance. When you're running data in, no matter how great you are at integration, you're going to get errors. Data is dirty. It always has been and it's always going to be. When you build some fault tolerance into the run of your job, you're allowing it to say, "I get it, not every record is going to go in." Some people put phone numbers in the website field and websites in the phone number field, and that's going to get rejected. Keep on trucking, move all the records you can, and give me a report at the end of the day of the ones you couldn't make work. That rerun capability is going to be twofold. At that point I want to rerun the error records, and I can build in some defaults, like "if you find a number in this field, just make it a default value, or blank it out, null it out, and let it run through," so you can accommodate all those errors.

You also want rerun tolerance because, let's say in the middle of a run you're moving into a cloud
system, going from CRM on-premise to CRM Online, and halfway through, your Comcast ISP has an internet outage. Poof, connectivity to the cloud is gone. If you have no rerun tolerance built into your migrations, you're going to start back from scratch; it doesn't know where it got to. So build in some capability to say, hey, I got to here, pick up from where I left off and let's just keep moving. I know I did everything else and it worked great; let's pick up where I left off now that the internet's back.

Ordered data. This is really going back to parent-child and normalized databases. You want to make sure that you bring in those parent objects first and tie everything back to them. The last thing you want to do is dump all this data into your target system and then have to go back and manually say, okay, well, Brendan's linked to Scribe, so create that link there; they're in Manchester, New Hampshire, so I'll create that tie there. You don't want to do anything like that. You want it to be done automatically: run in the proper order and make all the ties that you need.

Once you've done that, you can start looking into timing and multi-threading. So, the timing of those runs: my business units, my users, my accounts, those all have to go in up front. Once they do, though, I've got a lot of data that can run in parallel. Contacts, leads, activities: I can run a lot of that at the same time up into those systems. So using any kind of system that can do those multi-threaded integrations, you get a lot more throughput, a lot more parallel processing, and you just build that into your schedule by saying, I know that all my parent relationships are there, so I can just fire all this data and let it sort itself out. I don't have to trickle through one object at a time.

For integration, a lot of that still comes in handy, so understanding all those requirements is going to be good. More so, when you get to fault tolerance in the integration, it's that you want to
build retry logic in. So in my example earlier, sales order and account: if I were designing that integration, I'd have a lookup to say, hey, is the account there? If it's not, go back to my source system, grab it, add it, and then add the order. That way I'm not creating a needless error that someone has to go back in and deal with. I'm making sure that, for that entire transaction, I'm going to put anything I need into that system when I need it. That fault tolerance is really important because, if you're a customer, you don't want your IT staff spending all day chasing this integration down, because they're not going to stick with it. If it's more of a drain on them than it needs to be, it's going to be a main pain point. For partners and ISVs, if you deliver integrations to customers where they're constantly having to call you up, it's going to be a drain on your support resources. Build your integrations with that kind of fault tolerance in mind, because again, no matter how great you are at integration design, you're going to hit errors, and you're going to hit data coming in out of sequence. Build that logic into the mapping that you're doing instead of dealing with it on the back end with errors and reprocessing.

Process flow and data latency. These are really all about, again, parent-child relationships and when the data needs to get in there. If I have a marketing system integrated with my CRM system, my personalization data, the information about my contacts and my leads, has got to get in immediately when I send out that email. If I just got married and my name changed, or if I just moved and I fall into a new territory, I want that to get in there almost as soon as it happens, because if that email goes out to me with the wrong information, I'm like, oh well, these guys don't care about me, they didn't see my name change. So if I went from Brendan to Brenda, now I don't have that same personalization effect inside that email. The data latency on the other side, though, is
bringing back the data about the email I got: the fact that I clicked it, I opened it, I forwarded it. It's good to get that back, but you don't need that in near real-time. So you could have that on a one-hour, two-hour, three-hour latency, where it's just going to roll through a bunch of those data records, bring over what's new, and update what's been updated. It doesn't have to be as real-time.

So when you think about integration, not everything has to be there immediately. You may think so. When you approach an integration, you think everything has to get there right away, otherwise you're up the creek without a paddle. It's not true. Really think through what the business case is around why you're integrating that data, and you'll start to shine some light on the fact that not all that data is mission-critical to be there within 15 seconds. Some of it can wait minutes, some of it can wait hours, some of it can wait days. So really think about the latency of when that data has to move from one system to the next.

The last part of that is conflict resolution, which goes back to system of record. If I have an account changed in CRM and ERP at the same time, who wins? Is it the last one that was updated that's going to overwrite the other change? Is it that my system of record is always going to overwrite everything, regardless of time? Think about how you want those conflicts to resolve, because if you have a bi-directional integration, at some point you're going to have this data in flight going both ways. You've got to figure out, when those situations happen, who's going to win and how do I check that. Is it based on timestamp? Is it based on specific fields being filled in? So if I have my master account number filled in and the other one doesn't, that might be the rule that means I win. So think through some of those conflict resolutions and how you want the data handled while it's in flight, versus finally, on the ground, you getting a call and going in and trying to
make a change yourself.

So the last slide here is about initial sync methods. Same concepts: what data, what order, what timing. You're going to see these over and over again through these integration trainings, because those three core components, what do you want to sync, how does it have to sync, and when does it have to sync, are the baseline pillars of integration.

When you're figuring out what data, make a plan for exactly what objects need to be synced. When you're dealing with a normalized database, realize that an account may be an account and an address and a third object that hangs off of it. It may not be an atomic object where an account is an account is an account; it might have some other pieces. So when you're doing that, use tools that have the ability to show you those relationships and bring that data back in one big transaction, because again, you don't want to run that one account through five or six different integration layers to say, okay, now I've finally got that complete account. I should have been able to do it in one pass, and I did it in five, so, you know, good for me. So think about exactly what needs to go and how it needs to look.

What order of data: the parent-child relationships and how they have to get maintained. This could be anywhere from a user down to an account, an account down to a contact, a contact to an address, and an address to an opportunity ship-to address. So you've got a lot of different flows for this data to stick together. Make sure you have a very clear diagram of how you want this to work itself out. If I start moving data without really thinking through this plan, I can get orphan records: things that are in that target system, not tied to anything. That's a real bummer when you're trying to deal with integration cleanup. I could also get data tied to the wrong thing. So there's a couple of different Scribes; there's one that does pens. They both have Scribe right in the
name, so if I'm just matching on that, I could find the pen company and start tying all these activities that were for the software company to the pen company. There are a lot of ways that this gets screwed up in flight. So think about how the data relates to each other, that sharing model, the keys that you have, and then what order and how quickly it needs to get across. Because again, if I start moving in orders well before I know my products and my accounts are done, I've got a whole bunch of other issues that are going to get surfaced up to me. So think through those for your initial sync.

The last one is one of the more important ones, and one of the most forgotten ones: the volume. If I'm doing my testing with 50,000 records and I can move them over the course of two hours, I feel great about it. Everything looks great, I've done a lot of different testing, UAT is going well, the users have said that everything ties the way it should, and I'm ready to say, yep, we can cut over. But I have no concept of what happens if I have 15 million records. I don't know how the server's going to react, I don't know how the product would react, I don't know what's going to happen. So when you do your testing, you don't have to do the entire bulk load; it's testing for a reason. But do something a little more real-world than just a couple of sample sets moving easy records back and forth. That is where you're going to flush out all the different performance issues.

Generally, there are three ways to do things: there's the right way, there's the wrong way, and there's the easy way. The easy way could work, but when you try to do bulk loads and big performance jobs, you're going to find out really quickly that it wasn't the right way to do it. So when you think about these different runs of jobs, don't just say, hey, I've got this easy operation to throw in there, I don't really care
what the performance is, I don't really know what it does behind the scenes. Learn those things. Figure out the right design patterns to use so that it's working for your easy testing, your low-level testing numbers, everything to validate that your procedure works, but it is also forward-thinking, so that you're proactively planning for the 15 million records versus reacting to a server crashing and having to figure out how to pick up the pieces and start over again.

So hopefully this was informative. Hopefully I had a little bit for you, so that you guys could get a better understanding of some of these integration topics. Again, we're going to use this as a baseline. Some of this might have been really rudimentary, so I apologize, but the goal is to get a level set around what some of these concepts are, and we're going to start going a little bit deeper on some of these individual ones: specific design patterns with connectors that we have, specific approaches towards performance and migration. So stay tuned to our blog and our podcast, you'll see a lot more videos coming out, look at our YouTube channel, and again, thank you guys for the time and attention.
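To make the fault tolerance and rerun tolerance ideas from the talk a bit more concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions, not how Scribe or any other integration tool does it: the `migrate` function, the `load_record` callback, the record shape (a dict with an `id`), and the JSON checkpoint file are all hypothetical names made up for this example.

```python
import json
from pathlib import Path


def migrate(records, load_record, checkpoint_path):
    """Load records in order, skipping bad rows and resuming after an outage.

    records: list of dicts, each with a unique "id" (hypothetical shape).
    load_record: hypothetical callable that writes one record to the target
        system and raises ValueError when the target rejects the record.
    checkpoint_path: file that remembers how far the previous run got.
    """
    checkpoint = Path(checkpoint_path)
    state = {"next_index": 0, "rejected": []}
    # Rerun tolerance: if an earlier run was interrupted, pick up where it
    # left off instead of starting back from scratch.
    if checkpoint.exists():
        state = json.loads(checkpoint.read_text())

    for index in range(state["next_index"], len(records)):
        record = records[index]
        try:
            load_record(record)
        except ValueError as error:
            # Fault tolerance: note the dirty record for the end-of-run
            # report and keep moving through the rest of the data.
            state["rejected"].append({"id": record["id"], "reason": str(error)})
        # Save progress after every record. Any other exception (for example
        # a lost connection) propagates before this line, so the failed
        # record is retried on the next run.
        state["next_index"] = index + 1
        checkpoint.write_text(json.dumps(state))

    return state["rejected"]
```

In this sketch a rejected record (say, a phone number in a website field) is written to the report and skipped, while a connectivity failure stops the run and leaves the checkpoint pointing at the record that did not make it. Calling `migrate` again with the same checkpoint file continues from there and returns the full list of rejected records across both runs. Writing the checkpoint after every record is the simple choice, not the fast one; for the 15 million record case you would likely checkpoint per batch instead.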
Info
Channel: Scribe Software
Views: 29,780
Rating: 4.8202248 out of 5
Keywords: enterprise integration, application integration, integration platform, integration platforms, enterprise application integration software, integrated application, data integration solution, system integration software, application integration solutions, application to application integration, business data integration, what is data integration, data integration
Id: OIpkcxc-CVM
Length: 40min 42sec (2442 seconds)
Published: Tue Dec 23 2014