Fortran Pointer Vs Allocatable With Arjen Markus

Video Statistics and Information

Captions
hi guys in the following video i get a chance to talk to arjen markus the author of modern fortran in practice he started his career in fluid mechanics specifically numerical modeling of water quality problems working for deltares or when he joined the company delft hydraulics he said that over the years as his work became more about maintaining and developing the software to do those numerical models he focused more and more on the features of modern fortran and modern software development practices and applied those to a variety of articles and eventually into his 2012 book he's also had the opportunity to do phd research on nanoparticles in the aquatic environment and has always kept an interest in his original area of expertise combining practical and theoretical subjects or at least that's his passion he says he lives in a suburb of rotterdam with his wife of almost 30 years they have three kids two daughters and a son and one grandson who's half a year old now we discussed the problem of allocatable variables versus pointer variables and the features of both and when you might prefer one over the other so let's uh dive right into the conversation basic idea here is um i've been asking around about you know questions that people have with fortran the language and you know things they'd like help learning about that kind of stuff and one of the very first things is can you help explain pointer semantics and and why why you might use that over allocatable variables um and you piped up and said hey i just got a question about this that was related to this and i would like to to help out so to start off with let's start with allocatable because i think that's at least a slightly easier place to start um where basically the idea of an allocatable is i've got a variable that i don't necessarily know the size of the array or possibly the type of the value that i'm going to be storing at runtime at compile time right so other you know static variables are
something that you need to know the size of or type of at compile time and for you may have to correct me on this one but i believe when allocatable was initially introduced into the language automatic deallocation upon uh exit from scope wasn't part of the standard that's right that came with the fortran 95 standard that's that that's what i thought but yeah but with with that addition using allocatable for the most part you shouldn't have any memory leaks unless there's a compiler bug of some sort um well actually if you only have allocatables and no pointers at all then um uh memory memory leaks are actually impossible sans compiler bugs yes i have found some instances where you know some allocatable stuff just there was clearly a bug in the compiler but yeah yes but theoretically if you uh if your compiler works according to the semantics of the standard then memory leaks are impossible yep okay so now then moving on to pointers pointers give you the additional capability that the data in the variable can outlive the variable that created it essentially yes and in that case you have a dangling pointer because if the memory which was allocated at some point for instance via an allocatable and you point to that memory via a pointer variable then the allocatable stuff may may disappear and then you have a pointer which points to something which has disappeared right um both opportunities for for yes yes that's one uh opportunity of course you don't have a memory leak then but you have a reference to memory which is no longer yours correct that's one problem the other problem is you allocate memory via a pointer and later on in the program you allocate again via that same pointer and the first memory you had is no longer reachable and then you have a memory leak right and there's nothing you can do about it except for uh correcting your program right so thinking about kind of the semantics of allocatables and static variables um basically anytime you
cross a procedure boundary calling a subroutine or a function or returning from you need a copy in or a copy out right according to the semantics of the standard it's in principle you're supposed to do a copy in or a copy out of course you know optimizations and things can can find cases where it's not strictly necessary and safe to avoid the copy but from the kind of principle semantics you're you're you're passing things by value not necessarily by reference well you uh essentially uh pass by reference or at least um that's the most common way to do it copy in copy out is sometimes necessary because your subroutine is in fortran 77 style and you pass in an array which is not contiguous an array section from the second or third dimension for instance okay that's not contiguous memory so your fortran 77 routine or fortran 77 style routine doesn't know about that the array is declared with an asterisk and then the compiler knows it has to copy everything into a temporary array and copy back into the original array that's one case where you can have copy in copy out okay but essentially the semantics of the of the language is such that um that's just a a mechanism which is used to get correctly working procedures it is quite possible to do it in another way okay but that it really depends on on the the situation at hand but um the um the way you um you how did you call that um how do you say that um copy in copy out is not a requirement of the standard okay just like you have that on the right hand side everything is evaluated and then it's copied into the left-hand side of an assignment whether it works quite that way is something up to the compiler but the semantics is such that you have the right hand side which is completely evaluated and then copied over to the left hand side that's the semantics if the compiler can find a smarter way to to do that then it's certainly allowed to do that okay um that's that's part of the of
the whole system what it doesn't what it does mean is that it has to for instance the the right hand side is double precision and the left hand side is single precision the right hand side still has to be evaluated fully in um double precision and only when you copy back into the receiving variable then you get the conversion to single precision well that means that if you have mixed mixed expressions or a mixed assignment um you still have to be able to to reason from that using this semantics and the same is true for for procedures you pass stuff in um if you have an array which is um declared intent in then whatever happens inside the the subroutine it must make sure that things are not changed of course you specify inside the routine intent in for this particular array so it can't change anything there but if you have some some sneaky code which accesses the array in another way for instance via the module in which it is originally defined of course then the compiler can't see that but um the the the semantics in general is such that you should be able to to reason without knowing too much about the implementation okay so copying is not necessarily required for crossing procedure boundaries and the idea is to try and avoid it okay well that that actually i think reduces my desire for pointers because i was going to say that for pointers that you'd only need to copy the address anyway even if you even if you were in a pass by value not by reference yes well uh pointers actually uh come with a an array descriptor yeah so plus some metadata that's awesome right and for pointers it's a bit more complicated because pointers can also allow you to to specify array sections with with strides other than one okay and of course then it gets very complicated and then quite often copy in copy out will occur but that's that's a detail which is um well most compilers will tell you that that sort of thing is happening but in
general it's not necessary okay so now let's talk a little bit about use cases um so the use cases for allocatables are basically if i don't know the size of the array needed at compile time or i don't know the specific type that i'm going to have at compile time there's actually a lot of cases where you may not necessarily need an allocatable at least anymore because you can declare variables in blocks or or subroutines or what have you with sizes based on the inputs to that block that's right yes the automatic arrays and with class variables frequently you're passing in something of a concrete type and so you don't actually need it to be allocatable because it's coming in as an input exactly yes okay so you've got a lot of options to uh to work with the memory the drawback of automatic arrays automatic variables like that is you uh can't control whether um well if the the the array size becomes too large you may end up with too much for the stack right and that causes a core dump you have no control so if you know that your automatic arrays might get fairly large it it's in general a better idea to use them as allocatables rather than automatic arrays okay just just then you have control you can allocate the array and you can see whether that succeeded or not with automatic arrays you can't right quite often you have um compiler options to control whether that comes from the um from the stack or from the heap for instance the intel fortran compiler can do that uh and i guess uh gfortran also but that's relying on what your compiler can do and i generally don't want to rely on that sort of things right that's it's effectively relying on the compiler not a standard yes and of course if you have a very critical program certainly do use all the tricks that compiler can can do but that does mean that your program relies on something outside of the source code right and whether that's acceptable is something you should decide for yourself i think um what
i've seen in the past is also um that a real becomes a double precision for instance if you set the right compiler option right well that's the sort of things just about everybody i know uh despises and you really shouldn't allow that sort of thing to happen right okay so now let's move on to the use cases for pointer so what what would be a motivating use case for why you would prefer a pointer over an allocatable right um with pointers you can easily define recursive data types um if you think of linked lists or trees hash tables that sort of things those are quite often expressed in recursive data types and that's something you can do with pointers and i'm not sure whether it's allowed now with allocatables i think it is but i should check but certainly i have come up with one or two implementations of a recursive data type where i used allocatable but it's it's not quite as simple as with using pointers because you need an intermediate uh derived type you need an abstract type and then you can be either the base case or the recursive case and and then you can use just allocatables but but like you said it's not quite as straightforward and simple as using the pointers precisely and in in that case you um quite often want to uh to be able to point to something else um if you have a circular buffer for instance and of course it bites itself in the tail and that's easily done with a pointer you can't do that with an allocatable correct and um the literature on this sort of things is always um looking at pointers so it's it's easier to to use pointers in that case basically because everybody thinks of pointers then and sometimes like you say the language doesn't really allow it and it does feel a bit strange also to have a recursive data type with an allocatable inside which is of the same type it it doesn't feel right um so that's that would be a case where pointers are certainly more attractive than allocatables i don't know if you could have a
derived type that has an allocatable component that is of the same type but the way the way i've implemented it before is i have an abstract type one of the extended types has a component which is class of the abstract type right and the other is not right yeah so like there's a circular definition it's it's not necessarily a circular definition no no that's that's that's right no no it's yeah that that breaks the circulation yeah the circular reference there okay so now i want to i do want to kind of ask the question of what was the example that you mentioned that you got a question about that had to do with the difference between allocatables and pointers yes well um what happened was that this guy used a circular buffer the circularity wasn't wasn't essential but a linked list say and one um one implementation did work and the other one was giving uh strange results um core dumps or whatever and it turned out that um he was using an allocatable as a local variable to uh to allocate the memory he needed for storing in the list and then the list was pointing to that allocated memory but of course on return from the routine that allocated memory disappears right so in that case you really need a pointer right so you you use the pointer to allocate the memory you need and then you can give that um that memory to to the list which then keeps track of it right it's just that if you use uh allocatables a lot you tend to forget that this sort of things happen because they are so nice and it took me a while to figure out what was going on there it's really nice to have your memory cleaned up for you but at some time you don't want that memory cleaned up and then you forget exactly yes and i thought um maybe you can can solve that with the move_alloc routine but that only works for allocatables not for allocatables and pointers right okay this brings me to the question of how do you avoid memory leaks with pointers because it's at at some point somebody needs to clean
up that memory and yes what are some tips or tricks that you might have for deciding when to clean up that memory that's a good question i think that if you use pointers in abstract data types like linked lists etc then it would be a good idea to have a finalizer for that um for that type of data um because the the finalizer can then clean up everything right um if you use um pointers for uh for work memory then most probably you are better off with allocatables because they are cleaned up automatically um if not then you should be very very careful about it um the language doesn't give you many handles to to handle it in a more sophisticated way it's just like with opening files you have to be careful that you take care of them yourself to close them when you don't need them anymore and they might get in the way same for pointers i don't know of anything which can um can help you with that other than say uh an encapsulation in a in a linked list or something like that so that's with finalizers you can um you can make sure that everything is cleaned up automatically that's the basic idea of finalizers of course right i've always been a little unclear on exactly the semantics around the finalizers i think because if if i have a return value from a function does the finalizer get called on the return value because it goes out of scope yeah um i think so i think so so when i return a linked list from a function its finalizer would get called and clean up all the memory that i was trying to oh that's a good one that's a good one um gosh that that that's why i've always kind of been shy of pointers is like i'd like to encapsulate this into something and have a finalizer that cleans up automatically when i'm done with it but when i'm done with it isn't when it goes out of scope it's when i'm done with it it's a good one i i i expect so because um well basically you uh would um let's say you uh return an array so no pointers or something
but just an array um then that the result gets copied into another array and that uh original array should be cleaned up um if you have a derived type with a finalizer and you um return that that type from a function i'd say that something similar happens that's my understanding yes of course the default uh default assignment will take care of uh copying all the data exactly um now if you would have a function returning a derived type with a finalizer and you assign that via a pointer so the pointer the variable um yes pointer uh function that's the situation where we're looking for right now but no the situation i'm looking for is i'm not returning a pointer and so and so i'm trying to hide away the fact that i'm using pointers so that so that the users of such a facility or library don't need to know that i'm using pointers but if my type has a finalizer that finalizer will be called all over the place and how do i know how do i know which time the finalizer gets called that it's supposed to actually clean up the memory that's a good one i don't know that's certainly a use case which you should uh you should examine right so my understanding is that if you're using pointers you really have to expose the fact that you're using pointers and be careful about deciding who quote-unquote owns that that data and is responsible for cleaning it up at some point yeah and that means that you want to uh to hide that as much as possible from the innocent user um and that's that's another reason for uh using allocatables whenever um whenever practical the most common example that i've seen of people doing this is you just have a type bound procedure for whatever's hiding away the pointer and say when you're done with the data behind this please call the delete function otherwise there's a memory leak yes yes of course the encapsulating derived type uh could have that finalizer it's not a pointer um so there's no no
chance that you do something inadvertently it could be a local variable of that that derived type the derived type has the finalizer so if you return from the function or the subroutine then the finalizer gets called and everything is cleaned up that should be certainly the the the way it it should work but the the thing you you sketched yeah that's that's something which you should investigate i know there have been a lot of troubles with uh finalizers implementing them in the compilers yeah this could be one of the reasons yeah it's it's nice that there is a place to put code that you'd like to run when something goes out of scope but going out of scope isn't always the time that you want to clean up whatever whatever that resource was exactly because you may be trying to pass that resource to somebody else yes and i can imagine two different two different scenarios here um if your derived type contains um allocatables it will probably be okay right because the allocatables with the default assignment will get copied right and certainly with the uh um and the automatic reallocation that should be should be okay if it does contain pointers um then of course you do have a problem right because the pointers will only get um the pointers on the left hand side will only get references to the to whatever the pointers were referencing right so in that case you get dangling pointers and that would call for a user-defined assignment for that particular type yep okay um are there any other gotchas or or interesting tidbits that you'd like to add um well an example i use in in my my courses on fortran is that pointers can point to well actually really weird things um array slices that go all over the place i've never had use for it but um it's a nice nice idea that you can do that sort of thing the other thing i can think of which i um uh i have illustrated in the past but never actually used is vector um indices you
know the concept not exactly right well normally you have a slice from 1 to 10 say but you can also specify an array of indices okay it can be quite useful um to uh to um to pack to select all the elements you want rather than via a loop you just specify them by the vector let me type that in chat okay yeah that's a that's a use case that that i probably would have at some point but i've never used that trick this for instance these are vector vector indices right and what it does is it selects um elements one two four ten and three in that order and um puts them in the uh a okay that's something you can can use and of course that can also be an array name rather than this constructor uh-huh okay there's one more thing that i kind of wanted to talk a little bit about pointer semantics and differences from other languages right yes fortran pointers are in some sense very different from c pointers yeah and to me the biggest difference is their kind of default behavior right so in in a language like c where if i have variable equals something where the variable is a pointer that is doing that is changing where the pointer points to exactly whereas in fortran that is changing the underlying value yeah and so that's a that's a difference between the kind of kind of quote default behavior yes when as i understand it when this was designed they felt that copying the value was what's going to be more important than changing what the pointer is pointing to so changing what the pointer is pointing to requires special syntax the the arrow whereas changing the value in the memory it is pointing to is just the assignment so it was a really uh deliberate um decision to do it this way right in a language like c you always have to use the star or the the percent to do things and i guess that that is because in c pointers are much much more important than they are in fortran right that's one thing the other thing is that pointers can only point to other pointers or to variables
which have the target attribute right and the reason for that is that the compiler then can know in advance which which variables will be or may be subject to this sort of operations which makes it possible to uh to do um optimizations which are not possible in c okay um for instance in c you have to be aware that every variable can have an aliasing in fortran this is impossible unless you do something wrong or something slightly dumb it's impossible unless a variable has the target attribute if that's the case then the compiler knows it has to look out for for other things because it may have been pointed to right if not then it can uh optimize the use of registers and that sort of things and in c that's impossible okay one thing that i do want to point out just for in case anybody who's listening later the target attribute does not necessarily mean that that data will live beyond the scope right just because you declare a variable and point a pointer at it doesn't mean that the underlying data will still be there when you leave scope nope that was the problem he ran into i talked about okay i think that was all that i had thanks for the discussion that's very helpful for for me yeah and there's an interesting uh problem to sort out all right well thanks for your time and uh we'll talk again later on i'm sure sure well let me know what's uh what you can distill from this so i hope you guys found that conversation useful and enlightening it certainly was for me and i hope that answered most of your questions with regards to allocatable variables versus pointer variables and situations where you might want to use one over the other but let me know down in the comments if you have any additional questions if you'd like any other videos or explanations of different fortran features and make sure you hit that subscribe and the like button and click that bell icon to make sure you get notifications when we release new videos
thanks guys
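The captions do not include what was typed in chat during the conversation, so the following is only a minimal Fortran sketch of the points discussed: pointer assignment (`=>`) versus ordinary assignment (`=`), a pointer to a strided array section, explicit cleanup of pointer-allocated memory, and the vector subscript that selects elements 1, 2, 4, 10 and 3. Every name in it (`work`, `p`, `q`, `a`, `b`, `ierr`) is hypothetical and not taken from the video.

```fortran
program pointer_vs_allocatable_sketch
    ! Illustrative sketch only: all variable names are hypothetical.
    implicit none

    integer, allocatable, target :: work(:)  ! target: pointers may refer to it
    integer, pointer :: p(:) => null()
    integer, pointer :: q(:) => null()
    integer :: b(10), a(5), i, ierr

    ! Allocatable: size decided at run time, and the allocation can be checked.
    allocate(work(4), stat=ierr)
    if (ierr /= 0) error stop "allocation of work failed"
    work = [10, 20, 30, 40]

    ! Pointer assignment (=>) changes what p refers to; nothing is copied.
    p => work
    ! Ordinary assignment (=) changes the values p refers to, so work changes.
    p = 0
    print *, work          ! every element of work is now 0

    ! A pointer may also refer to an array section with a stride other than one.
    p => work(1:4:2)
    p = 7
    print *, work          ! elements 1 and 3 are now 7, elements 2 and 4 stay 0

    ! Memory allocated through a pointer is not cleaned up automatically.
    allocate(q(3))
    q = [1, 2, 3]
    ! Allocating q again here without deallocating first would make the
    ! first block unreachable, which is the memory leak described above.
    deallocate(q)

    ! Vector subscript: select elements 1, 2, 4, 10 and 3 of b, in that order.
    b = [(i*i, i = 1, 10)]
    a = b([1, 2, 4, 10, 3])
    print *, a             ! the squares of 1, 2, 4, 10 and 3

    ! Disassociate p before its target goes away so it is not left dangling.
    nullify(p)
    deallocate(work)
end program pointer_vs_allocatable_sketch
```

The sketch deliberately avoids actually creating a dangling pointer or a leak, since both are errors; the comments only mark where they would arise.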
Info
Channel: Everything Functional
Views: 725
Id: hPBGpyX--W8
Length: 36min 44sec (2204 seconds)
Published: Fri Oct 02 2020