Let us resume our formal study of quantum mechanics by first asking if there are any questions; that will give us a starting point. Last time we discussed the uncertainty principle a little bit. We tried to point out that quantum mechanics has, by its very nature, a certain degree of uncertainty in our ability to completely understand, or completely specify, numerical values for physical observables. Now of course we must quantify this, and let me say that only part of the statement is right when we say, as we often tend to, that quantum mechanics is all about probabilities, that it's indeterminate and uncertain. The rules for calculating probabilities are completely known; they are deterministic; there is nothing uncertain about that. But quantum mechanics does say that physical measurable quantities have a certain intrinsic statistical nature, which means that you can only talk in terms of probability distributions and all the other things associated with probability distributions: average values, mean square values, standard deviations and so on. In that sense it differs from classical Hamiltonian mechanics, for instance, where we could specify the state of a system by a very precise point in phase space. The idea of a point in phase space is lost once you go to quantum mechanics, and that's the way things are.
Now, the framework in which you discuss quantum mechanics is mathematical, and in its most elementary form it's more elementary than what you need for Hamiltonian mechanics, where we need fairly sophisticated concepts such as Poisson brackets, the symplectic structure and so on. In quantum mechanics this is replaced by a very linear kind of structure, where the state of a system is specified not by a point in some phase space but by an element of a linear vector space. This element is called the state of the system, and it's an abstract concept. The idea is that there exists something called the state of a system, which happens to be a member of a linear vector space, and this state potentially gives you all the information you can find about the system. Just as in classical mechanics, once I tell you that the system has n degrees of freedom, you associate with it a 2n-dimensional phase space with generalized coordinates and momenta, and a point in that phase space tells you the state of the system by specifying all the q's and p's. You have to throw that out in quantum mechanics.
Now, the experiments which led us to this kind of description grew over the years, and in the early days the interpretation of quantum mechanics posed a very severe problem. The formulation of quantum mechanics itself posed a fairly deep problem, but eventually things got ironed out, and by 1926 or so the basic formalism was in place. Since that time many layers have been added to it, but the original foundational formalism, put together in a remarkably short time in the early 1920s, still remains in place.
So let me start by saying that a quantum mechanical system is described, or specified, by a state vector. I use the word vector because it's an element of a linear vector space. The kind of symbols used for these states changes from book to book and author to author, but by now there are standard symbols, and the one I am going to use is called Dirac notation; I will say a lot about Dirac notation as we go along. The state is denoted by something like |psi>: I will use Greek letters for these state vectors, and I am going to put these mysterious angular brackets around them just to tell you that they are elements of a linear vector space. Now, whenever I say linear vector space, you can
imagine a three dimensional Euclidean space, for example. The three dimensional Euclidean
space is a linear vector space. Therefore all the properties that we are going to talk
about for state vectors can be imagined by thinking about ordinary three dimensional
vectors and using vector algebra to add them. Now, this state vector is a function of time. Let me write it as Psi(t), with a capital psi for the moment; we will see why I use a capital letter, because I want to use a small psi for something else. In order to find out what happens to the state as time goes along, we have to prescribe a rule of evolution, just as in classical physics we prescribed Hamilton's equations of motion. In exactly the same way, I am going to prescribe a rule for this Psi(t), and afterwards we will come back and interpret what Psi(t) is and how it gives you information about various quantities and so on. But first, a little bit of a digression on linear vector spaces. This is essential because I will assume that you know about
linear vector spaces. So let me go through a small mathematical digression on linear vector spaces. Essentially, if you know about matrices, you already know all the mathematics you need; let me reassure you that there is not much left, and all you need to know is how to handle matrices, mostly square matrices at that. Now let me formally define a linear vector space. I will leave out a few things here and there, but we will fill them in as we go along. So it contains a set of elements, which we call vectors,
and I need a notation for these elements. Let's call them psi, phi, chi and so on; I use Greek letters for these vectors. They are called ket vectors, for a reason which I'll explain subsequently, and I use this funny angular bracket for them: |psi>, |phi>, |chi>. Just as index notation and the summation convention help you calculate very easily in ordinary tensor calculus, half the battle is won once you express things in the right notation, and that is exactly what the Dirac notation does. So the space contains a set of elements among which certain operations are defined. The fundamental one is that of addition: you add two vectors. A linear vector space V contains a set of elements.
So it says that if psi and phi are elements of V, then psi + phi is also an element of
V. So you add two vectors and you get another vector which also belongs to the same space.
Moreover, this addition is associative: (psi + phi) + chi = psi + (phi + chi), so you can group the additions any way you like. It is also commutative: you can add them in either order. Importantly, the linear vector space is a set of elements, or vectors, defined over a certain field, the field of scalars. So you simultaneously introduce a set of numbers, or scalars, which belong to a field. We don't want to get into what a field is right now; the real numbers and the complex numbers form fields. These quantities are denoted by a, b, c, etc. The field is generally the field of real numbers R or the field of complex numbers C, such that a|psi> is also an element of V. So you define multiplication by a scalar, and it still gives you a vector: just as in ordinary three-dimensional space, if I have a position vector r, then twice r or thrice r remains in the same space. This has all the obvious properties, namely a(b|psi>) is the same as (ab)|psi>. We are not going to worry at the moment about non-commutative fields; we are going to assume the field is R or C, so that ab is the same as ba. Then the further properties: a(|psi> + |phi>) = a|psi> + a|phi>. So all the obvious properties that you know about multiplying ordinary three-dimensional vectors by real numbers are listed one after the other. There
exists in this space a special vector called the null vector. We are going to use this symbol for something else as well, but right now, while we are talking about the null vector, let me just write it as |0>. The reason I am hesitant about this notation is obvious: as you already know, when we solve problems in quantum mechanics we will be looking at various states, one of which is called the ground state; it could correspond to the lowest energy of the system, and occasionally I will use the symbol 0 inside the ket for the ground state, but that is not the null vector. The null vector is such that any vector phi plus the null vector is still
equal to phi. Among the scalars there are two special ones: the scalar 1, for which 1 times phi is of course phi itself, and the scalar 0, for which 0 times phi is the null vector. The null vector is the vector all of whose components are zero; the position vector at the origin, for example, is the null vector in three-dimensional space. Since 0 times any vector phi is the null vector, what one sometimes does is forget about writing the null vector as |0> and just use an ordinary 0 for it, because of this relationship. You might as well do so, but you shouldn't get confused: if I add two vectors, I still get a vector. So if I write |phi> + |psi> = 0, what I mean on the right-hand side is the null vector and not the scalar zero. There should be no confusion, because it is obvious from this relationship that I could just as well call it zero. So these are all the properties you need; and since a runs over all the reals, for example, you also get the idea of multiplying a vector by -1 to obtain the negative of that vector. Now, this set of properties defines a linear
vector space, and there are many examples of linear vector spaces all around. They need have nothing to do with ordinary vectors in three-dimensional space; these are abstract properties. For instance, the set of real numbers itself is a linear vector space. It is clear that the scalars would be the real numbers themselves, and the vectors would also be the real numbers. When you add two real numbers you get another real number, and you have a null vector, which is the number 0 itself.
is this number 0 itself. R is a linear vector space. I will call this
LVS. The set of real number itself is a liner vector space. R2, points on a plane is a linear
vector space so Rn is also a linear vector space. In other words, the set of x1, x2,
x3 up to xn such that I define addition by saying if you add two vectors with components
x1, x2, x3 up to xn and y1, y2, up to yn, then the result vector is x1 + y1, x2 + y2
and so on. You add component wise. That itself forms a liner vector space. These are real
linear vector spaces. When I go over to complex vector spaces, I would permit multiplication
by complex numbers, for instance. Those are other properties that are used. They would
form a complex linear vector space. These are the only two kinds of linear vector spaces
we are going to look at. There are other instances which are not so
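To make this concrete, here is a minimal Python (numpy) sketch checking a few of these axioms for vectors in R^5; the particular vectors and scalars are arbitrary illustrations:

```python
import numpy as np

# Two vectors in R^5 and two real scalars.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([0.5, -1.0, 2.5, 0.0, -3.0])
a, b = 2.0, -0.75

# Addition is component-wise and commutative: x + y = y + x.
assert np.allclose(x + y, y + x)

# Scalar multiplication is compatible: a(b x) = (ab) x.
assert np.allclose(a * (b * x), (a * b) * x)

# Distributivity: a(x + y) = a x + a y.
assert np.allclose(a * (x + y), a * x + a * y)

# The null vector leaves any vector unchanged, and 0 * x is the null vector.
null = np.zeros(5)
assert np.allclose(x + null, x)
assert np.allclose(0.0 * x, null)
```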
There are other instances which are not so trivial. For instance, the set of all n by m matrices forms a linear vector space: when you add two such matrices you get another n by m matrix; when you add the null matrix to anything, nothing happens to it; and when you multiply a matrix by a scalar, real or complex, you are still in the space of n by m matrices. So the set of n by m matrices is a linear vector space.
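A quick sketch of the same closure properties for matrices (again with arbitrary illustrative entries):

```python
import numpy as np

# Two 2x3 matrices: adding or scaling them keeps us in the same space.
A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
B = np.array([[0.0, -1.0, 1.0],
              [2.0, 0.5, -2.0]])

assert (A + B).shape == (2, 3)               # the sum is again a 2x3 matrix
assert (3.0 * A).shape == (2, 3)             # a scalar multiple is again 2x3
assert np.allclose(A + np.zeros((2, 3)), A)  # the null matrix does nothing
```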
There are other, even more non-trivial examples of linear vector spaces. For instance, the set of solutions of the equation d^2x/dt^2 + omega^2 x = 0 forms a linear vector space, because it's a linear equation. There are primitive solutions to this: e^{i omega t} and e^{-i omega t} are the linearly independent solutions. You can form linear combinations of them and they still continue to be solutions. A very popular linear combination is called cos(omega t); another is called sin(omega t); but any a sin(omega t) + b cos(omega t) is also a solution, and it satisfies all those axioms. So you see, linear vector spaces can be quite general: the elements could be numbers, vectors in ordinary space, vectors in n-dimensional Euclidean space, sets of matrices, sets of solutions of some differential equation, and so on. The concept is very general and extremely useful.
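One can check this numerically as well. Here is a small sketch verifying, by finite differences, that an arbitrary combination a sin(omega t) + b cos(omega t) satisfies d^2x/dt^2 + omega^2 x = 0; the values of a, b and omega are arbitrary:

```python
import numpy as np

# Check numerically that x(t) = a sin(wt) + b cos(wt) solves x'' + w^2 x = 0.
a, b, w = 1.3, -0.7, 2.0
t = np.linspace(0.0, 10.0, 20001)
dt = t[1] - t[0]
x = a * np.sin(w * t) + b * np.cos(w * t)

# Central finite-difference approximation to the second derivative.
x_dd = (x[2:] - 2 * x[1:-1] + x[:-2]) / dt**2

# The residual x'' + w^2 x should vanish up to discretization error.
residual = x_dd + w**2 * x[1:-1]
print(np.max(np.abs(residual)))  # of order 1e-6: zero to numerical accuracy
```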
So far, we have not introduced the idea of a distance between two vectors, and we haven't introduced the idea of a product of two vectors, like a dot product. Those are add-ons which come later; a linear vector space doesn't need any of them. A very popular way of representing elements of R^n is to write them as (x1, x2, ..., xn), an ordered n-tuple of real numbers. Now, to perform manipulations with these, it's useful to represent this quantity in another way: write it as a column vector. That's one more way of representing an element of R^n. Since you already know matrices, the moment you have a column vector you are tempted to form a row vector and multiply the column by it on the left to form a scalar. So the idea of the scalar product of two vectors emerges once you start putting this extra structure in. But it's not such a straightforward matter, so let's see how to go about it. Notice in particular that I said we have
not talked at all about the product of two vectors, or about distance. R^n as a linear vector space is just the set of n-tuples of real numbers; we haven't said, here is one element and there is another, and here is the distance between them. We haven't defined that notion at all. Once you put in a metric, this R^n becomes a Euclidean space. So the first thing we want to do is to see whether we can introduce the concept of a scalar product among these vectors. To do that, you need to recognize the following. Let me for the moment drop the ket-vector
notation and simply write phi, psi, etc. for the elements of the linear vector space, for simplicity. Then I would like to associate with each vector a scalar. So I choose a particular, special member of this linear vector space, say phi, and associate with each vector psi in this space a scalar, which I denote by (phi, psi). This is some scalar which depends on the special element phi and on the vector psi out there. I don't tell you at the moment how I associate the scalar; we will come to that. The point is that for every element of this vector space, having chosen the special element phi, I do something with that element and phi and get a scalar, a number, with a certain set of properties. The properties are that (phi, a psi) is the same as a (phi, psi), and (phi, psi + chi) = (phi, psi) + (phi, chi). So when you are doing this associating,
if you take the sum of two vectors and want the associated scalar, you might as well find the associated scalar for each member of the sum and add them up; by postulate, the one equals the other. Similarly, if you multiply psi by some scalar number, you might as well find the associated scalar first and then multiply it outside. So the moment you put in these properties, for every element psi you can find such a scalar; call this assignment S_phi, so that S_phi(psi) = (phi, psi). The scalar here is not a function of psi in the sense of f(x) or something; it depends on psi and on the reference element phi. Now, one can show that with these postulates this set of objects forms a linear vector space itself.
Well, the word "function" I want to reserve for things where I can differentiate and integrate, where there is continuity and so on; I don't want to use it here. What we have instead depends on a discrete set of elements, not on a continuous variable; we will define "function" more carefully later on. But the assignment is linear: the whole idea is linearity, because of the properties above. It doesn't involve squaring or cubing anything, or taking logs or anything like that. And these linear assignments themselves form a linear vector space. Now, you could ask why I should choose this particular S_phi. I could choose another element chi and compute the set of scalars S_chi, and so on; all of these, for all choices of the reference element, put together form a linear vector space called the dual of the original linear vector space. So the set of linear functionals on V is an LVS, and it's called the dual of V. In other words, given a linear vector space, there is a natural mathematical way in which I can associate with it another linear vector space. So linear vector spaces come in pairs; these axioms and structures see to it that they come in pairs of this kind. (In response to questions:) Yes, the scalars themselves are just real or complex numbers, and the real numbers themselves form a linear vector space, as you know, just as the complex numbers do; these are all isomorphic. The point is that this set of functionals forms a linear vector space by itself.
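Here is a minimal sketch of such a linear functional on R^3, built from a reference element phi via the familiar dot product; this is just one possible rule with the required properties:

```python
import numpy as np

# A reference element phi defines a functional S_phi(psi) = (phi, psi).
phi = np.array([1.0, -2.0, 0.5])

def S_phi(psi):
    # The rule here is the ordinary dot product; any rule obeying the
    # two linearity properties checked below would serve equally well.
    return np.dot(phi, psi)

psi = np.array([3.0, 1.0, 4.0])
chi = np.array([-1.0, 0.0, 2.0])
a = 2.5

# (phi, a psi) = a (phi, psi)
assert np.isclose(S_phi(a * psi), a * S_phi(psi))

# (phi, psi + chi) = (phi, psi) + (phi, chi)
assert np.isclose(S_phi(psi + chi), S_phi(psi) + S_phi(chi))
```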
Note that I haven't told you the rule at all. I am just saying that if you can associate with each psi a scalar by some rule with these specified properties, then those assignments form a linear vector space. As far as the existence of this dual is concerned, we don't need to know what the actual rule is. How do you reconstruct V? A very good question! Can I reconstruct V from its dual? We will answer this; we will see it by looking at examples. Let me write down what all this is for R^n, and then you will immediately see what these quantities are. There is a unique dual to every linear vector space. I am not proving all these theorems; my idea is to show you the operational method, which is what I am going to do for R^n, for instance. So these are theorems which exist. Now, the question is what the use of this is.
There is a natural way to associate a scalar with a pair of vectors. As you know, in ordinary three-dimensional vector space, if you're given vectors like a, b, c, etc. and asked to construct a scalar from them, you do something called the dot product. So the quantity here is like a dot product: we are heading towards the dot product of two vectors by associating a scalar with a pair of vectors. It's a bilinear operation: we are taking two vectors
and doing something to them to get a scalar. Now, since the S_chi and so on themselves form a linear vector space, it is convenient to set up the following notation. The original space V consists of elements written as kets, |psi>. The reference element phi on the left-hand side of the bracket I now write as <phi|, and I say that this is a different kind of vector, one which lives in the dual space. In other words, I have a set of scalars, created by taking the original ket vectors, the elements of V, and doing something to them using the reference element phi; this reference element I now write in the new notation <phi| and call an element of the dual space. Such objects are called bra vectors, while the elements of V are called ket vectors. This is the notation used in physics. So what you have done is to take the set of linear functionals and represent each of them by an element <phi| of the dual space, and then the quantity <phi|psi> stands for the linear functional evaluated on |psi>, that is, for the scalar. The dual vectors are written in a different form from the ket vectors, so that I know which ones belong to V. For every element in the original vector space there is a corresponding element in the dual vector space, such that if I take one element from the dual space and one element from the original space, I can form a bilinear combination which gives me a scalar satisfying these properties. Now write (x1, x2, x3, ..., xn) for the elements of R^n.
If I construct the corresponding row vectors (x1 x2 ... xn), then I say that all these row vectors are elements of the dual of R^n. There is then a natural way to produce a scalar satisfying the required properties: matrix multiplication, with the bra vector on the left and the ket vector on the right, that is, the row vector on the left and the column vector on the right. So it's immediately clear that the row (x1, x2, ..., xn) multiplied into the column (y1, ..., yn) gives a scalar: x1 y1 + x2 y2 + ... + xn yn.
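In matrix terms the pairing is just a 1 x n row times an n x 1 column; a quick numpy sketch:

```python
import numpy as np

# A ket as a column vector and the corresponding bra as a row vector.
ket_y = np.array([[4.0], [5.0], [6.0]])   # column, an element of R^3
bra_x = np.array([[1.0, 2.0, 3.0]])       # row, an element of the dual

# Row times column: a 1x1 matrix whose single entry is the scalar
# x1*y1 + x2*y2 + x3*y3.
scalar = bra_x @ ket_y
print(scalar)          # [[32.]]
print(scalar.item())   # 32.0  =  1*4 + 2*5 + 3*6
```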
It's intuitively clear that the dimension of the dual space must be the same as the dimension of the original space, since the two are in one-to-one correspondence: for every element in the original vector space there is a corresponding element in the dual vector space. The way to remember it is that one of them is represented by column vectors and the other by row vectors. I could have reversed the convention, representing the original space by row vectors and the dual by column vectors, but that would make multiplication a nuisance, since the rule for matrices is that you multiply row by column. So we will use this convention.
The further complication, which looks a little confusing, is that this dual space is exactly the same as the original space itself: it so happens that R^n is the same as its dual. Still, we would like to distinguish between the space and its dual, and therefore I represent one of them by column vectors and the other by row vectors. In fact, there is an exact theorem which says that every n-dimensional linear vector space is isomorphic to n-dimensional Euclidean space; in other words, you can think of every n-dimensional vector space in terms of column vectors and the corresponding row vectors. It's only when you go to infinite-dimensional spaces that you run into technicalities which are non-trivial; all finite-dimensional spaces look exactly the same. You cannot define the multiplication
of two vectors belonging to the same vector space. Every time you take the dot product of two vectors in ordinary three-dimensional Euclidean space, you are really taking one element from the dual space and one element from the original vector space and combining them. As you can see, you cannot multiply two column vectors and get a scalar; to get a scalar you need a row on the left and a column on the right. It's just that in ordinary three-dimensional space the two spaces are the same, so one doesn't realize one is doing this. But if you write things out in terms of column and row vectors, it's quite clear what you do to form a scalar such as a . b: you write a . b = a1 b1 + a2 b2 + a3 b3, and if you represent a and b by column vectors, the a on the left-hand side has to be a row vector. So the scalar product between these vectors
is only defined by taking one element from V-tilde, the dual, and one element from V. This becomes particularly important when you look at infinite-dimensional spaces; there you have to be very careful to do exactly that. You can define the product of a vector with itself, but it's a map to something else: you can define the direct product of V with V, where I take a given vector and another vector from the same space, and this is mapped to a different space altogether. Its dimensionality is higher: if each factor is n-dimensional, then the dimensionality of the product space is n^2. If you take ordinary Cartesian vectors with
components, like a and b, and consider the set of numbers a_i b_j, that is precisely such an object. It's an element of V cross V: if I take (a1, a2, a3) and (b1, b2, b3), the set of numbers a_i b_j has nine possibilities. So you immediately see that the quantity a_i b_j is not an element of the original R^3; it's an element of R^3 cross R^3, a nine-dimensional space. I write it as the two vectors side by side, without a dot or a cross between them. These are called Cartesian products or tensor products. But we are talking about finding a scalar from these vectors, and for a scalar you take an element of the dual, an element of the original space, and multiply.
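A quick numerical contrast between the two constructions: the tensor (outer) product, which lands in the bigger space, and the row-times-column pairing, which gives a scalar:

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0, 6.0])

# The tensor (outer) product a_i b_j: nine numbers, an element of R^3 x R^3.
outer = np.outer(a, b)
print(outer.shape)   # (3, 3)

# The scalar product: one element of the dual (row) with one ket (column).
print(a @ b)         # 32.0, a single scalar
```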
The moment I do this, I also have the possibility of defining the inner product; it has the properties we wrote down for S. In particular, I can define the inner product of a vector with itself. Let us take the inner product of psi with itself: what would this be in ordinary three-dimensional Euclidean space, or in n-dimensional Euclidean space? If psi is represented by (x1, x2, ..., xn), then this quantity is the sum over i from 1 to n of x_i^2. This corresponds to the squared length of the vector: if psi is a vector in n-dimensional space with components x1 to xn, then the sum of the squares of the components is the square of the length of this vector, and because it is a sum of squares of real numbers, of non-negative quantities, it is zero if and only if the vector is the null vector. So (psi, psi) = 0 if and only if psi is the null vector. We would like to preserve this property. But we said that these vectors are completely general quantities and could in fact be multiplied by complex numbers, and then this is no longer true. How would you preserve it? I define the element in the
dual space by taking the complex conjugate. So if these are not elements of R^n but elements of a general n-dimensional complex vector space, then, to avoid confusion, let me use some other symbol for the components: call them alpha_1 to alpha_n. If the ket psi has components alpha_1, ..., alpha_n written as a column, the corresponding bra should really be the row (alpha_1*, alpha_2*, ..., alpha_n*). Now we are in good shape, because the inner product of psi with itself is |alpha_1|^2 + ... + |alpha_n|^2, the modulus of psi squared, so to speak, and you are guaranteed that it is zero if and only if each of the alphas is zero. So we see our first generalization: in a complex n-dimensional vector space, the elements of the dual space, the bra vectors, are the complex conjugate transposes of the column vectors. This is the reason why, in matrix analysis, complex conjugation by itself is not a very natural operation: the natural operation is the complex conjugate transpose, and it is called the Hermitian conjugate. That relation is still true, but it also implies
that if you have a vector a|psi>, equal to some |psi'> say, then for the corresponding bra <psi'| you have to take the column vector for psi, multiply it by a, and take the complex conjugate transpose. So it is immediately clear that <psi'| = a* <psi|. Whenever a scalar multiplies an element and you want its counterpart in the dual space, you take the dual of the original vector and multiply it by the complex conjugate of that scalar; this satisfies all those associative properties and so on. The moment you have this, you can start defining the distance between two vectors, because you have the idea of a scalar product.
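In numpy, the ket-to-bra map is exactly the complex conjugate transpose; a minimal sketch with an arbitrary complex ket:

```python
import numpy as np

# A ket with complex components, as a column vector.
ket = np.array([[1 + 2j], [3 - 1j], [0 + 4j]])

# The corresponding bra: complex conjugate transpose (Hermitian conjugate).
bra = ket.conj().T

# <psi|psi> = sum of |alpha_i|^2: real and non-negative.
norm_sq = (bra @ ket).item()
print(norm_sq)                     # (31+0j)

# Multiplying the ket by a scalar a multiplies the bra by a*.
a = 2 - 3j
assert np.allclose((a * ket).conj().T, np.conj(a) * bra)
```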
Notice that this rule also implies that if I take the complex conjugate of the scalar <phi|psi>, I get <psi|phi>. So scalar products in general don't have to be real; they can be complex numbers, and the inner product of phi with psi is not the same as the inner product of psi with phi, because there is a complex conjugation involved. That feature is missing in real vector spaces, which is why you write a . b = b . a there; it's not true in general, where the statement is <a|b> = <b|a>*. In a real vector space we are multiplying only real numbers, so the complex conjugation makes no difference. Now I would like to
define the norm of a vector, and I denote it by ||psi||. This is, by definition, the positive square root of the non-negative number <psi|psi>. It is a non-negative number, and it's equal to 0 if and only if psi is the null vector. Now we are going through things which are
exactly the same as what happens in ordinary three-dimensional Euclidean space, and those concepts should survive when you generalize. Therefore one would like a statement about ||psi + phi||, the norm of the sum of two vectors, and it obeys the triangle inequality: we know that in ordinary three-dimensional space the sum of two sides of a triangle is always greater than the third, unless the triangle collapses. So ||psi + phi|| <= ||psi|| + ||phi||.
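A small numerical check of the norm and the triangle inequality for complex vectors (arbitrary illustrative entries):

```python
import numpy as np

psi = np.array([1 + 1j, 2 - 1j, 0 + 3j])
phi = np.array([2 + 0j, -1 + 1j, 1 - 2j])

def norm(v):
    # Positive square root of <v|v>; np.vdot conjugates its first argument.
    return np.sqrt(np.vdot(v, v).real)

# Triangle inequality: ||psi + phi|| <= ||psi|| + ||phi||.
assert norm(psi + phi) <= norm(psi) + norm(phi)
print(norm(psi + phi), norm(psi) + norm(phi))
```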
You also know another thing: if you take an ordinary vector a and dot it with b, then in the usual real vector space this quantity is ab cos(theta) by definition, where theta is the angle between the vectors, and cos(theta) lies between -1 and +1: it equals +1 if the angle is 0 and -1 if the angle is pi. That's the extent to which it can vary. Therefore it follows that the magnitude of a . b is less than or equal to the magnitude of a times the magnitude of b; that's just the statement that the cosine has a value between -1 and +1. Now that's generalized
to our setting, and the statement is that |<phi|psi>|^2 <= <phi|phi> <psi|psi>. Since the scalar product is a complex number in general, we need the modulus squared. This has a name: it is called the Cauchy-Schwarz inequality. We will use it very extensively to establish the uncertainty principle; at the mathematical level, the uncertainty principle follows from the Cauchy-Schwarz inequality. Now you could ask: when is this inequality an equality? For ordinary vectors, this happens only when a and b are in the same direction or antiparallel, in other words when they are collinear. The same thing is
true here. And what does it mean to say two vectors are collinear? It means one of them is a scalar multiple of the other; the direction is the same. In exactly the same way, this inequality becomes an equality if and only if the ket vector phi is "in the same direction" as the ket vector psi, in other words if phi is just psi multiplied by a number, so that phi is linearly dependent on psi. Then the Cauchy-Schwarz inequality becomes an equality; otherwise it remains a strict inequality. We will see how powerful this statement is.
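Here is a minimal numerical check of the Cauchy-Schwarz inequality and of its equality case for collinear vectors (random complex vectors, seeded for reproducibility):

```python
import numpy as np

rng = np.random.default_rng(0)

# Two complex vectors in C^4.
phi = rng.normal(size=4) + 1j * rng.normal(size=4)
psi = rng.normal(size=4) + 1j * rng.normal(size=4)

def inner(u, v):
    # <u|v>, with complex conjugation on the bra side.
    return np.vdot(u, v)

# Cauchy-Schwarz: |<phi|psi>|^2 <= <phi|phi> <psi|psi>.
lhs = abs(inner(phi, psi))**2
rhs = inner(phi, phi).real * inner(psi, psi).real
assert lhs <= rhs

# Equality holds when phi is a scalar multiple of psi.
phi2 = (2 - 1j) * psi
lhs2 = abs(inner(phi2, psi))**2
rhs2 = inner(phi2, phi2).real * inner(psi, psi).real
assert np.isclose(lhs2, rhs2)
```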
Just to give an example from somewhere far afield: take the gas in this room. It obeys a Maxwellian distribution of velocities, and you can compute the average speed; let's call it <v>. This quantity depends on the square root of the temperature: it is some number times the square root of kT/m, where m is the molecular mass and k is Boltzmann's constant. You could also ask about 1/v: what is the average value of the reciprocal of the speed? It's clear that the reciprocal of the average is not the same as the average of the reciprocal; in fact, the product <v> <1/v> here is strictly greater than 1. This can be shown very trivially using the Cauchy-Schwarz inequality; it's a one-line proof, and we will do it at some stage.
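For what it's worth, the promised one-line proof takes f = sqrt(v) and g = 1/sqrt(v) in the Cauchy-Schwarz inequality, so that 1 = <sqrt(v) . 1/sqrt(v)>^2 <= <v> <1/v>. A Monte Carlo sanity check (with kT/m set to 1, a choice made purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

# Maxwellian speeds: the magnitude of a 3D Gaussian velocity (kT/m = 1).
vel = rng.normal(size=(1_000_000, 3))
speed = np.linalg.norm(vel, axis=1)

mean_v = speed.mean()
mean_inv_v = (1.0 / speed).mean()

# Cauchy-Schwarz with f = sqrt(v), g = 1/sqrt(v) gives
# 1 = <f g>^2 <= <f^2><g^2> = <v><1/v>, strictly unless v is constant.
print(mean_v * mean_inv_v)   # about 4/pi = 1.27 for the Maxwellian, > 1
```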
So, just to show you: this inequality, which starts off very innocuously with just the scalar product of two vectors in ordinary space, has profound implications; it is part of a much deeper fact. You can generalize it to a set of n vectors at a time, for arbitrary n. That brings us to the concept of linear dependence, which will then bring us to the concept of a basis set in a vector space, expansions in basis sets, orthogonalization and so on. We will talk about that next time. Thank you!