The original document was written in Word format in 2005-06 and contains many mathematical symbols, equations and special characters. Converting it to HTML while preserving its integrity is a non-trivial task, hence the pictures below (for the time being, hopefully).
[Images: pages 1 to 13 of the original document]
All the things of this world can be put into two categories: the observers and the observed. They are related through observation. The reason behind the origin of physics is as follows.
When observers observe (whatever they observe) and communicate with each other, they find that “there are many things which appear the same to many of them”. So the natural question they ask is: “Have we observed the same things? If yes, it means that at least we have not dreamt. But then why did these same things appear different to some of us?”
Physics starts with a yes to the first question, i.e. it asserts that the observers do observe the same things: the things which are there, the things which have a real existence. These things or phenomena are called physical realities or physical laws. Then what about the second question? Physics answers: yes, there are physical realities to observe, but observation doesn’t include the observed only; it includes the observer as well. Observers can use different methods to observe, which causes them to form different opinions about the observed. Thus the aim of physics is:
1) find the real laws; and
2) find the laws due to which the real things or laws appeared different to different observers (i.e. the transformation laws).
Now we start putting things in order. There are some things we find never changing, no matter who observes them. We call them the invariants. The rest are the variants, and we hope to find the laws responsible for their variation. Now the basic thing an observer needs in order to observe or measure is a coordinate system in space. So it is quite probable that it is the different coordinate systems used by different observers that make them find the same things different. Let’s see if we can find ways to express what we’ve measured (by using a coordinate system) in a way which is independent of the coordinate system.
We define a field existing in a chunk of space if with every point of this chunk, we can associate a number, or two numbers, or a number of numbers. If we associate a number with every point of space and assert that these associated numbers will be same even if we rotate or displace or in any way transform our axes, we say that we’ve associated a scalar with every point of space, or that we’ve asserted the existence of a scalar field; a scalar is just an invariant number. Instead if we associate three numbers with every point of space, we get, say, a triplet field.
Now if we use these three numbers to denote a direction (which we can always do in 3-D space), we get a piece of straight line; and these three numbers always combine to give a scalar which is the length of the line. This combination is $x^2 + y^2 + z^2$ when the axes are orthonormal. We call this piece of line a vector, and the combination we’ve got the square of the magnitude (or of the length) of the vector. Since a vector has a fixed length, it is said to be an invariant. Thus any triplet field (i.e. in which we associate three numbers with every point) becomes a vector field if the three numbers are to denote a direction in space (which is another way of saying that the numbers are independent).
We now summarize and continue.
A normal scalar (i.e. in three dimensions) is invariant under coordinate rotation (in fact under any coordinate transformation). That is, its magnitude remains the same.
A normal vector behaves similarly.
Different observers have different coordinate systems, which means their coordinate systems are transformed with respect to one another.
It suggests that if we can write a physical law in a vector (or a scalar) form then the equation of the law will remain unchanged upon coordinate transformation. Hence it can be said that the law remains same for different observers who only differ in their use of coordinate systems.
It turns out that we can almost always do that, i.e. we are able to write the physical laws in vector (or scalar) form; for instance, $\nabla \cdot \mathbf{B} = 0$, or $\nabla \times \mathbf{E} = -\partial\mathbf{B}/\partial t$. This is probably because there is no preferred direction or position in space, i.e. all directions and positions are equivalent.
Let’s therefore do some vector analysis.
Let’s think about a field V = r. It is a vector field since with every point we associate a vector r (which is a vector since it has a direction and an invariant magnitude). Then what is V = v = dr/dt? It is a vector field since dr/dt has a direction and an invariant magnitude (both dr and dt have fixed magnitudes; dt is a scalar). And finally what about V = r.v (where r.v is defined to be equal to $r_x v_x + r_y v_y + r_z v_z$ and known as the dot product of two vectors)? r.v is easily seen to be the projection of either of the vectors on the other, multiplied by the magnitude of the other. Now upon a coordinate transformation, the magnitudes of r and v remain unchanged, as does their relative direction; hence the projection of one on the other must also remain invariant. Thus V is a scalar field. Thus we conclude that any field which is given by a function of one or more scalars or vectors or dot products of vectors is also an invariant field, i.e. is either a scalar or a vector field.
Thus the electric field E = constant (i.e. scalar) · $\mathbf{r}/r^3$ is a vector field, and the potential field $\phi$ = constant/$r$ is a scalar field.
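The claim that r.v survives a rotation of the axes while its individual terms do not can be checked numerically. The following is only a minimal sketch: the helper names `rotate_z` and `dot`, the sample vectors and the rotation angle are all arbitrary choices made for illustration, not part of the original argument.

```python
import math

def rotate_z(vec, angle):
    """Components of vec as seen from axes rotated about z by angle (radians)."""
    x, y, z = vec
    c, s = math.cos(angle), math.sin(angle)
    return (c * x + s * y, -s * x + c * y, z)

def dot(a, b):
    """Dot product: a_x*b_x + a_y*b_y + a_z*b_z."""
    return sum(ai * bi for ai, bi in zip(a, b))

# Arbitrary sample vectors and rotation angle (illustrative values only).
r = (1.0, 2.0, 3.0)
v = (-0.5, 4.0, 0.25)
angle = 0.7

r_rot = rotate_z(r, angle)
v_rot = rotate_z(v, angle)

# The full combination is the same in both sets of axes...
print("r.v before and after:", dot(r, v), dot(r_rot, v_rot))
# ...while a single term such as r_x * v_x is not.
print("r_x*v_x before and after:", r[0] * v[0], r_rot[0] * v_rot[0])
```

The same check with `dot(r, r)` shows that the squared length of r is unchanged as well, which is the sense in which a vector is called an invariant above.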
Now consider a domain of space where the three following fields co-exist:
$V_1 = v_x$
$V_2 = v_y$
$V_3 = v_z$
These are obviously not scalar fields. Of course we can get a vector field from them simply by writing $\mathbf{V} = v_x\mathbf{i} + v_y\mathbf{j} + v_z\mathbf{k}$, but we don’t want to do that, say, because we don’t want the x-component of the vector field to be given by $v_x$, but by a combination of $v_y$ and $v_z$. (This can happen of course, e.g. in the case of the magnetic force the x-component of the force is not given by $j_x$, but by $j_y$ and $j_z$.) Now let’s consider another three functions:
$W_1 = x v_y - y v_x$
$W_2 = y v_z - z v_y$
$W_3 = z v_x - x v_z$
These also seem to be non-scalars. But if we calculate the quantity $(x v_y - y v_x)^2 + (y v_z - z v_y)^2 + (z v_x - x v_z)^2$, we find it to be a scalar. Thus $W_1$, $W_2$ and $W_3$ can be regarded as denoting a vector field, and we can write $W_z = x v_y - y v_x$, etc. So we get a sort of required field, at the expense of adding terms x, y and z, i.e. the terms of r. Before discussing anything further, let’s first create a shorthand notation for creating the W field out of the V and r fields. Let’s write:
W = r x V
to denote that whenever we’ll see this expression, we’ll understand the form of W. This is called the cross product of two vectors. We can easily see that i = j x k & etc. Further investigation shows that:
a x ( b x c ) = b (a.c) – c (a.b)
and also that |b x c| = |b| |c| sinθ
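These two identities are easy to spot-check numerically. The sketch below is an illustration under assumed sample values: the helper functions `cross`, `dot`, `norm`, `scale` and `sub` and the vectors a, b, c are made up for the check and do not come from the original text.

```python
import math

def cross(a, b):
    """Cross product, with z-component a_x*b_y - a_y*b_x as in W = r x V."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def norm(a):
    return math.sqrt(dot(a, a))

def scale(a, k):
    return tuple(k * ai for ai in a)

def sub(a, b):
    return tuple(ai - bi for ai, bi in zip(a, b))

# Arbitrary, non-parallel sample vectors (illustrative values only).
a = (1.0, 2.0, 3.0)
b = (-1.0, 0.5, 2.0)
c = (4.0, -3.0, 1.0)

# a x (b x c) compared with b (a.c) - c (a.b)
lhs = cross(a, cross(b, c))
rhs = sub(scale(b, dot(a, c)), scale(c, dot(a, b)))
print("triple product:", lhs, rhs)

# |b x c| compared with |b| |c| sin(theta), theta being the angle between b and c
cos_theta = dot(b, c) / (norm(b) * norm(c))
sin_theta = math.sqrt(1.0 - cos_theta ** 2)
print("magnitudes:", norm(cross(b, c)), norm(b) * norm(c) * sin_theta)

# Unit vectors: i = j x k
print("j x k =", cross((0, 1, 0), (0, 0, 1)))
```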
Just like any other vector is the vector $\nabla$, which has dx, dy, dz in the denominator, in contrast to dr/dt, which has dr in the numerator. Just as dr/dt tells us how r changes with a little change in time, $\nabla$ applied to a quantity tells us how that quantity changes with a little change in space; for a scalar, the direction of the result is the direction of maximum change. $\nabla$ is written as $\nabla = \mathbf{i}\,\partial/\partial x + \mathbf{j}\,\partial/\partial y + \mathbf{k}\,\partial/\partial z$. If we calculate $\nabla \cdot \mathbf{r}$, we find it equal to 3, which is a scalar. This is consistent with $\nabla$ behaving as a vector. $\nabla$ when used in conjunction with a scalar is known as the gradient operator; when used in a dot product with a vector, as the divergence operator. What we get as results are known as the gradient and the divergence respectively.
It’s very easy to confirm that we can also use $\nabla$ in cross products with other vectors. Let’s investigate the normal cross products a little more; we’ve to justify the formation of all this cross product theory. Another example in which a force in a certain direction is given by the other two perpendicular components of velocity is the case of rotation. Let a particle be in pure uniform rotation (only to simplify) in the xy plane. Then r is perpendicular to v. Define a vector $\mathbf{L}_m = \mathbf{r} \times \mathbf{v}$. Then in our case $L_{mz} = x v_y - y v_x$, and $L_{mx}$ and $L_{my}$ are zero. $L_{mz}$ can easily be shown to be a constant (if we turn to polar coordinates, $L_{mz} = r^2\,\partial\theta/\partial t$). And we also see that the (centripetal) force can be found as F = a constant times $\mathbf{v} \times \mathbf{L}_m$. Thus we find that a component of $\mathbf{L}_m$ gives an indication of the rotational motion in the plane of the other two dimensions. This is the significance of the cross product. If a cross product term has a component in the x-direction, it means there is some sort of rotational motion in the yz plane, i.e. some sort of motion in which the force is perpendicular to the velocity.
The same is the case with the magnetic forces. If there are two parallel wires along the x-axis, i.e. the charges have a velocity along the x-axis, the forces between them will be in the yz plane. This is analogous to rotational motion. And it indicates that the force should be given by something like F = constant times v x a vector. This unknown vector is what we term the magnetic field B. And it also seems clear that B should be a cross product term given by B = r x A, where A is something proportional to v of the other charges. It turns out that it’s not r but $\nabla$ which is to be multiplied with A to get B. Thus
$\mathbf{F} = \text{constant} \cdot \mathbf{v} \times \mathbf{B}$
$\mathbf{B} = \nabla \times \mathbf{A}$, and
$\mathbf{A} = \text{constant} \cdot \mathbf{j}$
A is termed as the vector potential. The laws of electrodynamics (away from dielectrics and paramagnetics) are:
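$\nabla \cdot \mathbf{E} = \rho/\varepsilon_0$, $\nabla \times \mathbf{E} = -\partial\mathbf{B}/\partial t$, $\nabla \cdot \mathbf{B} = 0$, $\nabla \times \mathbf{B} = \mu_0\mathbf{j} + (1/c^2)\,\partial\mathbf{E}/\partial t$
which can now be written in potential form:
$\nabla^2\phi - (1/c^2)\,\partial^2\phi/\partial t^2 = -\rho/\varepsilon_0$, $\nabla^2\mathbf{A} - (1/c^2)\,\partial^2\mathbf{A}/\partial t^2 = -\mathbf{j}/(\varepsilon_0 c^2)$, with
the condition: $c^2\,\nabla \cdot \mathbf{A} + \partial\phi/\partial t = 0$ (the Lorenz gauge condition, which is consistent with the conservation of charge)
These equations can be solved for $\phi$ and $\mathbf{A}$.
E and B can then be found from $\phi$ and $\mathbf{A}$ as:
$\mathbf{E} = -\nabla\phi - \partial\mathbf{A}/\partial t$ and $\mathbf{B} = \nabla \times \mathbf{A}$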
Let’s return to the main topic now.
In all our analyses we find that a physical quantity can change in only two ways—it can change with a change in space or with a change with time. To bring these two types of changes on the same footing (which may facilitate us to write our laws in a simpler way), we try to regard time as just another dimension of space.
The first thing we become interested in is to ask: what is the analog of a point of space in time? The answer is simple: it is a moment of time. Next question: what is the analog of a straight line in time? For this we’ll have to analyze a straight line in space. It is basically a curve of which the slope remains constant (dy/dx or dx/dy = constant). Hence for a straight line in 4-D, things like k dt/dx or dx/(k dt) should remain constant (k is just a constant used to make the units consistent; we can take it equal to 1 as we do between other axes). But dx/dt is nothing else but velocity. Thus we see that if there is a point moving in the x-direction with a constant velocity, it is actually drawing a straight line in the t-x plane. A special case is when this constant velocity is zero, i.e. the slope k dt/dx becomes infinite. Then by analogy with 3-D, the line should be equivalent to the axis of time, the t-axis.
So if there is a point ‘O’ at rest, which we call the observer henceforth, holding a 3-D orthonormal grid of axes, we shall say that the observer is actually holding a 4-D orthonormal grid, though we don’t see the fourth axis. And if the observer starts moving at a uniform velocity along an axis, it simply means that the time axis of the grid has rotated by a certain angle towards that axis. The tangent of this angle can easily be found to be equal to v/k. But what about the other axes, or at least the axis along which there is the motion? Well, we can only say at this point that that axis should remain unchanged, by making an analogy with 3-D, where we can rotate any axis arbitrarily and independently of the other axes. Now we wish to find the transformation law of the rotation of the time axis, just like the laws we have for other axes.
But before trying to find out the transformation law, we take a digression.
Suppose we’ve done some experiment, made our measurements, and derived a physical law. Now if we rotate one or more of the axes of our coordinate system and do the same experiment, we get some other measurements. But if we try to derive the physical law, we find that we get the same physical law. Thus we assert that the laws of physics remain invariant upon the rotation of axes of the observer. Now we are trying to put time also as one of the axes in 4 dimensions. So now we should assert that if we rotate one or more axes in 4 dimensions, laws of physics will remain same. But we’ve just seen that rotating the time axis is equivalent to saying that the observer is moving with a uniform velocity. So we should assert that the laws of physics are same for all observers moving with uniform velocity. This is the first postulate of Special Relativity. And it is equivalent to saying that time can be treated just like any other dimension.
Now in physics, we are always free to put postulates. We can make any absurd looking postulate, may be just for fun, or may be on the call of our intuition. The only thing which is expected is that the postulate should predict some consequences which can be tested by experiment, and which if found true will indicate that our postulate was right.
One such postulate is that there is an ultimate velocity with which objects can move. Combining this postulate with the postulate that the laws of physics should be the same for all uniformly moving observers, we find that the value of the postulated ultimate velocity should be the same for all observers. (That the ultimate velocity should be the same for all observers, however, doesn’t mean that any velocity which is the same for all observers is the ultimate velocity. There can be more than one ‘same’ velocity, but there is only one ultimate same velocity.)
Well there is one such velocity, at least, which turns out to be the same for all observers (by experiment as well as by the investigation of other physical laws). This velocity is the velocity of light. So far we’ve not come across any other velocity which turns out or which should turn out the same for all observers. So we modify our second postulate (that there is an ultimate velocity) and assert now that there is an ultimate velocity and this is the velocity of light. This is the second postulate of Special Relativity. We shall later find it equivalent to saying that the dimension of time is not completely independent of other dimensions.
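So we take ‘c’ (the velocity of light) to be the ultimate velocity. Let’s see whether this gives us any formulae whatsoever. Consider two persons, Joe and Jane; Joe is at rest in a verandah and Jane is moving uniformly in a car with velocity v. Now assume Jane is carrying two parallel mirrors, one near her head and the other near her feet, a distance d apart. Between these two parallel mirrors there is a light beam going back and forth. t' is the time measured by Jane’s clock for the light to go up and come down once, and t is the time measured by Joe’s clock for the same.
We call t' the proper time of a frame, i.e. the time observed by the moving observer, and denote it by $\Delta t_0$; and we call t the time observed by the observer at rest and denote it by $\Delta t'$. For Jane the light travels straight up and down, so $\Delta t_0 = 2d/c$. For Joe the mirrors move sideways by $v\,\Delta t'/2$ during each leg, so the light travels along two diagonals and $(c\,\Delta t'/2)^2 = d^2 + (v\,\Delta t'/2)^2$, which gives $\Delta t' = (2d/c)/\sqrt{1 - v^2/c^2}$. Writing $\gamma = 1/\sqrt{1 - v^2/c^2}$:
Hence $\Delta t' = \gamma\,\Delta t_0$ (M)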
Now let the moving observer send a light ray forward in the car and measure the time for the ray to come back after reflection at the front end. If the distance from the front end of the car to the observer is $L_0$, then
$\Delta t_0 = 2L_0/c$,
while the observer at rest will observe the same event; if for him the distance of the front end of the car from the moving observer is $L'$, then
$c\,\Delta t_1' = L' + v\,\Delta t_1'$ and $c\,\Delta t_2' = L' - v\,\Delta t_2'$
Thus $\Delta t' = \Delta t_1' + \Delta t_2' = 2L'c/(c^2 - v^2)$
These two equations imply:
$L'/L_0 = (c^2 - v^2)\,\Delta t'/(c^2\,\Delta t_0) = (1 - v^2/c^2)\,\gamma = 1/\gamma$
Hence $L' = L_0/\gamma$ (N)
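The chain of reasoning from M to N can be followed with actual numbers. The sketch below is an illustration only: the car speed of 0.6c, the rest length of 3 m and the function names `gamma` and `round_trip_time_rest_frame` are assumptions chosen for the example.

```python
import math

C = 299_792_458.0  # speed of light in m/s

def gamma(v, c=C):
    """The factor 1 / sqrt(1 - v^2/c^2) appearing in equations M and N."""
    return 1.0 / math.sqrt(1.0 - (v / c) ** 2)

def round_trip_time_rest_frame(length_moving, v, c=C):
    """Time seen by the observer at rest for light to reach the front of
    the car and come back: L'/(c - v) + L'/(c + v) = 2 L' c / (c^2 - v^2)."""
    return length_moving / (c - v) + length_moving / (c + v)

# Assumed sample values: a car moving at 0.6 c with rest length 3 m.
v = 0.6 * C
L0 = 3.0

dt0 = 2.0 * L0 / C       # round trip time on the moving observer's clock
dt = gamma(v) * dt0      # equation M: the same interval for the observer at rest

# Solve dt = 2 L' c / (c^2 - v^2) for L'
L_moving = dt * (C ** 2 - v ** 2) / (2.0 * C)

print("gamma       :", gamma(v))
print("L'/L0       :", L_moving / L0)
print("1/gamma     :", 1.0 / gamma(v))
print("dt (check)  :", round_trip_time_rest_frame(L_moving, v), dt)
```

The ratio L'/L0 printed here agrees with 1/γ, which is equation N obtained by arithmetic rather than algebra.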
So we get two equations for transforming lengths and time intervals for a moving observer from those of a stationary observer.
All this digression we made was to keep pace with the physical discoveries, so that we don’t end up with a very sound, mathematically complicated theory of rotation of time axis, but which however is very different from physical observations; particularly when we do have some experimental evidence indicating some sort of relationship between time and space. Now we turn back to our aim of finding the transformation laws for a rotation of time axis. We now know that it should be such that equations M and N are satisfied.
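We know the equations for 3-D rotations are linear. So let’s try a linear transformation here. Also note that if we assert the coincidence of the origin of the time axis with that of all the other axes, it means that at t = 0 both the observers were at the same point of space, i.e. that at t = 0 the origins of both the frames coincided. And also we don’t expect to have any changes in the y- and z-axes, both according to our postulates and in comparison to 3-D. Hence we write x′ and t′ as linear combinations of x and t, with y′ = y and z′ = z. Requiring that equations M and N be satisfied fixes the coefficients, and we arrive at the Lorentz transformation:
$x' = \gamma\,(x - vt)$, $y' = y$, $z' = z$, $t' = \gamma\,(t - vx/c^2)$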
Thus we’ve finally got the transformation laws which rotate the time axis towards any other axis and vice versa (and which are consistent with our postulates, which in turn are consistent with at least one experimental finding, the constancy of c). Note that these are quite different from the 3-D rotation transformation law. These are linear of course, but the coefficients involved here are not cos θ and sin θ. So we see that we can treat time as another dimension, but with a little difference. Another noticeable thing here is that if the t-axis is rotated towards the x-axis, then the x-axis cannot be left as it is or rotated arbitrarily; it also rotates, by the negative of the same angle (this can be seen by sketching the functions we found).
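[Figure: space-time diagram showing the t- and x-axes together with the rotated t′- and x′-axes, each tilted by the angle θ.]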
A little reflection will show that this restriction upon the x-axis is given by the second of the postulates of the Relativity. This is clear because, with this restriction, the t-axis can be rotated up to a certain limit whence it lies completely upon the rotated x-axis, and this is the maximum slope of the time axis, which is equivalent to the assertion of an ultimate velocity. Thus we see that the dimension of time is not completely independent of other dimensions.
So much for the derivations of the formulae. Now we would like to derive some consequences. The first thing we note is that if we calculate $t^2 - x^2$, or more generally $t^2 - r^2$ where $r^2 = x^2 + y^2 + z^2$, it comes out the same for different observers. So it is very much like ‘length’ squared in 3-D, which was found to remain constant upon coordinate rotation. Hence it is a sort of physical reality (remember the first postulate: the physical laws or reality should remain invariant upon coordinate rotation). We call this quantity the square of the ‘interval’. So if there are two events (events are the points of 4-D) with an interval ‘s’ between them, then any other uniformly moving observer will also find the same interval between them, though the components may change, i.e. the time interval and the spatial distance may be found different. So we find at least one physical law consistent with our postulates (besides the constancy of c, of course). It turns out that there are many other such 4-D invariants, each of which represents a certain physical law or reality just like the ‘interval’. And the best thing is that the laws of physics appear to be more symmetric and obvious when written in terms of these 4-D invariants. These 4-D invariants are otherwise known as 4-Vectors, and we are all set now to start their analysis. But before doing that let me tell you another interesting thing. We know that rotations are not the only transformations in 3-D geometry. There are curvilinear coordinates as well, and we’ve transformation laws for them also. And when we try to extend the concept of curvilinear coordinates to 4 dimensions, it sprouts into another beautiful theory which we call the General Theory of Relativity. Well, we’ll continue with 4-Vector analysis.
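The invariance of the interval can be seen directly with numbers. The sketch below assumes the standard Lorentz boost along x in units where c = 1; the function names `boost_x` and `interval_sq` and the sample event are illustrative choices, not taken from the original.

```python
import math

def boost_x(t, x, v):
    """Standard Lorentz boost along x with c = 1 (assumed form):
    t' = g (t - v x), x' = g (x - v t), with g = 1 / sqrt(1 - v^2)."""
    g = 1.0 / math.sqrt(1.0 - v * v)
    return g * (t - v * x), g * (x - v * t)

def interval_sq(t, x, y=0.0, z=0.0):
    """Square of the interval: t^2 - x^2 - y^2 - z^2."""
    return t * t - x * x - y * y - z * z

# An arbitrary event, seen by observers moving at several speeds along x.
t, x = 5.0, 3.0
print("rest frame:", t, x, interval_sq(t, x))
for v in (0.1, 0.5, 0.9):
    t_new, x_new = boost_x(t, x, v)
    print("v =", v, ":", t_new, x_new, interval_sq(t_new, x_new))
```

The components t and x differ from observer to observer, while the combination t² − x² printed in the last column stays the same up to rounding.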
Noting that the units of space and time are different, we first multiply time by a constant = c (so that we can apply rules of algebra between x and ct), and then take the value of c = 1. This will make the equations look simpler.
Now we define a four vector as something which maintains its magnitude upon rotating the axes in 4-D (the relative direction will of course change with respect to the axes). Note that we’ve not defined the magnitude of a four vector yet.
At least one thing we know up to now which maintains something constant: the interval s, where $|s|^2 = t^2 - |\mathbf{r}|^2$. So we call it a four vector and $t^2 - |\mathbf{r}|^2$ the square of its magnitude. Making an analogy with 3-D, we write $|s|^2 = s \cdot s$. This at once compels us to define the dot product as:
$s_1 \cdot s_2 = s_{1t}s_{2t} - s_{1x}s_{2x} - s_{1y}s_{2y} - s_{1z}s_{2z}$,
where $s_{1t} = t_1$, $s_{1x} = x_1$, etc.
We can generalize it for any pair of four vectors:
$a \cdot b = a_t b_t - a_x b_x - a_y b_y - a_z b_z$
So we’ve a four vector in 4-D. But if we divide a normal vector by a scalar, what we get is another vector. Applying it to 4-D, it means if we divide the vector ds by a scalar, say dt (the differential time interval of the observer), we should get another vector, i.e. four velocity of a particle. But wait a minute. Is dt really a scalar? No, it’s not—its value changes in different coordinate systems, i.e. for different observers, as:
$dt = dt_0/\sqrt{1 - v^2}$
where $dt_0$ is the ‘proper differential time interval’ of the moving particle. Thus what is a scalar is not $dt$ but $dt_0$, which is equal to $dt\sqrt{1 - v^2}$. Hence we divide $ds$ by $dt\sqrt{1 - v^2}$ to get the four velocity, u.
Thus:
$u_t = 1/\sqrt{1 - v^2}$, $u_x = v_x/\sqrt{1 - v^2}$, $u_y = v_y/\sqrt{1 - v^2}$, $u_z = v_z/\sqrt{1 - v^2}$
This is our second four vector. Now we multiply this by $m_0$, another scalar, to get the four momentum p of the particle (or of the centre of mass of a system of particles). Henceforth we shall write $a_\mu$ to represent all the four components of a four vector, where μ runs over t, x, y, z. Thus
$p_\mu = m_0 u_\mu$.
Thus the invariance of the magnitude of the four momentum implies that terms like
$[m_0/\sqrt{1 - v^2}]^2 - [m_0 v/\sqrt{1 - v^2}]^2$
should remain constant for different observers. Now remember that the interval s2 = t2 - r2 is an invariant for different observers. But let there be only one observer who, however, rotates only his spatial axes. Then for him, r is separately constant, which means that t is constant as well for him (since s2 is always a constant). But rotating only spatial axes simply means that the observer doesn’t change his velocity. Thus for an observer moving with uniform velocity (or for different observers moving with same uniform velocity), which can be zero as well, t and r are separately invariant.
Therefore we should infer similarly that for such an observer, m0/√(1 - v2) and m0v/√(1 - v2) of a system of particles will remain invariant separately. This smells like our known conservation laws. At least the latter term is similar to our 3-D momentum, which we have believed to be an invariant, except for the √(1 - v2) term. Thus our belief has been proved wrong. But we can make a conservative, and fruitful, approach and assert that the 3-D momentum is still a conserved quantity, the only thing is that there is a slight change in the definition of momentum. It is not equal to m0v, instead equal to m0v/√(1 - v2). Or we can be still more conservative if we say that the momentum is conserved and it is still given by mv, but m is something different from m0, in fact m = m0/√(1 - v2). Thus m = m0 only when velocity of the particle is zero. That is, when the observer is at rest relative to the particle (or the system of particles). Thus this m0 can be called as the rest mass of the particle (the mass of the particle measured by a co-moving observer).
Now what about the first term of the four momentum? We note that m0/√(1 - v2) ≈ m0 + ½ m0v2 or m – m0 ~ ½ m0v2 with our new notation for mass. Thus m - m0 is approximately equal to the expression of normal kinetic energy. So we again change color like a chameleon, and assert that the kinetic energy is always conserved, only use the correct expression for that, which is T = m – m0 and not ½ m0v2. Then what should we call the first term, m? It’s simple. We can call it the total energy E of the particle.
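The statements about the four velocity, the four momentum and the kinetic energy can be checked with a few lines of arithmetic. The following is a minimal sketch in units where c = 1; the function names, the rest mass of 2 and the sample velocities are assumptions made for illustration.

```python
import math

def minkowski_dot(a, b):
    """Four-vector dot product: a_t b_t - a_x b_x - a_y b_y - a_z b_z."""
    return a[0] * b[0] - a[1] * b[1] - a[2] * b[2] - a[3] * b[3]

def four_velocity(vx, vy, vz):
    """(u_t, u_x, u_y, u_z) = (1, vx, vy, vz) / sqrt(1 - v^2), with c = 1."""
    v_sq = vx * vx + vy * vy + vz * vz
    g = 1.0 / math.sqrt(1.0 - v_sq)
    return (g, g * vx, g * vy, g * vz)

def four_momentum(m0, vx, vy, vz):
    """p_mu = m0 * u_mu."""
    return tuple(m0 * u for u in four_velocity(vx, vy, vz))

m0 = 2.0  # assumed rest mass, arbitrary units
u = four_velocity(0.3, 0.4, 0.0)
p = four_momentum(m0, 0.3, 0.4, 0.0)

print("u.u =", minkowski_dot(u, u))   # magnitude squared of the four velocity
print("p.p =", minkowski_dot(p, p), "compared with m0^2 =", m0 ** 2)

# At low speed, m - m0 approaches the familiar (1/2) m0 v^2.
v_small = 0.01
kinetic_exact = four_momentum(m0, v_small, 0.0, 0.0)[0] - m0
kinetic_newton = 0.5 * m0 * v_small ** 2
print("T = m - m0  :", kinetic_exact)
print("(1/2) m0 v^2:", kinetic_newton)
```

Raising `v_small` towards 1 shows how the two expressions for the kinetic energy drift apart, which is why the text replaces ½m0v² by m − m0.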
Then what will E be equal to if the particle is at rest? It will be equal to m0 simply. But what does that mean? Is mass equivalent to energy? Does mass increase with velocity? Yes it seems so. But we know it was only to write the definitions of momentum and energy more conveniently that we used the altered definition of mass. It may be that mass is always = m0 but the definitions of momentum and energy are the different ones. The point here is that what do we understand by mass. If it’s not the gravitational mass but the inertial mass only that we are concerned with, then we can safely go with the altered definition of mass (and not the momentum and energy). The reason is this that we can measure the inertial mass only by changing the momentum of the particle. The difficulty in changing the momentum of the particle is, as we see now, due to three terms: m0, v and √(1 - v2). Whenever we’ll do such an experiment, we’ll always have these three terms. Even if we are concerned with the energy-change only, we’ll have √(1 - v2) stuck with m0, although we don’t have v there. Hence we can always blend m0 with √(1 - v2) and form a single entity m. There is no way to tell whether they are separate or not. In fact if we measure the mass of a fast moving particle by making small change in its momentum and if we have no knowledge of Special Relativity, we will surely say that the mass of the particle is m (and not any other thing like m0). Thus we are correct when we say that the mass of a moving particle increases with velocity, since: 1) we have no way to tell whether it happens or not; and 2) it makes other things simple and obvious. And now once we accept that there is an increase in the mass of a body due to an increase in its velocity, which is only possible by doing a work on the particle, i.e. by supplying energy to the particle, then we’ll have to accept that the increased mass is nothing else but the energy gained by the particle. 
This is again due to the fact that we can never separate the increased mass from the supplied energy. If we take the energy back, the mass will again reduce. So, at least, the increased mass is nothing else but energy. But then to treat the rest mass as something different from energy seems awkward, particularly when in all other phenomena, e.g. in showing inertia, both the increased and the rest masses behave exactly alike (we can’t say anything about the gravitational effect). So we treat the rest mass also as energy, and call it the rest energy of the particle, as opposed to the kinetic energy. Well, if we can do an experiment in which the rest mass is converted to other forms of energy, or in which we can see the gravitational effects of the increased mass, our belief in the equivalence of mass and energy, and in the increase of mass with velocity, will become concrete.
Let’s continue with four vectors. Another one is obtained if we multiply the four velocity by a scalar, the rest charge density of a volume element of charge, $\rho_0$, to get the four current density, $j_\mu$:
$j_\mu = \rho_0 u_\mu$, or
$j_t = \rho = \rho_0/\sqrt{1 - v^2}$, $\mathbf{j} = \rho\mathbf{v} = \rho_0\mathbf{v}/\sqrt{1 - v^2}$.
Still another one is: $A_\mu = \phi_0 u_\mu$, or
$A_t = \phi = \phi_0/\sqrt{1 - v^2}$, $\mathbf{A} = \phi\mathbf{v} = \phi_0\mathbf{v}/\sqrt{1 - v^2}$
We will now define a four operator as an operator which, when applied to a scalar or a four vector, gives another scalar or four vector. Recall that to get the four velocity from the differential interval, we divided it by $dt\sqrt{1 - v^2}$. It seems that $d(\,)/(dt\sqrt{1 - v^2})$ is a four operator. Let’s find the value of ds.
$ds^2 = dt^2 - dx^2 - dy^2 - dz^2$, but
$dx = v_x\,dt$, $dy = v_y\,dt$, $dz = v_z\,dt$, hence
$ds^2 = dt^2 - v^2\,dt^2 = dt^2(1 - v^2)$
Hence $ds = dt\sqrt{1 - v^2}$
But ds is an invariant; thus we see why dividing by $dt\sqrt{1 - v^2}$ yields a four vector. Thus $d(\,)/(dt\sqrt{1 - v^2})$ is a four operator.
Now we wish to find the analog of the 3-D gradient, $\nabla$. A little investigation shows that the four gradient can be defined as:
$\nabla_\mu = (\partial/\partial t, -\nabla)$.
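This can be seen as follows. Let’s first try the definition $\nabla_\mu = (\partial/\partial t, \nabla)$. We can transform any $\nabla_\mu a$ for a moving observer (where ‘a’ is a scalar function), since the four terms of $\nabla_\mu$ contain ∂t, ∂x, ∂y, ∂z in the denominator and we know the transformation laws for each of them. If we take ‘a’ to depend only upon x and t, we can simplify our calculations. We can now hold x constant, i.e. $\Delta x = 0$.
Thus $\Delta a = (\partial a/\partial t)\,\Delta t$.
Whereas for a moving observer, $\Delta a = (\partial a/\partial x')\,\Delta x' + (\partial a/\partial t')\,\Delta t'$.
On putting in the values of $\Delta x'$ and $\Delta t'$, we find that
$\partial a/\partial t = (\partial a/\partial t' - v\,\partial a/\partial x')/\sqrt{1 - v^2}$, and a similar calculation shows that $\partial a/\partial x = (\partial a/\partial x' - v\,\partial a/\partial t')/\sqrt{1 - v^2}$,
whereas for any other four vector ‘b’, we find that
$b_t = (b_t' + v\,b_x')/\sqrt{1 - v^2}$, and $b_x = (b_x' + v\,b_t')/\sqrt{1 - v^2}$.
This discrepancy in the signs shows that if we want $\nabla_\mu$ to transform like a four vector, so that $\nabla_\mu b_\mu$ obeys the Lorentz transformation laws, we will have to define it as $\nabla_\mu = (\partial/\partial t, -\nabla)$ and not $(\partial/\partial t, \nabla)$.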
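Now we can find the four divergence:
$\nabla_\mu b_\mu = \partial b_t/\partial t - (-\partial b_x/\partial x) - (-\partial b_y/\partial y) - (-\partial b_z/\partial z)$
Thus $\nabla_\mu b_\mu = \partial b_t/\partial t + \nabla \cdot \mathbf{b}$
And finally the four Laplacian:
$\nabla_\mu\nabla_\mu = (\partial/\partial t)(\partial/\partial t) - (-\partial/\partial x)(-\partial/\partial x) - (-\partial/\partial y)(-\partial/\partial y) - (-\partial/\partial z)(-\partial/\partial z)$
Hence $\nabla_\mu\nabla_\mu = \partial^2/\partial t^2 - \nabla^2 = \Box^2$ (another notation).
Thus Maxwell’s equations, in potential form, become, along with the condition on the potentials given earlier:
$\Box^2 A_\mu = j_\mu/\varepsilon_0$, $\nabla_\mu A_\mu = 0$.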
This is the whole of electrodynamics. But we can make more advancements, particularly in the understanding of E and B.
We recall at this point our definitions of fields. If with every point in space, we associate a scalar, it’s a scalar field; if we associate a vector with every point, it’s a vector field. A vector is something having three numbers (in three dimensions, four numbers in four dimensions etc.) which however produce a scalar. Now we can go further. We will associate nine numbers with each point of 3-D space, more generally n2 numbers in n dimensions and denote them by Tij where i, j run from 1 to n. We call Tij a tensor of second rank in n dimensions. Similarly for tensors for higher ranks. Now we can see that a tensor of 0 rank is scalar in n dimensions, thus an invariant. Similarly a tensor of 1st rank is a vector in n dimensions, since it has n numbers associated with it, and thus it is also an invariant. But nothing can be said about the invariance of higher ranks. They can be or they cannot be. For e.g. consider a tensor Txy = axby – bxay, where a and b are two vectors. From the definition of Txy it is clear that Txy = – Tyx and Txx = 0 (such a tensor is called an antisymmetric tensor). Thus out of total n2 components of T, there remain only (n2 – n)/2 independent numbers. If we put n = 3, we get only 3 independent numbers. Thus although we’ve attached nine numbers with every point of space, we need remember only three. These three can be used to represent a direction and hence this particular 2nd rank tensor behaves like a vector (a 1st rank tensor) in three dimensions. Consider another tensor in 3-D: Txy = 0 and Txx = Tyy = Tzz = axbx + ayby + azbz. Thus out of nine numbers we’ve attached to every point of space, we need remember only one and hence this particular tensor behaves like a scalar. Thus we see that if there are two vector fields coexisting in space such that the resultant effect is obtained by inter-multiplication between their components, then the resultant can be anything, i.e. a scalar, a vector, or just a tensor of 2nd rank. 
The dot and the cross products are defined so as to be the simplest examples of the first and the second, respectively.
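The counting of independent components, and the way an antisymmetric second rank tensor in 3-D collapses to a vector, can be made concrete. The sketch below is illustrative only: the function names `antisymmetric_product` and `independent_components` and the sample vectors are assumptions, with indices 0, 1, 2 standing for x, y, z.

```python
def antisymmetric_product(a, b):
    """T_ij = a_i b_j - b_i a_j for two vectors of the same dimension."""
    n = len(a)
    return [[a[i] * b[j] - b[i] * a[j] for j in range(n)] for i in range(n)]

def independent_components(n):
    """Number of independent entries of an antisymmetric n x n tensor."""
    return (n * n - n) // 2

# Arbitrary sample vectors in 3-D (illustrative values only).
a = (1, 2, 3)
b = (4, 5, 6)
T = antisymmetric_product(a, b)

for row in T:
    print(row)

# The three surviving numbers, read off as (T_yz, T_zx, T_xy),
# are the components of the cross product a x b.
as_vector = (T[1][2], T[2][0], T[0][1])
print("as a vector:", as_vector)

print("independent components in 3-D:", independent_components(3))
print("independent components in 4-D:", independent_components(4))
```

In 4-D the count is six, which no longer fits into a single four vector; this is the situation met below with the fields E and B.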
One may ask at this point that what is the point of bringing in the term ‘tensor’ itself if we can have another more comprehensible approach, which is this—Suppose we know that there are nine numbers attached to each point in space. Then in place of saying them a tensor, we can say that there are three vectors attached to each point of space. The answer is this that we can do that (and we do that sometimes) for our comprehension at the most, but then for every law which occurs anywhere involving these nine numbers, we’ll have to write three separate equations, in spite of the fact that we can generate all these nine numbers out of one definition of the tensor. Also the fact that they can be formed out of one definition indicates that they are actually components of one thing instead of three separate things (vectors). Note that we can do the same for vectors—we can regard the three components of a vector to be three different vectors associated to the same point; but this only increases the paperwork.
Now what will a 2nd rank tensor be in 4 dimensions? It will simply be something which associates 16 numbers with each point in space (which is now space-time). Let’s add that the tensor is antisymmetric. Now only $(16 - 4)/2 = 6$ independent numbers are associated with each point. Let’s try to define such a tensor. Consider, for any four-vector A, a tensor $F_{ij}$ with the definition:
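$F_{xy} = \partial A_x/\partial y - \partial A_y/\partial x$ (with the consistent extension: $F_{ty} = \partial A_t/\partial y + \partial A_y/\partial t$)
i.e. $F_{\mu\nu} = \nabla_\mu A_\nu - \nabla_\nu A_\mu$, where the signs of the components above correspond to $\nabla_\mu = (\partial/\partial t, -\partial/\partial x, -\partial/\partial y, -\partial/\partial z)$
(Note that in 3-D with normal vectors we would have written it as $\mathbf{F} = \nabla \times \mathbf{A}$)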
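Thus it is an antisymmetric tensor of second rank. Now if we take this vector A to be the four-potential $A_\mu = (\phi, \mathbf{A})$, then we find:
$F_{xy} = \partial A_x/\partial y - \partial A_y/\partial x = -B_z$, etc., and
$F_{ty} = \partial A_t/\partial y + \partial A_y/\partial t = \partial\phi/\partial y + \partial A_y/\partial t = -E_y$, etc.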
So we find that the electric and the magnetic fields are both components, or parts, of ‘an antisymmetric tensor of second rank in four dimensions’.
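The identification of the tensor components with the fields can be checked numerically. The sketch below is only an illustration: the sample potential is made up for the check, the four-gradient signs (d/dt, -d/dx, -d/dy, -d/dz) are inferred from the component formulas above, and the relations E = -grad(phi) - dA/dt and B = curl A are the standard ones, assumed here rather than quoted from this section. Derivatives are estimated by central differences.

```python
# Illustrative sketch: the potential and all function names are assumptions.

def potential(X):
    """Sample four-potential A_mu = (phi, Ax, Ay, Az) at X = (t, x, y, z)."""
    t, x, y, z = X
    return [x * y, y * t, z * x, t * t]

def four_gradient(f, X, h=1e-5):
    """Central-difference estimate of (d/dt, -d/dx, -d/dy, -d/dz) of f at X."""
    grad = []
    for k in range(4):
        Xp, Xm = list(X), list(X)
        Xp[k] += h
        Xm[k] -= h
        d = (f(Xp) - f(Xm)) / (2 * h)
        grad.append(d if k == 0 else -d)
    return grad

def field_tensor(A, X):
    """F[m][n] = grad_m A_n - grad_n A_m, with indices 0..3 = t, x, y, z."""
    grads = [four_gradient(lambda P, n=n: A(P)[n], X) for n in range(4)]
    return [[grads[n][m] - grads[m][n] for n in range(4)] for m in range(4)]

X = (0.5, 1.0, 2.0, 3.0)
t, x, y, z = X

# For this particular potential, worked out by hand:
# E = -grad(phi) - dA/dt and B = curl A.
E = (-2 * y, -x, -2 * t)
B = (-x, 0.0, z - t)

F = field_tensor(potential, X)
print(F[1][2], -B[2])  # F_xy against -B_z
print(F[0][2], -E[1])  # F_ty against -E_y
```

The remaining components can be compared in the same way, for instance F[2][3] against -B[0].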
And now we can find its transformation laws, i.e. the laws for finding each of the six components of $F_{\mu\nu}$ in another set of four axes with a rotated time dimension, if we know them in one set. This can be done very easily, since we know the transformation laws of all the terms used in the definition of $F_{\mu\nu}$. And this means we’ve found the transformation laws that give E and B for an observer moving at a uniform velocity relative to another, if we know them for the latter. For relative motion with speed v along x (in units where c = 1), these laws are:
$E'_x = E_x$,   $B'_x = B_x$
$E'_y = (E_y - vB_z)/\sqrt{1 - v^2}$,   $B'_y = (B_y + vE_z)/\sqrt{1 - v^2}$
$E'_z = (E_z + vB_y)/\sqrt{1 - v^2}$,   $B'_z = (B_z - vE_y)/\sqrt{1 - v^2}$
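Or, in an easy-to-remember form,
$\mathbf{E}'_\parallel = \mathbf{E}_\parallel$,   $\mathbf{B}'_\parallel = \mathbf{B}_\parallel$
$\mathbf{E}'_\perp = (\mathbf{E}_\perp + \mathbf{v} \times \mathbf{B})/\sqrt{1 - v^2}$,   $\mathbf{B}'_\perp = (\mathbf{B}_\perp - \mathbf{v} \times \mathbf{E})/\sqrt{1 - v^2}$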
Thus we see that the six components of the tensor $F_{\mu\nu}$ can be organized into two 3-vectors, E and B, which, however, are not four-vectors and hence vary when we rotate the time axis. But we notice that $E^2 - B^2$ does remain constant, and if we choose to call it the magnitude of our particular tensor, the tensor becomes an invariant.
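The constancy of E^2 - B^2 can be seen on a concrete case. The following Python sketch applies the component transformation laws given above (relative motion along x, units with c = 1) to a pair of sample fields and compares E^2 - B^2 before and after. The function names and the sample numbers are illustrative assumptions.

```python
import math

# Illustrative sketch: function names and sample fields are assumptions.

def boost_fields(E, B, v):
    """Fields seen by an observer moving with speed v along x (c = 1)."""
    Ex, Ey, Ez = E
    Bx, By, Bz = B
    root = math.sqrt(1.0 - v * v)
    E_new = (Ex, (Ey - v * Bz) / root, (Ez + v * By) / root)
    B_new = (Bx, (By + v * Ez) / root, (Bz - v * Ey) / root)
    return E_new, B_new

def invariant(E, B):
    """E^2 - B^2, the quantity the text calls the magnitude of the tensor."""
    return sum(e * e for e in E) - sum(b * b for b in B)

E = (1.0, 2.0, 3.0)
B = (0.5, -1.0, 2.0)
E2, B2 = boost_fields(E, B, 0.6)

# The individual components change, but E^2 - B^2 should agree
# up to floating-point rounding.
print(E2, B2)
print(invariant(E, B), invariant(E2, B2))

# Transforming back with -v should recover the original fields.
E_back, B_back = boost_fields(E2, B2, -0.6)
print(E_back, B_back)
```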
Now we’ve reached the last thing in our discussion: to define the correct equation of force, i.e. to correct F = dp/dt so that it becomes a four-invariant (so that all uniformly moving observers find it true). We have to correct it because:
1) by p we now mean $p_\mu$ ( = mv), the four-momentum, which is a four-vector;
2) dt is not an invariant, hence we should replace it by $ds = dt\sqrt{1 - v^2}$;
3) the left-hand side of the equation is also not a four-vector, since it contains only three terms.
The three space terms of F can be corrected by dividing each of them by $\sqrt{1 - v^2}$, so that they correspond to the three space terms of $dp_\mu/ds$. And the time term of the four-force should be equal to $dp_t/ds = (dp_t/dt)/\sqrt{1 - v^2}$, i.e. it should be equal to the rate of change of (relativistic) energy divided by $\sqrt{1 - v^2}$, i.e. $\mathbf{F}\cdot\mathbf{v}/\sqrt{1 - v^2}$. So our four-force becomes:
$f_\mu = (\mathbf{F}\cdot\mathbf{v}/\sqrt{1 - v^2},\ \mathbf{F}/\sqrt{1 - v^2})$*
So the correct equation is:
$dp_\mu/ds = f_\mu$
This equation can be understood as follows. When we apply a force on a particle, its momentum increases as well as the energy (due to the space and time components of force respectively). But besides this the ratio of energy and momentum also changes. In other words, a force increases a (real) thing called four momentum of a particle of which the two components are energy and momentum. But besides that, the distribution of magnitude of four momentum into energy and momentum also changes. This is brought about by the use of ds in place of dt.
Now we’ll correct $\mathbf{F} = q(\mathbf{E} + \mathbf{v} \times \mathbf{B})$. This can be done easily, and we get:
$f_\mu = q u_\nu F_{\mu\nu}$
keeping in mind the summation convention, i.e. if a subscript occurs twice in a product, we run that subscript from 1 to n wherever it occurs (n = number of dimensions) and sum over the terms so obtained, with the correct signs for the terms, of course.
* The $\sqrt{1 - v^2}$ term appears in two ways. If it appears in a transformation law, v means the velocity of the other observer with respect to the first one. But this term can also appear when we write a four-vector in the 3-D terms we know, and then v means the velocity of the particle, etc. This is the case here.
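To see that the compact four-force law really contains the familiar F = q(E + v x B), the sum over the repeated index can be carried out numerically. The Python sketch below rests on stated assumptions: the four-velocity is taken as (1, v)/sqrt(1 - v^2), the repeated index is summed with a plus sign for the time term and minus signs for the space terms, and the tensor components follow the assignments F_xy = -B_z, F_ty = -E_y, etc. given earlier. The function names and sample values are illustrative.

```python
import math

# Illustrative sketch: names, sample values and the sign rule are assumptions.
SIGNS = (1.0, -1.0, -1.0, -1.0)  # time term +, space terms - in the sum

def field_tensor(E, B):
    """F[m][n] with indices 0..3 = t, x, y, z; F_xy = -B_z, F_ty = -E_y, etc."""
    Ex, Ey, Ez = E
    Bx, By, Bz = B
    return [[0.0, -Ex, -Ey, -Ez],
            [Ex, 0.0, -Bz, By],
            [Ey, Bz, 0.0, -Bx],
            [Ez, -By, Bx, 0.0]]

def four_velocity(v):
    """u = (1, v) / sqrt(1 - v^2) for a particle velocity v (c = 1)."""
    gamma = 1.0 / math.sqrt(1.0 - sum(c * c for c in v))
    return [gamma] + [gamma * c for c in v]

def four_force(q, v, E, B):
    """f_m = q * sum over n of u_n F_mn, with the signs above."""
    F = field_tensor(E, B)
    u = four_velocity(v)
    return [q * sum(SIGNS[n] * u[n] * F[m][n] for n in range(4))
            for m in range(4)]

def cross(a, b):
    """Ordinary cross product of two 3-vectors."""
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

q = 2.0
v = (0.3, -0.2, 0.5)
E = (1.0, 2.0, 3.0)
B = (-1.0, 0.5, 2.0)
f = four_force(q, v, E, B)

# Compare with (F.v, F) / sqrt(1 - v^2), where F = q (E + v x B).
root = math.sqrt(1.0 - sum(c * c for c in v))
F3 = [q * (e + c) for e, c in zip(E, cross(v, B))]
expected = [sum(Fi * vi for Fi, vi in zip(F3, v)) / root] + [Fi / root for Fi in F3]
print(f)
print(expected)
```

The two printed lists should agree up to rounding, the time component being the rate of doing work divided by the same square root.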
So all of our mechanics and electrodynamics is this:
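$m_0\, d^2x_\mu/ds^2 = f_\mu = q u_\nu F_{\mu\nu}$, where $F_{\mu\nu} = \nabla_\mu A_\nu - \nabla_\nu A_\mu$, with $\Box^2 A_\mu = j_\mu/\varepsilon_0$ and $\nabla_\mu A_\mu = 0$
Even this equation has the redundant terms $F_{\mu\nu}$ and $A_\mu$, which, however, simplify the writing of the equations.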
That concludes our discussion. A last remark is that the appearance of simplicity in the equations of electrodynamics in the four vector notation is not surprising. Actually it was found even before Relativity that Lorentz transformations make the electrodynamical equations invariant or simpler. So, in a sense we’ve accepted these transformations only so that these equations look simpler, by giving the excuse of Relativity. But the thing which is surprising is that many other laws do seem to become simpler or invariant when written in these notations, which confirms our belief in the Theory of Relativity. So we move ahead with full zeal and generalize the Special Theory of Relativity by allowing for any transformation of the four axes, instead of just rotations and translations. And it turns out that the postulate of ‘equivalence of gravitational and inertial mass’ plays the same role now as the postulate of ‘ultimate-ness of light’ played in the Special Theory.