Oh, I think I understand now.
Indeed, if for example f(x) = x^3, then f'(x)=3x^2, which we can write as (3x^2), a 1x1 matrix.
For f:R^n-->R^m, this notion is then replaced with the Jacobian. Instead of a (1x1) matrix, we get an (mxn) matrix.
Aha, that's why the differential indeed generalizes that: when he computes the matrix associated with the differential in the special case of f:R^n-->R^m, he gets the Jacobian.
Hmm, so the question now is why the Jacobian is the right generalization of the derivative. For example, if we take the function
f(x,y) = (x^2,y^3,x+y)
The Jacobian matrix will be
J(x,y) =
( 2x   0    )
( 0    3y^2 )
( 1    1    )
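A quick way to sanity-check a Jacobian like this one is to compare it against finite differences. Below is a minimal sketch in plain Python; the helper names (numeric_jacobian, max_abs_diff) and the step size eps are my own choices for illustration, not anything from the book or the article.

```python
def f(x, y):
    """The example map f: R^2 -> R^3, f(x, y) = (x^2, y^3, x + y)."""
    return (x**2, y**3, x + y)


def jacobian(x, y):
    """Analytic Jacobian of f at (x, y), as a 3x2 nested list."""
    return [[2 * x, 0.0],
            [0.0, 3 * y**2],
            [1.0, 1.0]]


def numeric_jacobian(x, y, eps=1e-6):
    """Central-difference approximation of the Jacobian (illustrative helper)."""
    fx_plus, fx_minus = f(x + eps, y), f(x - eps, y)
    fy_plus, fy_minus = f(x, y + eps), f(x, y - eps)
    # Row i holds the partial derivatives of the i-th component of f.
    return [[(fx_plus[i] - fx_minus[i]) / (2 * eps),
             (fy_plus[i] - fy_minus[i]) / (2 * eps)]
            for i in range(3)]


def max_abs_diff(a, b):
    """Largest entrywise absolute difference between two equally sized matrices."""
    return max(abs(p - q) for row_a, row_b in zip(a, b)
               for p, q in zip(row_a, row_b))
```

Calling max_abs_diff(jacobian(1.5, -2.0), numeric_jacobian(1.5, -2.0)) should give something very small, which suggests the matrix above is right.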
So why does that relate to the derivative? I saw in the Wikipedia article that we should have
f(x,y) = f(x1,y1) + J(x1,y1)(x-x1,y-y1) + o(||(x-x1,y-y1)||)
In that case it would indeed generalize the 1D case.
But I tried to work that out with that example and I couldn't.
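For reference, here is a sketch of how the expansion can be checked for this example. The shorthand h = x - x1, k = y - y1 and the name R for the remainder are my own notation, and the norm is assumed to be the Euclidean one.

```latex
% Remainder after subtracting the linear part, with h = x - x_1, k = y - y_1
\[
R(h,k) = f(x_1+h,\,y_1+k) - f(x_1,y_1) - J(x_1,y_1)\begin{pmatrix} h \\ k \end{pmatrix}
= \begin{pmatrix}
(x_1+h)^2 - x_1^2 - 2x_1 h \\
(y_1+k)^3 - y_1^3 - 3y_1^2 k \\
(x_1+h+y_1+k) - (x_1+y_1) - (h+k)
\end{pmatrix}
= \begin{pmatrix} h^2 \\ 3y_1 k^2 + k^3 \\ 0 \end{pmatrix}
\]
% Each entry is at least quadratic in (h, k), so the remainder is o(||(h,k)||)
\[
\frac{\|R(h,k)\|}{\|(h,k)\|}
\le \frac{h^2 + |3y_1 + k|\,k^2}{\|(h,k)\|}
\le \bigl(1 + |3y_1 + k|\bigr)\,\|(h,k)\| \longrightarrow 0
\quad\text{as } (h,k)\to(0,0).
\]
```

So the linear part J(x1,y1)(h,k) captures everything up to first order, just like f'(x1)(x-x1) does in the 1D case.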
I've actually had a course on multivariable calculus, but I've forgotten everything now.
The book I'm reading is called "An Introduction to Manifolds" by Loring Tu. It's really manifolds for babies. Yeah, very elementary, I really liked it. I've already read the first 22 chapters.
Thanks again!