**Difference between 1st and 2nd order algorithms:**

Any algorithm that requires at least one first derivative (gradient) is a first order algorithm. In the case of a finite-sum optimization problem, you may use only the gradient of a single sample, but this is still first order because you need at least one gradient.
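As a minimal sketch (the toy finite-sum objective and the name `grad_fi` are my illustration, not part of the definition), a stochastic gradient step touches only one sample's gradient yet is still first order:

```python
import numpy as np

# Toy finite-sum objective f(x) = (1/n) * sum_i (x - a_i)^2.
a = np.array([1.0, 2.0, 3.0])

def grad_fi(x, i):
    # Gradient of a single sample f_i(x) = (x - a_i)^2.
    return 2.0 * (x - a[i])

x, lr = 0.0, 0.05
rng = np.random.default_rng(0)
for _ in range(500):
    i = rng.integers(len(a))
    x -= lr * grad_fi(x, i)   # one sample gradient per step: still first order

print(x)  # hovers near the minimizer, the mean of a (= 2.0)
```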

A second order algorithm is any algorithm that uses second-derivative information (in the scalar case, simply $f''$). To elaborate, Newton's step requires the Hessian, whose entries are the second-order partial derivatives, i.e. $H_{i,j} = \partial_i \partial_j f(x)$.
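Here is a sketch of one Newton step (the quadratic test problem is my choice; for a quadratic, the Hessian is the constant matrix $A$ and a single step lands on the minimizer):

```python
import numpy as np

def newton_step(grad, hess, x):
    # x <- x - H(x)^{-1} grad f(x); solving with H uses second-order info.
    return x - np.linalg.solve(hess(x), grad(x))

# Quadratic test problem f(x) = 0.5 x^T A x - b^T x,
# with grad f(x) = A x - b and Hessian A.
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, 1.0])

x = newton_step(lambda x: A @ x - b, lambda x: A, np.zeros(2))
print(np.allclose(A @ x, b))  # True: one Newton step solves a quadratic exactly
```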

In general, an algorithm can be classified as $n^{th}$ order optimization if it uses the rank-$n$ tensor of $n^{th}$-order partial derivatives.

**Important Distinction:**

When an algorithm uses an *approximated* version of the second order information (the Hessian) for optimization, it **may or may not** be a second order algorithm. I am emphasizing this because, for example, you may approximate the Hessian by the outer product of the gradient, $H \approx \nabla f(x) \nabla f(x)^T$. In this case you are using *only* first order information, so this would be considered a first order algorithm. On the other hand, if you approximate the Hessian by computing only its diagonal elements $\partial_i \partial_i f(x)$, this would be a second order algorithm. Both cases are sketched below.
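A minimal sketch of the two approximations (function names are mine, and the central-difference diagonal is just one way to obtain $\partial_i \partial_i f$):

```python
import numpy as np

def outer_product_hessian(grad, x):
    # H ≈ grad f(x) grad f(x)^T: built from the gradient alone,
    # so an algorithm using it remains first order.
    g = grad(x)
    return np.outer(g, g)

def diagonal_hessian(f, x, eps=1e-5):
    # H ≈ diag(d_i d_i f(x)): each entry is a genuine second derivative
    # (here via central differences), so using it makes the method second order.
    diag = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = eps
        diag[i] = (f(x + e) - 2.0 * f(x) + f(x - e)) / eps**2
    return np.diag(diag)

f = lambda x: x[0]**2 + 3.0 * x[1]**2
print(np.round(np.diag(diagonal_hessian(f, np.array([1.0, 1.0])))))  # [2. 6.]
```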

The *order* of an algorithm can also be thought of as the order of the error resulting from approximating the derivatives. Recall the (forward-difference) definition of a derivative:
$$\partial_i f(x) \approx \frac{f(x + \epsilon e_i) - f(x)}{\epsilon}$$

and the second derivative:
$$\partial_{ij} f(x) \approx \frac{\partial_j f(x + \epsilon e_i) - \partial_j f(x)}{\epsilon},$$

where $e_i$ is the unit vector that is $1$ in the $i^{th}$ coordinate and $0$ elsewhere. Hence, if you use an $\epsilon$ approximation for each derivative computation, you inherently get an $\epsilon^2$ error in your optimization algorithm.
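A quick numerical sketch of this nested construction (the test function $e^x$ and the step size are my choices; every derivative of $e^x$ is $e^x$, so the errors are easy to read off):

```python
import numpy as np

f = np.exp
x, eps = 1.0, 1e-4

d1 = (f(x + eps) - f(x)) / eps                    # epsilon-approximation of f'(x)
d1_shifted = (f(x + 2 * eps) - f(x + eps)) / eps
d2 = (d1_shifted - d1) / eps                      # nested approximation of f''(x)

print(abs(d1 - np.e))  # error of the first-derivative approximation
print(abs(d2 - np.e))  # error of the nested second-derivative approximation
```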

Because it is touched on in your question, I would like to add that first order algorithms do not necessarily converge linearly: Stochastic Gradient Descent (SGD) converges sub-linearly, and many second order algorithms do not converge quadratically.