For pandas, would anyone know if any datatype apart from

(i) `float64`, `int64` (and other variants of `np.number` like `float32`, `int8` etc.)
(ii) `bool`
(iii) `datetime64`, `timedelta64`

such as string columns, always has a `dtype` of `object`?
Alternatively, I want to know if there is any datatype apart from (i), (ii) and (iii) in the list above for which pandas does not make its `dtype` an `object`?
Best Answer
`pandas` borrows its dtypes from `numpy`. For a demonstration of this, see the following:

You can find the list of valid `numpy.dtypes` in the documentation:

`pandas` should support these types. Using the `astype` method of a `pandas.Series` object with any of the above options as the input argument will result in `pandas` trying to convert the `Series` to that type (or at the very least falling back to `object` type); `'u'` is the only one that I see `pandas` not understanding at all:

This is a `numpy` error that results because the `'u'` needs to be followed by a number specifying the number of bytes per item (which needs to be valid):

To summarise, the `astype` methods of `pandas` objects will try to do something sensible with any argument that is valid for `numpy.dtype`. Note that `numpy.dtype('f')` is the same as `numpy.dtype('float32')` and `numpy.dtype('f8')` is the same as `numpy.dtype('float64')`, etc. The same goes for passing the arguments to `pandas` `astype` methods.

To locate the respective data type classes in NumPy, the pandas docs recommend this:
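The snippet that belonged here did not survive. What follows is a minimal sketch, not the original code. It assumes the recommendation is the recursive `__subclasses__()` walk over `np.generic` that I recall from the pandas basics docs; the names `subdtypes` and `tree` are assumptions, and the exact tree it prints depends on the installed NumPy version.

```python
import numpy as np


def subdtypes(dtype):
    """Recursively collect a NumPy scalar class and all of its subclasses."""
    subs = dtype.__subclasses__()
    if not subs:
        # Leaf class: no further subclasses to descend into.
        return dtype
    # Inner node: the class itself, followed by the subtree of each subclass.
    return [dtype, [subdtypes(dt) for dt in subs]]


# np.generic is the root of NumPy's scalar type hierarchy.
tree = subdtypes(np.generic)
print(tree)
```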
Output:
pandas accepts these classes as valid types. For example, `dtype={'A': np.float64}` (the older alias `np.float` was deprecated in NumPy 1.20 and removed in NumPy 1.24). The NumPy docs contain more details and a chart.
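Returning to the `dtype=` example: here is a small sketch of passing a NumPy class or a short type code to `astype`. The frame and its column name `A` are made up for illustration, and `np.float64` is used because the `np.float` alias no longer exists in recent NumPy releases.

```python
import numpy as np
import pandas as pd

# Hypothetical one-column frame, used only for illustration.
df = pd.DataFrame({"A": [1, 2, 3]})

# A NumPy scalar class is accepted as the target dtype.
as_class = df.astype({"A": np.float64})

# Short NumPy type codes resolve to the same dtypes as the long names.
as_f = df["A"].astype("f")    # single-precision float
as_f8 = df["A"].astype("f8")  # 8-byte (double-precision) float

same_f = np.dtype("f") == np.dtype("float32")
same_f8 = np.dtype("f8") == np.dtype("float64")

print(as_class.dtypes)
print(as_f.dtype, as_f8.dtype, same_f, same_f8)
```

Checking `.dtype` after the conversion, as done here, is a quick way to confirm which NumPy type a given code actually mapped to.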