Comments (13)
I'm also seeing a problem using load_table_from_dataframe with datetime columns. If I specify the schema as part of the LoadJobConfig (specifying the column as DATE), it works fine.
However, if I don't specify the schema, it identifies the column as DATETIME and creates invalid dates:
Row | startDate
1 | 0021-04-25T22:13:48.360448
2 | 0021-04-25T22:13:48.360448
3 | 0021-08-13T11:42:29.163840
As opposed to:
Row | startDate
1 | 2017-07-01
2 | 2017-07-01
3 | 2018-01-01
The pandas dtype is datetime64[ns].
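For reference, a minimal sketch of the workaround with an explicit DATE schema (the project, dataset, table, and column names are placeholders):

```python
import pandas as pd
from google.cloud import bigquery

client = bigquery.Client()
df = pd.DataFrame({"startDate": pd.to_datetime(["2017-07-01", "2017-07-01", "2018-01-01"])})

# Declare the column as DATE explicitly so the loader does not fall back
# to the misdetected DATETIME type.
job_config = bigquery.LoadJobConfig(schema=[bigquery.SchemaField("startDate", "DATE")])

job = client.load_table_from_dataframe(
    df, "your-project.your_dataset.your_table", job_config=job_config
)
job.result()  # wait for the load to complete
```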
I've been having the same issue with load_table_from_dataframe since updating dependencies. I have uncovered the real problem. Some of my recent timestamps throw errors like this one:
Cannot return an invalid datetime value of 1575982560000000 microseconds relative to the Unix epoch
What's actually going on here is that I have a pandas DataFrame with values like '2019-12-10 12:56:00' (dtype='datetime64[ns]'), which, in nanoseconds, is 1575982560000000000.
In other words, somewhere between being a DataFrame and ending up in BigQuery, the underlying value stored for this date is being interpreted as microseconds since the Unix epoch rather than nanoseconds!
I recently had an issue with the same code throwing errors after a pyarrow update, which might be related. I ended up fixing that one by rounding my timestamps to millisecond precision before uploading to BigQuery (see https://stackoverflow.com/questions/53333742/schema-conflict-when-storing-dataframes-with-datetime-objects-using-load-table-f for an example).
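For what it's worth, that rounding workaround is a short pandas operation (the column name is illustrative):

```python
# Round nanosecond timestamps to millisecond precision before loading.
df["created"] = df["created"].dt.round("ms")
```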
EDIT: it appears that something has changed in terms of the BigQuery type being inferred from pandas. Previously, the type used was TIMESTAMP; now, it appears to be DATETIME. I am trying to fix my specific issue by using a partial schema, in case it helps anyone else:
from google.cloud.bigquery import LoadJobConfig, SchemaField

job_config = LoadJobConfig()
# Force every datetime64[ns] column to TIMESTAMP rather than the inferred DATETIME.
dt_columns = df.columns[df.dtypes == "datetime64[ns]"]
job_config.schema = [SchemaField(col, "TIMESTAMP") for col in dt_columns]
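The partial schema can then be passed to the load call; a sketch, with a placeholder destination table:

```python
from google.cloud import bigquery

client = bigquery.Client()
client.load_table_from_dataframe(
    df, "your-project.your_dataset.your_table", job_config=job_config
).result()  # wait for the load job to finish
```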
@QuinRiva I have tried to reproduce the issue with load_table_from_dataframe but was unable to; here are my sample code and DataFrame:
contractId | contractTypeId | startDate | endDate | created
2 | 2 | 2019-12-06 | 2019-12-06 | 2019-12-06 17:09:11.317689+00:00
1 | 1 | 2019-12-06 | 2019-12-06 | 2019-12-06 17:09:11.317689+00:00
df.dtypes:
contractId int64
contractTypeId int64
startDate object
endDate object
created datetime64[ns, UTC]
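A DataFrame with these dtypes can be constructed like this (a sketch based on the values above):

```python
import pandas as pd

df = pd.DataFrame(
    {
        "contractId": [2, 1],
        "contractTypeId": [2, 1],
        "startDate": ["2019-12-06", "2019-12-06"],  # object dtype (strings)
        "endDate": ["2019-12-06", "2019-12-06"],    # object dtype (strings)
        "created": pd.to_datetime(["2019-12-06 17:09:11.317689+00:00"] * 2, utc=True),
    }
)
```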
Sample code:
from google.cloud import bigquery

client = bigquery.Client()
destination_table = "your-project.your_dataset.your_table"
job = client.load_table_from_dataframe(df, destination_table)
job.result()  # wait for the load job to complete
It creates the table with valid dates and datetimes.
If that doesn't help, could you please share reproducible code?
@tswast, @plamut could you please comment: is this related to pyarrow?
Actually, let's open a second issue for load_table_from_dataframe: googleapis/google-cloud-python#9996
@QuinRiva is the behavior of to_dataframe any different when using the BigQuery Storage API?
You might also want to try setting the dtypes argument to to_dataframe. We don't have any logic to convert from the API response types to pandas dtypes unless the dtypes argument is specified.
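A sketch of that suggestion (the table and column names are placeholders):

```python
from google.cloud import bigquery

client = bigquery.Client()
table_ref = bigquery.TableReference.from_string("your-project.your_dataset.your_table")

# Without the dtypes argument, DATE columns come back with dtype "object";
# here we explicitly request datetime64[ns] for the date column.
df = client.list_rows(table_ref).to_dataframe(dtypes={"startDate": "datetime64[ns]"})
```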
@tswast I expect that the current behavior with the BigQuery Storage API is that the dtype will still be "object", but at least with Arrow (and I assume Avro) the actual objects will be Python datetime.date objects instead of strings.
@QuinRiva could you specify which dtype/object type you are expecting for the date columns?
@emkornfield Another issue, #136, has been filed by a user and is a duplicate of this one.
Per the comment received from the user on that issue, manually setting the dtypes argument works, but the user wants that automated.
@HemangChothani I think this might be fixed after e75ff82 (is this in the latest release?). Otherwise, I think it is possible this is going through a different code path and not using pyarrow through the Storage API (and not using Arrow data, so it still might be keeping the string). pyarrow will convert date types to datetime.date objects without intervention when using pa.RecordBatch.to_pandas().
@emkornfield I have tried with the master branch, as the feature has not been released, but to_dataframe is still the culprit.
pyarrow works fine, as to_arrow returns the date32 datatype (pyarrow.RecordBatch dt: date32[day]), but the line given below converts it to a DataFrame and returns df.dtypes['column'] == object:
python-bigquery/google/cloud/bigquery/table.py, line 1725 at de9998f
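A quick way to see the difference (a sketch; the table ID and column name are placeholders):

```python
from google.cloud import bigquery

client = bigquery.Client()
table_ref = bigquery.TableReference.from_string("your-project.your_dataset.your_table")

arrow_table = client.list_rows(table_ref).to_arrow()
print(arrow_table.schema.field("dt").type)  # date32[day]

df = client.list_rows(table_ref).to_dataframe()
print(df["dt"].dtype)  # object (the values are datetime.date instances)
```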
@HemangChothani I expect it to be an object (but the object should be a datetime.date). If conversion to dtype=datetime64 is the desired behavior, I think you need to call record_batch.to_pandas(date_as_object=False).
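A minimal pyarrow illustration of that parameter (standalone, not BigQuery-specific):

```python
import datetime
import pyarrow as pa

batch = pa.record_batch([pa.array([datetime.date(2019, 12, 6)])], names=["dt"])

# Default: date32 columns come back as Python datetime.date objects ("object" dtype).
print(batch.to_pandas()["dt"].dtype)                      # object

# With date_as_object=False they come back as a pandas datetime64 dtype instead
# (the exact unit can vary with the pyarrow/pandas versions).
print(batch.to_pandas(date_as_object=False)["dt"].dtype)  # e.g. datetime64[ns]
```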
@emkornfield I had completely overlooked that and expected it to be datetime64; thanks for the clarification. Is there any plan to expose this parameter to users? If not, can we close this issue as resolved?
> Is there any plan to expose this parameter to users? If not, can we close this issue as resolved?

The parameter is public in pyarrow as far as I can tell. It seems like at least some BigQuery users expect DATE to become datetime64, so I would therefore think it is a good idea to expose this as a parameter in the BigQuery client, but I don't know the exact policies that are followed.