
flights's Introduction

I'm looking for maintainers. Open Issue ›



flights (fast-flights)

The fast, robust, strongly-typed Google Flights scraper (API) implemented in Python. Based on a Base64-encoded Protobuf string.

$ pip install fast-flights

Basic

To use fast-flights, first create a filter (it maps to the ?tfs= URL parameter) from your flight_data, trip, seat and passengers info. Then pass the filter to get_flights to perform the request.

Honorable mention: I like birds. Yes, I like birds.

from fast_flights import FlightData, Passengers, create_filter, get_flights

# Create a new filter
filter = create_filter(
    flight_data=[
        # Include more if it's not a one-way trip
        FlightData(
            date="2024-07-02",  # Date of departure
            from_airport="TPE", 
            to_airport="MYJ"
        ),
        # ... include more for round trips
    ],
    trip="one-way",  # Trip (round-trip, one-way)
    seat="economy",  # Seat (economy, premium-economy, business or first)
    passengers=Passengers(
        adults=2,
        children=1,
        infants_in_seat=0,
        infants_on_lap=0
    ),
)

# Get flights with a filter
result = get_flights(filter)

# The price is currently... low/typical/high
print("The price is currently", result.current_price)

Information: Display additional information.

# Get the first flight
flight = result.flights[0]

flight.is_best
flight.name
flight.departure
flight.arrival
flight.arrival_time_ahead
flight.duration
flight.stops
flight.delay  # optional; may not be present
flight.price
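
As a rough sketch of how these fields might be consumed, the helper below guards against an empty flights list and a missing delay. It runs on SimpleNamespace stand-ins rather than real fast-flights results, so the sample values and the describe_first_flight name are illustrative assumptions, not part of the library.

```python
from types import SimpleNamespace


def describe_first_flight(result):
    """Summarize the first flight, tolerating empty results and no delay."""
    if not result.flights:
        return "no flights found"
    flight = result.flights[0]
    line = f"{flight.name}: {flight.departure} -> {flight.arrival}, {flight.price}"
    # delay is optional, so fall back to None if it is missing or empty
    delay = getattr(flight, "delay", None)
    if delay:
        line += f" (delay: {delay})"
    return line


# Stand-in objects with made-up values, shaped like the fields listed above
sample = SimpleNamespace(
    current_price="typical",
    flights=[
        SimpleNamespace(
            name="Example Air",
            departure="9:00 AM",
            arrival="1:00 PM",
            price="$300",
            delay=None,
        )
    ],
)
print(describe_first_flight(sample))
print(describe_first_flight(SimpleNamespace(current_price="", flights=[])))
```

Checking result.flights before indexing avoids an IndexError when a query comes back empty.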

Useless enums: Additionally, you can use the Airport enum to search for airports in code (as you type)! See _generated_enum.py in source.

Airport.TAIPEI
              |---------------------------------|
              | TAIPEI_SONGSHAN_AIRPORT         |
              | TAPACHULA_INTERNATIONAL_AIRPORT |
              | TAMPA_INTERNATIONAL_AIRPORT     |
              | ... 5 more                      |
              |---------------------------------|

Cookies & Consent

For EU regions, if you didn't consent to Google's Terms of Service, you'll ultimately get blocked. You can use the built-in Cookies class to pass through this check:

from fast_flights import Cookies

cookies = Cookies.new(locale="de").to_dict()
get_flights(filter, cookies=cookies)

See issue #1

Allow Looping Last Item

In some rare cases, looping into the last item (internally) would lead to an unknown exit. If you believe your computer is a good boy, disable this restriction by adding the dangerously_allow_looping_last_item option:

get_flights(filter, dangerously_allow_looping_last_item=True)

About Preflights

We may send the request to the server twice, because the initial request sometimes returns no results. When this happens, the first request counts as a preflight, and we send another request once the server has had time to build the data. You can think of this as a "cold start."
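
The retry idea can be sketched in a few lines. This is not the library's actual implementation: fetch_with_preflight and the fetch callable are hypothetical names, and a plain loop is used here for clarity.

```python
from typing import Callable, List


def fetch_with_preflight(fetch: Callable[[], List[str]], max_attempts: int = 2) -> List[str]:
    """Call fetch() until it returns results, at most max_attempts times.

    The first empty response is treated as a preflight ("cold start").
    """
    for _ in range(max_attempts):
        flights = fetch()
        if flights:
            return flights
    raise RuntimeError("No flights found. (preflight checked)")


# Usage with a fake fetcher: the first call is empty, the second has data
responses = iter([[], ["TPE -> MYJ"]])
print(fetch_with_preflight(lambda: next(responses)))
```

Capping the attempts keeps a query that is simply invalid from retrying forever.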


The documentation was here. Who the hell moved it?!


How it's made

The other day, I was making a chat-interface-based trip recommendation app and wanted to add a feature that can search for flights available for booking. My personal choice is definitely Google Flights since Google always has the best and most organized data on the web. Therefore, I searched for APIs on Google.

🔎 Search
google flights api

The results? Bad. It seems like they discontinued this service and it now lives in the Graveyard of Google.

duffel.com
Google Flights API: How did it work & what happened to it?

The Google Flights API offered developers access to aggregated airline data, including flight times, availability, and prices. Over a decade ago, Google announced the acquisition of ITA Software Inc. which it used to develop its API. However, in 2018, Google ended access to the public-facing API and now only offers access through the QPX enterprise product.

That's awful! I've also looked for free alternatives but their rate limits and pricing are just 😬 (not a good fit/deal for everyone).


However, Google Flights has its own UI: flights.google.com. So, maybe I could just use Developer Tools to log the requests made and replicate all of that? Undoubtedly not! Their requests are full of numbers and unreadable text, so that's not the solution.

Perhaps we could scrape it? I mean, Google has let many companies like SerpApi scrape its pages, just pretending like nothing happened... So let's build our own scraper.

🔎 Search
google flights api scraper pypi

Excluding the ones that are not active, I came across hugoglvs/google-flights-scraper on PyPI. I thought to myself: "ain't no way this is the solution!"

I checked hugoglvs's code on GitHub, and I immediately detected "playwright," my worst enemy. One word can describe it well: slow. Two words? Extremely slow. What's more, it doesn't even run on the 🗻 Edge because of configuration errors, missing libraries, etc. I could just reverse try.playwright.tech and use a better environment, but that's just too risky if they added Cloudflare as an additional security barrier 😳.

Life tells me to never give up. Let's just take a look at their URL params...

https://www.google.com/travel/flights/search?tfs=CBwQAhoeEgoyMDI0LTA1LTI4agcIARIDVFBFcgcIARIDTVlKGh4SCjIwMjQtMDUtMzBqBwgBEgNNWUpyBwgBEgNUUEVAAUgBcAGCAQsI____________AZgBAQ&hl=en
Param | Content | My past understanding
hl | en | Sets the language.
tfs | CBwQAhoeEgoyMDI0LTA1LTI4agcIARID… | What is this???? 🤮🤮

I removed the ?tfs= parameter and found out that it is what controls our request! And it looks so base64-y.

If we decode it to raw text, we can still see the dates, but we're not quite there: there's too much unwanted Unicode text.
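
To reproduce that decoding step, here is a small sketch using only the standard library. It assumes the value is URL-safe Base64 with the padding stripped (the underscores in the string suggest as much); decode_tfs is a name made up for this example.

```python
import base64
from urllib.parse import parse_qs, urlsplit

URL = (
    "https://www.google.com/travel/flights/search?tfs="
    "CBwQAhoeEgoyMDI0LTA1LTI4agcIARIDVFBFcgcIARIDTVlKGh4SCjIwMjQtMDUtMzBq"
    "BwgBEgNNWUpyBwgBEgNUUEVAAUgBcAGCAQsI____________AZgBAQ&hl=en"
)


def decode_tfs(url: str) -> bytes:
    """Pull the tfs parameter out of the URL and Base64-decode it."""
    tfs = parse_qs(urlsplit(url).query)["tfs"][0]
    # Restore the "=" padding that was stripped from the URL
    padded = tfs + "=" * (-len(tfs) % 4)
    return base64.urlsafe_b64decode(padded)


raw = decode_tfs(URL)
print(raw)  # dates and airport codes show up between binary bytes
```

The dates and airport codes are readable in the printed bytes, but they sit between length prefixes and tags, which is why plain text decoding is not enough.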

Or maybe it's some kind of a data-storing method Google uses? What if it's something like JSON? Let's look it up.

🔎 Search
google's json alternative

Result
Solution: The Power of Protocol Buffers

LinkedIn turned to Protocol Buffers, often referred to as protobuf, a binary serialization format developed by Google. The key advantage of Protocol Buffers is its efficiency, compactness, and speed, making it significantly faster than JSON for serialization and deserialization.

Gotcha, Protobuf! Let's feed it to an online decoder and see how it does:

🔎 Search
protobuf decoder

Result
protobuf-decoder.netlify.app

I then pasted the Base64-encoded string to the decoder and no way! It DID return valid data!

[Screenshot: annotated Protobuf Decoder output]

I immediately recognized the values: that's my data, that's my query!

So, I wrote some simple Protobuf code to decode the data.

syntax = "proto3";

message Airport {
    string name = 2;  // airport code, e.g. "TPE"
}

message FlightInfo {
    string date = 2;  // YYYY-MM-DD
    Airport dep_airport = 13;
    Airport arr_airport = 14;
}

message GoogleSucks {
    repeated FlightInfo data = 3;  // one entry per leg
}
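
To see how those field numbers line up with the raw bytes, here is a hand-rolled sketch of a Protobuf wire-format reader in plain Python. It only handles varints and length-delimited fields, which is all this particular string needs, and it is an illustration rather than how fast-flights itself works. The helper names (read_varint, iter_fields, decode_legs) are made up for this example.

```python
import base64

TFS = (
    "CBwQAhoeEgoyMDI0LTA1LTI4agcIARIDVFBFcgcIARIDTVlKGh4SCjIwMjQtMDUtMzBq"
    "BwgBEgNNWUpyBwgBEgNUUEVAAUgBcAGCAQsI____________AZgBAQ"
)


def read_varint(buf, pos):
    """Return (value, next_pos) for the varint starting at pos."""
    value = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear means last byte
            return value, pos
        shift += 7


def iter_fields(buf):
    """Yield (field_number, value) for each field in one message."""
    pos = 0
    while pos < len(buf):
        tag, pos = read_varint(buf, pos)
        field, wire_type = tag >> 3, tag & 7
        if wire_type == 0:  # varint
            value, pos = read_varint(buf, pos)
        elif wire_type == 2:  # length-delimited (string or sub-message)
            length, pos = read_varint(buf, pos)
            value = buf[pos:pos + length]
            pos += length
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        yield field, value


def airport_name(buf):
    """Airport.name is field 2 in the schema above."""
    return next(v.decode() for f, v in iter_fields(buf) if f == 2)


def decode_legs(tfs):
    raw = base64.urlsafe_b64decode(tfs + "=" * (-len(tfs) % 4))
    legs = []
    for field, value in iter_fields(raw):
        if field != 3 or not isinstance(value, bytes):
            continue  # only the repeated FlightInfo entries matter here
        leg = {}
        for f, v in iter_fields(value):
            if f == 2:
                leg["date"] = v.decode()
            elif f == 13:
                leg["from"] = airport_name(v)
            elif f == 14:
                leg["to"] = airport_name(v)
        legs.append(leg)
    return legs


print(decode_legs(TFS))
```

In practice you would compile the .proto file and use the generated classes; the manual reader is only there to show what the decoder sees.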

It works! Now, I won't consider myself an "experienced Protobuf developer" but rather a complete beginner.

I have no idea what I wrote but... it worked! And here it is, fast-flights.


Contributing

Yes, please: github.com/AWeirdDev/flights


flights's People

Contributors

aweirddev

flights's Issues

Any request returns - RuntimeError: No flights found. (preflight checked)

Steps Performed:

  1. New, clean conda environment
  2. pip3 install fast-flights
  3. Copied example code
  4. Ran example code

Output:

/Users/ollman/Library/Python/3.9/lib/python/site-packages/urllib3/__init__.py:35: NotOpenSSLWarning: urllib3 v2 only supports OpenSSL 1.1.1+, currently the 'ssl' module is compiled with 'LibreSSL 2.8.3'. See: https://github.com/urllib3/urllib3/issues/3020
  warnings.warn(
Traceback (most recent call last):
  File "/Users/ollman/flight-track/app.py", line 26, in <module>
    result =get_flights(filter)
  File "/Users/ollman/Library/Python/3.9/lib/python/site-packages/fast_flights/core.py", line 125, in get_flights
    return get_flights(
  File "/Users/ollman/Library/Python/3.9/lib/python/site-packages/fast_flights/core.py", line 133, in get_flights
    raise RuntimeError(
RuntimeError: No flights found. (preflight checked)
Possible reasons:
- Invalid query (e.g., date is in the past or cannot be booked)
- Invalid airport

Attempted Fixes:

  • Found japan_scan.py from #1, same result.
  • Performed uninstall and reinstall of selectolax, same result.

Cookies & parse_response issues with V1

V1 works the same when using the v0 functionality, but when I try the new call format for get_flights, I hit two issues:

  1. In the cookies=Cookies.new() line, the Cookies reference is not working.
  2. The parse_response function is breaking somehow; it's trying to split a None value.

I'm testing it in an isolated environment with a simple main script, installed with pip only. Here is my test code:

from fast_flights import FlightData, Passengers, create_filter, get_flights

# Create a new filter
filter = create_filter(
    flight_data=[
        # Include more if it's not a one-way trip
        FlightData(
            date="2024-09-02",  # Date of departure
            from_airport="RSW",
            to_airport="DCA"
        ),
        # ... include more for round trips and multi-city trips
    ],
    trip="one-way",  # Trip (round-trip, one-way, multi-city)
    seat="economy",  # Seat (economy, premium-economy, business or first)
    passengers=Passengers(
        adults=1,
        children=0,
        infants_in_seat=0,
        infants_on_lap=0
    ),
)

# Get flights with a filter
result = get_flights(
    filter,
    dangerously_allow_looping_last_item=True,
    cookies=Cookies.new().to_dict(),
    currency="USD",
    language="en"
)

# The price is currently... low/typical/high
print("The price is currently", result.current_price)

# Display the first flight
print(result.flights[0])

No results in sample code

Hi,

First of all well done - awesome project!

I get the following error when I try to run the sample code:

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
Cell In[2], line 31
     28 print("The price is currently", result.current_price)
     30 # Display the first flight
---> 31 print(result.flights[0])

IndexError: list index out of range

result is Result(current_price='', flights=[])

plz halp :)

P.S. tried adding you on discord but your id isn't working. DM me on twitter (just followed you) - I'm looking to use this API for something fun :)

No results for multi-city

from fast_flights import FlightData, Passengers, create_filter, get_flights

# Create a new filter
filter = create_filter(
    flight_data=[
        # Include more if it's not a one-way trip
        FlightData(
            date='2024-06-19',  # Date of departure
            from_airport="IST",
            to_airport="LAX"
        ),
        FlightData(
            date='2024-07-09',  # Date of departure
            from_airport="NYC",
            to_airport="IST"
        ),
        # ... include more for round trips and multi-city trips
    ],
    trip="multi-city",  # Trip (round-trip, one-way, multi-city)
    seat="economy",  # Seat (economy, premium-economy, business or first)
    passengers=Passengers(
        adults=1,
        children=0,
        infants_in_seat=0,
        infants_on_lap=0
    ),
)

result = get_flights(filter)

# The price is currently... low/typical/high
print("The price is currently", result.current_price)

# Display the first flight
print(result.flights[0])

This returns no flights. However, if I manually open

https://www.google.com/travel/flights?tfs=GhoSCjIwMjQtMDYtMTlqBRIDSVNUcgUSA0xBWBoaEgoyMDI0LTA3LTA5agUSA05ZQ3IFEgNJU1RCAQFIAZgBAw%3D%3D&hl=en&tfu=EgQIABABIgA

which is the generated link, there are flights.

I think the problem is that results take about 2-3 seconds to populate, but response.get returns almost immediately without waiting.
