API Streaming
Learn how to stream predictions back to your front end
Streaming in Tailwinds delivers tokens in real time as they become available, enhancing the responsiveness and user experience of your AI applications. This guide will walk you through configuring and using API streaming with Tailwinds.
How Streaming Works
When streaming is enabled for a prediction request, Tailwinds emits each token as a data-only server-sent event (SSE) as soon as it is generated. This approach provides a more dynamic and interactive experience for users.
Configuring Streaming
Here's how you can implement streaming using Python's requests library.
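The sketch below is a minimal example, not a definitive implementation: the endpoint URL, authorization header, payload fields (question, streaming), and the JSON shape of each event are assumptions, so adjust them to match your deployment.

```python
import json

import requests

# Assumed endpoint and credentials — replace with your own Tailwinds
# deployment URL, flow ID, and API key.
API_URL = "https://your-tailwinds-host/api/v1/prediction/<flow-id>"
HEADERS = {"Authorization": "Bearer <your-api-key>"}
payload = {
    "question": "Tell me about streaming.",
    "streaming": True,  # assumed flag that enables server-sent events
}

with requests.post(API_URL, headers=HEADERS, json=payload, stream=True) as response:
    response.raise_for_status()
    for line in response.iter_lines(decode_unicode=True):
        # Data-only SSE: each event arrives as a single "data: ..." line,
        # separated by blank lines.
        if not line or not line.startswith("data:"):
            continue
        event = json.loads(line[len("data:"):].strip())
        if event.get("event") == "token":
            # Print each token as it arrives for a live-typing effect.
            print(event.get("data", ""), end="", flush=True)
        elif event.get("event") == "end":
            break
```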
Understanding the Event Stream
A prediction's event stream consists of the following event types:
start: Indicates the start of streaming.
token: Emitted when a new token is available.
error: Emitted if an error occurs during the prediction.
end: Signals the end of the prediction stream.
metadata: Contains chatId, messageId, etc.; sent after all tokens and before the end event.
sourceDocuments: Emitted when the flow returns sources from a vector store.
usedTools: Emitted when the flow uses tools during the prediction.
Example of a Token Event
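Assuming the data-only SSE format described above, with the event type carried inside the JSON payload (the exact field names may differ in your deployment), a token event on the wire might look like this:

```
data: {"event": "token", "data": "Hello"}
```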
Best Practices
Error Handling: Watch for error events and handle dropped connections so a failed prediction doesn't leave the client waiting indefinitely (see the sketch after this list).
Buffering: Consider implementing a buffer on the client-side to smooth out the display of incoming tokens.
Timeout Management: Set appropriate connect and read timeouts to handle cases where the stream might unexpectedly stall or end.
User Interface: Design your UI to gracefully handle incoming streamed data, providing a smooth experience for the end-user.
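As a concrete illustration of the error-handling and timeout points above, here is a minimal sketch of a streaming client. The stream_prediction helper and the event payload shape are assumptions for illustration, not part of the Tailwinds API.

```python
import json

import requests


def stream_prediction(url, payload, timeout=(5, 60)):
    """Yield token strings from a streaming prediction request.

    timeout is (connect, read) in seconds; the read timeout fires when the
    stream stalls, so the caller is never left waiting indefinitely.
    """
    try:
        with requests.post(url, json=payload, stream=True, timeout=timeout) as resp:
            resp.raise_for_status()
            for line in resp.iter_lines(decode_unicode=True):
                if not line or not line.startswith("data:"):
                    continue
                event = json.loads(line[len("data:"):].strip())
                kind = event.get("event")
                if kind == "token":
                    yield event.get("data", "")
                elif kind == "error":
                    # Surface server-side prediction errors to the caller.
                    raise RuntimeError(f"Prediction failed: {event.get('data')}")
                elif kind == "end":
                    break
    except requests.exceptions.Timeout as exc:
        raise RuntimeError("Stream timed out") from exc
```

A caller can iterate over the yielded tokens and append them to a client-side buffer, which also supports the buffering practice described above.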