The automated posts of random photos started well, but I wanted a customized message for each photo. That means writing a short message for each one. Not difficult, but tedious. Enter Gemini and Flet.
Of course the generated script did not work out of the box, even after a few attempts. Still, it was good enough to start debugging and turn into something useful. This is the prompt I used to create the skeleton script:
I would like to create a flet app which allows the following:
-select folder with images (jpg and png file formats). Do not use filedialog for this. Use filepicker
-for each photo, the user can create image captions which are saved in separate text files (same name as photo, just with .txt extension)
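The file handling described in the prompt can be sketched independently of the Flet UI. This is a minimal sketch, not the script from this post: the helper names `list_images`, `caption_path`, `save_caption` and `load_caption` are made up for illustration, and storing the captions as UTF-8 is an assumption.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".png"}  # the two formats named in the prompt


def list_images(folder):
    """Return the jpg/png files in folder, sorted by name (hypothetical helper)."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )


def caption_path(image):
    """Same name as the photo, just with a .txt extension."""
    return Path(image).with_suffix(".txt")


def save_caption(image, text):
    """Write the caption next to the image (UTF-8 is an assumption)."""
    caption_path(image).write_text(text.strip(), encoding="utf-8")


def load_caption(image):
    """Return the saved caption, or an empty string if none exists yet."""
    path = caption_path(image)
    return path.read_text(encoding="utf-8") if path.exists() else ""
```

In a UI, `load_caption` would fill the text field when a photo is selected and `save_caption` would run on a save button or when moving to the next photo.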
I ended up with fewer than 80 lines of code for this mini tool, which made the whole task a lot easier. At some point I may try one of the image captioning AI models on Hugging Face. I am just not sure I want my photos captioned without supervision and then posted.
PS: I downloaded the Salesforce/blip-image-captioning-base model from Hugging Face and gave it a try. Considering the low effort, the hit rate is not bad. Still, it is not something I would trust to be posted without supervision. The automatically generated text for the five photos above was:
a photography of a rose
a photography of a road in paris
a photography of a clay sculpture of a man with a boxing glove
a photography of a jet engine
a photography of the tokyo tower
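For reference, a minimal sketch of how such captions can be produced with the `transformers` library, following the conditional captioning example on the model card. The shared "a photography of" prefix in the captions above is consistent with that example, but that the captions were generated exactly this way is an assumption. The function names `blip_caption` and `strip_prefix` are hypothetical.

```python
MODEL_ID = "Salesforce/blip-image-captioning-base"
PREFIX = "a photography of"  # assumed text prompt, as in the model card example


def blip_caption(image_path, prefix=PREFIX):
    """Caption one image with BLIP; the model continues the given prefix."""
    # Imported lazily so strip_prefix works without the heavy dependencies.
    from PIL import Image
    from transformers import BlipForConditionalGeneration, BlipProcessor

    # Loading on every call keeps the sketch short; for a whole folder,
    # load the processor and model once and reuse them.
    processor = BlipProcessor.from_pretrained(MODEL_ID)
    model = BlipForConditionalGeneration.from_pretrained(MODEL_ID)
    image = Image.open(image_path).convert("RGB")
    inputs = processor(image, prefix, return_tensors="pt")
    output_ids = model.generate(**inputs)
    return processor.decode(output_ids[0], skip_special_tokens=True)


def strip_prefix(caption, prefix=PREFIX):
    """Drop the leading prompt text so only the description remains."""
    caption = caption.strip()
    if caption.lower().startswith(prefix.lower()):
        return caption[len(prefix):].strip()
    return caption
```

Since the prefix is part of the generated text, `strip_prefix` is one way to remove it before a caption is reviewed and posted.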
PPS: For comparison, I got myself an API key for Google Gemini and used the prompt "Describe this photo. The resulting text must have at most 250 characters." Wow. And this is not even the latest model, as I used Gemini 1.5 Flash:
A single, red rose, partially open, is shown against a blurred green background. The rose's petals are a blend of deep red and lighter pink, showcasing delicate texture. The stem and leaves are visible.
A high-angle, long shot reveals a wide, tree-lined avenue cutting through a lush green park. Vehicles travel along the road, leading toward a distant cityscape under a cloudy sky. A television tower is visible on the horizon.
A small ceramic figurine, resembling a gnome, sits on a white surface. It has a large red nose, brown body, and a dark green hat. The background is plain white.
Black and white photo of a large turbofan engine. The engine's fan blades are prominent, filling most of the frame. It's displayed indoors, in what appears to be an airport terminal.
Tokyo Tower, a red and white lattice tower, rises above lush green trees under a partly sunny sky. The tower is partially obscured by the foliage.
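The descriptions above all stay within the requested 250 characters, but a language model is not guaranteed to respect a length limit every time. A small guard like the one below could check the text before it is posted. It is only a sketch: the names `build_prompt` and `fit_to_limit` are hypothetical, the trimming strategy is one possible choice, and the Gemini API call itself is not shown.

```python
MAX_CHARS = 250  # limit stated in the prompt


def build_prompt(max_chars=MAX_CHARS):
    """Return the prompt text used above, with a configurable limit."""
    return (
        "Describe this photo. "
        f"The resulting text must have at most {max_chars} characters."
    )


def fit_to_limit(text, max_chars=MAX_CHARS):
    """Return text unchanged if it fits, else trim it to max_chars or fewer."""
    text = " ".join(text.split())  # collapse newlines and repeated spaces
    if len(text) <= max_chars:
        return text
    # Prefer to cut after the last complete sentence that still fits.
    end = text[:max_chars + 1].rfind(". ")
    if end != -1:
        return text[:end + 1]
    # No sentence boundary: drop the last partial word and mark the cut.
    return text[:max_chars - 3].rsplit(" ", 1)[0] + "..."
```

Cutting at a sentence boundary keeps the description readable; rejecting the text and asking the model again would be an equally valid approach.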
P³S 5th Feb: Posts will now have the AI-generated description as alt text.