Friday 18 November 2022

Just in case

Taking advantage of the fact that I can use the Twitter API for the calendar bot, I have written a small program to download my lists of followed accounts and followers on Twitter (very modest ones). You never know.

import csv

import tweepy


def write_file(tweepy_function, screen_name, file_name):
    # Based on https://stackoverflow.com/questions/52450621/get-list-of-followers-and-following-for-group-of-users-tweepy
    # newline='' avoids blank rows on Windows; utf-8 keeps non-ASCII descriptions readable.
    with open(file_name + '.csv', 'w', newline='', encoding='utf-8') as csv_file:
        csv_writer = csv.writer(csv_file)
        c = 1
        # At most 50 pages of 200 users each.
        for page in tweepy.Cursor(tweepy_function, screen_name=screen_name, count=200).pages(50):
            for user in page:
                csv_writer.writerow([user.screen_name, user.id, user.url, user.location,
                                     user.followers_count, user.friends_count, user.description])
                print(c, ',', user.name, ', @' + user.screen_name, user.url)
                c += 1


def main():
    api = get_tweepy_api()  # helper not shown here; it returns tweepy.API(auth)
    screen_name = "your_handle_without_@"
    write_file(api.get_followers, screen_name, 'followers')
    write_file(api.get_friends, screen_name, 'following')


if __name__ == '__main__':
    main()

Saturday 12 November 2022

Progress report

Writing a book takes time, and while it has taken me a bit longer than expected, I am making progress. So much so that I am already thinking of printing it. However, this post is not about content but about technical issues.

I found a print-on-demand service, very aptly called Book on Demand, which seems reasonable. Truth be told, I did not spend a lot of time researching and comparing features and prices. For what I want (printing about five books for the family) any book service will do. This one offers what I assume is the usual selection of softcover and hardcover, paper weights, color vs. B&W and a long list of other options. You can pay for many things, or do it all yourself (my case). Not quite unexpectedly, my book has a ton of photos (more on that later), and it would be nice to print them in color, except that obviously not every page has a photo, only about one third of them. I am currently at over 300 pages (A5), and that's a lot of pages to print black text in "color", which would be too expensive. Luckily, there is an option to enter the list of pages that should be printed in color, which reduces the price significantly.

The way I saw it there were two main options:

Option A: Finish the pdf and manually note which pages have photos. This would take about ten minutes, maybe fifteen?

Option B: Write some code that inspects the pdf and spits out the list of pages with photos. Who knows how long this takes; for sure more than 15 minutes, as there are likely many unknown unknowns, plus I have no idea about the structure of pdf files.

It goes without saying that I went for Option B. And even writing this post took more than 15 minutes.

On to the tech details. I ended up installing PyPDF2 and following this. But for some reason my pdf has many images on every page (I generate it from LibreOffice, no idea if that makes a difference).

Since the easy way did not work, I did it the hard way, aka the fun way: opening the pdf file as a text file (ignoring utf-8 errors and not even bothering to uncompress the pdf file first) and peeking into it until I figured out the following.

There is an object which gives some information about pages in the pdf (my pdf currently has 317 pages). I just had to figure out what those numbers followed by '0 R' were about.

1487 0 obj
<</Type/Pages
/Resources 1524 0 R
/MediaBox[ 0 0 419 595 ]
/Kids[ 1 0 R 4 0 R 7 0 R 10 0 R 13 0 R 16 0 R...
56 0 R 60 0 R 64 0 R 69 0 R 74 0 R 78 0 R 81 0 R...
122 0 R 125 0 R 128 0 R 131 0 R 135 0 R 138 0 R...
...
990 0 R 993 0 R 996 0 R 999 0 R 1002 0 R 1008 0 R...
...
1045 0 R 1048 0 R 1051 0 R 1054 0 R 1057 0 R 1060 0 R...
...1120 0 R 1123 0 R 1126 0 R 1129 0 R ]
/Count 317>>
endobj

It turns out that searching for "images" led me to the list of my inserted images. Something like:

34 0 obj
<</Type/XObject/Subtype/Image/Width 881 /Height 644 ... /Length 122470>>
stream
JFIF
O]NV
...

Putting two and two together, I figured out that the /Kids list holds the first object of each page, so the first page has objects 1 to 3, the second page objects 4 to 6, etc. I know that the first page with an image is the 11th one (at least in this iteration of the book), which would include objects 31 to 34 and, bingo, the first image is object 34. As a second check, I looked at the last page with a photo in my pdf (the 288th page), and its object range also includes the object listed for the last image. So that seems to be it, actually a lot easier than expected.

This takes less than 30 lines of code, and it reminds me of the book Automate the Boring Stuff with Python, by Al Sweigart. And yes, it took me more than 15 minutes, but I am sure I will make changes to the output pdf, and if nothing else, this has been way more satisfying than doing it manually. I don't get to code any more at work and sometimes I miss it.
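
For reference, the object-range idea can be sketched in Python roughly as follows. This is a minimal sketch and not the script from this post: the function names are made up for illustration, and it assumes that the /Kids entries appear in ascending order, that every object has generation number 0, and that the page tree and image dictionaries are readable in the raw file, as in the excerpts above.

```python
import re
from bisect import bisect_right


def read_pdf_as_text(path):
    # Open the pdf as if it were a text file, ignoring bytes that are not valid utf-8.
    with open(path, encoding="utf-8", errors="ignore") as f:
        return f.read()


def pages_with_images(pdf_text):
    """Return the sorted 1-based page numbers whose object range holds an image."""
    # The /Type/Pages object lists, in /Kids, the first object number of each page.
    kids = re.search(r"/Type\s*/Pages\b.*?/Kids\s*\[(.*?)\]", pdf_text, re.S)
    if kids is None:
        raise ValueError("no /Kids list found")
    page_starts = [int(n) for n in re.findall(r"(\d+)\s+0\s+R", kids.group(1))]

    # Image objects look like "34 0 obj <</Type/XObject/Subtype/Image ...".
    image_objects = [
        int(n)
        for n in re.findall(r"(\d+)\s+0\s+obj\s*<<[^>]*?/Subtype\s*/Image\b", pdf_text)
    ]

    pages = set()
    for obj in image_objects:
        # Count how many pages start at or before this object number.
        # Caveat: any image numbered after the last page start is
        # attributed to the last page.
        page = bisect_right(page_starts, obj)
        if page > 0:
            pages.add(page)
    return sorted(pages)


# Hypothetical usage ("book.pdf" is a placeholder file name):
# print(",".join(str(p) for p in pages_with_images(read_pdf_as_text("book.pdf"))))
```

The commented usage line prints the pages as a comma-separated list, which is one plausible format for the color-page field of a print service.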


Stamps (VIII)


 

Last entry (for now) in this scheduled series, with stamps from Italy, Greece, Spain and Germany.

Friday 11 November 2022

Stamps (VII)

 


Stamps from Egypt, and a whole bunch of stamps from Greece :-)

Thursday 10 November 2022

Stamps (VI)

 


Stamps from Spain, Switzerland, the Netherlands and France.

Wednesday 9 November 2022

Stamps (V)

 

Stamps from Spain, Singapore and Germany. Of these, my favorites are the Mortadelo y Filemón one and the Feliz Año one, which the photo does not do justice to (in general, all the photos leave something to be desired).

Tuesday 8 November 2022

Monday 7 November 2022

Stamps (III)

 


More stamps from Italy, France, India and the United Kingdom.

Sunday 6 November 2022

Stamps (II)




More stamps, including some from places like Ireland or the United States, and others closer to home such as Illescas, Leganés or Salamanca.

Saturday 5 November 2022

Stamps (I)

 


While tidying up a bit, I have kept myself entertained taking photos of some of the stamps from the pile of letters I have kept (not all of them, because that would be a Herculean task, although I may do it little by little). So far I have resisted the temptation to read the letters (with some I could not help it, I confess).


Tuesday 1 November 2022

Playing with AI tools

Good ideas often come from exchanges with interesting people. In my case, I saw this tweet from Molinos last summer and thought it should be relatively straightforward to do something similar as a DIY project, but with quotes mentioning the date instead of the time. The only issue was how to find a quote for every day. Scraping a huge number of books is an option although, honestly, it looks like a daunting task. There are Wikipedia articles about things that happened on a given date. And of course there are always Google searches that can be parsed. I was just finishing my holidays and had neither the time nor the energy to try anything, so I put it in the back of my mind.

Fast forward to today: I am on holiday, and I recently saw a post by somebody who had added a GPT-3 function to a Google Sheet to autocomplete columns. I had a eureka moment that GPT-3 could probably complete the task with an adequate prompt, so I decided to give it a try and was amazed at how easy it was to accomplish something (not saying that the something was any good, keep reading). These are some of the autogenerated results for a few dates:

"The first of January is the day on which we all start afresh, with new aspirations and hopes for the coming year." - The Alchemist, Paulo Coelho

"Second of November, huh?"
-The Catcher in the Rye, J.D. Salinger

"It was the third of November. The wind was howling outside, and the rain was beating and splashing against the windows."
-The Adventures of Sherlock Holmes, Arthur Conan Doyle

"The Fourth of November is a date which we should never forget."
- A Tale of Two Cities, Charles Dickens

"Remember, remember the Fifth of November, the Gunpowder Treason and Plot, I know of no reason why the Gunpowder Treason should ever be forgot."
-V for Vendetta, Alan Moore

The one question I have, though, is whether the quotes are real or autogenerated fakes. A bit of googling did not really answer the question, since not finding something does not mean that it doesn't exist. And of course there may be different versions of a book if it has been published multiple times.
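
One way to check a single quote, as a rough sketch: assuming you have a plain-text copy of the book at hand (for example, one downloaded from Project Gutenberg), search it for the quote after normalizing case, whitespace and curly quotes. The function names below are made up for illustration, and a miss only tells you that the quote is not in that particular edition.

```python
import re

# Map curly quotes to straight ones so typography does not cause false misses.
_QUOTE_MAP = str.maketrans({"“": '"', "”": '"', "‘": "'", "’": "'"})


def normalize(text):
    # Straighten quotes, collapse runs of whitespace (including line breaks), lowercase.
    text = text.translate(_QUOTE_MAP)
    return re.sub(r"\s+", " ", text).strip().lower()


def quote_in_book(quote, book_text):
    """Return True if the normalized quote appears verbatim in the normalized book text."""
    return normalize(quote) in normalize(book_text)
```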


I have been changing the prompt to make the instructions as clear as possible. This is the latest prompt I am using with the davinci engine:

Quote from a book in the public domain including  either "first of November" or "November, 1st" and mandatorily using the exact text in the book. Next to the quote, add the title and author of the book from where it was extracted.  The quote must be less than 200 characters. If no exact quote available containing the required text, don´t invent a quote, just write "quote not found". Case of words is not relevant.
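
Since the prompt changes with the date, it can be generated instead of typed by hand. The following is a minimal sketch under my own assumptions: the names `build_prompt`, `ORDINAL_WORDS` and `ordinal_suffix` are made up for illustration, the prompt text is copied from above, and the call that sends the prompt to GPT-3 is deliberately left out.

```python
import datetime

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]

ORDINAL_WORDS = [
    "first", "second", "third", "fourth", "fifth", "sixth", "seventh",
    "eighth", "ninth", "tenth", "eleventh", "twelfth", "thirteenth",
    "fourteenth", "fifteenth", "sixteenth", "seventeenth", "eighteenth",
    "nineteenth", "twentieth", "twenty-first", "twenty-second",
    "twenty-third", "twenty-fourth", "twenty-fifth", "twenty-sixth",
    "twenty-seventh", "twenty-eighth", "twenty-ninth", "thirtieth",
    "thirty-first",
]

PROMPT_TEMPLATE = (
    'Quote from a book in the public domain including either "{words}" or '
    '"{short}" and mandatorily using the exact text in the book. Next to the '
    'quote, add the title and author of the book from where it was extracted. '
    'The quote must be less than 200 characters. If no exact quote available '
    'containing the required text, don´t invent a quote, just write '
    '"quote not found". Case of words is not relevant.'
)


def ordinal_suffix(day):
    # 11, 12 and 13 take "th"; otherwise the last digit decides.
    if 11 <= day % 100 <= 13:
        return "th"
    return {1: "st", 2: "nd", 3: "rd"}.get(day % 10, "th")


def build_prompt(date):
    """Fill the prompt with both spellings of the date, e.g. for 1 November."""
    month = MONTHS[date.month - 1]  # fixed English names, independent of locale
    words = f"{ORDINAL_WORDS[date.day - 1]} of {month}"
    short = f"{month}, {date.day}{ordinal_suffix(date.day)}"
    return PROMPT_TEMPLATE.format(words=words, short=short)
```

Calling `build_prompt(datetime.date(2022, 11, 1))` fills in "first of November" and "November, 1st", matching the prompt above.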


GPT-3 is not deterministic, so you can run the same prompt several times and see what comes out. For example, this is an interesting coincidence...

"First of November was a cold day."

The Tale of Peter Rabbit by Beatrix Potter


"First of November was a cold day."

The Call of the Wild, Jack London


"First of November was a cold day."

-The Adventures of Sherlock Holmes, Arthur Conan Doyle


The books and authors definitely exist. Whether the same sentence appears in all three books, I don't know. I tweaked the prompt from "the exact text in the book" to "the exact text from the book" and got this:

"First of November, 1892," he said. "I shall never forget it."

The Adventures of Sherlock Holmes

by Arthur Conan Doyle


"First of November was come, and many little birds were dead; but Mrs. Partington said they might have died any day, so it was no matter."

The Life and Adventures of Mrs. Partington

by Kate Douglas Wiggin


I wonder if I should make a Twitter bot similar to the CalendarPuzzle bot.


Edit: Instead of creating another bot, I just added the quote to the existing one. Let's see how it works in the next few days.