nlpconnect/vit-gpt2-image-captioning vs fxmarty/tiny-doc-qa-vision-encoder-decoder | What are the differences? | StackShare