I have extensive experience with Python and R.



The libraries that I use most often are:

NLTK – Data mining, POS (part-of-speech) tagging, and syntactic parsing in my corpus research.

pandas – Filtering and presenting results from language data mining.

NumPy – Mathematical manipulations and basic summary statistics, in conjunction with NLTK and pandas.
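As a rough illustration of how these libraries fit together, here is a minimal sketch, not taken from any of my published analyses. It assumes a small, hypothetical set of pre-tagged tokens (in practice the tags could come from a tagger such as NLTK's); the column names and the verb-ratio measure are likewise assumptions chosen only for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical pre-tagged tokens: (word, POS tag, text id).
# In real work the tags would come from a POS tagger, e.g. one from NLTK.
tagged = [
    ("quiero", "VERB", "t1"), ("que", "SCONJ", "t1"), ("vengas", "VERB", "t1"),
    ("la", "DET", "t2"), ("casa", "NOUN", "t2"),
    ("es", "VERB", "t2"), ("grande", "ADJ", "t2"),
]
df = pd.DataFrame(tagged, columns=["word", "pos", "text_id"])

# pandas: filter the tokens to verbs and count them per text.
verbs_per_text = df[df["pos"] == "VERB"].groupby("text_id").size()

# pandas: total tokens per text, then the proportion of verbs in each text.
tokens_per_text = df.groupby("text_id").size()
verb_ratio = verbs_per_text / tokens_per_text

# NumPy: a basic summary statistic over the per-text proportions.
mean_ratio = float(np.mean(verb_ratio.to_numpy()))

print(verb_ratio)
print(mean_ratio)
```

The same pattern scales to a full corpus: the tagging step produces the rows, pandas does the filtering and grouping, and NumPy summarizes the resulting measures.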



I use this software environment to run complex statistical analyses, including analysis of variance, regression analysis, principal component analysis (PCA), and cluster analysis.

The following recent articles, which I co-authored, showcase the types of analysis that I have conducted and reported.


Collentine, J. G., & Collentine, K. (2020). Organic models for measuring Spanish learners’ linguistic complexity. In Current Theoretical and Applied Perspectives on Hispanic and Lusophone Linguistics (pp. 39–62). Amsterdam: John Benjamins.


Collentine, K., & Collentine, J. G. (2020). A corpus analysis of the structural elaboration of Spanish heritage language learners. In Variation and Evolution: Aspects of language contact and contrast across the Spanish-speaking world (pp. 56–73). Amsterdam: John Benjamins.


Collentine, J. G., & Asención-Delaney, Y. (2020). L2 discourse functions of the Spanish subjunctive. In Routledge Handbook of Corpus Approaches to Discourse Analysis (pp. 252–268). New York, NY: Routledge.